ZooKeeper + Hadoop Cluster Installation and Configuration

Posted by 张映 on 2019-01-25

Category: hadoop/spark/scala


The NameNode is the core of Hadoop: if the NameNode goes down, the whole cluster goes down with it. Running a second NameNode, with ZooKeeper coordinating automatic failover, removes this single point of failure.

ZooKeeper is a distributed, open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby and an important component of Hadoop and HBase. It provides consistency services for distributed applications, including configuration maintenance, naming, distributed synchronization, and group services.

The goal of ZooKeeper is to encapsulate these complex, error-prone key services and expose them to users as a simple, easy-to-use interface backed by an efficient and stable system.

I. Server composition

bigserver1    namenode    zookeeper    journalnode
bigserver2    datanode    zookeeper    journalnode
bigserver3    datanode    zookeeper
testing       namenode    zookeeper    journalnode
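
All of these hostnames must resolve on every node. A minimal /etc/hosts sketch (the IP addresses below are placeholders; substitute the real addresses of your machines):

192.168.1.101   bigserver1
192.168.1.102   bigserver2
192.168.1.103   bigserver3
192.168.1.104   testing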

II. ZooKeeper installation and configuration

1. Download ZooKeeper

http://mirrors.hust.edu.cn/apache/zookeeper/stable/
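
A minimal sketch of downloading and unpacking it into /bigdata/zookeeper (the exact version number is an assumption; use whichever tarball the stable directory currently offers):

# mkdir -p /bigdata
# wget http://mirrors.hust.edu.cn/apache/zookeeper/stable/zookeeper-3.4.13.tar.gz
# tar -zxf zookeeper-3.4.13.tar.gz
# mv zookeeper-3.4.13 /bigdata/zookeeper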

2. Create the data and log directories

# mkdir -p /bigdata/zookeeper/{data,logs}

3. Configure ZooKeeper

# cd /bigdata/zookeeper/conf
# cp zoo_sample.cfg zoo.cfg

# cat zoo.cfg
tickTime=2000
clientPort=2181
dataDir=/bigdata/zookeeper/data
dataLogDir=/bigdata/zookeeper/logs

initLimit=10
syncLimit=5
server.1=bigserver1:2888:3888
server.2=bigserver2:2888:3888
server.3=testing:2888:3888

# echo  1 > /bigdata/zookeeper/data/myid

Each ZooKeeper node must have a different myid value, matching its server.N entry in zoo.cfg; set it separately on every ZooKeeper machine.
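
For example, matching the server.N entries in zoo.cfg above:

on bigserver1:  # echo 1 > /bigdata/zookeeper/data/myid
on bigserver2:  # echo 2 > /bigdata/zookeeper/data/myid
on testing:     # echo 3 > /bigdata/zookeeper/data/myid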

III. Hadoop configuration

1. core-site.xml

<configuration>
 <property>
 <name>fs.defaultFS</name>
 <value>hdfs://bigdata1/</value>
 </property>

 <property>
 <name>hadoop.tmp.dir</name>
 <value>/bigdata/hadoop/tmp</value>
 </property>

 <property>
 <name>ha.zookeeper.quorum</name>
 <value>bigserver1:2181,bigserver2:2181,testing:2181</value>
 </property>

 <property>
 <name>ha.zookeeper.session-timeout.ms</name>
 <value>1000</value>
 <description>ms</description>
 </property>
</configuration>

2. mapred-site.xml

<configuration>
 <property>
 <name>mapreduce.jobhistory.address</name>
 <value>bigserver1:10020</value>
 </property>

 <property>
 <name>mapreduce.jobhistory.webapp.address</name>
 <value>bigserver1:19888</value>
 </property>
 <property>
 <name>mapred.local.dir</name>
 <value>/bigdata/hadoop/var</value>
 </property>
 <property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
 </property>
</configuration>

3. hdfs-site.xml

<configuration>

 <property>
 <name>dfs.replication</name>
 <value>2</value>
 </property>

 <property>
 <name>dfs.namenode.name.dir</name>
 <value>/bigdata/hadoop/dfs/name</value>
 </property>

 <property>
 <name>dfs.datanode.data.dir</name>
 <value>/bigdata/hadoop/dfs/data</value>
 </property>

 <property>
 <name>dfs.webhdfs.enabled</name>
 <value>true</value>
 </property>

 <property>
 <name>dfs.nameservices</name>
 <value>bigdata1</value>
 </property>

 <property>
 <name>dfs.ha.namenodes.bigdata1</name>
 <value>nn1,nn2</value>
 </property>

 <property>
 <name>dfs.namenode.rpc-address.bigdata1.nn1</name>
 <value>bigserver1:9000</value>
 </property>

 <property>
 <name>dfs.namenode.http-address.bigdata1.nn1</name>
 <value>bigserver1:50070</value>
 </property>

 <property>
 <name>dfs.namenode.rpc-address.bigdata1.nn2</name>
 <value>testing:9000</value>
 </property>

 <property>
 <name>dfs.namenode.http-address.bigdata1.nn2</name>
 <value>testing:50070</value>
 </property>

 <property>
 <name>dfs.namenode.shared.edits.dir</name>
 <value>qjournal://bigserver1:8485;bigserver2:8485;testing:8485/bigdata1</value>
 </property>

 <property>
 <name>dfs.journalnode.edits.dir</name>
 <value>/bigdata/hadoop/dfs/journal</value>
 </property>

 <property>
 <name>dfs.ha.automatic-failover.enabled</name>
 <value>true</value>
 </property>

 <property>
 <name>dfs.client.failover.proxy.provider.bigdata1</name>
 <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
 </property>

 <property>
 <name>dfs.ha.fencing.methods</name>
 <value>
 sshfence
 shell(/bin/true)
 </value>
 </property>

 <property>
 <name>dfs.ha.fencing.ssh.private-key-files</name>
 <value>/root/.ssh/id_rsa</value>
 </property>

 <property>
 <name>dfs.ha.fencing.ssh.connect-timeout</name>
 <value>30000</value>
 </property>

 <property>
 <name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
 <value>60000</value>
 </property>

</configuration>

4. yarn-site.xml

<configuration>

 <property>
 <name>yarn.resourcemanager.ha.enabled</name>
 <value>true</value>
 </property>

 <property>
 <name>yarn.resourcemanager.cluster-id</name>
 <value>yarn-ha</value>
 </property>

 <property>
 <name>yarn.resourcemanager.ha.rm-ids</name>
 <value>rm1,rm2</value>
 </property>

 <property>
 <name>yarn.resourcemanager.hostname.rm1</name>
 <value>bigserver1</value>
 </property>

 <property>
 <name>yarn.resourcemanager.hostname.rm2</name>
 <value>testing</value>
 </property>

 <property>
 <name>yarn.resourcemanager.zk-address</name>
 <value>bigserver1:2181,bigserver2:2181,testing:2181</value>
 </property>

 <property>
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value>
 </property>

 <property>
 <name>yarn.log-aggregation-enable</name>
 <value>true</value>
 </property>

 <property>
 <name>yarn.log-aggregation.retain-seconds</name>
 <value>86400</value>
 </property>

 <property>
 <name>yarn.resourcemanager.webapp.address.rm1</name>
 <value>bigserver1:8088</value>
 </property>

 <property>
 <name>yarn.resourcemanager.webapp.address.rm2</name>
 <value>testing:8088</value>
 </property>

 <property>
 <name>yarn.resourcemanager.recovery.enabled</name>
 <value>true</value>
 </property>

 <property>
 <name>yarn.resourcemanager.store.class</name>
 <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
 </property>

<property>
 <name>yarn.nodemanager.pmem-check-enabled</name>
 <value>false</value>
</property>

<property>
 <name>yarn.nodemanager.vmem-check-enabled</name>
 <value>false</value>
</property>

<property>
 <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
 <value>true</value>
</property>

<property>
 <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
 <value>true</value>
</property>

<property>
 <name>yarn.resourcemanager.scheduler.class</name>
 <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

</configuration>

5. Set the ZooKeeper environment variables

# vim ~/.bashrc

export ZOOKEEPER_HOME=/bigdata/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$SPARK_HOME/bin:$HIVE_HOME/bin:/bigdata/hadoop/bin:$PATH

# source ~/.bashrc

The configuration above is identical on all nodes.
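
A minimal sketch of pushing it out from bigserver1 (this assumes Hadoop is installed under /bigdata/hadoop, as the paths elsewhere in this article imply, and that passwordless root SSH is already in place, which the sshfence setting requires anyway):

# for host in bigserver2 bigserver3 testing; do
>   scp /bigdata/hadoop/etc/hadoop/*-site.xml $host:/bigdata/hadoop/etc/hadoop/
>   scp /bigdata/zookeeper/conf/zoo.cfg $host:/bigdata/zookeeper/conf/
> done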

IV. Starting the cluster

1. Start ZooKeeper

# cd /bigdata/zookeeper/bin
# ./zkServer.sh start
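
zkServer.sh start has to be run on every ZooKeeper node. Once they are all up, check each one with:

# ./zkServer.sh status

One node should report Mode: leader and the rest Mode: follower; if no quorum has formed yet, the command prints an error instead.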

2. Initialize on the NameNode bigserver1

# hdfs zkfc -formatZK
# hdfs namenode -initializeSharedEdits

# cd /bigdata/hadoop/sbin/
# ./start-all.sh

HDFS already contained data, so I did not run hadoop namenode -format. Never run hadoop namenode -format casually; the consequences are severe and very painful to clean up.

If you get the error:

10.0.0.237:8485: Journal Storage Directory /bigdata/hadoop/dfs/journal/bigdata1 not formatted

Solution:

hdfs zkfc -formatZK

3. Initialize on the NameNode testing

# hdfs zkfc -formatZK
# hdfs namenode -bootstrapStandby

# cd /bigdata/hadoop/sbin/
# ./start-all.sh

hdfs zkfc -formatZK only needs to be run on the two NameNode nodes.

hdfs namenode -bootstrapStandby synchronizes the metadata; it only needs to be run on the newly added NameNode.

4. Result after startup

[root@bigserver1 sbin]# jps
26003 Jps
15238 DFSZKFailoverController
6264 JournalNode
5562 SecondaryNameNode
13916 ResourceManager
17869 NameNode
13326 QuorumPeerMain

[root@bigserver2 ~]# jps
7283 NodeManager
7435 QuorumPeerMain
7182 JournalNode
7071 DataNode
19551 Jps

[root@bigserver3 ~]# jps
23553 Jps
14488 NodeManager
14362 DataNode
11660 QuorumPeerMain

[root@testing ~]# jps
15362 JournalNode
16291 ResourceManager
15620 QuorumPeerMain
16181 DFSZKFailoverController
25081 Jps
22239 NameNode

# hdfs haadmin -getServiceState nn1
active
# hdfs haadmin -getServiceState nn2
standby
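
The ResourceManager HA state can be checked the same way, using the rm1/rm2 IDs defined in yarn-site.xml; one should be active and the other standby:

# yarn rmadmin -getServiceState rm1
# yarn rmadmin -getServiceState rm2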

I ran into many problems during setup and was in a hurry to solve them, so many of the errors were not written down.

V. Testing ZooKeeper HA

1. On the machine with the active NameNode

# ./hadoop-daemon.sh stop namenode

Then check the other NameNode's state to see whether it switches from standby to active.
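
For example, assuming nn1 was the active NameNode that was just stopped:

# hdfs haadmin -getServiceState nn2
active

After confirming the failover, bring the stopped NameNode back with ./hadoop-daemon.sh start namenode; it will rejoin as the standby.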

2. On the NameNode machine where port 8031 is open (the active ResourceManager)

# ./yarn-daemon.sh stop resourcemanager

Then check the other machine to see whether port 8031 fails over to it.
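
One way to check, run on both ResourceManager nodes (netstat -lnt works the same way):

# ss -lnt | grep 8031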



Please credit the source when reposting
Author: 海底苍鹰
URL: http://blog.51yip.com/hadoop/2046.html