The NameNode is the core of Hadoop: if the NameNode goes down, the whole Hadoop cluster goes down with it.
ZooKeeper is a distributed, open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby and an important component of Hadoop and HBase. It provides consistency services for distributed applications, including configuration maintenance, naming, distributed synchronization, and group services.
ZooKeeper's goal is to wrap these complex, error-prone key services and expose them to users through a simple, easy-to-use interface backed by an efficient, stable system.
1. Server roles
bigserver1 namenode zookeeper journalnode
bigserver2 datanode zookeeper journalnode
bigserver3 datanode zookeeper
testing namenode zookeeper journalnode
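All of these hosts need to resolve one another by name. A minimal /etc/hosts sketch (the IP addresses below are placeholders, not the real ones; substitute your own):

# cat /etc/hosts
10.0.0.11   bigserver1
10.0.0.12   bigserver2
10.0.0.13   bigserver3
10.0.0.14   testing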
2. ZooKeeper installation and configuration
1. Download ZooKeeper
http://mirrors.hust.edu.cn/apache/zookeeper/stable/
2. Create the data and log directories
# mkdir -p /bigdata/zookeeper/{data,logs}
3. Configure ZooKeeper
# cp zoo_sample.cfg zoo.cfg
# cat zoo.cfg
tickTime=2000
clientPort=2181
dataDir=/bigdata/zookeeper/data
dataLogDir=/bigdata/zookeeper/logs
initLimit=10
syncLimit=5
server.1=bigserver1:2888:3888
server.2=bigserver2:2888:3888
server.3=testing:2888:3888

# echo 1 > /bigdata/zookeeper/data/myid
The myid value must be different on each ZooKeeper node and must match its server.N entry in zoo.cfg; set it separately on every machine running ZooKeeper.
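For example, assuming the server.N numbering in zoo.cfg above, run this on bigserver2:

# echo 2 > /bigdata/zookeeper/data/myid

and this on testing:

# echo 3 > /bigdata/zookeeper/data/myid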
3. Hadoop configuration
1. core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://bigdata1/</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/bigdata/hadoop/tmp</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>bigserver1:2181,bigserver2:2181,testing:2181</value>
    </property>
    <property>
        <name>ha.zookeeper.session-timeout.ms</name>
        <value>1000</value>
        <description>ms</description>
    </property>
</configuration>
2. mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>bigserver1:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>bigserver1:19888</value>
    </property>
    <property>
        <name>mapred.local.dir</name>
        <value>/bigdata/hadoop/var</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
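Note that start-all.sh does not start the JobHistory server these addresses point at. A sketch of starting it manually on bigserver1 (assuming the standard Hadoop 2.x sbin layout):

# cd /bigdata/hadoop/sbin/
# ./mr-jobhistory-daemon.sh start historyserver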
3. hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/bigdata/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/bigdata/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>bigdata1</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.bigdata1</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.bigdata1.nn1</name>
        <value>bigserver1:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.bigdata1.nn1</name>
        <value>bigserver1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.bigdata1.nn2</name>
        <value>testing:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.bigdata1.nn2</name>
        <value>testing:50070</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://bigserver1:8485;bigserver2:8485;testing:8485/bigdata1</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/bigdata/hadoop/dfs/journal</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.bigdata1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
    <property>
        <name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
        <value>60000</value>
    </property>
</configuration>
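The sshfence method above assumes the two NameNodes can SSH to each other as root without a password, using the key named in dfs.ha.fencing.ssh.private-key-files. A sketch of setting that up from bigserver1 (repeat in the other direction from testing):

# ssh-keygen -t rsa
# ssh-copy-id root@testing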
4. yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarn-ha</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>bigserver1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>testing</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>bigserver1:2181,bigserver2:2181,testing:2181</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>bigserver1:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>testing:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>
</configuration>
5. Set the ZooKeeper environment variables
# vim ~/.bashrc
export ZOOKEEPER_HOME=/bigdata/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$SPARK_HOME/bin:$HIVE_HOME/bin:/bigdata/hadoop/bin:$PATH

# source ~/.bashrc
The configuration above is identical on all nodes.
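One way to keep it identical is to edit the files on a single node and push them out with scp, for example (a sketch; it assumes the config files live under /bigdata/hadoop/etc/hadoop, adjust to your layout):

# for host in bigserver2 bigserver3 testing; do scp /bigdata/hadoop/etc/hadoop/*-site.xml $host:/bigdata/hadoop/etc/hadoop/; done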
4. Starting the cluster
1. Start ZooKeeper
# cd /bigdata/zookeeper/bin
# ./zkServer.sh start
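Run this on every ZooKeeper node, then check that the ensemble has elected a leader; zkServer.sh status reports Mode: leader on one node and Mode: follower on the others:

# ./zkServer.sh status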
2. Initialize on the NameNode bigserver1
# hdfs zkfc -formatZK
# hdfs namenode -initializeSharedEdits
# cd /bigdata/hadoop/sbin/
# ./start-all.sh
HDFS already had data on it, so I did not run hadoop namenode -format. Never run hadoop namenode -format casually: the consequences are severe and very painful to clean up.
If you get the error:
10.0.0.237:8485: Journal Storage Directory /bigdata/hadoop/dfs/journal/bigdata1 not formatted
Solution:
# hdfs zkfc -formatZK
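To confirm that formatZK created the HA znode, you can query ZooKeeper from any node (a quick check; with the configuration above it should list the bigdata1 nameservice under /hadoop-ha):

# /bigdata/zookeeper/bin/zkCli.sh -server bigserver1:2181 ls /hadoop-ha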
3. Initialize on the NameNode testing
# hdfs zkfc -formatZK
# hdfs namenode -bootstrapStandby
# cd /bigdata/hadoop/sbin/
# ./start-all.sh
hdfs zkfc -formatZK only needs to be run on the two NameNode hosts.
hdfs namenode -bootstrapStandby copies over the metadata and only needs to be run on the newly added (standby) NameNode.
4. Results after startup
[root@bigserver1 sbin]# jps
26003 Jps
15238 DFSZKFailoverController
6264 JournalNode
5562 SecondaryNameNode
13916 ResourceManager
17869 NameNode
13326 QuorumPeerMain

[root@bigserver2 ~]# jps
7283 NodeManager
7435 QuorumPeerMain
7182 JournalNode
7071 DataNode
19551 Jps

[root@bigserver3 ~]# jps
23553 Jps
14488 NodeManager
14362 DataNode
11660 QuorumPeerMain

[root@testing ~]# jps
15362 JournalNode
16291 ResourceManager
15620 QuorumPeerMain
16181 DFSZKFailoverController
25081 Jps
22239 NameNode

# hdfs haadmin -getServiceState nn1
active
# hdfs haadmin -getServiceState nn2
standby
I hit quite a few problems during this setup; I was in a hurry to fix them, so many of the errors were not written down.
5. Testing ZooKeeper HA failover
1. On the machine running the active NameNode
# ./hadoop-daemon.sh stop namenode
Then check the state of the other NameNode and see whether it switches from standby to active; see the check below.
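For example, assuming nn1 on bigserver1 was the active NameNode you stopped, nn2 should now report active:

# hdfs haadmin -getServiceState nn2
active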
2. On the node where port 8031 is listening (the active ResourceManager)
# ./yarn-daemon.sh stop resourcemanager
Then check the other node and see whether port 8031 has moved over, i.e. whether its ResourceManager has become active; see the checks below.
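Two quick checks (a sketch; rm2 is the ResourceManager id from yarn-site.xml, and the netstat check assumes net-tools is installed):

# yarn rmadmin -getServiceState rm2
active
# netstat -tlnp | grep 8031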
Please credit the source when reposting.
Author: 海底苍鹰
URL: http://blog.51yip.com/hadoop/2046.html