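CDH is widely used in China. Whether it is Cloudera + CDH or Ambari + HDP, I recommend that beginners avoid both and start with vanilla Apache Hadoop. People online say the vanilla distribution is unstable and complicated to configure.
The Hadoop 2.7.7 I run is stable, and so is the ecosystem I built on top of it: uptime has been 100% and there have been no sudden failures so far.
Building a whole ecosystem on top of Hadoop is indeed complex. But if you split it up and move one piece at a time, like ants moving house, it becomes much simpler, and you also learn how the components work together.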
I. Server overview
- 10.0.40.237 bigserver1 //namenode
- 10.0.40.222 bigserver2 //datanode
- 10.0.40.193 bigserver3 //datanode
- 10.0.40.200 bigserver4 //SecondaryNameNode
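Compared with my earlier standalone Hadoop servers (the test servers), one machine differs in this CDH setup: 175 has been replaced by 200.
II. Set the hostname (all nodes)
- # hostname bigserver1
- # vim /etc/hosts //add the following lines; identical on all nodes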
- 10.0.40.237 bigserver1
- 10.0.40.222 bigserver2
- 10.0.40.193 bigserver3
- 10.0.40.200 bigserver4
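Editing /etc/hosts by hand on four machines is easy to get wrong. As a rough sketch (the helper name `add_host_entry` and the `HOSTS_FILE` variable are my own inventions, not part of the original setup), the entries above could be added so that rerunning the script does not create duplicates:

```shell
#!/usr/bin/env bash
# Sketch: append "IP NAME" to a hosts file only if NAME is not already mapped.
# add_host_entry and HOSTS_FILE are hypothetical names used for illustration.
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

add_host_entry() {
  local ip=$1 name=$2 file=${3:-$HOSTS_FILE}
  # Look for NAME as an exact hostname field on any non-comment line.
  if awk -v n="$name" '
       !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
       END { exit !found }' "$file" 2>/dev/null; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Usage (as root, on every node):
#   add_host_entry 10.0.40.237 bigserver1
#   add_host_entry 10.0.40.222 bigserver2
#   add_host_entry 10.0.40.193 bigserver3
#   add_host_entry 10.0.40.200 bigserver4
```

The check compares whole fields, so an existing `bigserver10` would not be mistaken for `bigserver1`.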
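III. Disable the firewall and SELinux (all nodes)
- # systemctl stop firewalld //stop the service
- # systemctl disable firewalld //do not start it at boot
- # cat /etc/sysconfig/selinux
- SELINUX=disabled //set to disabled
After these changes, it is best to reboot all nodes.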
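The listing above only shows the end state of the SELinux config file. A minimal sketch of one way to get there, assuming GNU sed (the helper name `disable_selinux_in` is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: set SELINUX=disabled in an SELinux config file.
# disable_selinux_in is a hypothetical helper name.
disable_selinux_in() {
  local file=$1
  # --follow-symlinks (GNU sed) edits the link target instead of replacing the
  # symlink; on CentOS 7, /etc/sysconfig/selinux links to /etc/selinux/config.
  # The pattern requires "=" right after SELINUX, so SELINUXTYPE= is untouched.
  sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/' "$file"
}

# Usage (as root, on every node):
#   disable_selinux_in /etc/selinux/config
#   setenforce 0    # permissive mode until the reboot applies "disabled"
```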
IV. Passwordless SSH login (all nodes)
- # ssh-keygen -t rsa
- # ssh-copy-id -i ~/.ssh/id_rsa.pub root@10.0.40.222 -p 22
- # ssh-copy-id -i ~/.ssh/id_rsa.pub root@10.0.40.193 -p 22
- # ssh-copy-id -i ~/.ssh/id_rsa.pub root@10.0.40.200 -p 22
- # scp ~/.ssh/id_rsa root@10.0.40.222:/root/.ssh/
- # scp ~/.ssh/id_rsa root@10.0.40.193:/root/.ssh/
- # scp ~/.ssh/id_rsa root@10.0.40.200:/root/.ssh/
- After logging in to 222, 193, and 200:
- # cd ~/.ssh/
- # chmod 600 id_rsa
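Note: log in once between each pair of hosts using the private key, so that known_hosts gets populated.
V. Configure a time server (all nodes)
See: "centos7 hadoop2.7.7 hbase1.4 installation and configuration in detail"
VI. Install the JDK (all nodes)
- # yum install epel-release //this repository carries a lot of packages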
- # yum install java-1.8.0-openjdk java-1.8.0-openjdk-devel
Oracle JDK is recommended. I did not use it because my earlier standalone Hadoop install already ran on OpenJDK.
Cloudera Enterprise Version | Supported Oracle JDK | Supported OpenJDK |
---|---|---|
5.3 - 5.15 | 1.7, 1.8 | none |
5.16 and higher 5.x releases | 1.7, 1.8 | 1.8 |
6.0 | 1.8 | none |
6.1 | 1.8 | 1.8 |
6.2 | 1.8 | 1.8 |
6.3 | 1.8 | 1.8, 11.0.3 or higher |
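To compare a node against the table above, it helps to reduce the installed Java version to its major number. The following is only a sketch; `java_major` is a hypothetical helper, and the decision about which versions to accept is left to the reader:

```shell
#!/usr/bin/env bash
# Sketch: reduce a Java version string to its major version.
# java_major is a hypothetical helper name.
java_major() {
  local v=$1
  # Legacy scheme: "1.8.0_191" -> 8. Modern scheme: "11.0.3" -> 11.
  if [[ $v == 1.* ]]; then
    v=${v#1.}
  fi
  # Cut everything from the first ".", "_", "+" or "-" onward.
  echo "${v%%[._+-]*}"
}

# Usage with the locally installed JVM (java -version prints to stderr):
#   ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
#   java_major "$ver"
```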
VII. CDH 6.3.1 components
Component | Component Version |
---|---|
Apache Avro | 1.8.2 |
Apache Flume | 1.9.0 |
Apache Hadoop | 3.0.0 |
Apache HBase | 2.1.4 |
HBase Indexer | 1.5 |
Apache Hive | 2.1.1 |
Hue | 4.3.0 |
Apache Impala | 3.2.0 |
Apache Kafka | 2.2.1 |
Kite SDK | 1.0.0 |
Apache Kudu | 1.10.0 |
Apache Solr | 7.4.0 |
Apache Oozie | 5.1.0 |
Apache Parquet | 1.9.0 |
Parquet-format | 2.4.0 |
Apache Pig | 0.17.0 |
Apache Sentry | 2.1.0 |
Apache Spark | 2.4.0 |
Apache Sqoop | 1.4.7 |
Apache ZooKeeper | 3.4.5 |
VIII. Configure an offline Cloudera repository and install cloudera-manager (bigserver1)
1. Install nginx and createrepo
- # yum install nginx createrepo
- # mkdir -p /var/www/html/cloudera-repos //path for the Cloudera RPMs
- # vim /etc/nginx/conf.d/cloudera.conf //nginx configuration
- server
- {
- listen 80;
- server_name bigserver1;
- root /var/www/html;
- autoindex on;
- autoindex_exact_size off;
- autoindex_localtime on;
- charset utf-8;
- }
- # systemctl restart nginx //restart nginx
2. Download cloudera-manager
https://archive.cloudera.com/cm6/6.3.1/redhat7/yum/RPMS/x86_64/
allkeys.asc
cloudera-manager-agent-6.3.1-1466458.el7.x86_64.rpm
cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm
cloudera-manager-server-6.3.1-1466458.el7.x86_64.rpm
cloudera-manager-server-db-2-6.3.1-1466458.el7.x86_64.rpm
oracle-j2sdk1.8-1.8.0+update181-1.x86_64.rpm
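oracle-j2sdk1.8-1.8.0 is downloaded to keep the Java version consistent across nodes; it can also be installed directly with yum.
Do not forget to download allkeys.asc, otherwise a later step fails. I forgot to write down the error message.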
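Fetching the RPMs one by one is tedious, so here is a small sketch that builds the download URLs from the file list above. `CM_BASE`, `CM_FILES` and `cm_urls` are names I made up for illustration. The sketch covers only the RPMs; allkeys.asc still has to be fetched separately. A reader comment at the end of this post reports that the official site no longer serves these files, so treat the URL as it was at the time of writing.

```shell
#!/usr/bin/env bash
# Sketch: build the download URLs for the RPMs listed above.
# CM_BASE is the URL given in the post; cm_urls is a hypothetical helper.
CM_BASE="https://archive.cloudera.com/cm6/6.3.1/redhat7/yum/RPMS/x86_64"
CM_FILES=(
  cloudera-manager-agent-6.3.1-1466458.el7.x86_64.rpm
  cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm
  cloudera-manager-server-6.3.1-1466458.el7.x86_64.rpm
  cloudera-manager-server-db-2-6.3.1-1466458.el7.x86_64.rpm
  oracle-j2sdk1.8-1.8.0+update181-1.x86_64.rpm
)

cm_urls() {
  local f
  for f in "${CM_FILES[@]}"; do
    printf '%s/%s\n' "$CM_BASE" "$f"
  done
}

# Usage: download into the current directory, resuming partial files.
#   cm_urls | wget -c -i -
```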
3. Create the local offline Cloudera repository
- # mv *.rpm /var/www/html/cloudera-repos //move the RPMs into the repository path
- # mv allkeys.asc /var/www/html/cloudera-repos
- # cd /var/www/html/cloudera-repos && createrepo . //create the repo; note the trailing dot
- Spawning worker 0 with 2 pkgs
- Spawning worker 1 with 2 pkgs
- Workers Finished
- Saving Primary metadata
- Saving file lists metadata
- Saving other metadata
- Generating sqlite DBs
- Sqlite DBs complete
- # vim /etc/yum.repos.d/cloudera-manager.repo
- [cloudera-manager]
- name=Cloudera Manager 6.3.1
- baseurl=http://bigserver1/cloudera-repos/
- gpgcheck=0
- enabled=1
There is no need to copy cloudera-manager.repo to the other nodes, nor to install cloudera-manager-agent on them. The web UI installer copies the repo file and installs the agent automatically.
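For anyone who prefers scripting over vim, the repo definition above can be written by a small function. This is just a sketch; `write_cm_repo` is a hypothetical helper, and the curl line is an optional sanity check that nginx really serves the metadata generated by createrepo:

```shell
#!/usr/bin/env bash
# Sketch: write the yum repo definition shown above to a given path.
# write_cm_repo is a hypothetical helper name.
write_cm_repo() {
  local file=$1 baseurl=$2
  cat > "$file" <<EOF
[cloudera-manager]
name=Cloudera Manager 6.3.1
baseurl=${baseurl}
gpgcheck=0
enabled=1
EOF
}

# Usage (as root on bigserver1):
#   write_cm_repo /etc/yum.repos.d/cloudera-manager.repo http://bigserver1/cloudera-repos/
#   curl -sI http://bigserver1/cloudera-repos/repodata/repomd.xml | head -n 1
```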
4. Install cloudera-manager
- # yum clean all && yum makecache
- # yum install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server
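Note: if you previously installed individual CDH components, remove them first so that you start from a clean install. That way the later environment check will not complain either.
- # rpm -qa |grep impala //find all impala packages, then remove them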
- # rpm -e impala-2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22.el7.x86_64 --nodeps
- # rpm -e impala-udf-devel-2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22.el7.x86_64 --nodeps
- # rpm -e impala-catalog-2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22.el7.x86_64 --nodeps
- # rpm -e impala-state-store-2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22.el7.x86_64 --nodeps
- # rpm -e impala-shell-2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22.el7.x86_64 --nodeps
IX. Configure the offline parcel repository (bigserver1)
1. Download the parcels
https://archive.cloudera.com/cdh6/6.3.1/parcels/
CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel
CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.sha1
manifest.json
CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.sha1 must be renamed to CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.sha
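Since the parcel is about 2 GB, it is worth checking the download against the hash file after renaming it. A sketch of such a check, assuming the .sha file holds the hex SHA-1 digest as its first field (`verify_parcel` is a hypothetical helper, not a Cloudera tool):

```shell
#!/usr/bin/env bash
# Sketch: compare a parcel's SHA-1 with the hash stored in its .sha file.
# verify_parcel is a hypothetical helper name.
verify_parcel() {
  local parcel=$1
  local expected actual
  # First whitespace-separated field of the first line of PARCEL.sha.
  expected=$(awk 'NR == 1 {print $1}' "${parcel}.sha" 2>/dev/null)
  actual=$(sha1sum "$parcel" | awk '{print $1}')
  # Succeeds only if a hash was read and it matches the computed one.
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage:
#   mv CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.sha1 CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.sha
#   verify_parcel CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel && echo "checksum OK"
```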
2. Set up the offline parcel-repo
- # chown cloudera-scm:cloudera-scm -R /opt/cloudera/parcel-repo
- # ll
- total 2035080
- -rw-r--r-- 1 cloudera-scm cloudera-scm 2083878000 Nov 25 12:34 CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel
- -rw-r--r-- 1 cloudera-scm cloudera-scm 40 Oct 11 16:45 CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.sha
- -rw-r--r-- 1 cloudera-scm cloudera-scm 33887 Oct 11 16:44 manifest.json
Note: the /opt/cloudera/parcel-repo directory is not created by hand. It exists once cloudera-manager-server is installed. During the web UI installation, its contents are synced to the other nodes automatically.
X. Install MySQL 5.7
1. Install MySQL (bigserver1)
Officially recommended my.cnf settings:
- # vim /etc/my.cnf
- [mysqld]
- datadir=/var/lib/mysql
- socket=/var/lib/mysql/mysql.sock
- transaction-isolation = READ-COMMITTED
- # Disabling symbolic-links is recommended to prevent assorted security risks;
- # to do so, uncomment this line:
- symbolic-links = 0
- key_buffer_size = 32M
- max_allowed_packet = 32M
- thread_stack = 256K
- thread_cache_size = 64
- query_cache_limit = 8M
- query_cache_size = 64M
- query_cache_type = 1
- max_connections = 550
- #expire_logs_days = 10
- #max_binlog_size = 100M
- #log_bin should be on a disk with enough free space.
- #Replace '/var/lib/mysql/mysql_binary_log' with an appropriate path for your
- #system and chown the specified folder to the mysql user.
- log_bin=/var/lib/mysql/mysql_binary_log
- #In later versions of MySQL, if you enable the binary log and do not set
- #a server_id, MySQL will not start. The server_id must be unique within
- #the replicating group.
- server_id=1
- binlog_format = mixed
- read_buffer_size = 2M
- read_rnd_buffer_size = 16M
- sort_buffer_size = 8M
- join_buffer_size = 8M
- # InnoDB settings
- innodb_file_per_table = 1
- innodb_flush_log_at_trx_commit = 2
- innodb_log_buffer_size = 64M
- innodb_buffer_pool_size = 4G
- innodb_thread_concurrency = 8
- innodb_flush_method = O_DIRECT
- innodb_log_file_size = 512M
- [mysqld_safe]
- log-error=/var/log/mysqld.log
- pid-file=/var/run/mysqld/mysqld.pid
- sql_mode=STRICT_ALL_TABLES
2. Install mysql-connector-java (all nodes)
- # wget https://cdn.mysql.com//archives/mysql-connector-java-5.1/mysql-connector-java-5.1.47.tar.gz
- # tar zxvf mysql-connector-java-5.1.47.tar.gz
- # cd mysql-connector-java-5.1.47
- # cp mysql-connector-java-5.1.47-bin.jar /usr/share/java/mysql-connector-java.jar
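When I installed standalone Hadoop, installing mysql-connector-java with yum worked fine. With CDH 6 it no longer does, because the yum package is too old: CentOS 7.4 installs mysql-connector-java-5.1.25-3. Later, starting hive.metastore fails with the following error: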
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to retrieve schema tables from Hive Metastore DB,Not supported
3. Create the databases and grant privileges (bigserver1)
- CREATE DATABASE scm DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
- CREATE DATABASE amon DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
- CREATE DATABASE rman DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
- CREATE DATABASE hue DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
- CREATE DATABASE metastore DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
- CREATE DATABASE sentry DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
- CREATE DATABASE nav DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
- CREATE DATABASE navms DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
- CREATE DATABASE oozie DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
- grant all privileges on *.* TO cdh6@'10.%' IDENTIFIED BY 'Cdh6_123';
- flush privileges;
The table below maps each component to its database name:
Component | Database |
---|---|
Cloudera Manager Server | scm |
Activity Monitor | amon |
Reports Manager | rman |
Hue | hue |
Hive Metastore Server | metastore |
Sentry Server | sentry |
Cloudera Navigator Audit Server | nav |
Cloudera Navigator Metadata Server | navms |
Oozie | oozie |
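The nine CREATE DATABASE statements differ only in the database name, so they can be generated from the names in the table. A minimal sketch (`gen_create_db_sql` is a hypothetical helper; the statement text is copied from the ones above):

```shell
#!/usr/bin/env bash
# Sketch: emit one CREATE DATABASE statement per name given as an argument.
# gen_create_db_sql is a hypothetical helper name.
gen_create_db_sql() {
  local db
  for db in "$@"; do
    printf 'CREATE DATABASE %s DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;\n' "$db"
  done
}

# Usage (mysql prompts for the root password and reads the SQL from stdin):
#   gen_create_db_sql scm amon rman hue metastore sentry nav navms oozie | mysql -u root -p
```

Printing the SQL first and reviewing it before piping it into mysql keeps the step easy to audit.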
4. Initialize the database (bigserver1)
- # /opt/cloudera/cm/schema/scm_prepare_database.sh -h bigserver1 mysql scm cdh6 Cdh6_123
- JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64
- Verifying that we can write to /etc/cloudera-scm-server
- Creating SCM configuration file in /etc/cloudera-scm-server
- Executing: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/java/postgresql-connector-java.jar:/opt/cloudera/cm/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
- [ main] DbCommandExecutor INFO Successfully connected to database.
- All done, your SCM database is configured correctly!
5. Start cloudera-scm-server (bigserver1)
- # systemctl start cloudera-scm-server
- # tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log
- # netstat -tpnl |grep 7180
- tcp 0 0 0.0.0.0:7180 0.0.0.0:* LISTEN 28344/java
Note: cloudera-scm-server takes quite a while to start. Follow the server log with tail; once port 7180 is listening, cloudera-scm-server is up.
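Instead of rerunning netstat by hand, the wait can be scripted. This is a sketch that relies on bash's built-in /dev/tcp redirection; `wait_for_port` is a hypothetical helper and the 300-second default is an arbitrary choice of mine:

```shell
#!/usr/bin/env bash
# Sketch: poll a TCP port until it accepts connections or a timeout expires.
# wait_for_port is a hypothetical helper; /dev/tcp is a bash feature.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-300}
  local waited=0
  # The subshell opens a connection on file descriptor 3 and closes it on exit.
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Usage:
#   systemctl start cloudera-scm-server
#   wait_for_port bigserver1 7180 600 && echo "cloudera-scm-server is listening"
```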
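XI. Install the Hadoop cluster
1. Log in
http://bigserver1:7180/cmf/login (the default username and password are both admin)
2. Illustrated walkthrough
If a host is reported as being in bad health, fix it as follows (on all nodes):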
- # cd /var/lib/cloudera-scm-agent/
- # rm -f cm_guid
- # service cloudera-scm-agent restart
Fix for the transparent huge page and swappiness warnings (all nodes):
- # echo never > /sys/kernel/mm/transparent_hugepage/defrag
- # echo never > /sys/kernel/mm/transparent_hugepage/enabled
- # echo vm.swappiness=10 >> /etc/sysctl.conf
- # sysctl -p
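One caveat with the commands above: `>>` appends a new line every time it is run, so repeating the step leaves duplicate vm.swappiness entries in /etc/sysctl.conf. Here is a sketch of an idempotent alternative (`set_sysctl_key` is a hypothetical helper). The two echo commands into /sys only last until the next reboot; persisting them is outside this sketch.

```shell
#!/usr/bin/env bash
# Sketch: set KEY=VALUE in a sysctl-style file without creating duplicates.
# set_sysctl_key is a hypothetical helper name.
set_sysctl_key() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  # Copy every line except existing (uncommented) assignments of KEY.
  awk -v k="$key" -F'=' '
    { name = $1; gsub(/[ \t]/, "", name); if (name != k) print }
  ' "$file" 2>/dev/null > "$tmp"
  # Append the new assignment, then write back in place so the original
  # file keeps its owner and permissions.
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage (as root, on every node):
#   set_sysctl_key /etc/sysctl.conf vm.swappiness 10 && sysctl -p
```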
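Note: at this step, which services go on which machine depends on your own situation.
Note: at this step, do not change the Solr path, otherwise an error is reported.
That completes the installation, but it is only the installation. Follow-up posts will cover usage and the problems I run into.
Please credit the source when reposting
Author: 海底苍鹰
URL: http://blog.51yip.com/hadoop/2249.html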
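[IX. Configure the offline parcel repository (bigserver1) -> 2. Set up the offline parcel-repo]
In the step "chown cloudera-scm:cloudera-scm -R /opt/cloudera/parcel-repo", how was the cloudera-scm user created?
Ignore my previous comment. I checked my history: I went out for a meal partway through and forgot to run yum install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server
Hey, could you share a copy of allkeys.asc? It can no longer be downloaded from the official site. Thanks!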