Hive is a data warehouse tool built on top of Hadoop. It maps structured data files onto database tables and provides a simple SQL query capability, translating SQL statements into MapReduce jobs for execution. Its big advantage is the low learning curve: SQL-like statements let you implement simple MapReduce aggregations quickly, without writing a dedicated MapReduce application, which makes it a good fit for data warehouse analytics.
By default Hive uses the embedded Derby database. From what I have seen, Derby is fine for a test environment, but for production it is much more reliable to switch to MySQL.
I. Download Hive
# wget http://mirror.bit.edu.cn/apache/hive/hive-2.3.4/apache-hive-2.3.4-bin.tar.gz
# tar zxvf apache-hive-2.3.4-bin.tar.gz
# cp -r apache-hive-2.3.4-bin /bigdata/hive
Pick whichever version suits your needs.
II. Install and configure MySQL
1. Install
Download: https://pan.baidu.com/s/11lQYUyIBk0Eaae7ZjehNqg
# rpm -ivh mysql57-community-release-el7-11.noarch.rpm
# yum install mysql-community-server mysql mysql-community-devel
2. Get the temporary root password
# grep 'temporary password' /var/log/mysqld.log
2019-01-08T05:55:40.116097Z 1 [Note] A temporary password is generated for root@localhost: r:d+TA2tWG6g
# mysql -u root -p
mysql> alter user 'root'@'localhost' identified by '*******';
Note: in MySQL 5.7 you cannot change the password with an UPDATE on mysql.user (the old password column was replaced by authentication_string); use ALTER USER as shown above.
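If MySQL 5.7 rejects your new password, it is usually the validate_password plugin. In a test environment you can relax it; a sketch, with the loosest settings (not for production):

mysql> set global validate_password_policy=0;
mysql> set global validate_password_length=6;
mysql> alter user 'root'@'localhost' identified by '*******';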
3. Create the Hive metadata database and grant privileges
mysql> create database hive;
mysql> use hive;
mysql> set names 'latin1';
mysql> grant all privileges on hive.* TO hive@'10.%' IDENTIFIED BY '*******';
mysql> flush privileges;
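To confirm the grant works, you can try connecting from another cluster node in the 10.x range (the grant above only covers 10.% hosts, not localhost; the IP is this cluster's MySQL host, used again in hive-site.xml below):

# mysql -h 10.0.0.237 -u hive -p hive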
III. Configure Hive
1. Set environment variables
# vim ~/.bashrc
export HIVE_HOME=/bigdata/hive
export PATH=$SPARK_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$PATH
For the Hadoop and Spark setup, see the earlier posts in this series.
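After editing, reload the profile and check that the hive binary resolves; with the layout above it should print something like:

# source ~/.bashrc
# which hive
/bigdata/hive/bin/hive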
2. Configure hive-env.sh
# cd /bigdata/hive/conf
# cp hive-env.sh.template hive-env.sh
# vim hive-env.sh
export HADOOP_HOME=/bigdata/hadoop
export HIVE_CONF_DIR=/bigdata/hive/conf
3. Create the HDFS directories
# hdfs dfs -mkdir -p /user/hive/{warehouse,tmp,log}
# hdfs dfs -chmod g+w /user/hive/{warehouse,tmp,log}
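A quick listing confirms the three directories exist with the group-write bit set:

# hdfs dfs -ls /user/hive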
4. Configure hive-site.xml
# cp hive-default.xml.template hive-site.xml
# vim hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://10.0.0.237:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>**********</value>
    </property>
    <property>
        <name>hive.exec.scratchdir</name>
        <value>/user/hive/tmp</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>hive.querylog.location</name>
        <value>/user/hive/log</value>
    </property>
</configuration>
MySQL here stores only Hive's metadata (databases, tables, partitions); the table data itself lives in HDFS under hive.metastore.warehouse.dir.
5. Install the MySQL Java connector
# yum install mysql-connector-java
# ln -s /usr/share/java/mysql-connector-java.jar /bigdata/hive/lib/mysql-connector-java.jar
6. Initialize the metastore database
# schematool -initSchema -dbType mysql
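If initialization succeeded, schematool can report the schema version, and the hive database in MySQL should now contain the metastore tables. A quick check (the exact table list varies by Hive version):

# schematool -info -dbType mysql
mysql> use hive;
mysql> show tables;    // expect metastore tables such as DBS, TBLS, COLUMNS_V2, ...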
IV. Start Hive, create a database, and run table operations
# hive    // start hive
which: no hbase in (/bigdata/spark/bin:/bigdata/hive/bin:/bigdata/hadoop/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/bigdata/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/bigdata/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/home/bigdata/hive/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in H
hive> show databases;    // list databases
OK
default
Time taken: 6.425 seconds, Fetched: 1 row(s)
hive> create database tanktest;    // create a database
OK
Time taken: 0.459 seconds
hive> use tanktest;    // switch to it
OK
Time taken: 0.062 seconds
hive> create table test(id int, name string);    // create a table
OK
Time taken: 0.731 seconds
hive> insert into test values(1,'tank');    // insert a row
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20190108184001_badc8fa9-7668-45ee-8cd2-f0a827bd8278
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1546864871469_0001, Tracking URL = http://bigserver1:8088/proxy/application_1546864871469_0001/
Kill Command = /bigdata/hadoop/bin/hadoop job -kill job_1546864871469_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-01-08 18:40:17,744 Stage-1 map = 0%, reduce = 0%
2019-01-08 18:40:33,117 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.29 sec
MapReduce Total cumulative CPU time: 3 seconds 290 msec
Ended Job = job_1546864871469_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://bigserver1:9000/user/hive/warehouse/tanktest.db/test/.hive-staging_hive_2019-01-08_18-40-01_692_6850214984650192052-1/-ext-10000
Loading data to table tanktest.test
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 3.29 sec   HDFS Read: 4211 HDFS Write: 76 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 290 msec
OK
Time taken: 33.608 seconds
hive> select * from test;    // query
OK
1    tank
2    zhang
3    ying
Time taken: 2.694 seconds, Fetched: 3 row(s)
hive> update test set name='tank1' where id=1;    // update fails
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
Out of the box, Hive 2.3.4 rejects UPDATE and DELETE: as the error says, the default transaction manager does not support these operations, and they only work on ACID (transactional) tables, which are not enabled by default; a sketch of what enabling them looks like follows. Hive speaks SQL, but it is not a relational database, so adjust your mindset accordingly.
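A minimal sketch for Hive 2.x, untested in this setup: the session needs the DbTxnManager, and the table must be a bucketed ORC table created with transactional=true (the table name and bucket count below are illustrative; a production setup also needs compactor settings on the metastore side):

hive> set hive.support.concurrency=true;
hive> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive> create table test_acid(id int, name string)
    >   clustered by (id) into 2 buckets
    >   stored as orc
    >   tblproperties('transactional'='true');
hive> update test_acid set name='tank1' where id=1;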
V. Hive cluster configuration
1. Sync Hive from the master node to the data nodes
# scp -r hive root@bigserver2:/bigdata/
# scp -r hive root@bigserver3:/bigdata/
2. Configure hive-site.xml on the data nodes
# cat /bigdata/hive/conf/hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://bigserver1:9083</value>
    </property>
    <property>
        <name>hive.exec.scratchdir</name>
        <value>/user/hive/tmp</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>hive.querylog.location</name>
        <value>/user/hive/log</value>
    </property>
</configuration>
hive-site.xml differs between the master node and the data nodes: the data nodes point at the metastore service through hive.metastore.uris instead of connecting to MySQL directly. Everything else is identical.
3. Start the Hive metastore service on the master
# hive --service metastore
[root@bigserver1 hive]# jps
7192 NameNode
19481 RunJar    // the extra process: the metastore
19562 Jps
7391 SecondaryNameNode
7551 ResourceManager
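Started this way, the metastore occupies the terminal. A common habit (my own, not from the original setup) is to background it and capture the log:

# nohup hive --service metastore > /bigdata/hive/metastore.log 2>&1 &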
4. Start Hive on a client node
# hive
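If the shared metastore is working, the client should see the same databases created earlier on the master; the expected output, assuming the steps above:

hive> show databases;
OK
default
tanktest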
Please credit the source when reposting.
Author: 海底苍鹰
Source: http://blog.51yip.com/hadoop/2031.html