Hive with MySQL: installation and configuration

张映, posted 2019-01-09

Category: hadoop/spark/scala


Hive is a data warehouse tool built on top of Hadoop. It maps structured data files to database tables, provides a simple SQL query interface, and translates SQL statements into MapReduce jobs for execution. Its main advantage is the low learning curve: simple MapReduce-style statistics can be produced quickly with SQL-like statements, without writing a dedicated MapReduce application, which makes it well suited to statistical analysis in a data warehouse.

Hive uses the Derby database by default. From what I have read, Derby is fine for a test environment, but for production it is more reliable to switch to MySQL.

I. Download Hive

# wget http://mirror.bit.edu.cn/apache/hive/hive-2.3.4/apache-hive-2.3.4-bin.tar.gz
# tar zxvf apache-hive-2.3.4-bin.tar.gz
# cp -r apache-hive-2.3.4-bin /bigdata/hive

Pick the version you need.

II. Install and configure MySQL

1. Install MySQL

Download link: https://pan.baidu.com/s/11lQYUyIBk0Eaae7ZjehNqg

# rpm -ivh mysql57-community-release-el7-11.noarch.rpm
# yum install mysql-community-server mysql mysql-community-devel
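
On CentOS 7 the service then needs to be started (and, optionally, enabled at boot); the temporary password only appears in the log after the first start:

# systemctl start mysqld
# systemctl enable mysqld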

2. Get the temporary root password

# grep 'temporary password' /var/log/mysqld.log
2019-01-08T05:55:40.116097Z 1 [Note] A temporary password is generated for root@localhost: r:d+TA2tWG6g

# mysql -u root -p 

mysql> alter user 'root'@'localhost' identified by '*******';

Note: you cannot change the password here with an UPDATE against mysql.user; in MySQL 5.7 the old Password column is gone, and ALTER USER is the supported way.
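
If you are curious where the hash actually lives now, it is in the authentication_string column:

mysql> select user, host, authentication_string from mysql.user;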

3. Create the Hive metastore database and grant privileges

mysql> create database hive;
mysql> use hive;
mysql> set names 'latin1';
mysql> grant all privileges on hive.* TO hive@'10.%' IDENTIFIED BY '*******';
mysql> flush privileges;
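
Before moving on, it is worth checking the grant from a machine in the 10.% network (enter the real password when prompted):

# mysql -h 10.0.0.237 -u hive -p hive -e "select 1;"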

III. Configure Hive

1. Set environment variables

# vim ~/.bashrc 

export HIVE_HOME=/bigdata/hive
export PATH=$SPARK_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$PATH

For the Hadoop and Spark setup, see the previous articles.
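
Reload the file so the variables take effect in the current shell, and sanity-check that the hive binary resolves:

# source ~/.bashrc
# hive --version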

2. Configure hive-env.sh

# cd /bigdata/hive/conf
# cp hive-env.sh.template hive-env.sh
# vim hive-env.sh
export HADOOP_HOME=/bigdata/hadoop
export HIVE_CONF_DIR=/bigdata/hive/conf

3. Create the HDFS directories

# hdfs dfs -mkdir -p /user/hive/{warehouse,tmp,log}
# hdfs dfs -chmod g+w /user/hive/{warehouse,tmp,log}
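
A quick listing confirms the three directories exist with group write permission:

# hdfs dfs -ls /user/hive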

4. Configure hive-site.xml

# cp hive-default.xml.template hive-site.xml
# vim hive-site.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

 <property>
 <name>javax.jdo.option.ConnectionURL</name>
 <value>jdbc:mysql://10.0.0.237:3306/hive?createDatabaseIfNotExist=true</value>
 </property>

 <property>
 <name>javax.jdo.option.ConnectionDriverName</name>
 <value>com.mysql.jdbc.Driver</value>
 </property>

 <property>
 <name>javax.jdo.option.ConnectionUserName</name>
 <value>hive</value>
 </property>

 <property>
 <name>javax.jdo.option.ConnectionPassword</name>
 <value>**********</value>
 </property>

 <property>
 <name>hive.exec.scratchdir</name>
 <value>/user/hive/tmp</value>
 </property>

 <property>
 <name>hive.metastore.warehouse.dir</name>
 <value>/user/hive/warehouse</value>
 </property>

 <property>
 <name>hive.querylog.location</name>
 <value>/user/hive/log</value>
 </property>

</configuration>

MySQL here stores only Hive's metadata; the table data itself lives in HDFS.

5. Install the MySQL JDBC connector

# yum install mysql-connector-java
# ln -s /usr/share/java/mysql-connector-java.jar /bigdata/hive/lib/mysql-connector-java.jar

6. Initialize the metastore schema

# schematool -initSchema -dbType mysql
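
If initialization succeeds, the hive database in MySQL now holds the metastore tables (DBS, TBLS, SDS, and so on). A quick way to confirm, assuming you connect from within the 10.% network:

# mysql -h 10.0.0.237 -u hive -p hive -e "show tables;"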

IV. Start Hive, create a database, and work with tables

# hive   // start the Hive CLI
which: no hbase in (/bigdata/spark/bin:/bigdata/hive/bin:/bigdata/hadoop/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/bigdata/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/bigdata/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/home/bigdata/hive/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

hive> show databases;  // list databases
OK
default
Time taken: 6.425 seconds, Fetched: 1 row(s)

hive> create database tanktest;  // create a database
OK
Time taken: 0.459 seconds

hive> use tanktest;  // switch to it
OK
Time taken: 0.062 seconds

hive> create table test(id int, name string);  // create a table
OK
Time taken: 0.731 seconds

hive> insert into test values(1,'tank'); // insert a row
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20190108184001_badc8fa9-7668-45ee-8cd2-f0a827bd8278
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1546864871469_0001, Tracking URL = http://bigserver1:8088/proxy/application_1546864871469_0001/
Kill Command = /bigdata/hadoop/bin/hadoop job -kill job_1546864871469_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-01-08 18:40:17,744 Stage-1 map = 0%, reduce = 0%
2019-01-08 18:40:33,117 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.29 sec
MapReduce Total cumulative CPU time: 3 seconds 290 msec
Ended Job = job_1546864871469_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://bigserver1:9000/user/hive/warehouse/tanktest.db/test/.hive-staging_hive_2019-01-08_18-40-01_692_6850214984650192052-1/-ext-10000
Loading data to table tanktest.test
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 3.29 sec HDFS Read: 4211 HDFS Write: 76 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 290 msec
OK
Time taken: 33.608 seconds

hive> select * from test;  // query (two more rows were inserted the same way)
OK
1 tank
2 zhang
3 ying
Time taken: 2.694 seconds, Fetched: 3 row(s)

hive> update test set name='tank1' where id=1;  // update fails
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.

Hive 2.3.4 does not support update or delete operations out of the box, and I am not sure whether newer versions do. Although Hive runs SQL statements, it is not a relational database; that shift in mindset takes some getting used to.
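
That said, Hive 2.x can do row-level updates and deletes, but only on ACID transactional tables (bucketed, stored as ORC) with the transaction manager switched on. A minimal, untested sketch of what that takes:

hive> set hive.support.concurrency=true;
hive> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive> create table test_acid (id int, name string)
    >   clustered by (id) into 2 buckets
    >   stored as orc tblproperties ('transactional'='true');
hive> update test_acid set name='tank1' where id=1;  // works on the ACID table

Plain tables like the test table above remain read-and-append only.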

[Screenshot: the Hive database name is stored in the metastore]

[Screenshot: Hive tables are recorded in the metastore database]

[Screenshot: where the Hive data lives in HDFS]

[Screenshot: the core computation is still MapReduce]
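
What the screenshots showed can be reproduced from the command line: the database and table land in the MySQL metastore, and the rows land in HDFS under the warehouse directory:

mysql> select NAME, DB_LOCATION_URI from hive.DBS;
mysql> select TBL_NAME, TBL_TYPE from hive.TBLS;

# hdfs dfs -ls /user/hive/warehouse/tanktest.db/test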

V. Hive cluster configuration

1. Sync the Hive directory from the master node to the data nodes

# scp -r hive root@bigserver2:/bigdata/
# scp -r hive root@bigserver3:/bigdata/

2. Configure hive-site.xml on the data nodes

# cat /bigdata/hive/conf/hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
 <property>
 <name>hive.metastore.uris</name>
 <value>thrift://bigserver1:9083</value>
 </property>

 <property>
 <name>hive.exec.scratchdir</name>
 <value>/user/hive/tmp</value>
 </property>

 <property>
 <name>hive.metastore.warehouse.dir</name>
 <value>/user/hive/warehouse</value>
 </property>

 <property>
 <name>hive.querylog.location</name>
 <value>/user/hive/log</value>
 </property>
</configuration>

hive-site.xml differs between the master node and the data nodes; everything else is identical.

3. Start the Hive metastore service on the master

# hive --service metastore

[root@bigserver1 hive]# jps
7192 NameNode
19481 RunJar  // the new process: the metastore
19562 Jps
7391 SecondaryNameNode
7551 ResourceManager
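
Note that hive --service metastore runs in the foreground; for anything longer-lived you would likely background it, for example:

# nohup hive --service metastore > /bigdata/hive/metastore.log 2>&1 &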

4. Start Hive on a client node

# hive
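
Through the remote metastore, the client should see the same databases that were created on the master:

hive> show databases;  // should list default and tanktest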


Please credit the source when reposting.
Author: 海底苍鹰
URL: http://blog.51yip.com/hadoop/2031.html