Kylin cluster installation and configuration

Posted by 张映 on 2019-11-13

Category: hadoop/spark/scala


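Unlike Hive, Presto and other engines built on a massively parallel processing (MPP) architecture, Apache Kylin works by precomputation. You define the query dimensions in advance, Kylin computes the aggregates for them and stores the results in HBase, and queries over very large data sets then return in under a second. It is a typical space-for-time trade-off. Apache Kylin not only solves the problem of querying massive data quickly, it also spares you the trouble of writing and maintaining your own precomputation jobs.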

Put more bluntly: queries read the precomputed result tables, not the original tables.
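The idea can be illustrated with a small sketch. This is plain Python, not Kylin code: the fact table, the column names and the `build_cube` and `query` helpers are all made up for illustration. The raw rows are aggregated once for every combination of the chosen dimensions, so a later query becomes a dictionary lookup instead of a scan.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical raw fact table: (region, year, sales). Not Kylin sample data.
RAW_ROWS = [
    ("east", 2018, 10),
    ("east", 2019, 20),
    ("west", 2018, 5),
    ("west", 2019, 15),
]
DIMENSIONS = ("region", "year")


def build_cube(rows, dimensions):
    """Pre-aggregate SUM(sales) for every subset of the dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(range(len(dimensions)), size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in dims)
                totals[key] += row[-1]  # last column is the measure
            names = tuple(dimensions[i] for i in dims)
            cube[names] = dict(totals)
    return cube


def query(cube, **filters):
    """Answer SUM(sales) from the precomputed results, never from the raw rows."""
    names = tuple(d for d in DIMENSIONS if d in filters)
    key = tuple(filters[d] for d in names)
    return cube[names].get(key, 0)


CUBE = build_cube(RAW_ROWS, DIMENSIONS)
```

With this data, `query(CUBE, region="east")` returns 30 without touching `RAW_ROWS`. The price is the extra storage for every dimension combination, which is the space-for-time trade-off described above.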

1. Software requirements

Hadoop: 2.7+, 3.1+ (since v2.5)
Hive: 0.13 - 1.2.1+
HBase: 1.1+, 2.0 (since v2.5)
Spark (optional): 2.3.0+
Kafka (optional): 1.0.0+ (since v2.5)
JDK: 1.8+ (since v2.5)
OS: Linux only, CentOS 6.5+ or Ubuntu 16.04+

2. Hardware requirements

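A server running Kylin needs at least a 4-core CPU, 16 GB of memory and 100 GB of disk. For high-load scenarios, a 24-core CPU with 64 GB of memory or more is recommended. Kylin will still run if a machine falls short of these figures.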
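A quick way to see how a machine compares with the minimum figures is sketched below. It is only an illustration: the `shortfalls` and `local_specs` helpers are hypothetical, `local_specs` relies on Linux-only `os.sysconf` names, and it is an assumption that the 100 GB refers to total disk size rather than free space.

```python
import os
import shutil

MIN_CORES, MIN_MEM_GB, MIN_DISK_GB = 4, 16, 100  # minimums quoted above


def shortfalls(cores, mem_gb, disk_gb):
    """List which of the minimum requirements a machine misses."""
    missing = []
    if cores < MIN_CORES:
        missing.append(f"CPU cores: {cores} < {MIN_CORES}")
    if mem_gb < MIN_MEM_GB:
        missing.append(f"memory: {mem_gb:.0f} GB < {MIN_MEM_GB} GB")
    if disk_gb < MIN_DISK_GB:
        missing.append(f"disk: {disk_gb:.0f} GB < {MIN_DISK_GB} GB")
    return missing


def local_specs(path="/"):
    """Read core count, physical memory and total disk size (Linux only)."""
    gib = 1024 ** 3
    mem = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / gib
    disk = shutil.disk_usage(path).total / gib
    return os.cpu_count() or 0, mem, disk
```

Calling `shortfalls(*local_specs("/bigdata"))` would report anything below the minimum for the disk holding that path. Reported memory is usually slightly below the nominal size, so a near miss on memory can be treated as a pass.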

3. Download Kylin

# wget http://mirror.bit.edu.cn/apache/kylin/apache-kylin-2.6.4/apache-kylin-2.6.4-bin-hbase1x.tar.gz
# tar -zxvf apache-kylin-2.6.4-bin-hbase1x.tar.gz
# cp -r apache-kylin-2.6.4-bin-hbase1x /bigdata/kylin

The hbase1x package is the build for HBase 1.x; the Kylin metadata store is kept in HBase.

4. Set environment variables

# vim ~/.bashrc

export KYLIN_HOME=/bigdata/kylin
export PATH=$KYLIN_HOME/bin:$PATH

# source ~/.bashrc

5. Kylin configuration

# vim $KYLIN_HOME/conf/kylin.properties

kylin.metadata.url=kylin_metadata@hbase
kylin.server.mode=all
kylin.server.cluster-servers=bigserver1:7070,bigserver2:7070,bigserver3:7070
kylin.job.scheduler.default=2
kylin.job.lock=org.apache.kylin.storage.hbase.util.ZookeeperJobLock

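Only one node, the server side, is set to kylin.server.mode=all; the remaining nodes, the client side, use kylin.server.mode=query.
Set the same kylin.metadata.url value on every node, so that all Kylin nodes share one HBase metastore.
Set kylin.server.cluster-servers to the list of all Kylin nodes, including the current one. When metadata changes, the node that receives the change has to notify every node in the list, itself included.
kylin.job.scheduler.default=2 selects the distributed job scheduler (the value is a scheduler ID, not a job count), and kylin.job.lock enables the ZooKeeper-based distributed job lock.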
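To make the per-node difference concrete, the following sketch renders the five properties for each node. It is only an illustration: the `render_properties` helper and the choice of bigserver1 as the `all` node are assumptions, and the property values are the ones shown above.

```python
NODES = ["bigserver1", "bigserver2", "bigserver3"]  # hostnames from this article
PORT = 7070


def render_properties(node, job_node="bigserver1", nodes=NODES):
    """Return the kylin.properties lines for one node (hypothetical helper)."""
    # Assumption: exactly one node runs in "all" mode, the rest only serve queries.
    mode = "all" if node == job_node else "query"
    cluster = ",".join(f"{n}:{PORT}" for n in nodes)
    return "\n".join([
        "kylin.metadata.url=kylin_metadata@hbase",  # identical on every node
        f"kylin.server.mode={mode}",
        f"kylin.server.cluster-servers={cluster}",  # full list, including this node
        "kylin.job.scheduler.default=2",
        "kylin.job.lock=org.apache.kylin.storage.hbase.util.ZookeeperJobLock",
    ])
```

`print(render_properties("bigserver2"))` prints the lines for that node's kylin.properties. Only the kylin.server.mode line differs between nodes.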

6. Import the sample data

# ${KYLIN_HOME}/bin/sample.sh
Retrieving hadoop conf dir...
Loading sample data into HDFS tmp path: /tmp/kylin/sample_cube/data
Going to create sample tables in hive to database DEFAULT by cli
SLF4J: Class path contains multiple SLF4J bindings.

...... (output omitted) ......

2019-11-12 17:33:26,991 INFO [close-hbase-conn] hbase.HBaseConnection:136 : Closing HBase connections...
2019-11-12 17:33:26,991 INFO [close-hbase-conn] client.ConnectionManager$HConnectionImplementation:2251 : Closing master protocol: MasterService
2019-11-12 17:33:26,998 INFO [close-hbase-conn] client.ConnectionManager$HConnectionImplementation:1774 : Closing zookeeper sessionid=0x103755941330014
2019-11-12 17:33:27,017 INFO [close-hbase-conn] zookeeper.ZooKeeper:684 : Session: 0x103755941330014 closed
2019-11-12 17:33:27,017 INFO [main-EventThread] zookeeper.ClientCnxn:519 : EventThread shut down for session: 0x103755941330014
Sample cube is created successfully in project 'learn_kylin'.
Restart Kylin Server or click Web UI => System Tab => Reload Metadata to take effect

7. Start the Kylin cluster

[root@bigserver1 kylin]# kylin.sh start
Retrieving hadoop conf dir...
KYLIN_HOME is set to /bigdata/kylin
Retrieving hive dependency...
Retrieving hbase dependency...
Retrieving hadoop conf dir...
Retrieving kafka dependency...
Retrieving Spark dependency...
Start to check whether we need to migrate acl tables
Retrieving hive dependency...
Retrieving hbase dependency...
Retrieving hadoop conf dir...
Retrieving kafka dependency...
Retrieving Spark dependency...
SLF4J: Class path contains multiple SLF4J bindings.

...... (output omitted) ......

A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
Check the log at /bigdata/kylin/logs/kylin.log
Web UI is at http://bigserver1:7070/kylin  

# netstat -tpnl |grep 7070
tcp6 0 0 :::7070 :::* LISTEN 17395/java 

[root@bigserver1 kylin]# jps
18064 NameNode
18145 JournalNode
17490 QuorumPeerMain
17395 RunJar //the extra process added by Kylin
17300 HMaster
18244 DFSZKFailoverController
5302 Kafka
2583 JobHistoryServer
20537 Jps
5150 ResourceManager
8462 HRegionServer

If startup reports no errors, the port is listening and the RunJar process is present, Kylin has started normally.
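The port check can also be scripted, for example to test all three nodes in one go. Below is a minimal sketch using only the Python standard library. The `port_is_open` and `check_nodes` helpers are hypothetical, and they only confirm that something accepts TCP connections on the port, not that the Kylin web application itself is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or unresolvable host
        return False


def check_nodes(hosts, port=7070):
    """Map each host to whether its Kylin port accepts connections."""
    return {host: port_is_open(host, port) for host in hosts}
```

For the cluster in this article the call would be `check_nodes(["bigserver1", "bigserver2", "bigserver3"])`.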

8. nginx configuration

# cat kylin.conf
upstream kylin {
    server bigserver1:7070;
    server bigserver2:7070;
    server bigserver3:7070;
    ip_hash;
}  

server {
    listen       80;
    server_name  kylin.xxxx.com;
    index index.html;   

    location / {
        proxy_pass http://kylin;
        proxy_redirect                      off;
        proxy_set_header   Host             $host;
        proxy_set_header   X-Real-IP        $remote_addr;
        proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
    }  

    error_page 404 /404.html;
    location = /404.html {
    }

    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
    }

    access_log /var/log/nginx/kylin.access.log;
    error_log /var/log/nginx/kylin.error.log;

}

Restart nginx.
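The `ip_hash` directive in the upstream block keeps a given client on the same Kylin node across requests; nginx keys the choice on the first three octets of the client's IPv4 address. This presumably matters here because a login session created on one node is not necessarily known to the others. The sketch below only illustrates that stickiness. The `pick_backend` helper is hypothetical and uses CRC32 as a stand-in for nginx's internal hash, so the node it picks will not match what nginx actually chooses.

```python
import zlib

BACKENDS = ["bigserver1:7070", "bigserver2:7070", "bigserver3:7070"]


def pick_backend(client_ip, backends=BACKENDS):
    """Pick a backend from the first three octets of an IPv4 address.

    Simplified stand-in for ip_hash: CRC32 replaces nginx's own hash function.
    """
    key = ".".join(client_ip.split(".")[:3])  # e.g. "10.0.40"
    return backends[zlib.crc32(key.encode("ascii")) % len(backends)]
```

Two clients in the same /24 network, such as 10.0.40.237 and 10.0.40.1, get the same backend, and a single client gets the same backend on every request.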

9. Problems encountered and solutions

(1) MapReduce JobHistory server not started

org.apache.kylin.engine.mr.exception.MapReduceException: Exception: java.net.ConnectException: Call From bigserver1/10.0.40.237 to bigserver1:10020 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
java.net.ConnectException: Call From bigserver1/10.0.40.237 to bigserver1:10020 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:173)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
at org.apache.kylin.job.impl.threadpool.DistributedScheduler$JobRunner.run(DistributedScheduler.java:110)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Solution:

# $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver

This error does not prevent Kylin from starting, but it does affect cube builds, which hang.

(2) Kylin HTTPS error

十一月 13, 2019 10:23:17 上午 org.apache.coyote.AbstractProtocol init
严重: Failed to initialize end point associated with ProtocolHandler ["http-bio-7443"]
java.io.FileNotFoundException: /bigdata/kylin/tomcat/conf/.keystore (没有那个文件或目录)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)

This error occurs because HTTPS has not been configured for Tomcat.

Solution:

# vim $KYLIN_HOME/tomcat/conf/server.xml

<Connector port="7070" protocol="HTTP/1.1"
 connectionTimeout="20000"
 redirectPort="7443" //remove this line
 compression="on"
 compressionMinSize="2048"
 noCompressionUserAgents="gozilla,traviata"
 compressableMimeType="text/html,text/xml,text/javascript,application/javascript,application/json,text/css,text/plain"
/>

//comment out the following
<!--<Connector port="7443" protocol="org.apache.coyote.http11.Http11Protocol"
 maxThreads="150" SSLEnabled="true" scheme="https" secure="true"
 keystoreFile="conf/.keystore" keystorePass="changeit"
 clientAuth="false" sslProtocol="TLS" />-->

This error does not prevent startup and does not affect normal use.
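The same edit can be sketched programmatically, which may help when several nodes need the change. This is an illustration under assumptions, not a drop-in tool. The `disable_https` helper is hypothetical, it removes the HTTPS connector instead of commenting it out, and the standard-library parser drops the XML declaration and any comments already in the file, so the output is not a byte-for-byte edit of the original.

```python
import xml.etree.ElementTree as ET


def disable_https(server_xml, http_port="7070", https_port="7443"):
    """Drop redirectPort from the HTTP connector and remove the HTTPS connector."""
    root = ET.fromstring(server_xml)
    # Snapshot the elements first so removing children does not disturb iteration.
    for parent in list(root.iter()):
        for conn in list(parent.findall("Connector")):
            if conn.get("port") == https_port:
                parent.remove(conn)  # the connector that needs conf/.keystore
            elif conn.get("port") == http_port:
                conn.attrib.pop("redirectPort", None)
    return ET.tostring(root, encoding="unicode")
```

Try it on a copy of $KYLIN_HOME/tomcat/conf/server.xml and compare the result with the original before replacing the real file.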

[Figure: Kylin data comes from Hive]



Please credit the source when reposting
Author: 海底苍鹰
URL: http://blog.51yip.com/hadoop/2231.html