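Unlike Hive, or massively parallel processing (MPP) engines such as Presto, which compute results at query time, Apache Kylin works by precomputation. You define the query dimensions in advance, and Kylin computes the aggregates and stores the results in HBase, so queries over very large datasets return in under a second. It is a typical space-for-time trade-off. Kylin not only solves the problem of fast queries over massive data, it also spares you the trouble of hand-writing and maintaining your own precomputation jobs.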
Put more plainly: queries do not read the raw tables, they read the precomputed result tables.
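The idea of querying a result table instead of the raw table can be illustrated with a toy sketch. This is not Kylin's storage format or API; the table, the column names and the numbers are invented for the example. It pre-aggregates SUM(sales) for every combination of dimensions once, then answers queries with a dictionary lookup.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical raw fact table: (year, region, sales)
RAW_ROWS = [
    ("2018", "east", 10),
    ("2018", "west", 20),
    ("2019", "east", 30),
    ("2019", "west", 40),
]
DIMENSIONS = ("year", "region")


def build_cube(rows, dimensions):
    """Pre-aggregate SUM(sales) for every combination of dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(range(len(dimensions)), size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in dims)
                totals[key] += row[-1]  # last column is the measure
            cube[tuple(dimensions[i] for i in dims)] = dict(totals)
    return cube


def query(cube, **filters):
    """Answer SUM(sales) from the precomputed results, never from RAW_ROWS."""
    dims = tuple(d for d in DIMENSIONS if d in filters)
    key = tuple(filters[d] for d in dims)
    return cube[dims].get(key, 0)


CUBE = build_cube(RAW_ROWS, DIMENSIONS)
print(query(CUBE, year="2019"))  # total sales for 2019
```

The build step is the expensive part and grows with the number of dimension combinations, which is the "space" paid for the fast lookup.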
I. Software requirements
Hadoop: 2.7+, 3.1+ (since v2.5)
Hive: 0.13 - 1.2.1+
HBase: 1.1+, 2.0 (since v2.5)
Spark (optional): 2.3.0+
Kafka (optional): 1.0.0+ (since v2.5)
JDK: 1.8+ (since v2.5)
OS: Linux only, CentOS 6.5+ or Ubuntu 16.04+
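II. Hardware requirements
The minimum configuration for a server running Kylin is a 4-core CPU, 16 GB of memory and 100 GB of disk. For high-load scenarios, a 24-core CPU and 64 GB of memory or more are recommended. Kylin can still run on hardware below these figures.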
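A quick way to compare a host against the minimum figures is a small check script. The sketch below is only an illustration: the function names are made up, the memory reading assumes Linux, and the disk figure is the size of the filesystem holding the given path, which may not be where Kylin or HDFS data actually lives.

```python
import os
import shutil

# Minimum configuration quoted in this section
MIN_CORES, MIN_MEM_GB, MIN_DISK_GB = 4, 16, 100


def shortfalls(cores, mem_gb, disk_gb):
    """Return a list of human-readable shortfalls against the minimums."""
    problems = []
    if cores < MIN_CORES:
        problems.append(f"CPU cores: {cores} < {MIN_CORES}")
    if mem_gb < MIN_MEM_GB:
        problems.append(f"memory GB: {mem_gb} < {MIN_MEM_GB}")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"disk GB: {disk_gb} < {MIN_DISK_GB}")
    return problems


def local_resources(path="/"):
    """Read core count, physical memory and disk size of this Linux host."""
    gib = 1024 ** 3
    cores = os.cpu_count() or 0
    mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_bytes = shutil.disk_usage(path).total
    return cores, mem_bytes / gib, disk_bytes / gib
```

Calling `shortfalls(*local_resources("/bigdata"))` on a node returns an empty list when the host meets the minimum. A non-empty list is a warning rather than a blocker.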
III. Download Kylin
- # wget http://mirror.bit.edu.cn/apache/kylin/apache-kylin-2.6.4/apache-kylin-2.6.4-bin-hbase1x.tar.gz
- # tar -zxvf apache-kylin-2.6.4-bin-hbase1x.tar.gz
- # cp -r apache-kylin-2.6.4-bin /bigdata/kylin
The hbase1x package is the build for HBase 1.x; Kylin keeps its metadata store in HBase.
IV. Set environment variables
- # vim ~/.bashrc
- export KYLIN_HOME=/bigdata/kylin
- export PATH=$KYLIN_HOME/bin:$PATH
- # source ~/.bashrc
V. Kylin configuration
- # vim $KYLIN_HOME/conf/kylin.properties
- kylin.metadata.url=kylin_metadata@hbase
- kylin.server.mode=all
- kylin.server.cluster-servers=bigserver1:7070,bigserver2:7070,bigserver3:7070
- kylin.job.scheduler.default=2
- kylin.job.lock=org.apache.kylin.storage.hbase.util.ZookeeperJobLock
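Only one node, the server, runs with kylin.server.mode=all; the remaining nodes act as clients and use kylin.server.mode=query.
Set the same kylin.metadata.url on every node, so that all Kylin nodes share one HBase metastore.
kylin.server.cluster-servers lists every Kylin node, including the current one. When metadata changes, the node that receives the change has to notify all nodes in this list (itself included).
kylin.job.scheduler.default=2 selects the distributed job scheduler, and kylin.job.lock enables the ZooKeeper-based distributed job lock.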
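Since the nodes differ only in kylin.server.mode, the cluster-related lines can be generated per host rather than edited by hand. The following is a minimal sketch under that assumption; `render_properties` and the choice of bigserver1 as the "all" node are illustrative, not part of Kylin.

```python
CLUSTER = ["bigserver1:7070", "bigserver2:7070", "bigserver3:7070"]
JOB_NODE = "bigserver1"  # assumption: the single node that runs in "all" mode


def render_properties(host, cluster=CLUSTER, job_node=JOB_NODE):
    """Return the cluster-related kylin.properties lines for one host."""
    mode = "all" if host == job_node else "query"
    return "\n".join([
        "kylin.metadata.url=kylin_metadata@hbase",
        f"kylin.server.mode={mode}",
        "kylin.server.cluster-servers=" + ",".join(cluster),
        "kylin.job.scheduler.default=2",
        "kylin.job.lock=org.apache.kylin.storage.hbase.util.ZookeeperJobLock",
    ])


if __name__ == "__main__":
    for node in CLUSTER:
        host = node.split(":")[0]
        print(f"# {host}")
        print(render_properties(host))
```

Generating the lines from one list keeps kylin.metadata.url and kylin.server.cluster-servers identical on every node, which is the property the settings above depend on.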
VI. Load the sample data
- # ${KYLIN_HOME}/bin/sample.sh
- Retrieving hadoop conf dir...
- Loading sample data into HDFS tmp path: /tmp/kylin/sample_cube/data
- Going to create sample tables in hive to database DEFAULT by cli
- SLF4J: Class path contains multiple SLF4J bindings.
- ...... (output omitted) ......
- 2019-11-12 17:33:26,991 INFO [close-hbase-conn] hbase.HBaseConnection:136 : Closing HBase connections...
- 2019-11-12 17:33:26,991 INFO [close-hbase-conn] client.ConnectionManager$HConnectionImplementation:2251 : Closing master protocol: MasterService
- 2019-11-12 17:33:26,998 INFO [close-hbase-conn] client.ConnectionManager$HConnectionImplementation:1774 : Closing zookeeper sessionid=0x103755941330014
- 2019-11-12 17:33:27,017 INFO [close-hbase-conn] zookeeper.ZooKeeper:684 : Session: 0x103755941330014 closed
- 2019-11-12 17:33:27,017 INFO [main-EventThread] zookeeper.ClientCnxn:519 : EventThread shut down for session: 0x103755941330014
- Sample cube is created successfully in project 'learn_kylin'.
- Restart Kylin Server or click Web UI => System Tab => Reload Metadata to take effect
VII. Start the Kylin cluster
- [root@bigserver1 kylin]# kylin.sh start
- Retrieving hadoop conf dir...
- KYLIN_HOME is set to /bigdata/kylin
- Retrieving hive dependency...
- Retrieving hbase dependency...
- Retrieving hadoop conf dir...
- Retrieving kafka dependency...
- Retrieving Spark dependency...
- Start to check whether we need to migrate acl tables
- Retrieving hive dependency...
- Retrieving hbase dependency...
- Retrieving hadoop conf dir...
- Retrieving kafka dependency...
- Retrieving Spark dependency...
- SLF4J: Class path contains multiple SLF4J bindings.
- ...... (output omitted) ......
- A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
- Check the log at /bigdata/kylin/logs/kylin.log
- Web UI is at http://bigserver1:7070/kylin
- # netstat -tpnl |grep 7070
- tcp6 0 0 :::7070 :::* LISTEN 17395/java
- [root@bigserver1 kylin]# jps
- 18064 NameNode
- 18145 JournalNode
- 17490 QuorumPeerMain
- 17395 RunJar //new process added by Kylin
- 17300 HMaster
- 18244 DFSZKFailoverController
- 5302 Kafka
- 2583 JobHistoryServer
- 20537 Jps
- 5150 ResourceManager
- 8462 HRegionServer
If startup reports no errors, port 7070 is listening and a RunJar process appears in jps, Kylin has started correctly.
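The jps check can be scripted by parsing its output. This is a rough sketch: the function names and the list of required process names are assumptions for this cluster, and RunJar is a generic name that any `hadoop jar` launch also produces, so its presence is a hint rather than proof.

```python
def running_processes(jps_output):
    """Map process name -> list of PIDs from the text printed by `jps`."""
    processes = {}
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].isdigit():
            processes.setdefault(parts[1], []).append(int(parts[0]))
    return processes


def missing(jps_output, required=("RunJar", "HMaster", "NameNode")):
    """Return the required process names that do not appear in the output."""
    found = running_processes(jps_output)
    return [name for name in required if name not in found]


SAMPLE = """18064 NameNode
17395 RunJar
17300 HMaster
20537 Jps"""

print(missing(SAMPLE))  # empty list when every required process is present
```

On a real node the input could come from `subprocess.run(["jps"], capture_output=True, text=True).stdout`.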
VIII. nginx configuration
- # cat kylin.conf
- upstream kylin {
- server bigserver1:7070;
- server bigserver2:7070;
- server bigserver3:7070;
- ip_hash;
- }
- server {
- listen 80;
- server_name kylin.xxxx.com;
- index index.html;
- location / {
- proxy_pass http://kylin;
- proxy_redirect off;
- proxy_set_header Host $host;
- proxy_set_header X-Real-IP $remote_addr;
- proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
- }
- error_page 404 /404.html;
- location = /404.html {
- }
- error_page 500 502 503 504 /50x.html;
- location = /50x.html {
- }
- access_log /var/log/nginx/kylin.access.log;
- error_log /var/log/nginx/kylin.error.log;
- }
Restart nginx to apply the configuration.
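The upstream block repeats the node list already written in kylin.server.cluster-servers, so the two can drift apart when a node is added. One option is to derive the block from the property value, as in this sketch; `upstream_block` is a hypothetical helper, not an nginx or Kylin tool.

```python
def upstream_block(cluster_servers, name="kylin"):
    """Build an nginx upstream block from a kylin.server.cluster-servers value."""
    servers = [s.strip() for s in cluster_servers.split(",") if s.strip()]
    lines = [f"upstream {name} {{"]
    lines += [f"    server {s};" for s in servers]
    # ip_hash sends requests from one client IP to the same backend node
    lines += ["    ip_hash;", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(upstream_block("bigserver1:7070,bigserver2:7070,bigserver3:7070"))
```

The ip_hash directive is kept because it pins each client to one Kylin node, which avoids a logged-in web session bouncing between nodes.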
IX. Problems encountered and fixes
1. The MapReduce JobHistory Server was not running
org.apache.kylin.engine.mr.exception.MapReduceException: Exception: java.net.ConnectException: Call From bigserver1/10.0.40.237 to bigserver1:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
java.net.ConnectException: Call From bigserver1/10.0.40.237 to bigserver1:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:173)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
at org.apache.kylin.job.impl.threadpool.DistributedScheduler$JobRunner.run(DistributedScheduler.java:110)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Fix:
- # $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
This error does not stop Kylin from starting, but it does affect cube builds: the build job hangs.
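Because the failure only shows up once a cube build is running, it can help to probe the JobHistory Server port before submitting one. The sketch below is a plain TCP check; `port_open` is an illustrative helper, and 10020 is the port from the stack trace above (the default for mapreduce.jobhistory.address).

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers connection refused, timeouts and name resolution failures
        return False
```

For example, `port_open("bigserver1", 10020)` returning False means the history server still needs to be started. A successful connection only shows that something is listening, not that the service is healthy.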
2. Kylin HTTPS error
Nov 13, 2019 10:23:17 AM org.apache.coyote.AbstractProtocol init
SEVERE: Failed to initialize end point associated with ProtocolHandler ["http-bio-7443"]
java.io.FileNotFoundException: /bigdata/kylin/tomcat/conf/.keystore (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
This error occurs because HTTPS has not been configured for Tomcat, so the keystore file does not exist.
Fix:
- # vim $KYLIN_HOME/tomcat/conf/server.xml
- <Connector port="7070" protocol="HTTP/1.1"
- connectionTimeout="20000"
- redirectPort="7443" //remove this line
- compression="on"
- compressionMinSize="2048"
- noCompressionUserAgents="gozilla,traviata"
- compressableMimeType="text/html,text/xml,text/javascript,application/javascript,application/json,text/css,text/plain"
- />
- //comment out the following
- <!--<Connector port="7443" protocol="org.apache.coyote.http11.Http11Protocol"
- maxThreads="150" SSLEnabled="true" scheme="https" secure="true"
- keystoreFile="conf/.keystore" keystorePass="changeit"
- clientAuth="false" sslProtocol="TLS" />-->
This error affects neither startup nor normal use.
Please credit the source when reposting.
Author: 海底苍鹰
Source: http://blog.51yip.com/hadoop/2231.html