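Unlike Hive, Presto and similar engines, which compute results at query time (Presto and its peers use a Massively Parallel Processing, MPP, architecture), Apache Kylin uses a precomputation model. Users define the query dimensions in advance; Kylin computes the aggregates and stores the results in HBase, giving sub-second responses for queries and analysis over massive datasets. It is a classic space-for-time trade-off. Kylin not only solves the problem of fast queries over massive data, it also spares you the trouble of hand-writing and maintaining precomputation jobs.
Put more plainly: queries read the precomputed result tables instead of the raw tables.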
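The idea of querying a result table instead of the raw table can be shown with a toy sketch. This is not Kylin code: the sample rows, the "region" dimension and the function names are all invented for illustration.

```shell
# Toy illustration of precomputation (NOT Kylin code; all names are hypothetical).

# Raw "fact table": one line per sale, formatted as "region,amount".
raw_rows() {
  printf '%s\n' 'east,10' 'west,5' 'east,7' 'west,3' 'north,1'
}

# Precompute step: aggregate once by the dimension "region"
# and emit a small result table of "region,total" lines.
build_result_table() {
  awk -F, '{ sum[$1] += $2 } END { for (r in sum) print r "," sum[r] }' | sort
}

# Query step: look up one row in the result table
# instead of scanning the raw rows again.
query_total() {
  printf '%s\n' "$2" | awk -F, -v r="$1" '$1 == r { print $2 }'
}

RESULT_TABLE="$(raw_rows | build_result_table)"
query_total east "$RESULT_TABLE"   # prints 17
```

The aggregation cost is paid once when the result table is built; each later query is a cheap lookup, at the price of storing the extra table.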
I. Software requirements
Hadoop: 2.7+, 3.1+ (since v2.5)
Hive: 0.13 - 1.2.1+
HBase: 1.1+, 2.0 (since v2.5)
Spark (optional): 2.3.0+
Kafka (optional): 1.0.0+ (since v2.5)
JDK: 1.8+ (since v2.5)
OS: Linux only, CentOS 6.5+ or Ubuntu 16.04+
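A small helper can compare an installed version against the minimums listed above. This is only a sketch: version_ge is a hypothetical helper name, and it relies on GNU sort's -V (version sort) option being available.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Hypothetical helper; relies on GNU `sort -V`.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Self-contained demonstration with literal version strings.
version_ge 2.7.3 2.7 && echo "2.7.3 satisfies Hadoop 2.7+"

# Example against a real installation (assumes `hadoop` is on PATH and that
# the first line of its output looks like "Hadoop 2.7.3"):
#   version_ge "$(hadoop version | awk 'NR==1 { print $2 }')" 2.7 || echo "Hadoop is too old"
```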
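II. Hardware requirements
The minimum configuration for a server running Kylin is a 4-core CPU, 16 GB of memory and 100 GB of disk. For high-load scenarios, a 24-core CPU and 64 GB of memory or more is recommended. Kylin will still run on a machine that falls short of these figures.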
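The minimums above can be checked with a short script. This is a sketch under assumptions: meets_minimum and report_host are hypothetical names, the disk figure is interpreted as free space on the given path, and memory is rounded to the nearest GiB because the kernel reports slightly less than the installed amount.

```shell
# meets_minimum CORES MEM_GB DISK_GB: succeeds when all three values reach
# the minimums quoted in the text (4 cores, 16 GB memory, 100 GB disk).
meets_minimum() {
  [ "$1" -ge 4 ] && [ "$2" -ge 16 ] && [ "$3" -ge 100 ]
}

# report_host [PATH]: gather the figures for this Linux host and check them.
# PATH is the filesystem to inspect for free space (defaults to /).
report_host() {
  local path="${1:-/}" cores mem_gb disk_gb
  cores="$(nproc)"
  # MemTotal is in kB; round to the nearest GiB.
  mem_gb="$(awk '/^MemTotal:/ { printf "%d", $2 / 1024 / 1024 + 0.5 }' /proc/meminfo)"
  # Column 4 of `df -P -k` is the available space in kB.
  disk_gb="$(df -P -k "$path" | awk 'NR==2 { printf "%d", $4 / 1024 / 1024 }')"
  echo "cores=$cores mem_gb=$mem_gb disk_free_gb=$disk_gb"
  meets_minimum "$cores" "$mem_gb" "$disk_gb"
}
```

Running `report_host /bigdata` on each node prints the three figures and returns a non-zero status when the host is below the minimum; as noted above, Kylin can still run in that case.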
III. Download Kylin
# wget http://mirror.bit.edu.cn/apache/kylin/apache-kylin-2.6.4/apache-kylin-2.6.4-bin-hbase1x.tar.gz
# tar -zxvf apache-kylin-2.6.4-bin-hbase1x.tar.gz
# cp -r apache-kylin-2.6.4-bin /bigdata/kylin
The hbase1x package is the build for HBase 1.x; Kylin keeps its metadata store in HBase.
IV. Set environment variables
# vim ~/.bashrc
(add the following two lines to the file)
export KYLIN_HOME=/bigdata/kylin
export PATH=$KYLIN_HOME/bin:$PATH
# source ~/.bashrc
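A quick sanity check of the environment can catch a wrong path before anything is started. This is a sketch only: check_kylin_home is a hypothetical helper, and it assumes nothing more than bin/kylin.sh existing under the install directory.

```shell
# check_kylin_home [DIR]: verify that DIR (default: $KYLIN_HOME) looks like a
# Kylin install and that DIR/bin is on PATH. Hypothetical helper.
check_kylin_home() {
  local home="${1:-${KYLIN_HOME:-}}"
  if [ -z "$home" ]; then
    echo "KYLIN_HOME is not set"
    return 1
  fi
  if [ ! -x "$home/bin/kylin.sh" ]; then
    echo "no executable kylin.sh under $home/bin"
    return 1
  fi
  case ":$PATH:" in
    *":$home/bin:"*) echo "ok: $home" ;;
    *) echo "warning: $home/bin is not on PATH"; return 2 ;;
  esac
}
```

After `source ~/.bashrc`, calling `check_kylin_home` with no argument should print `ok: /bigdata/kylin` if both variables were set as described.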
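V. Kylin configuration
# vim $KYLIN_HOME/conf/kylin.properties
kylin.metadata.url=kylin_metadata@hbase
kylin.server.mode=all
kylin.server.cluster-servers=bigserver1:7070,bigserver2:7070,bigserver3:7070
kylin.job.scheduler.default=2
kylin.job.lock=org.apache.kylin.storage.hbase.util.ZookeeperJobLock
Only one node, the server, runs with kylin.server.mode=all; the other nodes (the clients) use kylin.server.mode=query.
Set the same kylin.metadata.url value on every node, so that all Kylin nodes share one HBase metastore.
Set kylin.server.cluster-servers to the list of all Kylin nodes, including the current one. When an event changes the metadata, the node that receives the change has to notify every node in the list (itself included).
kylin.job.scheduler.default=2 selects the distributed job scheduler (the value is a scheduler ID, not a number of jobs), and kylin.job.lock enables the ZooKeeper-based distributed job lock.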
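Because every node needs the same node list and the same metadata URL, it can help to generate and read these values with small helpers. This is a sketch: cluster_servers and get_prop are hypothetical names, and get_prop only handles plain key=value lines with no spaces around the equals sign.

```shell
# cluster_servers PORT HOST...: print the kylin.server.cluster-servers line
# for the given hosts, all on the same port.
cluster_servers() {
  local port="$1" list="" h
  shift
  for h in "$@"; do
    list="${list:+$list,}$h:$port"
  done
  printf 'kylin.server.cluster-servers=%s\n' "$list"
}

# get_prop FILE KEY: print the value of KEY from a properties file.
# Commented-out lines are ignored; the last occurrence wins.
get_prop() {
  awk -F= -v k="$2" '
    $1 == k { v = substr($0, length(k) + 2) }
    END { if (v != "") print v }' "$1"
}

cluster_servers 7070 bigserver1 bigserver2 bigserver3
# prints kylin.server.cluster-servers=bigserver1:7070,bigserver2:7070,bigserver3:7070
```

Running `get_prop "$KYLIN_HOME/conf/kylin.properties" kylin.metadata.url` on each node and comparing the output is one way to confirm that all nodes point at the same metastore.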
VI. Import the sample data
# ${KYLIN_HOME}/bin/sample.sh
Retrieving hadoop conf dir...
Loading sample data into HDFS tmp path: /tmp/kylin/sample_cube/data
Going to create sample tables in hive to database DEFAULT by cli
SLF4J: Class path contains multiple SLF4J bindings.
... (output omitted) ...
2019-11-12 17:33:26,991 INFO [close-hbase-conn] hbase.HBaseConnection:136 : Closing HBase connections...
2019-11-12 17:33:26,991 INFO [close-hbase-conn] client.ConnectionManager$HConnectionImplementation:2251 : Closing master protocol: MasterService
2019-11-12 17:33:26,998 INFO [close-hbase-conn] client.ConnectionManager$HConnectionImplementation:1774 : Closing zookeeper sessionid=0x103755941330014
2019-11-12 17:33:27,017 INFO [close-hbase-conn] zookeeper.ZooKeeper:684 : Session: 0x103755941330014 closed
2019-11-12 17:33:27,017 INFO [main-EventThread] zookeeper.ClientCnxn:519 : EventThread shut down for session: 0x103755941330014
Sample cube is created successfully in project 'learn_kylin'.
Restart Kylin Server or click Web UI => System Tab => Reload Metadata to take effect
VII. Start the Kylin cluster
[root@bigserver1 kylin]# kylin.sh start
Retrieving hadoop conf dir...
KYLIN_HOME is set to /bigdata/kylin
Retrieving hive dependency...
Retrieving hbase dependency...
Retrieving hadoop conf dir...
Retrieving kafka dependency...
Retrieving Spark dependency...
Start to check whether we need to migrate acl tables
Retrieving hive dependency...
Retrieving hbase dependency...
Retrieving hadoop conf dir...
Retrieving kafka dependency...
Retrieving Spark dependency...
SLF4J: Class path contains multiple SLF4J bindings.
... (output omitted) ...
A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
Check the log at /bigdata/kylin/logs/kylin.log
Web UI is at http://bigserver1:7070/kylin

# netstat -tpnl |grep 7070
tcp6 0 0 :::7070 :::* LISTEN 17395/java

[root@bigserver1 kylin]# jps
18064 NameNode
18145 JournalNode
17490 QuorumPeerMain
17395 RunJar    // this extra process appears after Kylin starts
17300 HMaster
18244 DFSZKFailoverController
5302 Kafka
2583 JobHistoryServer
20537 Jps
5150 ResourceManager
8462 HRegionServer
If startup reports no errors, port 7070 is listening and a RunJar process is present, Kylin has started normally.
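The port check can be scripted so that it runs the same way on every node. This is a sketch: has_listener is a hypothetical helper that reads the output of netstat -tnl (or ss -tnl) on standard input rather than calling either tool itself.

```shell
# has_listener PORT: read `netstat -tnl` or `ss -tnl` output on stdin and
# succeed if some socket in LISTEN state is bound to PORT.
has_listener() {
  awk -v p="$1" '
    /LISTEN/ {
      for (i = 1; i <= NF; i++)
        if ($i ~ (":" p "$")) found = 1   # local address ends in ":PORT"
    }
    END { exit (found ? 0 : 1) }'
}

# Example on a Kylin node (not executed here):
#   netstat -tnl | has_listener 7070 && jps | grep -q RunJar && echo "Kylin looks up"
```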
VIII. nginx configuration
# cat kylin.conf
upstream kylin {
    server bigserver1:7070;
    server bigserver2:7070;
    server bigserver3:7070;
    ip_hash;
}

server {
    listen 80;
    server_name kylin.xxxx.com;
    index index.html;

    location / {
        proxy_pass http://kylin;
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    error_page 404 /404.html;
    location = /404.html {
    }

    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
    }

    access_log /var/log/nginx/kylin.access.log;
    error_log /var/log/nginx/kylin.error.log;
}
Restart nginx for the configuration to take effect.
IX. Problems encountered and fixes
1. The MapReduce JobHistory server was not running
org.apache.kylin.engine.mr.exception.MapReduceException: Exception: java.net.ConnectException: Call From bigserver1/10.0.40.237 to bigserver1:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
java.net.ConnectException: Call From bigserver1/10.0.40.237 to bigserver1:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:173)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
    at org.apache.kylin.job.impl.threadpool.DistributedScheduler$JobRunner.run(DistributedScheduler.java:110)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Fix:
# $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
This error does not stop Kylin from starting, but it does affect cube builds: the build job hangs.
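Since the failure is simply a refused connection on port 10020, the port can be probed before a cube build is submitted. This is a sketch: port_open is a hypothetical helper that uses bash's /dev/tcp redirection together with the coreutils timeout command.

```shell
# port_open HOST PORT: succeed if a TCP connection to HOST:PORT can be
# opened within 2 seconds. /dev/tcp is a bash feature, so the probe is
# run through `bash -c` explicitly.
port_open() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example on the cluster from this article (not executed here):
#   port_open bigserver1 10020 || $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
```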
2. Kylin HTTPS error
Nov 13, 2019 10:23:17 AM org.apache.coyote.AbstractProtocol init
SEVERE: Failed to initialize end point associated with ProtocolHandler ["http-bio-7443"]
java.io.FileNotFoundException: /bigdata/kylin/tomcat/conf/.keystore (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at java.io.FileInputStream.<init>(FileInputStream.java:93)
This error occurs because HTTPS has not been configured for Tomcat, so the keystore file the 7443 connector expects does not exist.
Fix:
# vim $KYLIN_HOME/tomcat/conf/server.xml
<Connector port="7070" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="7443"    // remove this line
           compression="on"
           compressionMinSize="2048"
           noCompressionUserAgents="gozilla,traviata"
           compressableMimeType="text/html,text/xml,text/javascript,application/javascript,application/json,text/css,text/plain"
/>
// comment out the following block
<!--<Connector port="7443" protocol="org.apache.coyote.http11.Http11Protocol"
           maxThreads="150" SSLEnabled="true" scheme="https" secure="true"
           keystoreFile="conf/.keystore" keystorePass="changeit"
           clientAuth="false" sslProtocol="TLS" />-->
This error affects neither startup nor normal use.
Please credit the source when reposting.
Author: 海底苍鹰
URL: http://blog.51yip.com/hadoop/2231.html