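Both Spark and Flink can do stream processing and batch processing. I have written many articles about Spark; please search this blog for them. For installing Flink, see: "cdh6 flink 安装" (CDH6 Flink installation).
I. Run jobs on a long-running Flink session (a process that stays in the RUNNING state)
1. Start Flink and allocate resources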
./yarn-session.sh -n 2 -jm 1024 -tm 1024
./yarn-session.sh -id application_1584936998803_0066
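The commands above start the session and allocate resources from the command line. You can also start Flink from the Cloudera Manager admin console; once started that way, it is already bound to a running Flink process, that is, to an application id.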
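To find out which application id a session is bound to, one option is to filter the output of yarn application -list. The function below is a minimal sketch, not part of Flink or YARN: the name running_flink_session_ids is made up for illustration, and it assumes that sessions are listed under the name "Flink session cluster" with RUNNING in the state column, as they are on this cluster.

```shell
# Hypothetical helper: print the ids of RUNNING Flink session clusters.
# Reads the text of `yarn application -list` on standard input.
running_flink_session_ids() {
  awk '
    # Only look at rows that start with a YARN application id
    # and carry the session name used by yarn-session.sh here.
    $1 ~ /^application_[0-9]+_[0-9]+$/ && /Flink session cluster/ {
      # The application name contains spaces, so the state column has no
      # fixed field index; scan the fields for the RUNNING token instead.
      for (i = 2; i <= NF; i++) {
        if ($i == "RUNNING") { print $1; break }
      }
    }
  '
}
```

It reads the listing on standard input, so a typical call would be: yarn application -list | running_flink_session_ids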
2. Submit a job
[root@bigserver2 bin]# yarn application -list
WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
20/03/23 15:01:45 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm188
Total number of applications (application-types: [], states: [SUBMITTED, ACCEPTED, RUNNING] and tags: []):4
Application-Id                  Application-Name       Application-Type  User  Queue            State     Final-State  Progress  Tracking-URL
application_1584936998803_0064  Flink session cluster  Apache Flink      root  root.users.root  ACCEPTED  UNDEFINED    0%        N/A
application_1584936998803_0067  Flink session cluster  Apache Flink      root  root.users.root  ACCEPTED  UNDEFINED    0%        N/A
application_1584936998803_0066  Flink session cluster  Apache Flink      root  root.users.root  RUNNING   UNDEFINED    100%      http://bigserver2:8081
application_1584936998803_0065  Flink session cluster  Apache Flink      root  root.users.root  ACCEPTED  UNDEFINED    0%        N/A
[root@bigserver2 bin]# ./flink run ../examples/streaming/WordCount.jar --input hdfs://bigdata1/test/word --output hdfs://bigdata1/test/word4
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
Starting execution of program
Program execution finished
Job with JobID 6f6e63e88d4093fcca08a5f8f6cb58a1 has finished.
Job Runtime: 16254 ms
Note: submissions from the machine hosting Cloudera Manager kept failing, while submissions from the DataNode and NameNode machines succeeded.
The application error log shows:
org.apache.flink.shaded.curator.org.apache.curator.ConnectionState - Authentication failed
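This error comes from a failed Kerberos authentication. Kerberos is not enabled on this CDH6 cluster, so the error can be ignored here. If Kerberos is enabled, though, the problem has to be fixed.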
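Whether Kerberos is actually enabled can be checked from the Hadoop client configuration. The sketch below assumes that core-site.xml sits under /etc/hadoop/conf (the HADOOP_CONF_DIR used on this cluster) and uses the usual name/value property layout; hadoop_auth_mode is a hypothetical helper name. It reads the standard Hadoop property hadoop.security.authentication, which is simple by default and kerberos on a secured cluster.

```shell
# Hypothetical helper: print the Hadoop authentication mode found in a
# core-site.xml file ("simple" when the property is not set).
hadoop_auth_mode() {
  conf_file="$1"
  if [ ! -r "$conf_file" ]; then
    echo "cannot read $conf_file" >&2
    return 1
  fi
  # Join the file into one line so the <name> and <value> elements can be
  # matched together, then capture the text of the <value> element.
  mode="$(tr -d '\n' < "$conf_file" \
    | sed -n 's:.*<name>hadoop\.security\.authentication</name>[^<]*<value>\([^<]*\)</value>.*:\1:p')"
  # Hadoop falls back to "simple" when the property is absent.
  printf '%s\n' "${mode:-simple}"
}
```

On this cluster the call would be: hadoop_auth_mode /etc/hadoop/conf/core-site.xml. An answer of kerberos means the Authentication failed message deserves a closer look.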
3. Submit a job to a specific application id
[root@bigserver5 bin]# ./flink run -yid application_1584936998803_0013 ../examples/streaming/WordCount.jar --input hdfs://bigdata1/test/word --output hdfs://bigdata1/test/word1
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
2020-03-23 14:12:46,020 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2020-03-23 14:12:46,020 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2020-03-23 14:12:46,024 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set.The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
2020-03-23 14:12:46,065 INFO org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm188
2020-03-23 14:12:46,104 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Found application JobManager host name 'bigserver3' and port '8081' from supplied application id 'application_1584936998803_0013'
Starting execution of program
Program execution finished
Job with JobID 595d31cb57ded04387241090f1a7349c has finished.
Job Runtime: 11990 ms
With the application id specified, the job can be submitted from any machine in the cluster.
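A small wrapper can guard against typos in the application id before the job is sent. This is only a sketch: build_flink_run_cmd is a hypothetical helper, it prints the command instead of running it, and the id check merely assumes the application_<digits>_<digits> shape seen in the listings here.

```shell
# Hypothetical helper: print a `flink run -yid` command line after checking
# that the first argument looks like a YARN application id.
# Arguments: application id, jar path, input path, output path.
build_flink_run_cmd() {
  app_id="$1"
  jar="$2"
  input="$3"
  output="$4"
  # Reject anything that is not application_<digits>_<digits>.
  if ! printf '%s\n' "$app_id" | grep -Eq '^application_[0-9]+_[0-9]+$'; then
    echo "not a YARN application id: $app_id" >&2
    return 1
  fi
  # Dry run: print the command so it can be reviewed before execution.
  printf './flink run -yid %s %s --input %s --output %s\n' \
    "$app_id" "$jar" "$input" "$output"
}
```

Printing rather than executing keeps the sketch safe to try on any machine; the printed line can be pasted into a shell in Flink's bin directory once it looks right.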
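II. Acquire resources when the Flink job is submitted, much like spark-submit
1. Stop the Flink cluster, so that yarn application -list shows nothing. This step matters: if you skip it, the submission fails with the following error.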
2020-03-23 13:49:43,510 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
Cause of the error: on the machine hosting Cloudera Manager, port 8030 is already taken by the running application.
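The check in step 1 can be scripted. The sketch below counts rows of type Apache Flink in a yarn application -list listing read from standard input and fails if any remain. The name no_flink_apps_listed is hypothetical, and matching on the text "Apache Flink" assumes the Application-Type value shown on this cluster.

```shell
# Hypothetical helper: succeed only if the listing on standard input
# contains no Flink applications.
no_flink_apps_listed() {
  # Count rows that start with an application id and mention the
  # Flink application type; "n + 0" prints 0 when nothing matched.
  count="$(awk '
    $1 ~ /^application_[0-9]+_[0-9]+$/ && /Apache Flink/ { n++ }
    END { print n + 0 }
  ')"
  if [ "$count" -gt 0 ]; then
    echo "$count Flink application(s) still listed; stop them first" >&2
    return 1
  fi
  return 0
}
```

A typical call would be: yarn application -list | no_flink_apps_listed. A leftover session can be stopped with the standard YARN command yarn application -kill followed by its application id.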
2. Submit the job
[root@bigserver2 bin]# ./start-cluster.sh   # start the cluster first; otherwise the submission fails with: Connection refused: localhost/127.0.0.1:8081
[root@bigserver2 bin]# ./flink run -m yarn-cluster -yn 1 -yjm 1024 -ytm 1024 ../examples/streaming/WordCount.jar --input hdfs://bigdata1/test/word --output hdfs://bigdata1/test/word_res1
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
2020-03-23 15:30:14,804 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2020-03-23 15:30:14,804 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2020-03-23 15:30:14,808 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - The argument yn is deprecated in will be ignored.
2020-03-23 15:30:14,808 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - The argument yn is deprecated in will be ignored.
2020-03-23 15:30:14,897 INFO org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm188
2020-03-23 15:30:14,930 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
2020-03-23 15:30:14,993 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1024, numberTaskManagers=1, slotsPerTaskManager=1}
2020-03-23 15:30:15,305 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - The configuration directory ('/opt/cloudera/parcels/FLINK-1.9.1-BIN-SCALA_2.12/lib/flink/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
2020-03-23 15:30:30,820 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting application master application_1584936998803_0069
2020-03-23 15:30:31,071 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1584936998803_0069
2020-03-23 15:30:31,072 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting for the cluster to be allocated
2020-03-23 15:30:31,075 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, current state ACCEPTED
2020-03-23 15:30:36,860 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - YARN application has been deployed successfully.
Starting execution of program
Program execution finished
Job with JobID 7a3b8837bcc214eab0732361e73b55a1 has finished.
Job Runtime: 24120 ms
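When submissions are scripted, the application id and the job runtime can be pulled out of this kind of output. A rough sketch, assuming the log wording stays as in the run above ("Submitted application" followed by the id, and "Job Runtime:" followed by a number and "ms"); both function names are made up for illustration.

```shell
# Hypothetical helper: print the YARN application id reported by the
# YARN client in a captured `flink run -m yarn-cluster` log (stdin).
submitted_app_id() {
  sed -n 's/.*Submitted application \(application_[0-9]*_[0-9]*\).*/\1/p' \
    | head -n 1
}

# Hypothetical helper: print the job runtime in milliseconds from the
# same kind of log (stdin); the last match wins if there are several.
job_runtime_ms() {
  sed -n 's/.*Job Runtime: \([0-9][0-9]*\) ms.*/\1/p' | tail -n 1
}
```

Both read the captured output on standard input, for example from a file saved with tee while the job was being submitted.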
Please credit the source when reposting
Author: 海底苍鹰
URL: http://blog.51yip.com/hadoop/2384.html