Two ways to submit jobs with Flink on YARN

Posted by 张映 on 2020-03-23

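Both Spark and Flink can do stream processing and batch processing. I have written many articles about Spark; search this blog to find them. For installing Flink, see: CDH6 Flink installation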

I. Submit jobs to a long-running Flink session (a process that stays in the RUNNING state)

1. Start Flink and allocate resources

./yarn-session.sh -n 2 -jm 1024 -tm 1024
./yarn-session.sh -id application_1584936998803_0066

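The commands above start a session and allocate resources from the command line; the second form attaches to an already running session by its application id. You can also start Flink from the Cloudera Manager admin console. Once started that way, it is already bound to a running Flink process, that is, to an application id.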

Figure: Flink application started and in the RUNNING state

2. Submit a job

[root@bigserver2 bin]# yarn application -list
WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
20/03/23 15:01:45 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm188
Total number of applications (application-types: [], states: [SUBMITTED, ACCEPTED, RUNNING] and tags: []):4
                Application-Id	    Application-Name	    Application-Type	      User	     Queue	             State	       Final-State	       Progress	                       Tracking-URL
application_1584936998803_0064	Flink session cluster	        Apache Flink	      root	root.users.root	          ACCEPTED	         UNDEFINED	             0%	                                N/A
application_1584936998803_0067	Flink session cluster	        Apache Flink	      root	root.users.root	          ACCEPTED	         UNDEFINED	             0%	                                N/A
application_1584936998803_0066	Flink session cluster	        Apache Flink	      root	root.users.root	           RUNNING	         UNDEFINED	           100%	             http://bigserver2:8081
application_1584936998803_0065	Flink session cluster	        Apache Flink	      root	root.users.root	          ACCEPTED	         UNDEFINED	             0%	                                N/A

[root@bigserver2 bin]# ./flink run ../examples/streaming/WordCount.jar --input hdfs://bigdata1/test/word --output hdfs://bigdata1/test/word4
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
Starting execution of program
Program execution finished
Job with JobID 6f6e63e88d4093fcca08a5f8f6cb58a1 has finished.
Job Runtime: 16254 ms

Note: submission kept failing from the machine that hosts Cloudera Manager, while it succeeded from both the DataNode and NameNode machines.

The application's error log showed:

org.apache.flink.shaded.curator.org.apache.curator.ConnectionState - Authentication failed

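This error comes from a failed Kerberos authentication. Kerberos is not enabled on this CDH6 cluster, so the error can be ignored here. If Kerberos is enabled, however, the problem has to be fixed.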
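Whether the Curator message matters depends on whether the cluster actually runs Kerberos. Here is a minimal sketch of that check, assuming the `hdfs` client is on the PATH. `kerberos_enabled` is a hypothetical helper name; `hadoop.security.authentication` is the standard Hadoop property, whose value is `simple` or `kerberos`.

```shell
#!/usr/bin/env bash
# kerberos_enabled [MODE]: succeed when the authentication mode is "kerberos".
# If MODE is not passed, it is read from the Hadoop configuration with
# `hdfs getconf` (assumes the hdfs client is installed on this machine).
kerberos_enabled() {
  local mode="${1:-$(hdfs getconf -confKey hadoop.security.authentication 2>/dev/null)}"
  [ "$mode" = "kerberos" ]
}

# Usage on a cluster node (not run here):
#   if kerberos_enabled; then echo "fix the authentication error"; else echo "safe to ignore"; fi
```

Passing the mode as an argument makes it possible to try the helper without a Hadoop installation.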

3. Submit a job to a specific application id

[root@bigserver5 bin]# ./flink run -yid application_1584936998803_0013  ../examples/streaming/WordCount.jar --input hdfs://bigdata1/test/word --output hdfs://bigdata1/test/word1
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
2020-03-23 14:12:46,020 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2020-03-23 14:12:46,020 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2020-03-23 14:12:46,024 WARN  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set.The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
2020-03-23 14:12:46,065 INFO  org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider  - Failing over to rm188
2020-03-23 14:12:46,104 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Found application JobManager host name 'bigserver3' and port '8081' from supplied application id 'application_1584936998803_0013'
Starting execution of program
Program execution finished
Job with JobID 595d31cb57ded04387241090f1a7349c has finished.
Job Runtime: 11990 ms

With the application id specified, the job can be submitted from any machine in the cluster.
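The application id does not have to be copied by hand. As a rough sketch, it can be pulled out of the `yarn application -list` output shown earlier and passed to `-yid`. `running_flink_session_id` is a hypothetical helper name, and the matching assumes the listing format shown above (id in the first column, `Apache Flink` as the type, `RUNNING` as the state).

```shell
#!/usr/bin/env bash
# running_flink_session_id: read `yarn application -list` output on stdin and
# print the id of the first Apache Flink application whose state is RUNNING.
running_flink_session_id() {
  awk '/^application_/ && /Apache Flink/ && /[ \t]RUNNING[ \t]/ { print $1; exit }'
}

# Usage on a cluster node (not run here):
#   app_id="$(yarn application -list 2>/dev/null | running_flink_session_id)"
#   ./flink run -yid "$app_id" ../examples/streaming/WordCount.jar \
#       --input hdfs://bigdata1/test/word --output hdfs://bigdata1/test/word1
```

If no Flink session is running, the function prints nothing, so the caller should check for an empty `app_id` before submitting.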

Figure: submitted Flink job running

II. Acquire resources at the moment the Flink job is submitted, much like spark-submit

1. Stop the Flink cluster, so that yarn application -list shows nothing. This step is important: if it is skipped, the submission fails with the following error.

2020-03-23 13:49:43,510 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster

Cause of the error: port 8030 on the machine that hosts Cloudera Manager is already in use by the running application.
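Clearing out leftover Flink sessions before a per-job submission can be scripted. The sketch below is one possible way, again assuming the `yarn application -list` format shown earlier. `flink_app_ids`, `stop_flink_apps` and the `DRY_RUN` variable are hypothetical names introduced for illustration; `yarn application -kill` is the standard YARN command.

```shell
#!/usr/bin/env bash
# flink_app_ids: read `yarn application -list` output on stdin and print the
# id of every Apache Flink application, whatever its state.
flink_app_ids() {
  awk '/^application_/ && /Apache Flink/ { print $1 }'
}

# stop_flink_apps: read application ids on stdin and kill each one.
# With DRY_RUN=1 the kill commands are only printed, not executed.
stop_flink_apps() {
  local id
  while read -r id; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "yarn application -kill $id"
    else
      yarn application -kill "$id"
    fi
  done
}

# Usage on a cluster node (not run here):
#   yarn application -list 2>/dev/null | flink_app_ids | stop_flink_apps
```

Running once with `DRY_RUN=1` first shows which applications would be stopped, which is safer on a shared cluster.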

2. Submit the job


[root@bigserver2 bin]#  ./start-cluster.sh   # start the cluster first; otherwise the client reports "Connection refused: localhost/127.0.0.1:8081"
[root@bigserver2 bin]# ./flink run -m yarn-cluster -yn 1 -yjm 1024 -ytm 1024 ../examples/streaming/WordCount.jar --input hdfs://bigdata1/test/word --output hdfs://bigdata1/test/word_res1
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
2020-03-23 15:30:14,804 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2020-03-23 15:30:14,804 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2020-03-23 15:30:14,808 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - The argument yn is deprecated in will be ignored.
2020-03-23 15:30:14,808 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - The argument yn is deprecated in will be ignored.
2020-03-23 15:30:14,897 INFO  org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider  - Failing over to rm188
2020-03-23 15:30:14,930 WARN  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
2020-03-23 15:30:14,993 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1024, numberTaskManagers=1, slotsPerTaskManager=1}
2020-03-23 15:30:15,305 WARN  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - The configuration directory ('/opt/cloudera/parcels/FLINK-1.9.1-BIN-SCALA_2.12/lib/flink/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
2020-03-23 15:30:30,820 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Submitting application master application_1584936998803_0069
2020-03-23 15:30:31,071 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1584936998803_0069
2020-03-23 15:30:31,072 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Waiting for the cluster to be allocated
2020-03-23 15:30:31,075 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Deploying cluster, current state ACCEPTED
2020-03-23 15:30:36,860 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - YARN application has been deployed successfully.
Starting execution of program
Program execution finished
Job with JobID 7a3b8837bcc214eab0732361e73b55a1 has finished.
Job Runtime: 24120 ms
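The log above reports that the `-yn` argument is deprecated and ignored by this Flink build (1.9.1, going by the parcel path), so the per-job command can be written without it. The following is only a sketch of such a command line. `build_per_job_cmd` and the `JM_MB` / `TM_MB` variables are hypothetical names, and the memory defaults simply repeat the values used above.

```shell
#!/usr/bin/env bash
# build_per_job_cmd JAR INPUT OUTPUT: print a per-job `flink run` command line
# that omits the deprecated -yn flag. JobManager and TaskManager memory (in MB)
# default to 1024 and can be overridden through JM_MB and TM_MB.
build_per_job_cmd() {
  local jar="$1" input="$2" output="$3"
  printf './flink run -m yarn-cluster -yjm %s -ytm %s %s --input %s --output %s\n' \
    "${JM_MB:-1024}" "${TM_MB:-1024}" "$jar" "$input" "$output"
}

# Usage (prints the command; review it, then run it from Flink's bin directory):
#   build_per_job_cmd ../examples/streaming/WordCount.jar \
#       hdfs://bigdata1/test/word hdfs://bigdata1/test/word_res1
```

Printing the command instead of executing it keeps the helper side-effect free.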

Please credit the source when reposting
Author: 海底苍鹰
URL: http://blog.51yip.com/hadoop/2384.html

海底苍鹰 (tank) blog

One step, two steps, three steps, N steps, two lines of footprints
