Exporting Spark SQL data to CSV

Posted by 张映 on 2019-07-11

Category: hadoop/spark/scala


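I couldn't find a suitable Spark SQL client, so there is no GUI like Navicat or HeidiSQL that lets you export query results to CSV, Excel, and similar formats. You can, however, export the data from spark-shell.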

[root@bigserver1 bin]# spark-shell --master yarn
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://bigserver1:4040
Spark context available as 'sc' (master = yarn, app id = application_1558346064103_0072).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_191)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val android = "select imei,count(*) as total from tanktest.user where imei!='__IMEI__' and imei!='0' and imei!='' group by imei order by rand()"
android: String = select imei,count(*) as total from tanktest.user where imei!='__IMEI__' and imei!='0' and imei!='' group by imei order by rand()

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@40d0bff1

scala> val android_sql = sqlContext.sql(android)
android_sql: org.apache.spark.sql.DataFrame = [imei: string, total: bigint]

scala> android_sql.write.format("com.databricks.spark.csv").option("header","true").save("/bigdata/export/android.csv")
[Stage 3:================================================> (178 + 4) / 200]
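The transcript above works on Spark 2.4, but two steps can be simplified. spark-shell already provides a SparkSession named `spark`, so creating the deprecated `SQLContext` (the source of the deprecation warning) is unnecessary. Since Spark 2.0, CSV is also a built-in data source, so `.csv(path)` can replace `format("com.databricks.spark.csv")`. The sketch below wraps both changes in a helper. The function name `exportQueryToCsv` and the use of `coalesce(1)` and `SaveMode.Overwrite` are my own choices and do not come from the original post. `coalesce(1)` makes the output directory contain a single part file instead of one per shuffle partition. The progress bar above shows 200 tasks, which matches the default `spark.sql.shuffle.partitions`.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Sketch: run a Spark SQL query and save the result as CSV with a header row.
// In spark-shell, pass the predefined `spark` session as the first argument.
def exportQueryToCsv(spark: SparkSession, query: String, outDir: String): Unit = {
  spark.sql(query)                // SparkSession.sql: no deprecated SQLContext needed
    .coalesce(1)                  // merge into one partition -> a single part-*.csv file
    .write
    .mode(SaveMode.Overwrite)     // replace outDir if a previous run left it behind
    .option("header", "true")     // first line of the file holds the column names
    .csv(outDir)                  // built-in CSV writer (Spark 2.0+); outDir is still a directory
}
```

After defining the helper in spark-shell, `exportQueryToCsv(spark, android, "/bigdata/export/android_single")` would export the query from the transcript. The output path here is hypothetical. Because every row passes through a single task, this approach suits moderately sized results.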

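Note:

/bigdata/export/android.csv is a path on HDFS, not on the local operating-system filesystem, as the screenshots below show. Despite the .csv suffix, save() creates a directory with that name. The directory holds the part-*.csv output files and a _SUCCESS marker rather than a single CSV file.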

[Screenshots: Spark SQL query results exported to HDFS]
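Because the export lands on HDFS, getting a CSV onto the machine you are working on takes one more step. Below is a minimal sketch using the Hadoop FileSystem API, which is already on spark-shell's classpath. It lists the part-* files in the output directory and copies each one to a directory on the driver's local disk. The function name `fetchPartFiles` and the local paths are hypothetical, and `localDir` is assumed to be an absolute path. The command-line equivalent is `hdfs dfs -get`. With `header` set to `true`, every part file starts with its own header line. Concatenating several part files (for example with `hdfs dfs -getmerge`) therefore repeats the header, and writing with `coalesce(1)` first avoids that.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

// Sketch: list the part files that save()/csv() wrote under an output directory
// (on HDFS when running on YARN) and copy them to an absolute local directory
// on the driver machine. Returns the copied file names.
def fetchPartFiles(spark: SparkSession, hdfsDir: String, localDir: String): Seq[String] = {
  val dir = new Path(hdfsDir)
  // Resolve the filesystem from the path itself (default FS if it has no scheme).
  val fs = dir.getFileSystem(spark.sparkContext.hadoopConfiguration)
  val parts = fs.listStatus(dir)
    .map(_.getPath)
    .filter(_.getName.startsWith("part-"))   // skip _SUCCESS and hidden .crc files
    .sortBy(_.getName)
  parts.foreach { p =>
    // The "file://" scheme forces the destination onto the local filesystem.
    fs.copyToLocalFile(p, new Path("file://" + localDir + "/" + p.getName))
  }
  parts.map(_.getName).toSeq
}
```

For the export in this post, a call would look like `fetchPartFiles(spark, "/bigdata/export/android.csv", "/tmp/android_export")`. Here /tmp/android_export is a hypothetical local directory.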



Please credit the source when reposting.
Author: 海底苍鹰
URL: http://blog.51yip.com/hadoop/2149.html