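[Posted]: 2018-12-06 13:17:03
[Question]:
I'm trying to compute averages over a one-month window, but I can't get the expected result. The code and dataset I'm using are below. Can you help me find what I'm doing wrong?
Code: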
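import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.StringType
import spark.implicits._ // $"..." syntax; assumes the SparkSession is named spark
val df = monthlyFilesDF
  .groupBy($"COL1", $"COL2", window($"EventTime", "1 month").alias("month"))
  .agg(avg("COL4").alias("avg_COL4"), avg("COL5").alias("avg_COL5"), avg("COL6").alias("avg_COL6"))
  // cast the window struct and the averages to strings (wrapping in lit() is not needed)
  .withColumn("month", $"month".cast(StringType))
  .withColumn("avg_COL4", $"avg_COL4".cast(StringType))
  .withColumn("avg_COL5", $"avg_COL5".cast(StringType))
  .withColumn("avg_COL6", $"avg_COL6".cast(StringType))
// show() returns Unit, so call it on df rather than assigning its result to df
df.show(10, false)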
Sample dataset:
+------------+------------+---------------+-----------------+----+----+-------------+
|COL1        |COL2        |COL3           |EventTime        |COL4|COL5|COL6         |
+------------+------------+---------------+-----------------+----+----+-------------+
|ServiceCent4|AP-1-IOO-PPP|241.206.155.172|06-12-18:17:42:34|162 |53  |1544098354885|
|ServiceCent1|AP-1-SPG-QQQ|178.182.57.167 |06-12-18:17:42:34|110 |30  |1544098354885|
|ServiceCent4|AP-1-SPG-DDD|180.201.249.252|06-12-18:17:42:34|245 |19  |1544098354885|
|ServiceCent3|AP-1-SPG-SSS|210.193.251.211|06-12-18:17:42:34|10  |88  |1544098354885|
|ServiceCent4|AP-2-SPG-GGG|45.25.186.173  |06-12-18:17:42:34|219 |12  |1544098354886|
|ServiceCent3|AP-4-SPG-UI |234.60.84.236  |06-12-18:17:42:34|216 |39  |1544098354886|
|ServiceCent4|AP-3-SPG-HUH|101.244.98.173 |06-12-18:17:42:34|112 |26  |1544098354886|
|ServiceCent4|AP-4-SPG-GVF|203.169.206.12 |06-12-18:17:42:34|115 |40  |1544098354886|
|ServiceCent4|AP-0-SPG-JOD|156.158.45.6   |06-12-18:17:42:34|156 |76  |1544098354886|
|ServiceCent4|AP-1-SPG-13 |96.189.94.4    |06-12-18:17:42:34|119 |57  |1544098354886|
+------------+------------+---------------+-----------------+----+----+-------------+
Output (no rows are returned):
+------------+------------+-----+--------+--------+--------+
|COL1        |COL2        |month|avg_COL4|avg_COL5|avg_COL6|
+------------+------------+-----+--------+--------+--------+
+------------+------------+-----+--------+--------+--------+
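One plausible cause of the empty result (an assumption, not confirmed in the thread): EventTime holds strings such as 06-12-18:17:42:34, which follow a day-month-year:hour:minute:second layout rather than the year-first yyyy-MM-dd HH:mm:ss layout that Spark's default string-to-timestamp cast expects. window() needs a timestamp, so those strings would be cast implicitly and most likely become null, and rows with a null time fall into no window. The plain-Scala sketch below uses only java.time (no Spark) to make the layout explicit; the object and method names are illustrative, the pattern dd-MM-yy:HH:mm:ss is inferred from the sample rows, and java.time's ISO parser stands in for a year-first parser.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

object EventTimeFormat {
  // Layout inferred from the sample EventTime values: day-month-year:hour:minute:second
  val eventTimeFormat: DateTimeFormatter = DateTimeFormatter.ofPattern("dd-MM-yy:HH:mm:ss")

  // Parse with the explicit layout; "yy" maps two-digit years into 2000-2099
  def parseEventTime(s: String): LocalDateTime = LocalDateTime.parse(s, eventTimeFormat)

  // True only if the string is an ISO-8601 local date-time such as 2018-12-06T17:42:34
  def parsesAsIso(s: String): Boolean = Try(LocalDateTime.parse(s)).isSuccess

  def main(args: Array[String]): Unit = {
    val sample = "06-12-18:17:42:34"
    println(parseEventTime(sample)) // 2018-12-06T17:42:34
    println(parsesAsIso(sample))    // false
  }
}
```

In Spark 2.2 and later, the same pattern string can be passed to to_timestamp(col, format) so that EventTime becomes a real timestamp before any grouping.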
[Discussion]:
-
What is your expected result? If you want the average for each month, you can group by month; you don't need a window.
-
Do you get this error?
java.lang.IllegalArgumentException: Intervals greater than a month is not supported (1 month).
Which version of Spark are you using?
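The error quoted in this comment is raised by window() itself: as far as I know, it converts the window duration into a fixed number of microseconds, and a calendar month has no fixed length. The short plain-Scala sketch below (java.time only; the names MonthLengths and monthLengths are illustrative) lists the month lengths of one year to make that concrete. It also shows why a fixed-length window such as "30 days" would not line up with calendar months.

```scala
import java.time.YearMonth

object MonthLengths {
  // Number of days in each calendar month (January to December) of the given year
  def monthLengths(year: Int): Seq[Int] =
    (1 to 12).map(m => YearMonth.of(year, m).lengthOfMonth())

  def main(args: Array[String]): Unit = {
    val lengths = monthLengths(2018)
    println(lengths.mkString(", "))  // 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
    println(lengths.distinct.sorted) // Vector(28, 30, 31)
  }
}
```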
-
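Spark 2.2. No, I don't get this error; I just get no data. When I group by month as @MichaelWest suggested, I do get data, but not the month information. I need the month information along with the results.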
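To keep the month in the result, one option is to group by a calendar-month key derived from EventTime instead of using window(). This is a sketch, not the approach from the thread: it assumes Spark 2.2 or later (for to_timestamp(Column, String)), assumes the dd-MM-yy:HH:mm:ss layout inferred from the sample rows, and uses hypothetical names (MonthlyAverages, monthlyAverages, event_ts). The month key is a "yyyy-MM" string, so the same month in different years stays separate.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object MonthlyAverages {
  // EventTime layout inferred from the sample data (day-month-year:hour:minute:second)
  val EventTimeFormat = "dd-MM-yy:HH:mm:ss"

  // Average COL4..COL6 per (COL1, COL2, calendar month), keeping the month as a column
  def monthlyAverages(df: DataFrame): DataFrame =
    df
      // Parse the string explicitly instead of relying on an implicit cast
      .withColumn("event_ts", to_timestamp(col("EventTime"), EventTimeFormat))
      // Calendar-month key such as "2018-12"; includes the year
      .withColumn("month", date_format(col("event_ts"), "yyyy-MM"))
      .groupBy(col("COL1"), col("COL2"), col("month"))
      .agg(
        avg("COL4").alias("avg_COL4"),
        avg("COL5").alias("avg_COL5"),
        avg("COL6").alias("avg_COL6"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("monthly-averages").getOrCreate()
    import spark.implicits._

    // First three rows of the sample dataset (COL3 omitted because it is not aggregated)
    val sample = Seq(
      ("ServiceCent4", "AP-1-IOO-PPP", "06-12-18:17:42:34", 162, 53, 1544098354885L),
      ("ServiceCent1", "AP-1-SPG-QQQ", "06-12-18:17:42:34", 110, 30, 1544098354885L),
      ("ServiceCent4", "AP-1-SPG-DDD", "06-12-18:17:42:34", 245, 19, 1544098354885L)
    ).toDF("COL1", "COL2", "EventTime", "COL4", "COL5", "COL6")

    monthlyAverages(sample).show(false)
    spark.stop()
  }
}
```

The averages come back as doubles; they can still be cast to strings afterwards if string output is needed, as in the original code.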
Tags: scala apache-spark apache-spark-sql