【Posted】: 2020-10-28 11:47:56
【Question】:
I have 3 dataframes in Spark: dataframe1, dataframe2 and dataframe3.
I want to join dataframe1 with one of the other two dataframes, depending on a condition.
I am using the following code:
Dataset<Row> df = dataframe1.filter(
    when(col("diffDate").lt(3888),
        dataframe1.join(dataframe2,
            dataframe2.col("id_device").equalTo(dataframe1.col("id_device"))
                .and(dataframe2.col("id_vehicule").equalTo(dataframe1.col("id_vehicule")))
                .and(dataframe2.col("tracking_time").lt(dataframe1.col("tracking_time"))))
            .orderBy(dataframe2.col("tracking_time").desc()))
    .otherwise(
        dataframe1.join(dataframe3,
            dataframe3.col("id_device").equalTo(dataframe1.col("id_device"))
                .and(dataframe3.col("id_vehicule").equalTo(dataframe1.col("id_vehicule")))
                .and(dataframe3.col("tracking_time").lt(dataframe1.col("tracking_time"))))
            .orderBy(dataframe3.col("tracking_time").desc())));
But I get this exception:
Exception in thread "main" java.lang.RuntimeException: Unsupported literal type class org.apache.spark.sql.Dataset
Edit
Input dataframes:
dataframe1
+----------+-----------+-------------+---------------+
| diffDate | id_device | id_vehicule | tracking_time |
+----------+-----------+-------------+---------------+
| 222      | 1         | 5           | 2020-05-30    |
| 4700     | 8         | 9           | 2019-03-01    |
+----------+-----------+-------------+---------------+
dataframe2
+-----------+-------------+---------------+-----------+
| id_device | id_vehicule | tracking_time | longitude |
+-----------+-------------+---------------+-----------+
| 1         | 5           | 2020-05-12    | 33.21111  |
| 8         | 9           | 2019-03-01    | 20.2222   |
+-----------+-------------+---------------+-----------+
dataframe3
+-----------+-------------+---------------+----------+
| id_device | id_vehicule | tracking_time | latitude |
+-----------+-------------+---------------+----------+
| 1         | 5           | 2020-05-12    | 40.333   |
| 8         | 9           | 2019-02-28    | 2.00000  |
+-----------+-------------+---------------+----------+
Expected output when diffDate < 3888:
+----------+-----------+-------------+---------------+-----------+-------------+---------------+-----------+
| diffDate | id_device | id_vehicule | tracking_time | id_device | id_vehicule | tracking_time | longitude |
+----------+-----------+-------------+---------------+-----------+-------------+---------------+-----------+
| 222      | 1         | 5           | 2020-05-30    | 1         | 5           | 2020-05-12    | 33.21111  |
+----------+-----------+-------------+---------------+-----------+-------------+---------------+-----------+
Expected output when diffDate > 3888:
+----------+-----------+-------------+---------------+-----------+-------------+---------------+----------+
| diffDate | id_device | id_vehicule | tracking_time | id_device | id_vehicule | tracking_time | latitude |
+----------+-----------+-------------+---------------+-----------+-------------+---------------+----------+
| 4700     | 8         | 9           | 2019-03-01    | 8         | 9           | 2019-02-28    | 2.00000  |
+----------+-----------+-------------+---------------+-----------+-------------+---------------+----------+
I need your help.
Thanks.
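A note on the exception, with a possible workaround. `when(condition, value)` builds a `Column` expression, and Spark tries to turn `value` into a column literal; a `Dataset` is not a supported literal type, which is consistent with the "Unsupported literal type" message above. One way around this is to do the routing at the Dataset level: filter dataframe1 into two subsets and join each subset with its own lookup frame. The sketch below assumes a local `SparkSession`, integer ids, and `tracking_time` stored as ISO date strings; the class name `ConditionalJoinSketch` and the helpers `joinEarlier`, `lookupSchema` and `demo` are hypothetical names introduced only for illustration. It uses `geq(3888)` for the second branch to mirror `otherwise`, which is an assumption about how a diffDate of exactly 3888 should be handled.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;

public class ConditionalJoinSketch {

    // Same join condition and ordering as in the question, for one lookup frame.
    static Dataset<Row> joinEarlier(Dataset<Row> left, Dataset<Row> lookup) {
        return left.join(lookup,
                lookup.col("id_device").equalTo(left.col("id_device"))
                    .and(lookup.col("id_vehicule").equalTo(left.col("id_vehicule")))
                    .and(lookup.col("tracking_time").lt(left.col("tracking_time"))))
            .orderBy(lookup.col("tracking_time").desc());
    }

    // Schema shared by dataframe2 and dataframe3; only the last column name differs.
    static StructType lookupSchema(String valueColumn) {
        return new StructType()
            .add("id_device", DataTypes.IntegerType)
            .add("id_vehicule", DataTypes.IntegerType)
            .add("tracking_time", DataTypes.StringType) // assumption: ISO date strings
            .add(valueColumn, DataTypes.DoubleType);
    }

    // Builds the sample frames from the question and returns [below, atOrAbove].
    static List<Dataset<Row>> demo(SparkSession spark) {
        StructType schema1 = new StructType()
            .add("diffDate", DataTypes.IntegerType)
            .add("id_device", DataTypes.IntegerType)
            .add("id_vehicule", DataTypes.IntegerType)
            .add("tracking_time", DataTypes.StringType);

        Dataset<Row> dataframe1 = spark.createDataFrame(Arrays.asList(
            RowFactory.create(222, 1, 5, "2020-05-30"),
            RowFactory.create(4700, 8, 9, "2019-03-01")), schema1);
        Dataset<Row> dataframe2 = spark.createDataFrame(Arrays.asList(
            RowFactory.create(1, 5, "2020-05-12", 33.21111),
            RowFactory.create(8, 9, "2019-03-01", 20.2222)), lookupSchema("longitude"));
        Dataset<Row> dataframe3 = spark.createDataFrame(Arrays.asList(
            RowFactory.create(1, 5, "2020-05-12", 40.333),
            RowFactory.create(8, 9, "2019-02-28", 2.0)), lookupSchema("latitude"));

        // Route rows first, then join each subset with its own lookup frame.
        Dataset<Row> below = joinEarlier(
            dataframe1.filter(col("diffDate").lt(3888)), dataframe2);
        Dataset<Row> atOrAbove = joinEarlier(
            dataframe1.filter(col("diffDate").geq(3888)), dataframe3);
        return Arrays.asList(below, atOrAbove);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .appName("conditional-join-sketch")
            .getOrCreate();
        for (Dataset<Row> result : demo(spark)) {
            result.show();
        }
        spark.stop();
    }
}
```

The two results are kept as separate Datasets because their last columns differ (longitude versus latitude), so they cannot be combined with a plain `union` without first aligning the schemas.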
【Comments】:
-
Could you post sample input and expected output?
Tags: java sql dataframe apache-spark