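【Posted】: 2020-06-22 18:25:18
【Question】:
I want to write Spark code in Scala that filters out the rows to be populated.
I already have a Spark SQL query, but I want to convert it to Spark Scala code.
In the query, I perform an inner join of the DataFrame with itself and apply several filter conditions, for example that the difference between two date fields must be between 1 and 9 days.
The Spark query is self-explanatory, so I will not explain it further.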
spark.sql("""select * from df1 where Container not in (select a.Container from df1 a inner join df1 b
on a.ContainerEquipmentNumber = b.ContainerEquipmentNumber
where a.EquipmentType <> b.EquipmentType
and a.transport_mode = 'Ocean'
and b.transport_mode = 'Ocean'
and DATEDIFF(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(a.ETD,'yyyy-MM-dd'),'yyyy-MM-dd')),TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(b.ETD,'yyyy-MM-dd'),'yyyy-MM-dd')))
between 1 and 9) order by ContainerEquipmentNumber, ETD desc""")
My Spark code:
val DF11 = DF0
val DF22 = DF0
DF11.join(DF22, DF11("ContainerEquipmentNumber") =!= DF22("ContainerEquipmentNumber")
&& DF11("EquipmentType")===DF22("EquipmentType")==="Ocean"
&& DATEDIFF(DF11("ETD"), DF22("ETD")),
"inner")
But the code above does not work at all.
Could someone help me write Spark Scala code that does the same thing as the Spark SQL query?
Thanks in advance.
|ConsigneeName|Consignee |pre_location_city |pre_location_country|pre_location_region|pre_location_locode|origin_location_city|origin_location_country|origin_location_sitename|origin_location_region|origin_location_locode|destination_location_city|destination_location_country|destination_location_sitename|destination_location_region|destination_location_locode|post_location_city|post_location_country |post_location_region|post_location_locode|main_transport_mode|pre_transport_mode|post_transport_mode|ContainerEquipmentNumber|EquipmentType|PONumber |MODSSONumber|Carrier|CarrierName|ETA |ETD |Source|Servicetype|ContainerVolume|freight_weight|Shipment_Number|weight_unit|CBLNumber |Shipper|HBLNumber|TEU |Tradelane|Booking_number|Year|Month|Day|
+-------------+----------+------------------+--------------------+-------------------+-------------------+--------------------+-----------------------+------------------------+----------------------+----------------------+-------------------------+----------------------------+-----------------------------+---------------------------+---------------------------+------------------+------------------------+--------------------+--------------------+-------------------+------------------+-------------------+------------------------+-------------+-----------+------------+-------+-----------+----------+----------+------+-----------+---------------+--------------+---------------+-----------+-----------------+-------+---------+----+---------+--------------+----+-----+---+
|ITC |GBSYNGGBI |SENEFFE |Belgium |EUROPE & AME |BESEF |ANTWERP |Belgium |null |EUROPE & AME |null |CARTAGENA |Colombia |null |LATIN AMERICA |COCTG |CARTAGENA |Columbia |null |COCTG |Ocean |Truck |Truck |TCLU5174641 |20DRY |G0085381229|ZRH0047428 |DHLU |null |2019-05-14|2019-04-30|GBI |CFSCFS |3.96 |2115.352 |ZRH0046385 |kg |DHLU/ANRA12657 |null |null |null|null |null |2020|6 |19 |
|ITC |GBSYNGGBI |SCHOENEBECK (ELBE)|Germany |null |null |HAMBURG |Germany |null |EUROPE & AME |null |CARTAGENA |Columbia |null |LATIN AMERICA |COCTG |CARTAGENA |Columbia |null |COCTG |Ocean |Truck |Truck |FCIU2693429 |20DRY |G0085405241|ZRH0058227 |HLCU |null |2019-12-03|2019-11-17|GBI |CYCY |13.92 |10095.04 |ZRH0054021 |kg |HLCU/RTM191082779|null |null |null|null |null |2020|6 |19 |
|ITC |GBSYNGGBI |OINOFYTA |Greece |EUROPE & AME |GROFY |PIRAEUS |Greece |null |EUROPE & AME |null |ALTAMIRA |Mexico (East/Gulf Coast) |null |null |null |MATAMOROS |Mexico (East/Gulf Coast)|null |MXMAM |Ocean |Truck |Truck |UACU4054126 |20DRY |G0085388341|ZRH0049718 |HLCU |null |2019-07-01|2019-05-22|GBI |CYCY |27.36 |11209.6 |ZRH0046408 |kg |HLCU/RTM190441160|null |null |null|null |null |2020|6 |19 |
|ITC |CHSYCLEGAL|JINAN |China |ASIA PACIFIC |CNJNN |QINGDAO |China |null |ASIA PACIFIC |null |MELBOURNE |Australia |null |ASIA PACIFIC |AUMEL |TOTTENHAM |Australia |ASIA PACIFIC |AUTOT |Ocean |Truck |Truck |CMAU3159388 |20DRY |G6500024081|TST1073545 |ANNU |null |2019-02-23|2019-02-06|DEX |CYCY |20 |20826 |TST0579524 |kg |ANNU/WDSM006090 |null |null |null|null |null |2020|6 |19 |
|ITC |CHSYCLEGAL|Jinan |China |null |null |QINGDAO |China |null |ASIA PACIFIC |null |MELBOURNE |Australia |null |ASIA PACIFIC |AUMEL |TOTTENHAM |Australia |ASIA PACIFIC |AUTOT |Ocean |Truck |Truck |UETU2722010 |20DRY |G6500029924|TST1135194 |HLCU |null |2019-12-03|2019-11-17|DEX |CYCY |25 |20826 |TST0606019 |kg |HLCU/TA1191101846|null |null |null|null |null |2020|6 |19 |
+-------------+----------+------------------+--------------------+-------------------+-------------------+--------------------+-----------------------+------------------------+----------------------+----------------------+-------------------------+----------------------------+-----------------------------+---------------------------+---------------------------+------------------+------------------------+--------------------+--------------------+-------------------+------------------+-------------------+------------------------+-------------+-----------+------------+-------+-----------+----------+----------+------+-----------+---------------+--------------+---------------+-----------+-----------------+-------+---------+----+---------+--------------+----+-----+---+
only showing top 5 rows
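One possible DataFrame API translation of the SQL query is sketched below. It is not a definitive answer and rests on several assumptions. The column names `Container` and `transport_mode` are taken from the query as written, although the sample output shows `ContainerEquipmentNumber` and `main_transport_mode` instead. `ETD` is assumed to be a string in `yyyy-MM-dd` form. The two-argument `to_date` assumes Spark 2.2 or later. The names `ContainerFilter` and `filterContainers` are hypothetical. The `NOT IN` subquery is replaced by a `left_anti` join, which gives the same result only when `Container` contains no nulls.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, datediff, to_date}

// Hypothetical helper: names are illustrative, not from the original post.
object ContainerFilter {

  def filterContainers(df: DataFrame): DataFrame = {
    // Alias both sides of the self-join so the columns can be told apart.
    val a = df.as("a")
    val b = df.as("b")

    // Containers that have a matching row with a different EquipmentType,
    // both sides shipped by Ocean, and ETD(a) - ETD(b) between 1 and 9 days.
    val flagged = a
      .join(b, col("a.ContainerEquipmentNumber") === col("b.ContainerEquipmentNumber"), "inner")
      .where(
        col("a.EquipmentType") =!= col("b.EquipmentType") &&
          col("a.transport_mode") === "Ocean" &&
          col("b.transport_mode") === "Ocean" &&
          datediff(
            to_date(col("a.ETD"), "yyyy-MM-dd"),
            to_date(col("b.ETD"), "yyyy-MM-dd")
          ).between(1, 9)
      )
      .select(col("a.Container"))
      .distinct()

    // left_anti keeps only the rows of df whose Container is NOT in `flagged`.
    // Assumption: Container has no nulls, otherwise this differs from NOT IN.
    df.join(flagged, Seq("Container"), "left_anti")
      .orderBy(col("ContainerEquipmentNumber"), col("ETD").desc)
  }
}
```

It would be called as `ContainerFilter.filterContainers(DF0)`. Compared with the attempt in the question: `DF11` and `DF22` both point at the same DataFrame, so the two sides need aliases. The expression `DF11("EquipmentType")===DF22("EquipmentType")==="Ocean"` compares a boolean result with a string. The Scala function is `datediff` (lowercase), and it returns a number of days that still has to be turned into a condition, here with `between(1, 9)`.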
【Discussion】:
Tags: scala apache-spark apache-spark-sql