[Posted]: 2021-12-26 14:11:06
[Question]:
I'm new to PySpark and need help looking up values in one DataFrame based on another.
I have df1 with student data as follows:
+---------+----------+--------------------+
|studentid| course | registration_date |
+---------+----------+--------------------+
| 348| 2| 15-11-2021 |
| 567| 1| 05-11-2021 |
| 595| 3| 15-10-2021 |
| 580| 2| 06-11-2021 |
| 448| 4| 15-09-2021 |
+---------+----------+--------------------+
df2 holds the registration periods as follows:
+--------+------------+------------+
| period | start_date | end_date |
+--------+------------+------------+
| 1| 01-09-2021 | 15-09-2021 |
| 2| 16-09-2021 | 30-09-2021 |
| 3| 01-10-2021 | 15-10-2021 |
| 4| 16-10-2021 | 31-10-2021 |
| 5| 01-11-2021 | 15-11-2021 |
| 6| 16-11-2021 | 30-11-2021 |
+--------+------------+------------+
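Both tables store dates as dd-MM-yyyy strings. Comparing such strings directly is lexicographic, not chronological, so a range condition on the raw strings can match the wrong period. The small plain-Python sketch below (not PySpark; the period list is copied from df2 above, and the helper names are hypothetical) shows the effect for 05-11-2021:

```python
from datetime import datetime

# Registration periods from df2, kept as the original dd-MM-yyyy strings.
PERIODS = [
    (1, "01-09-2021", "15-09-2021"),
    (2, "16-09-2021", "30-09-2021"),
    (3, "01-10-2021", "15-10-2021"),
    (4, "16-10-2021", "31-10-2021"),
    (5, "01-11-2021", "15-11-2021"),
    (6, "16-11-2021", "30-11-2021"),
]


def parse(s):
    """Parse a dd-MM-yyyy string into a datetime.date."""
    return datetime.strptime(s, "%d-%m-%Y").date()


def periods_by_string(reg):
    """Naive lookup: compares the raw strings character by character."""
    return [p for p, start, end in PERIODS if start <= reg <= end]


def periods_by_date(reg):
    """Lookup after parsing every string into a real date."""
    d = parse(reg)
    return [p for p, start, end in PERIODS if parse(start) <= d <= parse(end)]


print(periods_by_string("05-11-2021"))  # → [1, 3, 5]  (wrong: three matches)
print(periods_by_date("05-11-2021"))    # → [5]
```

The same applies in Spark: converting the columns to a date type before comparing them avoids this.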
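I need to iterate over df1 row by row, take each student's registration_date, and use that date to find the row in df2 where df2.start_date <= registration_date <= df2.end_date. The result should be a new df like this: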
+---------+----------+--------------------+--------+------------+------------+
|studentid| course | registration_date | period | start_date | end_date |
+---------+----------+--------------------+--------+------------+------------+
| 348| 2| 15-11-2021 | 5| 01-11-2021 | 15-11-2021 |
| 567| 1| 05-11-2021 | 5| 01-11-2021 | 15-11-2021 |
| 595| 3| 15-10-2021 | 3| 01-10-2021 | 15-10-2021 |
| 580| 2| 06-11-2021 | 5| 01-11-2021 | 15-11-2021 |
| 448| 4| 15-09-2021 | 1| 01-09-2021 | 15-09-2021 |
+---------+----------+--------------------+--------+------------+------------+
[Discussion]:
Tags: python dataframe apache-spark pyspark