[Posted at]: 2021-01-11 13:38:41
[Problem description]:
My input Spark dataframe is:
Date Client Values Ratios
2020-10-26 1 NULL 0.21
2020-10-27 1 NULL 0.1
2020-10-28 1 NULL 0.25
2020-10-29 1 6 0.3
2020-10-30 1 NULL 0.4
2020-10-31 1 NULL 0.5
2020-11-01 1 1 0.3
2020-11-02 1 NULL 0.13
2020-11-03 1 NULL 0.67
2020-11-04 1 NULL 0.54
2020-11-05 1 NULL 0.2
2020-11-06 1 NULL 0.21
2020-11-07 1 2 0.7
2020-11-08 1 9 0.75
2020-11-09 1 NULL 0.9
2020-10-26 2 NULL 0.71
2020-10-27 2 NULL 0.19
2020-10-28 2 NULL 0.3
2020-10-29 2 10 0.34
2020-10-30 2 6 0.35
2020-10-31 2 NULL 0.93
2020-11-01 2 NULL 0.45
2020-11-02 2 NULL 0.43
2020-11-03 2 NULL 0.09
2020-11-04 2 NULL 0.39
2020-11-05 2 3 0.41
I want to create a "Ratios_latest" column. For each client, I need to step down the number of rows given in the "Values" column, and the "Ratios" value found in that row becomes the "Ratios_latest" value. Based on the data above, the desired output is:
Date Client Values Ratios Ratios_latest
2020-10-26 1 NULL 0.21 NULL
2020-10-27 1 NULL 0.1 NULL
2020-10-28 1 NULL 0.25 NULL
2020-10-29 1 6 0.3 0.54 -> the "Ratios" value 6 rows later is 0.54
2020-10-30 1 NULL 0.4 NULL
2020-10-31 1 NULL 0.5 NULL
2020-11-01 1 1 0.3 0.13 -> the "Ratios" value 1 row later is 0.13
2020-11-02 1 NULL 0.13 NULL
2020-11-03 1 NULL 0.67 NULL
2020-11-04 1 NULL 0.54 NULL
2020-11-05 1 NULL 0.2 NULL
2020-11-06 1 NULL 0.21 NULL
2020-11-07 1 2 0.7 0.9 -> the "Ratios" value 2 rows later is 0.9
2020-11-08 1 9 0.75 NULL -> NULL because the lookup must stay within each client, and client 1 has fewer than 9 rows remaining
2020-11-09 1 NULL 0.9 NULL
2020-10-26 2 NULL 0.71 NULL
2020-10-27 2 NULL 0.19 NULL
2020-10-28 2 NULL 0.3 NULL
2020-10-29 2 10 0.34 0.98 -> the "Ratios" value 10 rows later is 0.98
2020-10-30 2 6 0.35 0.41 -> the "Ratios" value 6 rows later is 0.41
2020-10-31 2 NULL 0.93 NULL
2020-11-01 2 NULL 0.45 NULL
2020-11-02 2 NULL 0.43 NULL
2020-11-03 2 NULL 0.09 NULL
2020-11-04 2 NULL 0.39 NULL
2020-11-05 2 3 0.41 NULL
Could you help me solve this?
Tags: apache-spark pyspark apache-spark-sql