【Posted】: 2020-06-16 10:39:41
【Problem description】:
I have two tables, shown in the images below:
Table 1 -
Table 2 -
I would like my output to look like the table below, with the following calculation applied in a loop until every week in table 2 has been run through every day in table 1:
table3(week1_1) = table2(week1) * table1(day1_ratio)
table3(week1_2) = table2(week1) * table1(day2_ratio)
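For example, with the sample numbers used in the answer below (week1 = 4, day1_ratio = 0.6, day2_ratio = 0.4), the first two output cells would be:
table3(week1_1) = 4 * 0.6 = 2.4
table3(week1_2) = 4 * 0.4 = 1.6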
How can this be done?
All help is appreciated!
Thanks
【Answer】:
Try this -
It is written in Scala, but it can be ported to PySpark with minimal changes.
import org.apache.spark.sql.functions._
import spark.implicits._ // for toDF and the $"col" syntax (spark is the SparkSession)

// table 1: one row per (outlet, item, day) with that day's ratio
val table1 = Seq(
  ("o1", "i1", 1, 0.6),
  ("o1", "i1", 2, 0.4)
).toDF("outlet", "item", "day", "ratio")
table1.show(false)
/**
 * +------+----+---+-----+
 * |outlet|item|day|ratio|
 * +------+----+---+-----+
 * |o1    |i1  |1  |0.6  |
 * |o1    |i1  |2  |0.4  |
 * +------+----+---+-----+
 */

// table 2: one row per (outlet, item) with one column per week
val table2 = Seq(
  ("o1", "i1", 4, 5, 6, 8)
).toDF("outlet", "item", "week1", "week2", "week3", "week4")
table2.show(false)
/**
 * +------+----+-----+-----+-----+-----+
 * |outlet|item|week1|week2|week3|week4|
 * +------+----+-----+-----+-----+-----+
 * |o1    |i1  |4    |5    |6    |8    |
 * +------+----+-----+-----+-----+-----+
 */

// join on outlet/item, then pivot on day so that each (day, week)
// combination becomes its own column holding week * ratio
table1.join(table2, Seq("outlet", "item"))
  .groupBy("outlet", "item")
  .pivot("day")
  .agg(
    first($"week1" * $"ratio").as("week1"),
    first($"week2" * $"ratio").as("week2"),
    first($"week3" * $"ratio").as("week3"),
    first($"week4" * $"ratio").as("week4")
  ).show(false)
/**
 * +------+----+-------+-------+------------------+-------+-------+-------+------------------+-------+
 * |outlet|item|1_week1|1_week2|1_week3           |1_week4|2_week1|2_week2|2_week3           |2_week4|
 * +------+----+-------+-------+------------------+-------+-------+-------+------------------+-------+
 * |o1    |i1  |2.4    |3.0    |3.5999999999999996|4.8    |1.6    |2.0    |2.4000000000000004|3.2    |
 * +------+----+-------+-------+------------------+-------+-------+-------+------------------+-------+
 */
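The 3.5999999999999996 and 2.4000000000000004 values are ordinary double-precision artifacts of the multiplication; if exact two-decimal output matters, each product can be wrapped in the built-in round function, e.g. first(round($"week1" * $"ratio", 2)).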
The same thing in PySpark, assuming table1 and table2 exist as PySpark DataFrames with the schemas above:

from pyspark.sql import functions as F

(table1.join(table2, ["outlet", "item"])
 .groupBy("outlet", "item")
 .pivot("day")
 .agg(
     F.first(F.col("week1") * F.col("ratio")).alias("week1"),
     F.first(F.col("week2") * F.col("ratio")).alias("week2"),
     F.first(F.col("week3") * F.col("ratio")).alias("week3"),
     F.first(F.col("week4") * F.col("ratio")).alias("week4"),
 ).show(truncate=False))
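If the real table 2 has more week columns than the four shown, the per-week aggregations can be generated instead of hard-coded. A minimal sketch, assuming every weekly column shares the "week" name prefix (an assumption about the real schema):

# sketch: collect every column whose name starts with "week"
week_cols = [c for c in table2.columns if c.startswith("week")]

# build one first(week * ratio) aggregation per week column
aggs = [F.first(F.col(w) * F.col("ratio")).alias(w) for w in week_cols]

(table1.join(table2, ["outlet", "item"])
 .groupBy("outlet", "item")
 .pivot("day")
 .agg(*aggs)
 .show(truncate=False))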
【Discussion】:
(PySpark version as shown above.) I have not executed it on PySpark yet, but it should work.