【Posted】:2019-12-27 12:43:27
【Question】:
I have a DataFrame like this:
from pyspark.sql import SparkSession
from pyspark import Row

# Local session that uses all available cores
spark = SparkSession.builder \
    .appName('DataFrame') \
    .master('local[*]') \
    .getOrCreate()

# Column "c" is an array of strings
df = spark.createDataFrame([Row(a=1, b='', c=['0', '1'], d='foo'),
                            Row(a=2, b='', c=['0', '1'], d='bar'),
                            Row(a=3, b='', c=['0', '1'], d='foo')])
df.show()
+---+---+------+---+
|  a|  b|     c|  d|
+---+---+------+---+
|  1|   |[0, 1]|foo|
|  2|   |[0, 1]|bar|
|  3|   |[0, 1]|foo|
+---+---+------+---+
I want to create column "e" from the first element of column "c" and column "f" from the second element of column "c", like this:
+---+---+------+---+---+---+
|a  |b  |c     |d  |e  |f  |
+---+---+------+---+---+---+
|1  |   |[0, 1]|foo|0  |1  |
|2  |   |[0, 1]|bar|0  |1  |
|3  |   |[0, 1]|foo|0  |1  |
+---+---+------+---+---+---+
【Discussion】: