【Posted】: 2021-12-17 04:35:16
【Question】:
I have a Spark DataFrame with the following schema:
stat_chiamate: struct
 |-- chiamate_ricevute: struct (nullable = true)
 |    |-- h_0: string (nullable = true)
 |    |-- h_1: string (nullable = true)
 |    |-- h_10: string (nullable = true)
 |    |-- h_11: string (nullable = true)
 |    |-- h_12: string (nullable = true)
 |    |-- h_13: string (nullable = true)
 |    |-- h_14: string (nullable = true)
 |    |-- h_15: string (nullable = true)
 |    |-- h_16: string (nullable = true)
 |    |-- h_17: string (nullable = true)
 |    |-- h_18: string (nullable = true)
 |    |-- h_19: string (nullable = true)
 |    |-- h_2: string (nullable = true)
 |    |-- h_20: string (nullable = true)
 |    |-- h_21: string (nullable = true)
 |    |-- h_22: string (nullable = true)
 |    |-- h_23: string (nullable = true)
 |    |-- h_3: string (nullable = true)
 |    |-- h_4: string (nullable = true)
 |    |-- h_5: string (nullable = true)
 |    |-- h_6: string (nullable = true)
 |    |-- h_7: string (nullable = true)
 |    |-- h_8: string (nullable = true)
 |    |-- h_9: string (nullable = true)
 |    |-- n_totale: string (nullable = true)
I want a DataFrame like this:
stat_chiamate: struct (nullable = true)
 |-- chiamate_ricevute: array
 |    |-- element: string
where chiamate_ricevute is the list of the field values. For example, given:
h_0 = 0
h_1 = 1
h_2 = 2
...
h_23 = 23
n_totale = 412
I want:
[0, 1, 2, ..., 23]  <-- without the n_totale value
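To pin down the expected result, here is a plain-Python sketch that models one chiamate_ricevute value as a dict (an assumption for illustration; the helper name hour_values is hypothetical). One detail matters: the schema lists the fields in lexicographic order (h_0, h_1, h_10, ..., h_9, n_totale), so keeping the schema order, as fieldNames()[:-1] does, would yield 0, 1, 10, 11, ... rather than 0, 1, 2, ..., 23. The sketch therefore sorts on the numeric hour and keeps the values as strings, matching the string fields in the schema.

```python
import re

# Hourly field names such as "h_0" ... "h_23"; "n_totale" does not match.
HOUR_FIELD = re.compile(r"h_(\d+)")


def hour_values(row):
    """Return the h_* values of one chiamate_ricevute struct, ordered by hour."""
    hours = []
    for name, value in row.items():
        match = HOUR_FIELD.fullmatch(name)
        if match:
            hours.append((int(match.group(1)), value))
    # Sort on the numeric hour, not the name ("h_10" < "h_2" as strings).
    hours.sort()
    return [value for _, value in hours]


# Sample struct with the fields in the same lexicographic order as the schema,
# where each h_N holds the string "N" and n_totale holds "412".
names = sorted([f"h_{i}" for i in range(24)] + ["n_totale"])
row = {name: (name[2:] if name.startswith("h_") else "412") for name in names}

print(names[:4])              # ['h_0', 'h_1', 'h_10', 'h_11']
print(hour_values(row)[:4])   # ['0', '1', '2', '3']
print(len(hour_values(row)))  # 24, since n_totale is excluded
```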
In my code I use df.select("stat_chiamate.chiamate_ricevute.*").schema.fieldNames()[:-1], but that only gives me a list of field names. How can I use them to build the array?
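from pyspark.sql import functions as F
prefix = "stat_chiamate.chiamate_ricevute"
hours = sorted((f for f in df.select(prefix + ".*").schema.fieldNames() if f.startswith("h_")), key=lambda f: int(f[2:]))
df = df.select(F.array(*[F.col(prefix + "." + f) for f in hours]).alias("CIRCO"))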
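A possible PySpark sketch, assuming Spark 3.1 or later (for Column.withField) and the schema shown in the question; the helper names hour_columns and hours_to_array are hypothetical. It addresses two points. First, the field names have to be turned into full nested paths (stat_chiamate.chiamate_ricevute.h_0, ...), because a bare h_0 is resolved as a top-level column. Second, the hour fields are sorted numerically rather than kept in the lexicographic schema order. withField then replaces chiamate_ricevute inside stat_chiamate, leaving any other fields of that struct untouched.

```python
import re

# Column and field names taken from the schema in the question.
STRUCT_COL = "stat_chiamate"
FIELD = "chiamate_ricevute"


def hour_columns(field_names, prefix):
    """Full column paths of the h_* fields in numeric hour order (n_totale excluded)."""
    hours = [name for name in field_names if re.fullmatch(r"h_\d+", name)]
    # "h_10" sorts before "h_2" as a string, so sort on the integer suffix.
    hours.sort(key=lambda name: int(name[2:]))
    return [f"{prefix}.{name}" for name in hours]


def hours_to_array(df):
    """Replace stat_chiamate.chiamate_ricevute with an array<string> of h_0..h_23."""
    # Imported here so hour_columns() can be used (and tested) without Spark.
    from pyspark.sql import functions as F

    path = f"{STRUCT_COL}.{FIELD}"
    field_names = df.select(f"{path}.*").schema.fieldNames()
    # Full nested paths: a bare "h_0" would be looked up as a top-level column.
    hours = F.array(*[F.col(p) for p in hour_columns(field_names, path)])
    # Column.withField (Spark >= 3.1) swaps the field inside the struct and
    # keeps any other fields of stat_chiamate.
    return df.withColumn(STRUCT_COL, F.col(STRUCT_COL).withField(FIELD, hours))
```

Usage: df = hours_to_array(df), after which df.printSchema() should list chiamate_ricevute as an array of strings. On Spark versions before 3.1, F.struct(hours.alias("chiamate_ricevute")).alias("stat_chiamate") can rebuild the struct instead, but it drops any other fields of stat_chiamate.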
【Comments】:
Tags: dataframe apache-spark pyspark struct