[Posted]: 2019-04-06 09:50:37
[Question]:
I have the following input data:
Customer_ID,General,General
Channel,Nominal,Character
WeekDateSunday,Discrete,Numeric
RevenueWeekN01,Continuous,Numeric
RevenueWeekN02,Continuous,Numeric
RevenueWeekN03,Continuous,Numeric
RevenueWeekN04,Continuous,Numeric
RevenueWeekN05,Continuous,Numeric
RevenueWeekN06,Continuous,Numeric
RevenueWeekN07,Continuous,Numeric
RevenueWeekN08,Continuous,Numeric
I need output like the following, with just one extra column (a StructField derived from the third column):
Customer_ID,General,General, StructFieldType
Channel,Nominal,Character, StructField(Channel,StringType(), True)
WeekDateSunday,Discrete,Numeric, StructField(WeekDateSunday,DoubleType(), True)
RevenueWeekN01,Continuous,Numeric, StructField(RevenueWeekN01,DoubleType(), True)
RevenueWeekN02,Continuous,Numeric, StructField(RevenueWeekN02,DoubleType(), True)
RevenueWeekN03,Continuous,Numeric, StructField(RevenueWeekN03,DoubleType(), True)
RevenueWeekN04,Continuous,Numeric, StructField(RevenueWeekN04,DoubleType(), True)
RevenueWeekN05,Continuous,Numeric, StructField(RevenueWeekN05,DoubleType(), True)
RevenueWeekN06,Continuous,Numeric, StructField(RevenueWeekN06,DoubleType(), True)
RevenueWeekN07,Continuous,Numeric, StructField(RevenueWeekN07,DoubleType(), True)
RevenueWeekN08,Continuous,Numeric, StructField(RevenueWeekN08,DoubleType(), True)
Below is the code I used. Is it correct?
data_type.withColumn('structformat',when(col("Description") == 'Numeric', StructField(col("Field_Name"),DoubleType(), True)).otherwise(StructField(col("Field_Name"),StringType(), True))).show()
Running it raises this error:
AssertionError: field name should be string
[Comments]:
-
I think what you wrote is correct. Please describe your problem in more detail and be specific.
-
When I run the code above, it throws the error below. I probably need to include something else in the withColumn method. AssertionError: field name should be string @Sundeep
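-
A likely cause of the assertion: StructField is a plain Python schema object that requires its name to be a Python string, whereas col("Field_Name") is a Column expression that is only evaluated per row inside Spark. So a StructField cannot be built inside withColumn this way. The sketch below shows two possible workarounds, assuming the DataFrame has columns named Field_Name and Description as in the question's code. The helper names struct_field_text, add_struct_format and build_schema are made up for illustration and are not part of PySpark.

```python
def struct_field_text(field_name, description):
    """Plain-Python version of the rule: Numeric maps to DoubleType, anything else to StringType."""
    spark_type = "DoubleType()" if description == "Numeric" else "StringType()"
    return "StructField({},{}, True)".format(field_name, spark_type)


def add_struct_format(df):
    """Option 1: add the StructField *text* as a string column, computed by Spark itself.

    Assumes df has string columns Field_Name and Description.
    """
    # Imported here so the pure-Python helper above works without Spark installed.
    from pyspark.sql.functions import col, concat, lit, when

    type_text = (
        when(col("Description") == "Numeric", lit("DoubleType()"))
        .otherwise(lit("StringType()"))
    )
    return df.withColumn(
        "structformat",
        concat(lit("StructField("), col("Field_Name"), lit(","), type_text, lit(", True)")),
    )


def build_schema(df):
    """Option 2: build real StructField objects on the driver from the collected rows.

    Only reasonable when the metadata table is small, since collect() pulls every row.
    """
    from pyspark.sql.types import DoubleType, StringType, StructField, StructType

    fields = [
        StructField(
            row["Field_Name"],  # a Python str here, so the assertion passes
            DoubleType() if row["Description"] == "Numeric" else StringType(),
            True,
        )
        for row in df.collect()
    ]
    return StructType(fields)


# Quick check of the mapping rule on two rows from the question.
print(struct_field_text("Channel", "Character"))
print(struct_field_text("RevenueWeekN01", "Numeric"))
```

With the question's DataFrame, add_struct_format(data_type).show() would display the extra text column, while build_schema(data_type) would return a StructType that can be passed wherever a schema is expected. Which option fits depends on whether the goal is a displayable column or a usable schema object.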
Tags: apache-spark dataframe pyspark schema