【Question Title】: Add a column with StructField types to a DataFrame in PySpark
【Posted】: 2019-04-06 09:50:37
【Question】:

I have the following input data:

Customer_ID,General,General
Channel,Nominal,Character
WeekDateSunday,Discrete,Numeric
RevenueWeekN01,Continuous,Numeric
RevenueWeekN02,Continuous,Numeric
RevenueWeekN03,Continuous,Numeric
RevenueWeekN04,Continuous,Numeric
RevenueWeekN05,Continuous,Numeric
RevenueWeekN06,Continuous,Numeric
RevenueWeekN07,Continuous,Numeric
RevenueWeekN08,Continuous,Numeric

I need the output below, which adds just one column: a StructField whose type is chosen from the third column.

Customer_ID,General,General, StructFieldType
Channel,Nominal,Character, StructField(Channel,StringType(), True)
WeekDateSunday,Discrete,Numeric, StructField(WeekDateSunday,DoubleType(), True)
RevenueWeekN01,Continuous,Numeric, StructField(RevenueWeekN01,DoubleType(), True)
RevenueWeekN02,Continuous,Numeric, StructField(RevenueWeekN02,DoubleType(), True)
RevenueWeekN03,Continuous,Numeric, StructField(RevenueWeekN03,DoubleType(), True)
RevenueWeekN04,Continuous,Numeric, StructField(RevenueWeekN04,DoubleType(), True)
RevenueWeekN05,Continuous,Numeric, StructField(RevenueWeekN05,DoubleType(), True)
RevenueWeekN06,Continuous,Numeric, StructField(RevenueWeekN06,DoubleType(), True)
RevenueWeekN07,Continuous,Numeric, StructField(RevenueWeekN07,DoubleType(), True)
RevenueWeekN08,Continuous,Numeric, StructField(RevenueWeekN08,DoubleType(), True)

This is the code I used. Is it correct?

data_type.withColumn('structformat', when(col("Description") == 'Numeric', StructField(col("Field_Name"), DoubleType(), True)).otherwise(StructField(col("Field_Name"), StringType(), True))).show()

Running it raises this error:

AssertionError: field name should be string

【Comments】:

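  • I think what you wrote is correct. Please describe your problem in more detail and be specific.
  • When I run the code above it throws the error below. Do I need to include something else in the withColumn call? AssertionError: field name should be string @Sundeep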

Tags: apache-spark dataframe pyspark schema


【Answer 1】:

The error may be caused by the single quotes; change them to double quotes and the error should go away.

data_type.withColumn("structformat", when(col("Description") == "Numeric", StructField(col("Field_Name"), DoubleType(), True)).otherwise(StructField(col("Field_Name"), StringType(), True))).show()

If you still have problems, leave a comment; if this helped, please accept the answer.

Edit:

Customer_ID,General,General, StructFieldType
Channel,Nominal,Character, StructField("Channel",StringType(), True)
WeekDateSunday,Discrete,Numeric, StructField("WeekDateSunday",DoubleType(), True)
RevenueWeekN01,Continuous,Numeric, StructField("RevenueWeekN01",DoubleType(), True)
RevenueWeekN02,Continuous,Numeric, StructField("RevenueWeekN02",DoubleType(), True)
RevenueWeekN03,Continuous,Numeric, StructField("RevenueWeekN03",DoubleType(), True)
RevenueWeekN04,Continuous,Numeric, StructField("RevenueWeekN04",DoubleType(), True)
RevenueWeekN05,Continuous,Numeric, StructField("RevenueWeekN05",DoubleType(), True)
RevenueWeekN06,Continuous,Numeric, StructField("RevenueWeekN06",DoubleType(), True)
RevenueWeekN07,Continuous,Numeric, StructField("RevenueWeekN07",DoubleType(), True)
RevenueWeekN08,Continuous,Numeric, StructField("RevenueWeekN08",DoubleType(), True)

Try this.

【Comments】:

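  • I still get the same error. Is it even possible to add a StructField to a column?
  • Try the new code I added; I think it expects a string as the argument.
  • @Sundeep, this has nothing to do with single quotes.
  • @prazy As far as I can tell, the whole code works fine. What could the problem be?
  • What do you mean the code above works? Can you show me your DataFrame?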