Posted: 2022-01-22 07:41:03
Question:
Below is my code. What is the best way to rewrite the withColumn calls as reusable code? The conditions are all very similar, and I may want to apply the same withColumn pattern to other DataFrames as well.
val sample = df1.alias("a")
  .join(df2.alias("b"),
    Seq("ReportingSetID", "ProductLine", "PlanID", "ReportType", "NumeratorID", "MemberID", "ServDate"))
  .withColumn("IsNumeratorSupplemental",
    when(col("a.IsNumerator") === 1 && col("b.IsNumerator") === 0, 1).otherwise(0))
  .withColumn("IsExclusionSupplemental",
    when(col("a.IsExclusion") === 1 && col("b.IsExclusion") === 0, 1).otherwise(0))
  .withColumn("IsSubExclusionSupplemental",
    when(col("a.IsSubExclusion") === 1 && col("b.IsSubExclusion") === 0, 1).otherwise(0))
  .withColumn("IsRequiredExclusionSupplemental",
    when(col("a.IsRequiredExclusion") === 1 && col("b.IsRequiredExclusion") === 0, 1).otherwise(0))
  .filter(!col("b.MeasureID").isin("44", "70"))
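One way to factor out the repetition is a small helper that builds the condition from a column name, then a foldLeft over the list of flag columns. This is a sketch, not a definitive answer: it assumes the same "a"/"b" aliases as above, and the names `supplementalFlag`, `flagNames`, and `joined` (the DataFrame produced by the join before any withColumn) are hypothetical.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, when}

// Hypothetical helper: 1 when the left-side flag is set but the
// right-side flag is not, otherwise 0.
def supplementalFlag(name: String): Column =
  when(col(s"a.$name") === 1 && col(s"b.$name") === 0, 1).otherwise(0)

// The four flag columns that share the same condition shape.
val flagNames = Seq("IsNumerator", "IsExclusion", "IsSubExclusion", "IsRequiredExclusion")

// `joined` stands for df1.alias("a").join(df2.alias("b"), ...) from above.
// foldLeft adds one derived column per flag name, e.g. "IsNumeratorSupplemental".
def addSupplementalFlags(joined: DataFrame): DataFrame =
  flagNames.foldLeft(joined) { (df, name) =>
    df.withColumn(s"${name}Supplemental", supplementalFlag(name))
  }
```

Because `supplementalFlag` only depends on the aliases and a column name, the same helper can be reused on any other pair of DataFrames joined with the same "a"/"b" aliases; only `flagNames` needs to change.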
Discussion:
Tags: dataframe scala function apache-spark reusability