【Posted】: 2017-02-10 20:53:08
【Question】:
I have a dataframe containing data like this:
unit,sensitivity currency,trading desk ,portfolio ,issuer ,bucket ,underlying ,delta ,converted sensitivity
ES ,USD ,EQ DERIVATIVES,ESEQRED_LH_MIDX ,5GOY ,5 ,repo ,0.00002 ,0.00002
ES ,USD ,EQ DERIVATIVES,IND_GLOBAL1 ,no_localizado ,8 ,repo ,-0.16962 ,-0.15198
ES ,EUR ,EQ DERIVATIVES,ESEQ_UKFLOWN ,IGN2 ,8 ,repo ,-0.00253 ,-0.00253
ES ,USD ,EQ DERIVATIVES,BASKETS1 ,9YFV ,5 ,spot ,-1003.64501 ,-899.24586
I have to run an aggregation on this data, like so:
val filteredDF = myDF.filter("unit = 'ES' AND `trading desk` = 'EQ DERIVATIVES' AND issuer = '5GOY' AND bucket = 5 AND underlying = 'repo' AND portfolio ='ESEQRED_LH_MIDX'")
.groupBy("unit","trading desk","portfolio","issuer","bucket","underlying")
.agg(sum("converted_sensitivity"))
But I am losing precision on the aggregated sum. How can I make sure that every value of "converted_sensitivity" is converted to BigDecimal(25,5) before the sum operation is applied to the new aggregated column?
Thank you very much.
【Comments】:
- You could first do a map operation that computes a BigDecimal version of the column, and then sum those values in the next operation. I guess that would go between the .groupBy and the .agg.
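A minimal sketch of that idea, assuming the column is actually named converted_sensitivity (as in the posted code) and that Spark's DecimalType(25, 5) is an acceptable stand-in for BigDecimal(25,5); the names converted_sensitivity_dec and sum_converted_sensitivity are made up for illustration:

import org.apache.spark.sql.functions.{col, sum}
import org.apache.spark.sql.types.DecimalType

// Cast the sensitivity column to a fixed-precision decimal before aggregating,
// so the sum runs in decimal arithmetic instead of double precision.
val withDecimal = filteredDF.withColumn(
  "converted_sensitivity_dec",
  col("converted_sensitivity").cast(DecimalType(25, 5))
)

val aggregated = withDecimal
  .groupBy("unit", "trading desk", "portfolio", "issuer", "bucket", "underlying")
  .agg(sum("converted_sensitivity_dec").as("sum_converted_sensitivity"))

Summing a DecimalType column keeps the result in decimal arithmetic, which avoids the floating-point rounding seen when summing doubles.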
Tags: scala apache-spark spark-dataframe