【发布时间】:2015-09-15 22:57:36
【问题描述】:
我是 Spark 和 Scala 的新手。
我有一个 org.apache.spark.rdd.RDD[Array[String]] 类型的 RDD。
这是来自myRdd.take(3) 的列表。
Array(Array(1, 2524474, CBSGPRS, 1, 2015-09-09 10:42:03, 0, 47880, 302001131103734, NAT, "", 502161081073570, "", BLANK, UNK, "", "", "", MV_PVC, BLANK, 1, "", 0, 475078439, 41131;0;0, "", 102651;0;0, 3|3), Array(2, 2524516, CBSGPRS, 1, 2015-09-09 23:42:14, 0, 1260, 302001131104272, NAT, "", 502161081074085, "", BLANK, UNK, "", "", "", MV_PVC, BLANK, 1, "", 0, 2044745984, 3652;0;0, "", 8636;0;0, 3|3), Array(3, 2524545, CBSGPRS, 1, 2015-09-09 14:56:55, 0, 32886, 302001131101629, NAT, "", 502161081071599, "", BLANK, UNK, "", "", "", MV_PVC, BLANK, 1, "", 0, 1956194307, 14164657;0;0, "", 18231194;0;0, 3|3))
我正在尝试将其映射如下..
var gprsMap = frows.collect().map{ tuple =>
// bind variables to the tuple
var (recKey, origRecKey, recTypeId, durSpanId, timestamp, prevConvDur, convDur,
msisdn, callType, aPtyCellId, aPtyImsi, aPtyMsrn, bPtyNbr, bPtyNbrTypeId,
bPtyCellId, bPtyImsi, bPtyMsrn, inTrgId, outTrgId, callStatusId, suppSvcId, provChgAmt,
genFld1, genFld2, genFld3, genFld4, genFld5) = tuple
var dtm = timestamp.split(" ");
var idx = timestamp indexOf ' '
var dt = timestamp slice(0, idx)
var tm = timestamp slice(idx + 1, timestamp.length)
// return the results tuple
((dtm(0), msisdn, callType, recTypeId, provChgAmt), (convDur))
}
我不断收到错误:
错误:对象 Tuple27 不是包 scala 的成员。
我不确定错误是什么。有人可以帮忙吗?
【问题讨论】:
标签: scala mapreduce apache-spark