【Question Title】: How do I group Cassandra rows in Scala
【Posted】: 2016-04-04 10:05:12
【Question Description】:

Given that I have some SparkSQL RDD results:

CassandraRow{location_id: 163169767097254, context: drinking beer}
CassandraRow{location_id: 376101312892, context: drinking beer}
CassandraRow{location_id: 218866401458875, context: drinking beer}
CassandraRow{location_id: 163169767097254, context: drinking beer}
CassandraRow{location_id: 103760882995742, context: drinking beer}
CassandraRow{location_id: 214680441881239, context: drinking beer}
CassandraRow{location_id: 376101312892, context: ice creams}
CassandraRow{location_id: 193809797319052, context: drinking beer}
CassandraRow{location_id: 106017852771295, context: drinking beer}
CassandraRow{location_id: 166686436690629, context: drinking beer}
CassandraRow{location_id: 203328349712668, context: drinking beer}
CassandraRow{location_id: 103760882995742, context: vacations}
CassandraRow{location_id: 203328349712668, context: drinking beer}
CassandraRow{location_id: 214680441881239, context: drinking beer}
CassandraRow{location_id: 214680441881239, context: drinking beer}
CassandraRow{location_id: 376101312892, context: drinking beer}
CassandraRow{location_id: 166686436690629, context: vacations}
CassandraRow{location_id: 218866401458875, context: ice creams}

I want to group them by location_id to get a result like this:

List(
  218866401458875 -> List(ice creams, drinking beer),
  166686436690629 -> List(vacations, drinking beer),
  376101312892 -> List(ice creams, drinking beer)
  // and so on
)

Here is my code so far:

val mappedContext = context.map(c => (c.getLong("location_id"), c.getString("context")))
val grouped = mappedContext.groupBy(_._1).mapValues(_.map(_._2))
grouped.foreach(k => println(k))

But instead I got:

(203328349712668,[Ljava.lang.String;@682d0831)
(106017852771295,[Ljava.lang.String;@3ed36a4c)
(193809797319052,[Ljava.lang.String;@1649ca17)
(214680441881239,[Ljava.lang.String;@9c9d648)
(103760882995742,[Ljava.lang.String;@9253bc0)
(376101312892,[Ljava.lang.String;@2c01a1a2)
(166686436690629,[Ljava.lang.String;@74c0cf47)
(163169767097254,[Ljava.lang.String;@33fc3c01)
(218866401458875,[Ljava.lang.String;@13a7ea85)
(500767133335647,[Ljava.lang.String;@328a55e2)

【Discussion】:

Tags: scala apache-spark cassandra apache-spark-sql


【Solution 1】:

You should try groupByKey (not groupBy):

    context.map(c => c.getLong("location_id") -> c.getString("context")).groupByKey.foreach(println)
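The shape of the transformation can be sketched with plain Scala collections (no Spark or Cassandra needed); the sample rows below are hypothetical stand-ins for the CassandraRow results in the question. It also shows why printing the grouped values directly can fall back to a Java array's `toString` (the `[Ljava.lang.String;@...` output above) and how converting to a `List` or using `mkString` gives readable output:

```scala
// Hypothetical stand-ins for the (location_id, context) pairs in the question.
val rows = Seq(
  (218866401458875L, "ice creams"),
  (218866401458875L, "drinking beer"),
  (166686436690629L, "vacations"),
  (166686436690629L, "drinking beer"),
  (376101312892L, "ice creams"),
  (376101312892L, "drinking beer")
)

// groupBy keeps the whole (key, value) pair, so project out the value
// afterwards and materialize it as a List for readable printing.
val grouped: Map[Long, List[String]] =
  rows.groupBy(_._1).map { case (id, pairs) => id -> pairs.map(_._2).toList }

grouped.foreach { case (id, contexts) =>
  // mkString renders the elements instead of an array/iterable reference.
  println(s"$id -> ${contexts.mkString("List(", ", ", ")")}")
}
```

On a Spark pair RDD, `groupByKey` plays the role of `groupBy(_._1).mapValues(_.map(_._2))` here, yielding `(location_id, Iterable[String])` pairs; the same `mkString`/`toList` trick applies when printing them.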
    

【Discussion】:
