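[Posted]: 2019-04-28 00:50:51
[Question]:
DStream twitter example -- flatMap twitter_id and text
I'm new to Scala and Spark Streaming. I'm trying to extend the sample twitter streaming code so that it splits each tweet into words while keeping those words associated with the twitter id.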
setupLogging()
val tweets = TwitterUtils.createStream(ssc, None)
val statuses = tweets.map(status => status.getText())
val tweetwords = statuses.flatMap(tweetText => tweetText.split(" "))
tweetwords.print()
// This gives a running list of words from the tweets:
This
is
my
tweet
"#mytweet"
// Instead, I want the same list with a twitter_id attached
val statuses = tweets.map { status => (status.getUser().getId(), status.getText()) }
val tweetwords = statuses.flatMap( ????? This is where I am lost )
// This is what I want
tweetwords.print()
1523523, This
1523523, is
1523523, my
1523523, tweet
1523523, #mytweet
I'm open to other approaches, including DataFrames/Datasets. Thanks!
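One possible way to fill in the missing flatMap step is sketched below using plain Scala collections, so it runs without Spark or Twitter credentials. The object name `TweetWords`, the helper `splitWithId`, and the sample data are hypothetical and only illustrate the idea: for each (id, text) pair, split the text and pair every word with the same id. Assuming `statuses` is a `DStream[(Long, String)]`, the same function should be usable as `statuses.flatMap(TweetWords.splitWithId)`.

```scala
object TweetWords {
  // Hypothetical helper: turn one (id, text) pair into one (id, word) pair per word.
  def splitWithId(pair: (Long, String)): Seq[(Long, String)] = {
    val (id, text) = pair
    text.split(" ").toSeq.map(word => (id, word))
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for the pairs produced by
    // tweets.map(status => (status.getUser().getId(), status.getText()))
    val statuses: Seq[(Long, String)] = Seq((1523523L, "This is my tweet #mytweet"))

    // flatMap flattens the per-tweet word lists into a single list of pairs.
    val tweetwords = statuses.flatMap(splitWithId)

    // Prints "1523523, This", "1523523, is", ... one pair per line.
    tweetwords.foreach { case (id, word) => println(s"$id, $word") }
  }
}
```

On a pair DStream, `statuses.flatMapValues(_.split(" "))` may be a shorter alternative, since it keeps the key and flattens only the value; treat that as a suggestion to verify against the Spark version in use.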
[Comments]:
Tags: apache-spark-sql spark-streaming