【Question Title】: Apache Flume not Proceeding with Twitter Stream
【Posted】: 2016-02-29 16:10:31
【Question】:

I am trying to fetch tweets with Apache Flume on HortonWorks (using Tutorials Point as a reference).

Flume is configured properly. Here is my flume.conf:

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <1bL3mTHJBheYNye8cE4vHKBZ8>
TwitterAgent.sources.Twitter.consumerSecret = <iO3f2GxrrRjtF88eA4AH6AHncz4VbmxxT22fHWzuxCLaejoxdD>
TwitterAgent.sources.Twitter.accessToken = <22976784986-nfj6qEkECeNfs3AeDLDCqtlMOCl9B1iHb8cgIF>
TwitterAgent.sources.Twitter.accessTokenSecret = <jnNPtmBxlGA8hQq5ZyxjCJLdyiKN97Xa1JTifpmp5BREf>

TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientiest, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/usr/lib/flume/tweets
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 1000

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

After that, I run the Flume agent:

bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf Dflume.root.logger=DEBUG,console -n TwitterAgent

At this point the screen freezes. How should I proceed? Or, since this is the sandbox, should I simply wait a long time?

【Comments】:

    Tags: apache hdfs flume flume-ng flume-twitter


    【Solution 1】:

    Try this flume.conf:

    #flume.conf for twitter
    
    TwitterAgent.sources = Twitter
    TwitterAgent.channels = MemChannel
    TwitterAgent.sinks = HDFS
    
    TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
    TwitterAgent.sources.Twitter.channels = MemChannel
    TwitterAgent.sources.Twitter.consumerKey = <required>
    TwitterAgent.sources.Twitter.consumerSecret = <required>
    TwitterAgent.sources.Twitter.accessToken = <required> 
    TwitterAgent.sources.Twitter.accessTokenSecret = <required> 
    TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientiest, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing
    
    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/usr/lib/flume/tweets
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
    
    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 10000
    TwitterAgent.channels.MemChannel.transactionCapacity = 100
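
One thing that may be worth checking before starting the agent is whether each `<required>` placeholder was replaced by the bare credential, with no angle brackets left around it. The sketch below is only an illustration: `check_flume_credentials` is a hypothetical helper, not part of Flume, and it assumes the four OAuth property names used in the config above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Flume): flag OAuth credential lines
# in a Flume properties file whose values still contain '<' or '>'.
check_flume_credentials() {
    conf_file="$1"
    # Select only the four credential properties, then look for angle brackets.
    if grep -E '\.(consumerKey|consumerSecret|accessToken|accessTokenSecret)[[:space:]]*=' "$conf_file" \
        | grep -q '[<>]'; then
        echo "angle brackets or placeholders found in credentials"
        return 1
    fi
    echo "credentials look filled in"
    return 0
}
```

It could be called as `check_flume_credentials conf/twitter.conf`. A non-zero exit status only means the file deserves a second look. It does not prove that the credentials themselves are valid.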
    

    Command to run the Flume agent:

    bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent

    If in doubt, you can check the official page: https://github.com/cloudera/cdh-twitter-example/blob/master/flume-sources/flume.conf

    If you want to learn about fetching data with Flume and Spark: https://www.dezyre.com//hadoop-tutorial/flume-hadoop-twitter-data-extraction

    【Comments】:
