【Question title】: Exception follows. org.apache.flume.FlumeException: Unable to load source type in flume twitter analysis
【Posted】: 2015-06-24 05:56:38
【Question】:

I am trying to do Twitter analysis using Flume and Hive. To fetch tweets from Twitter, I have set all the required parameters (consumerKey, consumerSecret, accessToken, and accessTokenSecret) in the flume.conf file.

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <consumerKey>
TwitterAgent.sources.Twitter.consumerSecret = <consumerSecret>
TwitterAgent.sources.Twitter.accessToken = <accessToken>
TwitterAgent.sources.Twitter.accessTokenSecret = <accessTokenSecret>
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientiest, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

I have set the classpath for the Flume tarball and the flume-sources snapshot JAR file in my .bashrc:

export FLUME_HOME=/home/students/apache-flume-1.4.0-bin 
export FLUME_SRC=/home/students/flume-sources-1.0-SNAPSHOT.jar 
export PATH=$FLUME_HOME/bin:$FLUME_SRC/bin:$PATH

When I run the Flume agent:

flume-ng agent --conf-file twitter_flume.conf --name TwitterAgent -Dflume.root.logger=INFO,console -n TwitterAgent

I see the following log trace, and nothing else happens:

15/06/23 23:41:55 INFO source.DefaultSourceFactory: Creating instance of source Twitter, type com.cloudera.flume.source.TwitterSource
15/06/23 23:41:55 ERROR node.PollingPropertiesFileConfigurationProvider: Failed to load configuration data. Exception follows.
org.apache.flume.FlumeException: Unable to load source type: com.cloudera.flume.source.TwitterSource, class: com.cloudera.flume.source.TwitterSource
    at org.apache.flume.source.DefaultSourceFactory.getClass(DefaultSourceFactory.java:67)
    at org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:40)
    at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:327)
    at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:102)
    at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassNotFoundException: com.cloudera.flume.source.TwitterSource
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:190)
    at org.apache.flume.source.DefaultSourceFactory.getClass(DefaultSourceFactory.java:65)
    ... 11 more

Why is this error thrown even though I have already set the flume-sources JAR? Please help me resolve this.

【Comments】:

Tags: linux hadoop hive flume flume-twitter


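【Answer 1】:

You did not set the classpath; you set PATH, which is used to locate executable binaries, not Java .jar files.

You can set the FLUME_CLASSPATH variable in the flume-env.sh file in the Flume conf directory, or add the --classpath <path/to/the/jar> option (short form -C) on the flume-ng command line.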
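As a rough sketch of the first option, using the paths from the question: flume-ng reads flume-env.sh from the directory passed with --conf, so the variable has to be defined in that directory and the agent has to be started with that option (the command in the question omits it). The helper name write_flume_env is hypothetical, and the conf directory location is an assumption.

```shell
# Hypothetical helper: append a FLUME_CLASSPATH entry to flume-env.sh.
# $1 = Flume conf directory, $2 = path to the jar that contains the source class
write_flume_env() {
  mkdir -p "$1" || return 1
  printf 'FLUME_CLASSPATH="%s"\n' "$2" >> "$1/flume-env.sh"
}

# Example with the paths from the question (uncomment to use):
# write_flume_env "$FLUME_HOME/conf" /home/students/flume-sources-1.0-SNAPSHOT.jar
# flume-ng agent --conf "$FLUME_HOME/conf" --conf-file twitter_flume.conf \
#   --name TwitterAgent -Dflume.root.logger=INFO,console
```

The command-line alternative would be to leave flume-env.sh alone and pass the same jar path with --classpath when starting the agent.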

【Comments】:

    【Answer 2】:

    Here is a Flume Twitter setup on Cloudera:

    1. This is the file /usr/lib/flume-ng/conf/flume.conf:

    TwitterAgent.sources = Twitter
    TwitterAgent.channels = MemChannel
    TwitterAgent.sinks = HDFS
    TwitterAgent.sources.Twitter.type= com.cloudera.flume.source.TwitterSource
    TwitterAgent.sources.Twitter.channels = MemChannel
    
    TwitterAgent.sources.Twitter.consumerKey = xxxxxxxxxxxxxxxxxxxxx
    TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxxxxxxxxxxxxxxxx
    TwitterAgent.sources.Twitter.accessToken = xxxxxxxxxxxxxxx
    TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxxxxxxxxxxxx
    
    TwitterAgent.sources.Twitter.keywords = Hadoop,BigData  
    TwitterAgent.sinks.HDFS.channel = MemChannel 
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://quickstart.cloudera:8020/user/cloudera/flume/tweets/
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0 
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 10000
    TwitterAgent.channels.MemChannel.transactionCapacity = 100
    

    2. Copy the flume-env.sh.template file to flume-env.sh:

    ~]$ sudo cp /usr/lib/flume-ng/conf/flume-env.sh.template /usr/lib/flume-ng/conf/flume-env.sh

    3. In flume-env.sh, set JAVA_HOME and FLUME_CLASSPATH as follows:

    export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera

    FLUME_CLASSPATH="/usr/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar"

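    4. If you cannot find "/usr/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar" on your system, download apache-flume-1.6.0-bin and replace the current lib folder with its lib folder.

    Make sure the flume-sources-1.0-SNAPSHOT.jar file is available in the lib folder.

    4.1. Rename the old lib folder.

    4.2. Download the archive, put it on the cloudera desktop, and run the following: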

    ~]$ sudo mv /usr/lib/flume-ng/lib /usr/lib/flume-ng/lib_cloudera

    ~]$ sudo mv /home/cloudera/Desktop/apache-flume-1.6.0-bin/lib /usr/lib/flume-ng/lib

    5. Now run the Flume agent command:

    ~]$ flume-ng agent --conf-file /usr/lib/flume-ng/conf/flume.conf --name TwitterAgent -Dflume.root.logger=INFO,console -n TwitterAgent

    This should run successfully. All the best.
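Before restarting the agent, it may help to confirm that the jar on the classpath really contains the class named in the ClassNotFoundException. The sketch below assumes the unzip tool is installed; the helper names class_to_entry and jar_has_class are hypothetical.

```shell
# Convert a fully qualified class name to its entry path inside a jar.
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Hypothetical helper: succeed if the jar ($1) lists the class ($2).
# Assumes unzip is available; a missing jar simply yields "not found".
jar_has_class() {
  unzip -l "$1" 2>/dev/null | grep -qF "$(class_to_entry "$2")"
}

# Example (uncomment to use):
# jar_has_class /usr/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar \
#   com.cloudera.flume.source.TwitterSource && echo "class found"
```

If the check fails for every jar in the lib folder, the source class is not available to Flume no matter how the classpath is set.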

    【Comments】:

      【Answer 3】:

      I think com.cloudera.flume.source.TwitterSource no longer works. Try org.apache.flume.source.twitter.TwitterSource instead.
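A minimal sketch of that swap, assuming a Flume release that actually ships org.apache.flume.source.twitter.TwitterSource (the 1.4.0 install from the question may not) and keeping in mind that the Apache source may not accept the same properties as the Cloudera one, for example keywords. The function name switch_twitter_source is hypothetical.

```shell
# Hypothetical helper: rewrite the source type in an agent config file.
# $1 = path to the config file (for example twitter_flume.conf)
switch_twitter_source() {
  tmp="$(mktemp)" || return 1
  # Write the edited copy to a temp file, then copy it back so the
  # original file keeps its permissions.
  sed 's/com\.cloudera\.flume\.source\.TwitterSource/org.apache.flume.source.twitter.TwitterSource/' \
    "$1" > "$tmp" && cat "$tmp" > "$1"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Example (uncomment to use):
# switch_twitter_source twitter_flume.conf
```

Keeping a backup copy of the config before running this makes it easy to switch back if the Apache source does not behave as expected.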

      【Comments】:

        【Answer 4】:

        Sorry, it does work, but make sure all the required jars are in your flume/lib directory. Follow these steps: http://bigdatanalysis.blogspot.com.es/2014/02/collecting-tweets-in-hadoop-using-flume.html

        【Comments】:
