【Question Title】: Unable to ingest data from Flume to HDFS (Hadoop) for logs
【Posted】: 2015-04-05 10:22:22
【Description】:

I am pushing data from a log file to HDFS using the following configuration.

agent.channels.memory-channel.type = memory
agent.channels.memory-channel.capacity=5000
agent.sources.tail-source.type = exec
agent.sources.tail-source.command = tail -F /home/training/Downloads/log.txt
agent.sources.tail-source.channels = memory-channel
agent.sinks.log-sink.channel = memory-channel
agent.sinks.log-sink.type = logger
agent.sinks.hdfs-sink.channel = memory-channel
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.batchSize=10
agent.sinks.hdfs-sink.hdfs.path = hdfs://localhost:8020/user/flume/data/log.txt
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.writeFormat = Text
agent.channels = memory-channel
agent.sources = tail-source
agent.sinks = log-sink hdfs-sink

I am not getting any error messages, but I still cannot find the output in HDFS. When I interrupt the agent, I can see a sink interruption exception along with some data from that log file. I am running the following command:

flume-ng agent --conf /etc/flume-ng/conf/ --conf-file /etc/flume-ng/conf/flume.conf -Dflume.root.logger=DEBUG,console -n agent;
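One common cause of this symptom (an assumption here, not confirmed by the question): `hdfs.path` points at a file name (`.../data/log.txt`) rather than a directory, and with the default roll settings the sink keeps events in an open `.tmp` file that is not visible as a finished file until a roll triggers. A sketch of an adjusted sink config:

```properties
# Hypothetical adjustment: point hdfs.path at a directory, not a file name,
# and roll on a short interval so closed files show up in HDFS quickly.
agent.sinks.hdfs-sink.hdfs.path = hdfs://localhost:8020/user/flume/data/
agent.sinks.hdfs-sink.hdfs.filePrefix = log
agent.sinks.hdfs-sink.hdfs.rollInterval = 30
agent.sinks.hdfs-sink.hdfs.rollCount = 0
agent.sinks.hdfs-sink.hdfs.rollSize = 0
```

With `rollCount` and `rollSize` set to 0, only the 30-second interval triggers a roll, which makes it easy to verify whether data is arriving at all.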

【Discussion】:

    Tags: apache hadoop hdfs flume


    【Solution 1】:

    I ran into a similar problem. In my case it is working now. Below is the conf file:

    #Exec Source
    execAgent.sources=e
    execAgent.channels=memchannel
    execAgent.sinks=HDFS
    #channels
    execAgent.channels.memchannel.type=file
    execAgent.channels.memchannel.capacity = 20000
    execAgent.channels.memchannel.transactionCapacity = 1000
    #Define Source
    execAgent.sources.e.type=org.apache.flume.source.ExecSource
    execAgent.sources.e.channels=memchannel
    execAgent.sources.e.shell=/bin/bash -c
    execAgent.sources.e.fileHeader=false
    execAgent.sources.e.fileSuffix=.txt
    execAgent.sources.e.command=cat /home/sample.txt
    #Define Sink
    execAgent.sinks.HDFS.type=hdfs
    execAgent.sinks.HDFS.hdfs.path=hdfs://localhost:8020/user/flume/
    execAgent.sinks.HDFS.hdfs.fileType=DataStream
    execAgent.sinks.HDFS.hdfs.writeFormat=Text
    execAgent.sinks.HDFS.hdfs.batchSize=1000
    execAgent.sinks.HDFS.hdfs.rollSize=268435
    execAgent.sinks.HDFS.hdfs.rollInterval=0
    #Bind Source Sink Channel
    execAgent.sources.e.channels=memchannel
    execAgent.sinks.HDFS.channel=memchannel
    

    【Discussion】:

      【Solution 2】:

      I suggest setting a file prefix when placing files into HDFS:

      agent.sinks.hdfs-sink.hdfs.filePrefix = log.out
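For context, `hdfs.filePrefix` only controls the leading part of the generated file names; the sink still appends a timestamp-based counter. A sketch combining it with a suffix (the `log.out` / `.txt` names are examples, not from the question):

```properties
# Example only: generated files appear under the sink's hdfs.path
# as log.out.<epoch-counter>.txt once each file is rolled and closed.
agent.sinks.hdfs-sink.hdfs.filePrefix = log.out
agent.sinks.hdfs-sink.hdfs.fileSuffix = .txt
```

A recognizable prefix makes it much easier to spot the sink's output when listing the target directory.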

      【Discussion】:

        【Solution 3】:

        @bhavesh - Are you sure the log file (agent.sources.tail-source.command = tail -F /home/training/Downloads/log.txt) keeps having data appended to it? Since you used the tail command with -F, only data appended to the file after the agent starts will be dumped into HDFS.

        【Discussion】:

        • You didn't understand my question in the first place... this story is too old now anyway.