【Question title】: Flume agent - using tail -F
【Posted】: 2013-05-28 10:30:47
【Question】:

I am new to Apache Flume. I created my agent with the following configuration:

agent.sources=exec-source
agent.sinks=hdfs-sink
agent.channels=ch1

agent.sources.exec-source.type=exec
agent.sources.exec-source.command=tail -F /var/log/apache2/access.log

agent.sinks.hdfs-sink.type=hdfs
agent.sinks.hdfs-sink.hdfs.path=hdfs://<Host-Name of name node>/
agent.sinks.hdfs-sink.hdfs.filePrefix=apacheaccess

agent.channels.ch1.type=memory
agent.channels.ch1.capacity=1000

agent.sources.exec-source.channels=ch1
agent.sinks.hdfs-sink.channel=ch1

The output I get is:

13/01/22 17:31:48 INFO lifecycle.LifecycleSupervisor: Starting lifecycle supervisor 1
13/01/22 17:31:48 INFO node.FlumeNode: Flume node starting - agent
13/01/22 17:31:48 INFO properties.PropertiesFileConfigurationProvider: Configuration provider starting
13/01/22 17:31:48 INFO nodemanager.DefaultLogicalNodeManager: Node manager starting
13/01/22 17:31:48 INFO lifecycle.LifecycleSupervisor: Starting lifecycle supervisor 9
13/01/22 17:31:48 INFO properties.PropertiesFileConfigurationProvider: Reloading configuration file:conf/flume_exec.conf
13/01/22 17:31:48 INFO conf.FlumeConfiguration: Added sinks: hdfs-sink Agent: agent
13/01/22 17:31:48 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/01/22 17:31:48 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/01/22 17:31:48 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/01/22 17:31:48 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/01/22 17:31:48 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration  for agents: [agent]
13/01/22 17:31:48 INFO properties.PropertiesFileConfigurationProvider: Creating channels
13/01/22 17:31:48 INFO properties.PropertiesFileConfigurationProvider: created channel ch1
13/01/22 17:31:48 INFO sink.DefaultSinkFactory: Creating instance of sink: hdfs-sink, type: hdfs
13/01/22 17:31:48 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
13/01/22 17:31:48 INFO nodemanager.DefaultLogicalNodeManager: Starting new configuration:{ sourceRunners:{exec-source=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:exec-source,state:IDLE} }} sinkRunners:{hdfs-sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@715d44 counterGroup:{ name:null counters:{} } }} channels:{ch1=org.apache.flume.channel.MemoryChannel{name: ch1}} }
13/01/22 17:31:48 INFO nodemanager.DefaultLogicalNodeManager: Starting Channel ch1
13/01/22 17:31:48 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: CHANNEL, name: ch1, registered successfully.
13/01/22 17:31:48 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: ch1 started
13/01/22 17:31:48 INFO nodemanager.DefaultLogicalNodeManager: Starting Sink hdfs-sink
13/01/22 17:31:48 INFO nodemanager.DefaultLogicalNodeManager: Starting Source exec-source
13/01/22 17:31:48 INFO source.ExecSource: Exec source starting with command:tail -F /var/log/apache2/access.log
13/01/22 17:31:48 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: SINK, name: hdfs-sink, registered successfully.
13/01/22 17:31:48 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: hdfs-sink started

But it does not write the logs to HDFS.

When I run cat /var/log/apache2/access.log instead of tail -F /var/log/apache2/access.log, it works and my file is created on HDFS.

Please help me.

【Question comments】:

Tags: flume


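【Answer 1】:

By default, "tail -F" prints only the last 10 lines of the file when it starts. It looks like 10 lines are not enough to fill an HDFS block, so you don't see anything written by Flume. You can:

  • Try "tail -n $X -F" to print the last X lines at startup (the right value of X depends on the block size in your HDFS settings).
  • Wait until access.log has grown enough while Flume is running (again, how long depends on the block size and on how fast access.log grows; I expect it to be fast enough in production).
  • Add the following lines to your flume.conf. They force Flume to roll a new file every 10 seconds regardless of how much data has been written (assuming it is not zero):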

    agent.sinks.hdfs-sink.hdfs.rollInterval = 10

    agent.sinks.hdfs-sink.hdfs.rollSize = 0
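
The 10-line default that the first suggestion relies on can be checked locally with a small shell sketch. It is an illustration only: the scratch file and the variable names `log`, `default_count`, and `custom_count` are hypothetical, and `-F` is left out on purpose so that each `tail` call exits instead of following the file.

```shell
# Scratch file standing in for access.log (hypothetical, created only for this check)
log="$(mktemp)"

# Write 25 numbered lines into it
i=1
while [ "$i" -le 25 ]; do
  echo "line $i"
  i=$((i + 1))
done > "$log"

# With no -n option, tail prints only the last 10 lines
default_count="$(tail "$log" | wc -l | tr -d ' ')"

# With -n 20, tail prints the last 20 lines
custom_count="$(tail -n 20 "$log" | wc -l | tr -d ' ')"

echo "default: $default_count lines, -n 20: $custom_count lines"

# Clean up the scratch file
rm -f "$log"
```

With `-F` added back, the same `-n` value only controls how many existing lines are emitted at startup; lines appended later are still followed as usual.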

【Comments】:
