【Question title】: Flume: load CSV and Excel files into an HDFS sink
【Posted】: 2017-03-25 00:38:16
【Question】:

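I have configured my Flume source with type spooldir. I have many CSV, .xl3 and .xls files, and I want my Flume agent to load all of them from the spooling directory into an HDFS sink. However, the Flume agent throws an exception.

Here is my configuration for the Flume source: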

agent.sources.s1.type = spooldir
agent.sources.s1.spoolDir = /my-directory
agent.sources.s1.basenameHeader = true
agent.sources.batchSize = 10000

And my HDFS sink:

agent.sinks.sk1.type = hdfs 
agent.sinks.sk1.hdfs.path = hdfs://...:8020/user/importflume/%Y/%m/%d/%H 
agent.sinks.sk1.hdfs.filePrefix = %{basename}
agent.sinks.sk1.hdfs.rollSize = 0
agent.sinks.sk1.hdfs.rollCount = 0
agent.sinks.sk1.hdfs.useLocalTimeStamp = true
agent.sinks.sk1.hdfs.batchsize =    10000
agent.sinks.sk1.hdfs.fileType = DataStream
agent.sinks.sk1.serializer = avro_event
agent.sinks.sk1.serializer.compressionCodec = snappy

【Question discussion】:

    Tags: excel csv hadoop hdfs flume


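    【Solution 1】:

    You can use the following configuration for a spooling directory source. Just supply the local file system path and the HDFS location in the configuration below.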

    #Flume Configuration Starts
    # Define a file channel called fileChannel on agent1
    agent1.channels.fileChannel1_1.type = file 
    # on linux FS
    agent1.channels.fileChannel1_1.capacity = 200000
    agent1.channels.fileChannel1_1.transactionCapacity = 1000
    # Define a source for agent1
    agent1.sources.source1_1.type = spooldir
    # Local (Linux) directory to watch; in my case /home/hadoop/Desktop/flume_sink.
    # Replace 'path' with the real directory, without the quotes.
    agent1.sources.source1_1.spoolDir = 'path'
    agent1.sources.source1_1.fileHeader = false
    agent1.sources.source1_1.fileSuffix = .COMPLETED
    agent1.sinks.hdfs-sink1_1.type = hdfs
    
    # HDFS target directory; in my case /flume_import.
    # Replace 'path' with the real HDFS location, without the quotes.
    agent1.sinks.hdfs-sink1_1.hdfs.path = hdfs://'path'
    agent1.sinks.hdfs-sink1_1.hdfs.batchSize = 1000
    agent1.sinks.hdfs-sink1_1.hdfs.rollSize = 268435456
    agent1.sinks.hdfs-sink1_1.hdfs.rollInterval = 0
    agent1.sinks.hdfs-sink1_1.hdfs.rollCount = 50000000
    agent1.sinks.hdfs-sink1_1.hdfs.writeFormat=Text
    
    agent1.sinks.hdfs-sink1_1.hdfs.fileType = DataStream
    agent1.sources.source1_1.channels = fileChannel1_1
    agent1.sinks.hdfs-sink1_1.channel = fileChannel1_1
    
    agent1.sinks =  hdfs-sink1_1
    agent1.sources = source1_1
    agent1.channels = fileChannel1_1
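
    For comparison, here is a rough sketch of how the configuration from the question might look when wired the same way. It is only an illustration under a few assumptions: the channel name c1 is hypothetical (the question does not show a channel), the goal is assumed to be copying CSV lines unchanged (so the avro_event serializer and its codec are left out), and the elided HDFS host is kept exactly as posted. Two key names are adjusted: the source batch size needs the source name in the key, and Flume property names are case-sensitive, so the sink key is hdfs.batchSize.

    ```properties
    # Sketch only: the asker's agent (agent / s1 / sk1), wired like the answer above.
    # The channel name c1 is a hypothetical name chosen for this example.
    agent.sources = s1
    agent.channels = c1
    agent.sinks = sk1

    agent.channels.c1.type = file
    agent.channels.c1.capacity = 200000
    # Should be at least as large as the source and sink batch sizes
    agent.channels.c1.transactionCapacity = 10000

    agent.sources.s1.type = spooldir
    agent.sources.s1.spoolDir = /my-directory
    agent.sources.s1.basenameHeader = true
    # The key must include the source name (s1)
    agent.sources.s1.batchSize = 10000
    agent.sources.s1.channels = c1

    agent.sinks.sk1.type = hdfs
    agent.sinks.sk1.hdfs.path = hdfs://...:8020/user/importflume/%Y/%m/%d/%H
    agent.sinks.sk1.hdfs.filePrefix = %{basename}
    agent.sinks.sk1.hdfs.rollSize = 0
    agent.sinks.sk1.hdfs.rollCount = 0
    agent.sinks.sk1.hdfs.useLocalTimeStamp = true
    # Case-sensitive: batchSize, not batchsize
    agent.sinks.sk1.hdfs.batchSize = 10000
    agent.sinks.sk1.hdfs.fileType = DataStream
    agent.sinks.sk1.hdfs.writeFormat = Text
    agent.sinks.sk1.channel = c1
    ```

    A source or sink batch larger than the channel's transactionCapacity cannot fit in one channel transaction, which is why the two values are kept equal here.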
    

    You can also refer to this blog on the Flume spooling directory source for more information.

    【Discussion】:
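
    One caveat for the Excel files mentioned in the question: by default the spooling directory source reads files line by line, which suits CSV but may not suit binary formats such as .xls. A possible approach, sketched below, is to read each file as a single event with Flume's BlobDeserializer. This assumes the Flume morphline-solr-sink module that contains the class is on the agent's classpath, and that each file is small enough to be buffered in memory; it is not something the configuration above does.

    ```properties
    # Sketch only: read each spooled file as one event instead of one event per line.
    # Assumes the module providing BlobDeserializer is available to the agent.
    agent1.sources.source1_1.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
    # Maximum number of bytes read from a file into a single event
    agent1.sources.source1_1.deserializer.maxBlobLength = 100000000
    ```

    Because a whole file becomes one event, this is probably best kept on a separate source from the CSV files, which are better handled line by line.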
