【Title】: Flume and HDFS Integration, HDFS IO error
【Posted】: 2015-03-17 19:38:20
【Description】:

I am trying to integrate Flume with HDFS. My Flume configuration file is:

hdfs-agent.sources= netcat-collect
hdfs-agent.sinks = hdfs-write
hdfs-agent.channels= memoryChannel

hdfs-agent.sources.netcat-collect.type = netcat
hdfs-agent.sources.netcat-collect.bind = localhost
hdfs-agent.sources.netcat-collect.port = 11111

hdfs-agent.sinks.hdfs-write.type = FILE_ROLL
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://127.0.0.1:50020/user/oracle/flume
hdfs-agent.sinks.hdfs-write.rollInterval = 30
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat=Text
hdfs-agent.sinks.hdfs-write.hdfs.fileType=DataStream

hdfs-agent.channels.memoryChannel.type = memory
hdfs-agent.channels.memoryChannel.capacity=10000
hdfs-agent.sources.netcat-collect.channels=memoryChannel
hdfs-agent.sinks.hdfs-write.channel=memoryChannel.

My core-site.xml is:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost</value>
    </property>
</configuration>
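One detail worth making visible before the stack trace: the fs.default.name URI above carries no port at all, while the Flume sink path points at port 50020. A quick sketch with Python's urllib.parse (values copied from the two configs above) shows the disagreement:

```python
from urllib.parse import urlparse

# Values copied from the configs above; urlparse is only used here
# to make the port mismatch between the two files visible.
fs_default = urlparse("hdfs://localhost")                         # core-site.xml
sink_path = urlparse("hdfs://127.0.0.1:50020/user/oracle/flume")  # Flume sink

print("fs.default.name port:", fs_default.port)  # None: no port configured
print("sink path port:", sink_path.port)         # 50020
```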

When I run the Flume agent, it starts up and is able to read events from the nc command, but writing to HDFS fails with the exception below. I tried leaving safe mode with hadoop dfsadmin -safemode leave, but I still get the same exception.

2014-02-14 10:31:53,785 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:219)] Creating hdfs://127.0.0.1:50020/user/oracle/flume/FlumeData.1392354113707.tmp
2014-02-14 10:31:54,011 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:418)] HDFS IO error
java.io.IOException: Call to /127.0.0.1:50020 failed on local exception: java.io.EOFException
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1089)
        at org.apache.hadoop.ipc.Client.call(Client.java:1057)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
        at $Proxy5.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:369)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1489)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1523)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1505)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:227)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:226)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:220)
        at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:536)
        at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:160)
        at org.apache.flume.sink.hdfs.BucketWriter.access$1000(BucketWriter.java:56)
        at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:533)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:781)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:689)

Please let me know if anything is misconfigured in either file, so that I can get this working.

Please also let me know whether I am using the correct port.

My goal is to integrate Flume and Hadoop. I have a single-node Hadoop setup.

【Comments】:

    Tags: hadoop hdfs flume


    【Solution 1】:

    You must include the NameNode RPC port in fs.default.name. Note that 50020, the port in your sink path, is the default DataNode IPC port, not the NameNode RPC port, which is why the RPC call fails with an EOFException.

    Example:

    <configuration>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://localhost:9001</value>
        </property>
    </configuration>
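    If you are unsure which port the NameNode is actually listening on, a small connectivity probe can confirm it before editing the config. This is only a sketch: 9001 is the port used in this answer, 8020 is the stock Hadoop default, and 50020 is the DataNode IPC port from the original sink path.

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Candidate ports (assumptions: 8020 is the stock NameNode default,
# 9001 is what this answer uses, 50020 is the DataNode IPC port).
for port in (8020, 9001, 50020):
    print(port, port_open("localhost", port))
```

    Whichever port answers for the NameNode is the one that belongs in both fs.default.name and the sink's hdfs.path.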
    

    Then edit the Flume configuration file as follows:

    hdfs-agent.sources= netcat-collect
    hdfs-agent.sinks = hdfs-write
    hdfs-agent.channels= memoryChannel
    
    hdfs-agent.sources.netcat-collect.type = netcat
    hdfs-agent.sources.netcat-collect.bind = localhost
    hdfs-agent.sources.netcat-collect.port = 11111
    
    hdfs-agent.sinks.hdfs-write.type = hdfs
    hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://127.0.0.1:9001/user/oracle/flume
    hdfs-agent.sinks.hdfs-write.rollInterval = 30
    hdfs-agent.sinks.hdfs-write.hdfs.writeFormat=Text
    hdfs-agent.sinks.hdfs-write.hdfs.fileType=DataStream
    
    hdfs-agent.channels.memoryChannel.type = memory
    hdfs-agent.channels.memoryChannel.capacity=10000
    hdfs-agent.sources.netcat-collect.channels=memoryChannel
    hdfs-agent.sinks.hdfs-write.channel=memoryChannel
    

    Changes:

    hdfs-agent.sinks.hdfs-write.type = hdfs (sink type changed to hdfs)
    hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://127.0.0.1:9001/user/oracle/flume (added the port number)
    hdfs-agent.sinks.hdfs-write.channel=memoryChannel (removed the trailing dot after memoryChannel)
    
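    The last change matters more than it looks: Flume treats "memoryChannel." (with the trailing dot) as a different, undeclared channel name. A minimal sketch of the kind of check that catches this typo, with the property lines taken from the original question's config:

```python
def parse_props(text):
    """Parse Java-properties-style 'key = value' lines into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

# Property lines copied from the question's configuration.
conf = parse_props("""
hdfs-agent.channels= memoryChannel
hdfs-agent.sources.netcat-collect.channels=memoryChannel
hdfs-agent.sinks.hdfs-write.channel=memoryChannel.
""")

# Every channel referenced by a source or sink must exactly match
# a declared channel name; the trailing dot breaks that match.
declared = set(conf["hdfs-agent.channels"].split())
for key, value in conf.items():
    if key.endswith(".channels") or key.endswith(".channel"):
        for name in value.split():
            if name not in declared:
                print(f"{key} references unknown channel {name!r}")
```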

    【Discussion】:
