【Question title】: Flume agent: add host to message, then publish to a kafka topic
【Posted】: 2015-11-16 22:48:34
【Question】:

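We are starting to consolidate event log data from our applications by publishing messages to a Kafka topic. Although we could write to Kafka directly from the application, we chose to treat this as a generic problem and use a Flume agent. This gives us some flexibility: if we want to capture something else from a server, we can tail a different source and publish to a different Kafka topic.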

We created a Flume agent configuration file that tails a log and publishes to a Kafka topic:

tier1.sources  = source1
tier1.channels = channel1
tier1.sinks = sink1

tier1.sources.source1.type = exec
tier1.sources.source1.command = tail -F /var/log/some_log.log
tier1.sources.source1.channels = channel1

tier1.channels.channel1.type = memory
tier1.channels.channel1.capacity = 10000
tier1.channels.channel1.transactionCapacity = 1000

tier1.sinks.sink1.type = org.apache.flume.sink.kafka.KafkaSink
tier1.sinks.sink1.topic = some_log
tier1.sinks.sink1.brokerList = hadoop01:9092,hadoop02.com:9092,hadoop03.com:9092
tier1.sinks.sink1.channel = channel1
tier1.sinks.sink1.batchSize = 20

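Unfortunately, the messages themselves do not specify the host that generated them. If an application runs on multiple hosts and an error occurs, we have no way of telling which host produced the message.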

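I noticed that if Flume wrote directly to HDFS, we could use a Flume interceptor to write to a specific HDFS location. Although we could probably do something similar with Kafka, i.e. create a new topic for each server, this could become unwieldy: we would end up with thousands of topics.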

Can Flume append/include the hostname of the originating host when publishing to a Kafka topic?

【Discussion】:

    Tags: hadoop apache-kafka flume flume-ng


    【Solution 1】:

    You can create a custom TCP source that reads the client address and adds it to the event headers.

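    // Fragment of the custom source class. Besides java.io, java.net, java.nio.charset and java.util,
    // it uses org.apache.flume.Context, org.apache.flume.Event,
    // org.apache.flume.channel.ChannelProcessor and org.apache.flume.event.EventBuilder.
    @Override
    public void configure(Context context) {
        // "port" and "buffer" are read from the source's entries in flume-conf.properties.
        port = context.getInteger("port");
        buffer = context.getInteger("buffer");

        try {
            serverSocket = new ServerSocket(port);
            logger.info("FlumeTCP source initialized");
        } catch (Exception e) {
            logger.error("FlumeTCP source failed to initialize", e);
        }
    }

    @Override
    public void start() {
        // Note: accept() and the read loop block this thread until the client disconnects.
        try {
            clientSocket = serverSocket.accept();
            receiveBuffer = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
            logger.info("Connection established with client : " + clientSocket.getRemoteSocketAddress());
            final ChannelProcessor channel = getChannelProcessor();
            // Every event from this connection carries the client address in the "hostname" header.
            final Map<String, String> headers = new HashMap<String, String>();
            headers.put("hostname", clientSocket.getRemoteSocketAddress().toString());
            String line;
            List<Event> events = new ArrayList<Event>();

            while ((line = receiveBuffer.readLine()) != null) {
                Event event = EventBuilder.withBody(line, Charset.defaultCharset(), headers);
                logger.info("Event created");
                events.add(event);
                if (events.size() >= buffer) {
                    channel.processEventBatch(events);
                    // Start a fresh list so delivered events are not sent again.
                    events = new ArrayList<Event>();
                }
            }
            // Deliver whatever is left when the client closes the connection.
            if (!events.isEmpty()) {
                channel.processEventBatch(events);
            }
        } catch (Exception e) {
            logger.error("FlumeTCP source failed while reading from client", e);
        }
        super.start();
    }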
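
The size-based batching used by the source above can be looked at in isolation. The following is only a sketch with no Flume dependency: `LineBatcher` and its `sink` callback are hypothetical names, and the callback stands in for a call such as `channel.processEventBatch(...)`. It shows one way to hand over full batches, empty the buffer after each hand-over, and flush the remainder at end of stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: groups lines into batches of a fixed size. */
final class LineBatcher {
    private final int batchSize;
    private final Consumer<List<String>> sink;
    private List<String> pending = new ArrayList<>();

    LineBatcher(int batchSize, Consumer<List<String>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one line and hands a full batch to the sink. */
    void add(String line) {
        pending.add(line);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Hands any buffered lines to the sink; call once at end of stream. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> batch = pending;
        pending = new ArrayList<>(); // fresh buffer, so a batch is never delivered twice
        sink.accept(batch);
    }
}
```

With `buffer = 1`, as in the configuration in this answer, every line would form its own batch and the final `flush()` would have nothing left to deliver.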
    

    flume-conf.properties can be configured as follows:

    # Licensed to the Apache Software Foundation (ASF) under one
    # or more contributor license agreements.  See the NOTICE file
    # distributed with this work for additional information
    # regarding copyright ownership.  The ASF licenses this file
    # to you under the Apache License, Version 2.0 (the
    # "License"); you may not use this file except in compliance
    # with the License.  You may obtain a copy of the License at
    #
    #  http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing,
    # software distributed under the License is distributed on an
    # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    # KIND, either express or implied.  See the License for the
    # specific language governing permissions and limitations
    # under the License.
    
    
    # The configuration file needs to define the sources, 
    # the channels and the sinks.
    # Sources, channels and sinks are defined per agent, 
    # in this case called 'agent'
    
    agent.sources = CustomTcpSource
    agent.channels = memoryChannel
    agent.sinks = loggerSink
    
    # For each one of the sources, the type is defined
    agent.sources.CustomTcpSource.type = com.vishnu.flume.source.CustomFlumeTCPSource
    agent.sources.CustomTcpSource.port = 4443
    agent.sources.CustomTcpSource.buffer = 1
    
    
    # The channel can be defined as follows.
    agent.sources.CustomTcpSource.channels = memoryChannel
    
    # Each sink's type must be defined
    agent.sinks.loggerSink.type = logger
    
    #Specify the channel the sink should use
    agent.sinks.loggerSink.channel = memoryChannel
    
    # Each channel's type is defined.
    agent.channels.memoryChannel.type = memory
    
    # Other config values specific to each type of channel(sink or source)
    # can be defined as well
    # In this case, it specifies the capacity of the memory channel
    agent.channels.memoryChannel.capacity = 100
    

    I sent a test message to try it out, and the logged event looked like this:

    Event: { headers:{hostname=/127.0.0.1:50999} body: 74 65 73 74 20 6D 65 73 73 61 67 65             test message }
    

    I have uploaded the project to my github.

    【Comments】:
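
The header value in the test output, `/127.0.0.1:50999`, is the string form of the remote socket address, so it carries a leading slash and the client's ephemeral port. If only the host part is wanted, something along these lines might work. This is a sketch, not part of the uploaded project; `RemoteHost` and `hostOf` are hypothetical names, and it uses only standard `java.net` calls.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;

/** Hypothetical helper: reduces a remote socket address to its host part. */
final class RemoteHost {
    /** Returns the IP without slash and port, or toString() for non-IP socket addresses. */
    static String hostOf(SocketAddress remote) {
        if (remote instanceof InetSocketAddress) {
            InetSocketAddress inet = (InetSocketAddress) remote;
            InetAddress ip = inet.getAddress();
            if (ip != null) {
                return ip.getHostAddress();
            }
            // Unresolved address: fall back to the host string it was created with.
            return inet.getHostString();
        }
        return remote.toString();
    }
}
```

In the source above this would presumably be used as `headers.put("hostname", RemoteHost.hostOf(clientSocket.getRemoteSocketAddress()))`.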

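      【Solution 2】:

      If you are using an exec source, nothing prevents you from running a smarter command that prefixes each line of the log file with the hostname.

      Note: if the command uses things like pipes, you also need to specify the shell, like this: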

      tier1.sources.source1.type = exec
      tier1.sources.source1.shell = /bin/sh -c
      tier1.sources.source1.command =  tail -F /var/log/auth.log | sed --unbuffered "s/^/$(hostname) /"
      

      The messages then look like this:

      frb.hi.inet 2015-11-17 08:39:39.432 INFO [...]
      

      ...where frb.hi.inet is my hostname.

      【Comments】:
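
The same "hostname, space, original line" format can also be produced in Java instead of `sed`, for example from inside a custom Flume interceptor. The sketch below covers only the byte manipulation and has no Flume dependency; `HostPrefixer`, `prefix` and `localHost` are hypothetical names, and the UTF-8 encoding of the prefix is an assumption. In an interceptor one would presumably apply it to `event.getBody()` and write the result back with `event.setBody(...)`; that wiring is not shown here.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: prepends a hostname to a message body. */
final class HostPrefixer {
    /** Returns the bytes of "<host> " followed by the unchanged body. */
    static byte[] prefix(String host, byte[] body) {
        byte[] head = (host + " ").getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[head.length + body.length];
        System.arraycopy(head, 0, out, 0, head.length);
        System.arraycopy(body, 0, out, head.length, body.length);
        return out;
    }

    /** Best-effort local hostname; falls back to "unknown-host" if the lookup fails. */
    static String localHost() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown-host";
        }
    }
}
```

Calling `HostPrefixer.prefix(HostPrefixer.localHost(), body)` once per message keeps the body otherwise intact, so consumers can split on the first space to recover the host.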
