【Question Title】: Packet count in Hadoop (with MapReduce)
【Posted】: 2015-03-30 09:02:24
【Question Description】:

What I have done so far:


Installed Hadoop by following this guide:

http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/CDH4-Installation-Guide/cdh4ig_topic_4_4.html


Installed hping3 to generate flood requests:

sudo hping3 -c 10000 -d 120 -S -w 64 -p 8000 --flood --rand-source 192.168.1.12

Installed Snort to log the above requests:

sudo snort -ved -h 192.168.1.0/24 -l .

This generates the log file snort.log.1427021231

which I can read with

sudo snort -r snort.log.1427021231

This gives output of the form:

=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

03/22-16:17:14.259633 192.168.1.12:8000 -> 117.247.194.105:46639
TCP TTL:64 TOS:0x0 ID:0 IpLen:20 DgmLen:44 DF
AS Seq: 0x6EEE4A6B Ack: 0x6DF6015B Win: 0x7210 TcpLen: 24
TCP Options (1) => MSS: 1460
=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+


I used

hdfs dfs -put <localsrc> ... <dst>

to copy this log file to HDFS.

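Now, I need help with the following:

How do I count the total number of source IP addresses, destination IP addresses, ports, protocols, and timestamps in the log file?

(Do I have to write my own MapReduce program, or is there a library for this?)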


I also found

https://github.com/ssallys/p3

but could not get it to run. I looked through the contents of the JAR file, but still could not run it.

ratan@lenovo:~/Desktop$ hadoop jar ./p3lite.jar p3.pcap.examples.PacketCount

Exception in thread "main" java.lang.ClassNotFoundException: nflow.runner.Runner
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at org.apache.hadoop.util.RunJar.main(RunJar.java:201)

Thanks.

【Question Discussion】:

    Tags: hadoop mapreduce packet-capture snort hping


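    【Solution 1】:

    After a quick search, it looks like you will probably need a custom MapReduce job.

    The algorithm would look something like the following pseudocode: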

    Parse the file line by line (or every n lines, if a log record spans more than one line).
    
    In the mapper, use a regex to work out whether a piece of text is a source IP, destination IP, etc.
    
    Emit these as <type, count> key-value pairs, where
        type is the kind of text that was matched (e.g. source IP)
        count is the number of times it was matched in the record
    
    Have the reducer sum all of the values from the mappers to get a global total for each type of information you want.
    
    Write the totals to a file in the desired format.
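
The pseudocode above could be sketched roughly as follows. This is only an illustration under several assumptions, not a tested Hadoop job: it uses Python in the Hadoop Streaming style rather than a Java MapReduce class; it assumes the input is the text printed by `snort -r` saved to a file (the log written with `-l` appears to be a binary capture, since it has to be read back with `snort -r`); and the names `HEADER`, `PROTOCOL`, `map_line`, `reduce_pairs`, `run_mapper`, and `run_reducer` are made up for this example. The regexes are based only on the single sample record in the question, so other record shapes may need adjustments.

```python
#!/usr/bin/env python3
"""Sketch: count field types in Snort text output, map/reduce style."""
import re
import sys
from collections import Counter

# Header line of a record (ports are optional, e.g. for ICMP), such as:
# 03/22-16:17:14.259633 192.168.1.12:8000 -> 117.247.194.105:46639
HEADER = re.compile(
    r"(?P<timestamp>\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})(?::(?P<src_port>\d+))?\s+->\s+"
    r"(?P<dst_ip>\d{1,3}(?:\.\d{1,3}){3})(?::(?P<dst_port>\d+))?"
)
# Protocol marker such as "TCP TTL:64" (assumed from the sample record).
PROTOCOL = re.compile(r"\b[A-Z][A-Z0-9]*\s+TTL:\d+")


def map_line(line):
    """Yield (field_type, 1) for every field recognised in one input line."""
    header = HEADER.match(line.strip())
    if header:
        for field, value in header.groupdict().items():
            if value is not None:  # skip ports that are absent
                yield field, 1
    if PROTOCOL.search(line):
        yield "protocol", 1


def reduce_pairs(pairs):
    """Sum the counts for each field type."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def run_mapper(stdin, stdout):
    """Streaming mapper: one tab-separated 'type<TAB>1' line per match."""
    for line in stdin:
        for key, count in map_line(line):
            stdout.write(f"{key}\t{count}\n")


def run_reducer(stdin, stdout):
    """Streaming reducer: read 'type<TAB>count' lines and print totals."""
    rows = (line.rstrip("\n").split("\t") for line in stdin if line.strip())
    totals = reduce_pairs((key, int(count)) for key, count in rows)
    for key in sorted(totals):
        stdout.write(f"{key}\t{totals[key]}\n")


# Only runs when invoked explicitly with "map" or "reduce" as the argument.
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    role = run_mapper if sys.argv[1] == "map" else run_reducer
    role(sys.stdin, sys.stdout)
```

If this were saved as a script, the same file could presumably serve as both the `-mapper` command (with the argument `map`) and the `-reducer` command (with `reduce`) of a Hadoop Streaming job. Keying on the field type alone gives one total per type, as in the pseudocode; emitting the matched value as part of the key would instead give a count per distinct address or port.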
    

    【Discussion】:
