【Question title】: How to parse the JSON output of a running command?
【Posted】: 2018-03-08 16:17:17
【Question】:

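Summary: I would like to parse the JSON output of tshark while it is being produced.

So far I have been parsing the normal output line by line, and each line carried the complete information. It was therefore simply a matter of: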

import subprocess

p = subprocess.Popen("/usr/bin/tshark", stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True)
for line in p.stdout:
    # decode_event() (not shown) parses one line of tshark output
    event = decode_event(line)

tshark can also emit pretty-printed JSON via the -T json switch (I am showing only the first packet; the output is a list):

[
  {
    "_index": "packets-2018-03-08",
    "_type": "pcap_file",
    "_score": null,
    "_source": {
      "layers": {
        "frame": {
          "frame.interface_id": "0",
          "frame.encap_type": "1",
          "frame.time": "Mar  8, 2018 16:17:20.478658037 CET",
          "frame.offset_shift": "0.000000000",
          "frame.time_epoch": "1520522240.478658037",
          "frame.time_delta": "0.000113952",
          "frame.time_delta_displayed": "0.000113952",
          "frame.time_relative": "3.351515496",
          "frame.number": "11133",
          "frame.len": "60",
          "frame.cap_len": "60",
          "frame.marked": "0",
          "frame.ignored": "0",
          "frame.protocols": "eth:ethertype:ip:tcp"
        },
        "eth": {
          "eth.dst": "00:50:56:bb:40:70",
          "eth.dst_tree": {
            "eth.dst_resolved": "Vmware_bb:40:70",
            "eth.addr": "00:50:56:bb:40:70",
            "eth.addr_resolved": "Vmware_bb:40:70",
            "eth.lg": "0",
            "eth.ig": "0"
          },
          "eth.src": "64:a0:e7:42:af:41",
          "eth.src_tree": {
            "eth.src_resolved": "Cisco_42:af:41",
            "eth.addr": "64:a0:e7:42:af:41",
            "eth.addr_resolved": "Cisco_42:af:41",
            "eth.lg": "0",
            "eth.ig": "0"
          },
          "eth.type": "0x00000800",
          "eth.padding": "00:00:00:00:00:00"
        },
        "ip": {
          "ip.version": "4",
          "ip.hdr_len": "20",
          "ip.dsfield": "0x00000000",
          "ip.dsfield_tree": {
            "ip.dsfield.dscp": "0",
            "ip.dsfield.ecn": "0"
          },
          "ip.len": "40",
          "ip.id": "0x00005a57",
          "ip.flags": "0x00000002",
          "ip.flags_tree": {
            "ip.flags.rb": "0",
            "ip.flags.df": "1",
            "ip.flags.mf": "0"
          },
          "ip.frag_offset": "0",
          "ip.ttl": "125",
          "ip.proto": "6",
          "ip.checksum": "0x0000dd25",
          "ip.checksum.status": "2",
          "ip.src": "10.237.78.2",
          "ip.addr": "10.237.78.2",
          "ip.src_host": "10.237.78.2",
          "ip.host": "10.237.78.2",
          "ip.dst": "10.81.99.19",
          "ip.addr": "10.81.99.19",
          "ip.dst_host": "10.81.99.19",
          "ip.host": "10.81.99.19",
          "Source GeoIP: Unknown": "",
          "Destination GeoIP: Unknown": ""
        },
        "tcp": {
          "tcp.srcport": "31316",
          "tcp.dstport": "22",
          "tcp.port": "31316",
          "tcp.port": "22",
          "tcp.stream": "0",
          "tcp.len": "0",
          "tcp.seq": "3025",
          "tcp.ack": "774293",
          "tcp.hdr_len": "20",
          "tcp.flags": "0x00000010",
          "tcp.flags_tree": {
            "tcp.flags.res": "0",
            "tcp.flags.ns": "0",
            "tcp.flags.cwr": "0",
            "tcp.flags.ecn": "0",
            "tcp.flags.urg": "0",
            "tcp.flags.ack": "1",
            "tcp.flags.push": "0",
            "tcp.flags.reset": "0",
            "tcp.flags.syn": "0",
            "tcp.flags.fin": "0",
            "tcp.flags.str": "\u00c2\u00b7\u00c2\u00b7\u00c2\u00b7\u00c2\u00b7\u00c2\u00b7\u00c2\u00b7\u00c2\u00b7A\u00c2\u00b7\u00c2\u00b7\u00c2\u00b7\u00c2\u00b7"
          },
          "tcp.window_size_value": "2047",
          "tcp.window_size": "2047",
          "tcp.window_size_scalefactor": "-1",
          "tcp.checksum": "0x000073f4",
          "tcp.checksum.status": "2",
          "tcp.urgent_pointer": "0",
          "tcp.analysis": {
            "tcp.analysis.acks_frame": "11126",
            "tcp.analysis.ack_rtt": "0.000426928"
          }
        }
      }
    }
  },
  <next packet>

What is the right way to parse such a stream?

While searching for stream parsing I found a few libraries (notably NAYA), but they require a file-like object.

StringIO() seems appropriate, but I do not know how to hook it up to stdout.
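On the StringIO() point: the pipe returned by Popen is already a file-like object, so copying it into a StringIO should not be needed. The following is a minimal sketch, assuming Python 3. It uses a small child Python process as a stand-in for tshark (an assumption made only so the example runs anywhere) and wraps the byte pipe in io.TextIOWrapper to read text instead of bytes.

```python
import io
import subprocess
import sys

# Stand-in for tshark (an assumption for the demo): a child process that
# prints two lines of JSON-looking text.
child_code = "print('['); print('  {\"frame.number\": \"1\"}')"
proc = subprocess.Popen([sys.executable, "-c", child_code],
                        stdout=subprocess.PIPE)

# proc.stdout is a binary file-like object; the wrapper decodes it to str.
text_stream = io.TextIOWrapper(proc.stdout, encoding="utf-8")

lines = [line.rstrip("\r\n") for line in text_stream]
proc.wait()
print(lines)
```

Any library that only needs to read from a text file-like object should, in principle, accept text_stream in place of an opened file.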


As requested by @omu_negru, for NAYA specifically, attaching stdout directly, as in

import naya
import subprocess

def handle_message(event):
    print(event)

cmd = "/usr/bin/tshark -i eth0 -T json"
proc = subprocess.Popen(cmd, bufsize=0, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True)
messages = naya.stream_array(proc.stdout)
for message in messages:
    handle_message(message)

raises an exception:

Traceback (most recent call last):
  File "/root/dev/readtshark.py", line 12, in <module>
    for message in messages:
  File "/usr/local/lib/python3.5/dist-packages/naya/json.py", line 544, in stream_array
    token_type, token = next(token_stream)
ValueError: too many values to unpack (expected 2)
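
A possible reading of this traceback, offered as an assumption rather than a verified diagnosis: the quoted line unpacks each item of its argument into a (token_type, token) pair, so stream_array seems to expect an iterator of tokens rather than a raw file. If I recall NAYA's README correctly, the file is first wrapped in a tokenize() helper, but that is not verified here. The sketch below does not use NAYA at all; first_token is a hypothetical function that only reproduces the unpacking statement, and the status line is illustrative content.

```python
import io


def first_token(token_stream):
    # Same unpacking statement as the line quoted in the traceback.
    token_type, token = next(token_stream)
    return token_type, token


# A binary pipe yields whole lines of bytes, for example a status line
# printed before the JSON starts (illustrative content).
pipe_like = io.BytesIO(b"Capturing on 'eth0'\n[\n")

try:
    first_token(pipe_like)
    error_message = None
except ValueError as exc:
    # A line of bytes has far more than two items, so unpacking fails.
    error_message = str(exc)

print(error_message)
```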

【Comments】:

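  • Don't you actually want to hook it up to stdout?
  • Using jq is faster than Python; just pipe the output of this into that
  • @omu_negru: yes, of course, corrected. Thanks.
  • In that case stdout is a file-like object, and you should be able to pass it to the API....
  • @eagle: a lot will happen with the decoded packets, this is only the start, so it needs to be a Python script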

Tags: python json python-3.x parsing stream


【Solution 1】:

A version that actually works:

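#!/usr/bin/python3
# tshark.py
import json
import sys

output = sys.stdin


def skip(output):
    """Consume lines up to the '{' that opens the next packet; False on EOF."""
    for l in output:
        if l.strip() == '{':
            return True
    return False


if not skip(output):
    sys.exit(0)
print("starting")
acc = '{'
for l in output:
    acc += l.strip()
    try:
        # rstrip(',') tolerates a packet that closes with "}," on one line
        o = json.loads(acc.rstrip(','))
    except ValueError:
        # the packet is not complete yet, keep accumulating
        continue
    print(o)
    if not skip(output):
        break
    acc = '{'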

Launch it together with tshark: sudo tshark -i wlp3s0 -T json | ./tshark.py
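
To try the accumulate-and-retry idea without root privileges or a live capture, the same logic can be written as a generator and fed canned text. This is a sketch under assumptions: iter_packets and the sample data are hypothetical, and the sample is a heavily shortened imitation of the output shown in the question.

```python
import io
import json


def iter_packets(lines):
    """Yield one dict per packet from tshark-style pretty-printed JSON lines."""
    acc = ""
    for line in lines:
        stripped = line.strip()
        if not acc and stripped != "{":
            # Between packets: ignore '[', ',', ']' and status lines.
            continue
        acc += stripped
        try:
            # rstrip(",") handles a packet that closes with "},".
            packet = json.loads(acc.rstrip(","))
        except ValueError:
            continue  # packet not complete yet, keep accumulating
        yield packet
        acc = ""


sample = io.StringIO(
    "Capturing on 'wlp3s0'\n"
    "[\n"
    "  {\n"
    '    "_index": "packets-2018-03-08",\n'
    '    "_source": {"layers": {"frame": {"frame.number": "1"}}}\n'
    "  },\n"
    "  {\n"
    '    "_index": "packets-2018-03-08",\n'
    '    "_source": {\n'
    '      "layers": {"frame": {"frame.number": "2"}}\n'
    "    }\n"
    "  }\n"
    "]\n"
)

packets = list(iter_packets(sample))
numbers = [p["_source"]["layers"]["frame"]["frame.number"] for p in packets]
print(numbers)
```

Because the buffer is re-parsed after every line, the cost grows with packet size; for packets of the size shown in the question this is unlikely to matter.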

【Comments】:

  • Thanks - your answer inspired another solution (also posted as an answer)
【Solution 2】:

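@omu_negru's answer gave me an idea, and I ended up using the solution below.

It is basically a continuous attempt to decode the JSON: as soon as the buffer decodes, it is an event that I process further (here, it is only printed).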

import subprocess
import json


def handle_message(event):
    print(event)

cmd = "/usr/bin/tshark -n -T json not broadcast and not multicast"
proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True)
# skip first lines, until the [ which starts JSON
for line in proc.stdout:
    if line.decode('utf-8').startswith('['):
        break

buffer = ""
for raw in proc.stdout:
    line = raw.decode('utf-8')
    # remove empty and "connection" lines (a comma)
    if not line.strip(', \n'):
        continue
    buffer += line
    try:
        # rstrip also drops a comma that closes a packet on the same line ("},")
        event = json.loads(buffer.rstrip(', \n'))
    except json.decoder.JSONDecodeError:
        # packet not complete yet, keep buffering
        pass
    else:
        handle_message(event)
        buffer = ""
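
A variation on the same idea that does not depend on how the output is split into lines: the standard library's json.JSONDecoder.raw_decode parses one value from the start of a string and reports where it ended, so the consumed part can be cut off the buffer. This is only a sketch; iter_array_items and the sample chunks are hypothetical, and it assumes that the text fed in starts at the opening [ and that every array element is an object (a bare number split across two chunks could be decoded too early).

```python
import json


def iter_array_items(chunks):
    """Yield each element of one top-level JSON array, fed as text chunks."""
    decoder = json.JSONDecoder()
    buffer = ""
    started = False
    for chunk in chunks:
        buffer += chunk
        while True:
            # Drop whitespace and the commas that separate elements.
            buffer = buffer.lstrip(" \t\r\n,")
            if not started:
                if not buffer.startswith("["):
                    break  # opening bracket not seen yet
                buffer = buffer[1:]
                started = True
                continue
            if not buffer or buffer.startswith("]"):
                break  # nothing left to decode, or end of the array
            try:
                item, end = decoder.raw_decode(buffer)
            except ValueError:
                break  # element incomplete, wait for more text
            yield item
            buffer = buffer[end:]


# Chunks deliberately cut in the middle of elements.
chunks = ['[\n  {"a"', ': 1},\n  {"b": [1, 2', ']}\n  ,\n  {"c": "x,y"}\n]\n']
items = list(iter_array_items(chunks))
print(items)
```

With the subprocess from the code above, the chunks could be the decoded lines of proc.stdout, starting from the line that contains the opening bracket.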

【Comments】:
