【Question title】:Parsing JSON in python 2.7's inbuilt JSON parser [closed]
【Posted】:2020-01-18 15:51:35
【Question】:

I'm sure this has been asked a million times. I've read through some of the other questions and am still struggling to find an answer.

I'm bulk-querying the RIPE API from Debian 9 with the following curl command:


file="servers-to-ripe.txt"
while IFS= read -r line
do
# Hostnames -> corresponding IPs
  dig=$(./ip_extrapolate2 "$line" | grep -v "$resolving_server")
  curl --silent "https://stat.ripe.net/data/address-space-usage/data.json?resource=${dig}&data=asn_name" >> servers.json
done <"$file"

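This gives me JSON output describing the ownership of each server. I initially tried the jq CLI parser, but to no avail.

That led me to write it in Python instead. Here are the first two objects in the output: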

{
    "status": "ok", 
    "server_id": "app002", 
    "status_code": 200, 
    "version": "0.4", 
    "cached": false, 
    "see_also": [], 
    "time": "2020-01-18T02:44:39.610258", 
    "messages": [
        [
            "info", 
            "IP address (185.230.125.107) has been changed to the closest encompassing prefix/range (185.230.125.0/24) found in RIPE DB"
        ]
    ], 
    "data_call_status": "supported - connecting to ursa", 
    "process_time": 216, 
    "build_version": "2020.1.13.174", 
    "query_id": "20200118024439-c225c628-6317-430d-8244-64f805701675", 
    "data": {
        "assignments": [], 
        "query_time": "2020-01-16T00:00:00", 
        "ip_stats": [
            {
                "status": "LIR Free", 
                "ips": 256
            }
        ], 
        "resource": "185.230.125.0/24", 
        "allocations": [
            {
                "allocation": "185.230.124.0/22", 
                "status": "ALLOCATED PA", 
                "asn_name": "RO-M247EUROPE-OCT-20171108", 
                "assignments": 0
            }
        ]
    }
}{
    "status": "ok", 
    "server_id": "app018", 
    "status_code": 200, 
    "version": "0.4", 
    "cached": false, 
    "see_also": [], 
    "time": "2020-01-18T02:44:40.104775", 
    "messages": [
        [
            "info", 
            "IP address (45.9.249.67) has been changed to the closest encompassing prefix/range (45.9.249.0/24) found in RIPE DB"
        ]
    ], 
    "data_call_status": "supported - connecting to ursa", 
    "process_time": 180, 
    "build_version": "2020.1.13.174", 
    "query_id": "20200118024439-33ce2ee1-33a2-42c2-8d9e-acbc92996fe5", 
    "data": {
        "assignments": [
            {
                "status": "ASSIGNED PA", 
                "parent_allocation": "45.9.248.0/22", 
                "address_range": "45.9.249.0/24", 
                "asn_name": "M247-Dubai"
            }
        ], 
        "query_time": "2020-01-16T00:00:00", 
        "ip_stats": [
            {
                "status": "ASSIGNED PA", 
                "ips": 256
            }
        ], 
        "resource": "45.9.249.0/24", 
        "allocations": [
            {
                "allocation": "45.9.248.0/22", 
                "status": "ALLOCATED PA", 
                "asn_name": "RO-M247-APR1901-20190423", 
                "assignments": 1
            }
        ]
    }
}{

I'm trying to extract only the asn_name and the IP range.

I've been tinkering with Python (2.7)'s built-in json parser. Here is what I've tried:

#!/usr/bin/python
import json

input_file = open ('servers.json')
json_array = json.load(input_file)
servers = []

for item in json_array:
  server_asn_name = {"asn":None, "resource":None}
  server_asn_name['asn'] = item['asn_name']
  server_asn_name['resource'] = item["resource"]
  servers.append(server_asn_name)

print(server_asn_name)

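I've tried a few other things too, but this is probably the closest I've got so far. Any advice would be greatly appreciated :)

【Comments】:

  • Note that once you have loaded the JSON, it is just regular dicts, lists and/or primitive types. So you look data up exactly as you would in a list or dict you had built by hand. Looking at this as "how to extract data from JSON" is a red herring.
  • Can you say which version of Python you are using? I'm also not sure what exactly the problem is here.
  • It's Python 2.7.13, on Debian 9.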
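
To illustrate the first comment: once parsed, a response is plain dicts and lists, so the two fields the question asks about can be reached with ordinary indexing. Below is a minimal sketch using a trimmed copy of the first response from the question. The variable names (`sample`, `parsed`, and so on) are made up for illustration.

```python
import json

# Trimmed copy of the first response shown in the question.
sample = '''
{
    "status": "ok",
    "data": {
        "resource": "185.230.125.0/24",
        "allocations": [
            {
                "allocation": "185.230.124.0/22",
                "asn_name": "RO-M247EUROPE-OCT-20171108"
            }
        ]
    }
}
'''

parsed = json.loads(sample)   # the top level is a plain dict
data = parsed["data"]         # "data" holds a nested dict
resource = data["resource"]   # the IP range, a string
# "allocations" is a list of dicts; take asn_name from its first element.
asn_name = data["allocations"][0]["asn_name"]

print(resource)
print(asn_name)
```

Note that `asn_name` and `resource` are not top-level keys; they sit under `data`, which is why `item['asn_name']` in the question's attempt cannot find them.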

Tags: python json parsing


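【Solution 1】:

Your JSON file can be thought of as having the following structure, that is, a single JSON array of response objects (assume the file is named servers.json):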

[
  {
    "status": "ok",
    "server_id": "app002",
    "status_code": 200,
    "version": "0.4",
    "cached": false,
    "see_also": [],
    "time": "2020-01-18T02:44:39.610258",
    "messages": [
      [
        "info",
        "IP address (185.230.125.107) has been changed to the closest encompassing prefix/range (185.230.125.0/24) found in RIPE DB"
      ]
    ],
    "data_call_status": "supported - connecting to ursa",
    "process_time": 216,
    "build_version": "2020.1.13.174",
    "query_id": "20200118024439-c225c628-6317-430d-8244-64f805701675",
    "data": {
      "assignments": [],
      "query_time": "2020-01-16T00:00:00",
      "ip_stats": [
        {
          "status": "LIR Free",
          "ips": 256
        }
      ],
      "resource": "185.230.125.0/24",
      "allocations": [
        {
          "allocation": "185.230.124.0/22",
          "status": "ALLOCATED PA",
          "asn_name": "RO-M247EUROPE-OCT-20171108",
          "assignments": 0
        }
      ]
    }
  },
  {
    "status": "ok",
    "server_id": "app018",
    "status_code": 200,
    "version": "0.4",
    "cached": false,
    "see_also": [],
    "time": "2020-01-18T02:44:40.104775",
    "messages": [
      [
        "info",
        "IP address (45.9.249.67) has been changed to the closest encompassing prefix/range (45.9.249.0/24) found in RIPE DB"
      ]
    ],
    "data_call_status": "supported - connecting to ursa",
    "process_time": 180,
    "build_version": "2020.1.13.174",
    "query_id": "20200118024439-33ce2ee1-33a2-42c2-8d9e-acbc92996fe5",
    "data": {
      "assignments": [
        {
          "status": "ASSIGNED PA",
          "parent_allocation": "45.9.248.0/22",
          "address_range": "45.9.249.0/24",
          "asn_name": "M247-Dubai"
        }
      ],
      "query_time": "2020-01-16T00:00:00",
      "ip_stats": [
        {
          "status": "ASSIGNED PA",
          "ips": 256
        }
      ],
      "resource": "45.9.249.0/24",
      "allocations": [
        {
          "allocation": "45.9.248.0/22",
          "status": "ALLOCATED PA",
          "asn_name": "RO-M247-APR1901-20190423",
          "assignments": 1
        }
      ]
    }
  }
]

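Create a new function called servers_from_json that takes file_name as a parameter. It returns a list of servers containing only the fields you want (the ASN name and the IP range), like this:

import json


def servers_from_json(file_name):
    # Expects file_name to contain a single JSON array of response objects.
    with open(file_name, 'r') as f:
        data = json.load(f)
    # Keep only the ASN name (from the first allocation) and the IP range.
    servers = [{'asn': item['data']['allocations'][0]['asn_name'], 'resource': item['data']['resource']} for item in data]
    return servers


servers = servers_from_json('servers.json')
print(servers) # => [{'asn': 'RO-M247EUROPE-OCT-20171108', 'resource': '185.230.125.0/24'}, {'asn': 'RO-M247-APR1901-20190423', 'resource': '45.9.249.0/24'}]
# (Python 2.7 shows the strings as u'...' and may order the keys differently.)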

This should give you the correct result.
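
If servers.json is instead the raw output of the curl loop in the question (response objects written back to back, with no enclosing brackets and no commas), `json.load` stops after the first object and raises `ValueError: Extra data`. One way around that without editing the file is to decode one object at a time with `json.JSONDecoder.raw_decode`, which returns the parsed value together with the index where it ended. The following is only a sketch under that assumption: the helper names `iter_json_objects` and `servers_from_concatenated` are hypothetical, and the sample string is a trimmed stand-in for the real file.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated values."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # Returns the decoded value and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def servers_from_concatenated(text):
    """Collect asn_name and resource from every response object in text."""
    servers = []
    for item in iter_json_objects(text):
        data = item.get("data", {})
        allocations = data.get("allocations") or []
        # Use None when a response has no allocations at all.
        asn = allocations[0].get("asn_name") if allocations else None
        servers.append({"asn": asn, "resource": data.get("resource")})
    return servers


# Two trimmed responses glued together, mimicking the layout in the question.
sample = (
    '{"data": {"resource": "185.230.125.0/24", '
    '"allocations": [{"asn_name": "RO-M247EUROPE-OCT-20171108"}]}}'
    '{"data": {"resource": "45.9.249.0/24", '
    '"allocations": [{"asn_name": "RO-M247-APR1901-20190423"}]}}\n'
)

for server in servers_from_concatenated(sample):
    print("%s %s" % (server["resource"], server["asn"]))
```

For the real file, the same function could be called on the file's contents, for example `servers_from_concatenated(open('servers.json').read())`. The sketch still fails with a `ValueError` if any curl call wrote something that is not JSON into the file.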

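【Comments】:

  • Forgive me for sounding stupid; this is the first time I've had to deal with JSON data. I usually just build everything in-house with non-standardized output. So when you say it can be simplified, does that have to be done by hand? Do I have to go through all 5570 entries myself, or is there something really obvious that I'm missing?
  • I tried dropping in the line you provided as-is, but I got more errors, most notably: raise ValueError(errmsg("Extra data", s, end, len(s)))
  • Sorry, English is not my native language. By "simplified" I meant "your JSON file structure can be thought of as ...". I've also updated the answer.
  • OK, this is definitely progress. I swapped in the exact format you provided and it worked. Thank you very much for your patience here. The problem is definitely the format, and I'll have to tinker with it: the data set I have doesn't follow the same format. The first thing I isolated is that the objects aren't separated by commas. I ran a find-and-replace across the whole data set to put a comma between each object. That didn't seem to fix it, but I definitely have more direction now.
  • Great, thank you very much for your help here. I'll mark this as solved, since I definitely have the tools I need to finish this on my own. Thanks!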