【Question title】: Scraping JSON arrays nested tags
【Posted】: 2016-06-23 19:29:06
【Question】:

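I'm trying to scrape data from a JSON file. I can read data from some of the tags, but a few nested tags are giving me trouble. Here is a sample from the file: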

{"orders":[{
  "order_id":9000,
  "flight_start":"2017-06-15T05:00:00.000Z",
  "flight_end":"2017-06-22T05:00:00.000Z",
  "spots":[{
      "spot_id":7354259,
      "spot_length":15}],
  "constraints":{
      "forbid":[{
        "network":"BRVO"},
        {"network":"DSE"},
        {"network":"ESPN"},
        {"network":"DFC"},
        {"hours":[2,6],
         "days_of_week":["Monday","Tuesday","Thursday","Friday"]},
        {"hours":[2,6],
         "days_of_week":["Saturday","Sunday"]}],
      "allocation":[{
         "hours":[6,9],
         "impressions":{
             "min":0.05,
             "max":0.05},
         "days_of_week":["Monday","Tuesday","Wednesday","Thursday","Friday"]},{
         "hours":[20,0],
         "impressions":{"min":0.5,"max":0.5},
         "days_of_week":["Monday","Tuesday","Wednesday","Thursday","Friday"]},{
         "budget":{
             "min":1,
             "max":1},
         "spot_length":15}]}}]}

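I can't get all the values from the network tags. For each order, my code returns only the first of its network values.

I'm using the following code: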

 import urllib.request
 import json
 url = 'http://vw-test.elasticbeanstalk.com/test'
 json_obj = urllib.request.urlopen(url).read().decode('UTF-8')
 data = json.loads(json_obj)
 for i in data["orders"]:
     k = i["order_id"]
     j = i["flight_start"]
     l = i["flight_end"]
     m = i ['spots']
     for  value in m:    
         a = value["spot_length"]
         b = value["spot_id"]
     n = i["constraints"]
     c = n["forbid"]
     d = c[0]
     e = d["network"]
     print(e)

I'd really appreciate it if someone could help me with this.

【Question discussion】:

    Tags: python arrays json web-scraping urllib


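    【Solution 1】:

    The JSON data in your question is incomplete. Making some assumptions, this might work. Your code reads only c[0], the first entry of the forbid list, so it prints a single network; looping over the whole list collects all of them: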

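    for i in data["orders"]:
        k = i["order_id"]
        j = i["flight_start"]
        l = i["flight_end"]
        m = i["spots"]
        for value in m:
            a = value["spot_length"]
            b = value["spot_id"]
        n = i["constraints"]
        c = n["forbid"]
        # Collect "network" from every forbid entry that has one,
        # instead of reading only c[0]. Entries holding "hours" and
        # "days_of_week" have no "network" key and are skipped.
        networks = [d["network"] for d in c if "network" in d]
        print(networks)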
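
    The same idea can be tried without the remote URL. The sketch below is only an illustration: it embeds a trimmed copy of the sample from the question, and the helper name forbidden_networks and the use of .get() with defaults are assumptions, not part of the original code.

```python
import json

# Trimmed copy of the sample from the question: one order, forbid list only.
SAMPLE = '''
{"orders": [{
  "order_id": 9000,
  "constraints": {
    "forbid": [
      {"network": "BRVO"},
      {"network": "DSE"},
      {"network": "ESPN"},
      {"network": "DFC"},
      {"hours": [2, 6], "days_of_week": ["Monday", "Tuesday", "Thursday", "Friday"]},
      {"hours": [2, 6], "days_of_week": ["Saturday", "Sunday"]}
    ]
  }
}]}
'''


def forbidden_networks(order):
    """Return every "network" value found in order["constraints"]["forbid"].

    Hypothetical helper: .get() with defaults is an assumption so that an
    order without constraints yields an empty list instead of a KeyError.
    """
    forbid = order.get("constraints", {}).get("forbid", [])
    # Skip entries (such as the hours/days_of_week ones) with no "network" key.
    return [entry["network"] for entry in forbid if "network" in entry]


data = json.loads(SAMPLE)
# Map each order_id to the list of all its forbidden networks.
networks_by_order = {o["order_id"]: forbidden_networks(o) for o in data["orders"]}
print(networks_by_order)  # {9000: ['BRVO', 'DSE', 'ESPN', 'DFC']}
```

    Keying the result by order_id keeps each order's networks together, which may be handier than printing one list per loop iteration.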
    

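    【Discussion】:

    • Yes, this works, thank you very much. By the way, the link to the JSON file is given in the code, in case you want to take a look.
    • You're welcome. What I meant is that the sample data shown in your question needs to end with "spot_length":15}]}}]} to be well-formed.
    • Yes, sorry about that. I've fixed it in the question. Thanks a lot.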