【Question Title】: Iterating through JSON Array in Python Match Array of Values
【Posted】: 2020-01-23 18:54:21
【Question】:

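I'm working with the Songkick API and am very close to finishing a program that fetches information about upcoming shows for a few specific artists. I have entered a list of metro_areas, and I only want to track and output shows in those cities, matched by their accompanying IDs. The other information I pull is the show date, artist name, and venue name. Right now the program can pull every show for each entry in my artist_ids list by iterating over the IDs in the request URL, within the date range given in the params. My current output looks like this: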

['Date', 'Artist Name', 'Venue Name', 'City', 'metroArea ID']

['FEB - 7', 'Rosalia', 'NOTO', 'Philadelphia, PA, US', 5202]
['FEB - 8', 'Rosalia', 'Audio', 'San Francisco, CA, US', 26330]
['FEB - 8', 'Kid Cudi', 'Shady Park', 'Tempe, AZ, US', 23068]
['FEB - 8', 'Kid Cudi', 'Madison Square Garden', 'New York City, NY, US', 7644]

I would like the output to look like this:

['FEB - 8', 'Rosalia', 'Audio', 'San Francisco, CA, US', 26330]
['FEB - 8', 'Kid Cudi', 'Madison Square Garden', 'New York City, NY, US', 7644]

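The filter should be based on this array, defined at the start of the program, so that its metro_areas are matched against the array of Songkick objects.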

metro_areas = [
    ('Los Angeles', '17835'),
    ('San Francisco', '26330'),
    ('New York City', '7644'),
    ('Seattle', '2846'),
    ('Nashville', '11104')
]

This is the JSON object array I pull for each artist ID:

{
  "resultsPage": {
    "results": {
      "event": [
        {
          "id":11129128,
          "type":"Concert",
          "uri":"http://www.songkick.com/concerts/11129128-wild-flag-at-fillmore?utm_source=PARTNER_ID&utm_medium=partner",
          "displayName":"Wild Flag at The Fillmore (April 18, 2012)",
          "start": {
            "time":"20:00:00",
            "date":"2012-04-18",
            "datetime":"2012-04-18T20:00:00-0800"
          },
          "performance": [
            {
              "artist": {
                "id":29835,
                "uri":"http://www.songkick.com/artists/29835-wild-flag?utm_source=PARTNER_ID&utm_medium=partner",
                "displayName":"Wild Flag",
                "identifier": []
              },
              "id":21579303,
              "displayName":"Wild Flag",
              "billingIndex":1,
              "billing":"headline"
            }
          ],
          "location": {
            "city":"San Francisco, CA, US",
            "lng":-122.4332937,
            "lat":37.7842398
          },
          "venue": {
            "id":6239,
            "displayName":"The Fillmore",
            "uri":"http://www.songkick.com/venues/6239-fillmore?utm_source=PARTNER_ID&utm_medium=partner",
            "lng":-122.4332937,
            "lat":37.7842398,
            "metroArea": {
              "id":26330,
              "uri":"http://www.songkick.com/metro-areas/26330-us-sf-bay-area?utm_source=PARTNER_ID&utm_medium=partner",
              "displayName":"SF Bay Area",
              "country": { "displayName":"US" },
              "state": { "displayName":"CA" }
            }
          },
          "status":"ok",
          "popularity":0.012763
        }, ....
      ]
    },
    "totalEntries":24,
    "perPage":50,
    "page":1,
    "status":"ok"
  }
}
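
As a rough sketch of how the fields mentioned above could be read out of one entry of the event array, the hypothetical helper event_to_row below walks the same keys that appear in the sample response. The helper name and the layout of the returned list are assumptions for illustration, not part of the Songkick API.

```python
from datetime import datetime


def event_to_row(event):
    """Flatten one Songkick-style event dict into a list of display fields.

    Hypothetical helper: it assumes the event has the same shape as the
    sample response above (start.date, performance[], venue.metroArea.id).
    """
    show_date = datetime.strptime(event["start"]["date"], "%Y-%m-%d")
    # Build a label such as "APR - 18" without the platform-specific "%-d".
    date_label = f"{show_date.strftime('%b').upper()} - {show_date.day}"
    artists = [perf["artist"]["displayName"] for perf in event["performance"]]
    return [
        date_label,
        ", ".join(artists),
        event["venue"]["displayName"],
        event["location"]["city"],
        event["venue"]["metroArea"]["id"],
    ]


# Trimmed copy of the sample event, keeping only the keys the helper reads.
sample_event = {
    "start": {"date": "2012-04-18"},
    "performance": [{"artist": {"id": 29835, "displayName": "Wild Flag"}}],
    "location": {"city": "San Francisco, CA, US"},
    "venue": {"displayName": "The Fillmore", "metroArea": {"id": 26330}},
}

print(event_to_row(sample_event))
```

Note that the metroArea ID comes back as an integer, which matters when it is compared with IDs stored as strings.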

Here is more code showing how I get the output from the JSON in the Songkick request.

from datetime import datetime

import requests

metro_areas = [
    ('Los Angeles', '17835'),
    ('San Francisco', '26330'),
    ('New York City', '7644'),
    ('Seattle', '2846'),
    ('Nashville', '11104')
]

# artists we want to track
artist_ids = [
    ('Rosalia', '4610868'), ('EARTHGANG', '5720759'), ('Kid Cudi', '8630279'), ('Kanye West', '5566863'),
    ('Ludacris', '398291'), ('Hayley Williams', '10087966')
]

# Fetch upcoming events for each tracked artist
for artist_id in artist_ids:
    params = {
        'apikey': 'API_KEY',
        'min_date': '2020-02-01',
        'max_date': '2020-02-08',
        # 'type': 'Concert'
    }

    r = requests.get('https://api.songkick.com/api/3.0/artists/' + artist_id[1] + '/calendar.json', params=params)
    response = r.json()

    # Fall back to an empty list if 'results' has no 'event' key
    shows = response['resultsPage']['results'].get('event', [])

    formatted_shows = [{
        'artistID': [perf['artist']['id'] for perf in s['performance']],
        'date': s['start']['date'],
        'name': [perf['artist']['displayName'] for perf in s['performance']],
        'metroArea': s['venue']['metroArea']['id'],
        'city': s['location']['city'],
        'venue': s['venue']['displayName']
        }
        for s in shows if len(s['performance']) > 0
    ]
    for sub in formatted_shows:
        # artistID holds integer IDs, while artist_id[1] is a string
        if int(artist_id[1]) in sub['artistID']:
            sub['name'] = artist_id[0]
        new_show_name = artist_id[0]
        new_date_time = datetime.strptime(sub['date'], '%Y-%m-%d')
        # '%-d' is not supported on every platform, so format the day directly
        date_time_fin = new_date_time.strftime('%b').upper() + ' - ' + str(new_date_time.day)

        formatted_show_final = [date_time_fin, new_show_name, sub['venue'], sub['city'], sub['metroArea']]
        print(formatted_show_final)


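Long story short, I need a way to check each show against every metro_areas ID I listed (Los Angeles, San Francisco, New York City, Seattle, Nashville) and, on each request iteration, output only the shows whose 'metroArea': s['venue']['metroArea']['id'] matches.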

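【Comments】:

  • Why not use the Upcoming Events by Metro Area API instead?
  • @SunnyPatel If I did that, I'd have to do the same thing I'm trying to do here, but with the list of artists and their accompanying IDs. My artist list is much larger than my metro area list, so I figured this way would be easier.
  • If you have fewer metro areas, you'd be making fewer requests, which would be better. :)
  • @SunnyPatel I'm not sure that would be the case: there's a limit of 50 shows per page, so I'd have to paginate, since I'm looking at a date range that spans a whole month. Either way, I have to match and output only the shows that fall in my predefined metro_areas ID array or my predefined artist_ids array.
  • You haven't even posted the complete code here. Please provide the complete code.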
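
On the pagination point raised in the comments: the sample response in the question carries totalEntries, perPage and page, so the number of requests needed can be estimated from the first page alone. A small sketch, assuming those fields are present as in the sample (the helper name pages_needed is hypothetical):

```python
import math


def pages_needed(results_page):
    """Return how many pages cover totalEntries at perPage entries each."""
    total = results_page["totalEntries"]
    per_page = results_page["perPage"]
    # No entries means no pages; otherwise round the division up.
    return math.ceil(total / per_page) if total else 0


# Values copied from the sample response in the question.
print(pages_needed({"totalEntries": 24, "perPage": 50, "page": 1}))
```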

Tags: python arrays json dictionary match


【Solution 1】:

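If I understand the question correctly, add this check inside the second for loop (for sub in formatted_shows): if str(sub['metroArea']) in [area[1] for area in metro_areas]: The str() call is needed because the API returns the metroArea ID as an integer, while metro_areas stores the IDs as strings.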

    # Inside the "for artist_id in artist_ids" loop, after response = r.json()
    # Fall back to an empty list if 'results' has no 'event' key
    shows = response['resultsPage']['results'].get('event', [])
    formatted_shows = [{
        'artistID': [perf['artist']['id'] for perf in s['performance']],
        'date': s['start']['date'],
        'name': [perf['artist']['displayName'] for perf in s['performance']],
        'metroArea': s['venue']['metroArea']['id'],
        'city': s['location']['city'],
        'venue': s['venue']['displayName']
        }
        for s in shows if len(s['performance']) > 0
    ]
    for sub in formatted_shows:
        # Modified here: str() turns the integer sub['metroArea'] into a
        # string so it can match the string IDs stored in metro_areas
        if str(sub['metroArea']) in [area[1] for area in metro_areas]:
            # artistID holds integer IDs, while artist_id[1] is a string
            if int(artist_id[1]) in sub['artistID']:
                sub['name'] = artist_id[0]
            new_show_name = artist_id[0]
            new_date_time = datetime.strptime(sub['date'], '%Y-%m-%d')
            # '%-d' is not supported on every platform, so format the day directly
            date_time_fin = new_date_time.strftime('%b').upper() + ' - ' + str(new_date_time.day)
            formatted_show_final = [date_time_fin, new_show_name, sub['venue'], sub['city'], sub['metroArea']]
            print(formatted_show_final)
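
A variation on the same check, offered only as a minimal sketch: rather than rebuilding the list of IDs and calling str() on every iteration, the IDs can be converted to a set of integers once and reused. The function name filter_by_metro and the assumption that the metroArea ID is the last element of each row are illustrative choices; the sample rows are taken from the output shown in the question.

```python
metro_areas = [
    ("Los Angeles", "17835"),
    ("San Francisco", "26330"),
    ("New York City", "7644"),
    ("Seattle", "2846"),
    ("Nashville", "11104"),
]


def filter_by_metro(rows, areas):
    """Keep only rows whose last element (metroArea ID) is a tracked ID."""
    # Convert once to ints so they compare equal to the API's integer IDs.
    wanted_ids = {int(area_id) for _, area_id in areas}
    return [row for row in rows if row[-1] in wanted_ids]


rows = [
    ["FEB - 7", "Rosalia", "NOTO", "Philadelphia, PA, US", 5202],
    ["FEB - 8", "Rosalia", "Audio", "San Francisco, CA, US", 26330],
    ["FEB - 8", "Kid Cudi", "Shady Park", "Tempe, AZ, US", 23068],
    ["FEB - 8", "Kid Cudi", "Madison Square Garden", "New York City, NY, US", 7644],
]

for row in filter_by_metro(rows, metro_areas):
    print(row)
```

Set membership is a constant-time lookup on average, which only matters once the list of tracked metro areas grows, but it also keeps the type conversion in one place.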

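【Comments】:

  • Thank you for this code snippet, which might provide some limited, immediate help. A proper explanation would greatly improve its long-term value by showing why this is a good solution to the problem, and would make it more useful to future readers with other, similar questions. Please edit your answer to add some explanation, including the assumptions you've made.
  • Thank you so much! This works perfectly after converting metroArea from formatted_shows to a string for the comparison! Thanks a lot!
  • Yes, I forgot to convert it to a string. I've updated the code.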