【Question title】: Iterating through a list and Dictionary and Turning into DataFrame
【Posted】: 2017-12-13 15:16:26
【Question】:

I'm trying to pull specific fields out of this data I scraped so I can turn it into a Pandas DataFrame, but I get the following error:

TypeError: list indices must be integers or slices, not str

Here is the data:

r = requests.get(url).json()

print(r)

Output:

[{'category': {'id': 34,
   'name': 'Tech',
   'shortname': 'tech',
   'sort_name': 'Tech'},
  'city': 'Edinburgh',
  'country': 'GB',
  'created': 1450173286000,
  'description': "<p>We're passionate about security, as are you.</p>\n<p>We want to invite the security community in Scotland to engage 5 or 6 times a year to discuss all things security. A informal forum to share ideas, make contacts, encourage debate.</p>\n<p>Our MeetUps will 100% NOT be sales-led. There will be no vendors, no sponsors, no obligation to talk to anyone, nor cost any money to attend.</p>\n<p>They will be hosted at a number of venues, but there will be no hosting-company focus, we merely organise and host the events, with a choice of speakers as well as the obligatory refreshments!</p>",
  'id': 19213863,
  'join_mode': 'open',
  'key_photo': {'base_url': 'https://secure.meetupstatic.com',
   'highres_link': 'https://secure.meetupstatic.com/photos/event/d/e/c/4/highres_445137028.jpeg',
   'id': 445137028,
   'photo_link': 'https://secure.meetupstatic.com/photos/event/d/e/c/4/600_445137028.jpeg',
   'thumb_link': 'https://secure.meetupstatic.com/photos/event/d/e/c/4/thumb_445137028.jpeg',
   'type': 'event'},
  'lat': 55.94,
  'link': 'https://www.meetup.com/Security-MeetUp-Scotland/',
  'localized_country_name': 'United Kingdom',
  'localized_location': 'Edinburgh, United Kingdom',
  'lon': -3.2,
  'members': 1059,
  'meta_category': {'category_ids': [34],
   'id': 292,
   'name': 'Tech',
   'photo': {'base_url': 'https://secure.meetupstatic.com',
    'highres_link': 'https://secure.meetupstatic.com/photos/event/2/e/a/d/highres_450131949.jpeg',
    'id': 450131949,
    'photo_link': 'https://secure.meetupstatic.com/photos/event/2/e/a/d/600_450131949.jpeg',
    'thumb_link': 'https://secure.meetupstatic.com/photos/event/2/e/a/d/thumb_450131949.jpeg',
    'type': 'event'},
   'shortname': 'tech',
   'sort_name': 'Tech'},
  'name': 'Security MeetUp Scotland',
  'next_event': {'id': '245752465',
   'name': 'Security Scotland Chapter 10 - hosted by Skyscanner!',
   'time': 1516820400000,
   'utc_offset': 0,
   'yes_rsvp_count': 130},
  'organizer': {'bio': 'I do Security stuff. Currently at Capital One.\nOrganiser of Security Scotland: https://www.meetup.com/Security-MeetUp-Scotland\nHusband, proud daddy, guitarist, drummer, muso, ex-DJ/producer, Fleetwood Mac aficionado, Scottish Leeds fan.',
   'id': 192768669,
   'name': 'Stu Hirst',
   'photo': {'base_url': 'https://secure.meetupstatic.com',
    'highres_link': 'https://secure.meetupstatic.com/photos/member/8/d/9/7/highres_250956247.jpeg',
    'id': 250956247,
    'photo_link': 'https://secure.meetupstatic.com/photos/member/8/d/9/7/member_250956247.jpeg',
    'thumb_link': 'https://secure.meetupstatic.com/photos/member/8/d/9/7/thumb_250956247.jpeg',
    'type': 'member'}},
  'score': 1.0,
  'state': 'U8',
  'status': 'active',
  'timezone': 'Europe/London',
  'urlname': 'Security-MeetUp-Scotland',
  'visibility': 'public',
  'who': 'Scot Security Folks'},
{'category': {'id': 34, ...
 ]

I know there are a lot of dictionaries in here, and I want to get the main ones. I tried this:

for item in r['category']:
    print (item['name'])
    print (item['city'])
    print (item['members'])

for item in r['meta_category']:
    print (item['name'])
    print (item['country'])
    print (item['status'])

There's more, but that's where I get the error. Could you help me create a DataFrame with 'name', 'city', 'country', 'lat', 'lon', 'description', 'members', 'status', 'urlname', 'category' and 'meta_category'?

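【Comments】:

  • Is the data from api.meetup.com/2/…
  • Also, one more thing. You may not have checked or understood the structure of the response. The code as written will never work. Look at r['results'] and see what you get.
  • Yes, that's the one
  • Also, I see results. Inside results is a list of dictionaries. In each dictionary I only see category
  • Hmm... with r['results'] I get TypeError: list indices must be integers or slices, not str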

Tags: python json pandas dictionary dataframe


【Solution 1】:

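So the JSON, as you already know, contains a list of dictionaries, and those dictionaries can in turn contain "sub-dictionaries". In your case, each element is a big dictionary with many keys, some of which hold smaller dictionaries, such as 'category'.

category has 4 keys (id, name, shortname, sort_name).

When you try to iterate over r['category'] you get the error because r is a list, which cannot be indexed with a string. Beyond that, the keys you are looking for (city, members and so on) are not in the category dictionary; they are in the big one (each element of what I call data).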
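To make the source of the TypeError concrete, here is a minimal sketch. The `sample` list is a hand-written stand-in for the response (its values are illustrative, not real API output); it only mimics the shape shown in the question.

```python
# Hypothetical stand-in for r = requests.get(url).json(); values are illustrative.
sample = [
    {"name": "Group A", "city": "Edinburgh", "category": {"id": 34, "name": "Tech"}},
    {"name": "Group B", "city": "Glasgow", "category": {"id": 34, "name": "Tech"}},
]

try:
    sample["category"]  # a list cannot be indexed with a string key
except TypeError as exc:
    error_message = str(exc)

# Index (or iterate over) the list first, then use string keys on each dict.
first_category = sample[0]["category"]["name"]
cities = [group["city"] for group in sample]
```

The string key only becomes valid once a single element of the list has been selected, which is why looping over the list first makes the lookups work.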

So the following works as one would expect:

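import json

# Load the saved response; in the question this is the list already held in `r`
with open('test.json') as f:
    data = json.load(f)

# Top-level keys to print for each group
columns = ['city', 'country', 'lat', 'lon',
           'description', 'members', 'status', 'urlname']

for meetup in data:  # data is a list of dicts, one per group
    # 'category' is a nested dict, so it needs a second key lookup
    print(meetup['category']['name'])
    for item in columns:
        print(meetup[item])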

Now you just need to work out how you want to get the data into a df, which can be done via CSV. If that's unclear, I can help with that too.
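One possible way to do the CSV step is sketched below, using only the standard library. The single record in `data` is copied and trimmed from the output in the question; the helper name `to_row` and the choice to keep only the 'name' of each nested dict are assumptions made for illustration, not something the answer specifies.

```python
import csv
import io

# One record with the same shape as the response, trimmed from the question.
data = [
    {
        "name": "Security MeetUp Scotland",
        "city": "Edinburgh",
        "country": "GB",
        "lat": 55.94,
        "lon": -3.2,
        "members": 1059,
        "status": "active",
        "urlname": "Security-MeetUp-Scotland",
        "category": {"id": 34, "name": "Tech"},
        "meta_category": {"id": 292, "name": "Tech"},
    }
]

columns = ["name", "city", "country", "lat", "lon", "members", "status", "urlname"]
fieldnames = columns + ["category", "meta_category"]


def to_row(meetup):
    """Flatten one group dict into a single-level dict (one table row)."""
    row = {key: meetup.get(key) for key in columns}
    # Nested dicts: keep only their 'name'; .get guards against missing keys.
    row["category"] = meetup.get("category", {}).get("name")
    row["meta_category"] = meetup.get("meta_category", {}).get("name")
    return row


rows = [to_row(meetup) for meetup in data]

# Write the rows as CSV text; an in-memory buffer stands in for a file here.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

The CSV step is optional: `pandas.DataFrame` accepts a list of dicts, so `rows` can also be passed to it directly.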

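【Comments】:

  • In this case, is 'test.json' the same as `url.json()`? Because I'm getting an error
  • data should be equal to your JSON object (I think it's 'r' in your case)
  • Yes, in this case data is 'r'. Even after trying that I still get this error: TypeError: list indices must be integers or slices, not str. I think the problem is with the API URL call
  • Edit: since you have a list of dict objects, you also need to iterate over those objects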
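Because the comments disagree on whether the payload is a bare list or a dict with a 'results' key, a small defensive helper can accept both shapes. This is only a sketch: `extract_groups` is a hypothetical name, and the 'results' key is taken from the comments rather than verified against the API.

```python
def extract_groups(payload):
    """Return the list of group dicts from either response shape."""
    if isinstance(payload, list):
        return payload  # already a list of dicts
    if isinstance(payload, dict) and "results" in payload:
        return payload["results"]  # wrapped form suggested in the comments
    raise TypeError(f"Unexpected payload type: {type(payload).__name__}")


# Illustrative inputs, not real API output.
as_list = [{"name": "Group A"}]
as_dict = {"results": [{"name": "Group A"}]}

names_from_list = [group["name"] for group in extract_groups(as_list)]
names_from_dict = [group["name"] for group in extract_groups(as_dict)]
```

Checking the type up front turns a confusing indexing error into an explicit message about the response shape.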