Trying to select a jsonl data field nested inside another column with .loc, but getting a KeyError even though the key exists

【Question title】: Try to Select jsonl data column in another columns with .loc but got KeyError even though the key exists
【Posted】: 2021-05-29 14:02:24
【Question description】:

This is the structure of my data in the jsonl file:

"content": "Not yall gassing up a gay boy with no rhythm", "place": {"_type": "snscrape.modules.twitter.Place", "fullName": "Manhattan, NY", "name": "Manhattan", "type": "city", "country": "United States", "countryCode": "US"}

I tried to select countryCode from the place column with this code:

country_df = test_df.loc[test_df['place'].notnull(), ['content', 'place']]
countrycode_df = country_df["place"].loc["countryCode"]

But it gives me this error:

KeyError: 'countryCode'

How can I fix this?

I tried this method, but it doesn't fit my case.

【Comments】:

Tags: python pandas jsonlines

【Solution 1】:

You can access it through the .str accessor:

country_df['place'].str['countryCode']

Output:

0    US
Name: place, dtype: object
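To make the difference concrete, here is a minimal sketch of why .loc fails while .str works. The two-row test_df below is a made-up stand-in for the asker's data, not the real dataset: .loc on a Series looks up index labels (0, 1, ...), whereas .str[...] applies the key lookup to each dict in the column.

```python
import pandas as pd

# Hypothetical sample mimicking the structure shown in the question.
test_df = pd.DataFrame({
    "content": ["tweet a", "tweet b"],
    "place": [
        {"name": "Manhattan", "countryCode": "US"},
        None,  # a tweet without place information
    ],
})

country_df = test_df.loc[test_df["place"].notnull(), ["content", "place"]]

# .loc on a Series searches the index labels, not the keys of the dicts
# stored in the cells, so "countryCode" is not found.
try:
    country_df["place"].loc["countryCode"]
    loc_failed = False
except KeyError:
    loc_failed = True

# .str[...] performs the lookup element by element.
codes = country_df["place"].str["countryCode"]
print(loc_failed)
print(codes.tolist())
```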

【Comments】:

I was able to get the same result, but I can't use .groupby("countryCode").size() without getting the error "KeyError: 'countryCode'". How can I solve this?
Sure, you can use df.groupby(df['place'].str['countryCode']).size() (if you only want to know how many records each countryCode has, you can also use df['place'].str['countryCode'].value_counts())
Or you can use json_normalize to convert place into a DataFrame, and then use countryCode as a column: pd.json_normalize(df['place'])
Something like pd.json_normalize(df.to_dict(orient='records')) to normalize all the columns
@someoneudon'tknow you need .to_dict(orient='records')
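The counting options mentioned in these comments can be sketched side by side. The three-row df is invented for illustration, and the column is passed to json_normalize as a plain list via .tolist(), which is a cautious assumption so the call does not depend on how a given pandas version handles a Series.

```python
import pandas as pd

# Hypothetical sample data; only the nested "place" structure matters here.
df = pd.DataFrame({
    "content": ["a", "b", "c"],
    "place": [
        {"name": "Manhattan", "countryCode": "US"},
        {"name": "Brooklyn", "countryCode": "US"},
        {"name": "London", "countryCode": "GB"},
    ],
})

# Option 1: group by the extracted Series instead of a column name.
codes = df["place"].str["countryCode"]
by_group = df.groupby(codes).size()

# Option 2: count the extracted values directly.
by_counts = codes.value_counts()

# Option 3: expand the dicts into their own DataFrame so that
# countryCode becomes a real column that groupby can find by name.
place_df = pd.json_normalize(df["place"].tolist())
by_column = place_df.groupby("countryCode").size()

print(by_group)
print(by_counts)
print(by_column)
```

All three give the same counts; the groupby("countryCode") call from the question only works in the third variant, because only there does a column with that name exist.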

【Solution 2】:

Because "place" is basically a dict (a nested dictionary), you can access it the same way you would a top-level dict:

country = {"content": "Not yall gassing up a gay boy with no rhythm", "place": {"_type": "snscrape.modules.twitter.Place", "fullName": "Manhattan, NY", "name": "Manhattan", "type": "city", "country": "United States", "countryCode": "US"}}
country["place"]["countryCode"]

Output:

'US'

However, it may be better to use pandas json_normalize():

import pandas as pd

# Flatten the nested "place" dict into dotted columns such as place.countryCode
country_df = pd.json_normalize(data=country)

print(country_df)

Output:

content	place._type	place.fullName	place.name	place.type	place.country	place.countryCode
Not yall gassing up a gay boy with no rhythm	snscrape.modules.twitter.Place	Manhattan, NY	Manhattan	city	United States	US

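【Comments】:

Is normalizing all my data before using .groupby the best approach?
I would suggest so. groupby() is a DataFrame method, and json_normalize() lets you convert JSON into a DataFrame
I tried normalizing my data like countrycode_df = pd.json_normalize(data=country_df), but I got this error instead: "AttributeError: 'str' object has no attribute 'values'"
Solved it: I needed to add df.to_dict(orient='records')
The data passed to json_normalize needs to be JSON. It sounds like your data is already partially a DataFrame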
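The fix from the last comments can be sketched as follows, under the assumption that country_df looks like the made-up frame below. json_normalize expects a dict or a list of dicts. Passing a DataFrame presumably fails because iterating over a DataFrame yields its column names, which are strings. Converting the frame to a list of row dicts first avoids that.

```python
import pandas as pd

# Hypothetical frame with a column of nested dicts, as in the question.
country_df = pd.DataFrame({
    "content": ["a", "b"],
    "place": [
        {"name": "Manhattan", "countryCode": "US"},
        {"name": "London", "countryCode": "GB"},
    ],
})

# Turn each row into a plain dict, then flatten the nested "place" dicts.
records = country_df.to_dict(orient="records")
flat_df = pd.json_normalize(records)

# Nested keys become dotted column names, so groupby can use them by name.
counts = flat_df.groupby("place.countryCode").size()
print(flat_df.columns.tolist())
print(counts)
```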