【Question title】: Transform a section within a nested JSON in a Pandas dataframe column into another dataframe
【Posted】: 2021-12-13 01:39:58
【Question】:

I am working with a Pandas dataframe in which one of the columns contains this JSON:

quote
{'BTC': {'price': 1, 'volume_24h': 1e-08, 'percent_change_1h': 0, 'percent_change_24h': 0, 'percent_change_7d': 0, 'market_cap': 11071985.881444559, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 134.210021972656, 'volume_24h': 0, 'percent_change_1h': 0.639231, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 1488566971.9558687, 'last_updated': '2013-04-28T23:55:01.000Z'}}
{'BTC': {'price': 0.032343507812039, 'volume_24h': 1e-08, 'percent_change_1h': 0.799273, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 555151.4070926482, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 4.34840488433838, 'volume_24h': 0, 'percent_change_1h': 0.799273, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 74637021.56790735, 'last_updated': '2013-04-28T23:55:01.000Z'}}
{'BTC': {'price': 0.002874978304697, 'volume_24h': 1e-08, 'percent_change_1h': -0.934763, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 53927.00880335152, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 0.38652485609054604, 'volume_24h': 0, 'percent_change_1h': -0.934763, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 7250186.647688276, 'last_updated': '2013-04-28T23:55:03.000Z'}}
{'BTC': {'price': 0.008235615152382001, 'volume_24h': 1e-08, 'percent_change_1h': -0.0505028, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 44598.326734696995, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 1.10723268985748, 'volume_24h': 0, 'percent_change_1h': -0.0505028, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 5995997.185385211, 'last_updated': '2013-04-28T23:55:02.000Z'}}
{'BTC': {'price': 0.004811595748858, 'volume_24h': 1e-08, 'percent_change_1h': 0.609159, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 11180.078331276067, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 0.646892309188843, 'volume_24h': 0, 'percent_change_1h': 0.609159, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 1503099.4011388426, 'last_updated': '2013-04-28T23:55:02.000Z'}}
{'BTC': {'price': 2.425762856e-06, 'volume_24h': 1e-08, 'percent_change_1h': 0.461694, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 10592.384991552675, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 0.00032613033545200005, 'volume_24h': 0, 'percent_change_1h': 0.461694, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 1424087.2975724188, 'last_updated': '2013-04-28T23:55:14.000Z'}}
{'BTC': {'price': 0.03158483190407, 'volume_24h': 1e-08, 'percent_change_1h': 2.13819, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 8644.956027083475, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 4.24640512466431, 'volume_24h': 0, 'percent_change_1h': 2.13819, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 1162266.2956510494, 'last_updated': '2013-04-28T23:55:03.000Z'}}
Name: quote, dtype: object

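How can I use Pandas to turn this dataframe column containing JSON into a tidy dataframe? I am interested in some of the data in the USD section.

I tried "pd.json_normalize", but I could not get it to change anything.

【Comments】: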

  • pd.DataFrame(df['quote'].tolist())?

Tags: python json pandas dataframe


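【Solution 1】:

Because your JSON is nested, expanding it directly into a dataframe gives you only 2 columns, one for each top-level key (BTC and USD). So we first have to select the part of the data you are interested in, namely the USD section.

Optional step: if your quote column holds strings rather than actual Python dicts, first parse the strings into dicts as follows. Otherwise, skip this step: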
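To see why direct expansion is not enough, here is a minimal sketch on two rows shortened from the question's data (the reduced set of keys is an assumption made for brevity). Expanding the outer dicts splits only on the top-level keys, and each cell still holds a nested dict:

```python
import pandas as pd

# Two sample rows shortened from the question's data (keys trimmed for brevity).
df = pd.DataFrame({
    "quote": [
        {"BTC": {"price": 1, "volume_24h": 1e-08},
         "USD": {"price": 134.210021972656, "volume_24h": 0}},
        {"BTC": {"price": 0.032343507812039, "volume_24h": 1e-08},
         "USD": {"price": 4.34840488433838, "volume_24h": 0}},
    ]
})

# Expanding the outer dicts only splits on the top-level keys.
top_level = pd.DataFrame(df["quote"].tolist())
top_level_columns = list(top_level.columns)
print(top_level_columns)

# Each cell is still a nested dict, not a set of scalar columns.
print(type(top_level.loc[0, "USD"]))
```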

import ast
df['quote'] = df['quote'].map(ast.literal_eval)
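If you are unsure whether the column holds strings or dicts, a small check such as the following may help. This is only a sketch: the sample strings are made up to mimic the question's format, and the `isinstance` guard is an addition that leaves values that are already dicts untouched. `ast.literal_eval` is used because the text is a Python literal (single quotes, `None`), which `json.loads` would reject.

```python
import ast
import pandas as pd

# Hypothetical sample: the same structure as in the question, stored as text.
raw = pd.DataFrame({
    "quote": [
        "{'USD': {'price': 134.210021972656, 'percent_change_24h': None}}",
        "{'USD': {'price': 4.34840488433838, 'percent_change_24h': None}}",
    ]
})

# True when every value in the column is a string rather than a dict.
all_strings = raw["quote"].map(lambda v: isinstance(v, str)).all()
print(all_strings)

# Parse only string values; anything that is already a dict is passed through.
parsed = raw["quote"].map(
    lambda v: ast.literal_eval(v) if isinstance(v, str) else v
)
print(type(parsed.iloc[0]))
```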

Once the quote column holds actual dicts rather than strings, we can access the USD section with the string accessor str[], as follows:

df['quote'].str['USD']    

(This step can be skipped in favor of the final step; it is run here only to confirm that you get the section you want.)

This gives:

0    {'price': 134.210021972656, 'volume_24h': 0, '...
1    {'price': 4.34840488433838, 'volume_24h': 0, '...
2    {'price': 0.38652485609054604, 'volume_24h': 0...
3    {'price': 1.10723268985748, 'volume_24h': 0, '...
4    {'price': 0.646892309188843, 'volume_24h': 0, ...
5    {'price': 0.00032613033545200005, 'volume_24h'...
6    {'price': 4.24640512466431, 'volume_24h': 0, '...
Name: quote, dtype: object
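Although it is called a string accessor, `str[]` here simply looks up the key in each dict. As a rough illustration on shortened sample data (an assumption, not the full column from the question), it selects the same values as an explicit `map` over the dicts:

```python
import pandas as pd

# Sample rows shortened from the question's data.
df = pd.DataFrame({
    "quote": [
        {"BTC": {"price": 1}, "USD": {"price": 134.210021972656}},
        {"BTC": {"price": 0.032343507812039}, "USD": {"price": 4.34840488433838}},
    ]
})

# Key lookup through the .str accessor, as used in the answer.
usd_by_accessor = df["quote"].str["USD"]

# The same lookup written as an explicit per-element function.
usd_by_map = df["quote"].map(lambda d: d["USD"])

same_values = usd_by_accessor.tolist() == usd_by_map.tolist()
print(same_values)
```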

Then, to expand its contents (the USD section) into separate columns, we can use:

pd.DataFrame(df['quote'].str['USD'].tolist())

Result:

        price  volume_24h  percent_change_1h percent_change_24h percent_change_7d    market_cap              last_updated
0  134.210022           0           0.639231               None              None  1.488567e+09  2013-04-28T23:55:01.000Z
1    4.348405           0           0.799273               None              None  7.463702e+07  2013-04-28T23:55:01.000Z
2    0.386525           0          -0.934763               None              None  7.250187e+06  2013-04-28T23:55:03.000Z
3    1.107233           0          -0.050503               None              None  5.995997e+06  2013-04-28T23:55:02.000Z
4    0.646892           0           0.609159               None              None  1.503099e+06  2013-04-28T23:55:02.000Z
5    0.000326           0           0.461694               None              None  1.424087e+06  2013-04-28T23:55:14.000Z
6    4.246405           0           2.138190               None              None  1.162266e+06  2013-04-28T23:55:03.000Z
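Since the question mentions `pd.json_normalize`, here is one possible way to reach a similar result with it. This is a sketch on shortened sample data, not part of the original answer: `json_normalize` is given the list of dicts rather than the Series, it flattens nested keys into dotted column names such as `USD.price`, and the USD columns are then selected and renamed.

```python
import pandas as pd

# Sample rows shortened from the question's data.
df = pd.DataFrame({
    "quote": [
        {"BTC": {"price": 1, "volume_24h": 1e-08},
         "USD": {"price": 134.210021972656,
                 "last_updated": "2013-04-28T23:55:01.000Z"}},
        {"BTC": {"price": 0.032343507812039, "volume_24h": 1e-08},
         "USD": {"price": 4.34840488433838,
                 "last_updated": "2013-04-28T23:55:01.000Z"}},
    ]
})

# Flatten every nested key into a dotted column name, e.g. "USD.price".
flat = pd.json_normalize(df["quote"].tolist())

# Keep only the USD columns and drop the "USD." prefix from their names.
usd_cols = [c for c in flat.columns if c.startswith("USD.")]
usd = flat[usd_cols].rename(columns=lambda c: c[len("USD."):])
print(usd)
```

Compared with `pd.DataFrame(df['quote'].str['USD'].tolist())`, this route also keeps the BTC columns available in `flat` if you need them later.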

