【Question title】: Convert a dict to a pandas DataFrame
【Posted】: 2017-01-28 08:41:35
【Question】:

My data looks like this:

{u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'}

I want to convert it to a pandas DataFrame. But when I try

df = pd.DataFrame(response.items())

I get a DataFrame with two columns: the first column holds the keys, and the second holds each key's value as a single string:

                            0                       1 
0  "57e01311817bc367c030b390"   {"ad_since": 2016, "indoor_swimming_pool": "No...
1  "57e01311817bc367c030b3a8"   {"ad_since": 2012, "indoor_swimming_pool": "No... 

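How can I get a separate column for each inner key ("ad_since", "indoor_swimming_pool", "seaside", "handicapped_access")? I would also like to keep the first column, or use the id as the index.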

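【Question discussion】:

  • Did you try pd.DataFrame(response.items()) with your sample data? It does not work for me.
  • @jezrael Thanks for your comment, I edited my post.
  • @RichardRublev I tried that, but got the error TypeError: Expected String or Unicode
  • @mitsi - Thanks. But I think two records were fine, and now there is only one record; the second row of the DataFrame is missing. Could you add some valid json or a list of json?

Tags: python json pandas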


【Solution 1】:

You need to convert the column of type str to dict with .apply(literal_eval) or .apply(json.loads), and then use DataFrame.from_records:

import pandas as pd
from ast import literal_eval

response = {u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', 
           u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'}

df = pd.DataFrame.from_dict(response, orient='index')

print (type(df.iloc[0,0]))
<class 'str'>

df.iloc[:,0] = df.iloc[:,0].apply(literal_eval)

print (pd.DataFrame.from_records(df.iloc[:,0].values.tolist(), index=df.index))
                            ad_since handicapped_access indoor_swimming_pool  \
"57e01311817bc367c030b3a8"      2012                Yes                   No   
"57e01311817bc367c030b390"      2016                Yes                   No   

                           seaside  
"57e01311817bc367c030b3a8"      No  
"57e01311817bc367c030b390"      No  

# Alternative: parse the strings with json.loads instead of literal_eval
import pandas as pd
import json

response = {u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', 
           u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'}


df = pd.DataFrame.from_dict(response, orient='index')
df.iloc[:,0] = df.iloc[:,0].apply(json.loads)


print (pd.DataFrame.from_records(df.iloc[:,0].values.tolist(), index=df.index))
                            ad_since handicapped_access indoor_swimming_pool  \
"57e01311817bc367c030b3a8"      2012                Yes                   No   
"57e01311817bc367c030b390"      2016                Yes                   No   

                           seaside  
"57e01311817bc367c030b3a8"      No  
"57e01311817bc367c030b390"      No  
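
The two steps above (parse each string, then build the frame) can also be folded into a single pass. This is a minimal sketch, assuming every value is valid JSON; the helper name response_to_frame is hypothetical and not part of pandas:

```python
import json
import pandas as pd

response = {
    '"57e01311817bc367c030b390"': '{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}',
    '"57e01311817bc367c030b3a8"': '{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}',
}

def response_to_frame(raw):
    """Hypothetical helper: parse each JSON string, then use the outer keys as the index."""
    # Turn {id: json_string} into {id: dict}
    parsed = {key: json.loads(value) for key, value in raw.items()}
    # orient='index' makes each outer key a row and each inner key a column
    return pd.DataFrame.from_dict(parsed, orient='index')

df = response_to_frame(response)
print(df)
```

Parsing before building the frame avoids the intermediate one-column DataFrame of strings, at the cost of failing on the first value that is not valid JSON.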

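【Discussion】:

  • With the first approach (literal_eval) on the whole dataset I get the error ValueError: malformed string, probably because of special characters. But the second approach with json.loads works perfectly, thanks.
  • Glad I could help.
【Solution 2】: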
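
One plausible reason for the difference reported above: ast.literal_eval only accepts Python literals, so JSON-only tokens such as null, true, or false make it raise ValueError, while json.loads maps them to None, True, and False. Whether that was the cause in the asker's full dataset is an assumption; the sample string below is made up for illustration.

```python
import json
from ast import literal_eval

# Made-up sample value containing JSON-only tokens (not from the original data)
raw = '{"ad_since": 2016, "seaside": null, "handicapped_access": true}'

try:
    literal_eval(raw)          # null and true are not Python literals
    literal_eval_ok = True
except ValueError:
    literal_eval_ok = False

parsed = json.loads(raw)       # JSON parser handles null and true
print(literal_eval_ok)
print(parsed)
```

When the strings come from a JSON API, json.loads is the safer default because it follows the JSON grammar rather than Python's.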

Since the values are strings, you can use the json module and a list comprehension:

In [20]: d = {u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'}

In [21]: import json

In [22]: pd.DataFrame(dict([(k, [json.loads(e)[k] for e in d.values()]) for k in json.loads(list(d.values())[0])]), index=list(d.keys()))
Out[22]: 
                            ad_since handicapped_access indoor_swimming_pool  \
"57e01311817bc367c030b390"      2016                Yes                   No   
"57e01311817bc367c030b3a8"      2012                Yes                   No   

                           seaside  
"57e01311817bc367c030b390"      No  
"57e01311817bc367c030b3a8"      No  
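
The nested comprehension above can be written more directly by building one parsed dict per row. This is a sketch rather than the answerer's own code; the final step, which strips the literal double quotes around each id, is an assumption about what the asker wants for the index.

```python
import json
import pandas as pd

d = {
    '"57e01311817bc367c030b390"': '{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}',
    '"57e01311817bc367c030b3a8"': '{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}',
}

# One parsed dict per row; keys() and values() iterate in the same order
records = [json.loads(value) for value in d.values()]
df = pd.DataFrame(records, index=list(d.keys()))

# Optional (assumption): drop the double quotes that surround each id
df.index = df.index.str.strip('"')
print(df)
```

Passing a list of dicts lets pandas derive the columns from the inner keys, so the code does not need to read the key names from the first record.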
