【Question title】: Create a pandas DataFrame from a dictionary of dictionaries
【Posted】: 2016-01-14 10:36:10
【Question】:

I have a dictionary of dictionaries of the form:

{'user': {movie: rating}}

For example,

{'Jill': {'Avenger: Age of Ultron': 7.0,
          'Django Unchained': 6.5,
          'Gone Girl': 9.0,
          'Kill the Messenger': 8.0},
 'Toby': {'Avenger: Age of Ultron': 8.5,
          'Django Unchained': 9.0,
          'Zoolander': 2.0}}

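I want to convert this dictionary into a pandas DataFrame in which the first column is the user name and the remaining columns are the movie ratings, i.e.

user  Gone_Girl  Horrible_Bosses_2  Django_Unchained  Zoolander  etc.

However, some users did not rate every movie, so those movies are missing from that user's inner dictionary. In those cases it would be best to fill the entries with NaN.

So far I iterate over the keys, fill a list, and then create a DataFrame from that list: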

data = []
for key in movie_user_preferences:
    try:
        data.append((key,
                     movie_user_preferences[key]['Gone Girl'],
                     movie_user_preferences[key]['Horrible Bosses 2'],
                     movie_user_preferences[key]['Django Unchained'],
                     movie_user_preferences[key]['Zoolander'],
                     movie_user_preferences[key]['Avenger: Age of Ultron'],
                     movie_user_preferences[key]['Kill the Messenger']))
    # a missing movie raises KeyError, so the whole user is skipped
    except KeyError:
        pass
df = pd.DataFrame(data=data, columns=['user', 'Gone_Girl', 'Horrible_Bosses_2', 'Django_Unchained', 'Zoolander', 'Avenger_Age_of_Ultron', 'Kill_the_Messenger'])

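But this only gives me a DataFrame containing the users who rated every movie in the set.

My goal is, first, to build the data list by iterating over the movie labels (rather than the brute-force approach shown above) and, second, to create a DataFrame that includes all users, with null values where a user has no rating for a movie.

【Comments】:

    Tags: dictionary pandas dataframe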


    【Solution 1】:

    You can pass the dict of dicts to the DataFrame constructor:

    In [11]: d = {'Jill': {'Django Unchained': 6.5, 'Gone Girl': 9.0, 'Kill the Messenger': 8.0, 'Avenger: Age of Ultron': 7.0}, 'Toby': {'Django Unchained': 9.0, 'Zoolander': 2.0, 'Avenger: Age of Ultron': 8.5}}
    
    In [12]: pd.DataFrame(d)
    Out[12]:
                            Jill  Toby
    Avenger: Age of Ultron   7.0   8.5
    Django Unchained         6.5   9.0
    Gone Girl                9.0   NaN
    Kill the Messenger       8.0   NaN
    Zoolander                NaN   2.0
    

    Or use the from_dict method:

    In [13]: pd.DataFrame.from_dict(d)
    Out[13]:
                            Jill  Toby
    Avenger: Age of Ultron   7.0   8.5
    Django Unchained         6.5   9.0
    Gone Girl                9.0   NaN
    Kill the Messenger       8.0   NaN
    Zoolander                NaN   2.0
    
    In [14]: pd.DataFrame.from_dict(d, orient='index')
    Out[14]:
          Django Unchained  Gone Girl  Kill the Messenger  Avenger: Age of Ultron  Zoolander
    Jill               6.5          9                   8                     7.0        NaN
    Toby               9.0        NaN                 NaN                     8.5          2
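The question also lists a movie ('Horrible Bosses 2') that neither sample user has rated, so it never shows up as a column in the output above. A minimal sketch, assuming you already know the full list of movie labels (the `movies` list below is taken from the column names in the question), is to reindex the columns after building the frame:

```python
import pandas as pd

# Sample data from the question.
d = {
    'Jill': {'Avenger: Age of Ultron': 7.0, 'Django Unchained': 6.5,
             'Gone Girl': 9.0, 'Kill the Messenger': 8.0},
    'Toby': {'Avenger: Age of Ultron': 8.5, 'Django Unchained': 9.0,
             'Zoolander': 2.0},
}

# Assumed: the complete set of movie labels, in the desired column order.
movies = ['Gone Girl', 'Horrible Bosses 2', 'Django Unchained',
          'Zoolander', 'Avenger: Age of Ultron', 'Kill the Messenger']

# One row per user; reindex adds any missing movie as an all-NaN column
# and puts the columns in the requested order.
df = pd.DataFrame.from_dict(d, orient='index').reindex(columns=movies)
print(df)
```

Reindexing keeps the column order stable regardless of the order in which movies appear in the inner dictionaries.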
    

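    【Comments】:

    • Is there a way to make the user names a separate column instead of the index?
    • pd.DataFrame.from_dict(d, orient='index').reset_index()
    • Is there a way to turn all of the information into columns? That is, column 1: Jill and Toby, column 2: all the movies, repeated for each of them (Toby and Jill), and so on...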
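The last comment has no reply in the thread. One possible reading is that it asks for a "long" layout with one row per (user, movie) pair. A rough sketch under that assumption, using `melt`; the column names `user`, `movie` and `rating` are illustrative choices, not something the thread specifies:

```python
import pandas as pd

# Sample data from the answer above.
d = {
    'Jill': {'Django Unchained': 6.5, 'Gone Girl': 9.0,
             'Kill the Messenger': 8.0, 'Avenger: Age of Ultron': 7.0},
    'Toby': {'Django Unchained': 9.0, 'Zoolander': 2.0,
             'Avenger: Age of Ultron': 8.5},
}

# Wide layout: name the index 'user' first, so reset_index yields a
# column called 'user' rather than the default 'index'.
wide = pd.DataFrame.from_dict(d, orient='index').rename_axis('user').reset_index()

# Long layout: one row per user/movie pair; drop pairs with no rating.
tidy = (wide.melt(id_vars='user', var_name='movie', value_name='rating')
            .dropna(subset=['rating'])
            .sort_values(['user', 'movie'])
            .reset_index(drop=True))
print(tidy)
```

Omit the `dropna` step if unrated movies should stay in the result as NaN rows.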
    【Solution 2】:

    This brute-force approach also seems to work, but I still think iterating over the movie labels would be more robust.

    data = []
    nan = float('nan')  # a real NaN, not the string 'NaN', so the columns stay numeric
    for key in movie_user_preferences:
        ratings = movie_user_preferences[key]
        # dict.get returns the fallback when the user has not rated the movie
        data.append((key,
                     ratings.get('Gone Girl', nan),
                     ratings.get('Horrible Bosses 2', nan),
                     ratings.get('Django Unchained', nan),
                     ratings.get('Zoolander', nan),
                     ratings.get('Avenger: Age of Ultron', nan),
                     ratings.get('Kill the Messenger', nan)))
    df = pd.DataFrame(data=data, columns=['user', 'Gone_Girl', 'Horrible_Bosses_2', 'Django_Unchained', 'Zoolander', 'Avenger_Age_of_Ultron', 'Kill_the_Messenger'])
    
    
     user Gone_Girl Horrible_Bosses_2  Django_Unchained Zoolander  \
     0      Sam         6                 3               7.5         7   
     1      Max        10                 6               7.0        10   
     2   Robert       NaN                 5               7.0         9   
     3     Toby       NaN               NaN               9.0         2   
     4    Julia       6.5               NaN               6.0       6.5   
     5  William         7                 4               8.0         4   
     6     Jill         9               NaN               6.5       NaN   
    
     Avenger_Age_of_Ultron Kill_the_Messenger  
     0                   10.0                5.5  
     1                    7.0                  5  
     2                    8.0                  9  
     3                    8.5                NaN  
     4                   10.0                  6  
     5                    6.0                6.5  
     6                    7.0                  8  
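The answer notes that iterating over the movie labels would be more robust than spelling out each movie by hand. A minimal sketch of that idea, assuming the labels can be collected from the data itself; only the two sample users from the question are used here, and the helper names `movies`, `rows` and `columns` are illustrative:

```python
import pandas as pd

# Two sample users from the question; the full data set is not shown there.
movie_user_preferences = {
    'Jill': {'Avenger: Age of Ultron': 7.0, 'Django Unchained': 6.5,
             'Gone Girl': 9.0, 'Kill the Messenger': 8.0},
    'Toby': {'Avenger: Age of Ultron': 8.5, 'Django Unchained': 9.0,
             'Zoolander': 2.0},
}

# Collect every movie label that appears for any user.
movies = sorted({movie
                 for ratings in movie_user_preferences.values()
                 for movie in ratings})

# Build one row per user, looping over the labels instead of naming them.
rows = []
for user, ratings in movie_user_preferences.items():
    rows.append([user] + [ratings.get(movie, float('nan')) for movie in movies])

# Column names in the underscore style used in the question.
columns = ['user'] + [m.replace(': ', '_').replace(' ', '_') for m in movies]
df = pd.DataFrame(rows, columns=columns)
print(df)
```

Because missing ratings are real NaN values rather than strings, the rating columns keep a float dtype.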
    

    【Comments】:
