Convert a pandas DataFrame to a dictionary: answers

【Question title】: Convert a pandas DataFrame to a dictionary
【Posted】: 2018-07-14 06:33:09
【Question description】:

I have a pandas DataFrame like the following:

import pandas as pd
df=pd.DataFrame({'a':['red','yellow','blue'], 'b':[0,0,1], 'c':[0,1,0], 'd':[1,0,0]})
df

which looks like

    a       b   c   d
0   red     0   0   1
1   yellow  0   1   0
2   blue    1   0   0

I want to convert it to a dictionary so that I get:

red     d
yellow  c
blue    b

The dataset is large, so please avoid any iterative approach. I have not come up with a solution yet. Any help is appreciated.

【Question discussion】:

Possible duplicate of Convert a Pandas DataFrame to a dictionary
pandas.pydata.org/pandas-docs/stable/generated/… Subset your data, then call to_dict, which pandas provides out of the box
Can a single row contain two 1s?
@tai: There is only one 1 per row

Tags: python python-3.x pandas dictionary dataframe

【Solution 1】:

First, if you really want to convert it to a dictionary, it is a bit cleaner to make the values you want as keys the index of the DataFrame:

df.set_index('a', inplace=True)

This looks like:

        b  c  d
a              
red     0  0  1
yellow  0  1  0
blue    1  0  0

Your data appears to be one-hot encoded. You first have to reverse that, using the method detailed here:

series = df.idxmax(axis=1)

This looks like:

a
red       d
yellow    c
blue      b
dtype: object

Almost there! Now call to_dict on the resulting Series (this is where setting column a as the index pays off):

series.to_dict()

This looks like:

{'blue': 'b', 'red': 'd', 'yellow': 'c'}

I think this is what you are looking for. As a one-liner:

df.set_index('a').idxmax(axis=1).to_dict()
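
One caveat with the one-liner: idxmax returns the first column holding the row maximum, so a row of all zeros or a row with several 1s would still get a label silently. The sketch below is one possible way to check the "exactly one 1 per row" assumption first. The names onehot, ones_per_row and mapping are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'],
                   'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})

# Use column a as the index so it becomes the dictionary keys.
onehot = df.set_index('a')

# Count the 1s in each row and refuse to continue if any row
# does not contain exactly one of them.
ones_per_row = onehot.eq(1).sum(axis=1)
if not ones_per_row.eq(1).all():
    raise ValueError("each row must contain exactly one 1")

# Label of the column holding the 1 in each row, as a dict.
mapping = onehot.idxmax(axis=1).to_dict()
print(mapping)  # {'red': 'd', 'yellow': 'c', 'blue': 'b'}
```

The check is vectorized as well, so it should not add a row-by-row loop on a large frame.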

【Discussion】:

Great explanation. I like the simple steps you took.

【Solution 2】:

You can try this.

df = df.set_index('a')
df.where(df > 0).stack().reset_index().drop(0, axis=1)


    a   level_1
0   red     d
1   yellow  c
2   blue    b
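
The result above is still a DataFrame rather than a dictionary. A minimal sketch of one way to finish the job, assuming the same example data, is shown below. The explicit dropna() is an addition here, because whether stack() drops NaN on its own depends on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'],
                   'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})

onehot = df.set_index('a')

# Turn the zeros into NaN, reshape to long form, and drop the NaN entries
# so that only the (a, column) pairs that held a 1 remain.
stacked = onehot.where(onehot > 0).stack().dropna()

# The MultiIndex now holds one (a, column) tuple per original row.
result = dict(stacked.index.tolist())
print(result)
```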

【Discussion】:

【Solution 3】:

dot and zip are what you need here

dict(zip(df.a,df.iloc[:,1:].dot(df.iloc[:,1:].columns)))
Out[508]: {'blue': 'b', 'red': 'd', 'yellow': 'c'}
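
Why dot works here may not be obvious, so here is a small sketch of the mechanism. In Python, multiplying a string by 0 gives an empty string and multiplying it by 1 gives the string itself, so the dot product of a one-hot row with the column labels is just the label at the position of the 1. Passing the labels as an object-dtype NumPy array is a choice made for this sketch, not something the answer above does.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'],
                   'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})

onehot = df.set_index('a')

# The first row [0, 0, 1] dotted with ['b', 'c', 'd'], written out by hand.
by_hand = 0 * 'b' + 0 * 'c' + 1 * 'd'

# With object dtype, NumPy falls back to Python's * and + per element,
# which is what makes the string "dot product" possible.
labels = np.array(onehot.columns, dtype=object)
result = onehot.dot(labels).to_dict()
print(by_hand, result)
```

Note that this only gives clean labels when each row holds a single 1. A row with two 1s would produce concatenated labels such as 'bc'.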

【Discussion】:

Or maybe just df.set_index('a').dot(df.columns[1:]).to_dict()

【Solution 4】:

Hope this works:

import pandas as pd
df=pd.DataFrame({'a':['red','yellow','blue'], 'b':[0,0,1], 'c':[0,1,0], 'd':[1,0,0]})

df['e'] = df.iloc[:,1:].idxmax(axis = 1)

newdf = df[["a","e"]]

print (newdf.to_dict(orient='index'))

Output:

{0: {'a': 'red', 'e': 'd'}, 1: {'a': 'yellow', 'e': 'c'}, 2: {'a': 'blue', 'e': 'b'}}
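
The nested dictionary above is keyed by row number, which is not the flat color-to-column mapping asked for in the question. As a rough sketch, the same two columns can be turned into that mapping by using a as the index of a Series before calling to_dict. The variable name result is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'],
                   'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})

# Column e holds the label of the column containing the 1 in each row.
df['e'] = df.iloc[:, 1:].idxmax(axis=1)

# A Series indexed by a converts directly to {a_value: e_value}.
result = df.set_index('a')['e'].to_dict()
print(result)  # {'red': 'd', 'yellow': 'c', 'blue': 'b'}
```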

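【Discussion】:

Yes, I am using python 2.7
It is tagged 3.x. The output does not look like what the OP wanted.
It seems I forgot to use the axis column. I also checked it on python3, and it works fine.
@bhushan, thanks for your answer, but the output is not correct.. I want a different format

【Solution 5】:

You can convert the dataframe to a dict using pandas to_dict with list as the argument. Then iterate over the resulting dict and pick the column label whose value is 1.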

>>> {k:df.columns[1:][v.index(1)] for k,v in df.set_index('a').T.to_dict('list').items()}
{'yellow': 'c', 'blue': 'b', 'red': 'd'}

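【Discussion】:

Thanks for your solution, but it is an iterative solution and is slow for my large dataset.

【Solution 6】:

Set column a as the index, then go through the rows of df to find the index label where the value is 1, then use to_dict to convert the resulting Series to a dictionary

Here is the code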

df.set_index('a').apply(lambda row:row[row==1].index[0],axis=1).to_dict()

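Or set the index to a, then use idxmax to find the column label of the maximum value in each row (in current pandas, argmax returns the integer position rather than the label), then use to_dict to convert to a dictionary

df.set_index('a').apply(lambda row:row.idxmax(),axis=1).to_dict()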

In both cases the result is

{'blue': 'b', 'red': 'd', 'yellow': 'c'}

P.S. I use apply with axis=1 to iterate over the rows of df
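
Since apply with axis=1 still visits the rows one by one, here is a sketch of the same argmax idea applied to the underlying NumPy array instead, which may scale better on a large frame. It assumes each row contains exactly one 1, and the names positions and labels are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'],
                   'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})

onehot = df.set_index('a')

# Integer position of the largest value in every row, computed in one call.
positions = onehot.to_numpy().argmax(axis=1)

# Map the positions back to column labels with array indexing.
labels = onehot.columns.to_numpy()[positions]

# Pair each index value with its label.
result = dict(zip(onehot.index, labels))
print(result)
```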

【Discussion】: