Question title: Convert a `dict[str, list[any]]` into a binary `pandas.DataFrame`
Posted: 2023-02-03 01:04:08
Question:

I have the following dictionary:

d = {
    "anna": ["apple", "strawberry", "banana"],
    "bob": ["strawberry", "banana", "peach"],
    "chris": ["apple", "banana", "peach", "mango"]
}

I would like to convert it into the following pandas.DataFrame:

       apple banana mango peach strawberry
anna       1      1     0     0          1
bob        0      1     0     1          1
chris      1      1     1     1          0

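Doing this in plain Python isn't very complicated (see below), but I'm wondering whether pandas already has something that does it automatically (or whether the implementation below can be optimized).

Thanks in advance!


Current Python implementation: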

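import numpy as np
import pandas as pd

d = {
    "anna": ["apple", "strawberry", "banana"],
    "bob": ["strawberry", "banana", "peach"],
    "chris": ["apple", "banana", "peach", "mango"]
}
# np.hstack needs a real sequence (list/tuple), not a dict_values view
fruits = sorted(set(np.hstack(list(d.values()))))
rows = []
for client, client_fruits in d.items():
    # 1 if the client has this fruit, 0 otherwise
    s = pd.Series({
        fruit: fruit in client_fruits for fruit in fruits
    }).astype(int)
    # Transpose so the client becomes the row label
    rows.append(pd.DataFrame({client: s}).T)
# Concatenate all one-row frames once instead of growing df inside the loop
df = pd.concat(rows)
print(df)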
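On the "can it be optimized" part: a minimal sketch, assuming only pandas and the standard library (no NumPy), that computes the 0/1 flags in plain Python and builds the DataFrame with a single `DataFrame.from_dict` call, so nothing is concatenated row by row. The names `rows` and `owned` are illustrative.

```python
import pandas as pd

d = {
    "anna": ["apple", "strawberry", "banana"],
    "bob": ["strawberry", "banana", "peach"],
    "chris": ["apple", "banana", "peach", "mango"],
}

# All distinct fruits, sorted alphabetically, become the columns
fruits = sorted({fruit for client_fruits in d.values() for fruit in client_fruits})

# One list of 0/1 flags per client, built in plain Python first
rows = {}
for client, client_fruits in d.items():
    owned = set(client_fruits)  # set gives O(1) membership tests
    rows[client] = [int(fruit in owned) for fruit in fruits]

# Construct the DataFrame once: keys become the index, lists become rows
df = pd.DataFrame.from_dict(rows, orient="index", columns=fruits)
print(df)
```

Growing a DataFrame with `pd.concat` inside a loop copies the accumulated data on every iteration, so collecting the rows first and constructing the frame once should scale better as the number of clients grows.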

Discussion:

    Tags: python pandas dataframe


    Solution 1:

    One option, using str.get_dummies:

    out = pd.Series({k: '|'.join(v) for k,v in d.items()}).str.get_dummies()
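To see why the `str.get_dummies` one-liner works, here it is split into its two steps (a sketch; `joined` is just an illustrative name). `Series.str.get_dummies` splits each string on `sep` (default `'|'`) and creates one 0/1 column per distinct token, with the columns sorted alphabetically.

```python
import pandas as pd

d = {
    "anna": ["apple", "strawberry", "banana"],
    "bob": ["strawberry", "banana", "peach"],
    "chris": ["apple", "banana", "peach", "mango"],
}

# Step 1: collapse each person's list into one delimited string,
# e.g. "apple|strawberry|banana" for anna
joined = pd.Series({k: "|".join(v) for k, v in d.items()})

# Step 2: split on the same separator and one-hot encode the tokens
out = joined.str.get_dummies(sep="|")
print(out)
```

This relies on `'|'` never appearing inside an item name; if it could, use a different separator in both the `join` and the `sep` argument.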
    

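    Or with from_dict and pandas.get_dummies:

    # dtype=int keeps the 0/1 output on pandas >= 2.0, where get_dummies defaults to bool
    out = (pd.get_dummies(pd.DataFrame.from_dict(d, orient='index').stack(), dtype=int)
             .groupby(level=0).max()
           )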
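A variant of the `get_dummies` approach that swaps `DataFrame.from_dict(...).stack()` for `Series.explode()` to produce the long (person, fruit) format. This is a suggested alternative, not part of the original answer. Passing `dtype=int` matters on pandas 2.0 and later, where `get_dummies` returns boolean columns by default.

```python
import pandas as pd

d = {
    "anna": ["apple", "strawberry", "banana"],
    "bob": ["strawberry", "banana", "peach"],
    "chris": ["apple", "banana", "peach", "mango"],
}

# Series of lists -> one row per (person, fruit); the person stays as the index
long = pd.Series(d).explode()

# One-hot encode the fruit values as 0/1 integers
dummies = pd.get_dummies(long, dtype=int)

# Collapse the repeated index back to one row per person
out = dummies.groupby(level=0).max()
print(out)
```

`explode` avoids the padding with missing values that `from_dict(orient='index')` introduces for lists of different lengths, so there is nothing for `stack` to drop.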
    

    Or using crosstab:

    out = pd.crosstab(*zip(*((k,v) for k,l in d.items() for v in l))).clip(upper=1)
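The `crosstab` one-liner unpacks the (person, fruit) pairs into two parallel sequences with `zip(*...)`. A more explicit sketch of the same idea follows; `people` and `items` are illustrative names. Because the inputs are unnamed, `crosstab` labels the axes `row_0` and `col_0`, and `rename_axis` clears those labels so the printed frame matches the output shown here.

```python
import pandas as pd

d = {
    "anna": ["apple", "strawberry", "banana"],
    "bob": ["strawberry", "banana", "peach"],
    "chris": ["apple", "banana", "peach", "mango"],
}

# Flatten into two parallel lists: who owns it, and which fruit
people = [person for person, fruits in d.items() for _ in fruits]
items = [fruit for fruits in d.values() for fruit in fruits]

# Count (person, fruit) pairs; clip(upper=1) turns any duplicate counts into 1
out = pd.crosstab(people, items).clip(upper=1)

# Drop the default "row_0"/"col_0" axis names
out = out.rename_axis(index=None, columns=None)
print(out)
```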
    

    Output:

           apple  banana  mango  peach  strawberry
    anna       1       1      0      0           1
    bob        0       1      0      1           1
    chris      1       1      1      1           0
    
