【Question title】: How to hash the strings of a dataframe in Python?
【Posted】: 2021-04-15 20:15:20
【Question description】:

I need to hash, in some way, the strings of a dataframe field.

I have this df:

import numpy as np
import pandas as pd

cars =            ['Tesla', 'Renault', 'Tesla', 'Fiat', 'Audi', 'Tesla', 'Mercedes', 'Mercedes']
included_colors = ['red', 'green', np.nan, np.nan, 'yellow', 'black', np.nan, 'orange']
data = {'Cars': cars, 'Included Colors': included_colors}
df = pd.DataFrame(data, columns=['Cars', 'Included Colors'])

It looks like this:

       Cars Included Colors
0     Tesla             red
1   Renault           green
2     Tesla             NaN
3      Fiat             NaN
4      Audi          yellow
5     Tesla           black
6  Mercedes             NaN
7  Mercedes          orange

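I am trying to build a dictionary, or some other data structure that is useful here, so that I can eventually match each car with all of its associated colors, as in this example: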

Tesla - red, black
Renault - green
Fiat - np.nan
Audi - yellow
Mercedes - orange

I tried this code, but I don't know how to continue:

all_cars = df['Cars'].tolist() # extract all the cars from the df in a list
all_cars = list(dict.fromkeys(all_cars)) # make them unique

dis = {}
for car in all_cars:
    mask = (df['Cars'] == car)
    dis[df.loc[mask, 'Cars']] = df.loc[mask, 'Included Colors']

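It doesn't have to be a dictionary; it can be anything, as long as it matches all of these keys and values. I just thought this data structure would fit.

How can I do this? Thanks a lot!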

【Question comments】:

Pandas isn't helping you here. It would be easier (and quite simple) to create the dictionary from the original lists.
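The comment above suggests skipping pandas and building the dictionary straight from the two lists. A minimal sketch of that idea follows; the helper name `group_colors` is hypothetical, and `float('nan')` stands in for `np.nan` so the snippet needs no third-party imports. Note one assumption: a car with no colors at all (Fiat) ends up with an empty list rather than `np.nan`.

```python
import math

nan = float('nan')  # stand-in for np.nan (assumption: avoids importing numpy)
cars = ['Tesla', 'Renault', 'Tesla', 'Fiat', 'Audi', 'Tesla', 'Mercedes', 'Mercedes']
included_colors = ['red', 'green', nan, nan, 'yellow', 'black', nan, 'orange']


def group_colors(cars, colors):
    """Map each car to the list of its non-missing colors, in first-seen order."""
    grouped = {}
    for car, color in zip(cars, colors):
        # Create the entry even when the color is missing, so every car appears.
        bucket = grouped.setdefault(car, [])
        # Skip missing values: NaN is a float, so test for it explicitly.
        if not (isinstance(color, float) and math.isnan(color)):
            bucket.append(color)
    return grouped


colors_by_car = group_colors(cars, included_colors)
print(colors_by_car)
# {'Tesla': ['red', 'black'], 'Renault': ['green'], 'Fiat': [], 'Audi': ['yellow'], 'Mercedes': ['orange']}
```

Since the car names are strings, they are hashable and can be used as dictionary keys directly; no explicit hashing step is needed.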

Tags: python pandas dataframe data-structures

【Solution 1】:

You can use groupby() and aggregate to list, then build the output dictionary:

x = df.groupby("Cars", as_index=False).agg(list)
out = dict(zip(x.Cars, x["Included Colors"]))
print(out)

Prints:

{'Audi': ['yellow'], 'Fiat': [nan], 'Mercedes': [nan, 'orange'], 'Renault': ['green'], 'Tesla': ['red', nan, 'black']}
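For comparison, the loop from the question can probably be made to work with a small change. As written, it uses `df.loc[mask, 'Cars']` (a Series) as the dictionary key, and a Series is not hashable, so the assignment raises a `TypeError`. The sketch below keys on the scalar `car` instead and stores the colors as a plain list; it is one possible fix, not necessarily what the asker intended.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Cars': ['Tesla', 'Renault', 'Tesla', 'Fiat', 'Audi', 'Tesla', 'Mercedes', 'Mercedes'],
    'Included Colors': ['red', 'green', np.nan, np.nan, 'yellow', 'black', np.nan, 'orange'],
})

dis = {}
for car in df['Cars'].unique():  # unique() keeps first-seen order
    mask = df['Cars'] == car
    # Key on the scalar string `car`; strings are hashable, a Series is not.
    dis[car] = df.loc[mask, 'Included Colors'].tolist()

print(dis)
```

This gives the same lists as the groupby() version (NaN values included), with keys in order of first appearance rather than sorted.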

A shorter version, thanks to @QuangHoang:

print(df.groupby("Cars")['Included Colors'].agg(list).to_dict())
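The output above keeps NaN inside the lists, whereas the example in the question lists only the real colors and shows NaN only for a car that has none (Fiat). A possible variant, sketched under that reading of the question, drops missing values per group and falls back to `[np.nan]` when nothing remains. The helper name `colors_or_nan` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Cars': ['Tesla', 'Renault', 'Tesla', 'Fiat', 'Audi', 'Tesla', 'Mercedes', 'Mercedes'],
    'Included Colors': ['red', 'green', np.nan, np.nan, 'yellow', 'black', np.nan, 'orange'],
})


def colors_or_nan(series):
    """Return the non-missing colors, or [np.nan] when the car has none."""
    present = series.dropna().tolist()
    return present if present else [np.nan]


# Iterating over the grouped column yields (car, Series of colors) pairs.
out = {car: colors_or_nan(colors)
       for car, colors in df.groupby('Cars')['Included Colors']}
print(out)
```

With this variant, Tesla maps to `['red', 'black']` and Mercedes to `['orange']`, while Fiat still maps to a single-element list holding NaN.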

【Comments】:

Thanks a lot! This is great! I'll accept the answer in 6 minutes.
df.groupby("Cars")['Included Colors'].agg(list).to_dict().