Manipulate a defaultdict like a pd DataFrame

【Question title】: Manipulate a defaultdict like a pd DataFrame
【Posted】: 2021-11-27 12:24:36
【Question description】:

I have a DataFrame that I convert into a defaultdict, with the 'id' column as the key and the remaining columns as the values, so I do the following:

from collections import defaultdict
import pandas as pd

d = {'id': [1,1,1,1,2,2,3,3,3,4,4,4,4],
     'label': ['A','A','B','G','A','BB','C','C','A','BB','B','AA','AA'],
     'amount': [2,-12,12,-12,5,-5,2,3,5,3,3,10,10]}
df = pd.DataFrame(d)

dd = defaultdict(list)

# Turn df into a dictionary grouped by 'id'
for index, row in df.iterrows():
    dd[row["id"]].append(
        {
            "id": row["id"],  # included so the records match the output below
            "description": row["label"],
            "amount": row["amount"],
        })
dd

defaultdict(list,
            {1: [{'id': 1, 'description': 'A', 'amount': 2},
                 {'id': 1, 'description': 'A', 'amount': -12},
                 {'id': 1, 'description': 'B', 'amount': 12},
                 {'id': 1, 'description': 'G', 'amount': -12}],
             2: [{'id': 2, 'description': 'A', 'amount': 5},
                 {'id': 2, 'description': 'BB', 'amount': -5}],
             3: [{'id': 3, 'description': 'C', 'amount': 2},
                 {'id': 3, 'description': 'C', 'amount': 3},
                 {'id': 3, 'description': 'A', 'amount': 5}],
             4: [{'id': 4, 'description': 'BB', 'amount': 3},
                 {'id': 4, 'description': 'B', 'amount': 3},
                 {'id': 4, 'description': 'AA', 'amount': 10},
                 {'id': 4, 'description': 'AA', 'amount': 10}]})

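What I want to do is manipulate the dictionary the way I would a pandas DataFrame. For example, for each user I want to check whether 'description' and 'amount' are equal across that user's records. For this specific example, I expect the resulting dictionary to look like this: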

defaultdict(list,{4: [{'id': 4, 'description': 'AA', 'amount': 10},
                      {'id': 4, 'description': 'AA', 'amount': 10}]})

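【Comments】:

Hmm, so you need pandas methods on a defaultdict? What is the reason? Why not use pandas for the pandas methods and convert the output to a defaultdict in the last step?

Tags: python pandas list dataframe dictionary

【Solution 1】:

In my opinion, it is faster and simpler to use pandas methods on the DataFrame or Series rather than on the defaultdict. So your output can be obtained with: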

# compare all columns
df = df[df.duplicated(keep=False)]
# or, if you need to specify the columns
#df = df[df.duplicated(['id','label','amount'], keep=False)]
print(df)
    id label  amount
11   4    AA      10
12   4    AA      10

Finally, to get a dictionary keyed by id, use DataFrame.groupby:

d = df.groupby(df['id']).apply(lambda x: x.to_dict(orient='records')).to_dict()
print(d)
{4: [{'id': 4, 'label': 'AA', 'amount': 10}, {'id': 4, 'label': 'AA', 'amount': 10}]}
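
If a defaultdict keyed by id is what you need at the end, one possible way to finish is to build it directly from the filtered rows. This is only a sketch: the small frame below is a cut-down stand-in for the question's data, and renaming 'label' to 'description' is an assumption about which keys are wanted.

```python
from collections import defaultdict
import pandas as pd

# Cut-down stand-in for the question's data (ids 3 and 4 only).
df = pd.DataFrame({
    'id': [3, 3, 4, 4, 4],
    'label': ['C', 'A', 'BB', 'AA', 'AA'],
    'amount': [2, 5, 3, 10, 10],
})

# Keep every row that has an exact duplicate across all columns.
dupes = df[df.duplicated(keep=False)]

# Assumption: rename 'label' to 'description' to match the question's keys.
records = dupes.rename(columns={'label': 'description'}).to_dict(orient='records')

# Group the remaining records by id in a defaultdict.
result = defaultdict(list)
for rec in records:
    result[rec['id']].append(rec)

print(dict(result))
```

Doing the filtering in pandas first and converting last keeps the dictionary-building loop short, since only the rows that survive the duplicate check are iterated.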

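【Comments】:

This doesn't help me much, because I want to see how to do the manipulation inside the dictionary (for example, checking equality between one user's records), rather than using pandas operations such as duplicated the way you did. So I want to check for equality after I already have the dictionary.
@hippocampus - Sorry, then I don't know.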
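
For anyone who does want to stay with the dictionary, here is a minimal sketch of the equality check done in plain Python, without pandas. It assumes the records have the shape shown in the question. keep_duplicates is a hypothetical helper name, not part of any library, and the sample data is trimmed to ids 3 and 4.

```python
from collections import Counter, defaultdict

# Sample data in the shape shown in the question (ids 3 and 4 only).
dd = defaultdict(list, {
    3: [{'id': 3, 'description': 'C', 'amount': 2},
        {'id': 3, 'description': 'C', 'amount': 3},
        {'id': 3, 'description': 'A', 'amount': 5}],
    4: [{'id': 4, 'description': 'BB', 'amount': 3},
        {'id': 4, 'description': 'B', 'amount': 3},
        {'id': 4, 'description': 'AA', 'amount': 10},
        {'id': 4, 'description': 'AA', 'amount': 10}],
})


def keep_duplicates(groups, keys=('description', 'amount')):
    """Hypothetical helper: keep records whose `keys` values repeat within one id."""
    result = defaultdict(list)
    for group_id, records in groups.items():
        # Count how often each (description, amount) pair occurs for this id.
        counts = Counter(tuple(r[k] for k in keys) for r in records)
        dupes = [r for r in records if counts[tuple(r[k] for k in keys)] > 1]
        if dupes:  # skip ids that have no repeated records
            result[group_id] = dupes
    return result


filtered = keep_duplicates(dd)
print(dict(filtered))
# {4: [{'id': 4, 'description': 'AA', 'amount': 10}, {'id': 4, 'description': 'AA', 'amount': 10}]}
```

This mirrors what duplicated(keep=False) does, but per id and only on the chosen keys. Passing a different keys tuple changes which fields must match.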