Find parent id and child id based on condition: answers

【Question title】: find parent id and child id based on condition
【Posted】: 2022-09-27 15:16:24
【Question description】:

I have a sample DataFrame:

id    mobile_number    to_be_deleted
 1       123                0
 2       123                1
 3       123                1
 4       456                0
 5       789                0
 6       789                1
 7       1234               0
 8       789                1
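
For anyone who wants to reproduce the example, the frame above could be built roughly as follows. The integer dtypes are an assumption, since the post only shows the printed values.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: all three columns are integers; the post does not state dtypes.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "mobile_number": [123, 123, 123, 456, 789, 789, 1234, 789],
    "to_be_deleted": [0, 1, 1, 0, 0, 1, 0, 1],
})
```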

Rows where to_be_deleted is 0 are the parents: their id becomes the parent_id. The ids of the other rows with the same mobile_number become the child ids.

I expect the DataFrame above to be summarized as

parent_id       child_id
    1              2, 3
    4              null
    5              6, 8
    7              null

What is the best way to summarize the DataFrame?

Tags: python pandas dataframe

【Solution 1】:

You can use pd.crosstab with a string join as the aggfunc:

# Cast the ids to str so that ','.join also works when id is an integer column.
pd.crosstab(index=df['mobile_number'], columns=df['to_be_deleted'], values=df['id'],
            aggfunc=lambda x: ','.join(x.astype(str)))

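Output (the row order suggests mobile_number was stored as strings; with integers, 1234 would sort last):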

to_be_deleted  0    1
mobile_number
123            1  2,3
1234           7  NaN
456            4  NaN
789            5  6,8
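
To get from this table to the parent_id / child_id layout asked for in the question, the columns can be renamed and the index dropped. The following is a minimal sketch, assuming the integer sample data from the question; the names `ct` and `summary` are only illustrative.

```python
import pandas as pd

# Assumption: sample data from the question, with integer columns.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "mobile_number": [123, 123, 123, 456, 789, 789, 1234, 789],
    "to_be_deleted": [0, 1, 1, 0, 0, 1, 0, 1],
})

# Cast ids to str up front so the join never sees integers.
ct = pd.crosstab(
    index=df["mobile_number"],
    columns=df["to_be_deleted"],
    values=df["id"].astype(str),
    aggfunc=lambda s: ",".join(s),
)

# The crosstab columns are labelled by the to_be_deleted values 0 and 1.
summary = (
    ct.rename(columns={0: "parent_id", 1: "child_id"})
      .rename_axis(columns=None)   # drop the "to_be_deleted" columns name
      .reset_index(drop=True)      # mobile_number is not wanted in the result
)
print(summary)
```

Parents without children keep NaN in child_id, which corresponds to the null shown in the expected output.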

【Discussion】:

【Solution 2】:

One approach is as follows.

We start by preparing the parent_id column

df_new = df[df['to_be_deleted'] == 0].groupby('mobile_number')['id'].apply(lambda x: ','.join(x.astype(str))).reset_index()

[Out]:
   mobile_number id
0            123  1
1            456  4
2            789  5
3           1234  7

One more thing to do: rename the id column to parent_id. This can be done with pandas.DataFrame.rename, as follows

df_new.rename(columns={'id': 'parent_id'}, inplace=True)

[Out]:
   mobile_number parent_id
0            123         1
1            456         4
2            789         5
3           1234         7

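Now that parent_id is ready, child_id can be derived from the mobile_number column: for each number, join the ids of the rows where to_be_deleted is 1.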

df_new['child_id'] = df_new['mobile_number'].apply(lambda x: ','.join(df[(df['mobile_number'] == x) & (df['to_be_deleted'] == 1)]['id'].astype(str)))

[Out]:
   mobile_number parent_id child_id
0            123         1      2,3
1            456         4         
2            789         5      6,8
3           1234         7
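
The lambda above filters the whole frame once per mobile number. As a possible alternative (not part of the original answer), the child ids can be grouped once and then looked up with Series.map. This is a sketch assuming the integer sample data from the question; `parents`, `child_rows`, `children` and `result` are illustrative names. Unlike the steps above, it keeps one row per parent row and leaves parent_id as an integer.

```python
import pandas as pd

# Assumption: sample data from the question, with integer columns.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "mobile_number": [123, 123, 123, 456, 789, 789, 1234, 789],
    "to_be_deleted": [0, 1, 1, 0, 0, 1, 0, 1],
})

# Parent rows: to_be_deleted == 0.
parents = (
    df.loc[df["to_be_deleted"] == 0, ["mobile_number", "id"]]
      .rename(columns={"id": "parent_id"})
)

# One comma-separated string of child ids per mobile_number.
child_rows = df[df["to_be_deleted"] == 1]
children = (
    child_rows["id"].astype(str)
    .groupby(child_rows["mobile_number"])
    .agg(lambda s: ",".join(s))
)

# map() looks each mobile_number up in `children`;
# numbers without any child rows become NaN.
parents["child_id"] = parents["mobile_number"].map(children)
result = parents.drop(columns="mobile_number").reset_index(drop=True)
print(result)
```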

Finally, since the mobile_number column is not wanted, it can be removed with pandas.DataFrame.pop (which drops the column in place and returns it)

df_new.pop('mobile_number')

[Out]:
  parent_id child_id
0         1      2,3
1         4         
2         5      6,8
3         7

Everything can also be wrapped in a function, as follows

def summarization(df):

    df_new = df[df['to_be_deleted'] == 0].groupby('mobile_number')['id'].apply(lambda x: ','.join(x.astype(str))).reset_index()

    df_new.rename(columns={'id': 'parent_id'}, inplace=True)

    df_new['child_id'] = df_new['mobile_number'].apply(lambda x: ','.join(df[(df['mobile_number'] == x) & (df['to_be_deleted'] == 1)]['id'].astype(str)))

    df_new.pop('mobile_number')

    return df_new

Then all that remains is to apply the function to the DataFrame

df_new = summarization(df)

[Out]:
  parent_id child_id
0         1      2,3
1         4         
2         5      6,8
3         7
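
Note that parents without children end up with an empty string in child_id, whereas the question asks for null. If that distinction matters, the empty strings could be turned into missing values afterwards. The sketch below repeats the function in condensed form so that it runs on its own, and assumes the integer sample data from the question.

```python
import pandas as pd

# Assumption: sample data from the question, with integer columns.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "mobile_number": [123, 123, 123, 456, 789, 789, 1234, 789],
    "to_be_deleted": [0, 1, 1, 0, 0, 1, 0, 1],
})

def summarization(df):
    # Same steps as the function above, chained for brevity.
    df_new = (
        df[df["to_be_deleted"] == 0]
        .groupby("mobile_number")["id"]
        .apply(lambda x: ",".join(x.astype(str)))
        .reset_index()
        .rename(columns={"id": "parent_id"})
    )
    df_new["child_id"] = df_new["mobile_number"].apply(
        lambda x: ",".join(
            df[(df["mobile_number"] == x) & (df["to_be_deleted"] == 1)]["id"].astype(str)
        )
    )
    df_new.pop("mobile_number")
    return df_new

df_new = summarization(df)

# mask() replaces entries where the condition is True with NaN,
# so parents without children show a missing value instead of ''.
df_new["child_id"] = df_new["child_id"].mask(df_new["child_id"] == "")
print(df_new)
```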

【Discussion】: