Comma-separated values in a column as rows in pandas

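【Question title】: Comma-separated values in a column as rows in pandas
【Posted】: 2022-01-18 07:13:24
【Question】:

I have a DataFrame in pandas, shown below, where the value in the info column is the same for every row that belongs to the same file in the id column: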

id   text         info
1    great        boy,police
1    excellent    boy,police
2    nice         girl,mother,teacher
2    good         girl,mother,teacher
2    bad          girl,mother,teacher
3    awesome      grandmother
4    superb       grandson

I just want the list elements spread over the rows of each file, one element per row, like this:

id   text         info
1    great        boy
1    excellent    police
2    nice         girl
2    good         mother
2    bad          teacher
3    awesome      grandmother
4    superb       grandson
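
To try the answers below, the example frame can be rebuilt by hand. This is only a minimal sketch: the name `df` matches what the answers use, while `expected` is an illustrative name that does not appear in the original post.

```python
import pandas as pd

# Sample data copied from the question: info repeats on every row of an id.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 4],
    "text": ["great", "excellent", "nice", "good", "bad", "awesome", "superb"],
    "info": [
        "boy,police", "boy,police",
        "girl,mother,teacher", "girl,mother,teacher", "girl,mother,teacher",
        "grandmother", "grandson",
    ],
})

# Desired result: one element of the comma-separated list per row.
expected = df.assign(
    info=["boy", "police", "girl", "mother", "teacher", "grandmother", "grandson"]
)
print(df)
```

Each answer can then be run on a copy (`df.copy()`) and compared against `expected`.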

【Comments】:

Tags: python-3.x pandas dataframe

【Solution 1】:

Let's try:

# Keep the first info string of each id, split it, and explode to one element per row
df['new'] = df.loc[~df.id.duplicated(),'info'].str.split(',').explode().values
df
   id       text                 info          new
0   1      great           boy,police          boy
1   1  excellent           boy,police       police
2   2       nice  girl,mother,teacher         girl
3   2       good  girl,mother,teacher       mother
4   2        bad  girl,mother,teacher      teacher
5   3    awesome          grandmother  grandmother
6   4     superb             grandson     grandson
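
The one-liner above can be unpacked into steps to show why `.values` is needed. This is a sketch under the question's assumption that each id has exactly as many rows as its info string has elements; the intermediate names `first_rows` and `exploded` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 4],
    "text": ["great", "excellent", "nice", "good", "bad", "awesome", "superb"],
    "info": ["boy,police"] * 2 + ["girl,mother,teacher"] * 3 + ["grandmother", "grandson"],
})

# Step 1: keep only the first row of each id (duplicated() marks the repeats).
first_rows = df.loc[~df["id"].duplicated(), "info"]

# Step 2: split each string into a list, then give every element its own row.
exploded = first_rows.str.split(",").explode()

# Step 3: the exploded index repeats the source labels (0, 0, 2, 2, 2, 5, 6),
# so assign the raw values to bypass index alignment.
assert len(exploded) == len(df), "element count must equal row count"
df["new"] = exploded.to_numpy()
print(df)
```

If the element count and the row count differ, the assignment cannot work, which is why the sketch checks the lengths first.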

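【Comments】:

【Solution 2】:

Take advantage of the fact that 'info' is repeated.

# to_numpy() discards the exploded index so the values are assigned by position
df['info'] = df['info'].drop_duplicates().str.split(',').explode().to_numpy()

Output: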

   id       text         info
0   1      great          boy
1   1  excellent       police
2   2       nice         girl
3   2       good       mother
4   2        bad      teacher
5   3    awesome  grandmother
6   4     superb     grandson
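
One caveat: `drop_duplicates()` looks at the whole column, so if two different ids happened to share the same info string, one copy would be dropped and the lengths would no longer line up. A possible guard is sketched below; `spread_info` is a hypothetical helper name, not part of pandas or of the original answer.

```python
import pandas as pd


def spread_info(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: the drop_duplicates/explode idea plus a length check."""
    parts = frame["info"].drop_duplicates().str.split(",").explode()
    if len(parts) != len(frame):
        raise ValueError(
            f"got {len(parts)} elements for {len(frame)} rows; "
            "the one-element-per-row assumption does not hold"
        )
    # Return a new frame instead of modifying the input in place.
    return frame.assign(info=parts.to_numpy())


df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 4],
    "text": ["great", "excellent", "nice", "good", "bad", "awesome", "superb"],
    "info": ["boy,police"] * 2 + ["girl,mother,teacher"] * 3 + ["grandmother", "grandson"],
})

result = spread_info(df)
print(result)
```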

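【Comments】:

【Solution 3】:

One way, using pandas.DataFrame.groupby.transform.

Note that this assumes:

The number of elements in info, after splitting on ',', equals the number of rows for each id.
The values in info are identical within the same id.

# For each id, split the first info string and return its elements, one per row
df["info"] = df.groupby("id")["info"].transform(lambda x: x.str.split(",").iloc[0])
print(df)

Output: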

   id       text         info
0   1      great          boy
1   1  excellent       police
2   2       nice         girl
3   2       good       mother
4   2        bad      teacher
5   3    awesome  grandmother
6   4     superb     grandson
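
The two assumptions listed in this answer can be checked on the original frame before running the transform. The following is a rough sketch; the variable names `same_within_id` and `lengths_match` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 4],
    "text": ["great", "excellent", "nice", "good", "bad", "awesome", "superb"],
    "info": ["boy,police"] * 2 + ["girl,mother,teacher"] * 3 + ["grandmother", "grandson"],
})

grouped = df.groupby("id")["info"]

# Assumption 2: each id carries a single distinct info string.
same_within_id = grouped.nunique().eq(1).all()

# Assumption 1: the number of comma-separated elements equals the group size.
n_elements = grouped.first().str.count(",") + 1
lengths_match = n_elements.eq(grouped.size()).all()

print(same_within_id, lengths_match)
```

If either check comes back false, the transform above is not expected to give a meaningful result, and the data would need to be cleaned or handled differently first.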

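【Comments】:

【Solution 4】:

Create a temporary variable holding each row's 0-based position within its info group:

temp = df.groupby('info').cumcount()

Use a list comprehension over the strings in info, picking the element at that position:

df['info'] = [ent.split(',')[pos] for ent, pos in zip(df['info'], temp)]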

df

   id       text         info
0   1      great          boy
1   1  excellent       police
2   2       nice         girl
3   2       good       mother
4   2        bad      teacher
5   3    awesome  grandmother
6   4     superb     grandson
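
To see what the temporary variable contains, it helps to print it for the sample data. This sketch rebuilds the question's frame and assumes, as the answer does, that grouping by info is equivalent to grouping by id here.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 4],
    "text": ["great", "excellent", "nice", "good", "bad", "awesome", "superb"],
    "info": ["boy,police"] * 2 + ["girl,mother,teacher"] * 3 + ["grandmother", "grandson"],
})

# cumcount() numbers the rows inside each group, starting from 0.
temp = df.groupby("info").cumcount()
print(temp.tolist())  # [0, 1, 0, 1, 2, 0, 0]

# Each row takes the element of its own split list at that position.
picked = [ent.split(",")[pos] for ent, pos in zip(df["info"], temp)]
print(picked)
```

A row whose position is beyond the end of its split list would raise an IndexError, so the same row-count assumption applies.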

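【Comments】:

【Solution 5】:

Or try apply:

# Pair each row's split list with its position within the id, then pick that element
df['info'] = pd.DataFrame({'info': df['info'].str.split(','), 'n': df.groupby('id').cumcount()}).apply(lambda x: x['info'][x['n']], axis=1)

Output: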

>>> df
   id       text         info
0   1      great          boy
1   1  excellent       police
2   2       nice         girl
3   2       good       mother
4   2        bad      teacher
5   3    awesome  grandmother
6   4     superb     grandson
>>>
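
The apply version is easier to follow when the intermediate frame is built separately. A sketch with the illustrative names `helper` and `picked`:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 4],
    "text": ["great", "excellent", "nice", "good", "bad", "awesome", "superb"],
    "info": ["boy,police"] * 2 + ["girl,mother,teacher"] * 3 + ["grandmother", "grandson"],
})

helper = pd.DataFrame({
    "info": df["info"].str.split(","),   # list of elements for each row
    "n": df.groupby("id").cumcount(),    # 0-based position of the row within its id
})
print(helper)

# Row by row, index the list with the position stored in n.
picked = helper.apply(lambda row: row["info"][row["n"]], axis=1)
df["info"] = picked
print(df)
```

Because apply with axis=1 calls the lambda once per row, this is likely to be slower than the vectorized answers on large frames.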

【Comments】: