Python pandas: explode multiple rows

【Question title】: Python pandas: explode multiple rows
【Posted】: 2021-07-17 01:14:08
【Question description】:

I have the following DataFrame:

import pandas as pd

a = pd.DataFrame([{"name": "John", 
                   "item" : "item1||item2||item3", 
                   "itemVal" : "item1Val||item2Val||item3Val"}, 
                  {"name" : "Tom", 
                   "item":"item4", 
                   "itemVal" : "item4Val"
                  }
                 ])

The DataFrame looks like this:

   name                 item                       itemVal
   John  item1||item2||item3  item1Val||item2Val||item3Val
    Tom                item4                      item4Val

I want to explode each row into multiple rows so that it looks like this (note that each item must stay matched with its itemVal):

   name                 item                       itemVal
   John                item1                      item1Val
   John                item2                      item2Val
   John                item3                      item3Val
    Tom                item4                      item4Val

I have looked at other answers here:

Split (explode) pandas dataframe string entry to separate rows

pandas: How do I split text in a column into multiple rows?

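But those only work on a single column. How can I make this work across multiple columns? I am using Pandas 1.0.1 and Python 3.8.

【Question comments】:

Do item and itemVal always have the same number of partitions?
@MichaelDelgado Yes, always.

Tags: python python-3.x pandas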

【Solution 1】:

a = a.apply(lambda x: [v.split('||') for v in x]).apply(pd.Series.explode)
print(a)

Prints:

   name   item   itemVal
0  John  item1  item1Val
0  John  item2  item2Val
0  John  item3  item3Val
1   Tom  item4  item4Val

Edit: if you only want to split selected columns, you can do this:

exploded = a[['item', 'itemVal']].apply(lambda x: [v.split('||') for v in x]).apply(pd.Series.explode)
print( pd.concat([a['name'], exploded], axis=1) )
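
The same idea can be wrapped in a small reusable function. This is only a sketch: `explode_columns` is a hypothetical helper name, not a pandas function, and it assumes every listed column splits into the same number of pieces on each row (which the asker confirmed for this data). It uses Python's own `str.split`, which treats `"||"` as a literal separator rather than a regular expression.

```python
import pandas as pd


def explode_columns(frame, columns, sep="||"):
    """Hypothetical helper: split `columns` on `sep` and explode them together."""
    # Python's str.split takes the separator literally, so "||" is not a regex here.
    split = frame[columns].apply(lambda col: col.map(lambda v: v.split(sep)))
    # Explode each column on its own; the results share the same repeated index.
    exploded = split.apply(pd.Series.explode)
    # Join the untouched columns back on the index and restore the column order.
    rest = frame.drop(columns=columns)
    return rest.join(exploded)[list(frame.columns)].reset_index(drop=True)


a = pd.DataFrame([
    {"name": "John", "item": "item1||item2||item3",
     "itemVal": "item1Val||item2Val||item3Val"},
    {"name": "Tom", "item": "item4", "itemVal": "item4Val"},
])

out = explode_columns(a, ["item", "itemVal"])
print(out)
```

The final `reset_index(drop=True)` is a design choice: drop it if you want to keep the original row labels (0, 0, 0, 1) as in the output above.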

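【Comments】:

Thanks for the solution, this works. However, is there a way to specify only the columns I want to split?

【Solution 2】:

A combination of zip, product and chain can split the rows. Since this involves strings and, more importantly, no numerical computation, you should get faster speeds in plain Python than by running it in Pandas: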

from itertools import product, chain
combine = chain.from_iterable

# df is the DataFrame from the question (named `a` there)
df = a

# pair the item and itemVal columns
merge = zip(df["item"], df["itemVal"])

# pair the entries from the splits of item and itemVal
merge = [zip(first.split("||"), last.split("||")) for first, last in merge]

# create a cartesian product with the name column
merger = [product([ent], cont) for ent, cont in zip(df["name"], merge)]

# create your exploded values
res = [(ent, *cont) for ent, cont in combine(merger)]
pd.DataFrame(res, columns=['name', 'item', 'itemVal'])

    name    item    itemVal
0   John    item1   item1Val
1   John    item2   item2Val
2   John    item3   item3Val
3   Tom     item4   item4Val
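
One thing to keep in mind with the zip-based approach: zip stops at the shorter input, so a row whose item and itemVal have different numbers of pieces would silently lose data. The asker confirmed the counts always match, but a plain-Python variant that makes the assumption explicit could look like the sketch below. `explode_rows` is a hypothetical helper name, and raising ValueError on a mismatch is my own choice, not something the original answer does.

```python
import pandas as pd


def explode_rows(records, sep="||"):
    """Hypothetical helper: yield one (name, item, itemVal) tuple per split piece."""
    for name, item, item_val in records:
        items = item.split(sep)
        vals = item_val.split(sep)
        # zip() would silently drop the extra pieces, so check the counts first.
        if len(items) != len(vals):
            raise ValueError(f"{name}: {len(items)} items but {len(vals)} values")
        for pair in zip(items, vals):
            yield (name, *pair)


a = pd.DataFrame([
    {"name": "John", "item": "item1||item2||item3",
     "itemVal": "item1Val||item2Val||item3Val"},
    {"name": "Tom", "item": "item4", "itemVal": "item4Val"},
])

rows = explode_rows(zip(a["name"], a["item"], a["itemVal"]))
result = pd.DataFrame(list(rows), columns=["name", "item", "itemVal"])
print(result)
```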

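【Comments】:

【Solution 3】:

This may not be as fast as the answer Sammywemmy suggested, but here is a generic approach that works with Pandas functions. Note that explode (as of the asker's Pandas 1.0.1) works on only one column at a time. So: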

df = pd.DataFrame({'A': [1, 2], 'B': [['a','b'], ['c','d']], 'C': [['z','y'], ['x','w']]})

A    B     C
--------------
1 [a, b] [z, y]
2 [c, d] [x, w]

# Logic for multi-column explode
list_cols = ['B', 'C']
other_cols = [col for col in df.columns if col not in list_cols]
# explode each list column separately; the results share the same repeated index
exploded = [df[col].explode() for col in list_cols]
df2 = pd.DataFrame(dict(zip(list_cols, exploded)))
# attach the remaining columns by index
df2 = df[other_cols].merge(df2, how="right", left_index=True, right_index=True)

A B C
------
1 a z
1 b y
2 c x
2 d w
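
For readers on pandas 1.3 or newer (not the asker's 1.0.1), DataFrame.explode accepts a list of columns, so the manual merge is no longer needed. A minimal sketch, assuming the lists in B and C have matching lengths on every row (pandas raises an error otherwise):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2],
                   'B': [['a', 'b'], ['c', 'd']],
                   'C': [['z', 'y'], ['x', 'w']]})

# Requires pandas >= 1.3; passing a list of columns is not supported in 1.0.1.
df3 = df.explode(['B', 'C'])
print(df3)
```

The original row labels are repeated in the result; chain `.reset_index(drop=True)` if you want a fresh 0..n-1 index.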

【Comments】: