Pandas explode on multiple columns: answers

【Question title】: Pandas Explode on Multiple columns
【Posted】: 2020-04-10 04:01:15
【Question description】:

Using Pandas 0.25.3, I am trying to explode several columns.

The data looks like this:

d1 = {'user':['user1','user2','user3','user4'],
      'paid':['Y','Y','N','N'],
      'last_active':['11 Jul 2019','23 Sep 2018','08 Dec 2019','03 Mar 2018'],
      'col4':'data'}

I load it into a dataframe with df=pd.DataFrame([d1],columns=d1.keys()), which looks like this:

user                              paid              last_active                                                col4               
['user1','user2','user3','user4'] ['Y','Y','N','N'] ['11 Jul 2019','23 Sep 2018','08 Dec 2019','03 Mar 2018']  'data'

There are other columns as well, each holding a single value, things like {'A':'B'}, but I am not worried about those.

When I run df.explode('user') it works for that column, and the same goes for the other columns individually, but when I try df.explode(column=('user','paid','last_active')) it gives me the following error:

KeyError: ('user','paid','last_active')

So what I want to know is how to use the explode function on multiple columns to get the following df:

user     paid  last_active    col4
'user1'  'Y'   '11 Jul 2019'  'data'
'user2'  'Y'   '23 Sep 2018'  NaN
'user3'  'N'   '08 Dec 2019'  NaN
'user4'  'N'   '03 Mar 2018'  NaN

【Question comments】:

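Just do df=pd.DataFrame(d1), without the [].
That gives me an error, because the arrays have different lengths (col4 has 1 element, the others have several).
@QuangHoang That would give you data on every row (not just the first one).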
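
The two behaviours mentioned in these comments depend on how col4 is stored, which a short sketch can make concrete. Assuming col4 is the scalar 'data' as in the question's d1, pd.DataFrame(d1) repeats it on every row; if col4 were instead a one-element list (a hypothetical variant, named d1_list here), the constructor raises a ValueError about mismatched lengths.

```python
import pandas as pd

d1 = {'user': ['user1', 'user2', 'user3', 'user4'],
      'paid': ['Y', 'Y', 'N', 'N'],
      'last_active': ['11 Jul 2019', '23 Sep 2018', '08 Dec 2019', '03 Mar 2018'],
      'col4': 'data'}

# Scalar col4: the value is broadcast to all four rows
broadcast = pd.DataFrame(d1)
print(broadcast)

# Hypothetical variant: col4 as a one-element list, which cannot be aligned
d1_list = dict(d1, col4=['data'])
try:
    pd.DataFrame(d1_list)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```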

Tags: python pandas dataframe explode

【Solution 1】:

I guess you need the following (note the difference in col4, which holds None where the OP's expected output shows NaN):

pd.DataFrame([[i] if not isinstance(i,list) else i 
             for i in d1.values()],index=d1.keys()).T

    user paid  last_active  col4
0  user1    Y  11 Jul 2019  data
1  user2    Y  23 Sep 2018  None
2  user3    N  08 Dec 2019  None
3  user4    N  03 Mar 2018  None
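
When the data already sits in a one-row DataFrame (like the df built in the question) instead of a dict, the same idea can be applied to that row's values. This is only a minimal sketch: it assumes the frame has exactly one row, and the names row and result are illustrative.

```python
import pandas as pd

d1 = {'user': ['user1', 'user2', 'user3', 'user4'],
      'paid': ['Y', 'Y', 'N', 'N'],
      'last_active': ['11 Jul 2019', '23 Sep 2018', '08 Dec 2019', '03 Mar 2018'],
      'col4': 'data'}
# One row whose cells hold lists (or a scalar for col4), as in the question
df = pd.DataFrame([d1], columns=list(d1.keys()))

row = df.iloc[0]  # assumption: the frame holds exactly one row
# Wrap scalar cells in a list, build one row per original column, then transpose
result = pd.DataFrame(
    [v if isinstance(v, list) else [v] for v in row],
    index=row.index,
).T
print(result)
```

Shorter cells such as col4 are padded with missing values, so only the first row carries 'data'.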

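【Discussion】:

@anky_91 Nice! +1
@anky If I have a dataframe rather than a dictionary, how can I modify your code above to get the same result, either by exploding the dataframe directly or by applying the code to my dataframe? This works well for my test dictionary, but my data lives in a df, and even converting it with to_dict() leaves it in the wrong format for the code above.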

【Solution 2】:

Pandas 0.25.x has no multi-column explode, but there are workarounds. One simple approach:

df = pd.DataFrame(
    {
        'A': [1, 2],
        'B': [['a','b'], ['c','d']],
        'C': [['z','y'], ['x','w']]
    }
)
print(df)

--------------
A    B     C
--------------
1 [a, b] [z, y]
2 [c, d] [x, w]

##Let us say list_cols are the columns to be exploded
##(a list rather than a set, so the iteration order is deterministic)
list_cols = ['B', 'C']

##other_cols contains all the remaining column names in the df, in their original order
other_cols = [col for col in df.columns if col not in list_cols]

##now explode the list_cols using a loop
exploded = [df[col].explode() for col in list_cols]
##now we have long list of exploded values. Print to see the format

##This statement creates pairs of the exploded cols
##zip command is used to create the pairs
##dict puts it in an appropriate format from which a dataframe can be created
##Please print the individual outputs of each command to understand the flow
df2 = pd.DataFrame(dict(zip(list_cols, exploded)))

##Now merge back the other_cols as well
df2 = df[other_cols].merge(df2, how="right", left_index=True, right_index=True)

##lastly, re-create the original column order
df2 = df2.loc[:, df.columns]

print(df2)

------
A B C
------
1 a z
1 b y
2 c x
2 d w
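
On newer pandas the workaround is not needed: DataFrame.explode accepts a list of column names since version 1.3.0. That is not available in the 0.25.3 release used in the question. A minimal sketch, assuming pandas 1.3.0 or later and that, within each row, the lists in the exploded columns have the same length:

```python
import pandas as pd

df = pd.DataFrame(
    {
        'A': [1, 2],
        'B': [['a', 'b'], ['c', 'd']],
        'C': [['z', 'y'], ['x', 'w']],
    }
)

# Assumption: pandas >= 1.3.0. Pass a list, not a tuple:
# a tuple is looked up as one single column label and raises KeyError.
df2 = df.explode(['B', 'C']).reset_index(drop=True)
print(df2)
```

The reset_index(drop=True) call is optional; without it the exploded rows keep the index of the row they came from.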

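【Discussion】:

I applied this logic, but columns B and C get swapped every time.
The code above should work. Please share your code so we can check what went wrong.
Could you add more step-by-step explanation to your code? It is hard to follow what is happening here.
I have added a few inline comments.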