to_list not working with pandas when the column contains null values

【Question title】: to_list not working with pandas when null values in pandas
【Posted】: 2021-05-30 02:16:35
【Question description】:

import pandas as pd

df = pd.DataFrame({'a': [None, 1, 2], 'b': [None, (1, 2), (3, 4)]})


    a   b
0   NaN None
1   1.0 (1, 2)
2   2.0 (3, 4)

I want to split the tuples in column b so that each element gets its own column. However, my approach fails:

df[['b1', 'b2']] = pd.DataFrame(df['b'].tolist(), index=df.index)

ValueError: Columns must be same length as key

I tried to fill the missing value with an empty tuple, but that would not accept a tuple. How can I do this?
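
A minimal sketch of why the assignment fails, assuming a reasonably recent pandas: because the first element of the list is None rather than a tuple, the constructor does not appear to treat the list as rows of tuples, so the resulting frame does not have the two columns that the two keys need. The names `expanded` and `error_message` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': [None, 1, 2], 'b': [None, (1, 2), (3, 4)]})

error_message = None
try:
    # Build a frame from a list that mixes None and tuples
    expanded = pd.DataFrame(df['b'].tolist(), index=df.index)
    # Inspect how many columns pandas actually produced
    print(expanded.shape)
    # Two target columns, but not two source columns
    df[['b1', 'b2']] = expanded
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Printing the shape before assigning is a quick way to check whether the expansion produced the number of columns you expect.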

【Discussion】:

Tags: python pandas dataframe tuples tolist

【Solution 1】:

Before creating the two columns, convert None to (None, None), like this:

df['b'] = df['b'].map(lambda x: (None, None) if x is None else x)

Then your original step produces the desired result:

    df[['b1', 'b2']] = pd.DataFrame(df['b'].tolist(), index=df.index)
    print(df)

Output:
    a              b     b1  b2
0   NaN (None, None)    NaN NaN
1   1.0       (1, 2)    1.0 2.0
2   2.0       (3, 4)    3.0 4.0

If you want the None in column b to stay unchanged, you can use:

    df[['b1', 'b2']] = pd.DataFrame(df['b'].map(lambda x: (None, None) if x is None else x).tolist(), index=df.index)

    print(df)

Output:
    a         b    b1  b2
0   NaN    None   NaN NaN
1   1.0  (1, 2)   1.0 2.0
2   2.0  (3, 4)   3.0 4.0
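
If the column may hold NaN as well as None, the `x is None` test will not catch the NaN. A hedged variant, assuming every valid entry is a tuple of length 2, is to test for tuples instead. The extra NaN row and the name `pairs` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data with both kinds of missing value (the NaN row is an added assumption)
df = pd.DataFrame({'a': [None, 1, 2, 3],
                   'b': [None, (1, 2), np.nan, (3, 4)]})

# Anything that is not a tuple (None, NaN, ...) becomes a placeholder pair
pairs = df['b'].map(lambda x: x if isinstance(x, tuple) else (None, None))

# Every element is now a 2-tuple, so the expansion yields exactly two columns
df[['b1', 'b2']] = pd.DataFrame(pairs.tolist(), index=df.index)
print(df)
```

Column b itself is left untouched here, as in the second form above.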

【Discussion】:

【Solution 2】:

If your tuples have differing numbers of elements, a more general solution is to write a custom function like the following:

def create_columns_from_tuple(df, tuple_col):
    
    # get max length of tuples
    max_len = df[tuple_col].apply(lambda x: 0 if x is None else len(x)).max()
    
    # select rows with non-empty tuples
    df_full = df.loc[df[tuple_col].notna()]
    
    # create dataframe with exploded tuples
    df_full_exploded = pd.DataFrame(df_full[tuple_col].tolist(),
                                    index=df_full.index, 
                                    columns=[tuple_col + str(n) for n in range(1, max_len+1)])
    
    # merge the two dataframes by index
    result = df.merge(df_full_exploded, left_index=True, right_index=True, how='left')
    
    return result

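You pass this function the DataFrame and the name of the tuple column. It automatically creates as many columns as the length of the longest tuple.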

create_columns_from_tuple(df, tuple_col='b')
#      a       b   b1   b2
# 0  NaN    None  NaN  NaN
# 1  1.0  (1, 2)  1.0  2.0
# 2  2.0  (3, 4)  3.0  4.0

If your tuples have different numbers of elements:

df = pd.DataFrame({'a':[None,1, 2], 'b':[None, (1,2,42), (3,4)]}) 
create_columns_from_tuple(df, tuple_col='b')
#      a           b   b1   b2    b3
# 0  NaN        None  NaN  NaN   NaN
# 1  1.0  (1, 2, 42)  1.0  2.0  42.0
# 2  2.0      (3, 4)  3.0  4.0   NaN
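
The same idea can be sketched more compactly with `dropna` and `join`. This is an alternative under the same assumptions, not the function above; `expanded` and `result` are illustrative names, and the column names are derived from the positions pandas assigns.

```python
import pandas as pd

df = pd.DataFrame({'a': [None, 1, 2], 'b': [None, (1, 2, 42), (3, 4)]})

# Keep only the rows that actually hold a tuple
b = df['b'].dropna()

# Shorter tuples are padded with missing values by the constructor
expanded = pd.DataFrame(b.tolist(), index=b.index)

# Rename the positional columns 0, 1, 2 to b1, b2, b3
expanded.columns = [f"b{i + 1}" for i in expanded.columns]

# Left join on the index; rows without a tuple get NaN in the new columns
result = df.join(expanded)
print(result)
```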

【Discussion】:

【Solution 3】:

You can first drop the NaN values in column b, build a new DataFrame from the remaining elements, and assign the result to columns b1 and b2:

b = df['b'].dropna()
df[['b1', 'b2']] = pd.DataFrame(b.tolist(), index=b.index)

>>> df

     a       b   b1   b2
0  NaN    None  NaN  NaN
1  1.0  (1, 2)  1.0  2.0
2  2.0  (3, 4)  3.0  4.0
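
This works because pandas aligns on the index during assignment, so rows missing from the right-hand side are filled with NaN. A small sketch of that alignment on its own, with made-up data and names (`partial`, column `c`):

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30]})

# A Series that covers only index labels 1 and 2
partial = pd.Series([1.5, 2.5], index=[1, 2])

# Assignment aligns on the index; label 0 has no match and becomes NaN
df['c'] = partial
print(df)
```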

【Discussion】:

【Solution 4】:

To my surprise, this solution by piR² also works in your case:

df["x"], df["y"] = df.b.str

Output:

     a       b    x    y
0  NaN    None  NaN  NaN
1  1.0  (1, 2)  1.0  2.0
2  2.0  (3, 4)  3.0  4.0

That said, it emits a FutureWarning: Columnar iteration over characters will be deprecated in future releases. So this is not a long-term solution.
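
One way to avoid iterating over the accessor, offered here as an assumption rather than as part of the linked answer, is to index `.str` by position. The accessor's element lookup also works on tuples, and missing entries come back as NaN.

```python
import pandas as pd

df = pd.DataFrame({'a': [None, 1, 2], 'b': [None, (1, 2), (3, 4)]})

# Positional lookup through the .str accessor, one column per tuple element
df['x'] = df['b'].str[0]
df['y'] = df['b'].str[1]
print(df)
```

This needs one line per element, so it suits short tuples of known length.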

【Discussion】: