Pandas: replace NaN values with a list for specific columns

【Question title】: Pandas replace NaN values with a list for specific columns
【Posted】: 2018-12-22 00:38:30
【Question description】:

I have a DataFrame with two rows:

df = pd.DataFrame({'group' : ['c'] * 2,
                   'num_column': range(2),
                   'num_col_2': range(2),
                   'seq_col': [[1,2,3,4,5]] * 2,
                   'seq_col_2': [[1,2,3,4,5]] * 2,
                   'grp_count': [2]*2})

After appending 8 rows that are null everywhere except group, it looks like this:

df = df.append(pd.DataFrame({'group': ['c'] * 8}, index=[0] * 8))  # 8 rows, only 'group' set

  group  grp_count  num_col_2  num_column          seq_col        seq_col_2
0     c        2.0        0.0         0.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1     c        2.0        1.0         1.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN

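What I want

Replace the NaN values in the sequence columns (seq_col, seq_col_2, seq_col_3, etc.) with a list of my own.

Note:

This data has only 2 sequence columns, but there could be more.
Lists that already exist in a column must not be replaced, only the NaN values.

I could not find a solution that replaces NaN with a list value taken from, say, a dictionary provided by the user.

Pseudocode: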

for each key, value in dict,
   for each column in df
       if column matches key in dict
         # here matches means the 'seq_col_n' key of dict matched the df 
         # column named 'seq_col_n'
         replace NaN with value in seq_col_n (which is a list of numbers)

I tried the code below. It works for the first column you pass, but not for the second, which is strange.

 df.loc[df['seq_col'].isnull(),['seq_col']] = df.loc[df['seq_col'].isnull(),'seq_col'].apply(lambda m: fill_values['seq_col'])

The above works, but running it again on seq_col_2 gives strange results.

Expected output, given this input:

my_dict = {'seq_col': [1,2,3], 'seq_col_2': [6,7,8]}

# after executing the code from pseudo code given, it should look like
  group  grp_count  num_col_2  num_column          seq_col        seq_col_2
0     c        2.0        0.0         0.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1     c        2.0        1.0         1.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]

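【Question comments】:

Can you show the expected output? Also, what result do you get from your code?
Nice, finally someone who posts at least an executable code example! Unfortunately I can't help you, but I'll upvote your question for that. As Harv said, though: the expected output would help a lot.
Do you basically want to turn the 10 values in those 2 lists into 10 individual values, one per row in those columns? If so, what do you want to do with the columns that have no list?
This link might help: stackoverflow.com/questions/48197234/…
Is this what you are looking for? pandas.pydata.org/pandas-docs/version/0.22/generated/…

Tags: python python-3.x pandas numpy

【Solution 1】:

For an input array, you can use pd.DataFrame.loc and pd.Series.isnull: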

import pandas as pd, numpy as np

df = pd.DataFrame({'group' : ['c'] * 2,
                   'num_column': range(2),
                   'num_col_2': range(2),
                   'seq_col': [[1,2,3,4,5]] * 2,
                   'seq_col_2': [[1,2,3,4,5]] * 2,
                   'grp_count': [2]*2})

# DataFrame.append was removed in pandas 2.0; on newer versions use pd.concat([df, other]) instead
df = df.append(pd.DataFrame({'group': ['c']*8}, index=[0] * 8))

L1 = np.array([0, 1, 2, 3, 4, 5, 6, 7])
L2 = np.array([10, 11, 12, 13, 14, 15, 16, 17])

df.loc[df['seq_col'].isnull(), 'seq_col'] = L1
df.loc[df['seq_col_2'].isnull(), 'seq_col_2'] = L2

print(df[['seq_col', 'seq_col_2']])

           seq_col        seq_col_2
0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0                0               10
0                1               11
0                2               12
0                3               13
0                4               14
0                5               15
0                6               16
0                7               17

If you need list values in the series, you can explicitly convert to a Series before assigning:

df.loc[df['seq_col'].isnull(), 'seq_col'] = pd.Series([[1, 2, 3]]*len(df))
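The question asks for the fill values to come from a user-supplied dictionary, so here is a minimal sketch of one way that could look. The helper name fill_nan_with_lists is hypothetical and not part of pandas. The sketch rebuilds each column by position with a list comprehension rather than going through .loc, so the duplicated index label 0 plays no role. It also uses pd.concat in place of df.append so that it runs on current pandas versions.

```python
import pandas as pd


def fill_nan_with_lists(df, fill_values):
    """Return a copy of df where NaN cells in the named columns hold a list.

    Hypothetical helper: fill_values maps a column name to the list to insert.
    """
    out = df.copy()
    for col, fill in fill_values.items():
        if col not in out.columns:
            continue  # ignore keys that do not name a column
        mask = out[col].isna()
        # Existing lists are kept; each NaN row gets its own copy of the list.
        out[col] = [list(fill) if missing else value
                    for value, missing in zip(out[col], mask)]
    return out


base = pd.DataFrame({'group': ['c'] * 2,
                     'num_column': range(2),
                     'num_col_2': range(2),
                     'seq_col': [[1, 2, 3, 4, 5]] * 2,
                     'seq_col_2': [[1, 2, 3, 4, 5]] * 2,
                     'grp_count': [2] * 2})
extra = pd.DataFrame({'group': ['c'] * 8}, index=[0] * 8)
df = pd.concat([base, extra])  # stands in for df.append

my_dict = {'seq_col': [1, 2, 3], 'seq_col_2': [6, 7, 8]}
filled = fill_nan_with_lists(df, my_dict)
print(filled[['seq_col', 'seq_col_2']])
```

Only the columns named in the dictionary are touched, so the numeric columns keep their NaN values, and adding a seq_col_3 key later needs no change to the helper.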

【Comments】: