【问题标题】:Pandas replace NaN values with a list for specific columnsPandas 用特定列的列表替换 NaN 值
【发布时间】:2018-12-22 00:38:30
【问题描述】:

我有一个包含两行的数据框

df = pd.DataFrame({'group' : ['c'] * 2,
                   'num_column': range(2),
                   'num_col_2': range(2),
                   'seq_col': [[1,2,3,4,5]] * 2,
                   'seq_col_2': [[1,2,3,4,5]] * 2,
                   'grp_count': [2]*2})

有 8 个空值,它看起来像这样:

df = df.append(pd.DataFrame({'group': group}, index=[0] * size))

  group  grp_count  num_col_2  num_column          seq_col        seq_col_2
0     c        2.0        0.0         0.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1     c        2.0        1.0         1.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN

我想要什么

用我自己的列表替换序列列(seq_col、seq_col_2、seq_col_3 等)中的 NaN 值。

注意:

  • 在此数据中只有 2 个序列列,但可能更多。
  • 无法替换列中已存在的先前列表,仅限 NaN

假设我找不到将 NaN 替换为用户提供的字典中的 list 值的解决方案。

伪代码:

for each key, value in dict,
   for each column in df
       if column matches key in dict
         # here matches means the 'seq_col_n' key of dict matched the df 
         # column named 'seq_col_n'
         replace NaN with value in seq_col_n (which is a list of numbers)

我在下面尝试了这段代码,它适用于您传递的第一列,然后适用于第二列。这很奇怪。

 df.loc[df['seq_col'].isnull(),['seq_col']] = df.loc[df['seq_col'].isnull(),'seq_col'].apply(lambda m: fill_values['seq_col'])

上述方法有效,但在 seq_col_2 上再试一次,结果会很奇怪。

预期输出: 给定参数输入:

my_dict = {seq_col: [1,2,3], seq_col_2: [6,7,8]}

# after executing the code from pseudo code given, it should look like
 group  grp_count  num_col_2  num_column          seq_col        seq_col_2
0     c        2.0        0.0         0.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1     c        2.0        1.0         1.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]

【问题讨论】:

  • 你能显示预期的输出吗?另外,您的代码会得到什么结果?
  • 很好,终于有人发布了至少一个可执行代码示例!不幸的是,我不能帮助你,但我会因此支持你的问题。但正如 Harv 所说:预期的输出会有很大帮助。
  • 您是否基本上想将这 2 个列表中的 10 个值转换为这些列中每一行的 10 个单独的值?如果是这样,您想对没有列表的列做什么?

标签: python python-3.x pandas numpy


【解决方案1】:

对于输入数组,您可以使用pd.DataFrame.locpd.Series.isnull

import pandas as pd, numpy as np

df = pd.DataFrame({'group' : ['c'] * 2,
                   'num_column': range(2),
                   'num_col_2': range(2),
                   'seq_col': [[1,2,3,4,5]] * 2,
                   'seq_col_2': [[1,2,3,4,5]] * 2,
                   'grp_count': [2]*2})

df = df.append(pd.DataFrame({'group': ['c']*8}, index=[0] * 8))

L1 = np.array([0, 1, 2, 3, 4, 5, 6, 7])
L2 = np.array([10, 11, 12, 13, 14, 15, 16, 17])

df.loc[df['seq_col'].isnull(), 'seq_col'] = L1
df.loc[df['seq_col_2'].isnull(), 'seq_col_2'] = L2

print(df[['seq_col', 'seq_col_2']])

           seq_col        seq_col_2
0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0                0               10
0                1               11
0                2               12
0                3               13
0                4               14
0                5               15
0                6               16
0                7               17

如果您需要系列中的列表值,那么您可以在赋值之前明确转换为系列:

df.loc[df['seq_col'].isnull(), 'seq_col'] = pd.Series([[1, 2, 3]]*len(df))

【讨论】:

    猜你喜欢
    • 2020-01-16
    • 1970-01-01
    • 1970-01-01
    • 2021-03-08
    • 2013-09-12
    • 1970-01-01
    • 2021-09-24
    相关资源
    最近更新 更多