How to fill NaN values in a pandas dataframe with variable values? Answers

【Question title】: How to fill NaN values in a pandas dataframe, with variable values?
【Posted】: 2018-02-23 06:33:28
【Question description】:

I have a dataframe:

   Isolate1 Isolate2 Isolate3 Isolate4
2  NaN      NaN      AGTCTA   AGT
5  NaN      GC       NaN      NaN

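and I want to replace the NaN values in the Isolate1 column with dashes, one dash for each letter of the non-NaN value in the other columns (or the maximum length, if the other columns hold values of different lengths), ending up with something like this: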

  Isolate1 Isolate2 Isolate3 Isolate4
2 ------   NaN      AGTCTA   AGT
5 --       GC       NaN      NaN

I tried the following:

index_sizes_to_replace = {}
for row in df.itertuples():
    indel_sizes = []
    #0 pos is index
    for i, value in enumerate(row[1:]):
        if pd.notnull(value):
            indel_sizes.append((i, len(value)))
    max_size = max([size for i, size in indel_sizes])
    index_sizes_to_replace[row[0]]= max_size

Now I have the number of dashes needed to replace the NaN value in each row, but I don't know how to do the fill. I tried this:

for index, size in index_sizes_to_replace.iteritems():
    df.iloc[index].fillna("-"*size, inplace=True)

But it doesn't work. Any suggestions?
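
A likely reason the loop above has no effect: `iloc` selects by position whereas the dictionary keys are index labels, and calling `fillna(..., inplace=True)` on a row taken out of the frame is not guaranteed to write back to `df`. Below is a minimal sketch of a label-based version, assuming the example frame from the question with index labels 2 and 5. The `dtype=object` argument is an assumption, added so that strings can be assigned into the all-NaN column.

```python
import numpy as np
import pandas as pd

# Example frame from the question; dtype=object is an assumption so the
# all-NaN Isolate1 column can later hold strings.
df = pd.DataFrame(
    {
        "Isolate1": [np.nan, np.nan],
        "Isolate2": [np.nan, "GC"],
        "Isolate3": ["AGTCTA", np.nan],
        "Isolate4": ["AGT", np.nan],
    },
    index=[2, 5],
    dtype=object,
)

# Longest non-NaN string per row, keyed by index label.
index_sizes_to_replace = {}
for label, row in df.iterrows():
    sizes = [len(value) for value in row if pd.notnull(value)]
    if sizes:
        index_sizes_to_replace[label] = max(sizes)

# Write back through df.loc (label-based) so the frame itself is updated,
# and only touch Isolate1 cells that are actually NaN.
for label, size in index_sizes_to_replace.items():
    if pd.isnull(df.loc[label, "Isolate1"]):
        df.loc[label, "Isolate1"] = "-" * size

print(df)
```

Unlike the original attempt, this fills only the Isolate1 column and leaves NaN values in the other columns alone.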

【Question comments】:

Tags: python pandas dataframe

【Solution 1】:

Let's try this:

import pandas as pd
import numpy as np

# np.nan is used here because the np.NaN alias was removed in NumPy 2.0
data = dict(Isolate1=[np.nan, np.nan, 'A'],
            Isolate2=[np.nan, 'ABC', 'A'],
            Isolate3=['AGT', np.nan, 'A'],
            Isolate4=['AGTCTA', np.nan, 'A'])

df = pd.DataFrame(data)

Original solution:

df['Isolate1'] = df.apply(lambda x: '-' * x.str.len().max().astype(int), axis=1)

Ignoring Isolate1:

df['Isolate1'] = df.iloc[:,1:].apply(lambda x: x.str.len().max().astype(int)*'-', axis=1)

Output:

  Isolate1 Isolate2 Isolate3 Isolate4
0   ------      NaN      AGT   AGTCTA
1      ---      ABC      NaN      NaN
2        -        A        A        A

Edit by @Anton vBR, to handle values in the first column that are not NaN:

# Create a mask
m = pd.isna(df['Isolate1'])
df.loc[m,'Isolate1'] = df[m].apply(lambda x: '-' * x.str.len().max().astype(int), axis=1)

Output:

  Isolate1 Isolate2 Isolate3 Isolate4
0   ------      NaN      AGT   AGTCTA
1      ---      ABC      NaN      NaN
2        A        A        A        A
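
The masked assignment above can be wrapped in a function that takes the column name instead of relying on its position. This is only a sketch: `fill_with_dashes` is a hypothetical helper name, not part of pandas, and it assumes the other columns hold strings or NaN.

```python
import numpy as np
import pandas as pd


def fill_with_dashes(df, column):
    """Return a copy of df where NaN in `column` becomes a run of dashes.

    The number of dashes is the length of the longest string found in the
    other columns of the same row. Hypothetical helper, not a pandas API.
    """
    out = df.copy()
    # Make sure the target column can hold strings.
    out[column] = out[column].astype(object)
    mask = out[column].isna()
    others = out.drop(columns=column).astype(object)
    # Per-row maximum string length; NaN cells are skipped by max().
    lengths = others.apply(lambda col: col.str.len()).max(axis=1)
    # Rows whose other columns are all NaN have no length and stay NaN.
    fillable = mask & lengths.notna()
    dashes = lengths[fillable].astype(int).map(lambda n: "-" * int(n))
    out.loc[fillable, column] = dashes
    return out


df = pd.DataFrame({
    "Isolate1": [np.nan, np.nan, "A"],
    "Isolate2": [np.nan, "ABC", "A"],
    "Isolate3": ["AGT", np.nan, "A"],
    "Isolate4": ["AGTCTA", np.nan, "A"],
})
result = fill_with_dashes(df, "Isolate1")
print(result["Isolate1"].tolist())  # ['------', '---', 'A']
```

Returning a copy rather than modifying `df` in place is a design choice of this sketch; the original frame is left untouched.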

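【Comments】:

This one is nice! :)
I think this answer fits the OP's needs better!
@AntonvBR, thanks for the warning; fortunately that case did not come up
@ScottBoston, but what happens if the Isolate1 column has values that are not NaN? They would be replaced with dashes based on the lengths of the values in the other columns, right? Can it be applied to the NaN values only?
@AntonvBR, I modified the function to take the column's name instead of its position, adding it as a parameter to make it more general. Thank you very much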

【Solution 2】:

It looks a bit ugly, but it works:

import pandas as pd
import numpy as np

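data = dict(Isolate1=[np.nan, np.nan],
            Isolate2=[np.nan, 'GC'],
            Isolate3=['AGTCTA', np.nan],
            Isolate4=['AGT', np.nan])

# index=[2, 5] matches the row labels shown in the question and in the output
df = pd.DataFrame(data, index=[2, 5])

# For each row, take the first non-NaN value among the other columns
# and turn every character of it into a dash
df['Isolate1'] = (df.drop(columns='Isolate1').ffill(axis=1).bfill(axis=1)
                         .iloc[:,0].replace('.', '-', regex=True))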

print(df)

  Isolate1 Isolate2 Isolate3 Isolate4
2   ------      NaN   AGTCTA      AGT
5       --       GC      NaN      NaN
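
To illustrate the `replace('.', '-', regex=True)` step used above: with `regex=True` the `'.'` is treated as a regular-expression pattern that matches any single character, so every character of the string becomes one dash. A small sketch on a standalone Series (the sample values are taken from the question; the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

s = pd.Series(["AGTCTA", "GC", np.nan], dtype=object)

# regex=True: '.' matches any single character, so each character
# is replaced by one dash. NaN entries are left as NaN.
dashed = s.replace(".", "-", regex=True)

# regex=False: only a value that is exactly "." would be replaced,
# so these strings are left unchanged.
literal = s.replace(".", "-", regex=False)

print(dashed.tolist())
print(literal.tolist())
```

Note that the solution above takes the first non-NaN value in each row (via `ffill`/`bfill` and `.iloc[:, 0]`), not the longest one, so it matches the requested output only when that first value is also the longest.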

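【Comments】:

@Anton vBR, thanks for the edit! Don't you use pd.read_clipboard()?
I do, but sometimes I like to fix up the data. :)
Wow, can you explain the replace('.', '-', regex=True) part?
No, it solves the problem; the max value was just my approximation, because I didn't know how to compute the number of dashes
@Wen, Max, I came up with something a bit different. What do you two think?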

【Solution 3】:

Setup

df

  Isolate1 Isolate2 Isolate3 Isolate4
0      NaN      NaN      AGT   AGTCTA
1      NaN      ABC      NaN      NaN
2        A        A        A        A

Solution
Using fillna + apply + str.__mul__:

# Note: since pandas 2.1, applymap is deprecated in favor of DataFrame.map
df['Isolate1'] = df.Isolate1.fillna(
       df.fillna('').applymap(len).max(axis=1).apply('-'.__mul__)
)

  Isolate1 Isolate2 Isolate3 Isolate4
0   ------      NaN      AGT   AGTCTA
1      ---      ABC      NaN      NaN
2        A        A        A        A
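
On why the frame is filled with empty strings before taking lengths: NaN is a float, and `len` is not defined for floats, whereas an empty string simply contributes a length of 0. A minimal sketch of that behavior (the `row` Series is an illustrative stand-in for one row of the frame):

```python
import numpy as np
import pandas as pd

# len() on NaN fails because NaN is a float.
try:
    len(np.nan)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# After fillna(''), every cell is a string, so len works everywhere
# and the former NaN cells count as length 0.
row = pd.Series([np.nan, "ABC", np.nan], dtype=object)
lengths = row.fillna("").map(len)
print(lengths.tolist())  # [0, 3, 0]
```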

【Comments】:

Nice one, bro :-)!
The fillna with an empty string is needed to avoid the error caused by NaN's type, right?