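Python dataframe conditional column population: answers

【Question title】: Python dataframe conditional column population
【Posted】: 2017-08-11 21:00:22
【Question】:

I need to populate the values in one column based on whether the values in a different column contain certain letters, following certain rules.

For example:

This is my starting dataframe: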

import pandas as pd
testdata1 = [('A', ['3c', '20b', '9']),
     ('B', ['Prod1', 'Prod2', 'Prod3']),
     ('C', ['', '', '']),
     ]
# DataFrame.from_items was removed in pandas 1.0; a dict keeps the column order
df = pd.DataFrame(dict(testdata1))
df

This is my target dataframe:

targetdf = [('A', ['3c', '20b', '9']),
     ('B', ['Prod1', 'Prod2', 'Prod3']),
     ('C', ['15.00', '40.00', '9']),
     ]
# DataFrame.from_items was removed in pandas 1.0; a dict keeps the column order
df2 = pd.DataFrame(dict(targetdf))
df2

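In the example above, if a cell in column A contains "c", the corresponding cell in column C should hold the numeric part of the A cell multiplied by 5. If a cell in column A contains "b", the corresponding cell in column C should hold the numeric part of the A cell multiplied by 2. If a cell in column A contains no letter (that is, it is just a number), the number should be copied to the corresponding cell in column C.

I think the solution will involve using "contains" to search for "c" or "b". Maybe an if statement? I don't know. I definitely need help extracting the numeric part of the cells in column A and populating column C with the correct values. I am still very new to Python.

Thanks for your help.

【Comments】:

Tags: python string pandas if-statement dataframe

【Solution 1】:

This should work: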

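def parse_data(x):
    if 'c' in x:
        # e.g. '3c': take the digits before 'c' and multiply by 5
        num = int(x.split('c')[0])
        return num * 5
    elif 'b' in x:
        # e.g. '20b': take the digits before 'b' and multiply by 2
        num = int(x.split('b')[0])
        return num * 2
    else:
        # no letter: the cell is returned unchanged (still a string)
        return x

# apply() accepts the function directly; no lambda wrapper is needed
df['C'] = df['A'].apply(parse_data)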

     A      B   C
0   3c  Prod1  15
1  20b  Prod2  40
2    9  Prod3   9
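
As a variation on the function above (not part of the original answer), the letter-to-multiplier rules can be kept in a lookup table so that a new letter needs only a new dictionary entry. This is a minimal sketch in plain Python; the names `MULTIPLIERS` and `scale_value` are hypothetical, and it assumes every cell is a run of digits optionally followed by a single lowercase letter.

```python
import re

# Hypothetical lookup table: trailing letter -> multiplier.
# The empty string covers cells that are just a number.
MULTIPLIERS = {'c': 5, 'b': 2, '': 1}


def scale_value(cell):
    """Return the numeric part of `cell` times the multiplier for its letter."""
    # Assumed cell shape: digits, then at most one lowercase letter.
    match = re.fullmatch(r'(\d+)([a-z]?)', cell)
    if match is None or match.group(2) not in MULTIPLIERS:
        raise ValueError(f'unrecognised cell value: {cell!r}')
    number, letter = match.groups()
    return int(number) * MULTIPLIERS[letter]


print([scale_value(v) for v in ['3c', '20b', '9']])  # [15, 40, 9]
```

With the dataframe from the question this would be applied as df['C'] = df['A'].apply(scale_value). Unlike the original function, it always returns an integer and raises an error on a letter it does not know.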

【Comments】:

【Solution 2】:

I would do it this way:

In [17]: mapping={'c':' * 5', 'b':' * 2'}

In [18]: df['C'] = pd.eval(df.A.replace(mapping, regex=True))

In [19]: df
Out[19]:
     A      B   C
0   3c  Prod1  15
1  20b  Prod2  40
2    9  Prod3   9

Explanation:

In [20]: df.A.replace(mapping, regex=True)
Out[20]:
0     3 * 5
1    20 * 2
2         9
Name: A, dtype: object
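
The same idea can be sketched as a self-contained script. This version keeps the answer's regex replacement step but, as an assumption of mine rather than the answerer's method, multiplies out each small expression with a helper instead of passing the whole Series to `pd.eval`. The helper name `multiply_out` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': ['3c', '20b', '9'],
                   'B': ['Prod1', 'Prod2', 'Prod3']})

# Each letter is rewritten as the multiplication it stands for.
mapping = {'c': ' * 5', 'b': ' * 2'}
exprs = df['A'].replace(mapping, regex=True)  # '3 * 5', '20 * 2', '9'


def multiply_out(expr):
    """Multiply together the integers in a string such as '3 * 5'."""
    result = 1
    for factor in expr.split('*'):
        result *= int(factor)  # int() ignores the surrounding spaces
    return result


df['C'] = exprs.apply(multiply_out)
print(df)
```

Evaluating the factors by hand means no string is ever executed as code, at the cost of supporting only products of integers.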

【Comments】:

I like this solution too; it is concise and effective. +1

【Solution 3】:

I would use a regular expression and a lookup, for example:

In [538]: (df.A.str.extract(r'(\d+)(\w+)?', expand=True)
             .replace({1: {'c':5,'b':2,np.nan:1}}).astype(int)
             .prod(1))
Out[538]:
0    15
1    40
2     9
dtype: int32

In [539]: df['C'] = (df.A.str.extract(r'(\d+)(\w+)?', expand=True)
                       .replace({1: {'c':5,'b':2,np.nan:1}}).astype(int)
                       .prod(1))
In [540]: df
Out[540]:
     A      B   C
0   3c  Prod1  15
1  20b  Prod2  40
2    9  Prod3   9

Details

In [542]: df.A.str.extract(r'(\d+)(\w+)?', expand=True)
Out[542]:
    0    1
0   3    c
1  20    b
2   9  NaN

In [543]: df.A.str.extract(r'(\d+)(\w+)?', expand=True).replace({1: {'c':5,'b':2,np.nan:1}})
Out[543]:
    0  1
0   3  5
1  20  2
2   9  1
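
The two steps shown above (split each cell into number and letter, then look up the multiplier) can also be written as a standalone script. This is a sketch under my own assumptions, not the answerer's exact code: it uses `Series.map` with `fillna` for the lookup instead of `replace`, narrows the letter group to a single lowercase letter, and the variable names `parts`, `numbers`, and `factors` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': ['3c', '20b', '9'],
                   'B': ['Prod1', 'Prod2', 'Prod3']})

# Column 0 holds the digits, column 1 the optional trailing letter (NaN if absent).
parts = df['A'].str.extract(r'(\d+)([a-z])?', expand=True)

numbers = parts[0].astype(int)

# Look up each letter's multiplier; a missing letter means "multiply by 1".
factors = parts[1].map({'c': 5, 'b': 2}).fillna(1).astype(int)

df['C'] = numbers * factors
print(df)
```

Keeping the numbers and the multipliers as two named Series makes each intermediate result easy to inspect, which may help when the one-line chained version is hard to follow.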

【Comments】:

This works too. Thanks for your help. Your code is quite advanced for me. Could you explain any advantages of this particular approach?