Iterate over rows and write a new column if condition meets python: answers

【Question title】: Iterate over rows and write a new column if condition meets python
【Posted】: 2023-01-31 04:54:14
【Question description】:

I have two separate data frames that I want to compare, f1:

P53-Malat1
Neat1-Malat1
Gap1-Malat1

and f2:

intA,intB
P53-Malat1,Neat1-Malat1
Gap1-Malat1,Malat1-Pias3

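I want to iterate over the rows of each column in f2 and check whether the id is in f1. If it is, print the row + "_found"; if not, print the row + "_not_found", in a separate column.

The same goes for the second column in f2.

I tried this approach but it doesn't work. Am I missing something?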

with open("f1.txt","r") as f1:
    content = f1.read().splitlines()
    #print(content)

f2 = pd.read_csv("f2.csv")


f2["col1_search"] = f2.apply(lambda x: x["intA"]+"_found" if x in content else x["intA"]+"_not_found", axis=1)
f2["col2_search"] = f2.apply(lambda x: x["intB"]+"_found" if x in content else x["intB"]+"_not_found", axis=1)

So the desired output should be f2 in this format:

col1_search,col2_search
P53-Malat1_found,Neat1-Malat1_found
Gap1-Malat1_found,Malat1-Pias3_not_found

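Thank you.

【Question discussion】:

For something like this, your best bet is probably np.where(condition, if_true_this, if_false_this). If you can edit your post so that the data can be copied straight into a df, I think you will get more help.

Tags: python pandas dataframe lambda

【Solution 1】:

If I understand correctly, content is a list rather than a data frame. If that is the case, you can use .isin, which returns True or False for each row; those values can then be mapped to whatever suffix you want.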

import pandas as pd
content = ['P53-Malat1','Neat1-Malat1','Gap1-Malat1']


f2 = pd.DataFrame({'intA': {0: 'P53-Malat1', 1: 'Gap1-Malat1'},
                   'intB': {0: 'Neat1-Malat1', 1: 'Malat1-Pias3'}})

f2['col1_search'] = f2.intA + f2.intA.isin(content).map({True:'_found',False:'_not_found'})
f2['col2_search'] = f2.intB + f2.intB.isin(content).map({True:'_found',False:'_not_found'})

Output

          intA          intB        col1_search             col2_search
0   P53-Malat1  Neat1-Malat1   P53-Malat1_found      Neat1-Malat1_found
1  Gap1-Malat1  Malat1-Pias3  Gap1-Malat1_found  Malat1-Pias3_not_found

Or, if you have many columns:

(f2 + f2.isin(content).replace({True:'_found',False:'_not_found'})).add_suffix('_search')

Output

         intA_search             intB_search
0   P53-Malat1_found      Neat1-Malat1_found
1  Gap1-Malat1_found  Malat1-Pias3_not_found

This can be joined back onto the original data:

pd.concat([f2,(f2 + f2.isin(content).replace({True:'_found',False:'_not_found'})).add_suffix('_search')], axis=1)

Output

          intA          intB        intA_search             intB_search
0   P53-Malat1  Neat1-Malat1   P53-Malat1_found      Neat1-Malat1_found
1  Gap1-Malat1  Malat1-Pias3  Gap1-Malat1_found  Malat1-Pias3_not_found
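
For comparison, the row-wise apply from the question can also be made to work. With axis=1 the lambda receives a whole row (a Series), so the membership test has to use the cell value, e.g. x["intA"], rather than x itself. A minimal sketch, assuming the same sample data as above; the helper name tag is made up for illustration:

```python
import pandas as pd

# Sample data copied from the question.
content = ["P53-Malat1", "Neat1-Malat1", "Gap1-Malat1"]
f2 = pd.DataFrame({
    "intA": ["P53-Malat1", "Gap1-Malat1"],
    "intB": ["Neat1-Malat1", "Malat1-Pias3"],
})

def tag(value, known):
    """Append a suffix depending on whether value is in known (hypothetical helper)."""
    return f"{value}_found" if value in known else f"{value}_not_found"

# axis=1 hands the lambda an entire row, so index the cell explicitly.
f2["col1_search"] = f2.apply(lambda row: tag(row["intA"], content), axis=1)
f2["col2_search"] = f2.apply(lambda row: tag(row["intB"], content), axis=1)
```

This keeps the shape of the original attempt, but it calls a Python function once per row, so the .isin version is likely the better choice for large frames.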

【Discussion】:

【Solution 2】:

Here is an example of how you would use np.where:

import numpy as np
import pandas as pd

# Sample data; replace with your own.
data = {'Category' : ['First', 'Second', 'Third'],
        'First_Numbers' : [10, 10, 10],
        'Second_Numbers' : [20, 20, 20],
        'Third_Numbers' : [9, 21, 15]
       }
df = pd.DataFrame(data)
# 'found' where Third_Numbers lies strictly between First_Numbers and Second_Numbers.
comp_column = np.where((df['Third_Numbers'] < df['Second_Numbers']) & (df['Third_Numbers'] > df['First_Numbers']), 'found', 'not found')
df['check'] = comp_column
df

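I plugged in some sample data, which you should be able to replace with your own. Now I see that you want to compare 2 different dfs, so I would suggest merging them so that you are working on just one df. This is the best documentation on merging/joining/concatenating pandas dfs: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html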
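
Applied to the data from the question, the two suggestions might look like the sketch below. It assumes f1 has been loaded as a one-column frame; the column name id is hypothetical, since the question does not show a header for f1.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; "id" is an assumed column name for f1.
f1 = pd.DataFrame({"id": ["P53-Malat1", "Neat1-Malat1", "Gap1-Malat1"]})
f2 = pd.DataFrame({
    "intA": ["P53-Malat1", "Gap1-Malat1"],
    "intB": ["Neat1-Malat1", "Malat1-Pias3"],
})

# Option 1: np.where picks one of two precomputed strings per row
# based on a boolean membership mask.
for src, dst in [("intA", "col1_search"), ("intB", "col2_search")]:
    mask = f2[src].isin(f1["id"])
    f2[dst] = np.where(mask, f2[src] + "_found", f2[src] + "_not_found")

# Option 2: a left merge with indicator=True adds a "_merge" column that is
# "both" when the key exists in f1 and "left_only" when it does not.
# Note: duplicate ids in f1 would duplicate rows in the merged result.
merged = f2[["intB"]].merge(
    f1, how="left", left_on="intB", right_on="id", indicator=True
)
intB_in_f1 = (merged["_merge"] == "both").tolist()
```

For a plain membership check, the mask-based option is probably simpler; the merge becomes more useful when f1 carries extra columns that should be pulled into f2.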

【Discussion】:

【Solution 3】:

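import pandas as pd

f2 = pd.read_csv("f2.csv")

def transform(path: str, x: str) -> str:
    # Read the reference ids (one per line) from the text file.
    with open(path, "r") as f1:
        content = f1.read().splitlines()
    if x in content:
        return f"{x}_found"
    return f"{x}_not_found"

# Series.apply passes each cell value (a string) to the function;
# the ids to search are in f1.txt, not f2.csv.
f2["col1_search"] = f2["intA"].apply(lambda x: transform("f1.txt", x))
f2["col2_search"] = f2["intB"].apply(lambda x: transform("f1.txt", x))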
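
The function above reopens the file for every cell. A variant that reads the ids once into a set and reuses them for every column could be sketched as follows. The helper names load_ids and add_search_columns are made up for illustration, and the demo writes a temporary f1.txt so the snippet is self-contained.

```python
import os
import tempfile

import pandas as pd

def load_ids(path):
    """Read one id per line into a set, skipping blank lines (hypothetical helper)."""
    with open(path, "r") as handle:
        return {line.strip() for line in handle if line.strip()}

def add_search_columns(frame, ids, columns):
    """Return a copy of frame with a '<col>_search' column for each listed column."""
    out = frame.copy()
    for col in columns:
        suffix = out[col].isin(ids).map({True: "_found", False: "_not_found"})
        out[col + "_search"] = out[col] + suffix
    return out

# Demo with the data from the question, using a temporary stand-in for f1.txt.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "f1.txt")
    with open(path, "w") as handle:
        handle.write("P53-Malat1\nNeat1-Malat1\nGap1-Malat1\n")
    ids = load_ids(path)

f2 = pd.DataFrame({
    "intA": ["P53-Malat1", "Gap1-Malat1"],
    "intB": ["Neat1-Malat1", "Malat1-Pias3"],
})
result = add_search_columns(f2, ids, ["intA", "intB"])
```

Passing the column list explicitly keeps already-derived columns from being processed again, and the original f2 is left unchanged.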

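【Discussion】:

Your answer could be improved with additional supporting information. Please edit it to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.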