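Since you want to keep the original data structure, one solution is to use `df.loc` to collect every value in the `cell_types` column whose row matches the given value in the "Gene pairs" column, convert those values to a list, and then check whether every cell type in a predefined list (the cell types that define a "universal sender") appears in that list: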
```python
import pandas as pd

data = [
    {"Gene pairs": "gene4_gene5", "cell_types": "cell1_cell2"},
    {"Gene pairs": "gene1_gene2", "cell_types": "cell1_cell1"},
    {"Gene pairs": "gene1_gene2", "cell_types": "cell1_cell3"},
    {"Gene pairs": "gene2_gene3", "cell_types": "cell3_cell2"},
    {"Gene pairs": "gene4_gene5", "cell_types": "cell1_cell1"},
    {"Gene pairs": "gene4_gene5", "cell_types": "cell1_cell3"},
]
df = pd.DataFrame(data)

# For each row, look up all cell_types recorded for the same gene pair and mark the row
# as "universal sender" only if every required cell-type pair is among them
df['new column'] = df['Gene pairs'].apply(
    lambda x: "universal sender"
    if all(item in df.loc[df['Gene pairs'] == x, 'cell_types'].tolist()
           for item in ["cell1_cell2", "cell1_cell3", "cell1_cell1"])
    else None
)
```
Output:
| | Gene pairs | cell_types | new column |
|---:|:-------------|:-------------|:-----------------|
| 0 | gene4_gene5 | cell1_cell2 | universal sender |
| 1 | gene1_gene2 | cell1_cell1 | |
| 2 | gene1_gene2 | cell1_cell3 | |
| 3 | gene2_gene3 | cell3_cell2 | |
| 4 | gene4_gene5 | cell1_cell1 | universal sender |
| 5 | gene4_gene5 | cell1_cell3 | universal sender |
Alternatively, you can wrap the logic in a function for better readability, or if you want to add extra filters later:
```python
def lookup(row):
    # All cell-type pairs recorded for this row's gene pair
    cells = df.loc[df['Gene pairs'] == row['Gene pairs'], 'cell_types'].tolist()
    # Flag the row only if every required cell-type pair is present
    if all(item in cells for item in ["cell1_cell2", "cell1_cell3", "cell1_cell1"]):
        return "universal sender"
    return None

df['new column'] = df.apply(lookup, axis=1)
```
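Both versions above re-filter the whole DataFrame once per row, which gets slow on large tables. A possible alternative, sketched here under the assumption that the required cell types are the same three as above, is to group by "Gene pairs" and test each group only once with `groupby(...).transform`, which broadcasts the per-group result back to every row. The names `required` and `is_sender` are illustrative, not from the original answer.

```python
import pandas as pd

data = [
    {"Gene pairs": "gene4_gene5", "cell_types": "cell1_cell2"},
    {"Gene pairs": "gene1_gene2", "cell_types": "cell1_cell1"},
    {"Gene pairs": "gene1_gene2", "cell_types": "cell1_cell3"},
    {"Gene pairs": "gene2_gene3", "cell_types": "cell3_cell2"},
    {"Gene pairs": "gene4_gene5", "cell_types": "cell1_cell1"},
    {"Gene pairs": "gene4_gene5", "cell_types": "cell1_cell3"},
]
df = pd.DataFrame(data)

# Assumed definition of a "universal sender": all of these cell-type pairs must occur
required = {"cell1_cell2", "cell1_cell3", "cell1_cell1"}

# Evaluate the subset test once per gene pair; transform broadcasts the
# scalar True/False back to every row of that group, in the original row order
is_sender = df.groupby("Gene pairs")["cell_types"].transform(
    lambda cells: required.issubset(cells)
)

# Turn the boolean mask into the same labels used above
df["new column"] = is_sender.map({True: "universal sender", False: None})
print(df)
```

Using a set for `required` makes the membership test a single `issubset` call, and extra conditions can be added inside the lambda without touching the per-row code.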