Check if a string is in a pandas DataFrame: answers

【Question title】: Check if string is in a pandas dataframe
【Posted】: 2015-09-05 19:35:51
【Question description】:

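I want to check whether a particular string exists in a particular column of my dataframe.

I am getting the error

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().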

import pandas as pd

BabyDataSet = [('Bob', 968), ('Jessica', 155), ('Mary', 77), ('John', 578), ('Mel', 973)]

a = pd.DataFrame(data=BabyDataSet, columns=['Names', 'Births'])

if a['Names'].str.contains('Mel'):
    print ("Mel is there")

【Question comments】:

Tags: python pandas

【Solution 1】:

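import re
s = 'string'

# per row, the list of case-insensitive matches of s (empty list if none)
matches = df['Name'].str.findall(s, flags=re.IGNORECASE)

# or: the rows whose Name is exactly one of the given strings
rows = df[df['Name'].isin(['string1', 'string2'])]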

【Comments】:

【Solution 2】:

import pandas as pd

(data_frame.col_name=='str_name_to_check').sum()

【Comments】:

【Solution 3】:

If you want to save the result, you can use this:

a['result'] = a['Names'].apply(lambda x : ','.join([item for item in str(x).split() if item.lower() in ['mel', 'etc']]))

【Comments】:

【Solution 4】:

For a case-insensitive search:

a['Names'].str.lower().str.contains('mel').any()
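
A possible alternative, as a minimal sketch: str.contains takes a case parameter, so the separate str.lower() step can be dropped. The sample Series below is made up for illustration and is not from the question.

```python
import pandas as pd

# Hypothetical sample data with a differently-cased 'MEL'
names = pd.Series(['Bob', 'Jessica', 'Mary', 'John', 'MEL'], name='Names')

# case=False makes the substring match case-insensitive
found = names.str.contains('mel', case=False).any()
print(found)
```

Like the line above, this is still a substring test, so 'Melvin' would also count as a hit.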

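【Comments】:

【Solution 5】:

The OP wants to find out whether the string 'Mel' exists in a particular column, not whether it is contained in any of the strings in that column. Using contains is therefore unnecessary and less efficient.

A simple equality comparison is enough: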

import pandas as pd

df = pd.DataFrame({"names": ["Melvin", "Mel", "Me", "Mel", "A.Mel"]})

# count the rows that are exactly 'Mel'
mel_count = (df['names'] == 'Mel').sum()
print("There are {num} instances of 'Mel'. ".format(num=mel_count))

# True if at least one row is exactly 'Mel'
mel_exists = (df['names'] == 'Mel').any()
if mel_exists:
    print("'Mel' exists in the dataframe.")

# membership test on the underlying array of values
mel_exists2 = 'Mel' in df['names'].values
print("'Mel' is in the dataframe: " + str(mel_exists2))

This prints:

There are 2 instances of 'Mel'. 
'Mel' exists in the dataframe.
'Mel' is in the dataframe: True

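【Comments】:

A similar solution: (a['Names'].eq('Mel')).any()
This is the most accurate answer
Why do you have to drop down to numpy to check whether a string is contained in a Series of strings (as in 'Mel' in df['names'].values)? That seems counterproductive. I would expect 'Mel' in df['names'] to work?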
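
On the last question: the in operator on a Series tests the index labels, not the values, which is why .values is needed. A small sketch of the difference (the sample Series is made up for illustration; isin is shown as one way to stay within pandas):

```python
import pandas as pd

names = pd.Series(["Melvin", "Mel", "Me"])  # default index labels 0, 1, 2

in_index = 'Mel' in names              # looks at the index labels, not the data
label_hit = 1 in names                 # 1 is an index label
in_values = 'Mel' in names.values      # looks at the underlying values
via_isin = names.isin(['Mel']).any()   # value check without leaving pandas

print(in_index, label_hit, in_values, via_isin)
```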

【Solution 6】:

pandas seems to recommend df.to_numpy, since the other methods still raise a FutureWarning: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy

So an alternative that works in this case is:

b = a['Names']
c = b.to_numpy().tolist()
if 'Mel' in c:
    print("Mel is in the dataframe column Names")

【Comments】:

【Solution 7】:

If there is a chance you will need to search for the empty string,

    a['Names'].str.contains('')

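will not work, because it always returns True.

Instead, use

    '' in a["Names"].values

which accurately reflects whether a string is in the Series, including the edge case of searching for the empty string.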
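
A quick sketch of that edge case, using a made-up two-row Series rather than the question's data:

```python
import pandas as pd

names = pd.Series(['Bob', 'Mel'])  # hypothetical data, no empty strings

# every string contains the empty substring, so each row is True
substring_hits = names.str.contains('', regex=False)

# membership on the values only succeeds if some row equals ''
exact_hit = '' in names.values

print(substring_hits.tolist(), exact_hit)
```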

【Comments】:

【Solution 8】:

I ran into the same problem and used:

if "Mel" in a["Names"].values:
    print("Yep")

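But this solution may be slower, because pandas internally creates a list from the Series.

【Comments】:

It works for multiple strings in that column, thanks

【Solution 9】:

You should check the value produced by that line of code, for example by adding a length check.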

# len() of the boolean mask is always the number of rows, so filter first
if len(a[a['Names'].str.contains('Mel')]) > 0:
    print("Name Present")

【Comments】:

【Solution 10】:

a['Names'].str.contains('Mel') returns an indicator vector of boolean values of size len(BabyDataSet)

So you can use

mel_count=a['Names'].str.contains('Mel').sum()
if mel_count>0:
    print ("There are {m} Mels".format(m=mel_count))

or any(), if you don't care how many records match your query

if a['Names'].str.contains('Mel').any():
    print ("Mel is there")

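【Comments】:

If there are NaN values in a['Names'], use the na parameter of the contains() function. pandas.pydata.org/pandas-docs/stable/reference/api/…
Pitfall number 2: str.contains('Mel') matches any substring in every row of the dataframe column. So ABCMelABC == Mel.
This answer is incorrect and misleading, because you are checking whether 'Mel' is contained in any string in the column; for example, 'hi Mel' in the column would also evaluate to true, whereas an exact match of the string is what is required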
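
Both pitfalls can be seen in a few lines. This is only a sketch with made-up data; the plain == comparison stands in for "exact match", as in the equality-based answer.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value and two near-misses
names = pd.Series(['Bob', np.nan, 'hi Mel', 'Melvin'])

# na=False treats missing rows as "no match", giving a clean boolean mask
substring = names.str.contains('Mel', na=False)

# exact comparison: only rows equal to 'Mel' count
exact = names == 'Mel'

print(substring.tolist())
print(substring.any(), exact.any())
```

Here the substring test reports a hit for 'hi Mel' and 'Melvin', while the exact comparison finds nothing.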

【Solution 11】:

You should use any()

In [98]: a['Names'].str.contains('Mel').any()
Out[98]: True

In [99]: if a['Names'].str.contains('Mel').any():
   ....:     print("Mel is there")
   ....:
Mel is there

a['Names'].str.contains('Mel') gives you a Series of boolean values

In [100]: a['Names'].str.contains('Mel')
Out[100]:
0    False
1    False
2    False
3    False
4     True
Name: Names, dtype: bool

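【Comments】:

If I want to check whether any one of the words is present, a['Names'].str.contains("Mel|word_1|word_2") works. Could you suggest something for an "and" condition? I want to check whether all the words in my list are present in each row of the dataframe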
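
One way to get the "and" behaviour, as a rough sketch: build one boolean mask per word and combine them with &. The Series text and the list words are made-up names for illustration, and this is still substring matching, not whole-word matching.

```python
from functools import reduce
import operator

import pandas as pd

# Hypothetical data: which rows mention every word in the list?
text = pd.Series(['Mel and Bob', 'Mel only', 'Bob and Mel and Mary'])
words = ['Mel', 'Bob']

# one mask per word, treating each word as a literal substring
masks = [text.str.contains(w, regex=False) for w in words]

# element-wise AND across all masks
all_present = reduce(operator.and_, masks)

print(all_present.tolist())
```

all_present.all() then tells you whether every row contains every word, and text[all_present] selects the rows that do.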