How to find a string match in df col based on list of strings? Answers

【Question title】: How to find a string match in df col based on list of strings?
【Posted】: 2020-02-24 17:12:54
【Question】:

I have a list of 1,000 companies and a df of all previous transactions for the year. For each match, I want to set the row value to True in a new column (df$Covered).

I'm not sure why I keep getting the errors below. I tried researching the problem, but no luck so far.

Match string to list of defined strings

Pandas extract rows from df where df['col'] values match df2['col'] values

Code example: when I set regex=False

Customer_List = ['3M', 'Cargill', "Chili's"]  # --- (list continues)

df['Covered'] = df[df['End Customer Name'].str.contains('|'.join(Customer_List),case=False, na=False, regex=False)]

ValueError: Wrong number of items passed 32, placement implies 1
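A likely reading of this ValueError: `df[mask]` returns a DataFrame holding every column of the matching rows (32 columns here, judging by the message), and that cannot be stored in the single column `df['Covered']`. Assigning the boolean mask itself avoids the problem. A minimal sketch, where the sample rows and the single search term are assumptions made up for illustration:

```python
import pandas as pd

# Hypothetical sample data, not taken from the original post.
df = pd.DataFrame({"End Customer Name": ["3M Company", "Cargill Inc", "Acme Corp"]})

# One boolean per row: True where the name contains the literal text "3M".
mask = df["End Customer Name"].str.contains("3M", case=False, na=False, regex=False)

# Indexing with the mask yields a DataFrame of the matching rows, not a column.
filtered = df[mask]
print(type(filtered).__name__, filtered.shape)

# The mask has exactly one value per row, so it fits as a new column.
df["Covered"] = mask
print(df["Covered"].tolist())
```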

Code example: when I set regex=True

error: bad character range H-D at position 177825

 ~/opt/anaconda3/lib/python3.7/sre_parse.py in parse(str, flags, pattern)
    928 
    929     try:
--> 930         p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
    931     except Verbose:
    932         # the VERBOSE flag was switched on inside the pattern.  to be

~/opt/anaconda3/lib/python3.7/sre_parse.py in _parse_sub(source, state, verbose, nested)
    424     while True:
    425         itemsappend(_parse(source, state, verbose, nested + 1,
--> 426                            not nested and not items))
    427         if not sourcematch("|"):
    428             break
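The "bad character range" message usually means some name in the joined pattern contains regex metacharacters, for example a `[`...`]` pair enclosing something like `H-D`, which the regex parser reads as an invalid character class. The sketch below reproduces this with the standard `re` module only. The name `Widgets [H-D]` is a hypothetical stand-in, since the actual offending entry is not shown in the post:

```python
import re

# Hypothetical names; the second one contains regex metacharacters.
names = ["3M", "Widgets [H-D]"]

# Joining raw names: "[H-D]" is parsed as a character class with a reversed range.
try:
    re.compile("|".join(names))
    message = None
except re.error as exc:
    message = str(exc)
print(message)

# Escaping each name first makes every character match literally.
safe_pattern = "|".join(map(re.escape, names))
match = re.search(safe_pattern, "ACME WIDGETS [H-D] LLC", flags=re.IGNORECASE)
print(match is not None)
```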

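【Comments】:

Can you add some sample data?
Could you post the output of df.sample().to_dict()? That would help recreate and test the problem.
df['End Customer Name'] is 100k+ rows of names, and Customer_List is a list of 1,000 company names. Does that help?
Why 'regex=False'? You are building a regular expression by joining your terms with the pipe ("bar") symbol, which means OR in a regex.
Thanks Scott, I didn't know whether I needed literal strings or a regex. Do you think this has to do with special characters?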
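To illustrate the point about `regex=False` raised in the comments: with `regex=False`, the joined string is searched for as literal text, pipe characters included, so ordinary names no longer match. A small sketch with invented sample names (assumptions, not data from the post):

```python
import pandas as pd

# Hypothetical names; the last one happens to contain the literal joined text.
s = pd.Series(["3M Company", "Cargill Inc", "3M|Cargill JV"])
pattern = "|".join(["3M", "Cargill"])  # "3M|Cargill"

# Literal search: only a name containing the exact text "3M|Cargill" matches.
literal = s.str.contains(pattern, regex=False)
print(literal.tolist())  # [False, False, True]

# Regex search: the pipe means OR, so either name matches.
as_regex = s.str.contains(pattern, regex=True)
print(as_regex.tolist())  # [True, True, True]
```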

Tags: python pandas

【Solution 1】:

How about:

mask = df['End Customer Name'].isin(Customer_List)
df['covered'] = 0
df.loc[mask, 'covered'] = 1

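【Discussion】:

Thanks TaxpayersMoney, but in many rows the Customer_List entry is only a substring of the 'End Customer Name' string, which is why I used contains. Example: End Customer Name - Apple Inc, Apple Incorporation, Apple Inc. Customer_List ["Apple Inc"]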
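The difference described in this reply can be sketched as follows, using the Apple names from the example plus one invented non-matching row ("Banana Ltd" is an assumption): `isin` requires the whole cell to equal a list entry, while `str.contains` accepts a substring.

```python
import pandas as pd

names = pd.Series(["Apple Inc", "Apple Incorporation", "Apple Inc.", "Banana Ltd"])
Customer_List = ["Apple Inc"]

# isin: exact, whole-value equality only.
exact = names.isin(Customer_List)
print(exact.tolist())  # [True, False, False, False]

# str.contains: substring match, so the longer variants are covered too.
partial = names.str.contains("Apple Inc", regex=False)
print(partial.tolist())  # [True, True, True, False]
```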

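【Solution 2】:

Thanks everyone. The problem was that my Customer_List contains special characters, so I needed to escape each entry with map(re.escape, ...) before joining them.

The link below helped me: Python regex bad character range.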
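Putting the pieces together, the fix described here might look like the sketch below. The sample rows and the entry `Widgets [H-D]` are hypothetical, chosen only to exercise the special-character case. The column names follow the question:

```python
import re
import pandas as pd

# Hypothetical data; "Widgets [H-D]" stands in for a name with special characters.
Customer_List = ["3M", "Cargill", "Chili's", "Widgets [H-D]"]
df = pd.DataFrame({
    "End Customer Name": [
        "3M Company",
        "CARGILL INC",
        "Chili's Grill & Bar",
        "Acme Widgets [H-D] LLC",
        "Unrelated Co",
        None,  # missing names are treated as non-matches via na=False
    ]
})

# Escape every name so regex metacharacters match literally, then OR them together.
pattern = "|".join(map(re.escape, Customer_List))

# Assign the boolean mask directly; case=False ignores capitalization.
df["Covered"] = df["End Customer Name"].str.contains(
    pattern, case=False, na=False, regex=True
)
print(df["Covered"].tolist())  # [True, True, True, True, False, False]
```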

【Discussion】: