Check if a string contains the list elements: answers

【Question title】: Check if a string contains the list elements
【Posted】: 2017-11-20 22:56:24
【Question description】:

How can I check whether a string contains the elements of a list?

str1 = "45892190"
lis = [89,90]

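【Question discussion】:

All of the elements? Or just one? What about overlap?
@Willem All of the elements in the list, and without overlap.
What have you tried so far? Please show your work. We are not here to do your work for you.
string = "45892190" lis = [89,90,77,8] for i in lis: if str(i) in string: lis.remove(i) print(lis) print(lis )
@Soviut It hasn't worked so far.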

Tags: python python-3.x

【Solution 1】:

You can use the all() function:

In [1]: str1 = "45892190"
   ...: lis = [89,90]
   ...: all(str(l) in str1 for l in lis)
   ...:
Out[1]: True

【Discussion】:

It looks like the OP doesn't want any overlap.
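
To make the overlap objection concrete, here is a minimal sketch. The wrapper name contains_all is only an illustration and is not part of the answer; it wraps the same all() expression and is run on the string "890", where 89 and 90 share the digit 9:

```python
def contains_all(string, numbers):
    # Each number is tested independently with `in`,
    # so two numbers may reuse the same characters of the string.
    return all(str(n) in string for n in numbers)

print(contains_all("45892190", [89, 90]))  # True
print(contains_all("890", [89, 90]))       # True, although 89 and 90 overlap on the 9
```

So all() answers "is every element somewhere in the string?", which is not the same as the non-overlapping requirement stated in the question comments.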

【Solution 2】:

def contains(s, elems):
    for elem in elems:
        index = s.find(elem)
        if index == -1:
            return False
        s = s[:index] + s[index + len(elem) + 1:]
    return True

Usage:

>>> str1 = "45892190"
>>> lis = [89,90]
>>> contains(str1, (str(x) for x in lis))
True
>>> contains("890", (str(x) for x in lis))
False

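【Discussion】:

Unfortunately, this doesn't work at all. The +1 is wrong. Besides, it returns True for "45892190" and [89,90,4521]. PS: I didn't downvote.
Improving this answer is actually trivial: instead of removing the numbers that were found, replace them with a "placeholder" character (e.g. a space). Also, the list should first be sorted in descending order so that the longest numbers are matched first, to avoid the problem shown by contains("2302", [2, 23]). Oh, and I'm assuming that the numbers in the list are only positive integers.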
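
A minimal sketch of the improvement suggested in the comment above. The name contains_greedy and the choice to sort by string length (rather than by numeric value) are my assumptions, not something from the thread. Note that this is still a greedy approach: it always takes the first occurrence it finds, so it can miss a valid arrangement, as the last call shows.

```python
def contains_greedy(string, numbers):
    # Longest substrings first, so that 23 is matched before 2 in "2302".
    substrings = sorted((str(n) for n in numbers), key=len, reverse=True)
    for sub in substrings:
        index = string.find(sub)
        if index == -1:
            return False
        # Blank out the match with spaces instead of deleting it,
        # so the characters on either side can never merge into a new match.
        string = string[:index] + " " * len(sub) + string[index + len(sub):]
    return True

print(contains_greedy("45892190", [89, 90]))        # True
print(contains_greedy("45892190", [89, 90, 4521]))  # False
print(contains_greedy("2302", [2, 23]))             # True
print(contains_greedy("123451234", [1234, 2345]))   # False, although 2345 + 1234 would fit
```

In the last call the first 1234 is consumed at index 0, which destroys the only 2345; using the second 1234 instead would have worked. Handling that case needs some form of backtracking.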

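【Solution 3】:

If you want non-overlapping matches, I would do it like this:

Create a copy of the initial string (we are going to modify it)
Iterate over each element of the list; if we find the element in the string, we replace it with x
At the same time, if we find the number in the string, we increment a counter
Finally, if the counter equals the length of the list, all of its elements are there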

str1 = "45890190"
lis1 = [89, 90]

copy, i = str1, 0
for el in lis1:
    if str(el) in copy:
        copy = copy.replace(str(el), 'x')
        i = i + 1

if i == len(lis1):
    print(True)

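Also, if we add an extra condition that returns False as soon as an element is not found in the string, we don't really need the counter. That gives the following final solution: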

def all_matches(_list, _string):
    str_copy = _string
    for el in _list:
        if str(el) not in str_copy:
            return False
        str_copy = str_copy.replace(str(el), 'x')
    return True

You can test it by writing:

str1 = "4589190"
lis1 = [89, 90]

print(all_matches(lis1, str1))

> True

This may not be the best solution you are looking for, but I guess it serves the purpose.

【Discussion】:

all_matches([89, 90, 8990], "8990189290") fails
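
A small reproduction of the failure reported in the comment above. The function body is copied from the answer so that the snippet runs on its own; the second call, with the list reordered, is my own addition for comparison:

```python
def all_matches(_list, _string):
    # Same logic as in the answer: replace every occurrence of each element by 'x'.
    str_copy = _string
    for el in _list:
        if str(el) not in str_copy:
            return False
        str_copy = str_copy.replace(str(el), 'x')
    return True

# 89 and 90 are replaced first, which destroys the only occurrence of 8990.
print(all_matches([89, 90, 8990], "8990189290"))  # False

# With the longest element first, the same input succeeds.
print(all_matches([8990, 89, 90], "8990189290"))  # True
```

The result depends on the order of the list, because replace() consumes every occurrence of an element before the later elements are checked.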

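【Solution 4】:

If overlap isn't allowed, this problem becomes much harder than it first appears. As far as I can tell, none of the other answers is correct (see the test cases at the end).

Recursion is needed because, if a substring occurs more than once, using one occurrence rather than another may prevent the other substrings from being found.

This answer uses two functions. The first one finds every occurrence of a substring in a string and returns an iterator of strings in which that occurrence has been replaced by a character that should not appear in any of the substrings.

The second function recursively checks whether there is a way to find all of the numbers in the string: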

def find_each_and_replace_by(string, substring, separator='x'):
    """
    list(find_each_and_replace_by('8989', '89', 'x'))
    # ['x89', '89x']
    list(find_each_and_replace_by('9999', '99', 'x'))
    # ['x99', '9x9', '99x']
    list(find_each_and_replace_by('9999', '89', 'x'))
    # []
    """
    index = 0
    while True:
        index = string.find(substring, index)
        if index == -1:
            return
        yield string[:index] + separator + string[index + len(substring):]
        index += 1


def contains_all_without_overlap(string, numbers):
    """
    contains_all_without_overlap("45892190", [89, 90])
    # True
    contains_all_without_overlap("45892190", [89, 90, 4521])
    # False
    """
    if len(numbers) == 0:
        return True
    substrings = [str(number) for number in numbers]
    substring = substrings.pop()
    return any(contains_all_without_overlap(shorter_string, substrings)
               for shorter_string in find_each_and_replace_by(string, substring, 'x'))

Here are the test cases:

tests = [
    ("45892190", [89, 90], True),
    ("8990189290", [89, 90, 8990], True),
    ("123451234", [1234, 2345], True),
    ("123451234", [2345, 1234], True),
    ("123451234", [1234, 2346], False),
    ("123451234", [2346, 1234], False),
    ("45892190", [89, 90, 4521], False),
    ("890", [89, 90], False),
    ("8989", [89, 90], False),
    ("8989", [12, 34], False)
]

for string, numbers, should in tests:
    result = contains_all_without_overlap(string, numbers)
    if result == should:
        print("Correct answer for %-12r and %-14r (%s)" % (string, numbers, result))
    else:
        print("ERROR : %r and %r should return %r, not %r" %
              (string, numbers, should, result))

And the corresponding output:

Correct answer for '45892190'   and [89, 90]       (True)
Correct answer for '8990189290' and [89, 90, 8990] (True)
Correct answer for '123451234'  and [1234, 2345]   (True)
Correct answer for '123451234'  and [2345, 1234]   (True)
Correct answer for '123451234'  and [1234, 2346]   (False)
Correct answer for '123451234'  and [2346, 1234]   (False)
Correct answer for '45892190'   and [89, 90, 4521] (False)
Correct answer for '890'        and [89, 90]       (False)
Correct answer for '8989'       and [89, 90]       (False)
Correct answer for '8989'       and [12, 34]       (False)

【Discussion】:

【Solution 5】:

str1 = "45892190"
lis = [89,90]

for i in lis:
    if str(i) in str1:
        print("The value " + str(i) + " is in the list")

Output:

The value 89 is in the list
The value 90 is in the list

If you want to check whether all of the values in lis are in str1, cricket_007's code

all(str(l) in str1 for l in lis)
out: True

is what you are looking for.

【Discussion】:

【Solution 6】:

You can use a regular expression to search.

import re
str1 = "45892190"
lis = [89,90]
for i in lis:
  x = re.search(str(i), str1)
  print(x)

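【Discussion】:

Wow, this is great! I didn't expect regular expressions to be so handy!
@SouvikRay No, it isn't. Please avoid accepting an answer right away without really investigating it. Try changing str1 to '890' (instead of '45892190') and it still seems to work just fine. This answer doesn't take into account the "and without overlap" requirement you stated in the comments.
If this answer really works for you, and you don't actually care about overlap, contrary to what you said in the comments (which makes this a duplicate question...), then I strongly recommend using cricket_007's or Giovanni Gianni's answer instead, since they don't need to rely on regular expressions.
I agree that this code doesn't take overlap into account.
@EricDuminil So far your answer looks perfect, and it achieves the goal in the cases I have tested.

【Solution 7】:

This can be implemented correctly with a regular expression. Generate all unique permutations of the input; for each permutation, join the terms with ".*", then join all the permutations with "|". For example, [89, 90, 8990] becomes 89.*8990.*90| 89.*90.*8990| 8990.*89.*90| 8990.*90.*89| 90.*89.*8990| 90.*8990.*89 , where I added a space after each "|" for clarity.

The following passes Eric Duminil's test suite.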

import itertools
import re

def create_numbers_regex(numbers):
    # Convert each into a string, and double-check that it's an integer
    numbers = ["%d" % number for number in numbers]

    # Convert to unique regular expression terms
    regex_terms = set(".*".join(permutation)
                            for permutation in itertools.permutations(numbers))
    # Create the regular expression. (Sorted so the order is invariant.)
    regex = "|".join(sorted(regex_terms))
    return regex

def contains_all_without_overlap(string, numbers):
    regex = create_numbers_regex(numbers)
    pat = re.compile(regex)
    m = pat.search(string)
    if m is None:
        return False
    return True

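However, this has a big problem: in the worst case, the size of the regular expression grows with the factorial of the number of numbers. Even with only 8 unique numbers, that is 40320 alternatives in the pattern. Compiling that regular expression takes Python several seconds.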
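
To put numbers on that growth, here is a minimal sketch that counts the alternatives without compiling anything. The helper name count_alternatives is hypothetical; it builds the same set of ".*"-joined permutations as create_numbers_regex and returns its size:

```python
import itertools
import math

def count_alternatives(numbers):
    # One alternative per distinct ordering of the numbers, joined with ".*".
    strings = [str(n) for n in numbers]
    return len({".*".join(p) for p in itertools.permutations(strings)})

print(count_alternatives([89, 90, 8990]))  # 6
print(count_alternatives(range(1, 9)))     # 40320
print(math.factorial(8))                   # 40320
```

Duplicate numbers in the input reduce the count, because identical permutations collapse in the set.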

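The only case where this solution might be useful is when you have a small number of numbers and want to search a large number of strings. In that case, you might also look into re2, which I believe can handle that regular expression without backtracking.

【Discussion】: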