If string contains from list: answers

【Question title】: if string contains from list
【Posted】: 2021-09-26 00:23:00
【Question】:

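I want to check whether any of the excluded sites appears in a link. I can get it working with a single site, but as soon as I put the sites in a list, it fails at if donts in thingy with:

TypeError: 'in <string>' requires string as left operand, not tuple

Here is my code: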

import requests 
from bs4 import BeautifulSoup
from lxml import html, etree
import sys
import re

url = ("http://stackoverflow.com")

donts = ('stackoverflow.com', 'stackexchange.com')

r = requests.get(url, timeout=6, verify=True)

soup = BeautifulSoup(r.content, 'html.parser')

for link in soup.select('a[href*="http"]'):
    thingy = (link.get('href'))
    thingy = str(thingy)
    if donts in thingy:
        pass
    else:
        print(thingy)

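【Comments】:

Because donts is a tuple, and when the right-hand side of in is a string, the left-hand side must be a string too.
I think you meant if thingy in donts: although if thingy not in donts: would be more direct than using a no-op then clause.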
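
To make the comments above concrete, here is a small sketch of how in behaves for each combination of operands. The href value is made up for illustration and is not taken from the asker's page.

```python
donts = ('stackoverflow.com', 'stackexchange.com')
href = 'http://stackoverflow.com/questions'  # hypothetical link, for illustration only

# str in str: substring test
substring_hit = 'stackoverflow.com' in href

# str in tuple: exact comparison against each element
exact_hit = href in donts

# tuple in str: unsupported, raises the TypeError from the question
try:
    donts in href
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(substring_hit)  # True
print(exact_hit)      # False
print(error_message)
```

So swapping the operands removes the error, but it also changes the test from "contains" to "is exactly equal to one of".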

Tags: python parsing beautifulsoup

【Solution 1】:

import requests
from bs4 import BeautifulSoup
from lxml import html, etree
import sys
import re

url = ("http://stackoverflow.com")
donts = ('stackoverflow.com', 'stackexchange.com')

r = requests.get(url, timeout=6, verify=True)
soup = BeautifulSoup(r.content, 'html.parser')

for link in soup.select('a[href*="http"]'):
    thingy = (link.get('href'))
    thingy = str(thingy)
    # Exact membership: true only when the whole href equals an entry in donts
    if thingy in donts:
        print(thingy)
    else:
        pass

The test is: string in tuple.

【Comments】:

【Solution 2】:

The crux of the problem is how you search the exclusion list:

excluded = ("a", "b", "c")
links = ["a", "d", "e"]

for site in links:
    if site not in excluded:  # We want to know if the site is in the excluded list
        print(f"Site not excluded: {site}")

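Reverse the order of your operands and this should work fine. I also inverted your logic here so you can skip the unnecessary pass.

As a side note, this is one reason clear variable names help: they make it easier to reason about what the logic is supposed to do. That is especially useful in Python, which has ergonomic operators such as in.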
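
As a rough sketch of the same pattern with descriptive names, the filter can be wrapped in a small function. The name keep_allowed is made up for illustration. Note that not in on a tuple compares whole strings, so this form fits when each link is already a bare site name rather than a full URL.

```python
def keep_allowed(links, excluded):
    """Return the links that are not exactly equal to any excluded entry."""
    return [site for site in links if site not in excluded]


excluded = ("a", "b", "c")
links = ["a", "d", "e"]

allowed = keep_allowed(links, excluded)
print(allowed)  # ['d', 'e']
```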

【Comments】:

Fixed, thanks!

【Solution 3】:

import requests
from bs4 import BeautifulSoup
from lxml import html, etree
import sys
import re

url = ("http://stackoverflow.com")
donts = ('stackoverflow.com', 'stackexchange.com')

r = requests.get(url, timeout=6, verify=True)
soup = BeautifulSoup(r.content, 'html.parser')

for link in soup.select('a[href*="http"]'):
    thingy = (link.get('href'))
    thingy = str(thingy)
    # Substring test: skip the link if any excluded site appears anywhere in the href
    if any(d in thingy for d in donts):
        pass
    else:
        print(thingy)

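【Comments】:

This can give false positives, because in applied to a str matches substrings:

>>> excluded = ("aa", "bb", "b")
>>> site = "bbb"
>>> any(d in site for d in excluded)
True

In effect, the site 'bbb' makes the if true even though the excluded site is 'bb'.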
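
One possible way around the substring false positive raised in the comment above is to compare host names instead of raw strings. This is only a sketch, not something proposed in the thread: the helper name is_excluded and the sample hrefs are made up, and it assumes that subdomains of an excluded site should be excluded too.

```python
from urllib.parse import urlparse


def is_excluded(href, excluded_domains):
    """True if the href's host is an excluded domain or a subdomain of one."""
    host = urlparse(href).hostname or ""
    return any(host == d or host.endswith("." + d) for d in excluded_domains)


donts = ('stackoverflow.com', 'stackexchange.com')

# Hypothetical links, for illustration only
hrefs = [
    'https://stackoverflow.com/questions',
    'https://meta.stackexchange.com/',
    'https://notstackoverflow.com/',
    'https://example.org/?ref=stackoverflow.com',
]

kept = [h for h in hrefs if not is_excluded(h, donts)]
print(kept)
```

With these inputs the first two links are dropped, while the last two are kept even though both contain the text stackoverflow.com.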