Answers to: Using select function in BeautifulSoup returns None value

【Question title】: Using select function in BeautifulSoup returns None value
【Posted】: 2016-06-09 07:49:45
【Question description】:

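I am using Python 2.7 and BeautifulSoup.

With reference to this question of mine, I am trying to fetch content from div tags whose names are almost identical. I therefore need to check the class name of the div tag strictly.

Following is my code: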

import urllib2

from bs4 import BeautifulSoup
from mechanize import Browser

links = ['Link1', 'Link2', 'Link3', 'Link4']  # ....etc
for link in links:
    mech = Browser()
    mech.set_handle_robots(False)
    mech.set_handle_equiv(False)
    hadr = {'User-Agent': 'Agent'}
    req = urllib2.Request(link, headers=hadr)
    try:
        pan = urllib2.urlopen(req)
        soup = BeautifulSoup(pan, "lxml")
        tag1 = soup.select("div[class=profile-container abc-profile-container]")
        print "TAG_1", tag1
        tag2 = soup.select("div[class=profile-container]")
        print "TAG_2", tag2
    except Exception as e:
        print e
        print(type(e))

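I would like to add that any random link in the list may contain the div class targeted by tag1, yet its output is blank.

I want every link that has ("div[class=profile-container abc-profile-container]") to be picked up by tag1 and handled accordingly, instead of giving an empty list as output.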

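【Question comments】:

Since you originally said "Guidance / Help in any form is appreciated", I suggest you look at traceback.print_exc instead of print(e), print(type(e)); it is far more informative.
Hi, thank you very much, but I was advised to edit that line out, so I did. I will definitely try this, though.
Well, yes, it attracts off-topic comments like mine. :P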
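
The traceback suggestion in the comments above can be sketched roughly as follows. This is a minimal illustration, not the asker's code: `risky_fetch` and `describe_failure` are hypothetical names standing in for the `urllib2.urlopen` call, and `traceback.format_exc()` is used so the text can be inspected (it returns the same text that `traceback.print_exc()` writes to stderr).

```python
import traceback


def risky_fetch(url):
    # Hypothetical stand-in for the network call in the question.
    raise ValueError("unknown url type: %s" % url)


def describe_failure(url):
    try:
        risky_fetch(url)
    except Exception:
        # Unlike print(e), this includes the call stack, the failing line,
        # and the exception type together with its message.
        return traceback.format_exc()
    return ""


report = describe_failure("Link1")
print(report)
```

Compared with `print(e)`, the report names the function and line where the exception was raised, which is usually what is needed to locate a failing request inside a loop.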

Tags: python python-2.7 css-selectors beautifulsoup web-crawler

【Solution 1】:

Use CSS selectors in .select():

tag1 = soup.select("div.profile-container.abc-profile-container")
tag2 = soup.select("div.profile-container")
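
To see how these class selectors behave, here is a small sketch on made-up markup. The HTML snippet and the variable names are assumptions for illustration only, and `html.parser` is used instead of `lxml` merely to avoid an extra dependency. Note that `.select()` returns a list, so "no match" shows up as `[]` rather than `None`. Also, in CSS an attribute value that contains a space has to be quoted, which is likely why the unquoted `div[class=profile-container abc-profile-container]` in the question did not match.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one div with a single class, one div with both classes.
html = """
<div class="profile-container">plain</div>
<div class="profile-container abc-profile-container">abc</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Chained class selectors: the div must carry BOTH classes.
both = soup.select("div.profile-container.abc-profile-container")
# A single class selector: the div must carry this class, possibly among others.
any_profile = soup.select("div.profile-container")
# A selector that matches nothing yields an empty list, not None.
missing = soup.select("div.no-such-class")

both_texts = [d.get_text() for d in both]
any_texts = [d.get_text() for d in any_profile]
print(both_texts)
print(any_texts)
print(len(missing))
```

Here `div.profile-container` matches both divs, because a class selector only requires that the class be present, not that it be the only one.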

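【Discussion】:

I tried that, but it gives me the following output: TAG_1 [] TAG_2[---the actual content---]. The link that has the class from tag1 shows nothing for it, while TAG_2 does show content.
Are you sure the soup contains a div tag with both the profile-container and abc-profile-container classes? I tested the tag1 select and the tag2 select against my test web page and both worked fine.
I pasted the whole output into Notepad and it shows TAG_1[] TAG_2[---actual content from div.profile-container-----] TAG_1[---actual content from div.profile-container.abc-profile-container-----] TAG_2[---actual content from div.profile-container.abc-profile-container-----]
Well, your output shows that you tested with 2 links: the first has .profile-container but not .abc-profile-container, and the second has both classes. What else do you want to get? According to your question, this is the output you asked for.
The idea is that a link has either one of the above classes, but not both, which is why I made 2 separate tag variables. I cannot understand why the second tag is used instead of the first one for a link that contains the profile-container abc-profile-container class.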
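
The confusion in the last comment seems to come from `div.profile-container` also matching a div whose classes are `profile-container abc-profile-container`. If the goal is to treat the two cases as mutually exclusive, one possible approach (not something proposed in the thread) is to compare each div's full class list. The helper name `divs_with_exact_classes` and the sample markup below are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the two kinds of pages described above.
html = """
<div class="profile-container">plain</div>
<div class="profile-container abc-profile-container">abc</div>
"""
soup = BeautifulSoup(html, "html.parser")


def divs_with_exact_classes(soup, wanted):
    """Return divs whose set of classes equals `wanted` (order-insensitive)."""
    wanted = set(wanted)
    return [
        d for d in soup.find_all("div")
        # BeautifulSoup exposes `class` as a list of class names;
        # divs without a class attribute fall back to an empty list.
        if set(d.get("class", [])) == wanted
    ]


only_plain = divs_with_exact_classes(soup, ["profile-container"])
only_abc = divs_with_exact_classes(
    soup, ["profile-container", "abc-profile-container"]
)

plain_texts = [d.get_text() for d in only_plain]
abc_texts = [d.get_text() for d in only_abc]
print(plain_texts)
print(abc_texts)
```

With this check, a div carrying both classes is no longer reported under the plain `profile-container` case, so the two variables do not overlap.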