【Question title】: Python BeautifulSoup can't select specific tag
【Posted】: 2016-07-04 14:35:07
【Question description】:

My problem comes up when parsing a website and loading the document tree with BS. How can I get the content of an <em> tag? I tried:

for first in soup.find_all("li", class_="li-in"):
    print first.select("em.fl.in-date").string

                   #or

    print first.select("em.fl.in-date").contents

But it doesn't work. Please help.

I'm searching for cars on tutti.ch.

Here is my full code:

#Crawl tutti.ch
import urllib
thisurl = "http://www.tutti.ch/stgallen/fahrzeuge/autos"
handle = urllib.urlopen(thisurl)
html_gunk =  handle.read()

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_gunk, 'html.parser')

for first in soup.find_all("li", class_="li-in"):
    if first.a.string and "Audi" and "BMW" in first.a.string:
        print "Geschafft: %s" % first.a.contents
        print first.select("em.fl.in-date").string
    else:
        print first.a.contents

When it finds a BMW or an Audi, it should check when the car was listed. The time is in an em tag like this:

<em class="fl in-date"> Heute <br></br> 13:59 </em>

【Question comments】:

    Tags: python beautifulsoup


    【Solution 1】:
     first.select("em.fl.in-date")[0].text
    

    This assumes your selector is correct. You didn't provide the URL you're scraping, so I can't be sure.

    >>> url = "http://stackoverflow.com/questions/38187213/python-beautifulsoup"
    >>> from bs4 import BeautifulSoup
    >>> import urllib2
    >>> html = urllib2.urlopen(url).read()
    >>> soup = BeautifulSoup(html)
    >>> soup.find_all("p")[0].text
    u'My problem is when parsing a website and then loading the data tree with BS. How can I look for the content of an <em> Tag? I tried '
    

    After seeing your code, I made the following change; take a look:

    #Crawl tutti.ch
    import urllib
    thisurl = "http://www.tutti.ch/stgallen/fahrzeuge/autos"
    handle = urllib.urlopen(thisurl)
    html_gunk =  handle.read()
    
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html_gunk, 'html.parser')
    
    for first in soup.find_all("li", class_="li-in"):
        if first.a.string and "Audi" and "BMW" in first.a.string:
            print "Geschafft: %s" % first.a.contents
            print first.select("em.fl.in-date")[0].text
        else:
            print first.a.contents
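
The `[0]` in the code above matters because `select()` returns a list of matching tags rather than a single tag, so `.string` or `.text` cannot be read from its result directly. Here is a minimal sketch of that behaviour in Python 3, run against the `<em>` snippet from the question instead of the live site. The surrounding `<li>` and the link text are invented for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the snippet in the question.
# The <li> wrapper and the <a> text are assumptions made for this example.
html = """
<li class="li-in">
  <a href="#">BMW 320d</a>
  <em class="fl in-date"> Heute <br></br> 13:59 </em>
</li>
"""

soup = BeautifulSoup(html, "html.parser")
item = soup.find("li", class_="li-in")

# select() always returns a list of matches, even when there is only one.
matches = item.select("em.fl.in-date")
first_match = matches[0]

# select_one() returns the first match directly, or None if nothing matched.
em = item.select_one("em.fl.in-date")

# .string is None here because the <em> has several children (text, <br>, text).
raw_string = em.string

# get_text() joins all nested text; strip=True trims whitespace around each piece.
inserted = em.get_text(" ", strip=True)
print(inserted)
```

If the selector might not match every list item, checking the result of `select_one()` for `None` before reading its text avoids the `IndexError` that `[0]` would raise on an empty list.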
    

    【Comments】:

    • Thank you very much, Adam Barnes. Your code works perfectly!
    • `and "Audi"` is always true
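
The last comment refers to how Python evaluates `first.a.string and "Audi" and "BMW" in first.a.string`: a non-empty string literal such as `"Audi"` is always truthy, so only the `"BMW"` membership test has any effect. The sketch below contrasts that condition with one that matches either brand. The helper names `original_check` and `mentions_brand` and the sample titles are hypothetical and do not come from the original thread.

```python
def original_check(title):
    """Mirror the condition from the question's code."""
    # "Audi" is a non-empty literal, so it is always truthy;
    # the result depends only on title being non-empty and containing "BMW".
    return bool(title and "Audi" and "BMW" in title)


def mentions_brand(title, brands=("Audi", "BMW")):
    """Return True if title is a non-empty string containing any of the brands."""
    # title may be None when the <a> tag has no single string child.
    return bool(title) and any(brand in title for brand in brands)


print(original_check("Audi A4 Avant"))
print(mentions_brand("Audi A4 Avant"))
```

With this helper, the loop condition could be written as `if mentions_brand(first.a.string):`, which would also match listings that mention only Audi.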