【Question title】: KeyError 0 on <dl> tag
【Posted】: 2018-07-30 08:40:52
【Question】:

I am trying to parse an HTML page, but I get a KeyError.

Here is the code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "http://www.kontrakt.szczecin.pl/mieszkanie-sprzedaz-6664m2-339600pln-potulicka-nowe-miasto-szczecin-zachodniopomorskie,351165"

# Open a connection to the chosen page and download its contents (urllib)

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

#html parsing (BeautifulSoup)
page_soup = soup(page_html, "html.parser") # html.parser -> parse as HTML, not e.g. as XML

# Grab the table with the offer number, kitchen type and other data about the property
containers = page_soup.findAll("section",{"class":"clearfix"},{"id":"quick-summary"})

# print(len(containers)) - len(containers) checks how many such objects exist on the page
# There is only one such table on the page, but for the sake of learning I treat it as if there were many.
container = containers[0]
print(len(container.dl))
print(container.dl[0])

Here is the log showing the error.

runfile('/home/bartosz/Pulpit/web_scrap.py', wdir='/home/bartosz/Pulpit')
36
Traceback (most recent call last):

  File "<ipython-input-70-e826e21c585a>", line 1, in <module>
    runfile('/home/bartosz/Pulpit/web_scrap.py', wdir='/home/bartosz/Pulpit')

  File "/home/bartosz/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "/home/bartosz/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "/home/bartosz/Pulpit/web_scrap.py", line 30, in <module>
    print(container.dl[0])

  File "/home/bartosz/anaconda3/lib/python3.6/site-packages/bs4/element.py", line 1011, in __getitem__
    return self.attrs[key]

KeyError: 0

len(container.dl) shows that the dl holds 36 items. If I run len(container.dl.dt), it shows 1.

【Comments】:

    Tags: python python-3.x beautifulsoup html-parsing


    【Solution 1】:

    You need to access the element's children through the .contents attribute, not by indexing the tag directly:

    print(container.dl.contents[0])
    

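    That should work.

    Indexing a tag directly looks up its HTML attributes. For example, given <dl class="myclass">, dl['class'] returns the class value (as the list ['myclass'], because class is a multi-valued attribute). An integer key such as 0 is looked up the same way, which is why dl[0] raises KeyError: 0.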
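
The difference between the two kinds of access can be tried on a small self-contained snippet. This is only a minimal sketch: it assumes `beautifulsoup4` is installed, and the HTML string is made up for illustration rather than taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, reduced from the kind of <dl> discussed above.
html = '<dl class="myclass"><dt>Numer oferty</dt><dd>351165</dd></dl>'
dl = BeautifulSoup(html, "html.parser").dl

# String keys look up HTML attributes; class is multi-valued, so a list comes back.
print(dl["class"])            # ['myclass']

# Integer keys are treated as attribute names too, hence the KeyError.
try:
    dl[0]
except KeyError as exc:
    print("KeyError:", exc)   # KeyError: 0

# The children live in .contents, which is an ordinary Python list.
print(dl.contents[0])         # <dt>Numer oferty</dt>

# len(tag) counts children, which is why len() worked while [0] did not.
print(len(dl), len(dl.contents))  # 2 2
```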

    Edit:

    To print everything in container.dl:

    from urllib.request import urlopen as uReq
    from bs4 import BeautifulSoup as soup
    
    my_url = "http://www.kontrakt.szczecin.pl/mieszkanie-sprzedaz-6664m2-339600pln-potulicka-nowe-miasto-szczecin-zachodniopomorskie,351165"
    
    with uReq(my_url) as uClient:
        page_soup = soup(uClient.read(), "html.parser")
    
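    # Both filters go into one attrs dict; a third positional argument is taken as `recursive`, not as a filter.
    container = page_soup.find_all("section", {"class": "clearfix", "id": "quick-summary"})[0]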
    
    print(len(container.dl))
    print('-' * 80)
    for content in container.dl.contents:
        print(content)
        print('-' * 80)
    

    This prints (the first line is the length of container.dl.contents):

    36
    --------------------------------------------------------------------------------
    
    
    --------------------------------------------------------------------------------
    <dt>Numer oferty</dt>
    --------------------------------------------------------------------------------
    <dd>351165</dd>
    --------------------------------------------------------------------------------
    <dt>Liczba pokoi</dt>
    --------------------------------------------------------------------------------
    <dd>4</dd>
    --------------------------------------------------------------------------------
    <dt>Cena</dt>
    --------------------------------------------------------------------------------
    <dd><span class="tag price">339 600 PLN</span></dd>
    --------------------------------------------------------------------------------
    <dt>Cena za m2</dt>
    --------------------------------------------------------------------------------
    <dd>5 096 PLN</dd>
    --------------------------------------------------------------------------------
    <dt>Powierzchnia</dt>
    --------------------------------------------------------------------------------
    <dd>66,64 m2</dd>
    --------------------------------------------------------------------------------
    <dt>Piętro</dt>
    --------------------------------------------------------------------------------
    <dd>1</dd>
    --------------------------------------------------------------------------------
    <dt>Liczba pięter</dt>
    --------------------------------------------------------------------------------
    <dd>6</dd>
    --------------------------------------------------------------------------------
    <dt>Typ kuchni</dt>
    --------------------------------------------------------------------------------
    <dd>Aneks</dd>
    --------------------------------------------------------------------------------
    <dt>Balkon</dt>
    --------------------------------------------------------------------------------
    <dd>Tak</dd>
    --------------------------------------------------------------------------------
    <dt>Rodzaj ogrzewania</dt>
    --------------------------------------------------------------------------------
    <dd>CO miejskie</dd>
    --------------------------------------------------------------------------------
    <dt>Gorąca woda</dt>
    --------------------------------------------------------------------------------
    <dd>Wodociąg miejski</dd>
    --------------------------------------------------------------------------------
    <dt>Rodzaj budynku</dt>
    --------------------------------------------------------------------------------
    <dd>Wysoki blok</dd>
    --------------------------------------------------------------------------------
    <dt>Materiał</dt>
    --------------------------------------------------------------------------------
    <dd>Silikat</dd>
    --------------------------------------------------------------------------------
    <dt>Rok budowy</dt>
    --------------------------------------------------------------------------------
    <dd>2019</dd>
    --------------------------------------------------------------------------------
    <dt>Winda</dt>
    --------------------------------------------------------------------------------
    <dd>Tak</dd>
    --------------------------------------------------------------------------------
    <dt>Stan nieruchomości</dt>
    --------------------------------------------------------------------------------
    <dd>Stan deweloperski</dd>
    --------------------------------------------------------------------------------
    <dt>Rynek</dt>
    --------------------------------------------------------------------------------
    <dd>Pierwotny</dd>
    --------------------------------------------------------------------------------
    
    --------------------------------------------------------------------------------
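
The listing above contains 34 tags (17 dt/dd pairs) plus a whitespace-only string at each end, which accounts for the length of 36. If the goal is a label-to-value mapping, one possible approach is to iterate over the `<dt>` tags and take each one's following `<dd>` sibling. The sketch below assumes `beautifulsoup4` is installed and uses a shortened, hand-written excerpt in place of the real page; the `summary` name is only illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical excerpt shaped like the output above: whitespace only at both ends.
html = """<dl>
<dt>Numer oferty</dt><dd>351165</dd><dt>Liczba pokoi</dt><dd>4</dd><dt>Cena</dt><dd><span class="tag price">339 600 PLN</span></dd>
</dl>"""
dl = BeautifulSoup(html, "html.parser").dl

# .contents includes the two whitespace strings as well as the six tags.
print(len(dl.contents))  # 8

# find_all() returns only Tag objects, so the whitespace strings are skipped.
# Each <dt> label is paired with the text of the next <dd> sibling.
summary = {
    dt.get_text(strip=True): dt.find_next_sibling("dd").get_text(strip=True)
    for dt in dl.find_all("dt")
}
print(summary)
```

Building a dict this way avoids depending on the exact positions inside `.contents`, which shift whenever the page's whitespace changes.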
    

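    【Discussion】:

    • Something is still off. I ran print(container.dl.dt.contents), but it only shows the first item, as ['Numer oferty']. I don't know why I'm not seeing the rest of the dt elements.
    • @BartBart I edited my answer; it now prints all of container.dl.contents.
    • Thanks! I managed it another way, but yours seems easier. By the way, this is the project topic I wanted to check: stackoverflow.com/questions/51592631/…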
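
On the first comment in this thread: attribute-style access such as `dl.dt` is shorthand for `dl.find("dt")`, so it returns only the first matching tag, and `len(container.dl.dt)` is 1 because that tag has a single child string. A minimal sketch of the difference, assuming `beautifulsoup4` is installed and using a made-up two-entry list:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two <dt> entries.
html = "<dl><dt>Numer oferty</dt><dd>351165</dd><dt>Liczba pokoi</dt><dd>4</dd></dl>"
dl = BeautifulSoup(html, "html.parser").dl

# Attribute-style access behaves like find(): only the first <dt> is returned.
print(dl.dt.contents)   # ['Numer oferty']

# find_all() collects every matching descendant.
labels = [dt.get_text() for dt in dl.find_all("dt")]
print(labels)           # ['Numer oferty', 'Liczba pokoi']
```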