【Question title】: How to extract a value from HTML via BeautifulSoup
【Posted】: 2020-12-25 15:48:58
【Question】:

I have already parsed my string with BeautifulSoup.

from bs4 import BeautifulSoup
import requests
import re

def otoMoto(link):
    URL = link
    page = requests.get(URL).content

    bs = BeautifulSoup(page, 'html.parser')

    for offer in bs.find_all('div', class_= "offer-item__content ds-details-container"):

        # print(offer)
        # print("znacznik")
        linkOtoMoto = offer.find('a', class_="offer-title__link").get('href')
        # title = offer.find("a")
        titleOtoMoto = offer.find('a', class_="offer-title__link").get('title')
        rokProdukcji = offer.find('li', class_="ds-param").get_text().strip()
        rokPrzebPojemPali = offer.find_all('li',class_="ds-param")
        print(linkOtoMoto+" "+titleOtoMoto+" "+rokProdukcji)
        print(rokPrzebPojemPali)
        break

URL = "https://www.otomoto.pl/osobowe/bmw/seria-3/od-2016/?search%5Bfilter_float_price%3Afrom%5D=50000&search%5Bfilter_float_price%3Ato%5D=65000&search%5Bfilter_float_year%3Ato%5D=2016&search%5Bfilter_float_mileage%3Ato%5D=100000&search%5Bfilter_enum_financial_option%5D=1&search%5Border%5D=filter_float_price%3Adesc&search%5Bbrand_program_id%5D%5B0%5D=&search%5Bcountry%5D="

otoMoto(URL)

Result:

https://www.otomoto.pl/oferta/bmw-seria-3-x-drive-nowe-opony-ID6Dr4JE.html#d51bf88c70 BMW Seria 3 2016
[<li class="ds-param" data-code="year">
<span>2016 </span>
</li>, <li class="ds-param" data-code="mileage">
<span>50 000 km</span>
</li>, <li class="ds-param" data-code="engine_capacity">
<span>1 998 cm3</span>
</li>, <li class="ds-param" data-code="fuel_type">
<span>Benzyna</span>
</li>]

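So I can extract a single string, but when several elements share the same class

class="ds-param"

I cannot, for example, assign the production year to a variable. If you have any ideas, please let me know :).

Have a nice day!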

【Comments】:

    Tags: html python-3.x web-scraping


    【Solution 1】:

    From the documentation:

    Some attributes, like the data-* attributes in HTML 5, have names that can't be used as the names of keyword arguments:

    data_soup = BeautifulSoup('<div data-foo="value">foo!</div>', 'html.parser')
    data_soup.find_all(data-foo="value")
    # SyntaxError: keyword can't be an expression
    
    

    You can use these attributes in a search by putting them into a dictionary and passing the dictionary to find_all() as the attrs argument:

    data_soup.find_all(attrs={"data-foo": "value"})
    # [<div data-foo="value">foo!</div>]
    

    So you can do something like this, calling it on your own soup (or on each offer) instead of data_soup: data_soup.find_all(attrs={"data-code": "year"})[0].get_text()
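
Applied to the markup shown in the question, the attrs lookup might look like the sketch below. The HTML snippet is copied from the printed result and wrapped in a ul element for illustration; the helper name get_param and the params dictionary are hypothetical names, not part of the original code. On the live page the text may contain non-breaking spaces or extra whitespace, so the exact strings could differ.

```python
from bs4 import BeautifulSoup

# Sample markup taken from the question's printed output (wrapped in <ul> for illustration).
html = """
<ul>
<li class="ds-param" data-code="year">
<span>2016 </span>
</li>
<li class="ds-param" data-code="mileage">
<span>50 000 km</span>
</li>
<li class="ds-param" data-code="engine_capacity">
<span>1 998 cm3</span>
</li>
<li class="ds-param" data-code="fuel_type">
<span>Benzyna</span>
</li>
</ul>
"""

# Stands in for one `offer` element from the question's loop.
offer = BeautifulSoup(html, "html.parser")


def get_param(offer, code):
    """Return the text of the <li> whose data-code equals `code`, or None if absent."""
    tag = offer.find("li", attrs={"data-code": code})
    return tag.get_text(strip=True) if tag is not None else None


# Assign a single value to a variable.
year = get_param(offer, "year")

# Or collect every parameter at once, keyed by its data-code attribute.
params = {
    li["data-code"]: li.get_text(strip=True)
    for li in offer.find_all("li", class_="ds-param")
}

print(year)
print(params)
```

Using find() with a None check, rather than find_all(...)[0], avoids an IndexError for offers that lack a given parameter.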

    【Comments】:
