【Question title】: BeautifulSoup: extract a value from an element without a class in Python
【Posted】: 2019-12-17 09:49:46
【Question】:

I want to extract data with BeautifulSoup in Python.

My document:

<div class="listing-item" data-id="309531" data-score="0">

  <div class="thumb">
    <a href="https://res.cloudinary.com/">

      <div style="background-image:url(https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2292,y_50/co_rgb:FFFFFF,l_text:oswald_100_bold_letter_spacing_5:01,y_-107/c_fit,w_200/abu-dhabi-plate_private-car_classic);"></div>
    </a>
  </div>
</div>

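From this I want to get the background image URL, which sits in the style attribute of the inner div (the div has no class):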

<div style="background-image:url(https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2292,y_50/co_rgb:FFFFFF,l_text:oswald_100_bold_letter_spacing_5:01,y_-107/c_fit,w_200/abu-dhabi-plate_private-car_classic);"></div>

My code:

from textwrap import shorten
from bs4 import BeautifulSoup
from urllib.parse import parse_qsl, urljoin, urlparse
import requests

url = 'https://uae.dubizzle.com/motors/number-plates/?page={}'

print('{:^50} {:^15} {:^25} '.format('Title', 'Price', 'Date'))

for page in range(0, 40):   # <--- Increase to number pages you want
    response = requests.get(url.format(page))
    soup = BeautifulSoup(response.text, 'lxml')

    for title, price, date, thumb  in zip(soup.select('.listing-item .title'),
                            soup.select('.listing-item .price'),
                            soup.select('.listing-item .date'),
                            soup.select('.listing-item .thumb')):

        print('{:50} {:<25} {:<15}'.format(shorten(title.get_text().strip(), 50), price.get_text().strip(), thumb.get_text().strip()))

How can I get the background image URL from the document?

【Question comments】:

    Tags: python web-scraping beautifulsoup


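    【Solution 1】:

    You can get the URL by searching inside the thumb element: the image address is in the style attribute of its inner div.

    You can try this:

    Code: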

    from textwrap import shorten
    from bs4 import BeautifulSoup
    import requests
    
    url = 'https://uae.dubizzle.com/motors/number-plates/?page={}'
    
    print('{:^50} {:^15} {:^25} '.format('Title', 'Price', 'Date'))
    
    for page in range(0, 1):   # <--- Increase to number pages you want
        response = requests.get(url.format(page))
        soup = BeautifulSoup(response.text, 'lxml')
    
        for title, price, date, thumb in zip(soup.select('.listing-item .title'),
                                             soup.select('.listing-item .price'),
                                             soup.select('.listing-item .date'),
                                             soup.select('.listing-item .thumb')):
            # The inline style looks like: background-image:url(<URL>);
            image_url = thumb.find('div').get('style').split('url(')[1].split(');')[0]
            print('{:50} {:<25} {:<15}'.format(shorten(title.get_text().strip(), 50), price.get_text().strip(), image_url))
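
The string-splitting step can be tried offline on the markup from the question, without requesting the site. The following is only a minimal sketch: the helper name `background_url` is hypothetical, and it uses the built-in `html.parser` instead of `lxml` so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Markup copied from the question.
HTML = '''
<div class="listing-item" data-id="309531" data-score="0">
  <div class="thumb">
    <a href="https://res.cloudinary.com/">
      <div style="background-image:url(https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2292,y_50/co_rgb:FFFFFF,l_text:oswald_100_bold_letter_spacing_5:01,y_-107/c_fit,w_200/abu-dhabi-plate_private-car_classic);"></div>
    </a>
  </div>
</div>
'''


def background_url(thumb):
    """Hypothetical helper: return the URL inside url(...), or None if absent."""
    inner = thumb.find('div')                      # first div nested in the thumb
    style = inner.get('style', '') if inner else ''
    if 'url(' not in style:
        return None
    # Keep the text between 'url(' and the first ')' after it,
    # then drop optional quotes around the URL.
    return style.split('url(', 1)[1].split(')', 1)[0].strip('\'"')


soup = BeautifulSoup(HTML, 'html.parser')
thumb = soup.select_one('.listing-item .thumb')
image_url = background_url(thumb)
print(image_url)
```

Returning `None` when the style is missing avoids the `IndexError` or `AttributeError` that the one-line version would raise on a listing without a thumbnail.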
    

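    【Comments】:

    • You can use thumb.find('a')['href'] instead of thumb.find('div').get('style').split('url(')[1].split(');')[0]
    • @Shijith I think they want the image link. Here the href points to another page, while the image itself is referenced in the background-image style.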
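
The difference between the two values discussed in these comments can be seen on a small sample. This is an illustrative sketch only: the `example.com` addresses are placeholders, not URLs from the real site.

```python
from bs4 import BeautifulSoup

# Placeholder markup with the same shape as the thumb block in the question.
HTML = '''
<div class="thumb">
  <a href="https://example.com/listing/1">
    <div style="background-image:url(https://example.com/img/plate.jpg);"></div>
  </a>
</div>
'''

soup = BeautifulSoup(HTML, 'html.parser')
thumb = soup.select_one('.thumb')

# The anchor's href is the page the thumbnail links to.
link_target = thumb.find('a')['href']

# The picture itself is only referenced inside the inline style.
style_value = thumb.find('div')['style']
image_url = style_value.split('url(')[1].split(');')[0]

print('link :', link_target)
print('image:', image_url)
```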
    【Solution 2】:

    You need to use find_next('div') to get the inner div element and then read its style attribute. A regular expression then extracts the image URL.

    Try the code below.

    from textwrap import shorten
    from bs4 import BeautifulSoup
    import requests
    import re
    
    url = 'https://uae.dubizzle.com/motors/number-plates/?page={}'
    
    print('{:^50} {:^15} {:^25} '.format('Title', 'Price', 'Date'))
    
    for page in range(0, 40):   # <--- Increase to number pages you want
        response = requests.get(url.format(page))
        soup = BeautifulSoup(response.text, 'lxml')
    
        for title, price, date, thumb  in zip(soup.select('.listing-item .title'),
                                soup.select('.listing-item .price'),
                                soup.select('.listing-item .date'),
                                soup.select('.listing-item .thumb')):
    
    
            print('{:50} {:<25} {:<15}'.format(shorten(title.get_text().strip(), 50), price.get_text().strip(), re.search(r"https?://[^\s]+[^);]", thumb.find_next("div")['style']).group(0)))
    

    Here is some of the console output:

    G91911 - Excellent for PORSCHE                     AED 59,000                https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:88887,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:J,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_new
    R 199                                              AED 49,000                https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2122,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:M,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_new
    88887 J                                            AED 49,000                https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2212,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:S,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_new
    M2122                                              AED 52,000                https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:22022,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:J,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_new
    S 2212                                             AED 309,000               https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:5000,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:L,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_new
    22022 J                                            AED 9,500                 https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_4:5945,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_4:H,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_classic
    5000 L                                             AED 2,800,000             https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_4:90,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_4:Z,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_classic
    Dubai                                              AED 760,000               https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_4:10000,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_4:H,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_classic
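
In the sample output above, the plate numbers inside the image URLs do not seem to match the titles on the same row. One possible cause, stated here as an assumption rather than a confirmed diagnosis, is that zipping four independent `select()` results pairs items by position, so a listing that lacks one field shifts the rest. A sketch of an alternative is to loop over each `.listing-item` and look up its fields inside that element. The sample markup, the `parse_listings` name, and the `URL_IN_STYLE` pattern below are all hypothetical.

```python
import re
from bs4 import BeautifulSoup

# Placeholder markup: the second listing has no price on purpose.
HTML = '''
<div class="listing-item">
  <span class="title">Plate A</span>
  <span class="price">AED 1,000</span>
  <div class="thumb"><a href="#"><div style="background-image:url(https://example.com/a.png);"></div></a></div>
</div>
<div class="listing-item">
  <span class="title">Plate B</span>
  <div class="thumb"><a href="#"><div style="background-image: url('https://example.com/b.png');"></div></a></div>
</div>
'''

# Captures the text inside url(...), with or without surrounding quotes.
URL_IN_STYLE = re.compile(r"url\(\s*['\"]?(.*?)['\"]?\s*\)")


def parse_listings(html):
    """Return one dict per listing so the fields always belong together."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for item in soup.select('.listing-item'):
        title = item.select_one('.title')
        price = item.select_one('.price')
        styled = item.select_one('.thumb div[style]')
        match = URL_IN_STYLE.search(styled['style']) if styled else None
        rows.append({
            'title': title.get_text(strip=True) if title else '',
            'price': price.get_text(strip=True) if price else '',
            'image': match.group(1) if match else None,
        })
    return rows


rows = parse_listings(HTML)
for row in rows:
    print('{title:20} {price:<12} {image}'.format(**row))
```

Because every field is read from the same `.listing-item`, a missing price leaves an empty string in that row instead of shifting the following rows.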
    
