【Question title】:AttributeError: 'NavigableString' object has no attribute and Index out of range
【Posted】:2017-12-17 02:54:38
【Question】:

Can anyone help? I have been struggling for two days to retrieve basic product information (product name, image, rating, price) from the link given here. Here is my code; I am new to Python.

import urllib.request
from bs4 import BeautifulSoup
from random import randint
from bs4.dammit import EncodingDetector
import re
import sys


url='https://fr.aliexpress.com/category/205000316/men-clothing-accessories.html'
headers = {}
headers['User-Agent'] = "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:48.0) Gecko/20100101 Firefox/48.0,Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0',Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10; rv:33.0) Gecko/20100101 Firefox/33.0',Mozilla/5.0 (compatible, MSIE 11, Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko',Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; EN; rv:11.0) like Gecko',Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)',Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/5.0)',Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A'"

req = urllib.request.Request(url, headers = headers)
html = urllib.request.urlopen(req).read()

soup = BeautifulSoup(html.decode('utf8', 'ignore'), "html.parser")


# retrieve info such as product name, price, rating
Prod=soup.find_all('ul', class_='util-clearfix son-list')

for item in Prod:
    print(item.contents[0].find_all("span", {"class": "star star-s"})[0].text)
    print(item.contents[0].find_all("span", {"class": "star star-s"})[0].text)
    print(item.contents[0].find_all("span", {"class": "star star-s"})[0].text)

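【Comments】:

  • Why do you fetch the same element three times? Perhaps use a second for loop inside the first one.
  • It looks like you are printing the same index over and over; you should probably use a dynamic index such as contents[i].

Tags: python python-3.x web-scraping beautifulsoup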


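【Answer 1】:

Your first mistake is that Prod actually matches the product lists, that is, the ul elements, whereas you need the inner li elements, each of which represents one product container.

Then, once you have changed it to target the products, iterate over the product containers and look up the inner elements that hold the name, rating, and any other information you need. To do that, use the browser developer tools to find out which HTML elements represent the data you want to extract: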

products = soup.select('ul.son-list li.list-item')
for product in products:
    name = product.select_one("a.product").get_text()
    stars_element = product.select_one(".star")
    rating = stars_element["title"].split(": ")[1].strip().split(" ", 1)[0] if stars_element else "Unknown rating"

    print(name, rating)

Prints:

Lurker Requin Peau Soft Shell V4 Tactique Militaire Veste Hommes Imperméable Coupe-Vent Chaud Manteau À Capuchon de Camouflage C... 4.8
HEYGUYS coton t chemises hommes new summer street wear hanche hop T-SHIRTS 2017 marque de mode fermeture éclair sur la manche t-... 4.8
2017 Nutella Motif Hommes et Femmes Hoodies Couples Casual Style 3D Impression Personnalité Automne Hiver Sweats À Capuche Survê... 4.8
...
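
To see why the original loop fails, it may help to reproduce the problem offline. The sketch below uses a small, hypothetical HTML fragment (SAMPLE_HTML is an assumption, not the real AliExpress markup, and the title text is made up) to show that the first entry of .contents on the ul is the whitespace between the tags, which BeautifulSoup represents as a NavigableString rather than a Tag. Selecting the li elements directly avoids that.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical, heavily simplified markup; the real page is much larger.
SAMPLE_HTML = """
<ul class="util-clearfix son-list">
  <li class="list-item"><a class="product" href="#">Jacket</a>
    <span class="star star-s" title="Star Rating: 4.8 out of 5"></span></li>
  <li class="list-item"><a class="product" href="#">T-shirt</a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
product_list = soup.find("ul", class_="son-list")

# The first child is the newline and indentation before the first <li>,
# so it is a text node and cannot be searched like a tag.
first_child = product_list.contents[0]
print(type(first_child).__name__, repr(str(first_child)))

# Keeping only Tag children skips the whitespace text nodes.
tag_children = [child for child in product_list.contents if isinstance(child, Tag)]

# Selecting the product containers directly, as in the answer above.
rows = []
for product in soup.select("ul.son-list li.list-item"):
    name = product.select_one("a.product").get_text(strip=True)
    stars = product.select_one(".star")
    # select_one returns None when nothing matches, so guard before indexing.
    rating_title = stars["title"] if stars else "Unknown rating"
    rows.append((name, rating_title))

print(rows)
```

The guard on stars matters for the second error in the title as well: indexing [0] into an empty find_all result raises IndexError, whereas checking for None (or an empty list) first lets the loop continue past products without a rating.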

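【Comments】:

  • Hi, thanks, this works perfectly, but could you explain this line: rating = stars_element["title"].split(": ")[1].strip().split(" ", 1)[0] if stars_element else "Unknown rating"
  • @Ilyas It is just one way to get the numeric rating value out of the "title" attribute.
  • Thanks a lot, alecxe. If I want to insert this list into sqlite, how would I do that? I have tried, but without success.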
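
As a rough illustration of the two points raised in these comments, the sketch below first unpacks the rating expression step by step and then stores the results with the standard sqlite3 module. The sample title text "Star Rating: 4.8 out of 5", the parse_rating helper, the products table, and the scraped list are all assumptions made for the example; the thread does not show the real title format or any database schema.

```python
import sqlite3


def parse_rating(title):
    """Return the number from a title shaped like 'Label: 4.8 more words'."""
    after_colon = title.split(": ")[1]       # '4.8 out of 5'
    trimmed = after_colon.strip()            # drop stray surrounding whitespace
    first_word = trimmed.split(" ", 1)[0]    # '4.8'
    return first_word


# Hypothetical rows as the scraping loop might collect them:
# (product name, title attribute or None when no star element was found).
scraped = [
    ("Jacket", "Star Rating: 4.8 out of 5"),
    ("T-shirt", None),
]

# An in-memory database keeps the example self-contained;
# pass a file name instead to persist the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT NOT NULL, rating REAL)")

rows = [
    (name, float(parse_rating(title)) if title else None)
    for name, title in scraped
]

# Placeholders (?) let sqlite3 handle quoting; the with-block commits on success.
with conn:
    conn.executemany("INSERT INTO products (name, rating) VALUES (?, ?)", rows)

stored = conn.execute("SELECT name, rating FROM products ORDER BY rowid").fetchall()
print(stored)
conn.close()
```

Storing a missing rating as NULL instead of the string "Unknown rating" is a design choice of this sketch; it keeps the column numeric so that it can be sorted or averaged later.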
【Answer 2】:

Product name, image, price:

import urllib.request
from bs4 import BeautifulSoup

url='https://fr.aliexpress.com/category/205000316/men-clothing-accessories.html'

headers = {}
headers['User-Agent'] = "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:48.0) Gecko/20100101 Firefox/48.0,Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0',Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10; rv:33.0) Gecko/20100101 Firefox/33.0',Mozilla/5.0 (compatible, MSIE 11, Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko',Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; EN; rv:11.0) like Gecko',Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)',Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/5.0)',Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A'"

req = urllib.request.Request(url, headers = headers)
html = urllib.request.urlopen(req).read()

soup = BeautifulSoup(html.decode('utf8', 'ignore'), "html.parser")

products = soup.find_all('div', class_='item')

for item in products:
    print(' item:', item.find(class_='info').find("a").text)
    print('price:', item.find(class_="price").find(class_='value').text)
    print('image:', item.find(class_="pic").find("img")['src'])
    print('--')

Result:

 item: Rocksir punisher t chemises pour hommes t-shirt Coton de mode marque t shirt hommes Casual Manches Courtes le punisher T-shirt h...
price: € 12,05 - 12,92
image: //ae01.alicdn.com/kf/HTB1ByVZSpXXXXcxaXXXq6xXFXXXW/New-Design-Male-Novelty-Men-T-shirt-Fashion-Cotton-O-neck-Hip-Hop-T-shirt-.jpg_220x220.jpg
--
 item: DIFFELEMENT 2017 Nouveau style long Manteau Hommes marque vêtements mode Long Vestes Manteaux marque-vêtements hommes Pardessus ...
price: € 44,82
image: //ae01.alicdn.com/kf/HTB12IqMXEAKL1JjSZFkq6y8cFXa2/DIFFELEMENT-2017-New-style-long-Coat-Men-brand-clothing-fashion-Long-Jackets-Coats-brand-clothing-mens.jpg_220x220.jpg
--
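
The output above shows two details that usually need cleaning before the data is used: the image URLs are protocol-relative (they start with //), and the prices are text with a decimal comma, sometimes given as a range. A minimal sketch of one way to normalise both follows, using only the standard library. The helper names absolute_image_url and parse_price_range are hypothetical, and the price parser assumes the "€ 12,05 - 12,92" shape seen in the output; it does not handle thousands separators or other currencies.

```python
from urllib.parse import urljoin

PAGE_URL = "https://fr.aliexpress.com/category/205000316/men-clothing-accessories.html"


def absolute_image_url(src, page_url=PAGE_URL):
    """Resolve a protocol-relative or relative src against the page URL."""
    return urljoin(page_url, src)


def parse_price_range(text):
    """Turn '€ 12,05 - 12,92' into (12.05, 12.92).

    A single price such as '€ 44,82' yields the same value twice.
    """
    amounts = text.replace("€", "").split("-")
    values = [float(amount.strip().replace(",", ".")) for amount in amounts]
    return (values[0], values[-1])


image = absolute_image_url(
    "//ae01.alicdn.com/kf/HTB1ByVZSpXXXXcxaXXXq6xXFXXXW/"
    "New-Design-Male-Novelty-Men-T-shirt-Fashion-Cotton-O-neck-Hip-Hop-T-shirt-.jpg_220x220.jpg"
)
print(image)
print(parse_price_range("€ 12,05 - 12,92"))
print(parse_price_range("€ 44,82"))
```

Note also that the chained calls in the loop, such as item.find(class_="price").find(class_='value'), raise AttributeError when an inner find returns None, so checking each intermediate result may be worthwhile on pages where some items lack a price or image.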
