【Question Title】:Image link can't be extracted with Scrapy
【Posted】:2021-05-15 22:04:57
【Question】:

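I am using Scrapy to extract product image links from www.flipkart.com, but the output is an empty list. I have tried every selector combination I could think of. Please help. '''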

from scrapy import Spider


class StackSpider(Spider):
    name = "stack"
    allowed_domains = ["flipkart.com"]
    start_urls = [
        "https://www.flipkart.com/clothing-and-accessories/topwear/pr?sid=clo%2Cash&otracker=categorytree&p%5B%5D=facets.ideal_for%255B%255D%3DMen",
    ]

    def parse(self, response):
        # Collect the src attribute of every element with class "_2r_T1I".
        image_url = response.css('._2r_T1I::attr(src)').extract()
        # Yield the result as a plain dict item.
        yield {'image_url': image_url}
   

'''

【Discussion】:

    Tags: python html web-scraping scrapy


    【Solution 1】:

    I think this may be related to the website's policy toward scrapers. The code below works fine on this other site:

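    import requests
    from bs4 import BeautifulSoup

    url = "http://hdresim.net/"
    source = requests.get(url).text

    soup = BeautifulSoup(source, "lxml")
    # The images sit inside the <div class="icerik"> container.
    main = soup.find("div", attrs={"class": "icerik"})
    for img in main.find_all("img"):
        print(img["src"])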
    

    The output looks like this:

    http://hdresim.net/resimler/r6/k_karinca(10).jpg
    http://hdresim.net/resimler/r3/k_deniz_canlilari(38).jpg
    http://hdresim.net/resimler/r5/k_sonbahar14(44).jpg
    http://hdresim.net/resimler/r7/k_yaz16(3).jpg
    ..
    

    But when I try the same approach on your site, I get an empty list:

    url = "https://www.flipkart.com/clothing-and-accessories/topwear/pr?sid=clo%2Cash&otracker=categorytree&p%5B%5D=facets.ideal_for%255B%255D%3DMen"
    source = requests.get(url).text
    soup = BeautifulSoup(source, "lxml")

    # Product containers; looked up here but not used below.
    main = soup.find_all("div", attrs={"class": "_13oc-S"})

    # Prints nothing: no <img class="_2r_T1I"> is found in the returned HTML.
    for image in soup.find_all("img", attrs={"class": "_2r_T1I"}):
        print(image["src"])
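
One way to narrow down why the list comes back empty is to inspect the HTML that was actually downloaded, rather than the page as the browser shows it. The following is a minimal sketch using only the standard library; `ImgSrcCollector`, `inspect_images` and the sample markup are illustrative names and data, not part of the original answer. It counts every `<img>` tag and separately collects the `src` values of those carrying the target class.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect <img> tags and the src of those carrying a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.total_imgs = 0      # every <img> seen, regardless of class
        self.matching_srcs = []  # src values of <img> tags with css_class

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        self.total_imgs += 1
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.css_class in classes and attr_map.get("src"):
            self.matching_srcs.append(attr_map["src"])


def inspect_images(html, css_class):
    """Return (total <img> count, list of src values for css_class)."""
    collector = ImgSrcCollector(css_class)
    collector.feed(html)
    return collector.total_imgs, collector.matching_srcs


# Hypothetical sample markup standing in for a downloaded page.
sample = (
    '<div><img class="_2r_T1I" src="https://example.com/a.jpg">'
    '<img class="logo" src="https://example.com/logo.png"></div>'
)
print(inspect_images(sample, "_2r_T1I"))  # (2, ['https://example.com/a.jpg'])
```

In practice one would pass `source` from the snippet above instead of `sample`. If the total is zero, the images are probably not present in the served HTML at all (for instance because they are added later by JavaScript, which is an assumption not verified here). If the total is nonzero but nothing matches, the class name in the served markup likely differs from `_2r_T1I`.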
    

    【Comments】:
