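【Question title】: Image link can't be extracted with Scrapy
【Posted】: 2021-05-15 22:04:57
【Question description】:
I am using Scrapy to extract product image links from www.flipkart.com, but the output is an empty list. I have tried every selector combination I could think of. Please help.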
'''
from scrapy import Spider


class StackSpider(Spider):
    name = "stack"
    allowed_domains = ["flipkart.com"]
    start_urls = [
        "https://www.flipkart.com/clothing-and-accessories/topwear/pr?sid=clo%2Cash&otracker=categorytree&p%5B%5D=facets.ideal_for%255B%255D%3DMen",
    ]

    def parse(self, response):
        # Collect the src attribute of every element with class _2r_T1I.
        image_url = response.css('._2r_T1I::attr(src)').extract()
        item = {}  # not defined in the original snippet; a plain dict is assumed here
        item['image_url'] = image_url
        yield item
'''
【Discussion】:
Tags:
python
html
web-scraping
scrapy
【Solution 1】:
I think this may be related to the website's policy.
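If the cause really is site policy, one place such a policy is usually published is the site's robots.txt file. Below is a minimal sketch, using Python's standard `urllib.robotparser`, of checking whether a URL is allowed for a given user agent. The robots.txt content, the `my-bot` user agent, the example.com URLs, and the `is_allowed` helper are all made up for illustration; none of them are taken from flipkart.com.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(SAMPLE_ROBOTS_TXT, "my-bot", "https://example.com/private/page"))  # False
    print(is_allowed(SAMPLE_ROBOTS_TXT, "my-bot", "https://example.com/products"))      # True
```

For a live site, calling `set_url(...)` and then `read()` on the `RobotFileParser` fetches the real file instead of parsing a string.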
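The code below works fine for this website (http://hdresim.net/):
import requests
from bs4 import BeautifulSoup

url = "http://hdresim.net/"
source = requests.get(url).text
soup = BeautifulSoup(source, "lxml")
# Restrict the search to the main content container.
main = soup.find("div", attrs={"class": "icerik"})
for i in main.find_all("img"):
    print(i["src"])
The result is as follows: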
http://hdresim.net/resimler/r6/k_karinca(10).jpg
http://hdresim.net/resimler/r3/k_deniz_canlilari(38).jpg
http://hdresim.net/resimler/r5/k_sonbahar14(44).jpg
http://hdresim.net/resimler/r7/k_yaz16(3).jpg
..
But when I try the same approach on your site, I get an empty result (nothing is printed):
url = "https://www.flipkart.com/clothing-and-accessories/topwear/pr?sid=clo%2Cash&otracker=categorytree&p%5B%5D=facets.ideal_for%255B%255D%3DMen"
source = requests.get(url).text
soup = BeautifulSoup(source, "lxml")
main = soup.find_all("div", attrs={"class": "_13oc-S"})  # product containers; not used below
for images in soup.find_all("img", attrs={"class": "_2r_T1I"}):
    print(images["src"])
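To narrow down why the loop prints nothing, it can help to check whether the downloaded HTML contains any `<img>` tag with that class at all. The sketch below is one way to do that with only the standard library. `ImgCollector`, `images_with_class`, and `SAMPLE_HTML` are hypothetical names and data introduced for illustration, not part of the code above.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect the attributes of every <img> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; keep them as a dict.
        if tag == "img":
            self.images.append(dict(attrs))


def images_with_class(html: str, css_class: str) -> list:
    """Return attribute dicts of <img> tags whose class list contains css_class."""
    collector = ImgCollector()
    collector.feed(html)
    return [
        attrs
        for attrs in collector.images
        if css_class in (attrs.get("class") or "").split()
    ]


# Hypothetical HTML, invented for this example; not taken from any real site.
SAMPLE_HTML = """
<div class="_13oc-S">
  <img class="_2r_T1I" src="https://example.com/a.jpg">
  <img class="other" data-src="https://example.com/b.jpg">
</div>
"""

if __name__ == "__main__":
    print(images_with_class(SAMPLE_HTML, "_2r_T1I"))
    # [{'class': '_2r_T1I', 'src': 'https://example.com/a.jpg'}]
    print(images_with_class(SAMPLE_HTML, "missing"))
    # []
```

Calling `images_with_class(source, "_2r_T1I")` on the downloaded page and getting an empty list would mean the class is absent from the HTML the server actually sent. Possible causes, none of them verified here for this site, include content rendered by JavaScript or class names that differ from those shown in the browser's inspector.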