【Question title】: Scrape image URL
【Posted】: 2021-05-31 15:58:55
【Question】:

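I am trying to scrape image source links from Amazon using Beautiful Soup, but I am not getting the correct output. The page I want to scrape is: https://www.amazon.in/s?bbn=1389401031&rh=n%3A1389401031%2Cp_36%3A1318505031&dc&qid=1622460176&rnid=1318502031&ref=lp_1389401031_nr_p_36_2

Here is the code: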

import requests
from bs4 import BeautifulSoup

# Fetch the mobile-phones category page
base_url = requests.get("https://www.amazon.in/mobile-phones/b/?ie=UTF8&node=1389401031&ref_=nav_cs_mobiles_9292c6cb7b394d30b2467b8f631090a7")
base_url

soup = BeautifulSoup(base_url.content, 'html.parser')
search_url = soup.find_all("span", class_="a-list-item")
search_url

# Collect the href of every link inside the list items
urls = []
abz = []
for i in search_url:
    for j in i.find_all("a"):
        urls.append(j["href"])
urls

# Keep only the links whose URL contains an encoded rupee sign (price filters)
lst = [x for x in urls if "%E2%82%" in x]
links_to_scrap = lst[2:4]
links_to_scrap

img_links = []
for url in links_to_scrap:
    # Each href is appended to the full category URL here
    pname = requests.get("https://www.amazon.in/mobile-phones/b/ie=UTF8&node=1389401031&ref_=nav_cs_mobiles_9292c6cb7b394d30b2467b8f631090a7"+url)
    soupp = BeautifulSoup(pname.content, 'html.parser')
    image = soupp.find_all("div", class_="a-section aok-relative s-image-wide-3-2-aspect")
    for i in image:
        for j in i.find_all("img"):
            img_links.append(j["src"])
img_links
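
A side note on the loop above: the collected href values are concatenated onto the full category URL. If those hrefs are site-relative paths (an assumption, since the actual values are not shown), they would normally be resolved against the site root instead. A minimal sketch using the standard library's urljoin; the BASE constant, the absolute_links helper, and the sample hrefs are hypothetical and only illustrate the idea:

```python
from urllib.parse import urljoin

# Assumption: the site root that relative hrefs should be resolved against.
BASE = "https://www.amazon.in"


def absolute_links(hrefs, base=BASE):
    """Resolve each href against base; already-absolute hrefs pass through unchanged."""
    return [urljoin(base, href) for href in hrefs]


# Hypothetical hrefs, shaped like the values collected in `urls`.
sample = ["/s?rh=n%3A1389401031&ref=example", "https://www.amazon.in/gp/help"]
resolved = absolute_links(sample)
print(resolved)
```

With this approach, `requests.get` would receive one well-formed URL per link rather than two URLs glued together.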

【Comments】:

  • What output do you get? Is there an error? Please mention it.
  • FYI, it is "scrape" and "scraping", not "scrap" or "scrapping".

Tags: html web-scraping beautifulsoup hyperlink data-collection


【Solution 1】:

To get the image URLs from this Amazon page, you can use the following example:

import requests
from bs4 import BeautifulSoup


url = "https://www.amazon.in/s?bbn=1389401031&rh=n%3A1389401031%2Cp_36%3A1318505031&dc&qid=1622460176&rnid=1318502031&ref=lp_1389401031_nr_p_36_2"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
}

soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

for img in soup.select(".s-image"):
    print(img["src"])

Prints:

https://m.media-amazon.com/images/I/71hEzQGO5qL._AC_UL320_.jpg
https://m.media-amazon.com/images/I/71A9Vo1BatL._AC_UL320_.jpg
https://m.media-amazon.com/images/I/71jG5HwkQQS._AC_UL320_.jpg
https://m.media-amazon.com/images/I/71hEzQGO5qL._AC_UL320_.jpg
https://m.media-amazon.com/images/I/71GQUxuSpnS._AC_UL320_.jpg
https://m.media-amazon.com/images/I/71sxlhYhKWL._AC_UL320_.jpg
https://m.media-amazon.com/images/I/710jkZNub3S._AC_UL320_.jpg
https://m.media-amazon.com/images/I/716nHhG9SWL._AC_UL320_.jpg
https://m.media-amazon.com/images/I/71sxlhYhKWL._AC_UL320_.jpg
https://m.media-amazon.com/images/I/71hEzQGO5qL._AC_UL320_.jpg
https://m.media-amazon.com/images/I/71sxlhYhKWL._AC_UL320_.jpg
https://m.media-amazon.com/images/I/713asoeJn7S._AC_UL320_.jpg
https://m.media-amazon.com/images/I/71jG5HwkQQS._AC_UL320_.jpg
https://m.media-amazon.com/images/I/71sxlhYhKWL._AC_UL320_.jpg
https://m.media-amazon.com/images/I/71sxlhYhKWL._AC_UL320_.jpg
https://m.media-amazon.com/images/I/618MEYCaUQL._AC_UL320_.jpg
https://m.media-amazon.com/images/I/71A9Vo1BatL._AC_UL320_.jpg
https://m.media-amazon.com/images/I/71hEzQGO5qL._AC_UL320_.jpg
https://m.media-amazon.com/images/I/71sxlhYhKWL._AC_UL320_.jpg
https://m.media-amazon.com/images/I/51UUJpcldDL._AC_UL320_.jpg
https://m.media-amazon.com/images/I/81WVehzY2+L._AC_UL320_.jpg
https://m.media-amazon.com/images/I/71nrZHQMZ7L._AC_UL320_.jpg
https://m.media-amazon.com/images/I/71U2SiHgbiL._AC_UL320_.jpg
https://m.media-amazon.com/images/I/41QsvcpKaZL._AC_UL320_.jpg
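
The printed list contains repeated entries (for example, 71hEzQGO5qL appears several times). If only distinct image URLs are wanted, the results can be de-duplicated while keeping their first-seen order. A minimal sketch, assuming the src values have been collected into a list rather than printed; the unique_in_order helper is hypothetical, and the sample values are the first four lines of the output above:

```python
def unique_in_order(items):
    """Drop repeated items, keeping the first occurrence of each in order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(items))


# First four URLs from the output above; the fourth repeats the first.
srcs = [
    "https://m.media-amazon.com/images/I/71hEzQGO5qL._AC_UL320_.jpg",
    "https://m.media-amazon.com/images/I/71A9Vo1BatL._AC_UL320_.jpg",
    "https://m.media-amazon.com/images/I/71jG5HwkQQS._AC_UL320_.jpg",
    "https://m.media-amazon.com/images/I/71hEzQGO5qL._AC_UL320_.jpg",
]

unique_srcs = unique_in_order(srcs)
for src in unique_srcs:
    print(src)
```

A set would also remove duplicates, but it does not keep the order in which the images appear on the page.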
