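【Posted】: 2021-05-31 15:58:55
【Question】:
I am trying to scrape image source links from Amazon using Beautiful Soup, but I am not getting the correct output. The link I am scraping is: https://www.amazon.in/s?bbn=1389401031&rh=n%3A1389401031%2Cp_36%3A1318505031&dc&qid=1622460176&rnid=1318502031&ref=lp_1389401031_nr_p_36_2
Here is the code: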
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Fetch the mobile-phones category page.
base_url = requests.get("https://www.amazon.in/mobile-phones/b/?ie=UTF8&node=1389401031&ref_=nav_cs_mobiles_9292c6cb7b394d30b2467b8f631090a7")
base_url
soup = BeautifulSoup(base_url.content, 'html.parser')
search_url = soup.find_all("span", class_="a-list-item")
search_url
# Collect the href of every link inside the list items.
urls = []
abz = []
for i in search_url:
    for j in i.find_all("a"):
        urls.append(j["href"])
urls
# Keep only the links whose href contains the percent-encoded prefix "%E2%82%".
lst = [x for x in urls if "%E2%82%" in x]
links_to_scrap = lst[2:4]
links_to_scrap
img_links = []
for url in links_to_scrap:
    # Assumption: the hrefs are site-relative, so resolve them against the site root.
    # The original appended them to the full category-page URL, which yields an invalid address.
    pname = requests.get(urljoin("https://www.amazon.in", url))
    soupp = BeautifulSoup(pname.content, 'html.parser')
    image = soupp.find_all("div", class_="a-section aok-relative s-image-wide-3-2-aspect")
    # Collect the src of every image inside the matched result containers.
    for i in image:
        for j in i.find_all("img"):
            img_links.append(j["src"])
img_links
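As a minimal sketch of the URL handling in the loop above: assuming the collected hrefs are site-relative paths, they can be resolved against the site root with urljoin instead of being appended to the full category-page URL. The hrefs and the SITE_ROOT name below are hypothetical stand-ins, not values taken from the real page, and the snippet makes no network request.

```python
from urllib.parse import unquote, urljoin

# Assumed site root that the relative hrefs belong to.
SITE_ROOT = "https://www.amazon.in"

# Hypothetical hrefs, shaped like the links the filter in the question looks for.
hrefs = [
    "/s?rh=n%3A1389401031&price=%E2%82%B91%2C000",
    "/gp/help/customer/display.html",
]

# "%E2%82%B9" is the percent-encoded UTF-8 form of the rupee sign,
# so the "%E2%82%" filter keeps the price-range links.
price_links = [h for h in hrefs if "%E2%82%" in h]

# Resolve each relative href against the site root.
absolute = [urljoin(SITE_ROOT, h) for h in price_links]

print(absolute[0])
print(unquote(price_links[0]))  # decoded form, useful when checking what the filter matched
```

Printing the resolved URL before requesting it makes it easy to confirm that the address is one a browser would accept.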
【Comments】:
-
What output do you get? Are there any errors? Please mention them.
-
FYI, the terms are scrape and scraping, not scrap or scrapping.
Tags: html web-scraping beautifulsoup hyperlink data-collection