【Title】:Problem - Getting all href from beautifulsoup content
【Posted】:2021-02-22 11:31:59
【Question】:

I want to get all of the href links from the code below, but it only fetches the first href. I can't figure out where I'm going wrong. Could you help me with this?

import requests
from bs4 import BeautifulSoup

for i in range(1, 3):
    url = "https://www.gittigidiyor.com/samsung-cep-telefonu?sf=" + str(i)
    r = requests.get(url)
    source = BeautifulSoup(r.content, "lxml")
    liste = source.find_all('div', attrs={"class": "gg-w-24 gg-d-24 gg-t-24 gg-m-24 root-column padding-none"})
    for url in liste:
        url_phone = "https:" + url.a.get("href")

        print(url_phone)

【Comments】:

  • Check the response status code with print(r.status_code) to see whether you are actually getting the result you expect

标签: python web-scraping beautifulsoup


【Solution 1】:

You need to call find_all('a') and iterate over the results, rather than using .a (shorthand for find('a')), which only returns the first <a> tag it finds.
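The difference is easiest to see on a tiny, self-contained snippet (the HTML and hrefs below are made up for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with two links inside one div
html = '<div><a href="/first">one</a><a href="/second">two</a></div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div")

# tag.a is shorthand for tag.find("a"): it returns only the FIRST <a>
print(div.a["href"])  # /first

# find_all("a") returns every matching <a> tag
print([a["href"] for a in div.find_all("a", href=True)])  # ['/first', '/second']
```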

import requests
from bs4 import BeautifulSoup

for i in range(1, 3):
    url = "https://www.gittigidiyor.com/samsung-cep-telefonu?sf=" + str(i)
    r = requests.get(url)
    source = BeautifulSoup(r.content, "lxml")
    # Each product column on the results page
    liste = source.find_all('div', attrs={"class": "gg-w-24 gg-d-24 gg-t-24 gg-m-24 root-column padding-none"})
    for url in liste:
        # find_all('a', href=True) returns every link in the div, not just the first
        all_hrefs = url.find_all('a', href=True)
        for href in all_hrefs:
            url_phone = "https:" + href['href']
            print(url_phone)
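As a side note, the same extraction can be written more compactly with a CSS selector via select(). The snippet below is a sketch against a hypothetical page fragment (class name shortened for readability), not the live site:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mimicking the structure being scraped
html = '''
<div class="root-column">
  <a href="//www.gittigidiyor.com/phone-1">Phone 1</a>
  <a href="//www.gittigidiyor.com/phone-2">Phone 2</a>
</div>
'''
soup = BeautifulSoup(html, "html.parser")

# select() takes a CSS selector; "a[href]" matches only <a> tags that have an href
links = ["https:" + a["href"] for a in soup.select("div.root-column a[href]")]
print(links)
```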

【Comments】:
