【Question Title】: How to extract links from href using BeautifulSoup
【Posted】: 2021-11-03 10:24:20
【Question】:

I'm trying to extract the URLs from the href attributes on a Redfin city page, but I only get an empty list:

    import requests
    from bs4 import BeautifulSoup
    headers ={
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
    }
    # note: headers is defined above but never passed to requests.get
    r =requests.get('https://www.redfin.com/city/5357/WA/Edmonds')
    soup=BeautifulSoup(r.content, 'html.parser')
    tra=soup.find_all('div',class_='bottomV2')
    for links in tra:
        for link in links.find_all('a',href=True):
            comp=link['href']
            print(comp)
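To see why the loop prints nothing, it helps to check what the server actually returned before searching it. A minimal sketch: describe_page is a hypothetical debugging helper (not part of BeautifulSoup) that reports the page title and how many div.bottomV2 cards the HTML contains. A count of 0 together with an unexpected title suggests you received a different page (for example a block page) rather than the listings.

```python
from bs4 import BeautifulSoup


def describe_page(html):
    """Return (page title, number of div.bottomV2 cards) for a fetched page.

    Hypothetical helper for debugging: if the count is 0, look at the
    title (or the full text) to see what page you actually received.
    """
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title is not None else ""
    cards = soup.find_all("div", class_="bottomV2")
    return title, len(cards)
```

Usage with the question's response object: title, n = describe_page(r.text), then print(r.status_code, title, n).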

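【Comments】:

  • Take a look at your soup: "Oops! It looks like our usage analysis algorithm thinks you might be a robot. Accessing redfin.com in an automated way violates Redfin's Terms of Use."
  • What can I do to get around this? Please advise.
  • You could open the page yourself (as a human), save it as an HTML file, and then read that file into BeautifulSoup.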
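The last suggestion can be sketched as follows: parse a page you saved from the browser instead of fetching it with requests. This assumes the saved HTML still contains the div.bottomV2 cards used in the question (Redfin may change that class name); listing_urls_from_file, BASE_URL and the file name edmonds.html are illustrative, not from the original posts. urljoin from the standard library turns the relative hrefs into absolute URLs.

```python
from pathlib import Path
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.redfin.com"  # assumed base for relative hrefs


def listing_urls_from_file(path, base_url=BASE_URL):
    """Parse a saved HTML page and return absolute URLs of links in div.bottomV2 cards."""
    html = Path(path).read_text(encoding="utf-8")
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for card in soup.find_all("div", class_="bottomV2"):
        for a in card.find_all("a", href=True):
            # resolve "/WA/Edmonds/..." against the site root
            urls.append(urljoin(base_url, a["href"]))
    return urls
```

Usage: for url in listing_urls_from_file("edmonds.html"): print(url)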

Tags: python web-scraping beautifulsoup


【Answer 1】:

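Pass the headers dict to requests.get (the question defines it but never sends it), take the first link in each bottomV2 card, and prepend the domain to turn each relative href into an absolute URL: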

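    import requests
    from bs4 import BeautifulSoup

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
    }
    # Main change from the question: actually send the browser-like User-Agent
    r = requests.get('https://www.redfin.com/city/5357/WA/Edmonds', headers=headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    tra = soup.find_all('div', class_='bottomV2')
    for links in tra:
        link = links.find('a', href=True)  # first link in each listing card
        if link is None:  # skip cards without a link instead of raising TypeError
            continue
        comp = link['href']  # relative path such as /WA/Edmonds/...
        abs_url = f'https://www.redfin.com{comp}'
        print(abs_url)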

Output:

    https://www.redfin.com/WA/Edmonds/31-Pine-St-98020/unit-301/home/21500390
    https://www.redfin.com/WA/Lynnwood/18785-76th-Ave-W-98036/home/175276722
    https://www.redfin.com/WA/Edmonds/8104-238th-St-SW-98026/unit-C/home/2820302
    https://www.redfin.com/WA/Edmonds/6021-145th-St-SW-98026/home/2771575
    https://www.redfin.com/WA/Edmonds/21001-88th-Pl-W-98026/home/2652799
    https://www.redfin.com/WA/Edmonds/920-Dayton-St-98020/home/2683005
    https://www.redfin.com/WA/Edmonds/19730-86th-Pl-W-98026/home/2742187
    https://www.redfin.com/WA/Edmonds/1227-8th-Ave-S-98020/home/2757086
    https://www.redfin.com/WA/Edmonds/7505-181st-Pl-SW-98026/home/2703399
    https://www.redfin.com/WA/Edmonds/1015-Maple-St-98020/home/2682773
    https://www.redfin.com/WA/Edmonds/7300-176th-St-SW-98026/home/2718839
    https://www.redfin.com/WA/Edmonds/23706-84th-Ave-W-98026/home/2698405
    https://www.redfin.com/WA/Edmonds/7217-Meadowdale-Beach-Rd-98026/home/17505861
    https://www.redfin.com/WA/Edmonds/15419-58th-Pl-W-98026/home/176811540
    https://www.redfin.com/WA/Edmonds/15423-58th-Pl-W-98026/home/176811536
    https://www.redfin.com/WA/Edmonds/840-Daley-St-98020/home/2683310
    https://www.redfin.com/WA/Edmonds/6106-136th-Pl-SW-98026/home/2822544
    https://www.redfin.com/WA/Shoreline/115-NW-205th-St-98177/unit-115/home/176726040
    https://www.redfin.com/WA/Lynnwood/19814-76th-Ave-W-98036/home/2552538
    https://www.redfin.com/WA/Edmonds/15407-58th-Pl-W-98026/home/176686464
    https://www.redfin.com/WA/Lynnwood/18751-76th-Ave-W-98037/home/175276737
    https://www.redfin.com/WA/Edmonds/7903-218th-St-SW-98026/home/2697000
    https://www.redfin.com/WA/Edmonds/907-Dayton-St-98020/home/2682997
    https://www.redfin.com/WA/Edmonds/17802-Talbot-Rd-98026/home/2754375
    https://www.redfin.com/WA/Edmonds/19126-94th-Ave-W-98020/home/175037005
    https://www.redfin.com/WA/Edmonds/191-94th-Ave-W-98020/home/174864268
    https://www.redfin.com/WA/Edmonds/12627-Possession-Ln-98026/home/103506228
    https://www.redfin.com/WA/Edmonds/827-Fir-St-98020/home/2763817
    https://www.redfin.com/WA/Edmonds/9503-Bowdoin-Way-98020/home/2666913
    https://www.redfin.com/WA/Picnic-Point-North-Lynnwood/131-Puget-Sound-Blvd-98026/home/108288661
    https://www.redfin.com/WA/Edmonds/23726-100th-Ave-W-98020/home/146161823
    https://www.redfin.com/WA/Edmonds/7223-224th-St-SW-98026/unit-J9/home/2782416
    https://www.redfin.com/WA/Edmonds/0-xxx-Olympic-View-Dr-Unknown/home/175079254
    https://www.redfin.com/WA/Edmonds/18230-91st-Ave-W-98026/home/2545485
    https://www.redfin.com/WA/Edmonds/13531-67th-Ave-W-98026/home/161335515
    https://www.redfin.com/WA/Edmonds/15805-72nd-Ave-W-98026/home/2718730
    https://www.redfin.com/WA/Edmonds/95-Main-St-98020/home/175589481
    https://www.redfin.com/WA/Edmonds/14920-72nd-Ave-W-98026/home/2713344
    https://www.redfin.com/WA/Edmonds/7317-Lake-Ballinger-Way-98026/home/2707535
    https://www.redfin.com/WA/Edmonds/15604-75th-Pl-W-98026/home/112973708
    https://www.redfin.com/WA/Edmonds/Trailside-at-Meadowdale-Beach/Residence-GR-24/home/175129446
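The f-string above assumes every href is site-relative (starts with /); an href that is already absolute would end up with the domain twice. A slightly more defensive sketch, assuming the same div.bottomV2 markup, uses urllib.parse.urljoin (which leaves absolute hrefs unchanged) and drops duplicate URLs while keeping page order. listing_urls and BASE_URL are hypothetical names for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.redfin.com"  # assumed base for relative hrefs


def listing_urls(html, base_url=BASE_URL):
    """Return unique absolute URLs of the first link in each div.bottomV2 card."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for card in soup.find_all("div", class_="bottomV2"):
        link = card.find("a", href=True)
        if link is None:  # card without a link: skip it
            continue
        # urljoin resolves "/WA/..." against base_url and keeps absolute hrefs as-is
        urls.append(urljoin(base_url, link["href"]))
    # dict.fromkeys removes duplicates while preserving first-seen order
    return list(dict.fromkeys(urls))
```

Usage: for url in listing_urls(r.content): print(url)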


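【Answer 2】:

Another option is to use Selenium, which loads the page in a real Chrome browser instead of sending a bare HTTP request, and then hand the rendered HTML to BeautifulSoup.

Example: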

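    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # Selenium 4 takes the driver path via Service; with Selenium >= 4.6 you can
    # usually call webdriver.Chrome() with no arguments and let Selenium Manager find it
    driver = webdriver.Chrome(service=Service('YOUR PATH TO CHROMEDRIVER'))
    try:
        driver.get('https://www.redfin.com/city/5357/WA/Edmonds')
        # page_source is the HTML as rendered by the browser
        soup = BeautifulSoup(driver.page_source, 'html.parser')
    finally:
        driver.quit()  # always close the browser

    tra = soup.find_all('div', class_='bottomV2')
    for links in tra:
        for link in links.find_all('a', href=True):
            comp = link['href']
            print(comp)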
    
