【Question title】: Python Selenium not giving out expected results
【Posted】: 2021-01-06 22:45:23
【Question】:

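I want to scrape the GitHub trending page and came up with this code. For some reason it does not work as expected: instead of the repository link, it prints what looks like session information. Any idea why? Here is my code: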

#!/usr/bin/python3
from selenium import webdriver
from bs4 import BeautifulSoup



driver = webdriver.Firefox()
driver.get('https://github.com/trending')
content_element = driver.find_elements_by_xpath("/html/body/div[4]/main/div[3]/div/div[2]/article[1]/h1/a")

for element in content_element:  
  print(element)

driver.close()

Thanks

【Comments】:

    Tags: python selenium beautifulsoup screen-scraping


    【Solution 1】:

    You can use this example to extract all trending repositories with BeautifulSoup:

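    import requests
    from bs4 import BeautifulSoup

    url = 'https://github.com/trending'
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')

    # Select the repository link inside the <h1> of every <article>.
    for a in soup.select('article h1 a'):
        # Collapse the link text to one line; the href is site-relative.
        name = a.get_text(strip=True, separator=' ')
        print('{:<50} {}'.format(name, 'https://github.com' + a['href']))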
    

    Prints:

    cli / cli                                          https://github.com/cli/cli
    gnebbia / kb                                       https://github.com/gnebbia/kb
    schollz / croc                                     https://github.com/schollz/croc
    onevcat / Kingfisher                               https://github.com/onevcat/Kingfisher
    moby / moby                                        https://github.com/moby/moby
    matterport / Mask_RCNN                             https://github.com/matterport/Mask_RCNN
    google / googletest                                https://github.com/google/googletest
    FreeCAD / FreeCAD                                  https://github.com/FreeCAD/FreeCAD
    iamadamdev / bypass-paywalls-chrome                https://github.com/iamadamdev/bypass-paywalls-chrome
    vuejs / vue-next                                   https://github.com/vuejs/vue-next
    microsoft / onefuzz                                https://github.com/microsoft/onefuzz
    twintproject / twint                               https://github.com/twintproject/twint
    lyhue1991 / eat_tensorflow2_in_30_days             https://github.com/lyhue1991/eat_tensorflow2_in_30_days
    snakers4 / silero-models                           https://github.com/snakers4/silero-models
    hediet / vscode-debug-visualizer                   https://github.com/hediet/vscode-debug-visualizer
    tannerlinsley / react-query                        https://github.com/tannerlinsley/react-query
    proxysu / windows                                  https://github.com/proxysu/windows
    mozilla / send                                     https://github.com/mozilla/send
    jaywcjlove / linux-command                         https://github.com/jaywcjlove/linux-command
    material-shell / material-shell                    https://github.com/material-shell/material-shell
    iamkun / dayjs                                     https://github.com/iamkun/dayjs
    swisskyrepo / PayloadsAllTheThings                 https://github.com/swisskyrepo/PayloadsAllTheThings
    TheCherno / Hazel                                  https://github.com/TheCherno/Hazel
    HeroTransitions / Hero                             https://github.com/HeroTransitions/Hero
    pytorch / pytorch                                  https://github.com/pytorch/pytorch
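
As for why the Selenium version in the question prints session information: `print(element)` prints the `WebElement` object itself, whose representation contains the session and element IDs rather than the page text. Reading `element.text` or `element.get_attribute("href")` returns the actual content. Below is a minimal sketch of that fix, not a definitive implementation. The helper names `describe_link` and `scrape_trending` are made up for illustration, and the CSS selector is assumed from the BeautifulSoup example above, so it only works while GitHub's markup matches it.

```python
def describe_link(element):
    """Return (visible text, href) for a Selenium WebElement.

    Printing the element itself only shows its repr (session and element
    IDs), so the text and the attribute must be read explicitly.
    """
    return element.text.strip(), element.get_attribute("href")


def scrape_trending(url="https://github.com/trending", selector="article h1 a"):
    """Open `url` in Firefox and print one line per matched repository link."""
    # Imported lazily so describe_link() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # The selector is an assumption borrowed from the BeautifulSoup
        # example; it matches every repository, unlike the absolute XPath
        # ending in article[1], which can match at most one.
        for element in driver.find_elements(By.CSS_SELECTOR, selector):
            name, href = describe_link(element)
            print("{:<50} {}".format(name, href))
    finally:
        driver.quit()  # Ends the browser session even if scraping raises.
```

Calling `scrape_trending()` runs the scrape. It uses `find_elements(By.CSS_SELECTOR, ...)` because the `find_elements_by_xpath` helper from the question was deprecated and later removed in Selenium 4.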
    
