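【Question Title】: Python Selenium not giving out expected results
【Posted】: 2021-01-06 22:45:23
【Question】:
I want to scrape the GitHub trending page and came up with the code below. For some reason it does not work as expected: instead of the repository links, it prints output that looks like session codes. Any idea why?
Here is my code: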
#!/usr/bin/python3
from selenium import webdriver
from bs4 import BeautifulSoup
driver = webdriver.Firefox()
driver.get('https://github.com/trending')
content_element = driver.find_elements_by_xpath("/html/body/div[4]/main/div[3]/div/div[2]/article[1]/h1/a")
for element in content_element:
    print(element)
driver.close()
Thanks.
【Comments】:
Tags:
python
selenium
beautifulsoup
screen-scraping
【Solution 1】:
You can use this example to extract all trending repositories with BeautifulSoup:
import requests
from bs4 import BeautifulSoup

url = 'https://github.com/trending'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Select the repository link inside the <h1> of each <article> element.
for a in soup.select('article h1 a'):
    print('{:<50} {}'.format(a.get_text(strip=True, separator=' '), 'https://github.com' + a['href']))
Prints:
cli / cli https://github.com/cli/cli
gnebbia / kb https://github.com/gnebbia/kb
schollz / croc https://github.com/schollz/croc
onevcat / Kingfisher https://github.com/onevcat/Kingfisher
moby / moby https://github.com/moby/moby
matterport / Mask_RCNN https://github.com/matterport/Mask_RCNN
google / googletest https://github.com/google/googletest
FreeCAD / FreeCAD https://github.com/FreeCAD/FreeCAD
iamadamdev / bypass-paywalls-chrome https://github.com/iamadamdev/bypass-paywalls-chrome
vuejs / vue-next https://github.com/vuejs/vue-next
microsoft / onefuzz https://github.com/microsoft/onefuzz
twintproject / twint https://github.com/twintproject/twint
lyhue1991 / eat_tensorflow2_in_30_days https://github.com/lyhue1991/eat_tensorflow2_in_30_days
snakers4 / silero-models https://github.com/snakers4/silero-models
hediet / vscode-debug-visualizer https://github.com/hediet/vscode-debug-visualizer
tannerlinsley / react-query https://github.com/tannerlinsley/react-query
proxysu / windows https://github.com/proxysu/windows
mozilla / send https://github.com/mozilla/send
jaywcjlove / linux-command https://github.com/jaywcjlove/linux-command
material-shell / material-shell https://github.com/material-shell/material-shell
iamkun / dayjs https://github.com/iamkun/dayjs
swisskyrepo / PayloadsAllTheThings https://github.com/swisskyrepo/PayloadsAllTheThings
TheCherno / Hazel https://github.com/TheCherno/Hazel
HeroTransitions / Hero https://github.com/HeroTransitions/Hero
pytorch / pytorch https://github.com/pytorch/pytorch
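The answer above avoids Selenium altogether. For readers who want to keep Selenium, here is a minimal sketch; it is not the original poster's or the answerer's code. The "session code" in the question is most likely the default representation of a WebElement, which shows a session ID and an element ID rather than page content. Reading `element.text` or `element.get_attribute("href")` returns the visible data instead. The helper names `describe` and `main` are hypothetical. The `article h1 a` selector is borrowed from the answer and assumes the page markup still matches. `find_elements(By.CSS_SELECTOR, ...)` assumes a Selenium 4 installation, where the `find_elements_by_*` helpers are deprecated or removed.

```python
def describe(element):
    """Return readable (text, href) data for a WebElement-like object.

    Hypothetical helper: it only relies on the `text` attribute and the
    `get_attribute` method that Selenium WebElements provide.
    """
    return element.text.strip(), element.get_attribute("href")


def main():
    # Imported here so describe() can be tried without Selenium or a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    try:
        driver.get("https://github.com/trending")
        # Assumption: the selector from the answer still matches the page.
        # A short CSS selector is less brittle than an absolute XPath.
        for element in driver.find_elements(By.CSS_SELECTOR, "article h1 a"):
            text, href = describe(element)
            print(text, href)
    finally:
        # quit() ends the whole browser session, even if scraping failed.
        driver.quit()

# Call main() to run this against a live browser.
```

Calling `main()` requires a local Firefox with a working driver setup. `describe` can be tried on any object that exposes `text` and `get_attribute`.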