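【Posted】: 2021-04-04 13:10:37
【Question】:
I am trying to scrape all project titles and creator names, and most of it works. However, when I try to scrape the infinite-scroll page that uses a "Load more" button, I get "TimeoutException: Message:". Please let me know what is wrong and what I need to correct. Thanks.
Below is the code currently in use: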
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Chrome('/usr/local/bin/chromedriver')
driver.get("https://www.kickstarter.com/discover/advanced?sort=newest&seed=2695789&page=1/")
button = WebDriverWait(driver,10).until(EC.presence_of_element_located((By.CSS_SELECTOR,'bttn keyboard-focusable bttn-medium bttn-primary theme--create fill-bttn-icon hover-fill-bttn-icon')))
button.click()
names=[]
creators=[]
soup = BeautifulSoup(driver.page_source, 'html.parser')
for a in soup.find_all('div',{'class':'js-react-proj-card grid-col-12 grid-col-6-sm grid-col-4-lg'}):
    name=a.find('div', attrs={'class':'clamp-5 navy-500 mb3 hover-target'})
    creator=a.find('div', attrs={'class':'type-13 flex'})
    names.append(name.h3.text)
    creators.append(creator.text)
df = pd.DataFrame({'Name':names,'Creator':creators})
【Discussion】:
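One likely cause of the timeout, offered as a sketch rather than a confirmed fix: `By.CSS_SELECTOR` was given the raw value of a `class` attribute. In CSS, space-separated words without a leading dot are read as nested tag names (a `<bttn>` element containing a `<keyboard-focusable>` element, and so on), so nothing matches and `WebDriverWait` gives up after 10 seconds. The snippet below joins the class names with dots and keeps clicking until the button stops appearing. The helper names `class_attr_to_css` and `click_load_more`, the `max_clicks` cap, and the assumption that these classes really identify the "Load more" button are illustrative and not from the original post.

```python
def class_attr_to_css(class_attr: str, tag: str = "") -> str:
    """Turn a space-separated class attribute into a compound CSS selector.

    Assumes plain class names (letters, digits, hyphens); names containing
    characters such as ':' would need CSS escaping, which is not handled here.
    """
    classes = class_attr.split()
    return tag + "".join("." + c for c in classes)


def click_load_more(driver, class_attr: str, timeout: float = 10,
                    max_clicks: int = 50) -> int:
    """Click the matching button until it no longer becomes clickable.

    Returns the number of clicks performed. `max_clicks` is a hypothetical
    safety cap so an endless feed cannot loop forever.
    """
    # Imported lazily so class_attr_to_css can be used without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.wait import WebDriverWait

    selector = class_attr_to_css(class_attr)
    clicks = 0
    while clicks < max_clicks:
        try:
            # Wait for the button to be visible and enabled, not merely present.
            button = WebDriverWait(driver, timeout).until(
                EC.element_to_be_clickable((By.CSS_SELECTOR, selector))
            )
        except TimeoutException:
            # No clickable button within the timeout: assume all items are loaded.
            break
        button.click()
        clicks += 1
    return clicks
```

With the `driver` from the question, `click_load_more` would be called with the button's class string before `driver.page_source` is handed to BeautifulSoup, so that the parsed HTML contains every loaded project card.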
Tags: python selenium web-scraping