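【Posted】:2017-11-15 16:19:06
【Question】:
I'm trying to read some news articles using Python and PhantomJS. The site I'm working with uses infinite scrolling: the next article is loaded dynamically when you scroll to the bottom. An example URL is the one passed to driver.get() in the code below.
I managed to load one more article with the code below, but only one. Can anyone help me make it keep loading indefinitely? Or any hints about what is wrong or what could be improved? Thanks!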
from selenium import webdriver
from bs4 import BeautifulSoup
from time import sleep
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
# Pretend to be Chrome
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 "
"(KHTML, like Gecko) Chrome/15.0.87"
)
driver = webdriver.PhantomJS(desired_capabilities=dcap)
driver.set_window_size(1120, 550)
## GET
driver.get("https://www.bloomberg.com/news/features/2017-06-08/no-one-has-ever-made-a-corruption-machine-like-this-one")
# print current scrollTop
driver.execute_script('return document.body.scrollTop')
# out: 0
# print current scrollHeight
driver.execute_script('return document.body.scrollHeight')
# out: 18255
# scroll to bottom
driver.execute_script("window.scrollTo(0, 18255)")
# print current scrollTop
driver.execute_script('return document.body.scrollTop')
# out: 17705
# print current scrollHeight
driver.execute_script('return document.body.scrollHeight')
# out: 29050
# It works! Great!
# Scroll to bottom again
driver.execute_script("window.scrollTo(0, 29050)")
# print current scrollTop
driver.execute_script('return document.body.scrollTop')
# out: 28500
# print current scrollHeight
driver.execute_script('return document.body.scrollHeight')
# out: 29050
# It's still the same, no matter how hard I try, it cannot load more...
# According to tolmachofof's suggestion below, I tried to scroll very slowly, still no luck. :<
top = driver.execute_script('return document.body.scrollTop')
height = driver.execute_script('return document.body.scrollHeight')
for i in range(top, height, 100):
    driver.execute_script("window.scrollTo(0," + str(i) + ")")
    print(driver.execute_script('return document.body.scrollTop'))
    sleep(0.2)
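For reference, the manual steps above (read the height, scroll to it, read the height again) can be written as a loop that stops once the page no longer grows. This is only a minimal sketch and not a confirmed fix: it does not explain why this particular site stops loading after the second article under PhantomJS. The helper name `scroll_until_stable` and its parameters `pause`, `max_rounds`, and `settle_checks` are hypothetical; the only Selenium call it relies on is `execute_script(script, *args)`.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=20, settle_checks=3):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is any object with a Selenium-style execute_script(script, *args)
    method. Returns the list of page heights observed: the initial height
    followed by one entry for every time the page grew.
    """
    # Hypothetical choice: take the larger of the two common height sources.
    get_height = (
        "return Math.max(document.body.scrollHeight, "
        "document.documentElement.scrollHeight);"
    )
    heights = [driver.execute_script(get_height)]
    for _ in range(max_rounds):
        # Scroll to the most recently measured bottom instead of a
        # hard-coded pixel value.
        driver.execute_script("window.scrollTo(0, arguments[0]);", heights[-1])
        # Poll a few times so a slow load has a chance to extend the page.
        for _ in range(settle_checks):
            time.sleep(pause)
            new_height = driver.execute_script(get_height)
            if new_height > heights[-1]:
                break
        else:
            # The height never grew during polling, so stop scrolling.
            return heights
        heights.append(new_height)
    return heights
```

With the `driver` created above, it could be called as `heights = scroll_until_stable(driver, pause=2.0)`. If the returned list stops after two or three entries, the page really did stop growing, which would suggest the site's loader is not being triggered again rather than the scroll position being wrong.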
【Discussion】:
Tags: javascript python selenium web-scraping phantomjs