How do I get a webpage to programmatically load content the way it does when I manually scroll down?

【Question title】: How do I get a webpage to programmatically load content the way it does when I manually scroll down?
【Posted】: 2018-06-28 06:47:23
【Question description】:

I want to scrape some news links from this website. My code for that looks like this:

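from urllib.parse import urljoin

from bs4 import BeautifulSoup
import requests

base = "https://www.philstar.com/business/"
page = requests.get(base)
soup = BeautifulSoup(page.text, "html.parser")

# Select <a> tags that carry an href; find_all("href") would search for <href> tags
li_box = soup.find_all("a", href=True)

# The with-block closes the file even if a write fails
with open("News article links.txt", "w") as links:
    for a in li_box:
        # urljoin copes with both relative and absolute hrefs
        links.write(urljoin(base, a["href"]) + "\n")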

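The problem is that it only finds the roughly 15-16 links shown on the landing page. If you manually scroll to the bottom of the page, you can see it load more news content. Scroll further and it loads more, and so on. My code cannot perform this "scroll down to see more" step. How can I scrape all of the news (or, say, the first 1000 items)?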

【Question comments】:

If you want to stick with requests and still get all the data from that page, try manipulating this link according to the query.
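The link this comment points to did not survive, so the actual endpoint is unknown. As a rough sketch of the general idea only, assuming the infinite scroll is backed by some paged request that can be called directly: `collect_links` and its `fetch_page` callback are hypothetical names, and `fetch_page(n)` stands for whatever function requests page `n` and returns the article URLs found in it.

```python
from typing import Callable, List


def collect_links(fetch_page: Callable[[int], List[str]], limit: int = 1000) -> List[str]:
    """Request successive pages until `limit` unique links are collected
    or a page contributes nothing new."""
    seen = set()
    links = []
    page = 1
    while len(links) < limit:
        added = 0
        for url in fetch_page(page):
            if url not in seen:
                seen.add(url)
                links.append(url)
                added += 1
        if added == 0:
            # An empty or fully repeated page is treated as the end of the feed
            break
        page += 1
    return links[:limit]
```

In practice `fetch_page` would wrap a `requests.get` call on the URL that the browser's network tab shows when you scroll, with the page or offset parameter filled in.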

Tags: python html css web-scraping beautifulsoup

【Solution 1】:

You will need Selenium for this. I have modified your code slightly; it should give you an idea of how to do it.

Try this:

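from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time

# Selenium 4 style. Replace '--path--' with the chromedriver path, or call
# webdriver.Chrome() with no arguments to let Selenium locate the driver itself.
browser = webdriver.Chrome(service=Service(executable_path='--path--'))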

base = "https://www.philstar.com/business/"

browser.get(base)

# Auto-scroll the page until its height stops growing
SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = browser.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = browser.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

html_source = browser.page_source
browser.quit()  # close the browser once the HTML has been captured
soup = BeautifulSoup(html_source, "html.parser")

li_box = soup.find_all('a')     # adjust to whatever you want to find
print(li_box)
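One caveat with the loop above: it stops the first time the height is unchanged after a single 0.5 s pause, so a slow response can end the scroll early, and a truly endless feed never ends it. A possible variant, offered as a sketch rather than a tested fix for this site, tolerates a few unchanged checks and caps the total number of scrolls. The function name and the `patience` and `max_rounds` parameters are introduced here for illustration.

```python
import time


def scroll_until_stable(browser, pause=0.5, patience=3, max_rounds=200):
    """Scroll to the bottom repeatedly. Stop once the page height has not
    grown for `patience` consecutive checks, or after `max_rounds` scrolls.
    Returns the number of scrolls performed."""
    last_height = browser.execute_script("return document.body.scrollHeight")
    unchanged = 0
    rounds = 0
    while unchanged < patience and rounds < max_rounds:
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load new content
        rounds += 1
        new_height = browser.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            unchanged += 1
        else:
            unchanged = 0
            last_height = new_height
    return rounds
```

It only relies on `browser.execute_script`, so it can be called with the same `browser` object right after `browser.get(base)`.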

Hope this helps! :) Thanks!

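【Comments】:

Thank you so much! I didn't know about Selenium until the first answerer mentioned it. I was just looking into how to get started with it, and your code will definitely help me fast-track the process, so thanks again! By the way, how did you know I was using the Chrome driver? I ask because when I checked the Selenium documentation, most of the time it defaults to the Firefox browser. So is it a coincidence that your code uses Chrome? Just curious :)
That's good. My pleasure! :) I didn't know you were using Chrome. It was just a coincidence, since I also use Chrome rather than Firefox :)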

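【Solution 2】:

For a case like this, I would probably consider using Selenium.

With Selenium, you can use page-scrolling methods to simulate a user scrolling through the page. For some guidance, see the following: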

http://selenium-python.readthedocs.io/faq.html#how-to-scroll-down-to-the-bottom-of-a-page
http://blog.varunin.com/2011/08/scrolling-on-pages-using-selenium.html
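To make the idea of simulated scrolling concrete, here is a minimal sketch that scrolls one viewport at a time and stops once enough links are present, which matches the "first 1000" goal in the question. It assumes a Selenium driver passed in as `browser`. The function name, the `a[href]` selector, and the `target` and `max_steps` parameters are illustrative choices, not something taken from the linked pages, and the selector would likely need narrowing to the article list on a real site.

```python
import time


def scroll_until_count(browser, target=1000, pause=0.5, max_steps=500):
    """Scroll one viewport at a time until at least `target` links exist
    in the DOM or `max_steps` scrolls have been made. Returns the link count."""
    count_script = "return document.querySelectorAll('a[href]').length"
    count = browser.execute_script(count_script)
    steps = 0
    while count < target and steps < max_steps:
        # Step by one viewport height, roughly what a user's scroll does
        browser.execute_script("window.scrollBy(0, window.innerHeight);")
        time.sleep(pause)
        steps += 1
        count = browser.execute_script(count_script)
    return count
```

After it returns, `browser.page_source` can be handed to BeautifulSoup as usual.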

【Comments】: