Python scraping pages after hitting the "load more news" button: answers

【Question title】: Python scraping pages after hitting the "load more news" button
【Posted】: 2025-12-02 18:10:01
【Question description】:

I can scrape the first page of a financial news website with the following code.

import pandas as pd
import requests

df = pd.DataFrame()
url = 'https://std.stheadline.com/realtime/finance/%E5%8D%B3%E6%99%82-%E8%B2%A1%E7%B6%93'
result = requests.get(url)
result.raise_for_status()  # raise an exception on HTTP error status codes
result.encoding = "utf-8"

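To download the subsequent pages, I need to click the "load more news" button. I inspected the site with Chrome > Inspect > Network and found that, after clicking the button, the Request URL is "https://std.stheadline.com/realtime/get_more_news" and the Form Data is "cat=finance&page=3". I joined the two with a "?" in between. However, the resulting URL does not work. Is anything missing?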

url="https://std.stheadline.com/realtime/get_more_news?cat=finance&page=3"

【Question comments】:

Tags: python-3.x url web-scraping

【Solution 1】:

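That button actually sends a POST request, so there is nothing to look for beyond the API endpoint; you just need to make the correct request. The form data belongs in the POST body, not in the URL's query string, so appending it after a "?" turns it into a different request.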
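To make the difference concrete, here is a minimal sketch that builds both requests without sending them. It uses only the standard library (urllib) rather than requests so that it runs offline; the names ENDPOINT, form, get_req and post_req are illustrative and do not come from the thread.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint and form fields as reported in the question.
ENDPOINT = "https://std.stheadline.com/realtime/get_more_news"
form = {"cat": "finance", "page": 3}

# What the question tried: the fields appended to the URL, i.e. a GET request.
get_req = Request(ENDPOINT + "?" + urlencode(form))

# What the browser sends: the same fields, URL-encoded in the request body.
post_req = Request(ENDPOINT, data=urlencode(form).encode("utf-8"), method="POST")

# Nothing is sent over the network; we only inspect the two request objects.
print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

In the POST variant the URL stays unchanged and the fields travel in the body, which matches the "Form Data" entry shown in Chrome's Network tab.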

Here's how:

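import requests

# X-Requested-With marks the call as an AJAX (XHR) request.
headers = {
    "Referer": "https://std.stheadline.com/realtime/finance/",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:83.0) Gecko/20100101 Firefox/83.0",
    "X-Requested-With": "XMLHttpRequest",
}
# Form data sent in the POST body; change "page" to request a different page.
payload = {
    "cat": "finance",
    "page": 4,
}
response = requests.post("https://std.stheadline.com/realtime/get_more_news/", data=payload, headers=headers)
response.raise_for_status()  # raise an exception on HTTP error status codes
print(response.json())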

This "loads" the next page of news for you.
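If several pages are needed, the same request can be repeated with a changing "page" value. The following is only a sketch under assumptions: fetch_pages, its parameters and the delay option are hypothetical names introduced here, and since the thread does not show the layout of the returned JSON, each decoded payload is stored untouched.

```python
import time

API_URL = "https://std.stheadline.com/realtime/get_more_news/"
HEADERS = {
    "Referer": "https://std.stheadline.com/realtime/finance/",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:83.0) Gecko/20100101 Firefox/83.0",
    "X-Requested-With": "XMLHttpRequest",
}


def fetch_pages(session, cat, first_page, last_page, delay=0.0):
    """Return {page_number: decoded JSON} for first_page..last_page inclusive.

    `session` is assumed to be any object with a requests-style
    .post(url, data=..., headers=...) method, e.g. requests.Session().
    """
    results = {}
    for page in range(first_page, last_page + 1):
        response = session.post(API_URL, data={"cat": cat, "page": page}, headers=HEADERS)
        response.raise_for_status()  # stop on HTTP errors instead of storing bad data
        results[page] = response.json()
        if delay:
            time.sleep(delay)  # optional pause between consecutive requests
    return results
```

With requests installed, a call might look like fetch_pages(requests.Session(), "finance", 3, 5, delay=1.0). Passing the session in as an argument keeps the function easy to try out with a stand-in object before hitting the real site.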

【Comments】: