Accessing the elements of a constantly changing webpage using Python

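【Question title】: Accessing the elements of a webpage that changes consistently using Python
【Posted】: 2017-04-19 08:41:23
【Question description】:

Hello everyone, I have a specific question today: how can I scrape data from a website that changes constantly (for example, an online gambling site)? This is the code I wrote and ran: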

import requests
from bs4 import BeautifulSoup

def ColorRequest():
    # Could append str(page_something) to the URL so that it would update
    url = 'http://csgoroll.com/#/'
    sourcecode = requests.get(url)  # request the page from the site
    plaintext = sourcecode.text  # the response body as text
    # Parse the HTML so that its elements can be searched and iterated
    soup = BeautifulSoup(plaintext, 'html.parser')
    for element in soup.find_all():  # with no arguments, matches every tag
        print(element)

ColorRequest()

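I get the page's HTML as output, but I am looking for the elements that are displayed after the page has loaded, not the elements that make up the page itself.

Has any experienced Python developer run into this problem, and could you help an inexperienced programmer solve it?

【Question discussion】:

Tags: python html dynamic screen-scraping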

【Solution 1】:

There are many ways to do this. In the question below, Avi gives an example using dryscrape and Beautiful Soup.

Web-scraping JavaScript page with Python

I don't have any experience with dryscrape, but you can also do this with Selenium WebDriver and a headless browser such as PhantomJS.
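A minimal sketch of the Selenium approach, under a few assumptions: it uses headless Chrome rather than PhantomJS (PhantomJS support was dropped from recent Selenium releases, as far as I know), it needs the selenium package and a local Chrome install, and the CSS selector `.hypothetical-roll-result` is a made-up placeholder, since I don't know the page's real markup. The names `make_headless_driver`, `rendered_texts`, and `main` are mine, for illustration only.

```python
def make_headless_driver():
    """Start a headless Chrome session (assumes selenium and Chrome are installed)."""
    # Imported here so the helper below can be used or tested without selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def rendered_texts(driver, url, css_selector, wait_seconds=10):
    """Load the page in a real browser and return the text of matching elements.

    Because the browser runs the page's JavaScript, this sees elements that
    only exist after the page has loaded, unlike a plain requests.get().
    """
    # Let element lookups poll for up to wait_seconds before giving up.
    driver.implicitly_wait(wait_seconds)
    driver.get(url)
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    elements = driver.find_elements("css selector", css_selector)
    return [element.text for element in elements]


def main():
    driver = make_headless_driver()
    try:
        # The selector is a hypothetical placeholder: inspect the page to find the real one.
        for text in rendered_texts(driver, "http://csgoroll.com/#/", ".hypothetical-roll-result"):
            print(text)
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling `main()` starts the browser, prints whatever matches, and shuts the browser down again. For a page that keeps updating, you would call `rendered_texts` in a loop instead of once.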

【Discussion】:

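【Solution 2】:

Here is the "direct" way to do this kind of scraping.

These "constantly changing" sites are usually updated via AJAX, so what you should really look for is the specific request that is used to update the site's content.

You can use Fiddler to capture the traffic while the site updates, then work out which request carries the information you need (in this case, probably the odds or something similar). Once you have found it, simply simulate that request and extract whatever information you need.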
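Simulating the captured request could look roughly like the sketch below. Everything specific in it is an assumption: `API_URL` is a made-up endpoint standing in for whatever request Fiddler shows you, the response is assumed to be a JSON array of objects, and the `"color"` field is only an example name. The helper names `fetch_payload` and `extract_fields` are also mine.

```python
import json

# Hypothetical endpoint: replace it with the request URL captured in Fiddler.
API_URL = "http://example.com/api/rolls"


def fetch_payload(url, session=None):
    """Replay the AJAX request and return the raw response body as text."""
    if session is None:
        # Third-party package; imported here so a custom session can be passed instead.
        import requests
        session = requests.Session()
    # Some sites only answer AJAX-style requests; these headers mimic one.
    headers = {"X-Requested-With": "XMLHttpRequest", "Accept": "application/json"}
    response = session.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text


def extract_fields(payload_text, field):
    """Return `field` from every record, assuming the payload is a JSON array of objects."""
    records = json.loads(payload_text)
    return [record[field] for record in records if field in record]


def main():
    # "color" is an example field name, not something taken from the real site.
    print(extract_fields(fetch_payload(API_URL), "color"))
```

If the captured request carries cookies, tokens, or query parameters, you would need to copy those into the replayed request as well. Some sites push updates over WebSockets rather than plain HTTP requests, in which case this approach does not apply as written.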

【Discussion】: