Scraping Amcharts interactive data: answers

【Question title】: Scraping Amcharts interactive data
【Posted】: 2026-02-02 17:05:01
【Question】:

I am trying to get the stock market-cap data from the chart on this page: https://www.macrotrends.net/stocks/charts/AAPL/apple/market-cap

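I first looked at the network responses but did not see anything useful. When I hover over the chart, a div with the class amcharts-balloon-div appears (and the date and value show up on the chart), but I have not been able to trace where in the JS it is called from (perhaps I just don't know where to look). I also noticed that a chartData property sometimes shows up on the window object, but it is not always there.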

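I'm hoping someone can recommend how to find and fetch the data, along with a process for tracing it starting from page load. Any help is appreciated.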

【Question comments】:

Tags: json web-scraping amcharts

【Answer 1】:

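The data is hidden in a variable inside a script on that page. Extracting it is a somewhat convoluted process (and there may be other ways I haven't thought of), but this should get you there, using a couple of Python libraries: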

import requests
from bs4 import BeautifulSoup as bs
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Connection': 'keep-alive',
    'Referer': 'https://www.macrotrends.net/stocks/charts/AAPL/apple/market-cap',
    'Upgrade-Insecure-Requests': '1',
    'TE': 'Trailers',
}

params = (
    ('t', 'AAPL'),
)

response = requests.get('https://www.macrotrends.net/assets/php/market_cap.php', headers=headers, params=params)
soup = bs(response.text,'lxml')
# The request above returns the page that holds the target data; now look through the scripts on that page
scripts = soup.select('script')
val_dict = {}  # initialize a dictionary to hold the data at the end
# There are numerous scripts on that page; single out the relevant one
for s in scripts:
    if len(s.contents) > 0 and 'var chartData = ' in s.contents[0]:
        # This is the one script with the data; from here on it is all string and list manipulation
        vr = s.contents[0].split('var chartData = ')[1].split(';\r')[0]
        vals = vr.split(',')
        for d, v in zip(vals[::2], vals[1::2]):
            # strip the closing brace (and the closing bracket on the last entry)
            val_dict[d.split(':')[1]] = v.split(':')[1].strip('}] ')

# Print the collected date/value pairs
for date, value in val_dict.items():
    print('date: ', date, 'value: ', value)

A random portion of the output:

date:  "2005-06-10" value:  29.19
date:  "2005-06-13" value:  29.26
date:  "2005-06-14" value:  29.34
date:  "2005-06-15" value:  30.27
date:  "2005-06-16" value:  30.96

and so on.
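
If the chartData literal happens to be valid JSON, the comma-splitting can probably be replaced by a regular expression plus json.loads. The following is only a sketch under that assumption: the key names in the sample ("date" and "v1") are hypothetical stand-ins, and the pattern expects the array to be terminated by "];" as in the page described above.

```python
import json
import re

# Capture the array literal assigned to chartData, up to the closing "];".
CHART_DATA_RE = re.compile(r"var\s+chartData\s*=\s*(\[.*?\])\s*;", re.DOTALL)


def extract_chart_data(page_text):
    """Return the chartData array as a list of dicts, or [] if it is absent.

    Assumes the literal is valid JSON (double-quoted keys, no trailing commas).
    """
    match = CHART_DATA_RE.search(page_text)
    if match is None:
        return []
    return json.loads(match.group(1))


# Hypothetical sample mimicking the script body; the real key names may differ.
sample_page = (
    "<script>\r\n"
    'var chartData = [{"date":"2005-06-10","v1":29.19},'
    '{"date":"2005-06-13","v1":29.26}];\r\n'
    "</script>"
)

rows = extract_chart_data(sample_page)
for row in rows:
    print(row)
```

With a live page, response.text from the request above would be passed in place of sample_page. If the literal turns out not to be strict JSON, json.loads raises an error and the string-splitting approach remains the fallback.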

【Comments】:

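Great stuff. May I ask how you managed to track it down? Beyond what I mentioned in the question, is there a general process you would recommend?
@SuperCodeBrah I'm afraid I don't know of any general process (I wish there were one). What I usually do is look for JSON strings in the XHR requests; if there are none (as here), I just sort the responses by size. Usually (but of course not always) the data lurks in the larger responses, as was the case here. From there, it's just grunt work...
One thing I noticed after you identified the link: it appears as the src attribute of an iframe (not sure how I missed it before). Searching the network responses for the relative link assets/php/market_cap.php turns up the relevant file, and it is easier from there. In this case the iframe page is very simple, and it just comes down to parsing the script as in your answer. I also see that when the iframe page is loaded on its own, the chartData property is always added to window, so if anyone prefers that approach, there may be other tools that can pull it straight from window.
@SuperCodeBrah I'd guess Selenium is one such tool.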
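
A rough way to apply the "sort by size" habit to a page that has already been downloaded is to rank its inline scripts by length and list the variables assigned to an array or object literal. This is only a sketch: the function name and the sample markup are made up for illustration, and a regular expression is a crude stand-in for a real HTML parser.

```python
import re

SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE)
# Matches "var name = [" or "var name = {" and captures the variable name.
VAR_RE = re.compile(r"\bvar\s+([A-Za-z_$][\w$]*)\s*=\s*[\[{]")


def rank_inline_scripts(html):
    """Return (size, variable_names) for each inline script, largest first."""
    ranked = []
    for body in SCRIPT_RE.findall(html):
        if not body.strip():
            continue  # external scripts (src=...) have no inline body
        ranked.append((len(body), VAR_RE.findall(body)))
    return sorted(ranked, key=lambda item: item[0], reverse=True)


# Hypothetical page with one external script and two inline scripts.
sample = (
    "<html><script src='lib.js'></script>"
    "<script>var small = 1;</script>"
    '<script>var chartData = [{"date":"2005-06-10","v1":29.19}];</script></html>'
)

ranked = rank_inline_scripts(sample)
for size, names in ranked:
    print(size, names)
```

The largest scripts that assign a literal to a variable are the first candidates to inspect by hand.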
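
The iframe observation can be sketched as follows, assuming the iframe tag is present in the HTML being inspected (it may instead be injected by JavaScript, in which case a plain text search for the relative link is the fallback). The class and function names and the sample markup are hypothetical; only the standard library is used, to keep the sketch dependency-free.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeCollector(HTMLParser):
    """Collect the src of every iframe, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(urljoin(self.base_url, src))


def find_iframe_sources(html, base_url):
    collector = IframeCollector(base_url)
    collector.feed(html)
    return collector.sources


PAGE_URL = "https://www.macrotrends.net/stocks/charts/AAPL/apple/market-cap"
# Made-up markup for illustration; the real page's attributes may differ.
sample = '<div><iframe src="/assets/php/market_cap.php?t=AAPL"></iframe></div>'

sources = find_iframe_sources(sample, PAGE_URL)
print(sources)
```

Each resolved URL can then be requested directly and its scripts parsed as in the answer.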
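
For anyone who wants to try the Selenium route, a minimal sketch might look like this. It assumes Selenium and a Firefox driver are installed, that the iframe page sets window.chartData shortly after loading, and that the URL below (the answer's endpoint with its t=AAPL parameter) is still valid. The helper name read_window_variable is hypothetical.

```python
import time


def read_window_variable(driver, url, name="chartData", timeout=10.0, poll=0.5):
    """Load url and poll until window[name] is defined, then return its value."""
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # An undefined or null JS value comes back to Python as None.
        value = driver.execute_script("return window[arguments[0]];", name)
        if value is not None:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"window.{name} was not set within {timeout} seconds")
        time.sleep(poll)


def main():
    # Imported here so the helper above can be used with any driver-like object.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        rows = read_window_variable(
            driver, "https://www.macrotrends.net/assets/php/market_cap.php?t=AAPL"
        )
        print(len(rows), "rows; first:", rows[0])
    finally:
        driver.quit()
```

Calling main() opens a real browser; the polling loop is there because the variable may not exist at the instant the page finishes loading.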