Answers: Scrape the data from a website that loads the data from JavaScript / JSON Array (POST request) using Python

【Question title】: Scrape the data from a website that loads the data from JavaScript / JSON Array (POST request) using Python
【Posted】: 2020-03-27 18:19:27
【Question】:

I am trying to scrape data from this link.

This is the approach I tried:

from bs4 import BeautifulSoup
import urllib.request
import csv

# specify the url
urlpage = ('https://www.ikh.se/sysNet/getProductsJSON/getProductsJSONDB.aspx?'
           'sua=1&lang=2&navid=19277994')

# query the website and return the html to the variable 'page'
page = urllib.request.urlopen(urlpage)
# parse the html using beautiful soup and store in variable 'soup'
soup = BeautifulSoup(page, 'html.parser')
tag = soup.find('div', attrs={'class': 'dnsCell'})
# find() returns None when the div is absent from the static HTML
text = ''.join(tag.stripped_strings) if tag else ''
print(text)

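I get the HTML DOM back, but the product-list DOM is missing. My guess is that the product list is built from a JSON array returned by a request to this link, but I am not sure how the product list is actually loaded. Am I right or wrong? I want to scrape all product details from the site and export them to Excel.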

【Comments】:

What should the final result look like?
docs.scrapy.org/en/latest/topics/dynamic-content.html
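
Following the idea in the page linked above of locating the request that delivers the data and reproducing it, here is a minimal sketch using only the standard library. The endpoint is the one from the question. That it answers a POST with a JSON array of product objects is an assumption based on the asker's guess, and `fetch_products` and `products_to_csv` are hypothetical helper names. Check the real request (method, payload, headers) in the browser's network tab before relying on this.

```python
import csv
import io
import json
import urllib.request

# Endpoint copied from the question. That it returns a JSON array of
# product objects in response to a POST is an assumption, not verified here.
URL = ('https://www.ikh.se/sysNet/getProductsJSON/getProductsJSONDB.aspx?'
       'sua=1&lang=2&navid=19277994')


def fetch_products(url=URL, payload=b''):
    """POST to the endpoint and decode the response body as JSON."""
    request = urllib.request.Request(url, data=payload, method='POST')
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode('utf-8'))


def products_to_csv(products):
    """Turn a list of dicts into CSV text; columns are the union of all keys."""
    fieldnames = []
    for product in products:
        for key in product:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator='\n')
    writer.writeheader()
    # Keys missing from a product are written as empty cells.
    writer.writerows(products)
    return buffer.getvalue()
```

Writing `products_to_csv(fetch_products())` to a `.csv` file gives something Excel can open directly. If the response turns out to be nested rather than a flat list of objects, the flattening step would need to be adapted.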

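Tags: python web web-scraping beautifulsoup scrapy

【Solution 1】:

The requests library (like urllib.request) does not execute JavaScript, so content that the browser builds after the page loads never appears in the response. If you want to download the fully rendered site, use the selenium library: https://selenium-python.readthedocs.io/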
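
As a rough illustration of the Selenium route, the sketch below loads a page in headless Chrome, waits until a JavaScript-built element exists, and returns the rendered HTML. It assumes Selenium 4 with a local Chrome installation. `fetch_rendered_html` is a hypothetical helper name, the `div.dnsCell` selector is taken from the question's code, and the 15-second timeout is arbitrary.

```python
def fetch_rendered_html(url, css_selector='div.dnsCell', timeout=15):
    """Load url in headless Chrome and return the DOM once css_selector appears."""
    # Imported inside the function so this module still imports on machines
    # where selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument('--headless')  # run without opening a browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the element is present in the DOM; raises
        # TimeoutException if it never shows up within `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        # Always close the browser, even when the wait times out.
        driver.quit()
```

The returned string can be passed to `BeautifulSoup(html, 'html.parser')` in the same way as in the question. The product page URL is not given in this thread, so it has to be supplied by the caller.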

【Comments】:

Could you add sample code so I can understand this better?
medium.com/cs-note/… : this may help you