【Posted】: 2021-01-05 07:07:51
【Question】:
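I'm trying to extract a JSON object from a page with BeautifulSoup, but I get an error. I can't strip the "window.pageData=" prefix cleanly. I also tried removing "window.pageData=" with the .replace method, but that didn't work either. My code: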
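import json
import re

import requests
from bs4 import BeautifulSoup

link = "https://www.daraz.com.bd/catalog/?q=" + "Pudina"
r = requests.get(link)
soup = BeautifulSoup(r.text, 'html.parser')
all_scripts = soup.find_all('script')
my_script = all_scripts[3]  # the 4th <script> tag on the page
jsData = re.search(r'window.pageData=', my_script.text)
data = json.loads(jsData.group(1))  # intended: parse the JSON after "window.pageData="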
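The error in the code above comes from jsData.group(1): the pattern r'window.pageData=' has no capturing parentheses, so the match object has no group 1 and Python raises IndexError ("no such group"). Below is a minimal sketch of the regex step with a capture group. It assumes the script body is exactly "window.pageData=" followed by the JSON object, as in the question's script, and it uses a trimmed copy of that script as a stand-in for my_script.text so it runs without a network request.

```python
import json
import re

# Stand-in for my_script.text: a trimmed copy of the script from the question.
script_text = """window.pageData={
"mods": {
"listItems": [
{
"name": "Mint leaf Powder",
"nid": "125018674",
"price": "171"
}
]
}
}"""

# The parentheses create group 1; "\." matches a literal dot, and re.S
# lets "." also match the newlines inside the JSON.
match = re.search(r"window\.pageData=(.*)", script_text, re.S)
if match is None:
    raise ValueError("window.pageData not found in this script")

data = json.loads(match.group(1))
print(data["mods"]["listItems"][0]["nid"])  # prints 125018674
```

On the live page, my_script.text takes the place of script_text. If index 3 turns out not to be the right script, re.search returns None, which is why the sketch checks for it before calling group(1).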
Here is the script:
<script>window.pageData={
"mods": {
"listItems": [
{
"name": "Mint leaf Powder (পুদিনা পাতা গুড়া) (১০০গ্রাম)- Pudina Pata Gura",
"nid": "125018674",
"productUrl": "//www.daraz.com.bd/products/mint-leaf-powder-pudina-pata-gura-i125018674-s1045213986.html?search=1",
"image": "https://static-01.daraz.com.bd/p/e742aabbea46336304f2081a29de1139.jpg",
"originalPrice": "180.00",
"originalPriceShow": "৳ 180",
"price": "171"
}
]
}
}</script>
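Once the object is parsed, the products sit under data["mods"]["listItems"]. A short sketch of walking that list, assuming every item looks like the one above; .get() with defaults is used in case some items omit a key (an assumption, not something the page is known to do). productUrl is protocol-relative (it starts with "//"), so the sketch prefixes "https:" as a design choice, and prices are converted from strings to floats.

```python
import json

# Trimmed copy of the window.pageData object shown above.
page_data_json = """{
"mods": {
"listItems": [
{
"name": "Mint leaf Powder (পুদিনা পাতা গুড়া) (১০০গ্রাম)- Pudina Pata Gura",
"nid": "125018674",
"productUrl": "//www.daraz.com.bd/products/mint-leaf-powder-pudina-pata-gura-i125018674-s1045213986.html?search=1",
"originalPrice": "180.00",
"price": "171"
}
]
}
}"""

data = json.loads(page_data_json)

rows = []
for item in data.get("mods", {}).get("listItems", []):
    # productUrl is protocol-relative ("//..."), so prefix a scheme
    url = item.get("productUrl", "")
    if url.startswith("//"):
        url = "https:" + url
    # prices are stored as strings in the JSON; convert for sorting/math
    rows.append((item.get("name"), float(item.get("price", "nan")), url))

for name, price, url in rows:
    print(f"{price:.2f}  {name}  {url}")
```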
【Discussion】:
-
You don't need BeautifulSoup here at all. Try this:
print(json.loads(re.findall(r"window\.pageData=(.*?)</", r.text, re.S)[0]))
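The one-liner above relies on the JSON never containing "</", since the non-greedy match stops at the first one. A sketch of an alternative, assuming the page text contains "window.pageData=" followed directly by the object: json.JSONDecoder().raw_decode parses one JSON value from the start of a string and reports where it ended, so no closing delimiter is needed and anything after the object (such as a ";") is ignored. The helper name page_data_from_html is hypothetical.

```python
import json


def page_data_from_html(html):
    """Hypothetical helper: parse the object assigned to window.pageData."""
    marker = "window.pageData="
    start = html.find(marker)
    if start == -1:
        raise ValueError("window.pageData not found")
    # raw_decode needs the string to begin with the JSON value itself
    rest = html[start + len(marker):].lstrip()
    obj, _end = json.JSONDecoder().raw_decode(rest)
    return obj


# A "</" inside a JSON string would cut the regex match short; raw_decode copes.
html = '<script>window.pageData={"mods": {"listItems": [{"name": "a</b", "price": "171"}]}};</script>'
data = page_data_from_html(html)
print(data["mods"]["listItems"][0]["name"])  # prints a</b
```

With requests, the call would be page_data_from_html(r.text).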
Tags: python json web-scraping beautifulsoup