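【Posted】: 2018-10-04 18:16:19
【Question】:
I'm scraping some data with BeautifulSoup and want to write it to a JSON file. My script does save data to the JSON file, but it only saves the last item on the page instead of looping through all the results, even though every result is printed in the terminal. I'm not sure what I'm missing. Here is my code: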
from urllib.request import urlopen
from bs4 import BeautifulSoup as soup
import json
otl_url = 'https://open.umn.edu/opentextbooks/SearchResults.aspx?subjectAreaId=99'
#opening up connection and grabbing page
uClient = urlopen(otl_url)
page_html = uClient.read()
uClient.close()
#html parsing
page_soup = soup(page_html, "html.parser")
#grabs info for each textbook
containers = page_soup.findAll("div",{"class":"twothird"})
data = {}
for container in containers:
    data['title'] = container.h2.text
    data['author'] = container.p.text
    data['link'] = "https://open.umn.edu/opentextbooks/" + container.h2.a["href"]
    print("title: " + data['title'])
    print("author: " + data['author'])
    print("link: " + data['link'])

with open("textbooks.json", "w") as writeJSON:
    json.dump(data, writeJSON, ensure_ascii=False)
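
A likely cause, sketched under stated assumptions: `data` is a single dict, so each pass of the loop overwrites the same three keys and only the last book is left when `json.dump` runs. One common pattern is to build a new dict per item and append it to a list, then dump the list once. The sketch below leaves out the network and parsing steps; `collect_records`, `save_records`, and `sample_rows` are hypothetical names, and the sample rows stand in for the values the scraper reads from `container.h2.text`, `container.p.text`, and `container.h2.a["href"]`.

```python
import json

BASE_URL = "https://open.umn.edu/opentextbooks/"


def collect_records(rows):
    """Return a list with one new dict per (title, author, href) row."""
    records = []
    for title, author, href in rows:
        # A fresh dict each iteration, so earlier items are not overwritten.
        records.append({
            "title": title,
            "author": author,
            "link": BASE_URL + href,
        })
    return records


def save_records(records, path):
    """Write the whole list to `path` as a single JSON array."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False, indent=2)


# Hypothetical placeholder rows, not real data from the page.
sample_rows = [
    ("Book A", "Author A", "BookDetail.aspx?bookId=1"),
    ("Book B", "Author B", "BookDetail.aspx?bookId=2"),
]
```

In the original script, the same idea would mean replacing `data = {}` with a list, appending one dict per `container` inside the loop, and passing that list to `json.dump`. Calling `save_records(collect_records(sample_rows), "textbooks.json")` here writes a JSON array with two objects.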
【Discussion】:
Tags: python json beautifulsoup