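[Posted]: 2020-12-10 16:41:47
[Question]:
I'm trying to learn web scraping with Python, and I want to get the categories and subcategories from a website and put them into a JSON file. Could you tell me how to do that? I would really appreciate it. Thank you very much.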
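import json
import os
import os.path
import urllib
from datetime import datetime as dt
from urllib.request import urlopen, urlretrieve

import requests  # provides requests.get(), used below
from bs4 import BeautifulSoup  # BeautifulSoup 4 is imported from the bs4 package

cats_name = []
sub_cats_name = []
theUrl = 'https://divar.ir/s/tehran'

# Note: theUrl has no {} placeholder, so .format(j) returns the same URL on every pass.
for j in range(1, 3):
    result = requests.get(theUrl.format(j))
    resultc = result.content
    print(result.text)
    sp = BeautifulSoup(result.text, 'html.parser')
    print(sp.prettify())
    # attrs takes a dict mapping attribute names to the values to match
    cut_soup1 = sp.findAll('ul', attrs={'class': 'kt-accordion'})
    cut_soup2 = sp.findAll('li', attrs={'class': "kt-accordion-item kt-accordion-item--with-icon kt-accordion-item_header"})
    # The two result lists can differ in length, so collect them separately
    for ul in cut_soup1:
        cats_name.append(ul.text)
    for li in cut_soup2:
        sub_cats_name.append(li.text)
    # Print the lists as separate arguments (str + list would raise TypeError)
    print("categories:", cats_name, "sub-cats:", sub_cats_name)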
[Comments]:
- Is your question "how do I find specific content on a web page" or "how do I save content to a JSON file"?
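For the "save to a JSON file" half of that question, here is a minimal sketch using only the standard library. It assumes the scraped text has already been reduced to (category, subcategory) pairs; the helper names `build_category_tree` and `save_as_json`, the sample data, and the output file name `categories.json` are all hypothetical and not taken from the original post or from the site's real markup.

```python
import json


def build_category_tree(pairs):
    """Group (category, sub_category) pairs into {category: [sub_category, ...]}."""
    tree = {}
    for category, sub_category in pairs:
        # strip() removes the whitespace that scraped element text often carries
        subs = tree.setdefault(category.strip(), [])
        name = sub_category.strip()
        # Skip empty labels and avoid listing the same subcategory twice
        if name and name not in subs:
            subs.append(name)
    return tree


def save_as_json(tree, path):
    """Write the mapping to a UTF-8 JSON file, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps e.g. Persian labels as-is instead of \uXXXX escapes
        json.dump(tree, fh, ensure_ascii=False, indent=2)


# Hypothetical sample data standing in for the scraped category labels.
sample_pairs = [
    ("Vehicles", "Cars"),
    ("Vehicles", "Motorcycles"),
    ("Real estate", " Apartments "),
]

tree = build_category_tree(sample_pairs)

if __name__ == "__main__":
    save_as_json(tree, "categories.json")
```

A nested dict maps naturally onto a JSON object, which is why the pairs are grouped before writing. How the pairs are extracted from the page is a separate problem: if the site builds its category list with JavaScript, the HTML returned by `requests.get` may not contain those elements at all.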
Tags: python json web-scraping