[Posted]: 2018-05-14 00:33:05
[Problem description]:
I'm trying to retrieve all the categories and subcategories of a retail website. Once I'm inside a category, I can use BeautifulSoup to extract every product in it. However, I'm struggling with the loop over the categories. I'm using this as a test site: https://www.uniqlo.com/us/en/women
How can I loop through every category on the left side of the site, as well as the subcategories? The problem is that you have to click a category before the site shows its subcategories. I want to extract all the products in each category/subcategory into a CSV file. This is what I have so far:
import csv
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

myurl = 'https://www.uniqlo.com/us/en/women/'
uClient = uReq(myurl)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")

product_list = []
# Find all <li> whose class starts with "grid-tile"
containers = page_soup.findAll("li", {"class": lambda L: L and L.startswith('grid-tile')})
for container in containers:
    product_container = container.findAll("div", {"class": "product-swatches"})
    product_names = product_container[0].findAll("li")
    for item in product_names:
        try:
            product_name = item.a.img.get("alt")
            product_mod_name = product_name.split(',')[0].lstrip()
        except AttributeError:
            product_mod_name = ''
        print(product_mod_name)
        product_list.append([product_mod_name])

with open('products.csv', 'a', newline='') as file:
    writer = csv.writer(file)
    for row in product_list:
        writer.writerow(row)
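For the category loop itself, if the sidebar links are already present in the initial HTML and only hidden by CSS until clicked (an assumption about this page, not verified), BeautifulSoup can collect them without simulating any clicks. A minimal sketch against a made-up nav snippet; the `category` and `subcategory` class names and the hrefs here are hypothetical, so they would need to be replaced with whatever the real markup uses:

```python
from bs4 import BeautifulSoup

# Hypothetical sidebar markup; the real site's tags and classes will differ.
nav_html = """
<ul class="category-list">
  <li class="category"><a href="/us/en/women/tops">Tops</a>
    <ul>
      <li class="subcategory"><a href="/us/en/women/tops/t-shirts">T-Shirts</a></li>
      <li class="subcategory"><a href="/us/en/women/tops/blouses">Blouses</a></li>
    </ul>
  </li>
  <li class="category"><a href="/us/en/women/bottoms">Bottoms</a></li>
</ul>
"""

nav_soup = BeautifulSoup(nav_html, "html.parser")
links = {}
for cat in nav_soup.find_all("li", class_="category"):
    cat_name = cat.a.get_text(strip=True)
    # Collect any subcategory links nested under this category.
    subs = [(li.a.get_text(strip=True), li.a["href"])
            for li in cat.find_all("li", class_="subcategory")]
    # A category with no subcategories just maps to its own link.
    links[cat_name] = subs or [(cat_name, cat.a["href"])]

print(links)
```

Each collected href could then be fetched with urlopen and fed through the same product loop as above. If the subcategory links are not in the initial HTML at all (i.e. they are injected by JavaScript only after a click), a browser-automation tool such as Selenium would be needed instead of plain urllib/BeautifulSoup.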
[Discussion]:
Tags: python beautifulsoup