【Posted】: 2018-11-17 23:52:11
【Question】:
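This code is meant to scrape data from a website and record the values. I am trying to use it to chart graphics card prices over time.
I am using BeautifulSoup and everything works, except that the prices are not written out correctly.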
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = "https://www.newegg.ca/Product/ProductList.aspx?Submit=ENE&N=100007708%20601210955%20601203901%20601294835%20601295933%20601194948&IsNodeId=1&bop=And&Order=BESTSELLING&PageSize=96"
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
containers = page_soup.findAll("div",{"class":"item-container"})
filename = "GPU Prices.csv"
f = open(filename, "w")
header = "Price,Product Brand,Product Name,Shipping Cost\n"
f.write(header)
for container in containers:
    price_container = container.findAll("li", {"class":"price-current"})
    price = price_container[0].text.strip()
    brand = container.div.div.a.img["title"]
    title_container = container.findAll("a", {"class":"item-title"})
    product_name = title_container[0].text
    shipping_container = container.findAll("li", {"class":"price-ship"})
    shipping = shipping_container[0].text.strip()
    f.write(price.replace(",", "") + "," + brand.replace(",", ".") + "," + product_name.replace(",", " |") + "," + shipping + "\n")
f.close()
After running it, the CSV file looks like this:
【Comments】:
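- What do you mean by the prices not printing correctly? Do you not want to print the available offers? Or do you want to format them in Excel?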
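- If you look at the attached screenshot, the price column skips rows that contain entries such as "-" and "|". The output is not uniform: extra lines are created, so every other field lines up in its row but the price does not.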
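- One plausible cause of the symptom described above, assuming the text inside the `price-current` element contains embedded newlines: `.strip()` only trims the ends of the string, so any line break in the middle is written to the file and starts a new row. The sketch below collapses internal whitespace and lets the standard `csv` module handle quoting, which also removes the need to replace commas by hand. The sample rows and the helper names `clean` and `write_rows` are hypothetical and stand in for the scraped values; this is an illustration, not a verified fix for the page in question.

```python
import csv
import io


def clean(text):
    """Collapse every run of whitespace (including newlines) into one space."""
    return " ".join(text.split())


# Hypothetical scraped values. The embedded newlines imitate what .text may
# return when the price markup spans several nested tags (an assumption).
rows = [
    ("|\n$329.99\n(2 Offers)\n-", "MSI", "GeForce GTX 1070, 8GB", "Free Shipping"),
    ("$459.99", "ASUS", "ROG Strix RTX 2070", "$4.99 Shipping"),
]


def write_rows(rows, out):
    """Write a header plus one cleaned CSV record per scraped row."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Price", "Product Brand", "Product Name", "Shipping Cost"])
    for row in rows:
        # csv.writer quotes any field that contains a comma, so the
        # product name can keep its original punctuation.
        writer.writerow([clean(field) for field in row])


# An in-memory buffer keeps the sketch self-contained; a real script would
# pass a file opened with open(filename, "w", newline="") instead.
buffer = io.StringIO()
write_rows(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In the original loop, the same idea would amount to passing `price`, `brand`, `product_name` and `shipping` through `clean` and handing them to `writer.writerow` in place of the manual `f.write` call.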
Tags: python html csv web-scraping beautifulsoup