Python loop error when scraping with BeautifulSoup (answer)

【Question Title】: Error in python loop while trying scraping with beautifulsoup
【Posted】: 2020-12-19 11:25:26
【Question】:

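Hi, I'm new to web scraping and I'm trying to follow a tutorial, but I'm having trouble accessing some items. This is the page I want to scrape: https://www.newegg.com/todays-deals?cm_sp=Homepage_4spots-_--_-12182020. I want to get each product's title, brand, and price. Everything works fine outside the loop, but I get an error when I build a loop over all the products: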

# This is the loop that scrapes all items from the webpage
containers = pagesoup.findAll("div",{"class":"item-container"})
for con in containers:
  title = con.img["title"]
  titleco=con.findAll("div",{"class":"item-branding"})
  brand= titleco[0].img["title"]
  priceco=con.findAll("li",{"class":"price-current"})
  priceco[0].text.strip()

I get this error:

----> 5 brand= titleco[0].img["title"]
TypeError: 'NoneType' object is not subscriptable
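The error can be reproduced with a small, self-contained snippet. The markup below is a hypothetical stand-in for the real page: the second item's item-branding div has no <img>, so `.img` on it returns None, and subscripting None raises exactly this TypeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second item's branding div has no <img> logo
html = """
<div class="item-container">
  <img title="Laptop A">
  <div class="item-branding"><img title="BrandX"></div>
</div>
<div class="item-container">
  <img title="Cable B">
  <div class="item-branding"><a>No logo here</a></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
containers = soup.find_all("div", {"class": "item-container"})

second_branding = containers[1].find_all("div", {"class": "item-branding"})
print(second_branding[0].img)  # None: the branding div has no <img> inside it

try:
    second_branding[0].img["title"]  # same as None["title"]
except TypeError as err:
    message = str(err)
    print(message)  # 'NoneType' object is not subscriptable
```

Note that the list returned by find_all is not empty here; it is the `.img` lookup on its first element that yields None.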

【Discussion】:

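Try printing just titleco. I think you'll find it's empty. So the problem comes from the line above it; specifically, the .findAll(...) call isn't returning anything.
No, outside this loop I get everything I need to scrape!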

Tags: python web-scraping beautifulsoup

【Solution 1】:

Not every item-branding element on your page contains an img: in some cases titleco[0].img is None, so you get the error when you try to read its "title" attribute.
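A compact alternative (a different technique from the answer's explicit check) is a CSS selector via `select_one`, which returns None when either the item-branding div or the <img> inside it is missing, so a single None check covers both cases. The markup is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one branding div with a logo, one without
html = """
<div class="item-container"><div class="item-branding"><img title="BrandX"></div></div>
<div class="item-container"><div class="item-branding"></div></div>
"""

soup = BeautifulSoup(html, "html.parser")
brands = []
for con in soup.find_all("div", {"class": "item-container"}):
    # select_one returns None if either the div or its <img> is missing
    logo = con.select_one("div.item-branding img")
    brands.append(logo["title"] if logo is not None else None)

print(brands)  # ['BrandX', None]
```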

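You will also run into another problem later with price-current: sometimes there are zero matches, so you get an error when you try to access the first element of the ResultSet with priceco[0]. At least I do, but your site seems to be unavailable in my country, so you may not get the same results.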
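The price problem is a different failure mode from the brand one: find_all never returns None, it returns an empty ResultSet (a list subclass), so indexing it raises IndexError rather than TypeError. A minimal sketch with a hypothetical container that has no price:

```python
from bs4 import BeautifulSoup

# Hypothetical container with no price-current element
con = BeautifulSoup('<div class="item-container"></div>', "html.parser").div

priceco = con.find_all("li", {"class": "price-current"})
print(len(priceco))  # 0: an empty ResultSet, not None

try:
    priceco[0]
except IndexError as err:
    index_message = str(err)
    print(index_message)  # list index out of range

# Checking for an empty result first avoids the error
price = priceco[0].get_text(strip=True) if priceco else None
print(price)  # None
```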

Here is a version of your code that runs without these errors:

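containers = pagesoup.find_all("div", {"class": "item-container"})
for con in containers:
    title = con.img["title"]
    titleco = con.find_all("div", {"class": "item-branding"})
    # Reset per item so a missing logo doesn't reuse the previous item's brand
    brand = None
    if titleco and titleco[0].img is not None:
        brand = titleco[0].img["title"]
    priceco = con.find_all("li", {"class": "price-current"})
    # Keep the price instead of discarding it; None when there is no match
    price = priceco[0].text.strip() if len(priceco) > 0 else None
    print(title, brand, price)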
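Since the live page may not be reachable (and its markup has likely changed since 2020), here is a self-contained sketch of the same loop run against hypothetical, simplified markup. It uses find(), which returns None instead of an empty list, so every field can be guarded the same way, and it collects the results into dicts rather than discarding them. `SAMPLE_HTML` and `parse_items` are illustrative names, not part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the deals page markup;
# the live Newegg page may be structured differently.
SAMPLE_HTML = """
<div class="item-container">
  <a class="item-img"><img title="Laptop A"></a>
  <div class="item-branding"><img title="BrandX"></div>
  <ul class="price"><li class="price-current">$499.99</li></ul>
</div>
<div class="item-container">
  <a class="item-img"><img title="Cable B"></a>
  <div class="item-branding"></div>
</div>
"""

def parse_items(page_html):
    """Return one dict per item-container; missing fields become None."""
    soup = BeautifulSoup(page_html, "html.parser")
    products = []
    for con in soup.find_all("div", {"class": "item-container"}):
        title_img = con.img  # first <img> anywhere in the container
        branding = con.find("div", {"class": "item-branding"})
        logo = branding.img if branding is not None else None
        price_li = con.find("li", {"class": "price-current"})
        products.append({
            "title": title_img.get("title") if title_img is not None else None,
            "brand": logo.get("title") if logo is not None else None,
            "price": price_li.get_text(strip=True) if price_li is not None else None,
        })
    return products

for product in parse_items(SAMPLE_HTML):
    print(product)
```

With this sample input, the second item comes back with brand and price set to None instead of crashing the loop.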

【Discussion】: