After starting My Scraper, I do not get any output

【Question Title】: After starting My Scraper I do not get an output
【Posted】: 2018-08-13 16:12:19
【Question】:

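I am running a scraper to retrieve product names, catalog numbers, sizes, and prices, but when I run the script it gives me neither output nor an error message. I am using Jupyter Notebook for this, and I am not sure whether that is the problem. I am also not sure whether writing the results to a CSV file is causing an issue. Any help would be greatly appreciated.

Here is the code I am running.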

from selenium import webdriver
import csv, os
from bs4 import BeautifulSoup

os.chdir(r'C:\Users\kevin.cragin\AppData\Local\pip\Cache\wheels\09\14\7d\1dcfcf0fa23dbb52fc459e5ce620000e7dca7aebd9300228fe') 
driver = webdriver.Chrome()
driver.get('https://www.biolegend.com/en-us/advanced-search?GroupID=&PageNum=1')
html = driver.page_source

containers = html.find_all('li', {'class': 'row list'})

with open("BioLegend_Crawl.csv", "w") as f:

    f.write("Product_name, CatNo, Size, Price\n")

    for container in containers:

        product_name = container.find('a',{'itemprop':'name'}).text
        info = container.find_all('div',{'class':'col-xs-2 noPadding'})
        catNo = info[0].text.strip()
        size = info[1].text.strip()
        price = info[2].text.strip()

        print('Product_name: '+ product_name)
        print('CatNo: ' + catNo)
        print('Size: ' + size)
        print('Price: ' + price + '\n')

        f.write(','.join([product_name,catNo,size,price]))

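【Question Comments】:

Have you checked the size of containers to see whether it is empty?
Also, you don't use the headers string anywhere in your script. Did you mean f.write(headers) rather than f.write('header')?
@rahlf23 I tested each container separately to see whether it would pull data, and each one did, but when I put them all together in the same script they no longer do. I am also not sure whether the size of this page is causing problems.
I will post a simplified version of your script below for you to test. Later today I can test it on my home laptop, which is not behind a firewall, to make sure it works.
If the content you want to extract is not dynamically loaded, I would be very surprised if the size of the content (the amount of information) were actually slowing you down...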

Tags: python selenium selenium-webdriver web-scraping beautifulsoup

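【Solution 1】:

The site you are using loads its information from a database, so the product names that appear by default are not preset in the site's HTML. They have to be loaded dynamically based on the search constraints.

You therefore need to download chromedriver.exe (if you use Google Chrome) or some other driver that automates a web browser (PhantomJS is another good one), and then specify the path on your machine where this .exe lives, like so: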

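from selenium import webdriver
import csv, os
from bs4 import BeautifulSoup

# Change to the folder that contains chromedriver (or another driver)
os.chdir('Path to chromedriver or other driver')
driver = webdriver.Chrome()
driver.get('Link to your webpage you want to extract HTML from')
html = driver.page_source  # page_source is a plain string
# Parse the string so that find_all() becomes available
soup = BeautifulSoup(html, 'html.parser')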

containers = soup.find_all('ul',{'id':'productsHolder'})

with open("BioLegend_Crawl.csv", "w") as f:

    f.write("Product_name, CatNo, Size, Price\n")

    for container in containers:

        product_name = container.find('a',{'itemprop':'name'}).text
        info = container.find_all('div',{'class':'col-xs-2 noPadding'})
        catNo = info[0].text.strip()
        size = info[1].text.strip()
        price = info[2].text.strip()

        print('Product_name: '+ product_name)
        print('CatNo: ' + catNo)
        print('Size: ' + size)
        print('Price: ' + price + '\n')

        # End each record with a newline so every product gets its own row
        f.write(','.join([product_name,catNo,size,price]) + '\n')
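
The row-writing step can also be sketched with the standard-library csv module, which the script imports but never uses. This is a hedged alternative, not the answerer's method: the helper name rows_to_csv and the sample rows are hypothetical. csv.writer terminates every row and quotes fields that contain commas, which a plain ','.join(...) does not do.

```python
import csv
import io

def rows_to_csv(rows, out):
    """Write a header plus one CSV line per (name, cat_no, size, price) row."""
    writer = csv.writer(out)
    writer.writerow(["Product_name", "CatNo", "Size", "Price"])
    writer.writerows(rows)

# Hypothetical rows; the first name contains a comma to show the quoting.
sample_rows = [
    ("Example Product, conjugated", "000001", "100 tests", "$0.00"),
    ("Example Product B", "000002", "25 tests", "$0.00"),
]

# An in-memory buffer stands in for the real output file.
buffer = io.StringIO()
rows_to_csv(sample_rows, buffer)
print(buffer.getvalue())
```

When writing to a real file instead of the in-memory buffer, opening it with open("BioLegend_Crawl.csv", "w", newline="") lets the csv module control the line endings.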

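【Comments】:

I entered your code and followed all of your steps, but now it gives me this error message: 'str' object has no attribute 'find_all'. I have been playing around with it but still cannot pull anything. I have also updated my OP.
I have updated my answer. The html has to be passed to BeautifulSoup so that the HTML tree structure is preserved for parsing.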
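
The fix described in the last comment can be tried without a browser. The sketch below rests on assumptions: SAMPLE_HTML is hypothetical markup that only mimics the class names and attributes used in this thread (the real page may be structured differently), and extract_rows is an illustrative helper name. It shows that a page source held as a plain str has no find_all, while the parsed BeautifulSoup object does, and it reports an empty containers list, as suggested in the question comments.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the selectors used in the thread;
# it is NOT the real page, and all values are placeholders.
SAMPLE_HTML = """
<ul id="productsHolder">
  <li class="row list">
    <a itemprop="name">Example Product A</a>
    <div class="col-xs-2 noPadding"> 000001 </div>
    <div class="col-xs-2 noPadding"> 100 tests </div>
    <div class="col-xs-2 noPadding"> $0.00 </div>
  </li>
  <li class="row list">
    <a itemprop="name">Example Product B</a>
    <div class="col-xs-2 noPadding"> 000002 </div>
    <div class="col-xs-2 noPadding"> 25 tests </div>
    <div class="col-xs-2 noPadding"> $0.00 </div>
  </li>
</ul>
"""

def extract_rows(html):
    """Parse an HTML string and return (name, cat_no, size, price) tuples."""
    soup = BeautifulSoup(html, "html.parser")
    containers = soup.find_all("li", {"class": "row list"})
    if not containers:
        # An empty list usually means the selector is wrong or the
        # content had not been loaded when the HTML was captured.
        print("No product containers found.")
    rows = []
    for container in containers:
        name = container.find("a", {"itemprop": "name"}).text.strip()
        info = container.find_all("div", {"class": "col-xs-2 noPadding"})
        rows.append((name, *(cell.text.strip() for cell in info[:3])))
    return rows

# The raw string itself cannot be searched; only the parsed soup can.
print(hasattr(SAMPLE_HTML, "find_all"))
rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

In the real script, the string passed to extract_rows would be driver.page_source rather than the sample markup.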