Python error when trying to retrieve image links: answers

【Question title】: Python error when trying to retrieve image links
【Posted at】: 2017-01-11 14:40:05
【Question description】:

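Every time I run the following Python script, I get the error AttributeError: 'NoneType' object has no attribute 'findAll'. I did some research and found posts suggesting that I may be passing None when trying to find the images, which is why it fails. I still don't have a solution. Any information helps.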

Here is the full error:

Traceback (most recent call last):
  File "D:\Program Files\Parser Python\Test.py", line 33, in <module>
    for img in divImage.findAll('img'):
AttributeError: 'NoneType' object has no attribute 'findAll'


from bs4 import BeautifulSoup

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.common.exceptions import TimeoutException
import os

firefox_capabilities = DesiredCapabilities.FIREFOX
firefox_capabilities['marionette'] = True
# raw string, so that "\f" in the path is not read as a form-feed escape
firefox_capabilities['binary'] = r'C:\Program Files (x86)\Mozilla Firefox\firefox.exe'


# append with os.pathsep so the new entry is not glued onto the last PATH entry
os.environ["PATH"] += os.pathsep + r"C:\Python27\Lib\site-packages\selenium-2.53.6-py2.7.egg\selenium"
#binary = FirefoxBinary(r'C:\Program Files (x86)\Mozilla Firefox\firefox.exe')
driver = webdriver.Firefox(capabilities=firefox_capabilities)
# it takes forever to load the page, therefore we are setting a threshold
driver.set_page_load_timeout(5)

try:
    driver.get("http://readcomiconline.to/Comic/Flashpoint/Issue-1?id=19295&readType=1")
except TimeoutException:
    # never ignore exceptions silently in real world code
    pass

soup2 = BeautifulSoup(driver.page_source, 'html.parser')
divImage = soup2.find('div', {"id": "divImage"})
#divImage = soup2.find('div', {"id": "containerRoot"})

# close the browser 
driver.close()

for img in divImage.findAll('img'):
    print(img.get('src'))

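【Question comments】:

Does this topic help you? Link: stackoverflow.com/questions/18065768/…
This may also be related to stackoverflow.com/questions/31419641/… By the way, why the Java tag?
Oh my, that was an accident. I originally tried to do this with jsoup in Java, so I guess I had Java on my mind. My apologies.
Is there any way to remove the tag?
You should be able to remove it when editing the question. And if it was just an accident, no problem. On Stack Overflow, some people add as many language tags as they can to their questions in the hope of getting faster answers, hence my first comment. But as I said, if it was an honest mistake, forget about it. It looks like someone has already edited the question and the edit is just awaiting review.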

Tags: python selenium firefox

【Solution 1】:

The error means that divImage is None, which in turn means that no div element with id="divImage" was found in the parsed HTML.
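
To make the failure mode concrete, here is a minimal sketch of a guard around the lookup. The helper name image_sources and its container_id parameter are hypothetical (they are not part of the original script), and soup is assumed to be a BeautifulSoup object such as soup2 from the question. Instead of raising AttributeError, it returns an empty list when the container is missing:

```python
def image_sources(soup, container_id="divImage"):
    """Return the src of every <img> inside the container div.

    `soup` is assumed to be a BeautifulSoup object. The function name and
    the `container_id` parameter are illustrative, not from the original post.
    """
    container = soup.find("div", {"id": container_id})
    if container is None:
        # find() returns None when nothing matches, which is exactly
        # what triggered the AttributeError in the question.
        return []
    return [img.get("src") for img in container.find_all("img")]
```

Calling image_sources(soup2) on the question's page source would then return an empty list rather than crash, which makes it easier to see that the div was not yet in the page when the source was captured.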

You should first wait for the desired element to appear on the page, and only then get the page source and parse it. This can be done with WebDriverWait:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# ...

driver.get("http://readcomiconline.to/Comic/Flashpoint/Issue-1?id=19295&readType=1")

wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.ID, "divImage")))

soup2 = BeautifulSoup(driver.page_source, 'html.parser')
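
For intuition, an explicit wait boils down to polling a condition until it returns something truthy or a deadline passes. The sketch below is not Selenium's implementation. The names wait_until, WaitTimeout, and poll_interval are hypothetical and used only for illustration, and the code assumes Python 3 (for time.monotonic):

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds
    have elapsed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %.1f seconds" % timeout)
        time.sleep(poll_interval)
```

The practical point is the same as in the answer: the page source is read only after the condition has succeeded, so the parser sees the element.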

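Note that to wait for all the images to load, you need to keep scrolling the page down to the footer until every image has loaded. An implementation: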

driver.get("http://readcomiconline.to/Comic/Flashpoint/Issue-1?id=19295&readType=1")
wait.until(EC.presence_of_element_located((By.ID, "divImage")))

# wait for the footer as well; looking it up too early raises NoSuchElementException
footer = wait.until(EC.presence_of_element_located((By.ID, "footer")))

while True:
    # scroll to the footer
    driver.execute_script("arguments[0].scrollIntoView();", footer)
    time.sleep(0.5)

    # check if all images are loaded
    if all(img.get_attribute("src") for img in driver.find_elements(By.CSS_SELECTOR, "#divImage p img")):
        break

Don't forget to import time.
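
Since the scrolling loop has no upper bound, it can spin forever if an image never receives a src. A minimal sketch of a bounded variant follows. The function name scroll_until_images_loaded and the parameters find_images, timeout, and pause are hypothetical; driver and footer are assumed to be a Selenium driver and the footer element; Python 3 is assumed for time.monotonic. Unlike the loop in the answer, it also requires at least one image to be found before reporting success:

```python
import time


def scroll_until_images_loaded(driver, footer, find_images, timeout=120.0, pause=0.5):
    """Scroll to `footer` until every image has a src, or until `timeout` seconds pass.

    `find_images` is a zero-argument callable returning the current image
    elements. Returns True if all images were loaded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # scroll to the footer so that further images get loaded
        driver.execute_script("arguments[0].scrollIntoView();", footer)
        time.sleep(pause)

        images = find_images()
        # require at least one image: all() over an empty list is True
        if images and all(img.get_attribute("src") for img in images):
            return True
    return False
```

With a real driver, find_images could be lambda: driver.find_elements(By.CSS_SELECTOR, "#divImage p img"), and a False return value signals that the page did not finish loading in time.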

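【Comments】:

This seems to have worked. The page does take a long time to load fully because it displays so many images. I can now print the "src" text, but I only get two lines, for two images. Could that be because the other images had not loaded yet at that point?
@HunterZolomon Good point, that part is not so easy. Updated the answer - take a look.
Hmm, now I get this error: selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: [id="footer"] I checked the web page, and there is definitely an element with the ID "footer".
@HunterZolomon Ah, that is probably a timing issue again; try waiting for it. Thanks.
Yes, I waited for the footer and now it works. Waiting for all the images takes a few minutes. Thank you very much for your help! On to the next step.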