Python BeautifulSoup find_all in a for loop over a CSV

【Question title】: Python BeautifulSoup find_all in a for loop of csv
【Posted】: 2020-07-27 10:36:39
【Question description】:

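I've run into a problem with a project I'm working on.

I have a CSV file that contains all of the URLs in its first column.

My script below currently pulls in and iterates over each row, but as soon as it reaches the find_all call it raises the following error: IndexError: list index out of range.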

import requests
from bs4 import BeautifulSoup
import csv

with open('1.csv', "r", newline="") as inFile, open("1output.csv", "w", newline="") as outFile:
    next(inFile)
    reader = csv.reader(inFile)
    writer = csv.writer(outFile)
    for row in reader:
        subURL = row[0]

        # Parse the HTML from the website
        URL = 'https://www.example.com/{}'.format(subURL)
        page = requests.get(URL)
        soup = BeautifulSoup(page.content, 'html.parser')

        # find iframe on webpage and get the src of the iframe
        iframeDesc = soup.find_all('iframe')[0]
        pageDesc = requests.get(iframeDesc['src'])
        soupDesc = BeautifulSoup(pageDesc.content, 'html.parser')

        # Get Description from iframe Desc
        itemDesc = soupDesc.find_all('div', id="div_01")

The error occurs on this line:

iframeDesc = soup.find_all('iframe')[0]

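【Question comments】:

Maybe the site doesn't have an iframe. Share a link to the site.
The site does have an iframe, and the iframe selector code works when it isn't embedded in the for loop.
Thanks Joshua, the URL was incorrect and the page had no iframe. You were right.
Sure. Welcome to SO.

Tags: python csv beautifulsoup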

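【Solution 1】:

Your problem could have several causes; let me walk you through the most likely ones.

Wrong pattern: in this case the exception is expected, because you are asking BeautifulSoup to return something that does not appear in the document.
Typo: the simplest one. Could a single wrong letter be preventing you from getting the node you want?

Also, I suspect you are looking for the wrong node in the tree. This happens a lot when using BS, because you are essentially descending through the DOM and a missing tag is quite likely. Just put a few print statements around your code and see what is happening on those lines.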
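
As a rough illustration of guarding against a missing tag, here is a minimal sketch. The helper name first_iframe_src and the two sample HTML strings are made up for this example and do not come from the question. It uses find(), which returns None when nothing matches, instead of indexing into the result of find_all().

```python
from bs4 import BeautifulSoup


def first_iframe_src(html):
    """Return the src of the first <iframe> in html, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    iframe = soup.find("iframe")  # find() returns None when no tag matches
    if iframe is None:
        return None
    return iframe.get("src")  # .get() returns None if the attribute is absent


# Hypothetical sample pages: one with an iframe, one without.
with_iframe = '<html><body><iframe src="https://www.example.com/desc"></iframe></body></html>'
without_iframe = "<html><body><p>Not found</p></body></html>"

print(first_iframe_src(with_iframe))
print(first_iframe_src(without_iframe))
```

Inside the CSV loop, a None result could be used to log the offending URL and continue with the next row rather than letting the whole run crash.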

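【Comments】:

Thanks. Currently, when I print the output of the HTML parser, it shows me the page content with the iframe. When I remove the [0] at the end of find_all, the script works, but when I comment the next line back into the code it gives me a different error: TypeError: list indices must be integers or slices, not str.
That happens because find_all returns a list of objects, and you are indexing that list with the string "src". Try printing iframeDesc and check that the list is not empty; if it is not, you can take your object out of it and access its "src" attribute.
Thanks, I found that the URL was wrong. Thanks for your help.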
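
To make the TypeError discussed in the comments above concrete, here is a small sketch run against an inline HTML string (the markup and variable names are assumptions for illustration, not the real page). It shows that find_all() returns a list-like result, that indexing it with a string fails, and that an emptiness check makes the [0] lookup safe.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = '<iframe src="https://www.example.com/desc"></iframe>'
soup = BeautifulSoup(html, "html.parser")

iframes = soup.find_all("iframe")  # list-like collection of Tag objects

# Indexing the list itself with a string raises TypeError.
caught = None
try:
    iframes["src"]
except TypeError as exc:
    caught = exc
print(type(caught).__name__)

# Check that the list is not empty, then index the Tag, not the list.
src = iframes[0]["src"] if iframes else None
print(src)

# On a page without an iframe the list is empty, so [0] would raise IndexError.
empty = BeautifulSoup("<p>no frame</p>", "html.parser").find_all("iframe")
print(len(empty))
```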