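【Posted】:2020-07-27 10:36:39
【Question】:
I have run into a problem with the project I am working on.
I have a CSV file that contains all of the URLs in its first column.
My script below currently reads the file and iterates over each row, but as soon as it reaches the find_all call it throws the following error: IndexError: list index out of range.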
import requests
from bs4 import BeautifulSoup
import csv

with open('1.csv', "r", newline="") as inFile, open("1output.csv", "w", newline="") as outFile:
    next(inFile)
    reader = csv.reader(inFile)
    writer = csv.writer(outFile)
    for row in reader:
        subURL = row[0]
        # Parse the HTML from the website
        URL = 'https://www.example.com/{}'.format(subURL)
        page = requests.get(URL)
        soup = BeautifulSoup(page.content, 'html.parser')
        # find iframe on webpage and get the src of the iframe
        iframeDesc = soup.find_all('iframe')[0]
        pageDesc = requests.get(iframeDesc['src'])
        soupDesc = BeautifulSoup(pageDesc.content, 'html.parser')
        # Get Description from iframe Desc
        itemDesc = soupDesc.find_all('div', id="div_01")
The error occurs on this line:
iframeDesc = soup.find_all('iframe')[0]
【Discussion】:
-
The site may not have an iframe. Please share a link to the site.
-
The site does have an iframe, and the iframe selector code works when it is not nested inside the for loop.
-
Thanks, Joshua. The URL was incorrect and the page had no iframe, so you were right.
-
Sure. Welcome to SO.
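Since the cause turned out to be a page without an iframe, one way to make the loop fail less abruptly is to look the iframe up with find, which returns None when nothing matches, instead of indexing into the result of find_all. The following is only a minimal sketch of that idea: the helper name first_iframe_src is hypothetical and not part of the original script, and it takes an HTML string so it can be tried without any network access.

```python
from typing import Optional

from bs4 import BeautifulSoup


def first_iframe_src(html: str) -> Optional[str]:
    """Return the src of the first iframe in html, or None if there is none.

    Hypothetical helper for illustration; not part of the original script.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find() returns None when no tag matches, so there is no list to index
    iframe = soup.find("iframe")
    if iframe is None:
        return None
    # Tag.get() returns None when the attribute is missing
    return iframe.get("src")


if __name__ == "__main__":
    with_frame = '<html><body><iframe src="https://www.example.com/desc"></iframe></body></html>'
    without_frame = "<html><body><p>no frame here</p></body></html>"
    print(first_iframe_src(with_frame))
    print(first_iframe_src(without_frame))
```

Inside the for loop, a None result could be printed together with the URL that produced it before moving on to the next row with continue, which would likely have pointed to the incorrect URL right away.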
Tags: python csv beautifulsoup