[Question Title]: Python Selenium Scraper error - ValueError: I/O operation on closed file
[Posted]: 2021-11-17 21:20:00
[Question Description]:

I am trying to run the Python code below.

Technology: Python, Selenium scraper
Device: Windows machine

I get the following error...

Traceback (most recent call last):
  File "scraper.py", line 35, in <module>
    for row in cp_url:
ValueError: I/O operation on closed file.
#!/usr/bin/python3
# Description: The Python code below will search selenium in Google.
import time
import csv
import os


from selenium import webdriver
from selenium.webdriver.common.keys import Keys

#EVERYTIME CHANGE THE DRIVER PATH TO THE CHROME DRIVER FOR LATEST CHROME VERSION
driver = webdriver.Chrome(
    executable_path="D:\chromedriver.exe")

options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])

contents = []

filePath = 'output1.csv'
# The output file may be left over from a previous run, so check
# whether it exists before deleting it
if os.path.exists(filePath):
    os.remove(filePath)
else:
    print("Cannot delete the file as it doesn't exist")

f = open("output1.csv", "a")
f.write("website," + "htmltag," + "type," + "id," + "classname," + "for," + "href," + "alt," + "type," + "src,"
+ "name," + "width," + "height," + "data-src,"+ 'inner-text,' + 'action,' + 'value,' + "\n")


with open('inputLinks1.csv', 'rt') as cp_csv:
 cp_url = csv.reader(cp_csv)
for row in cp_url:
        links = row[0]
        contents.append(links)
        driver.get(links)
        with open('xpathtags.csv', 'rt') as cp2_csv:
            cp_url2 = csv.reader(cp2_csv)
            for row1 in cp_url2:
                print(row[0])
                (xtype, xpathtext) = row1[0].split(';') 
                print(xtype, xpathtext)
                contents.append(xtype)
                contents.append(xpathtext)
                elems = driver.find_elements_by_xpath(xpathtext)
                for elem in elems:
                    f = open('output1.csv', 'a', encoding='utf-8')
                    f.write( links + ", "+ xtype + "," 
                        + str(elem.get_attribute('type')) + ', '
                        + str(elem.get_attribute('id')) + ', '
                        + str(elem.get_attribute('class')) + ', '
                        + str(elem.get_attribute('for')) + ', '
                        + str(elem.get_attribute('href')) + ', '
                        + str(elem.get_attribute('alt')) + ', '                        
                        + str(elem.get_attribute('type')) + ', '
                        + str(elem.get_attribute('src')) + ', '
                        + str(elem.get_attribute('name')) + ', '
                        + str(elem.get_attribute('width')) + ', '
                        + str(elem.get_attribute('height')) + ', '
                        + str(elem.get_attribute('data-src')) + ', '
                        + str(elem.get_attribute('innerText').strip()) + ', '
                        + str(elem.get_attribute('action')) + ', '
                         + str(elem.get_attribute('value')) + ', '

                        + '\n')
                   
                    f.close()  


driver.close()

I am using the following CSV files

A) inputlinks1.csv

www.flipkart.com
www.ebay.com

B) xpathtags.csv

Link;//a[@href]
Button;//button
Image;//img
Heading1;//h1
Heading2;//h2
Heading3;//h3
Heading4;//h4

C) Output.csv is a blank file

I get the following error

Traceback (most recent call last):
  File "scraper.py", line 35, in <module>
    for row in cp_url:
ValueError: I/O operation on closed file.

[Question Discussion]:

  • You have a syntax error on that line. Are you using consistent spacing, and not mixing spaces with tabs?
  • The spacing is correct. Could you please advise?
  • Try changing with open('inputLinks1.csv', 'rt') as cp_csv: to with open('inputLinks1.csv', 'r') as cp_csv:
  • Thanks Kamalesh. After making the change I still get the same error.
  • Is this the complete error message? Always show the full traceback. Stack Overflow doesn't show line numbers - which line is 35?

Tags: python selenium csv web-scraping


[Solution 1]:

I can't test it, but I think your problem is incorrect indentation:

with open('inputLinks1.csv', 'rt') as cp_csv:
 cp_url = csv.reader(cp_csv)
for row in cp_url:
    # ...rest...

So you run the for-loop outside the with...as block, and with...as automatically closes the file.

You should run the for-loop inside the with...as block:

with open('inputLinks1.csv', 'rt') as cp_csv:
    cp_url = csv.reader(cp_csv)
    for row in cp_url:
        # ...rest...

Or you can use the standard open() and close():

cp_csv = open('inputLinks1.csv', 'rt')

cp_url = csv.reader(cp_csv)
for row in cp_url:
    # ...rest...

cp_csv.close()
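The lazy behaviour of csv.reader is what makes this fail: it only pulls lines from the underlying file object while you iterate over it, not when it is created. A minimal, self-contained reproduction (using a throwaway demo.csv instead of the real input file):

```python
import csv

# Write a small sample file standing in for inputLinks1.csv.
with open("demo.csv", "w") as f:
    f.write("www.flipkart.com\nwww.ebay.com\n")

with open("demo.csv", "rt") as cp_csv:
    cp_url = csv.reader(cp_csv)   # lazy: nothing is read yet

err = None
try:
    for row in cp_url:            # the with-block has already closed the file
        print(row)
except ValueError as e:
    err = str(e)

print(err)   # -> I/O operation on closed file.
```

Once the for-loop is moved inside the with-block, the file is still open during iteration and the error disappears.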

[Discussion]:

  • I should have refreshed the page while testing the code :( — I came up with the same solution.
  • @KamaleshS it happens to me too sometimes. I start writing a solution while someone else posts an answer. You added information about other issues, so your answer is useful as well.
  • Yes, agreed!!
[Solution 2]:

A few changes had to be made to your code to get it working.

After fixing the indentation, it threw another error w.r.t. the inputlinks1.csv file. Change it to:

https://www.flipkart.com
https://www.ebay.com

Always try to use with open when handling files.

Code snippet:

contents = []

filePath = 'output1.csv'
# As file at filePath is deleted now, so we should check if file
# exists or not not before deleting them
if os.path.exists(filePath):
    os.remove(filePath)
else:
    print("Can not delete the file as it doesn't exists")

with open("output1.csv", "a") as f:
    f.write("website," + "htmltag," + "type," + "id," + "classname," + "for," + "href," + "alt," + "type," + "src,"
    + "name," + "width," + "height," + "data-src,"+ 'inner-text,' + 'action,' + 'value,' + "\n")

with open('inputLinks1.csv', 'r') as cp_csv:
    cp_url = csv.reader(cp_csv)
    for row in cp_url:
            links = row[0]
            print(links)
            contents.append(links)
            driver.get(links)
            with open('xpathtags.csv', 'r') as cp2_csv:
                cp_url2 = csv.reader(cp2_csv)
                for row1 in cp_url2:
                    print(row[0])
                    (xtype, xpathtext) = row1[0].split(';') 
                    print(xtype, xpathtext)
                    contents.append(xtype)
                    contents.append(xpathtext)
                    elems = driver.find_elements_by_xpath(xpathtext)
                    for elem in elems:
                        with open('output1.csv', 'a', encoding='utf-8') as f:
                            f.write( links + ", "+ xtype + "," 
                            + str(elem.get_attribute('type')) + ', '
                            + str(elem.get_attribute('id')) + ', '
                            + str(elem.get_attribute('class')) + ', '
                            + str(elem.get_attribute('for')) + ', '
                            + str(elem.get_attribute('href')) + ', '
                            + str(elem.get_attribute('alt')) + ', '                        
                            + str(elem.get_attribute('type')) + ', '
                            + str(elem.get_attribute('src')) + ', '
                            + str(elem.get_attribute('name')) + ', '
                            + str(elem.get_attribute('width')) + ', '
                            + str(elem.get_attribute('height')) + ', '
                            + str(elem.get_attribute('data-src')) + ', '
                            + str(elem.get_attribute('innerText').strip()) + ', '
                            + str(elem.get_attribute('action')) + ', '
                            + str(elem.get_attribute('value')) + ', '

                            + '\n')
                   

driver.close()
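As a side note, the manual string concatenation used to build output1.csv will produce a broken row whenever an attribute value itself contains a comma (class lists and inner text often do). A sketch of the same write using csv.writer instead, with the Selenium get_attribute calls replaced by a hypothetical placeholder row:

```python
import csv

# Illustrative header; adjust to match the columns you actually collect.
HEADER = ["website", "htmltag", "type", "id", "classname", "for", "href",
          "alt", "src", "name", "width", "height", "data-src",
          "inner-text", "action", "value"]

# Placeholder row standing in for elem.get_attribute(...) results;
# note the embedded comma in the classname value.
rows = [
    ["www.ebay.com", "Link", "None", "logo", "nav, bar", "None", "/", "None",
     "None", "None", "None", "None", "None", "eBay", "None", "None"],
]

with open("output1.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(HEADER)
    writer.writerows(rows)        # commas inside values are quoted for you

# Read the file back to confirm the embedded comma survived intact.
with open("output1.csv", newline="", encoding="utf-8") as f:
    back = list(csv.reader(f))
print(back[1][4])   # -> nav, bar
```

With csv.writer you also only need to open the output file once, outside the element loop, instead of reopening it for every element.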

[Discussion]:
