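To do that, you need to keep saving the current position in another file every time a row is read from the CSV file, which of course adds some processing overhead.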
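In its most naive form, the per-row bookkeeping described above might look like the sketch below. The function name `read_with_checkpoint` and its `state_path` parameter are hypothetical, chosen only for illustration. Reopening the state file for every row is what makes this version comparatively expensive.

```python
import csv
import os


def read_with_checkpoint(csv_path, state_path):
    """Yield rows from csv_path, saving the row count to state_path each time.

    Hypothetical helper for illustration only.
    """
    # Resume from the saved row count if a state file already exists.
    start_row = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            start_row = int(f.read().strip() or 0)

    with open(csv_path, newline='') as csvfile:
        reader = csv.reader(csvfile)
        for row_number, row in enumerate(reader, start=1):
            if row_number <= start_row:
                continue  # Already read in an earlier run.
            # Opening and closing the state file on every row is the overhead.
            with open(state_path, 'w') as f:
                f.write(str(row_number))
            yield row

    # Reaching this point means the whole file was read, so the state is obsolete.
    if os.path.exists(state_path):
        os.remove(state_path)
```

If the consumer stops early, the code after the loop never runs, so the state file stays on disk and the next call picks up from the recorded row.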
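I think creating a context manager type and using it together with a with statement would be a very good way to approach this problem, and it can reduce the overhead to some degree.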
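The code below implements a context manager for reading CSV files. It reads a file normally, and if reading is interrupted before the whole file has been consumed (inside the with statement's block), the next read automatically resumes where it stopped.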
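This is done by creating a separate "state" file that keeps track of the last row successfully read. If no exception occurs during reading, the state file is deleted; if one does occur, the file is left behind. Consequently, the next time the CSV file is read, the existing state file is detected and used so that reading starts from where it previously stopped.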
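The keep-or-delete decision relies on the arguments Python passes to `__exit__`: `exc_type` is `None` when the with block finishes cleanly. A minimal sketch of just that mechanism, using a hypothetical `KeepOnError` class that is not part of the answer's code:

```python
import os


class KeepOnError:
    """Hypothetical helper: create a marker file, delete it only on a clean exit."""

    def __init__(self, path):
        self.path = path

    def __enter__(self):
        # Create the marker file when the with block is entered.
        with open(self.path, 'w') as f:
            f.write('0\n')
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is None:  # The with block finished without an exception.
            os.remove(self.path)
        return False  # Never suppress an exception raised in the block.
```

Returning a false value from `__exit__` lets the exception continue to propagate, so the caller still sees the failure while the marker file stays on disk.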
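It is worth noting that since each resumable CSV reader is a separate object, you can create and use more than one at a time. While a CSV file is being read, its associated "state" file is kept open, so it does not need to be reopened and closed every time its contents are updated.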
import csv
import os


class ResumableCSVReader:
    """Context manager that reads a CSV file and can resume after an interruption."""

    def __init__(self, filename):
        self.filename = filename
        self.state_filename = filename + '.state'
        self.csvfile = None
        self.statefile = None

    def __enter__(self):
        self.csvfile = open(self.filename, 'r', newline='')
        try:  # Open and read state file.
            with open(self.state_filename, 'r') as statefile:
                self.start_row = int(statefile.read())
        except (FileNotFoundError, ValueError):  # No usable state file.
            self.start_row = 0
        # Line-buffered, so every update is flushed as soon as it is written.
        self.statefile = open(self.state_filename, 'w', buffering=1)
        # Opening with 'w' truncated the file, so record the start row again.
        self.statefile.write(str(self.start_row) + '\n')
        return _CSVReaderContext(self)

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.csvfile:
            self.csvfile.close()
        if self.statefile:
            self.statefile.close()
            if not exc_type:  # No exception?
                os.remove(self.state_filename)  # Delete state file.


class _CSVReaderContext:
    """Iterator over the remaining rows; updates the state file on each row."""

    def __init__(self, resumable):
        self.resumable = resumable
        self.reader = csv.reader(self.resumable.csvfile)
        # Skip to start row.
        for _ in range(self.resumable.start_row):
            next(self.reader)
        self.current_row = self.resumable.start_row

    def __iter__(self):
        return self

    def __next__(self):
        row = next(self.reader)  # Raises StopIteration at end of file.
        self.current_row += 1
        # Update state file.
        self.resumable.statefile.seek(0)
        self.resumable.statefile.write(str(self.current_row) + '\n')
        return row


if __name__ == '__main__':
    csv_filename = 'resumable_data.csv'

    # Read a few rows and raise an exception.
    try:
        with ResumableCSVReader(csv_filename) as resumable:
            for _ in range(2):
                print('row:', next(resumable))
            raise MemoryError('Forced')  # Cause exception.
    except MemoryError:
        pass  # Expected, suppress to allow test to keep running.

    # CSV file is now closed.

    # Resume reading where left-off and continue to end of file.
    print('\nResume reading\n')
    with ResumableCSVReader(csv_filename) as resumable:
        for row in resumable:
            print('row:', row)

    print('\ndone')
Output:
row: ['id', 'name']
row: ['001', 'jane']

Resume reading

row: ['002', 'winky']
row: ['003', 'beli']

done
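Because every reader keeps its own state file, several can be active in the same with statement. The sketch below illustrates this with a compact generator-based variant built on `contextlib.contextmanager`. It is a swapped-in alternative rather than the class-based code above, and the names `resumable_rows` and `read_pairs` are hypothetical.

```python
import contextlib
import csv
import os


@contextlib.contextmanager
def resumable_rows(filename):
    """Hypothetical generator-based variant of a resumable CSV reader."""
    state_filename = filename + '.state'
    try:
        with open(state_filename) as f:
            start_row = int(f.read())
    except (FileNotFoundError, ValueError):  # No usable state file.
        start_row = 0

    with open(filename, newline='') as csvfile, \
            open(state_filename, 'w', buffering=1) as statefile:
        statefile.write(f'{start_row}\n')  # 'w' truncated it; record it again.

        def rows():
            for number, row in enumerate(csv.reader(csvfile), start=1):
                if number > start_row:  # Skip rows read in an earlier run.
                    statefile.seek(0)
                    statefile.write(f'{number}\n')
                    yield row

        yield rows()  # An exception in the with body propagates from here.

    # Only reached when the with body finished without an exception.
    os.remove(state_filename)


def read_pairs(first, second):
    """Read two CSV files in lockstep, each tracked by its own state file."""
    with resumable_rows(first) as a, resumable_rows(second) as b:
        return list(zip(a, b))
```

If the with body raises, both state files are left on disk, and each file resumes independently from its own recorded row on the next run.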