Print specific lines of multiple files in Python: answers

【Question title】: Print specific lines of multiple files in Python
【Posted】: 2017-07-03 13:07:20
【Question】:

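I have 30 text files of 30 lines each. For some reason, I need to write a script that opens file 1, prints line 1 of file 1, closes it, opens file 2, prints line 2 of file 2, closes it, and so on. I tried this: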

import glob

files = glob.glob('/Users/path/to/*/files.txt')             
for file in files:
    i = 0
    while i < 30:
        with open(file,'r') as f:
            for index, line in enumerate(f):
                if index == i:
                    print(line)
                    i += 1
                    f.close()
            continue

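Unsurprisingly, I get the following error:

ValueError: I/O operation on closed file.

This is because of the f.close() call. How do I move on from one file to the next after reading only the line I need?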

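【Comments】:

You can use break to exit the loop; replace f.close() with that. The continue at the bottom is also unnecessary, and the outer loop can be for i in range(0, 30): (or i, file in enumerate(files)?) without explicitly incrementing i.
A note following up on @Ryan's comment: f.close() isn't needed at all, because you (correctly) used a with statement when opening the file, which guarantees the file is closed automatically when you exit the block.
Side note: you can use itertools.islice to remove the explicit inner loop entirely. Replace the entire contents of the with block with print(next(itertools.islice(f, i, None))); no explicit loop of any kind is needed. This relies on @Ryan's suggestion of replacing the outer while loop with for i, file in enumerate(files): (or, to make sure you only process 30 files, for i, file in enumerate(islice(files, 30)):), so that you aren't manually tracking and incrementing i.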

Tags: python python-3.x enumerate readlines

【Solution 1】:

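First, to answer the question: as noted in the comments, your main problem is that you close the file and then try to keep iterating over it. The offending code: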

        for index, line in enumerate(f): # <-- Reads
            if index == i:
                print(line)
                i += 1
                f.close()                # <-- Closes when you get a hit
                                         # But loop is not terminated, so you'll loop again

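The simplest fix is to use break instead of closing explicitly, since your with statement already guarantees that the file is closed deterministically when the block exits: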

        for index, line in enumerate(f):
            if index == i:
                print(line)
                i += 1
                break

But because it's fun, here is a significantly cleaned-up version of the code that accomplishes the same task:

import glob
from itertools import islice

# May as well use iglob since we'll stop processing at 30 files anyway    
files = glob.iglob('/Users/path/to/*/files.txt')

# Stop after no more than 30 files, use enumerate to track file num
for i, file in enumerate(islice(files, 30)):
    with open(file,'r') as f:
        # Skip the first i lines of the file, then print the next line
        print(next(islice(f, i, None)))

【Discussion】:

【Solution 2】:

You can use the linecache module to get the lines you need and save yourself a lot of trouble:

import glob
import linecache

line = 1
for file in glob.glob('/Users/path/to/*/files.txt'):
    print(linecache.getline(file, line))
    line += 1
    if line > 30:  # if you really need to limit it to only 30
        break

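【Discussion】:

Good suggestion, though I'd note that linecache caches the whole file in memory to get a single line; that is usually not a problem for small files (e.g. the source files the module was originally designed for), especially if you need random access to multiple lines, but for arbitrary input you could end up reading a GB-sized file into memory (with the lines needing well over a GB of memory because of Python overhead) even if all you wanted was the first line of the file. It would also make sense to avoid manual line tracking by simply wrapping the glob call in enumerate.
Fair point: convenient as it is, linecache does use a lot of memory, but I didn't know the OP would be dealing with large files. If the files no longer need to be accessed, you can always call clearcache() once you're done with them. If you need to access very large files, going through them line by line (the traditional way) may give terrible performance; if that were the case, I'd rather suggest using the mmap module and letting the OS optimize access to the data.
Thanks! It works well, although I had to replace line 0 with line 1.
Oops, I forgot that linecache line indexes start at 1. Fixed.
@zwer: If you only access the first 30 lines or fewer, iterating line by line until you reach the target line is fine; no matter how large the file itself is, the time to read the first 30 lines depends on the size of those 30 lines, not on the size of the file. There are line oriented uses for mmap, but it doesn't help much here; you still need to scan for newlines. You could skip an arbitrary number of bytes and then look for a nearby line, but without fixed-length lines that won't give you a specific line number.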
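
The two suggestions from these comments (wrapping the file list in enumerate instead of tracking line by hand, and calling clearcache() afterwards) can be combined roughly as follows. This is only a sketch: the function name diagonal_lines, the limit parameter, and the sorted() call are assumptions for illustration, not part of the original answer.

```python
import glob
import linecache

def diagonal_lines(pattern, limit=30):
    """Return line n of the n-th matching file (1-based), for at most `limit` files."""
    # sorted() is an assumption: glob itself does not guarantee any order
    paths = sorted(glob.glob(pattern))[:limit]
    # linecache line numbers start at 1, so enumerate starts at 1 as well;
    # getline returns '' when the file is missing or has no such line
    lines = [linecache.getline(path, lineno)
             for lineno, path in enumerate(paths, start=1)]
    # Release the cached file contents once they are no longer needed
    linecache.clearcache()
    return lines
```

Each returned line keeps its trailing newline, so printing with print(line, end='') avoids blank lines between results.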

【Solution 3】:

I think you want something like this:

import glob

files = glob.glob('/Users/path/to/*/files.txt')             
for file in files:
    i = 0
    while i < 30:
        with open(file,'r') as f:
            for index, line in enumerate(f):
                if index == i:
                    print(line)
                    i += 1
                    break
        f.close()

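Currently you are closing the file in the middle of the for loop and then trying to read from it again. So if you only close the file after you have exited the for loop, it should be fine.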
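
As the question comments point out, the with statement already closes the file when its block exits, so an explicit f.close() after it is redundant (though harmless). The following small sketch illustrates that behaviour along with the error from the question; the function name demo_with_closes_file and the temporary file are assumptions introduced purely for the demonstration.

```python
import os
import tempfile

def demo_with_closes_file():
    """Return (closed_after_with, error_message_from_reading_closed_file)."""
    # Create a throwaway two-line file to read from
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'w') as out:
        out.write('first\nsecond\n')
    try:
        with open(path, 'r') as f:
            next(f)  # read one line while the file is still open
        # The with block has exited, so the file is already closed here
        closed = f.closed
        try:
            next(f)  # reading from the closed file fails
            message = None
        except ValueError as exc:
            message = str(exc)
        return closed, message
    finally:
        os.remove(path)
```

Calling demo_with_closes_file() reports that the file is closed once the with block ends, and that any further read raises the same ValueError the asker saw.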

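【Discussion】:

【Solution 4】:

Split your work into simpler steps until the final step becomes trivial. Use functions.

Remember that a file object works as a sequence of lines.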

import glob

def nth(n, sequence):
  # Return the item at zero-based position n
  for position, item in enumerate(sequence):
    if position == n:
      return item
  return None  # if the sequence ended before position n

def printNthLines(glob_pattern):
  # Note: sort file names; glob guarantees no order.
  filenames = sorted(glob.glob(glob_pattern))
  for position, filename in enumerate(filenames):
    with open(filename) as f:
      line = nth(position, f)  # Pick the n-th line.
      if line is not None:
        print(line)
      # IDK what to do if there's no n-th line in n-th file

printNthLines('path/to/*/file.txt')

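Obviously we scan the n-th file up to its n-th line, but that is unavoidable: there is no way to jump directly to the n-th line of a plain-text file.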
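
For comparison, the nth helper above can also be written with itertools.islice, which is the approach Solution 1 uses. This is a sketch rather than part of the original answer; the default parameter is an addition made for illustration. It still reads the file line by line up to position n, so the scanning cost described above is unchanged.

```python
from itertools import islice

def nth(n, iterable, default=None):
    """Return the item at zero-based position n, or `default` if the iterable is too short."""
    # islice skips the first n items; next() takes the following one, if any
    return next(islice(iterable, n, None), default)
```

Because it keeps the same (n, sequence) argument order, it could stand in for the loop-based nth in printNthLines without other changes.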

【Discussion】: