【Question title】: How to skip some chunks in Python file-reading code?
【Posted】: 2020-04-10 12:46:03
【Question body】:

I have code like this:

chunk_size = 512 * 1024  # 512 KB
big_file = open(file, 'rb')
while True:
    data = big_file.read(chunk_size)
    if not data:
        break
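
For reference, the same sequential loop is often written with a `with` block (so the file gets closed) and the two-argument form of `iter()`, which keeps calling a function until it returns a sentinel value. A minimal sketch; `iter_chunks` is a hypothetical helper name, not something from the question:

```python
import functools
from typing import Iterator


def iter_chunks(path: str, chunk_size: int = 512 * 1024) -> Iterator[bytes]:
    """Yield consecutive chunks of `path` until EOF (the last one may be shorter)."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read(chunk_size) until it returns b"".
        for chunk in iter(functools.partial(f.read, chunk_size), b""):
            yield chunk
```

Usage would be `for data in iter_chunks("some_file.bin"): ...`; it still reads every chunk, so by itself it does not skip anything.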

What if I only want to read every 10th item/element, or every 5th, like this? How can I do that?

chunk_size = 512 * 1024  # 512 KB
big_file = open(file, 'rb')
counter = 0
while True:
    counter += 1
    if counter % 5 != 0:
        big_file.next(chunksize)  # Just skip it, don't read it... HOW TO DO THIS LINE?
        continue  # I want to skip the chunk, and in the next loop, read the next chunk.
    data = big_file.read(chunk_size)
    if not data:
        break

Speed matters a lot to me here: I will do this for millions of files. I'm doing chunk hashing.
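
Since the goal is chunk hashing, here is a rough sketch of hashing only every n-th chunk, using `seek()` as suggested in the comment and answer below. Assumptions not in the question: the helper name `sampled_digest`, SHA-256 as the hash, and feeding all sampled chunks into one digest. The sampled chunks are the 0-based chunks n-1, 2n-1, ..., the same ones the counter-based loop above would read; their offsets are computed up front from the file size:

```python
import hashlib
import os


def sampled_digest(path: str, chunk_size: int = 512 * 1024, n: int = 5) -> str:
    """Hypothetical helper: SHA-256 over every n-th chunk of `path`.

    Chunks are numbered from 0; chunks n-1, 2n-1, 3n-1, ... are hashed, and
    the remaining chunks are never read from disk.
    """
    size = os.path.getsize(path)
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Start offsets of the sampled chunks, stopping before the end of the file.
        for offset in range((n - 1) * chunk_size, size, n * chunk_size):
            f.seek(offset)
            h.update(f.read(chunk_size))
    return h.hexdigest()
```

Because the loop stops at the file size, it never issues the final empty read. Note that bytes in the skipped chunks never reach the hash, so two files that differ only there get the same digest; whether that is acceptable depends on what the hashes are used for.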

【Comments on the question】:

I'd look at the seek() function; it should do what you want. Just keep track of the offset. See this link
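
A minimal sketch of that suggestion, assuming the file is opened in binary mode: instead of tracking the offset yourself, `seek()` with `os.SEEK_CUR` (whence=1) moves relative to the current position, so the file object keeps the offset for you. Text-mode files do not allow nonzero relative seeks, which is why `"rb"` matters here. `read_every_nth` is a hypothetical name; it yields 0-based chunks n-1, 2n-1, ..., the same ones the counter-based loop in the question is after:

```python
import os
from typing import Iterator


def read_every_nth(path: str, chunk_size: int, n: int) -> Iterator[bytes]:
    """Yield chunks n-1, 2n-1, ... (0-based) by seeking past the others."""
    with open(path, "rb") as f:
        while True:
            # Skip n-1 chunks relative to the current position instead of reading them.
            f.seek((n - 1) * chunk_size, os.SEEK_CUR)
            data = f.read(chunk_size)
            if not data:  # past the end of the file: read() returns b""
                break
            yield data
```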

Tags: python python-3.x python-3.6 python-3.7 python-3.8

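【Answer 1】:

You can use the file's .seek() method for this. I use pos to keep track of the current position in the file, and .read(chunk_size) reads data only on every 5th iteration.

Seeking beyond the end of the file is not a problem: data will then be empty, so we break as soon as nothing was read.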

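chunk_size = 512 * 1024  # 512 KB
counter = 0
pos = 0  # offset of the chunk handled in the current iteration

with open("filename", 'rb') as big_file:
    while True:
        counter += 1
        if counter % 5 == 0:
            big_file.seek(pos)  # jump straight to this chunk's offset
            data = big_file.read(chunk_size)
            if not data:  # past the end of the file: nothing was read
                break
            print(len(data))  # here do your processing (raw chunks may not be valid UTF-8, so don't decode blindly)
        pos += chunk_size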

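The claim that seeking beyond the end of the file is harmless can be checked with a small sketch on a temporary file (the helper name `read_past_eof` and the sizes are arbitrary): the seek succeeds, `tell()` reports the requested offset, and `read()` returns `b''`, which is what ends the loop above.

```python
import os
import tempfile
from typing import Tuple


def read_past_eof(data: bytes, offset: int, chunk_size: int) -> Tuple[int, bytes]:
    """Write `data` to a temp file, seek to `offset` and read one chunk.

    Returns the position reported by tell() after the seek and the bytes read.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            f.seek(offset)  # allowed even when offset > file size
            pos = f.tell()
            return pos, f.read(chunk_size)
    finally:
        os.remove(path)


print(read_past_eof(b"0123456789", 1000, 4))  # (1000, b'')
print(read_past_eof(b"0123456789", 8, 4))     # (8, b'89')
```
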
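【Comments】:

This method seems to be 1.5-2x slower than just adding `continue`.
That surprises me.
Sorry, I re-tested: it is 1.5-2x slower.
You can run my code above; I just removed `big_file.next(chunksize)` and tested it.
I get different numbers here: 595 µs for your version and 90.3 µs for mine. Did you perhaps include the extra print statement?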
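
The timings above come from the commenters' own machines and are not reproduced here. Below is a rough sketch of how the two loops might be compared, assuming a synthetic random file and the standard `timeit` module; the function names are hypothetical. Two things are worth keeping in mind: with the `big_file.next(...)` line simply deleted, the question's loop still reads every chunk consecutively (it just idles for 4 iterations between reads), so it does different work from the seek-based version; and printing decoded chunks can easily dominate the measured time, so the sketch only sums lengths.

```python
import os
import tempfile
import timeit
from typing import Dict

CHUNK = 512 * 1024  # 512 KB, as in the question


def read_all(path: str) -> int:
    """Sequential loop from the question: reads every chunk."""
    total = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(CHUNK)
            if not data:
                break
            total += len(data)
    return total


def read_every_5th(path: str) -> int:
    """Seek-based loop: reads only chunks 4, 9, 14, ... (0-based)."""
    total = 0
    with open(path, "rb") as f:
        pos = 4 * CHUNK
        while True:
            f.seek(pos)
            data = f.read(CHUNK)
            if not data:
                break
            total += len(data)
            pos += 5 * CHUNK
    return total


def compare(size_mb: int = 20, repeat: int = 5) -> Dict[str, float]:
    """Best-of-`repeat` wall time in seconds for each loop on a random temp file."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(size_mb * 1024 * 1024))
        return {
            name: min(timeit.repeat(lambda fn=fn: fn(path), number=1, repeat=repeat))
            for name, fn in (("read_all", read_all), ("every_5th", read_every_5th))
        }
    finally:
        os.remove(path)
```

After the first run the file usually sits in the OS page cache, so these numbers mostly reflect cached reads; results on a cold cache, or across millions of separate files, may look quite different.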