Use this Python script:
    from __future__ import print_function
    import time
    import sys
    import platform

    if sys.version_info[0] == 2:
        range = xrange  # avoid building a huge list on Python 2

    times = []
    results = []
    t1 = time.time()
    t0 = t1
    tgt = 5000000000
    bucket = tgt // 10  # integer division, so the bucket size is an int on Python 3 too
    width = len('{:,} '.format(tgt))
    with open('/tmp/disk_test.txt', 'w') as fout:
        for line in range(1, tgt + 1):
            fout.write('Line {:{w},}\n'.format(line, w=width))
            if line % bucket == 0:
                # report the time taken for each tenth of the file
                s = '{:15,} {:10.4f} secs'.format(line, time.time() - t1)
                results.append(s)
                print(s)
                t1 = time.time()
        else:
            # for-else: runs once the loop has finished; appends a summary to the file
            info = [platform.system(), platform.release(), sys.version, tgt, time.time() - t0]
            s = '\n\nDone!\n{} {}\n{} \n\n{:,} lines written in {:10.3f} secs'.format(*info)
            fout.write('{}\n{}'.format(s, '\n'.join(results)))
            print(s)
Under Python 2 and OS X, it prints:
        500,000,000   475.9865 secs
      1,000,000,000   484.6921 secs
      1,500,000,000   463.2881 secs
      2,000,000,000   460.7206 secs
      2,500,000,000   456.8965 secs
      3,000,000,000   455.3824 secs
      3,500,000,000   453.9447 secs
      4,000,000,000   454.0475 secs
      4,500,000,000   454.1346 secs
      5,000,000,000   454.9854 secs

    Done!
    Darwin 13.3.0
    2.7.8 (default, Jul 2 2014, 10:14:46)
    [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]

    5,000,000,000 lines written in   4614.091 secs
Under Python 3.4 and OS X:
        500,000,000   632.9973 secs
      1,000,000,000   633.0552 secs
      1,500,000,000   682.8792 secs
      2,000,000,000   743.6858 secs
      2,500,000,000   654.4257 secs
      3,000,000,000   653.4609 secs
      3,500,000,000   654.4969 secs
      4,000,000,000   652.9719 secs
      4,500,000,000   657.9033 secs
      5,000,000,000   667.0891 secs

    Done!
    Darwin 13.3.0
    3.4.1 (default, May 19 2014, 13:10:29)
    [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]

    5,000,000,000 lines written in   6632.965 secs
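The resulting file is 139 GB. You can see that on a relatively empty disk (my /tmp path is on a 3 TB volume) the times are linear: each block of 500,000,000 lines takes roughly the same time to write as the one before it.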
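As a rough check of that linearity claim, the per-bucket timings from the Python 2 run can be compared directly. This is only a sketch: the helper name `spread` is made up for illustration, and the numbers are copied from the output listed earlier.

```python
# Per-bucket timings (seconds per 500,000,000 lines), copied from the Python 2 run.
py2 = [475.9865, 484.6921, 463.2881, 460.7206, 456.8965,
       455.3824, 453.9447, 454.0475, 454.1346, 454.9854]


def spread(timings):
    """Return (mean, slowest/fastest ratio) for a list of per-bucket timings."""
    mean = sum(timings) / len(timings)
    return mean, max(timings) / min(timings)


mean, ratio = spread(py2)
print('mean {:.1f} s per bucket, slowest/fastest = {:.3f}'.format(mean, ratio))
```

A ratio close to 1 means the write time per bucket stays flat as the file grows. A ratio that climbs from bucket to bucket would point to the kind of slowdown described in the question.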
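My suspicion is that under Ubuntu you are running into the OS trying to keep that growing file contiguous on an EXT4 disk.

Recall that both OS X's HFS+ and Linux's EXT4 file systems use allocate-on-flush disk allocation schemes. The Linux OS will also attempt to actively move files so that the allocation stays contiguous (not fragmented).

For Linux EXT4, you can preallocate a larger file to reduce this effect. Use fallocate, as shown in this SO post. Then rewind the file pointer in Python and overwrite in place.

You may be able to use the Python truncate method to create the file, but the results are platform dependent.

Something similar to this (pseudocode):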
    def preallocate_file(path, size):
        '''Preallocate a file at "path" with "size" bytes.'''
        # Use truncate or fallocate on Linux.
        # Depending on your platform, you *may* be able to just do the following;
        # it works on BSD and OS X -- probably most *nix:
        with open(path, 'w') as f:
            f.truncate(size)

    fn = '/tmp/disk_test.txt'    # placeholder path
    size = 10 * 1024 ** 2        # placeholder size in bytes
    preallocate_file(fn, size)
    with open(fn, 'r+') as f:
        f.seek(0)       # start at the beginning
        # write whatever
        f.truncate()    # erases the unused portion...
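On Python 3.3+ under Unix, `os.posix_fallocate` may be a closer match to fallocate than truncate, since truncate often only sets the file length (leaving a sparse file) rather than reserving blocks. The following is a minimal sketch under that assumption. The function names `preallocate` and `overwrite_in_place` are hypothetical, and the fallback to `truncate` is there for platforms or file systems where the call is unavailable.

```python
import os
import tempfile


def preallocate(path, size):
    """Try to reserve `size` bytes for `path`; fall back to truncate()."""
    with open(path, 'wb') as f:
        if hasattr(os, 'posix_fallocate'):  # Unix only, Python 3.3+
            try:
                os.posix_fallocate(f.fileno(), 0, size)
                return
            except OSError:
                pass  # not supported here; fall through to truncate
        f.truncate(size)  # may only create a sparse file


def overwrite_in_place(path, lines):
    """Rewind, overwrite from the start, then drop the unused tail."""
    with open(path, 'r+b') as f:
        f.seek(0)
        for line in lines:
            f.write(line)
        f.truncate()  # cut the file at the current position


if __name__ == '__main__':
    # Small demo in a temporary directory (assumed sizes, for illustration only).
    demo = os.path.join(tempfile.mkdtemp(), 'prealloc_demo.bin')
    preallocate(demo, 1024 * 1024)
    overwrite_in_place(demo, [b'Line 1\n', b'Line 2\n'])
    print(os.path.getsize(demo))
```

Whether this actually reduces the slowdown depends on the file system, so it is worth timing both variants on the affected machine.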