Issue using textscan() on Octave 4.0.3 (3 million lines / 250 MB file)

【Question title】: Issue using textscan() on Octave 4.0.3 (3 million lines / 250 MB file)
【Posted】: 2016-10-14 23:29:38
【Question】:

I am trying to rewrite a piece of MATLAB code so that it runs under Octave, but I have run into problems with the textscan() function.

Original code (MATLAB):

function data = import_file(filename, startRow, endRow)

delimiter = ' ';
if nargin<=2
    startRow = 3;
    endRow = inf;
end

formatSpec = '%f%f%f%f%f%f%*s%[^\n\r]';

fileID = fopen(filename,'r');

dataArray = textscan(fileID, formatSpec, endRow(1)-startRow(1)+1, 'Delimiter', delimiter, 'MultipleDelimsAsOne', true, 'EmptyValue' ,NaN,'HeaderLines', startRow(1)-1, 'ReturnOnError', false);
for block=2:length(startRow)
    frewind(fileID);
    dataArrayBlock = textscan(fileID, formatSpec, endRow(block)-startRow(block)+1, 'Delimiter', delimiter, 'MultipleDelimsAsOne', true, 'EmptyValue' ,NaN,'HeaderLines', startRow(block)-1, 'ReturnOnError', false);
    for col=1:length(dataArray)
        dataArray{col} = [dataArray{col};dataArrayBlock{col}];
    end
end

fclose(fileID);

data = [dataArray{1:end-1}];

end

Error:

error: strread: %q, %c, %[] or bit width format specifiers are not supported yet.
error: called from
     strread at line 329 column 7
     textscan at line 321 column 8
     import_file at line 13 column 15
     main at line 52 column 22

Sample data:

# U  POINT_DATA 3711396
#  x  y  z  U_x  U_y  U_z  
739263.5 9363820 172.809998 -5.34212399 -0.0408997531 0.0736143066
739263.5 9363789 172.979996 -5.34212399 -0.0408997531 0.0736143066
739294.312 9363820 172.449997 -5.34212399 -0.0408997531 0.0736143066
739294.312 9363789 173.710007 -5.34212399 -0.0408997531 0.0736143066
739325.125 9363820 170.699997 -5.248474 -0.00403332808 0.041700209
739325.125 9363789 172.350006 -5.37227834 -0.0307070923 0.0492642202
739355.938 9363820 168.690002 -5.248474 -0.00403332808 0.041700209
739355.938 9363789 170.5 -5.37227834 -0.0307070923 0.0492642202
739386.75 9363820 169.110001 -5.248474 -0.00403332808 0.041700209
739386.75 9363789 170.839996 -5.37227834 -0.0307070923 0.0492642202
739417.562 9363820 170.789993 -5.248474 -0.00403332808 0.041700209
739417.562 9363789 171.820007 -5.37227834 -0.0307070923 0.0492642202

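I have already tried other functions for this job, such as dlmread(), load() and even fgetl(), but they take far too long compared with the 8 s it used to take in MATLAB.

Replacing formatSpec with '%f%f%f%f%f%f' does not work either.

The file contains 3711396 lines and 250 MB of data, arranged in six columns.

Could you help me fix the code?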

【Comments】:

Simply using load should be the fastest way to read the data in Octave. Have you tried data=load(filename)(startRow:endRow,:);?
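A minimal sketch of the load-based suggestion in the comment above, for illustration only (the question reports that load() was too slow on the full file). It assumes the file contains nothing but numeric rows apart from header lines starting with #, which Octave's load appears to skip as comments when reading plain numeric text, so rows are counted from the first data line rather than from the top of the file. The temporary file name and the clamping of endRow are assumptions added here, since indexing with startRow:Inf directly would not be usable.

```octave
% Build a tiny sample file in the same layout as the question's data.
% The file name is an illustrative assumption.
filename = [tempname() '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, '# U  POINT_DATA 3\n');
fprintf(fid, '#  x  y  z  U_x  U_y  U_z  \n');
fprintf(fid, '739263.5 9363820 172.809998 -5.34212399 -0.0408997531 0.0736143066\n');
fprintf(fid, '739263.5 9363789 172.979996 -5.34212399 -0.0408997531 0.0736143066\n');
fprintf(fid, '739294.312 9363820 172.449997 -5.34212399 -0.0408997531 0.0736143066\n');
fclose(fid);

startRow = 1;    % first data row wanted, counted after the '#' header lines
endRow = Inf;    % Inf means "up to the last row"

allRows = load(filename);                  % one matrix row per data line
lastRow = min(endRow, size(allRows, 1));   % clamp so an infinite endRow is safe
data = allRows(startRow:lastRow, :);

delete(filename);
disp(size(data));
```

With the default startRow = 3 from the question, this indexing would drop the first two data rows as well, because the header lines are no longer part of the matrix.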

Tags: matlab octave

【Solution 1】:

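I was able to examine your code and found two things that prevent it from running.

The first is the use of %[^\n\r] in your format specification.

The second is the use of the 'ReturnOnError' name/value pair.

Octave does not support either of these features yet.

I was able to import the sample data you provided successfully with the following modified code: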

function data = import_file(filename, startRow, endRow)

if nargin<=2
    startRow = 3;
    endRow = inf;
end

formatSpec = '%f%f%f%f%f%f';
% Corrected formatSpec to import 6 consecutive floats 

fileID = fopen(filename,'r');

dataArray = textscan(fileID, formatSpec, endRow(1)-startRow(1)+1,...
    'EmptyValue' ,NaN,...
    'HeaderLines', startRow(1)-1); 
% Removed 'ReturnOnError' as it is not yet implemented in Octave.

for block=2:length(startRow)
    frewind(fileID);

    dataArrayBlock = textscan(fileID, formatSpec,...
        endRow(block)-startRow(block)+1,...
        'EmptyValue' ,NaN,...
        'HeaderLines', startRow(block)-1);

    for col=1:length(dataArray)
        dataArray{col} = [dataArray{col};dataArrayBlock{col}];
    end
end

fclose(fileID);

data = [dataArray{1:end}]; 
%Changed 'end-1' to 'end' to include last column.

end
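If textscan() remains a bottleneck on the full file, one alternative that is not part of the answer above is to skip the header lines with fgetl() and read every number in a single fscanf() call. The sketch below assumes exactly two header lines and six purely numeric columns, as in the sample data; the temporary file is only there to make the example self-contained, and its speed on the 250 MB file has not been measured here.

```octave
% Build a tiny sample file in the same layout as the question's data.
% The file name is an illustrative assumption.
filename = [tempname() '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, '# U  POINT_DATA 3\n');
fprintf(fid, '#  x  y  z  U_x  U_y  U_z  \n');
fprintf(fid, '739263.5 9363820 172.809998 -5.34212399 -0.0408997531 0.0736143066\n');
fprintf(fid, '739263.5 9363789 172.979996 -5.34212399 -0.0408997531 0.0736143066\n');
fprintf(fid, '739294.312 9363820 172.449997 -5.34212399 -0.0408997531 0.0736143066\n');
fclose(fid);

nHeaderLines = 2;   % assumption: two '#' lines precede the data
nColumns = 6;       % x y z U_x U_y U_z

fid = fopen(filename, 'r');
for k = 1:nHeaderLines
    fgetl(fid);     % discard one header line
end
% fscanf fills its output column by column, so read a 6-by-N matrix
% and transpose it to get one row per line of the file.
data = fscanf(fid, '%f', [nColumns Inf])';
fclose(fid);

delete(filename);
disp(size(data));
```

Unlike the textscan() version, this reads the whole file in one pass and does not handle the startRow/endRow blocks; a row range could be selected afterwards by indexing into data.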

【Comments】: