Accommodating blank entries in .txt files using textscan - MATLAB answers

【Question title】: Accommodating blank entries in .txt files using textscan - MATLAB
【Posted】: 2015-08-16 09:46:24
【Question】:

I have a 9-column, tab-delimited .txt file containing a mix of data formats. However, some entries in the "type" column are empty.

id  id_2 s1      s2      st1     st2          type         desig  num
1   1   51371   51434   52858   52939   5:3_4:4_6:2_4:4_2:6 CO     1
2   1   108814  108928  109735  110856  5:3_4:4_6:2_4:4_2:7 CO     2
3   1   130975  131303  131303  132066  5:3_4:4_6:2_4:4_2:8 NCO    3
4   1   191704  191755  194625  194803                      NCO    4
5   2   69355   69616   69901   70006                       CO     5
6   2   202580  202724  204536  205151  5:3_4:4_6:2_4:4     CO     6

Because the columns mix numeric and text formats, I have been using textscan to import the data:

data = textscan(fid1, '%*f %f %f %f %f %f %*s %s %*[^\r\n]','HeaderLines',1);

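This reads columns 2-6, skips "type", and reads column 8.

The approach fails on rows with an empty entry: textscan skips the empty field as if it were not a column at all, so instead of reading "NCO" or "CO" it reads "4" or "5".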
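The failure can be illustrated outside textscan. The following is a minimal sketch, assuming the empty "type" field appears in the file as two adjacent tabs. The sample row is reconstructed from row 4 above and is an assumption about the exact bytes in the file. It uses strsplit only as an analogy for the behaviour: treating any run of whitespace as a single separator makes the empty field disappear, while splitting on tabs alone keeps it.

```matlab
% A row with an empty "type" field, built with explicit tabs.
% Assumption: this mirrors row 4 of the file shown above.
row = sprintf('4\t1\t191704\t191755\t194625\t194803\t\tNCO\t4');

% Splitting on any whitespace collapses the two adjacent tabs into one
% separator, so the empty field vanishes and later fields shift left.
collapsed = strsplit(row);

% Splitting on tabs only, without collapsing, keeps the empty field.
kept = strsplit(row, '\t', 'CollapseDelimiters', false);

fprintf('whitespace split: %d fields, tab split: %d fields\n', ...
        numel(collapsed), numel(kept));
```

With the whitespace split, the seventh token is 'NCO' and the eighth is '4', which matches the shifted result described above. With the tab split, the seventh token is empty and 'NCO' stays in eighth position.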

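Is there a way to prevent this? I know I could edit the original .txt file so that empty entries contain something like "NA", but I would prefer a more robust way of reading files like this.

Edit:

In addition to the answer below, simply specifying the delimiter appears to solve the problem: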

data = textscan(fid1, '%*f %f %f %f %f %f %*s %s %*[^\r\n]','HeaderLines',1,'delimiter','\t');
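For anyone who wants to try this without the original file, here is a small sketch that passes the text to textscan directly as a character vector instead of a file identifier. The two sample rows are assumptions modelled on the file above (the first "type" value is shortened), and 'HeaderLines' is omitted because the sample has no header row.

```matlab
% Two tab-delimited rows; the second has an empty 7th ("type") field.
% Assumption: sample values are abbreviated from the file shown above.
txt = sprintf(['1\t1\t51371\t51434\t52858\t52939\t5:3_4:4\tCO\t1\n' ...
               '4\t1\t191704\t191755\t194625\t194803\t\tNCO\t4']);

% Same format string as in the question: skip col 1, read cols 2-6,
% skip "type", read "desig", discard the rest of the line.
fmt = '%*f %f %f %f %f %f %*s %s %*[^\r\n]';

% With an explicit tab delimiter, an empty field still counts as a field.
C = textscan(txt, fmt, 'Delimiter', '\t');

desig = C{6};   % the "desig" column as a cell array of strings
disp(desig);
```

If the delimiter option behaves as reported in the edit above, desig should contain 'CO' and 'NCO' rather than picking up the trailing number on the second row.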

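【Question comments】:

Are entries missing only in the type column, or in other columns as well?
Only in the type column.

Tags: matlab textscan

【Solution 1】:

Here is one approach using importdata and strsplit: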

% Read in the data with importdata; for this file it is expected to return
% a cell array with one string per line
data = importdata('data1.txt'); % 'data1.txt' is the input text file

% Split each line on spaces (strsplit collapses repeated delimiters by default)
split_data = cellfun(@(x) strsplit(x,' '), data, 'UniformOutput', false);

N = numel(split_data); % number of rows in the input text file

% Set up the output cell array and the mask (9 fields x N lines; transposed at the end)
out_cell = cell(9,N);
mask = true(9,N);

% Clear the "type" entry (field 7) in the mask for every line of the text file
% that does not split into 9 tokens, i.e. whose "type" entry is missing
mask(7, cellfun(@length,split_data)~=9) = false;

% Use the mask to fill out_cell, in column-major order, from the split tokens
out_cell(mask) = [split_data{:}];
out = out_cell'

Sample run:

>> type data1.txt

id  id_2 s1      s2      st1     st2          type         desig  num
1   1   51371   51434   52858   52939   5:3_4:4_6:2_4:4_2:6 CO     1
2   1   108814  108928  109735  110856  5:3_4:4_6:2_4:4_2:7 CO     2
3   1   130975  131303  131303  132066  5:3_4:4_6:2_4:4_2:8 NCO    3
4   1   191704  191755  194625  194803                      NCO    4
5   2   69355   69616   69901   70006                       CO     5
6   2   202580  202724  204536  205151  5:3_4:4_6:2_4:4     CO     6
>> out
out = 
    'id'    'id_2'    's1'        's2'        'st1'       'st2'       'type'                   'desig'    'num'
    '1'     '1'       '51371'     '51434'     '52858'     '52939'     '5:3_4:4_6:2_4:4_2:6'    'CO'       '1'  
    '2'     '1'       '108814'    '108928'    '109735'    '110856'    '5:3_4:4_6:2_4:4_2:7'    'CO'       '2'  
    '3'     '1'       '130975'    '131303'    '131303'    '132066'    '5:3_4:4_6:2_4:4_2:8'    'NCO'      '3'  
    '4'     '1'       '191704'    '191755'    '194625'    '194803'                       []    'NCO'      '4'  
    '5'     '2'       '69355'     '69616'     '69901'     '70006'                        []    'CO'       '5'  
    '6'     '2'       '202580'    '202724'    '204536'    '205151'    '5:3_4:4_6:2_4:4'        'CO'       '6'
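Every entry in out is a string (or empty), whereas the question wanted columns 2-6 as numbers plus the "desig" column. A possible follow-up step, sketched here with a small stand-in for out so the snippet runs on its own (the variable names body, nums and desig are illustrative, not from the answer), is to drop the header row and convert with str2double:

```matlab
% Stand-in for the `out` cell array above: header row plus two data rows.
% Assumption: in practice `out` would come from the importdata/strsplit code.
out = {'id','id_2','s1','s2','st1','st2','type','desig','num'; ...
       '1','1','51371','51434','52858','52939','5:3_4:4_6:2_4:4_2:6','CO','1'; ...
       '4','1','191704','191755','194625','194803',[],'NCO','4'};

body  = out(2:end, :);             % drop the header row
nums  = str2double(body(:, 2:6));  % columns 2-6 as a numeric matrix
desig = body(:, 8);                % "desig" column, unaffected by the empty "type"

disp(nums);
disp(desig);
```

Because the mask keeps the empty "type" cell in place, column 8 stays aligned with "desig" on every row.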

【Comments】: