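You can find the limits of the Office products here.
MATLAB is well suited to working with large files and large sets of files like this. The 2014 release brought a lot of improvements by introducing the datastore for csv data, and it now also works well with Excel files.
Take a look at this tutorial: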
http://blogs.mathworks.com/loren/2014/12/03/reading-big-data-into-matlab/
I have 3 csv files (file[1-3].csv) with the following contents:
a1,b1,c1,d1,e1
a2,b2,c2,d2,e2
a3,b3,c3,d3,e3
a4,b4,c4,d4,e4
a5,b5,c5,d5,e5
a6,b6,c6,d6,e6
a7,b7,c7,d7,e7
a8,b8,c8,d8,e8
a9,b9,c9,d9,e9
a10,b10,c10,d10,e10
and a file varnames for the column names:
A B C D E
Let's read the files:
>> datafile = 'csv-files/file1.csv';
>> headerfile = 'csv-files/varnames.txt';
>> fileID = fopen(headerfile);
>> varnames = textscan(fileID,'%s');
>> fclose(fileID);
>> varnames = varnames{:};
>> ds = datastore(datafile,'ReadVariableNames',false);
>> ds.VariableNames = varnames
ds =
TabularTextDatastore with properties:
Files: {
'/home/anquegi/learn/matlab/stackoverflow/csv-files/file1.csv'
}
FileEncoding: 'UTF-8'
ReadVariableNames: false
VariableNames: {'A', 'B', 'C' ... and 2 more}
Text Format Properties:
NumHeaderLines: 0
Delimiter: ','
RowDelimiter: '\r\n'
TreatAsMissing: ''
MissingValue: NaN
Advanced Text Format Properties:
TextscanFormats: {'%q', '%q', '%q' ... and 2 more}
ExponentCharacters: 'eEdD'
CommentStyle: ''
Whitespace: ' \b\t'
MultipleDelimitersAsOne: false
Properties that control the table returned by preview, read, readall:
SelectedVariableNames: {'A', 'B', 'C' ... and 2 more}
SelectedFormats: {'%q', '%q', '%q' ... and 2 more}
ReadSize: 20000 rows
>> preview(ds)
ans =
A B C D E
____ ____ ____ ____ ____
'a1' 'b1' 'c1' 'd1' 'e1'
'a2' 'b2' 'c2' 'd2' 'e2'
'a3' 'b3' 'c3' 'd3' 'e3'
'a4' 'b4' 'c4' 'd4' 'e4'
'a5' 'b5' 'c5' 'd5' 'e5'
'a6' 'b6' 'c6' 'd6' 'e6'
'a7' 'b7' 'c7' 'd7' 'e7'
'a8' 'b8' 'c8' 'd8' 'e8'
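Looking at the properties, ReadSize is 20000 rows, so MATLAB reads up to 20000 rows per call and you can process the data one chunk at a time. Since the data has only 10 rows, I'll change it to 3: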
>> ds.ReadSize=3
ds =
TabularTextDatastore with properties:
Files: {
'/home/anquegi/learn/matlab/stackoverflow/csv-files/file1.csv'
}
FileEncoding: 'UTF-8'
ReadVariableNames: false
VariableNames: {'A', 'B', 'C' ... and 2 more}
Text Format Properties:
NumHeaderLines: 0
Delimiter: ','
RowDelimiter: '\r\n'
TreatAsMissing: ''
MissingValue: NaN
Advanced Text Format Properties:
TextscanFormats: {'%q', '%q', '%q' ... and 2 more}
ExponentCharacters: 'eEdD'
CommentStyle: ''
Whitespace: ' \b\t'
MultipleDelimitersAsOne: false
Properties that control the table returned by preview, read, readall:
SelectedVariableNames: {'A', 'B', 'C' ... and 2 more}
SelectedFormats: {'%q', '%q', '%q' ... and 2 more}
ReadSize: 3 rows
>> reset(ds)
while hasdata(ds)
T = read(ds);
T.A
end
ans =
'a1'
'a2'
'a3'
ans =
'a4'
'a5'
'a6'
ans =
'a7'
'a8'
'a9'
ans =
'a10'
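The loop above only displays T.A for each chunk. As a rough sketch of doing something with each chunk instead, the following recreates a 10-row file like file1.csv in a temporary folder, reads it 3 rows at a time, and writes every chunk to its own csv file. The temporary folder, the chunk_%d.csv output names, and the chunkRows variable are assumptions made for illustration, not part of the original setup.

```matlab
% Hypothetical setup: recreate a 10-row file like file1.csv in a temp folder.
folder = tempname;
mkdir(folder);
datafile = fullfile(folder, 'file1.csv');
fid = fopen(datafile, 'w');
for r = 1:10
    fprintf(fid, 'a%d,b%d,c%d,d%d,e%d\n', r, r, r, r, r);
end
fclose(fid);

% Same datastore configuration as above, with the column names given inline.
ds = datastore(datafile, 'ReadVariableNames', false);
ds.VariableNames = {'A', 'B', 'C', 'D', 'E'};
ds.ReadSize = 3;

chunkRows = [];   % number of rows returned by each read
k = 0;
while hasdata(ds)
    T = read(ds);                       % T is a table with at most 3 rows
    k = k + 1;
    chunkRows(end+1) = height(T);
    % chunk_%d.csv is a hypothetical output name
    writetable(T, fullfile(folder, sprintf('chunk_%d.csv', k)));
end
```

With 10 rows and a ReadSize of 3, the last chunk holds only the remaining row, which matches the final single 'a10' shown above.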
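So the variable T is a table, and you can write it wherever you like. Note that each read(ds) advances by the number of rows set in ReadSize; this parameter can be a number of rows or 'file'.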
>> reset(ds)
>> T = read(ds);
>> T
T =
A B C D E
____ ____ ____ ____ ____
'a1' 'b1' 'c1' 'd1' 'e1'
'a2' 'b2' 'c2' 'd2' 'e2'
'a3' 'b3' 'c3' 'd3' 'e3'
>> writetable(T,'mySpreadsheet','FileType','spreadsheet')
>> reset(ds)
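The transcript above reads only file1.csv. As a sketch of how the same approach might extend to all three files, a single datastore can be pointed at a wildcard path, and ReadSize can be set to 'file' so that each read returns one whole file. The temporary folder and the nFiles and nRows counters below are assumptions for illustration only.

```matlab
% Hypothetical setup: recreate three files like file[1-3].csv in a temp folder.
folder = tempname;
mkdir(folder);
for k = 1:3
    fid = fopen(fullfile(folder, sprintf('file%d.csv', k)), 'w');
    for r = 1:10
        fprintf(fid, 'a%d,b%d,c%d,d%d,e%d\n', r, r, r, r, r);
    end
    fclose(fid);
end

% One datastore covering every file that matches the wildcard.
ds = datastore(fullfile(folder, 'file*.csv'), 'ReadVariableNames', false);
ds.VariableNames = {'A', 'B', 'C', 'D', 'E'};
ds.ReadSize = 'file';   % each call to read returns one complete file

nFiles = 0;
nRows = 0;
while hasdata(ds)
    T = read(ds);       % table holding all rows of the current file
    nFiles = nFiles + 1;
    nRows = nRows + height(T);
end
```

Reading file by file is convenient when each file fits comfortably in memory; for very large files a numeric ReadSize keeps each chunk bounded.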