Erase repeated rows from a cell array: answers

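【Question title】: Erase repeated rows from a cell array
【Posted】: 2014-07-20 11:26:39
【Question】:

I have a cell array with many rows, and some rows are repeated. I want to delete the repeated rows and keep only the first occurrence of each. Note that I am mostly dealing with string values, which means the usual helper functions do not work. Can anyone help? Here is an example: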

19970101 18659 183 '19980820' '00018659' 'RUNYON L' '00001534' 'MERRILL'
19970101 18290 183 '19981221' '00018290' 'MANTON S' '00001534' 'MERRILL'
19970101 10280 183 '19980819' '00010280' 'BRENNAN S' '00001534' 'MERRILL'
19970101 18659 183 '19980820' '00018659' 'RUNYON L' '00001534' 'MERRILL'
19970101 10280 183 '19980819' '00010280' 'BRENNAN S' '00001534' 'MERRILL'

What I would like to get:

19970101 18659 183 '19980820' '00018659' 'RUNYON L' '00001534' 'MERRILL'
19970101 18290 183 '19981221' '00018290' 'MANTON S' '00001534' 'MERRILL'
19970101 10280 183 '19980819' '00010280' 'BRENNAN S' '00001534' 'MERRILL'

【Comments】:

You can use the third output of unique and attack the cell array column by column. [~,~,a] = unique([cell_array{:,1}])
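
To illustrate the comment above, here is a minimal sketch of what the third output of unique returns for one numeric column and one string column. The three-row `cell_array` is a shortened, hypothetical version of the question's data; the variable names `a` and `b` are only for illustration.

```matlab
% Shortened stand-in for the question's data (assumption: numeric scalars and char rows only)
cell_array = {19970101 18659 'RUNYON L'
              19970101 18290 'MANTON S'
              19970101 18659 'RUNYON L'};

% Numeric column: concatenate the cells into a vector, then label each value
[~, ~, a] = unique([cell_array{:,2}]);

% String column: unique accepts a cell array of character vectors directly
[~, ~, b] = unique(cell_array(:,3));

% Equal labels mean equal values, so rows 1 and 3 get the same label in both columns
labels = [a(:) b(:)];
```

Rows whose label vectors match across every column are duplicates, which is the idea Solution 2 builds on.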

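Tags: string matlab cell repeat

【Solution 1】:

You can first sort the rows of the cell array (sortrows). Duplicate rows then sit next to each other, so they can be identified in linear time by applying isequal to consecutive rows.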

Let cellArray denote your input cell array:

cellArray = {19970101 18659 183 '19980820' '00018659' 'RUNYON L' '00001534' 'MERRILL'
             19970101 18290 183 '19981221' '00018290' 'MANTON S' '00001534' 'MERRILL'
             19970101 10280 183 '19980819' '00010280' 'BRENNAN S' '00001534' 'MERRILL'
             19970101 18659 183 '19980820' '00018659' 'RUNYON L' '00001534' 'MERRILL'
             19970101 10280 183 '19980819' '00010280' 'BRENNAN S' '00001534' 'MERRILL'}

Code:

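% Sort the rows; jj maps each sorted row back to its original row index
[sorted, jj] = sortrows(cellArray);
% ind(n) is true when sorted row n+1 is identical to sorted row n
ind = arrayfun(@(n) isequal(sorted(n,:),sorted(n+1,:)), 1:size(cellArray,1)-1);
% Keep the first row of each run of equal rows, restored to the original order
result = cellArray(sort(jj([true ~ind])),:);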

Result:

result = 
    [19970101]    [18659]    [183]    '19980820'    '00018659'    'RUNYON L'     '00001534'    'MERRILL'
    [19970101]    [18290]    [183]    '19981221'    '00018290'    'MANTON S'     '00001534'    'MERRILL'
    [19970101]    [10280]    [183]    '19980819'    '00010280'    'BRENNAN S'    '00001534'    'MERRILL'
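
To make the roles of `jj` and `[true ~ind]` easier to follow, here is a small trace of the same steps on a toy input. The array `rows` and the names `isDup` and `keep` are hypothetical; the input contains only character cells, which sortrows accepts for cell arrays.

```matlab
% Toy input (assumption: every cell is a character array)
rows = {'b' 'x'
        'a' 'y'
        'b' 'x'
        'a' 'y'
        'c' 'z'};

% After sorting, identical rows are adjacent; jj holds their original positions
[sorted, jj] = sortrows(rows);

% isDup(n) is true when sorted row n+1 repeats sorted row n
isDup = arrayfun(@(n) isequal(sorted(n,:), sorted(n+1,:)), 1:size(rows,1)-1);

% The first sorted row is always kept; later rows are kept only if they differ
% from their predecessor. Sorting the kept indices restores the input order.
keep = sort(jj([true ~isDup]));
result = rows(keep,:);
```

Here `keep` selects original rows 1, 2 and 5, so `result` holds 'b' 'x', 'a' 'y' and 'c' 'z' in their order of first appearance.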

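【Comments】:

I was under the impression that cell arrays could not be sorted by rows. Looking into it, but wow, this is great! +1 again!!
That was a surprise to me too. I started out converting to strings like you did, but then I decided to try using the original array.
@LuisMendo Thanks for your code. I have been away for too long and only now was able to try it! But it already gives an error on the first line! >> [sorted, Jj] = sortrows(FINALJOIN2); Error using char Cell elements must be character arrays. Error in sortrows>sort_cell_back_to_front (line 136) tmp = char(x(ndx,k)); Error in sortrows (line 88) ndx = sort_cell_back_to_front(x_sub, col);
I think the problem is most likely related to my data types. I will look into it. Thanks.
@LuisMendo I managed to solve my problem, and I believe your code will be very useful to me. Thank you. I am learning a lot of new things I never imagined.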

【Solution 2】:

Try this -

%// Input cell array
input_cell_array ={
    19970101 18659 183 '19980820' '00018659' 'RUNYON L' '00001534' 'MERRILL'
    19970101 18290 183 '19981221' '00018290' 'MANTON S' '00001534' 'MERRILL'
    19970101 10280 183 '19980819' '00010280' 'BRENNAN S' '00001534' 'MERRILL'
    19970101 18659 183 '19980820' '00018659' 'RUNYON L' '00001534' 'MERRILL'
    19970101 10280 183 '19980819' '00010280' 'BRENNAN S' '00001534' 'MERRILL'}

%// "Standardize" the cells by converting all into strings
allstrc = cellfun(@num2str,input_cell_array,'uni',0)

%// Group each column as one cell for labelling them
allstrcg = mat2cell(allstrc,size(allstrc,1),ones(1,size(allstrc,2)))

%// Label them with unique command
[~,~,row_ind] = cellfun(@(x) unique(x,'stable'),allstrcg,'uni',0)

%// The row_ind labels may come back as row or column vectors inside the
%// cells, so reshape each one into a column vector -
row_ind = cellfun(@(x) reshape(x,[],1),row_ind,'uni',0)

%// Get a double array of the labels 
mat1 = horzcat(row_ind{:})

%// Get unique rows of the labels
[~,ind] = unique(mat1,'rows','stable')

%// Finally get the desired output by selecting the unique rows from the labels
out = input_cell_array(ind,:)

Output -

[19970101]    [18659]    [183]    '19980820'    '00018659'    'RUNYON L'     '00001534'    'MERRILL'
[19970101]    [18290]    [183]    '19981221'    '00018290'    'MANTON S'     '00001534'    'MERRILL'
[19970101]    [10280]    [183]    '19980819'    '00010280'    'BRENNAN S'    '00001534'    'MERRILL'
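
As a more compact variation on the same idea (not the answerer's method), each row can be collapsed into a single text key and the keys passed to unique once. This is only a sketch under stated assumptions: every cell is a numeric scalar or a character row, the delimiter `'|'` never occurs in the data, and strjoin is available. The names `data`, `asText`, `keys` and `firstIdx` are hypothetical.

```matlab
% Input in the same shape as the question's example (shortened to three columns)
data = {19970101 18659 'RUNYON L'
        19970101 18290 'MANTON S'
        19970101 10280 'BRENNAN S'
        19970101 18659 'RUNYON L'
        19970101 10280 'BRENNAN S'};

% Convert every cell to text; num2str returns char input unchanged
asText = cellfun(@num2str, data, 'UniformOutput', false);

% Build one key per row; '|' is an assumed delimiter that is absent from the data
keys = cell(size(asText,1), 1);
for r = 1:size(asText,1)
    keys{r} = strjoin(asText(r,:), '|');
end

% 'stable' keeps keys in order of first appearance; firstIdx points at those rows
[~, firstIdx] = unique(keys, 'stable');
out = data(firstIdx, :);
```

One limitation to keep in mind: after conversion, the number 18659 and the string '18659' produce the same text, so rows that differ only in that way would be treated as duplicates.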

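【Comments】:

@Divakar, this code also throws an error on the first line: >> allstrc = cellfun(@num2str,FINALJOIN2,'uni',0); Undefined function 'abs' for input arguments of type 'cell'. Error in num2str (line 65) xmax = double(max(abs(widthCopy(:))));
If I skip that first step I run into trouble, because without a cell array of strings I cannot use the unique function! :(
I thought I had given you +1 for this, but now I see I had not?
@user3557054 Not sure why allstrc does not work for you, as it works for me. There may be some inconsistency in the data. If possible, could you upload your actual input cell array, FINALJOIN2, to a public file-sharing site and share the link here?
@LuisMendo I guess I did not receive it earlier. But it is all good, really no big deal :)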
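
Both error messages quoted in the comments indicate that a function received a cell where it expected text or a number, which would happen if some elements of the array are themselves cells. The thread does not say what FINALJOIN2 actually contained, so the following is only a diagnostic sketch on a hypothetical stand-in `C`; the names `isText`, `isNum`, `badRow`, `badCol` and `nested` are likewise illustrative.

```matlab
% Hypothetical stand-in for FINALJOIN2 with one nested cell at row 1, column 3
C = {19970101, '19980820', {'RUNYON L'}
     19970101, '19981221', 'MANTON S'};

% Flag cells that are neither character arrays nor numeric values
isText = cellfun(@ischar, C);
isNum  = cellfun(@isnumeric, C);
[badRow, badCol] = find(~(isText | isNum));

% If the offenders are one-element nested cells, unwrap one level
% (assumption: each nested cell holds exactly one value)
nested = cellfun(@iscell, C);
C(nested) = cellfun(@(c) c{1}, C(nested), 'UniformOutput', false);
```

Once every cell is a plain number or character array, either solution above can be tried again on the cleaned array.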