【Posted】: 2012-09-07 16:05:58
【Problem description】:
I have a dataset, and I want to delete the rows that contain duplicate information across 4 different columns.
foo <- data.frame(g1 = c("1","0","0","1","1"),
                  v1 = c("7","5","4","4","3"),
                  v2 = c("a","b","x","x","e"),
                  y1 = c("y","c","f","f","w"),
                  y2 = c("y","y","y","f","c"),
                  y3 = c("y","c","c","f","w"),
                  y4 = c("y","y","f","f","c"),
                  y5 = c("y","w","f","f","w"),
                  y6 = c("y","c","f","f","w"))
foo then looks like:
  g1 v1 v2 y1 y2 y3 y4 y5 y6
1  1  7  a  y  y  y  y  y  y
2  0  5  b  c  y  c  y  w  c
3  0  4  x  f  y  c  f  f  f
4  1  4  x  f  f  f  f  f  f
5  1  3  e  w  c  w  c  w  w
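Now I want to delete any row that contains duplicated data across columns y1 to y6. Done correctly, only rows 4 and 1 would be removed, because all of their y variables are exactly the same. It is a multi-column condition.
I believe I am close, but it just doesn't work properly.
I tried: new = foo[!(duplicated(foo[,1:6]))]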
I was considering the duplicated command, but does it search for and find only the rows that match exactly?
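A minimal sketch of how duplicated() behaves on this data. It assumes foo is built with stringsAsFactors = FALSE (not in the original post) and selects the y columns by name through y_cols, an illustrative helper name:

```r
# Assumption: stringsAsFactors = FALSE, so every column is a plain character vector.
foo <- data.frame(g1 = c("1","0","0","1","1"), v1 = c("7","5","4","4","3"),
                  v2 = c("a","b","x","x","e"), y1 = c("y","c","f","f","w"),
                  y2 = c("y","y","y","f","c"), y3 = c("y","c","c","f","w"),
                  y4 = c("y","y","f","f","c"), y5 = c("y","w","f","f","w"),
                  y6 = c("y","c","f","f","w"), stringsAsFactors = FALSE)

# y_cols is a hypothetical helper, not part of the original post.
y_cols <- paste0("y", 1:6)

# duplicated() flags a row only when an EARLIER ROW has the same values
# in every selected column; it does not compare values within one row.
dup <- duplicated(foo[, y_cols])

# The comma after the row index is what makes this a row filter.
kept <- foo[!dup, ]
```

No two rows of foo share the same y1 to y6 values, so dup is FALSE everywhere and kept still has all five rows. Note also that foo[,1:6] in the attempt above selects g1, v1, v2, y1, y2 and y3 rather than the six y columns.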
I also thought about using a conditional statement with &, but I don't know how to do that either: new = foo[foo$y1==foo$y2|foo$y3|foo$y4|foo$y5|foo$y6]
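One way the & idea could be written out, as a sketch only: every comparison has to be spelled in full, because | applied to a bare character column such as foo$y3 is not a comparison at all. The code assumes stringsAsFactors = FALSE (not in the original post) so that == compares plain strings; all_same is an illustrative name.

```r
# Assumption: stringsAsFactors = FALSE, so == compares character values directly.
foo <- data.frame(g1 = c("1","0","0","1","1"), v1 = c("7","5","4","4","3"),
                  v2 = c("a","b","x","x","e"), y1 = c("y","c","f","f","w"),
                  y2 = c("y","y","y","f","c"), y3 = c("y","c","c","f","w"),
                  y4 = c("y","y","f","f","c"), y5 = c("y","w","f","f","w"),
                  y6 = c("y","c","f","f","w"), stringsAsFactors = FALSE)

# TRUE for a row only when y2..y6 all equal y1 in that row.
all_same <- foo$y1 == foo$y2 & foo$y1 == foo$y3 & foo$y1 == foo$y4 &
            foo$y1 == foo$y5 & foo$y1 == foo$y6

# Keep the rows where the y columns are NOT all identical;
# the trailing comma selects rows rather than columns.
new <- foo[!all_same, ]
```

On this data all_same is TRUE for rows 1 and 4 only, so new holds rows 2, 3 and 5.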
I also thought about which, but at this point I am at a loss and have lost my way. I would like foo to look like:
  g1 v1 v2 y1 y2 y3 y4 y5 y6
2  0  5  b  c  y  c  y  w  c
3  0  4  x  f  y  c  f  f  f
5  1  3  e  w  c  w  c  w  w
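A possible way to reach this result without writing out every pairwise comparison, offered as a sketch rather than the definitive answer: count the distinct values in each row of the y columns and drop the rows that have only one. The names y_cols, all_same and result are illustrative, and stringsAsFactors = FALSE is an assumption.

```r
# Assumption: stringsAsFactors = FALSE (not stated in the original post).
foo <- data.frame(g1 = c("1","0","0","1","1"), v1 = c("7","5","4","4","3"),
                  v2 = c("a","b","x","x","e"), y1 = c("y","c","f","f","w"),
                  y2 = c("y","y","y","f","c"), y3 = c("y","c","c","f","w"),
                  y4 = c("y","y","f","f","c"), y5 = c("y","w","f","f","w"),
                  y6 = c("y","c","f","f","w"), stringsAsFactors = FALSE)

# Hypothetical helper: the names of the columns the condition applies to.
y_cols <- paste0("y", 1:6)

# For each row (MARGIN = 1), check whether the y columns hold a single distinct value.
all_same <- apply(foo[, y_cols], 1, function(row) length(unique(row)) == 1)

# Drop those rows; rows 2, 3 and 5 remain.
result <- foo[!all_same, ]
```

Selecting the columns by name keeps the condition tied to y1 through y6 even if the column order of foo changes.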
【Discussion】:
Tags: r select duplicates conditional-statements