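【Posted】: 2022-01-21 06:56:52
【Question】:
I have a dataset (quite messy, but it's not my own work... I'm helping a colleague) with multiple rows of values. Some rows are duplicated in one column, but differ in the other columns because some elements have a "*" appended. The duplication looks like this: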
a <- c("2020", "Rose", "r","r","s","s","i","i","r")
b <- c("2020", "Rose","r*","r*","s*","s*","s*","s*","s*")
c <- c("2020", "Lily","r","r","s","s","i","i","r")
d <- c("2020", "Tulip","r*","r*","r*","r*","s*","r*","r*")
e <- c("2020", "Tulip","s","s","r","s","s","r","r")
data <- rbind(a,b,c,d,e)
So my data (strictly a character matrix, since it is built with rbind) looks like this:
  [,1]   [,2]    [,3] [,4] [,5] [,6] [,7] [,8] [,9]
a "2020" "Rose"  "r"  "r"  "s"  "s"  "i"  "i"  "r"
b "2020" "Rose"  "r*" "r*" "s*" "s*" "s*" "s*" "s*"
c "2020" "Lily"  "r"  "r"  "s"  "s"  "i"  "i"  "r"
d "2020" "Tulip" "r*" "r*" "r*" "r*" "s*" "r*" "r*"
e "2020" "Tulip" "s"  "s"  "r"  "s"  "s"  "r"  "r"
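I need to remove the rows that are duplicated in column 2 ("Rose", "Lily", etc.), preferentially keeping the row that has the *, so that the result looks like this: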
  [,1]   [,2]    [,3] [,4] [,5] [,6] [,7] [,8] [,9]
b "2020" "Rose"  "r*" "r*" "s*" "s*" "s*" "s*" "s*"
c "2020" "Lily"  "r"  "r"  "s"  "s"  "i"  "i"  "r"
d "2020" "Tulip" "r*" "r*" "r*" "r*" "s*" "r*" "r*"
I have a feeling that a function combined with lapply might be the right approach, but I don't know how to proceed! Any ideas?
【Comments】:
-
Can there be cases where none of the duplicates has a *, or where more than one of them does? What is the rule in those cases?
-
My understanding is that there should only ever be one duplicate (i.e. they come in pairs), one with a * and one without.
-
So the logic is: if there is only one row, keep it; if there are two, keep the one with the *?
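The rule agreed on in the comments above (a single row is kept; of a pair, the starred row is kept) could be sketched in base R roughly as follows. This is only one possible approach, not an answer taken from the thread. It assumes that duplicates are identified by column 2 alone and that a row counts as "starred" if any of columns 3 onward contains a literal "*". The names `has_star`, `ord`, `keep` and `result` are illustrative.

```r
# Rebuild the example matrix from the question
a <- c("2020", "Rose", "r","r","s","s","i","i","r")
b <- c("2020", "Rose","r*","r*","s*","s*","s*","s*","s*")
c <- c("2020", "Lily","r","r","s","s","i","i","r")
d <- c("2020", "Tulip","r*","r*","r*","r*","s*","r*","r*")
e <- c("2020", "Tulip","s","s","r","s","s","r","r")
data <- rbind(a, b, c, d, e)

# Assumption: a row is "starred" if any value in columns 3+ contains "*"
has_star <- apply(data[, -(1:2), drop = FALSE], 1,
                  function(x) any(grepl("*", x, fixed = TRUE)))

# Row indices with starred rows first; ties keep their original order
ord <- order(!has_star)

# Within that ordering, keep the first occurrence of each name in column 2,
# so a starred row wins over its unstarred duplicate
keep <- ord[!duplicated(data[ord, 2])]

# Restore the original row order
result <- data[sort(keep), , drop = FALSE]
print(result)
```

On the example data this keeps rows b, c and d. If a name appears several times with no starred row, or with several starred rows, the sketch simply keeps the first such row in the original order, which may or may not be the behaviour wanted for those edge cases.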