【Question title】: R - remove rows from data frame that do not match (exactly) elements of list
【Posted】: 2022-12-02 23:40:57
【Question】:

Imagine a data frame...

df <- rbind("A*YOU 1.000 0.780", "A*YOUR 1.000 0.780", "B*USE 0.800 0.678", "B*USER 0.700 1.000")
df <- as.data.frame(df)
df

...which prints...

> df
                  V1
1  A*YOU 1.000 0.780
2 A*YOUR 1.000 0.780
3  B*USE 0.800 0.678
4 B*USER 0.700 1.000

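...and I want to remove any rows that do not contain an exact match for one of the elements of a list (here called tenables), tenables <- c("A*YOU", "B*USE"), so that the result becomes: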

> df
                  V1
1  A*YOU 1.000 0.780
2  B*USE 0.800 0.678

Any ideas on how to solve this? Many thanks in advance.

【Comments】:

    Tags: r list dataframe row


    【Solution 1】:
    > df[gsub("\\s*\\d+\\.*", "", df$V1) %in% tenables, , drop=FALSE]
                     V1
    1 A*YOU 1.000 0.780
    3 B*USE 0.800 0.678
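
A variation on the same idea is to compare only the first whitespace-delimited field of each row against tenables, rather than stripping the digits. This is a minimal sketch, assuming the key is always the first field and is followed by whitespace; the names `key` and `kept` are introduced here for illustration only.

```r
# Rebuild the example data from the question
df <- as.data.frame(rbind("A*YOU 1.000 0.780", "A*YOUR 1.000 0.780",
                          "B*USE 0.800 0.678", "B*USER 0.700 1.000"))
tenables <- c("A*YOU", "B*USE")

# Drop everything from the first whitespace character onward
# (hypothetical helper vector holding just the leading field)
key <- sub("\\s.*$", "", df$V1)

# %in% compares whole strings, so the "*" in tenables needs no escaping
kept <- df[key %in% tenables, , drop = FALSE]
kept
```

Because `%in%` compares complete strings, "A*YOUR" cannot be mistaken for "A*YOU" here.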
    

    【Comments】:

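      【Solution 2】:

      Since tenables contains regex special characters (* means "0 or more of the previous character/class/group") and we also want regex features such as word boundaries, we cannot use fixed=TRUE in the grep call. Instead, we need to find those special characters and backslash-escape them. From there, we add \\b (word boundary) to differentiate between YOU and YOUR, where adding a space or any other character might be over-constraining.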

      ## clean up tenables to be regex-friendly and precise
      gsub("([].*+(){}[])", "\\\\\\1", tenables)
      # [1] "A\\*YOU" "B\\*USE"
      
      ## combine into a single pattern for simple use in grep
      paste0("\\b(", paste(gsub("([].*+(){}[])", "\\\\\\1", tenables), collapse = "|"), ")\\b")
      # [1] "\\b(A\\*YOU|B\\*USE)\\b"
      
      ## subset your frame, keeping only the rows that match the pattern
      subset(df, grepl(paste0("\\b(", paste(gsub("([].*+(){}[])", "\\\\\\1", tenables), collapse = "|"), ")\\b"), V1))
      #                  V1
      # 1 A*YOU 1.000 0.780
      # 3 B*USE 0.800 0.678
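
      To see why the escaping step matters, it may help to compare the raw and escaped strings on a small input. The test vector `x` below is made up for illustration and is not part of the question's data.

```r
# Made-up strings for illustration
x <- c("A*YOU", "AYOU", "YOU")

# Unescaped: "A*" is read as "zero or more A", so all three strings match
unescaped <- grepl("A*YOU", x)

# Escaped: "\\*" is a literal asterisk, so only the first string matches
escaped <- grepl("A\\*YOU", x)

# fixed = TRUE also treats "*" literally, but regex features such as "\\b"
# are then unavailable, which is why escaping is used instead
literal <- grepl("A*YOU", x, fixed = TRUE)

print(rbind(unescaped, escaped, literal))
```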
      

      Regex explanation:

      \\b(A\\*YOU|B\\*USE)\\b
      ^^^                 ^^^  "word boundary", meaning the previous/next chars
                               are begin/end of string or anything other than
                               A-Z, a-z, 0-9, or _
         ^               ^     parens "group" the alternatives so that both
                               word boundaries apply to either string
          ^^^^^^^              literal "A", "*", "Y", "O", "U" (same with other string)
                 ^             the "|" means "OR", so either the "A*" or the "B*" strings
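
      The escaping and word-boundary steps can be wrapped into a small helper that keeps the matching rows, which is the result the question asks for. This is only a sketch: the function name `keep_exact` and its arguments are hypothetical, it reuses the character class from this answer (which does not escape `?`, `|`, `^`, `$`, or backslash), and it assumes every key starts and ends with a letter, digit, or underscore so that the word boundaries behave as described.

```r
# Hypothetical helper: keep rows of `data` whose `column` contains one of
# `keys` as a whole word. Assumes each key starts and ends with a word character.
keep_exact <- function(data, column, keys) {
  # Backslash-escape the regex metacharacters covered by the character class
  escaped <- gsub("([].*+(){}[])", "\\\\\\1", keys)
  # Join the alternatives and anchor both ends with word boundaries
  pattern <- paste0("\\b(", paste(escaped, collapse = "|"), ")\\b")
  data[grepl(pattern, data[[column]]), , drop = FALSE]
}

# Example data from the question
df <- as.data.frame(rbind("A*YOU 1.000 0.780", "A*YOUR 1.000 0.780",
                          "B*USE 0.800 0.678", "B*USER 0.700 1.000"))
result <- keep_exact(df, "V1", c("A*YOU", "B*USE"))
result
```

      Negating the `grepl()` call inside the helper would return the complementary rows instead.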
      

      【Comments】:
