Removing rows in R: answers

【Question title】: Removing rows in R
【Posted】: 2026-01-08 01:55:02
【Question description】:

I have a file with about 24 columns and many rows, like this:

| ID | Pos | S1 | S2 | S3 | S4 | ... S24 |
|----|-----|----|----|----|----|---------|
| A  | 22  | .  | 1  | 0  | .  | ...     |
| B  | 21  | 1  | 0  | .  | 1  | ...     |
| C  | 50  | 0  | .  | .  | .  | ...     |
| D  | 11  | .  | 1  | .  | .  | ...     |

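I want to remove all rows in which the samples (S1 to S24) contain only "." and "0", and all rows in which the samples contain only "." and "1". In the dummy table above, rows C and D would be removed, while A and B would be kept.

I tried using rowSums in R, without success: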

NEW_FILE <- file[rowSums(file == "." & file == "1") < 24, ]

Any suggestions, in R or otherwise, are appreciated.

Thanks!

【Question comments】:

Tags: r subset

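【Solution 1】:

We can use vectorized options for filtering. Four options are shown below.

1) Using str_c and reduce. We select the columns whose names start with 'S' (starts_with), concatenate them row by row into a single string with reduce (from purrr) and str_c, then use str_detect to check whether the string consists, from start (^) to end ($), of only one or more 0 and . characters ([0.]+) or (|) only one or more 1 and . characters ([1.]+). We negate (!) the logical expression to keep the remaining rows.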

library(dplyr)
library(stringr)
library(purrr)
file %>% 
     filter(!str_detect(reduce(select(cur_data(), starts_with('S')), 
        str_c, sep=""), '^([0.]+|[1.]+)$'))
 #  ID Pos S1 S2 S3 S4
 #1  A  22  .  1  0  .
 #2  B  21  1  0  .  1
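
To see what the pattern is matched against, the intermediate strings can be inspected directly. The following is a minimal base R sketch, not the answer's own code: it rebuilds the sample data and the names s_cols, row_strings and only_one_value are introduced here purely for illustration.

```r
# Sample data from the question (ID, Pos, and four sample columns)
file <- data.frame(
  ID  = c("A", "B", "C", "D"),
  Pos = c(22L, 21L, 50L, 11L),
  S1  = c(".", "1", "0", "."),
  S2  = c("1", "0", ".", "1"),
  S3  = c("0", ".", ".", "."),
  S4  = c(".", "1", ".", "."),
  stringsAsFactors = FALSE
)

# Assumption: sample columns are exactly those whose names start with "S"
s_cols <- startsWith(names(file), "S")

# Paste the sample columns together, one string per row
row_strings <- do.call(paste0, unname(as.list(file[s_cols])))
row_strings
# ".10." "10.1" "0..." ".1.."

# TRUE where a row holds only "."/"0" or only "."/"1"
only_one_value <- grepl("^([0.]+|[1.]+)$", row_strings)

# Keep the rows that do not match
result <- file[!only_one_value, ]
result$ID
# "A" "B"
```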

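2) Another option is to filter with if_all to get the rows whose 'S' columns contain only . and 0, use setdiff against the original data to get the remaining rows, then apply a second if_all to build a logical expression that is TRUE for rows with only . and 1, and negate (!) it to return the rest.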

file %>% 
  filter(if_all(starts_with('S'), ~ . %in% c('.', 0))) %>% 
  setdiff(file, .) %>%
  filter(!if_all(starts_with('S'), ~ . %in% c('.', 1)))
#  ID Pos S1 S2 S3 S4
#1  A  22  .  1  0  .
#2  B  21  1  0  .  1

3) We can avoid the setdiff step by creating a temporary logical column ('i1') from the first if_all and combining it with the second if_all inside filter.

file %>%
   mutate(i1 = if_all(starts_with('S'), ~ . %in% c('.', 0))) %>% 
   filter(!(i1 | if_all(starts_with('S'), ~ . %in% c('.', 1)))) %>% 
   select(-i1)
#  ID Pos S1 S2 S3 S4
#1  A  22  .  1  0  .
#2  B  21  1  0  .  1
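
The same two tests can also be written in base R with rowSums, which is close to what the question attempted. Note that a single cell can never equal both "." and "1", so the per-cell conditions have to be combined with | rather than &. This is a sketch under the assumption that the sample columns are the ones whose names start with "S"; the names s, only_dot_zero and only_dot_one are made up for the example.

```r
# Sample data from the question
file <- data.frame(
  ID  = c("A", "B", "C", "D"),
  Pos = c(22L, 21L, 50L, 11L),
  S1  = c(".", "1", "0", "."),
  S2  = c("1", "0", ".", "1"),
  S3  = c("0", ".", ".", "."),
  S4  = c(".", "1", ".", "."),
  stringsAsFactors = FALSE
)

# Sample columns only (assumed to start with "S")
s <- file[startsWith(names(file), "S")]

# A row is "only . and 0" when every cell is "." or "0"
only_dot_zero <- rowSums(s == "." | s == "0") == ncol(s)
# Likewise for "only . and 1"
only_dot_one  <- rowSums(s == "." | s == "1") == ncol(s)

# Drop rows that satisfy either test
result <- file[!(only_dot_zero | only_dot_one), ]
result$ID
# "A" "B"
```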

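4) Or we can use rowSums to build compound logical expressions joined with &, keeping the rows that contain at least one 1 and at least one 0.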

file %>%
   filter(rowSums(select(cur_data(), starts_with('S')) == '1') > 0 &
          rowSums(select(cur_data(), starts_with('S')) == '0') > 0)
#  ID Pos S1 S2 S3 S4
#1  A  22  .  1  0  .
#2  B  21  1  0  .  1
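
In dplyr 1.1.0 and later, cur_data() is deprecated in favour of pick(), which takes the column selection directly. A possible rewrite of option 4 on that basis is sketched below; it assumes a dplyr version that provides pick(), and it is a variant, not the code from the answer.

```r
library(dplyr)

# Sample data from the question
file <- data.frame(
  ID  = c("A", "B", "C", "D"),
  Pos = c(22L, 21L, 50L, 11L),
  S1  = c(".", "1", "0", "."),
  S2  = c("1", "0", ".", "1"),
  S3  = c("0", ".", ".", "."),
  S4  = c(".", "1", ".", "."),
  stringsAsFactors = FALSE
)

# Keep rows with at least one "1" and at least one "0" among the S columns;
# pick() returns the selected columns as a data frame inside filter()
result <- file %>%
  filter(rowSums(pick(starts_with("S")) == "1") > 0 &
         rowSums(pick(starts_with("S")) == "0") > 0)
result$ID
```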

Data

file <- structure(list(ID = c("A", "B", "C", "D"), Pos = c(22L, 21L, 
50L, 11L), S1 = c(".", "1", "0", "."), S2 = c("1", "0", ".", 
"1"), S3 = c("0", ".", ".", "."), S4 = c(".", "1", ".", ".")), 
class = "data.frame", row.names = c(NA, 
-4L))

【Comments】:

Hi akrun, thank you very much. I learned a lot from the different options you demonstrated.

【Solution 2】:

Here is a dplyr solution:

library(dplyr)
file %>% 
  rowwise() %>%
  filter(sum(!(c_across(-c(ID,Pos)) %in% c(".","0"))) > 0 &
         sum(!(c_across(-c(ID,Pos)) %in% c(".","1"))) > 0)
#  ID      Pos S1    S2    S3    S4   
#  <chr> <int> <chr> <chr> <chr> <chr>
#1 A        22 .     1     0     .    
#2 B        21 1     0     .     1

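We can use the dplyr verb rowwise to process each row. c_across then operates on the S columns only. For each row we count the values that are not in c(".","0") and repeat the check for c(".","1"). filter keeps the rows where both conditions are TRUE, that is, rows that are neither all "."/"0" nor all "."/"1".

If there are other non-"S" columns, you can use c_across(starts_with("S")) instead.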
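
The per-row test can be tried on plain character vectors before it is applied to the whole table. Below is a minimal base R sketch of the same condition; keep_row is a hypothetical helper name introduced for this example, and the sample columns are assumed to start with "S".

```r
# Hypothetical helper: TRUE when a row is neither all "."/"0" nor all "."/"1"
keep_row <- function(x) {
  any(!(x %in% c(".", "0"))) && any(!(x %in% c(".", "1")))
}

keep_row(c(".", "1", "0", "."))  # row A: TRUE
keep_row(c("0", ".", ".", "."))  # row C: FALSE

# Sample data from the question
file <- data.frame(
  ID  = c("A", "B", "C", "D"),
  Pos = c(22L, 21L, 50L, 11L),
  S1  = c(".", "1", "0", "."),
  S2  = c("1", "0", ".", "1"),
  S3  = c("0", ".", ".", "."),
  S4  = c(".", "1", ".", "."),
  stringsAsFactors = FALSE
)

# Apply the helper to each row of the sample columns
s <- file[startsWith(names(file), "S")]
keep <- apply(s, 1, keep_row)
result <- file[keep, ]
result$ID
# "A" "B"
```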

Data:

file <- structure(list(ID = c("A", "B", "C", "D"), Pos = c(22L, 21L, 
50L, 11L), S1 = c(".", "1", "0", "."), S2 = c("1", "0", ".", 
"1"), S3 = c("0", ".", ".", "."), S4 = c(".", "1", ".", ".")), class = "data.frame", row.names = c(NA, 
-4L))

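【Comments】:

Hello Ian, thanks for the suggestion. I tried it, but it does not work the way I want. I only want to keep rows whose samples contain all of ".", "1" and "0". So I want to remove rows with only "." and "0", as well as rows with only "." and "1".
Sorry, I misunderstood your question; my edit should fix this.

【Solution 3】:

Assuming a . can be found in every row, this is another possible solution. If that is not the case, it would need some modifications: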

library(dplyr)
library(stringr)
library(purrr)

file %>%
  mutate(Con = pmap_lgl(file %>% 
                          select(starts_with("S")), ~ all(any(str_detect(c(...), "1")),
                          any(str_detect(c(...), "0"))))) %>%
  filter(Con) %>%
  select(-Con)

  ID Pos S1 S2 S3 S4
1  A  22  .  1  0  .
2  B  21  1  0  .  1

【Comments】:

【Solution 4】:

Here is a base R solution with a regular expression:

file[!grepl("^[0.]+$|^[1.]+$", apply(file[,-1], 1, paste, collapse = "")),]
  ID S1 S2 S3 S4
1  A  .  1  0  .
2  B  1  0  .  1

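Here we first use apply and paste to collapse each row into a single string, then subset file to the rows that do not match the pattern, i.e. we drop rows whose string contains only . and 1 or only . and 0 from start ^ to end $.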
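
A side note on the row index: a logical index such as !grepl(...) and a negative which() index give the same result whenever at least one row matches, but they differ when none does. A minimal sketch on made-up two-row data (df and its values are illustrative, not from the question):

```r
# Illustrative data in which no row should be removed
df <- data.frame(
  ID = c("A", "B"),
  S1 = c("1", "0"),
  S2 = c("0", "1"),
  stringsAsFactors = FALSE
)

row_strings <- paste0(df$S1, df$S2)                # "10" "01"
matches <- grepl("^[0.]+$|^[1.]+$", row_strings)   # FALSE FALSE

# which() returns integer(0); negating it is still integer(0),
# so the subset selects no rows at all
n_negative_which <- nrow(df[-which(matches), ])    # 0

# A logical index keeps every row that does not match
n_logical <- nrow(df[!matches, ])                  # 2
```

For that reason the logical form is usually the safer choice when the data may contain no matching rows.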

If you prefer a dplyr solution:

library(dplyr)
file %>%
  rowwise() %>%
  mutate(string = paste(c_across(starts_with('S')),collapse = "")) %>%
  filter(!grepl("^[0.]+$|^[1.]+$", string)) %>%
  select(-string)

Data:

file <- data.frame(
  ID = LETTERS[1:4],
  S1 = c(".", "1", "0", "."),
  S2 = c("1", "0", ".", "1"),
  S3 = c("0", ".", ".", "."),
  S4 = c(".", "1", ".", ".")
)

【Comments】:

【Solution 5】:

Hopefully this base R option using subset + apply + %in% helps (thanks to @akrun for the data):

> subset(file, apply(file, 1, function(x) all(c("0", "1", ".") %in% x) | sum(x == ".") + 2 == length(x)))
  ID Pos S1 S2 S3 S4
1  A  22  .  1  0  .
2  B  21  1  0  .  1
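
As written, the call passes every column to apply, and the + 2 term appears to account for the two non-sample columns (ID and Pos), so that a row whose samples are all "." is also kept. A variant that restricts the test to the sample columns, and therefore does not depend on how many other columns exist, might look like the sketch below. It assumes sample columns start with "S" and mirrors the condition as I read it; s and keep are names introduced here.

```r
# Sample data from the question
file <- data.frame(
  ID  = c("A", "B", "C", "D"),
  Pos = c(22L, 21L, 50L, 11L),
  S1  = c(".", "1", "0", "."),
  S2  = c("1", "0", ".", "1"),
  S3  = c("0", ".", ".", "."),
  S4  = c(".", "1", ".", "."),
  stringsAsFactors = FALSE
)

# Sample columns only (assumed to start with "S")
s <- file[startsWith(names(file), "S")]

# Keep a row if it contains "0", "1" and "." together,
# or if every sample value is "."
keep <- apply(s, 1, function(x) all(c("0", "1", ".") %in% x) | all(x == "."))

result <- subset(file, keep)
result$ID
# "A" "B"
```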

【Comments】: