Filter in a dplyr group only when the condition is met, else do not filter: answers

【Question title】: Filter in a dplyr group only when the condition is met else do not
【Posted】: 2026-01-19 09:05:01
【Question description】:

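I have a table produced by right_join, in which certain columns hold NA values depending on which table the entry came from. Each "hit" in the table has an "index" (the indx column) starting at 0.

I want to group_by(hit, indx) and do some conditional filtering, preferably with dplyr.

Here is the data: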

library(dplyr)  # also makes tibble() available

test <- tibble(hit = c(rep("101mA", 4), rep("1914A", 5)), 
               indx = c(0, 0, 0, 1, 0, 0, 0, 0, 1),
               hit_start = c(7, 63, 105, 131, 4, 7, 56, 64, 147), 
               hit_end = c(112, 82, 126, 152, 82, 34, 83, 81, 166), 
               stamp_score = c(NA, 9.32, 9.30, 9.49, NA, NA, NA, 8.16, 9.15), 
               bit_score = c(76.2, NA, NA, NA, 84.7, 8.3, 0.3, NA, NA) 
              )

Here is the table:

# A tibble: 9 x 6
  hit    indx hit_start hit_end stamp_score bit_score
  <chr> <dbl>     <dbl>   <dbl>       <dbl>     <dbl>
1 101mA     0         7     112       NA         76.2
2 101mA     0        63      82        9.32      NA  
3 101mA     0       105     126        9.30      NA  
4 101mA     1       131     152        9.49      NA  
5 1914A     0         4      82       NA         84.7
6 1914A     0         7      34       NA          8.3
7 1914A     0        56      83       NA          0.3
8 1914A     0        64      81        8.16      NA  
9 1914A     1       147     166        9.15      NA

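Within each group_by(hit, indx) group, if the "stamp_score" column contains even one NA, I want to keep only the rows with the NA entries. However, if the group has no NA in its "stamp_score" column, I want to keep all of its rows.

This is the result I expect in the end: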

# A tibble: 6 x 6
  hit    indx hit_start hit_end stamp_score bit_score
  <chr> <dbl>     <dbl>   <dbl>       <dbl>     <dbl>
1 101mA     0         7     112       NA         76.2
4 101mA     1       131     152        9.49      NA  
5 1914A     0         4      82       NA         84.7
6 1914A     0         7      34       NA          8.3
7 1914A     0        56      83       NA          0.3
9 1914A     1       147     166        9.15      NA

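Note that I intend to eventually use the code on tables with >10000 hits, each with its own "indx" values.

【Question comments】:

What do you mean by "if there is no NA, I don't want to filter any rows"? Do you want to drop the observations without NA, or keep them?
Sorry. I meant that if there are no NA values in the "stamp_score" column of a group, I want to keep all of its rows.
Please show the expected result. That will give a better idea of what you intend. Thanks.

Tags: r dplyr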

【Solution 1】:

Actually, I found the answer in another related question.

It uses a data.table one-liner, which in my case is:

library(data.table)

test <- setDT(test)[, if(any(is.na(stamp_score))) .SD[is.na(stamp_score)] else .SD, .(hit, indx)]

Basically, this code subsets a group only when its "stamp_score" column contains an NA; otherwise the group is returned unchanged.
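Since the question asks for dplyr, here is a minimal sketch of the same per-group rule without data.table. It is not taken from the linked question. It rebuilds the `test` tibble from the question so that it runs on its own, and `result` is just an illustrative name. Within each (hit, indx) group a row is kept if its own stamp_score is NA, or if the group has no NA at all.

```r
library(dplyr)

# Example data copied from the question
test <- tibble(hit = c(rep("101mA", 4), rep("1914A", 5)),
               indx = c(0, 0, 0, 1, 0, 0, 0, 0, 1),
               hit_start = c(7, 63, 105, 131, 4, 7, 56, 64, 147),
               hit_end = c(112, 82, 126, 152, 82, 34, 83, 81, 166),
               stamp_score = c(NA, 9.32, 9.30, 9.49, NA, NA, NA, 8.16, 9.15),
               bit_score = c(76.2, NA, NA, NA, 84.7, 8.3, 0.3, NA, NA))

result <- test %>%
  group_by(hit, indx) %>%
  # keep a row if it is NA itself, or if its whole group is free of NA
  filter(is.na(stamp_score) | !any(is.na(stamp_score))) %>%
  ungroup()

result
```

On the example data this should leave the six rows listed as the expected result in the question. How it compares in speed with the data.table version on a large table has not been measured here.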

Thanks to everyone who tried to help, and who helped me improve my question along the way.

【Comments】:

【Solution 2】:

I'm not sure whether you want to keep the rows with NA in stamp_score or drop them. But I think this should do the job:

library(dplyr)

# create the df where you only have group with non missing obs
noNAind <- test %>% group_by(indx) %>% filter(!any(is.na(stamp_score))) %>% ungroup()
noNAhit <- test %>% group_by(hit) %>% filter(!any(is.na(stamp_score))) %>% ungroup()

# create the df with all the missing obs 
missind<- test %>% group_by(indx) %>% filter(is.na(stamp_score)) %>% ungroup()
misshit<- test %>% group_by(hit) %>% filter(is.na(stamp_score)) %>% ungroup()

# merge the data
test<- full_join(noNAind,noNAhit) %>% distinct()
test<- full_join(test,missind) %>% distinct()
test<- full_join(test,misshit) %>% distinct()

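【Comments】:

Sorry, with this code the group "indx" = 1 does not keep any rows.
Then can you merge them afterwards?
Could you show me how? Note that I eventually have to do this on tens of thousands of rows.
I have edited the code; see whether it does what you want.
It works for this example. But I have a much larger table with many entries in the "hit" column. When I use this approach on the whole table, it fails because it deletes many rows unnecessarily.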
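One possible explanation for the failure on the larger table, offered as an assumption rather than something confirmed in the thread: the code above groups by indx and by hit separately, whereas the question asks for groups defined by hit and indx together. The sketch below uses a made-up table (`toy`, not from the question) and a condensed version of the separate-grouping approach to show how the two can disagree.

```r
library(dplyr)

# Hypothetical toy table, invented for illustration only
toy <- tibble(hit = c("a", "a", "a", "b", "b"),
              indx = c(0, 0, 1, 0, 1),
              stamp_score = c(1.0, 2.0, NA, NA, 3.0))

# Joint grouping: each (hit, indx) pair is judged on its own
joint <- toy %>%
  group_by(hit, indx) %>%
  filter(is.na(stamp_score) | !any(is.na(stamp_score))) %>%
  ungroup()

# Separate grouping, condensed from the approach above:
# an NA anywhere in an indx value or in a hit disqualifies that whole group
by_indx <- toy %>% group_by(indx) %>% filter(!any(is.na(stamp_score))) %>% ungroup()
by_hit  <- toy %>% group_by(hit)  %>% filter(!any(is.na(stamp_score))) %>% ungroup()
with_na <- toy %>% filter(is.na(stamp_score))
separate <- bind_rows(by_indx, by_hit, with_na) %>% distinct()

nrow(joint)     # 5: every (hit, indx) group is either all NA or NA-free
nrow(separate)  # 2: only the NA rows survive
```

Here the groups (a, 0) and (b, 1) contain no NA and should be kept whole, but they are dropped because indx = 0, indx = 1, hit "a" and hit "b" each contain an NA somewhere.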