Select rows in R dataframe using conditional rowSums (dplyr method): answers

【Question title】: Select rows in R dataframe using conditional rowSums (dplyr method)
【Posted】: 2018-03-10 12:31:02
【Question】:

I have the following data frame in R, and I want to use dplyr to filter its rows based on row sums over different groups of columns:

unqA   unqB   unqC   totA   totB    totC
 3       5      8      16    12      9
 5       3      2       8     5      4

I want the rows where sum(all unq) <= 0.10 * sum(all tot).

I tried something like:

filter(df, rowsum(matches("unq")) <= 0.10*rowsum(matches("totalC")))

Or:

filter(df, rowsum(unqA, unqB..) <= 0.10*rowsum(totA, totB..))

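I only want to select the rows where the sum of the unique counts is at most 10% of the sum of the total counts.

However, it either does not work or just returns a data frame with no rows.

Any suggestions?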

【Comments】:

Tags: r select filter dplyr

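【Solution 1】:

This solution takes a similar approach to @SamuelReuther's answer, except that I use mutate. Also, as I understand the question, none of the rows in the sample data satisfy the filter, so I added an extra row for which the filter condition is TRUE.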

library(tidyverse)
df <- read_table("unqA   unqB   unqC   totA   totB    totC
3       5      8      16    12      9
5       3      2       8     5      4
1       4      3      30    45     25")

# Add the row-wise sums of the "unq" and "tot" columns as new columns
df <- df %>%
  mutate(sum_unq = rowSums(select(., starts_with("unq"))),
         sum_tot = rowSums(select(., starts_with("tot"))))
df
#> # A tibble: 3 x 8
#>    unqA  unqB  unqC  totA  totB  totC sum_unq sum_tot
#>   <int> <int> <int> <int> <int> <int>   <dbl>   <dbl>
#> 1     3     5     8    16    12     9      16      37
#> 2     5     3     2     8     5     4      10      17
#> 3     1     4     3    30    45    25       8     100
df %>% filter(sum_unq <= 0.1 * sum_tot)
#> # A tibble: 1 x 8
#>    unqA  unqB  unqC  totA  totB  totC sum_unq sum_tot
#>   <int> <int> <int> <int> <int> <int>   <dbl>   <dbl>
#> 1     1     4     3    30    45    25       8     100
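
As a side note on why the attempts in the question fail: `rowsum()` in base R is a different function from `rowSums()` (it sums rows by group), and selection helpers such as `matches()` only work inside a selecting function like `select()`. If the intermediate sum columns are not needed, the same condition can probably be written directly inside `filter()`. The following is a minimal sketch under that assumption, not part of the original answer; the data frame is rebuilt with `data.frame()` so that the block is self-contained, and `result` is just an illustrative name.

```r
library(dplyr)

# Sample data from the answer above, including the extra third row
df <- data.frame(unqA = c(3, 5, 1),
                 unqB = c(5, 3, 4),
                 unqC = c(8, 2, 3),
                 totA = c(16, 8, 30),
                 totB = c(12, 5, 45),
                 totC = c(9, 4, 25))

# Compare the two row sums inside filter(), without storing them as columns.
# The dot refers to the data frame on the left-hand side of the pipe.
result <- df %>%
  filter(rowSums(select(., starts_with("unq"))) <=
           0.1 * rowSums(select(., starts_with("tot"))))
result
```

Only the third row is kept, since 8 <= 0.1 * 100, while the first two rows have unq sums (16 and 10) well above 10% of their tot sums (37 and 17).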

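【Comments】:

【Solution 2】:

OK, I tried something that I hope works for you (though I am not sure I understood your question correctly):

Here is your example data frame: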

df <- data.frame(unqA = c(3, 5),
                 unqB = c(5, 3),
                 unqC = c(8, 2),
                 totA = c(16, 8),
                 totB = c(12, 5),
                 totC = c(9, 4))

As a first step, I compute the additional columns that are needed:

library(dplyr)
df_ext <- cbind(df,
  rowSums_unq = df %>%
    select(matches("unq")) %>%
    rowSums(),
  rowSums_tot = df %>%
    select(matches("tot")) %>%
    rowSums())

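This gives the original data frame plus two extra columns: rowSums_unq (16 and 10 for the two rows) and rowSums_tot (37 and 17).

Then filter the data frame, and finally drop the columns that are no longer needed: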

df_ext %>%
  filter(rowSums_unq <= 0.1 * rowSums_tot) %>%
  select(-rowSums_unq, -rowSums_tot)
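
Note that with the two-row sample data the filter above returns zero rows, because neither row meets the 10% condition. As a cross-check, the same logic can be sketched in base R without dplyr. This is an illustrative alternative rather than the answerer's method; it assumes the column names start with "unq" and "tot", borrows the extra third row from Solution 1 so that one row passes, and uses the hypothetical names `unq_cols`, `tot_cols`, `keep`, and `kept`.

```r
# Example data with the extra third row from Solution 1
df <- data.frame(unqA = c(3, 5, 1),
                 unqB = c(5, 3, 4),
                 unqC = c(8, 2, 3),
                 totA = c(16, 8, 30),
                 totB = c(12, 5, 45),
                 totC = c(9, 4, 25))

# Logical vectors marking which columns belong to each group
unq_cols <- startsWith(names(df), "unq")
tot_cols <- startsWith(names(df), "tot")

# One TRUE/FALSE per row: is the unq sum at most 10% of the tot sum?
keep <- rowSums(df[unq_cols]) <= 0.1 * rowSums(df[tot_cols])

# Subset the rows; no helper columns are created, so nothing needs dropping
kept <- df[keep, ]
kept
```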

【Comments】: