How can I select columns based on two conditions? [duplicate]

【Question title】: How can I select columns based on two conditions? [duplicate]
【Posted】: 2019-08-14 21:44:58
【Question】:

I have a data frame with many columns. For example:

sample treatment col5 col6 col7
  1        a       3    0   5  
  2        a       1    0   3
  3        a       0    0   2
  4        b       0    1   1

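I want to select the sample and treatment columns, plus every column that meets both of the following conditions:

The value in the row where treatment == 'b' is 0.
At least one row where treatment == 'a' has a non-zero value.

The expected result should look like this: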

sample treatment col5
  1        a       3      
  2        a       1      
  3        a       0      
  4        b       0

Example data frame:

structure(list(sample = 1:4, treatment = structure(c(1L, 1L, 
1L, 2L), .Label = c("a", "b"), class = "factor"), col5 = c(3, 
1, 0, 0), col6 = c(0, 0, 0, 1), col7 = c(5, 3, 2, 1)), class = "data.frame", row.names = c(NA, 
-4L))

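【Comments】:

Possible duplicate of Filter data.frame rows by a logical condition. To add further conditions to the duplicate question, simply put & between the conditions and write them as data[cond1 & cond2, ].
Sorry, I'm new to this site. What does your comment mean? I checked the question you mentioned, but it is about selecting rows, not columns.
Sorry, I was in a hurry. As Shree's answer shows, selecting columns can be done by switching from data[cond1 & cond2, ] to data[, cond1 & cond2] (note the position of the comma). Basically, you index data.frames and matrices as data[row, column], where row is the condition or the row numbers you want to extract, and the column argument works the same way. Have a look at Hadley's online book Advanced R. Despite its name, a first look at the essential chapters, such as subsetting, will get you a long way without reading the advanced material.

Tags: r dplyr subset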
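
To make the data[row, column] convention from the comment above concrete, here is a minimal sketch using the sample data from the question. The names rows_a_nonzero, keep_cols and cols_subset are illustrative only, and the column choice in keep_cols is hard-coded purely to show where the logical vector goes.

```r
df <- data.frame(
  sample    = 1:4,
  treatment = c("a", "a", "a", "b"),
  col5 = c(3, 1, 0, 0),
  col6 = c(0, 0, 0, 1),
  col7 = c(5, 3, 2, 1)
)

# Row selection: the condition goes BEFORE the comma, one value per row.
rows_a_nonzero <- df[df$treatment == "a" & df$col5 != 0, ]

# Column selection: the condition goes AFTER the comma, one value per column.
# (hypothetical, hard-coded choice of columns for illustration)
keep_cols <- names(df) %in% c("sample", "treatment", "col5")
cols_subset <- df[, keep_cols]
```

In both cases the logical vector must have one element per row (before the comma) or per column (after the comma).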

【Solution 1】:

Here is one way in base R:

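# Columns 1:2 (sample, treatment) are excluded from the checks.
# Assumes non-negative values, so a column sum > 0 means at least one
# non-zero entry and a column sum == 0 means every entry is zero.
cs_a <- colSums(df[df$treatment == "a",-c(1:2)]) > 0
cs_b <- colSums(df[df$treatment == "b",-c(1:2)]) == 0

# Always keep the first two columns, plus those meeting both conditions.
df[, c(TRUE, TRUE, cs_a & cs_b)]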

  sample treatment col5
1      1         a    3
2      2         a    1
3      3         a    0
4      4         b    0

With dplyr:

df %>% 
  select_at(which(c(TRUE, TRUE, cs_a & cs_b)))
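
The colSums approach is compact, but a column sum only reflects "any non-zero" and "all zero" when the values cannot cancel out (no negatives). As a hedged alternative, the sketch below tests the two conditions from the question directly on each column with vapply. The names value_cols, is_a, is_b, keep and result are made up for this example, and it assumes the identifier columns are called sample and treatment.

```r
df <- data.frame(
  sample    = 1:4,
  treatment = c("a", "a", "a", "b"),
  col5 = c(3, 1, 0, 0),
  col6 = c(0, 0, 0, 1),
  col7 = c(5, 3, 2, 1)
)

# Every column except the two identifier columns is a candidate.
value_cols <- setdiff(names(df), c("sample", "treatment"))
is_a <- df$treatment == "a"
is_b <- df$treatment == "b"

# Per column: all "b" rows are zero AND at least one "a" row is non-zero.
keep <- vapply(
  df[value_cols],
  function(x) all(x[is_b] == 0) && any(x[is_a] != 0),
  logical(1)
)

# drop = FALSE keeps a data frame even if few columns survive.
result <- df[, c("sample", "treatment", value_cols[keep]), drop = FALSE]
```

Selecting by name rather than by position also means the result does not depend on sample and treatment being the first two columns.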

【Comments】:

Is it possible to use dplyr::select?
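
Regarding the question above, one possible way to use dplyr::select is to turn the logical vector into column names and pass them through all_of(). This is only a sketch, assuming dplyr 1.0.0 or later; keep_names and res are names introduced here for illustration.

```r
library(dplyr)

df <- data.frame(
  sample    = 1:4,
  treatment = c("a", "a", "a", "b"),
  col5 = c(3, 1, 0, 0),
  col6 = c(0, 0, 0, 1),
  col7 = c(5, 3, 2, 1)
)

# Same logical vectors as in the base R answer (non-negative values assumed).
cs_a <- colSums(df[df$treatment == "a", -c(1:2)]) > 0
cs_b <- colSums(df[df$treatment == "b", -c(1:2)]) == 0

# Names of the columns that satisfy both conditions.
keep_names <- names(which(cs_a & cs_b))

# all_of() tells select() that keep_names is an external character vector.
res <- df %>% select(sample, treatment, all_of(keep_names))
```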

【Solution 2】:

Here is a more verbose way in the tidyverse that does not require calling colSums manually for each level of treatment:

library(dplyr)
library(purrr)
library(tidyr)

sample <- 1:4
treatment <- c("a", "a", "a", "b")
col5 <- c(3,1,0,0)
col6 <- c(0,0,0,1)
col7 <- c(5,3,2,1)

dd <- data.frame(sample, treatment, col5, col6, col7)
# first replace each numeric column with a logical flag: is the entry zero?
# (sample is numeric too, so it is also converted, but it is not used below)
dd2 <- mutate_if(
  .tbl = dd,
  .predicate = is.numeric,
  .funs = function(x)
    x == 0
)

# then find the sum per column and per treatment group
# in R TRUE = 1 and FALSE = 0
number_of_zeros <- dd2 %>% 
  group_by(treatment) %>% 
  summarise_at(.vars = vars(col5:col7), .funs = "sum")

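# then find the names of the columns you want to keep
# (this keeps columns with at least one zero in every treatment level,
# which matches the expected output for this sample data)
keeper_columns <-
  number_of_zeros %>% 
  select(-treatment) %>% # remove the treatment grouping variable
  map_dfr( # check whether all zero counts per column (one per treatment level) are greater than zero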
    .x = .,
    .f = function(x)
      all(x > 0)
  ) %>% 
  gather(column, keeper) %>% # reformat 
  filter(keeper == TRUE) %>% # to grab the keepers
  select(column) %>% # then select the column with column names
  unlist %>% # and convert to character vector
  unname

# subset the original dataset for the wanted columns
# (all_of() marks keeper_columns as an external character vector)
wanted_columns <- dd %>% select(1:2, all_of(keeper_columns))
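
As a shorter, hedged alternative to the pipeline above, the two conditions from the question can be written directly inside summarise() with across(). This is a sketch assuming dplyr 1.0.0 or later, not the answer author's method; flags is a name introduced here for illustration.

```r
library(dplyr)

dd <- data.frame(
  sample    = 1:4,
  treatment = c("a", "a", "a", "b"),
  col5 = c(3, 1, 0, 0),
  col6 = c(0, 0, 0, 1),
  col7 = c(5, 3, 2, 1)
)

# One TRUE/FALSE per candidate column:
# all "b" rows are zero AND at least one "a" row is non-zero.
flags <- dd %>%
  summarise(across(
    col5:col7,
    ~ all(.x[treatment == "b"] == 0) && any(.x[treatment == "a"] != 0)
  ))

# Names of the columns whose flag is TRUE.
keeper_columns <- names(flags)[unlist(flags)]

wanted_columns <- dd %>% select(sample, treatment, all_of(keeper_columns))
```

Unlike counting zeros per group, this version encodes the question's conditions explicitly, so it should also behave as intended on data where a group contains several rows.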

【Comments】: