[Question title]: How to catch element vector so that they are read by R dplyr function?
[Posted]: 2017-11-18 04:07:13
[Question]:

I am trying to use the dplyr package, but I am running into a problem with how variables are handled.

Suppose I have a simplified data frame:

library(dplyr)
# placeholder frame; it is overwritten on the next line
my.data <- as.data.frame(matrix(NA, ncol = 4, nrow = 6))
my.data <- as.data.frame(cbind(c("d6", "d7", "d8", "d9", "da", "db"), c(rep("C200", 2), rep("C400", 4)), c(rep("a",5), "b"), c("c", rep("a", 5))))
colnames(my.data) <- c("snp", "gene", "ind1", "ind2")

First I use group_by to count the number of SNPs per gene:

new.data <- my.data %>% group_by(gene) %>% mutate(count = n())

What I want, though, is the percentage of occurrences of a given string in each column:

new.data %>% group_by(gene) %>% filter(grepl("a", ind1)) %>% dplyr::mutate(perc.a.ind1 = n()/count*100)
new.data %>% group_by(gene) %>% filter(grepl("a", ind2)) %>% dplyr::mutate(perc.a.ind2 = n()/count*100)

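This works fine. The problem is that I have many individuals, so I need to automate it. I created a vector of column names and ran my code in a for loop (I know a loop is not ideal, and I would be happy to move to an apply version or something similar):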

ind.vec <- colnames(my.data[,3:4])
for (i in 1:length(ind.vec)) {
new.data %>% group_by(gene) %>% filter(grepl("a", ind.vec[i])) %>% mutate(percent = n()/count*100)
}

I end up with an empty tibble, as if none of the elements of ind.vec were recognized.
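A likely reason for the empty result, shown as a minimal base R sketch: inside the loop, grepl() is handed the column name as a string ("ind1"), not the column's values. The string "ind1" contains no "a", so the condition is a single FALSE and every row is dropped. The data frame `toy` below is a made-up stand-in, not the asker's data.

```r
# Column names stored as plain strings, as in the loop above.
ind.vec <- c("ind1", "ind2")

# Hypothetical toy data, only for illustration.
toy <- data.frame(ind1 = c("a", "a", "b"),
                  ind2 = c("c", "a", "a"),
                  stringsAsFactors = FALSE)

# grepl() sees the literal string "ind1", not the column of that name.
on_name <- grepl("a", ind.vec[1])

# Indexing the data frame by name reaches the actual column values.
on_column <- grepl("a", toy[[ind.vec[1]]])

print(on_name)    # FALSE
print(on_column)  # TRUE TRUE FALSE
```

So the name has to be turned into a reference to the column before dplyr evaluates the filter condition.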

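I read the vignette at https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html, which made me think I had found the problem, but I am still far from understanding it and could not make it work with my data.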

I made a few attempts:

ind.vec <- quote(colnames(my.data[,3:4]))
new.data %>% group_by(gene) %>% filter(grepl("a", !!(ind.vec[i]))) %>% mutate(percent = n()/count*100)

How can I get dplyr to recognize the elements of the vector?

Can you help?

[Comments]:

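@IanWesley, thanks for pointing me to that post. It solved my problem, but in my case I have to work with ind.vec[i], and the indexing gave me trouble because it was not redefined inside as.name(ind.vec).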
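One way to handle the indexing issue mentioned in this comment, sketched under the assumption that dplyr 0.7 or later is installed: pass one column name at a time and look it up with the `.data` pronoun. Using `!!as.name(col)` in place of `.data[[col]]` should behave the same way. The list name `results` and the use of lapply are illustrative choices, not something from the thread.

```r
library(dplyr)

# Same example data as in the question, built with character columns.
my.data <- data.frame(
  snp  = c("d6", "d7", "d8", "d9", "da", "db"),
  gene = c(rep("C200", 2), rep("C400", 4)),
  ind1 = c(rep("a", 5), "b"),
  ind2 = c("c", rep("a", 5)),
  stringsAsFactors = FALSE
)
new.data <- my.data %>% group_by(gene) %>% mutate(count = n())

ind.vec <- colnames(my.data)[3:4]

# One filtered tibble per individual; `col` is a single string such as "ind1".
results <- lapply(ind.vec, function(col) {
  new.data %>%
    group_by(gene) %>%
    filter(grepl("a", .data[[col]])) %>%   # look the column up by its name
    mutate(percent = n() / count * 100)
})
names(results) <- ind.vec

results$ind1
```

Because each element of ind.vec is passed as a scalar string, no subscript such as ind.vec[i] has to be resolved inside the dplyr call.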

Tags: r dplyr

[Solution 1]:

I suggest using tidyr::gather for this.

library(tidyverse)
# or library(dplyr);library(tidyr)

my.data %>% 
  group_by(gene) %>% 
  mutate(count = n()) %>% 
  gather(ind, string, ind1, ind2 ) %>% 
  filter(string == "a") %>% 
  group_by(gene, ind, string) %>% 
  mutate(
    n_string = n(),
    freq = n_string /  count * 100 ) 

# A tibble: 10 x 7
# Groups:   gene, ind, string [4]
#      snp   gene count   ind string n_string  freq
#    <fctr> <fctr> <int> <chr>  <chr>    <int> <dbl>
# 1     d6   C200     2  ind1      a        2   100
# 2     d7   C200     2  ind1      a        2   100
# 3     d8   C400     4  ind1      a        3    75
# 4     d9   C400     4  ind1      a        3    75
# 5     da   C400     4  ind1      a        3    75
# 6     d7   C200     2  ind2      a        1    50
# 7     d8   C400     4  ind2      a        4   100
# 8     d9   C400     4  ind2      a        4   100
# 9     da   C400     4  ind2      a        4   100
#10     db   C400     4  ind2      a        4   100

For some reason I get a warning, but the result is the same as the one you provided.
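The warning is probably caused by ind1 and ind2 being factors with different levels, which gather() cannot combine without dropping the factor attributes; this is a guess based on the <fctr> columns in the output above, since the warning text is not quoted. A sketch that sidesteps it by building the data with character columns (the stringsAsFactors argument and the ungroup() call are additions, not part of the answer):

```r
library(dplyr)
library(tidyr)

# Character columns instead of factors, so gather() has no levels to reconcile.
my.data <- data.frame(
  snp  = c("d6", "d7", "d8", "d9", "da", "db"),
  gene = c(rep("C200", 2), rep("C400", 4)),
  ind1 = c(rep("a", 5), "b"),
  ind2 = c("c", rep("a", 5)),
  stringsAsFactors = FALSE
)

long <- my.data %>%
  group_by(gene) %>%
  mutate(count = n()) %>%
  ungroup() %>%
  gather(ind, string, ind1, ind2) %>%      # one row per snp and individual
  filter(string == "a") %>%
  group_by(gene, ind, string) %>%
  mutate(n_string = n(),
         freq = n_string / count * 100)

long
```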

[Solution 2]:

@SollanoRabeloBraga, thank you very much!! It solved my problem. I modified the gather call to include more individuals, gather(ind, string, ind1:ind5), and then I did

new.data <- test[!duplicated(new.data[, c("gene", "ind", "freq")]),]

new.data <- cast(test2, gene ~ ind)

to tidy up my results.
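The same tidy-up step (drop duplicate rows, then spread the individuals into columns) can also be sketched with dplyr and tidyr alone, assuming tidyr 1.0 or later for pivot_wider(). The data frame `long` below is a hand-built stand-in with the gene, ind and freq values from the output of Solution 1; it is not the poster's actual object.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format result, mirroring the Solution 1 output.
long <- data.frame(
  gene = c(rep("C200", 2), rep("C400", 3), "C200", rep("C400", 4)),
  ind  = c(rep("ind1", 5), rep("ind2", 5)),
  freq = c(100, 100, 75, 75, 75, 50, 100, 100, 100, 100),
  stringsAsFactors = FALSE
)

wide <- long %>%
  distinct(gene, ind, freq) %>%                       # one row per gene and individual
  pivot_wider(names_from = ind, values_from = freq)   # one column per individual

wide
```

This gives one row per gene with a frequency column for each individual.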
