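【Posted】: 2020-02-16 12:05:06
【Question】:
I want to use regular expressions to identify the variables to group_by and to summarise my data efficiently. I cannot handle each variable separately, because I have a large number of variables to summarise and the group_by variables need to be passed dynamically each time. data.table accepts the grouping variables when they are passed this way, but not the variables to summarise. My attempts with tidyverse have not worked so far either. Any help would be much appreciated.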
My data:
tempDF <- structure(list(d1 = c("A", "B", "C", "A", "C"),
  d2 = c(40L, 50L, 20L, 50L, 20L), d3 = c(20L, 40L, 50L, 40L, 50L),
  d4 = c(60L, 30L, 30L, 60L, 30L), p_A = c(1L, 3L, 2L, 3L, 2L),
  p_B = c(3L, 4L, 3L, 3L, 4L), p_C = c(2L, 1L, 1L, 2L, 1L),
  p4 = c(5L, 5L, 4L, 5L, 4L)), class = "data.frame", row.names = c(NA, -5L))
View(tempDF)
lLevels<-c("d1")
lContinuum<-c("p_A", "p_B", "p_C")
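If the column names follow a pattern, the two vectors could also be built with a regular expression instead of being typed out. A minimal sketch, assuming the columns to summarise all start with "p_" and the grouping column is exactly "d1"; the `cols` vector is a stand-in for `names(tempDF)` and is introduced here only for illustration:

```r
# Column names copied from tempDF; with the real data this would be names(tempDF).
cols <- c("d1", "d2", "d3", "d4", "p_A", "p_B", "p_C", "p4")

# Names starting with "p_" become the columns to summarise ("p4" does not match).
lContinuum <- grep("^p_", cols, value = TRUE)

# The grouping column is matched by its exact name.
lLevels <- grep("^d1$", cols, value = TRUE)

print(lContinuum)
print(lLevels)
```

Both results are plain character vectors, so they can be passed on wherever column names are expected.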
My attempts:
setDT(tempDF)[ , list(group_means = mean(eval((paste0(lContinuum)))), by=eval((paste0(lLevels))))]
   group_means by
1:          NA d1
Warning message:
In mean.default(eval((paste0(lContinuum)))) :
argument is not numeric or logical: returning NA
But a single variable works:
setDT(tempDF)[ , list(group_means = mean(p_A)), by=eval((paste0(lLevels)))]
setDT(tempDF)[ , list(group_means = mean(p_B)), by=eval((paste0(lLevels)))]
setDT(tempDF)[ , list(group_means = mean(p_C)), by=eval((paste0(lLevels)))]
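One possible way to extend the single-variable calls to every column in lContinuum is data.table's `.SD` together with `.SDcols`, which takes a character vector of column names. This is a sketch under the assumption that a plain mean of each column per group is wanted; the data is reduced to the columns involved, and the `_mean` suffix is only a naming choice:

```r
library(data.table)

# Reduced copy of the example data (only the columns used here).
tempDF <- data.frame(
  d1  = c("A", "B", "C", "A", "C"),
  p_A = c(1L, 3L, 2L, 3L, 2L),
  p_B = c(3L, 4L, 3L, 3L, 4L),
  p_C = c(2L, 1L, 1L, 2L, 1L)
)
lLevels    <- c("d1")
lContinuum <- c("p_A", "p_B", "p_C")

dt <- as.data.table(tempDF)

# .SDcols restricts .SD to the columns named in lContinuum;
# by accepts the character vector of grouping column names.
res <- dt[, lapply(.SD, mean), by = lLevels, .SDcols = lContinuum]

# Rename the summarised columns with a "_mean" suffix.
setnames(res, lContinuum, paste0(lContinuum, "_mean"))
print(res)
```

Because both `by` and `.SDcols` receive ordinary character vectors, no `eval()` or `paste0()` wrapping should be needed.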
Expected output:
tempDF %>%
  group_by(d1) %>%
  summarise(p_A_mean = mean(p_A), p_B_mean = mean(p_B), p_C_mean = mean(p_C))
# A tibble: 3 x 4
  d1    p_A_mean p_B_mean p_C_mean
  <chr>    <dbl>    <dbl>    <dbl>
1 A            2      3          2
2 B            3      4          1
3 C            2      3.5        1
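The same table might be produced on the tidyverse side without naming each column, by passing the character vectors through `across()` and `all_of()`. This sketch assumes dplyr 1.0.0 or later (where `across()` and the `.groups` argument are available) and again uses a reduced copy of the data:

```r
library(dplyr)

# Reduced copy of the example data (only the columns used here).
tempDF <- data.frame(
  d1  = c("A", "B", "C", "A", "C"),
  p_A = c(1L, 3L, 2L, 3L, 2L),
  p_B = c(3L, 4L, 3L, 3L, 4L),
  p_C = c(2L, 1L, 1L, 2L, 1L)
)
lLevels    <- c("d1")
lContinuum <- c("p_A", "p_B", "p_C")

res <- tempDF %>%
  # all_of() selects the columns whose names are stored in the vector.
  group_by(across(all_of(lLevels))) %>%
  # .names builds the output names, e.g. "p_A" becomes "p_A_mean".
  summarise(across(all_of(lContinuum), mean, .names = "{.col}_mean"),
            .groups = "drop")
print(res)
```

If the columns should be picked by pattern rather than from a vector, `matches("^p_")` could take the place of `all_of(lContinuum)`.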
【Discussion】:
Tags: r regex data.table tidyverse summarize