【Question title】: Unable to create a grouped summary dataset in R
【Posted】: 2019-08-19 17:17:27
【Question】:

I'm having trouble creating grouped summary statistics.

Below is the code I'm using to create the summary dataset:

library(dplyr)

#sample dataset 
D           A                 B             C        VAL        PD
Agriculture Services    Bought with Cash 01OCT2014   10      0.4435714
Agriculture Grain       Bought with Cash 01OCT2014   10      0.7266667
Agriculture Livestock   Bought with Cash 01OCT2014   10      1.1372414
Agriculture Fr, ve      Bought with Cash 01OCT2014   10      1.5170370
Agriculture Livestock   Financed         01OCT2014   76      1.1372414
Agriculture Fr, ve      Financed         01OCT2014   76      1.5170370
Agriculture Grain       Financed         01OCT2014   76      0.7266667
Agriculture Services    Financed         01OCT2014   76      0.4435714
Agriculture Services    Insurance        01OCT2014   10      0.4435714
Agriculture Livestock   Insurance        01OCT2014   10      1.1372414

groupDF<-select.other %>% 
   group_by(.dots=c("A","B","C")) %>% 
   summarize(PD=mean(PD),VAL=mean(VAL))

I want the resulting dataset to contain the mean PD and mean VAL grouped by A, B and C:

    A       B                 C         PD      VAL     
Services  Bought with Cash   01OCT2017   1      10

Instead, I get a single ungrouped row:

PD           VAL
0.8574816   6059877

Any help or guidance would be greatly appreciated.

【Comments】:

  • Column names don't need quotes: group_by(A, B, C)
  • I tried that too. It gives me an error saying it can't find column A

Tags: r dplyr summary


【Solution 1】:

If the column names are given as strings, we can use group_by_at:

library(dplyr)
select.other %>% 
      group_by_at(vars(c("A","B","C"))) %>% 
       summarize(PD=mean(PD),VAL=mean(VAL))
# A tibble: 10 x 5
# Groups:   A, B [10]
#   A         B                C            PD   VAL
#   <chr>     <chr>            <chr>     <dbl> <dbl>
# 1 Fr, ve    Bought with Cash 01OCT2014 1.52     10
# 2 Fr, ve    Financed         01OCT2014 1.52     76
# 3 Grain     Bought with Cash 01OCT2014 0.727    10
# 4 Grain     Financed         01OCT2014 0.727    76
# 5 Livestock Bought with Cash 01OCT2014 1.14     10
# 6 Livestock Financed         01OCT2014 1.14     76
# 7 Livestock Insurance        01OCT2014 1.14     10
# 8 Services  Bought with Cash 01OCT2014 0.444    10
# 9 Services  Financed         01OCT2014 0.444    76
#10 Services  Insurance        01OCT2014 0.444    10

Another option is to convert the strings to symbols with rlang::syms and then unquote-splice them into group_by with !!!:

select.other %>% 
      group_by(!!! rlang::syms(c("A","B","C"))) %>% 
       summarize(PD=mean(PD),VAL=mean(VAL))
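
For readers on a newer dplyr, here is a minimal sketch of the same idea using across() with all_of(). It assumes dplyr 1.0.0 or later, which is newer than the 0.8.3 used in this thread. The small data frame df and the name group_cols are hypothetical stand-ins, not part of the original answer.

```r
library(dplyr)

# Hypothetical stand-in for select.other, for illustration only
df <- data.frame(
  A   = c("Grain", "Grain", "Services"),
  B   = c("Financed", "Financed", "Insurance"),
  C   = c("01OCT2014", "01OCT2014", "01OCT2014"),
  VAL = c(10, 76, 10),
  PD  = c(0.5, 1.5, 0.4)
)

# Grouping columns held as a character vector
group_cols <- c("A", "B", "C")

# across(all_of(...)) selects the grouping columns by name;
# .groups = "drop" returns an ungrouped result
result <- df %>%
  group_by(across(all_of(group_cols))) %>%
  summarize(PD = mean(PD), VAL = mean(VAL), .groups = "drop")

print(result)
```

This gives one row per distinct combination of A, B and C, as the group_by_at and !!! versions do.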

Data

select.other <- structure(list(D = c("Agriculture", "Agriculture", "Agriculture", 
"Agriculture", "Agriculture", "Agriculture", "Agriculture", "Agriculture", 
"Agriculture", "Agriculture"), A = c("Services", "Grain", "Livestock", 
"Fr, ve", "Livestock", "Fr, ve", "Grain", "Services", "Services", 
"Livestock"), B = c("Bought with Cash", "Bought with Cash", "Bought with Cash", 
"Bought with Cash", "Financed", "Financed", "Financed", "Financed", 
"Insurance", "Insurance"), C = c("01OCT2014", "01OCT2014", "01OCT2014", 
"01OCT2014", "01OCT2014", "01OCT2014", "01OCT2014", "01OCT2014", 
"01OCT2014", "01OCT2014"), VAL = c(10L, 10L, 10L, 10L, 76L, 76L, 
76L, 76L, 10L, 10L), PD = c(0.4435714, 0.7266667, 1.1372414, 
1.517037, 1.1372414, 1.517037, 0.7266667, 0.4435714, 0.4435714, 
1.1372414)), class = "data.frame", row.names = c(NA, -10L))
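
If the calls above still collapse everything into a single row, one common cause of that symptom is that another attached package (plyr is a frequent one) masks dplyr's summarize, so the grouping is ignored. The thread never states what the asker's actual problem was, so this is only a diagnostic sketch under that assumption. The data frame df and the variable names are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for select.other, for illustration only
df <- data.frame(
  A   = c("Grain", "Grain", "Services"),
  B   = c("Financed", "Financed", "Insurance"),
  VAL = c(10, 76, 10),
  PD  = c(0.5, 1.5, 0.4)
)

# Which package does the bare name `summarize` resolve to right now?
# Anything other than "dplyr" means another package is masking it.
summarize_source <- environmentName(environment(summarize))
print(summarize_source)

# Namespace-qualified calls use dplyr's versions regardless of load order
safe_result <- df %>%
  dplyr::group_by(A, B) %>%
  dplyr::summarize(PD = mean(PD), VAL = mean(VAL))

print(safe_result)
```

Running conflicts() in the session lists every masked name, which is a quick way to check for this kind of clash.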

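【Comments】:

  • Hi, thanks for the answer. I tried both options, but they still give me the output I posted in my question: PD VAL 0.8574816 6059877
  • @Lonewolf I can't reproduce your problem with dplyr_0.8.3
  • @Lonewolf I've updated the answer with the data structure I used and the results I got
  • Thanks. The cause turned out to be an unrelated issue, and I now get the expected results.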