【Question title】: R: aggregate by multiple factor variables
【Posted】: 2018-06-24 07:01:36
【Question】:

I have a data frame like this:

    data.frame(home=c("A","B","C","A","C"),weight=c(0.1,0.25,0.36,0.14,0.2),region=c("north","south","east","North","south"))

Home  Weight  Region
A     0.1     North
B     0.25    South
C     0.36    East
A     0.14    North
C     0.2     South

I want to aggregate my data.frame by two factor variables and sum the third variable. The result would be:

    data.frame(home=c("A","B","C"),north=c(0.24,0,0),south=c(0,0.25,0.2),east=c(0,0,0.36))

Home  North  South  East
A     0.24   0      0
B     0      0.25   0
C     0      0.2    0.36

I am trying to use a quick and simple function such as aggregate, but maybe the only solution is to build the data.frame I want by hand.

【Comments】:

    Tags: r


    【Solution 1】:

    There are basically two steps: (1) aggregate the sums; (2) reshape the result into a two-way table.

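    library(dplyr)
    df <-  data.frame(home=c("A","B","C","A","C"),weight=c(0.1,0.25,0.36,0.14,0.2),region=c("north","south","east","North","south"))
    # Capitalize the region labels so that "north" and "North" fall into the same group
    df$region <- Hmisc::capitalize(as.character(df$region))
    
    # Step 1: sum weight within each home/region pair
    df_sum <- df %>% group_by(home, region) %>% summarize(weight_sum = sum(weight, na.rm=TRUE))
    
    # Step 2: cast to a two-way table, naming the value column and the aggregation function explicitly
    reshape2::dcast(df_sum, home ~ region, fun.aggregate = function(V) sum(V, na.rm=TRUE), value.var = "weight_sum")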
    

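    The second sum is redundant and not strictly necessary. I include it only to avoid the extra step of converting NA to 0: dcast fills empty cells with the result of applying the aggregation function to a zero-length vector, which is 0 for sum.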
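
The "extra step" mentioned above can be seen in a minimal base R sketch. It is not part of the original answer: it swaps in `tapply` for the dplyr/reshape2 pipeline and lower-cases the region labels instead of capitalizing them, so treat those choices as assumptions.

```r
# Same data as in the question
df <- data.frame(
  home   = c("A", "B", "C", "A", "C"),
  weight = c(0.1, 0.25, 0.36, 0.14, 0.2),
  region = c("north", "south", "east", "North", "south")
)
# Assumption: unify the labels by lower-casing them
df$region <- tolower(as.character(df$region))

# tapply sums weight within each home/region cell; empty cells come back as NA
wide <- tapply(df$weight, list(home = df$home, region = df$region), sum)

# The extra step: replace the NA cells with 0
wide[is.na(wide)] <- 0
wide

# sum() over an empty vector is 0, which is why the sum inside dcast makes this step unnecessary
sum(numeric(0))
```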

    【Discussion】:

      【Solution 2】:

      I think this will do it; h01 is the result you want.

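      library(plyr)  # provides ddply(), .() and summarize()
      x00<-data.frame(home=c("A","B","C","A","C"),weight=c(0.1,0.25,0.36,0.14,0.2),
                      region=c("north","South","East","North","South"),stringsAsFactors = FALSE)
      x00$region<-tolower(x00$region)  # unify the case of the region labels
      # Sum weight for every region/home pair
      x01<-ddply(x00,.(region,home),summarize,result=sum(weight))
      # Zero-filled result table: one row per home, one column per region
      h01<-data.frame(north=c(0,0,0),south=c(0,0,0),east=c(0,0,0),row.names = c("A","B","C"))
      # Copy each sum into its cell, addressed by row name and column name
      for (x in seq_len(nrow(x01))){
        h01[x01$home[x],x01$region[x]]=x01$result[x]
      }
      
      # Move the row names into a Home column
      h01$Home=row.names(h01)
      row.names(h01)<-NULL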
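
The cell-by-cell loop can also be written as a single assignment. The following is only a sketch under stated assumptions: it uses base R `aggregate` as a stand-in for `ddply`, and it stores the result in a matrix rather than a data frame so that a two-column character matrix can address cells by row and column name.

```r
x00 <- data.frame(
  home   = c("A", "B", "C", "A", "C"),
  weight = c(0.1, 0.25, 0.36, 0.14, 0.2),
  region = c("north", "South", "East", "North", "South"),
  stringsAsFactors = FALSE
)
x00$region <- tolower(x00$region)

# Base R stand-in for the ddply step: one row per home/region pair
x01 <- aggregate(weight ~ home + region, data = x00, FUN = sum)

# Zero-filled matrix with named rows (homes) and columns (regions)
h01 <- matrix(0, nrow = 3, ncol = 3,
              dimnames = list(c("A", "B", "C"), c("north", "south", "east")))

# Each row of the index matrix is a (row name, column name) pair
idx <- cbind(as.character(x01$home), as.character(x01$region))
h01[idx] <- x01$weight
h01
```

If a data frame is needed afterwards, `as.data.frame(h01)` converts the matrix and keeps the home labels as row names.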
      

      【Discussion】:

        【Solution 3】:

        Data

        df <-  data.frame(
            home = c("A", "B", "C", "A", "C"),
            weight = c(0.1, 0.25, 0.36, 0.14, 0.2),
            region = c("north", "south", "east", "North", "south")
          )
        

      • tidyr
        library(tidyr)
        spread(df, region, weight, fill = 0)
        

      • reshape2
        library(reshape2)
        dcast(df, home ~ region, value.var = "weight", fill = 0)
        

      • base R
        # xtabs
        xtabs(weight ~ home + region, data = df) 
        
        # reshape
        df_wide <-reshape(df, idvar ='home', timevar ='region', direction ='wide')
        df_wide[is.na(df_wide)] <- 0
        

        Output

          home east north North south
        1    A 0.00   0.1  0.14  0.00
        2    B 0.00   0.0  0.00  0.25
        3    C 0.36   0.0  0.00  0.20
        
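      • 【Discussion】:

        • The tidyr version does not work. With duplicate identifiers for rows, the reshape2 version counts the rows instead of summing them. xtabs works perfectly.
        • @Aurélien I am confused by your comment, because I get the same output from every method when I use the data frame provided. Anyway, I am glad it worked.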
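
The disagreement above probably comes down to whether the home/region pairs are unique. With the data exactly as posted, "north" and "North" are different labels, so no pair repeats; once the labels are unified, (A, north) occurs twice. A minimal base R sketch of that duplicate case, assuming the labels are lower-cased first (the `result` name is only illustrative):

```r
df <- data.frame(
  home   = c("A", "B", "C", "A", "C"),
  weight = c(0.1, 0.25, 0.36, 0.14, 0.2),
  region = c("north", "south", "east", "North", "south")
)
# After this step the pair (A, north) appears in two rows
df$region <- tolower(as.character(df$region))

# xtabs sums the left-hand side within each cell and fills empty cells with 0
tab <- xtabs(weight ~ home + region, data = df)

# Convert the contingency table to a plain data frame with one column per region
result <- as.data.frame.matrix(tab)
result
```

With the same lower-cased data, `spread` stops with an error about duplicate identifiers, and `dcast` without an explicit `fun.aggregate` falls back to `length`, which matches the behaviour described in the first comment.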