【Question title】: Rowsums conditional on column name in a loop
【Posted】: 2017-04-20 12:01:10
【Question】:

This is a follow-up to this question: Rowsums conditional on column name

My data frame is named wiot and looks like this:

VAR1 VAR2 AUS1 ... AUS56 BEL1 ... BEL56 NLD1 ... NLD56
A    D    23   ... 99    0    ... 444   123  ... 675
B    D    55   ... 6456  0    ... 557   567  ... 4345

I want to compute the row sums for each of the variable groups AUS, BEL, and NLD, and then drop the old variables. Like this:

wiot <- wiot %>% 
  mutate(AUS = rowSums(.[grep("AUS", names(.))])) %>% 
  mutate(BEL = rowSums(.[grep("BEL", names(.))])) %>% 
  mutate(NLD = rowSums(.[grep("NLD", names(.))])) %>% 
  select(VAR1, VAR2, AUS, BEL, NLD)

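Of course, there are many more variable groups than these three (43, to be exact). Is there a convenient way to do this without writing 43 mutate calls?

【Comments】:

  • imo, you should convert your data to long (tidy) format, which would make the calculation easier
  • I agree, your data is not "tidy", so don't expect dplyr to work well.

Tags: r dataframe dplyr rowsum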


【Answer 1】:

It is easier to reshape from wide to long format (gather), summarise, and then, if needed, reshape back to wide format (spread):

library(dplyr)
library(tidyr)

# dataframe from @989 http://stackoverflow.com/a/43519062
df1 %>% 
  gather(key = myKey, value = myValue, -c(VAR1, VAR2)) %>% 
  mutate(myGroup = gsub("\\d", "", myKey)) %>% 
  group_by(VAR1, VAR2, myGroup) %>% 
  summarise(mySum = sum(myValue)) %>% 
  spread(key = myGroup, value = mySum)

# Source: local data frame [2 x 5]
# Groups: VAR1, VAR2 [2]
# 
#     VAR1   VAR2   AUS   BEL   NLD
# * <fctr> <fctr> <int> <int> <int>
# 1      A      D   122   444   798
# 2      B      D  6511   557  4912
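
For readers on newer package versions, the same wide-to-long-to-wide idea can be sketched with `pivot_longer()` and `pivot_wider()`, which replace `gather()` and `spread()` in tidyr 1.0.0 and later. This is only a sketch: it assumes tidyr >= 1.0.0 and dplyr >= 1.0.0 (for the `.groups` argument), rebuilds the small example data inline, and introduces the name `res` purely for illustration. Like the code above, it assumes that VAR1 and VAR2 together identify a row; rows sharing both values would be summed together.

```r
library(dplyr)
library(tidyr)

# Small example data with the same layout as in the question
df1 <- data.frame(
  VAR1 = c("A", "B"), VAR2 = c("D", "D"),
  AUS1 = c(23L, 55L), AUS56 = c(99L, 6456L),
  BEL1 = c(0L, 0L), BEL56 = c(444L, 557L),
  NLD1 = c(123L, 567L), NLD56 = c(675L, 4345L)
)

res <- df1 %>%
  # wide -> long: one row per (VAR1, VAR2, original column)
  pivot_longer(cols = -c(VAR1, VAR2),
               names_to = "myKey", values_to = "myValue") %>%
  # drop the digits so that AUS1 ... AUS56 all map to "AUS"
  mutate(myGroup = gsub("\\d", "", myKey)) %>%
  group_by(VAR1, VAR2, myGroup) %>%
  summarise(mySum = sum(myValue), .groups = "drop") %>%
  # long -> wide: one column per group
  pivot_wider(names_from = myGroup, values_from = mySum)

res
```

With 43 groups nothing changes in the pipeline, since the group name is derived from the column names rather than listed by hand.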

【Comments】:

    【Answer 2】:

    You can try this:

    vec <- c("AUS", "BEL", "NLD")
    cbind(df[,grep("VAR", names(df))], 
          sapply(vec, function(x) rowSums(df[,grep(x, names(df))])))
    
    #  VAR1 VAR2  AUS BEL  NLD
    #1    A    D  122 444  798
    #2    B    D 6511 557 4912
    

    You only need to fill vec with the prefixes of your 43 variable groups.
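
Rather than typing the 43 prefixes by hand, `vec` could probably be derived from the column names. The sketch below assumes that every column to be summed is named as uppercase letters followed by digits and that the identifier columns start with "VAR"; that naming rule, and the helper names `is_group_col`, `sums` and `res`, are assumptions made for illustration. It also anchors the pattern, so that a prefix cannot accidentally match a column belonging to another group.

```r
# Example data with the same layout as in the question
df <- data.frame(
  VAR1 = c("A", "B"), VAR2 = c("D", "D"),
  AUS1 = c(23L, 55L), AUS56 = c(99L, 6456L),
  BEL1 = c(0L, 0L), BEL56 = c(444L, 557L),
  NLD1 = c(123L, 567L), NLD56 = c(675L, 4345L)
)

# Columns to sum: letters followed by digits, excluding the VAR* columns
# (assumed naming rule, adjust to the real data)
is_group_col <- grepl("^[A-Z]+\\d+$", names(df)) & !grepl("^VAR", names(df))

# Unique prefixes found in the data
vec <- unique(sub("\\d+$", "", names(df)[is_group_col]))

# One column of row sums per prefix; drop = FALSE keeps a data frame
# even when a group has a single column
sums <- sapply(vec, function(x) {
  rowSums(df[, grepl(paste0("^", x, "\\d+$"), names(df)), drop = FALSE])
})

res <- cbind(df[, !is_group_col, drop = FALSE], sums)
res
```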


    Data

    df <- structure(list(VAR1 = structure(1:2, .Label = c("A", "B"), class = "factor"), 
        VAR2 = structure(c(1L, 1L), .Label = "D", class = "factor"), 
        AUS1 = c(23L, 55L), AUS56 = c(99L, 6456L), BEL1 = c(0L, 0L
        ), BEL56 = c(444L, 557L), NLD1 = c(123L, 567L), NLD56 = c(675L, 
        4345L)), .Names = c("VAR1", "VAR2", "AUS1", "AUS56", "BEL1", 
    "BEL56", "NLD1", "NLD56"), class = "data.frame", row.names = c(NA, 
    -2L))
    

    【Comments】:

      【Answer 3】:

      Here is another version that uses some tidyverse functionality without gathering

      library(tidyverse)
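      fSumN1 <- function(dat, pat){
            # regex matching every column that belongs to one of the groups
            pat1 <- paste(pat, collapse="|")
            # columns outside the groups are kept unchanged
            dat1 <- dat %>%
                        select(-matches(pat1))
            dat %>%
                 select(matches(pat1)) %>%
                 # strip the numeric suffix to split the columns by group
                 split.default(sub("\\d+", "", names(.))) %>%
                 map_df(rowSums) %>%
                 # append "_sum" to each group's own name
                 rename_at(.vars = pat, .funs = function(x) paste0(x, "_sum")) %>%
                 bind_cols(dat1, .)
      
       }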
      
      
      
      pats <- c("AUS", "AUT")
      fSumN1(dfN, pats)
      #  VAR1 VAR2 VAR3 VAR4 AUS_sum AUT_sum
      #1    A    D    0  FCK    1246    3076
      #2    B    D    0  XYC    6678    3349
      

      Data

      dfN <- structure(list(VAR1 = c("A", "B"), VAR2 = c("D", "D"), AUS1 = c(23L, 
      55L), AUS2 = c(234L, 76L), AUS3 = c(34L, 55L), AUS4 = c(856L,  
      36L), AUS56 = c(99L, 6456L), VAR3 = c(0L, 0L), VAR4 = c("FCK", 
      "XYC"), AUT1 = c(598L, 774L), AUT2 = c(992L, 503L), AUT3 = c(819L, 
      944L), AUT4 = c(368L, 717L), AUT56 = c(299L, 411L)), .Names = c("VAR1", 
      "VAR2", "AUS1", "AUS2", "AUS3", "AUS4", "AUS56", "VAR3", "VAR4", 
      "AUT1", "AUT2", "AUT3", "AUT4", "AUT56"), row.names = c(NA, -2L
      ), class = "data.frame")
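
If `rename_at` is not available in the installed dplyr, the same split-by-prefix idea can be sketched in base R alone. This is an illustration, not the answer's own method: it uses a cut-down version of the data above, and the names `in_group`, `pieces`, `sums` and `res` are introduced here only for the example.

```r
# Cut-down example data (a subset of the columns of dfN)
dfN <- data.frame(
  VAR1 = c("A", "B"),
  AUS1 = c(23L, 55L), AUS2 = c(234L, 76L),
  VAR3 = c(0L, 0L),
  AUT1 = c(598L, 774L), AUT2 = c(992L, 503L),
  stringsAsFactors = FALSE
)
pats <- c("AUS", "AUT")

# Columns that belong to one of the groups: a prefix followed by digits
in_group <- grepl(paste0("^(", paste(pats, collapse = "|"), ")\\d+$"),
                  names(dfN))

# Split the group columns by prefix, then sum each piece row-wise
pieces <- split.default(dfN[in_group],
                        sub("\\d+$", "", names(dfN)[in_group]))
sums <- as.data.frame(lapply(pieces, rowSums))
names(sums) <- paste0(names(sums), "_sum")

# Untouched columns first, then the new sums
res <- cbind(dfN[!in_group], sums)
res
```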
      

      【Comments】:

      • I get an error when running fSumN1: Error in function_list[[i]](value) : could not find function "rename_at"
      • @Laubsauger I am using the devel version of dplyr (soon to be released as 0.6.0). Please check your dplyr version.