【Question title】: Summing over columns with an if statement in R
【Posted】: 2014-09-15 23:52:33
【Question】:

I have data that looks like this:

bankname  bankid  year  totass  corresbankname1    corresbankloc1  corresdepoin1  corresbankname2        corresbankloc2  corresdepoin2  corresbankname3     corresbankloc3  corresdepoin3
BankA     1       1881  244789  First Bank         New York        7250.32        Third National Bank    Philadelphia    20218.2        Commercial Bank     Philadelphia    29513.4
BankB     2       1881  195755  National Bank      Pittsburgh      10243.6        Union Trust Company    New York        1851.51        NA                  NA              NA
BankC     3       1881  107736  Mechanics' Bank    New York        13357.8        Wyoming Bank           Wilkes-Barre    17761.2        NA                  NA              NA
BankD     4       1881  170600  Commonwealth Bank  Philadelphia    3.35           Seventh National Bank  Philadelphia    2              NA                  NA              NA
BankE     5       1881  320000  National Bank      New York        351266         Mechanics' Bank        New York        314012         National Park Bank  New York        206580

This can be reproduced with:

bankname <- c("The Anchor Savings Bank of Pittsburgh", "The Arsenal Bank", "The Ashley Savings Bank", "The Bank of America of Philadelphia", "The Bank of Pittsburgh")
bankid <- c(1, 2, 3, 4, 5)
year <- c(1881, 1881, 1881, 1881, 1881)
totass <- c(244789, 195755, 107736, 170600, 32000000)
corresbankname1 <- c("First National Bank", "National Bank of Commerce", "Mechanics' National Bank", "Commonwealth National Bank", "National Bank of Commerce")
corresbankloc1 <- c("Philadelphia", "Pittsburgh", "New York", "Philadelphia", "New York")
corresdepoin1 <- c(7250.32, 10243.6, 13357.8, 3.35, 351266)
corresbankname2 <- c("Third National Bank", "Union Trust Company", "Wyoming National Bank", "Seventh National Bank", "Mechanics' National Bank")
corresbankloc2 <- c("New York", "New York", "Wilkes-Barre", "Philadelphia", "New York")
corresdepoin2 <- c(20218.2, 1851.51, 17761.2, 2, 314012)
corresbankname3 <- c("Commercial National Bank", NA, NA, NA, "National Park Bank")
corresbankloc3 <- c("Philadelphia", NA, NA, NA, "New York")
corresdepoin3 <- c(29513.4, NA, NA, NA, 206580)
# keep the bank names and locations as character columns rather than factors
bankdata <- data.frame(bankname, bankid, year, totass,
                       corresbankname1, corresbankloc1, corresdepoin1,
                       corresbankname2, corresbankloc2, corresdepoin2,
                       corresbankname3, corresbankloc3, corresdepoin3,
                       stringsAsFactors = FALSE)

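This dataset shows how much each bank has deposited (corresdepoin) in other banks (corresbankname) and where those banks are located (corresbankloc). I have 43 sets of corresbankname, corresbankloc and corresdepoin variables.

Because these banks hold deposits in several banks in the same city, I want to know the total invested in each city. So if corresbankloc is New York, I want to generate a new column called "total_New York" holding the total of the matching corresdepoin values. How can I loop over the 43 variables?

For example, BankE has $351266 (corresdepoin1) in National Bank (corresbankname1) in New York (corresbankloc1), $314012 in Mechanics' Bank in New York, and $206580 in National Park Bank in New York. I want a new column called "total_New York" showing that its total investment in New York banks is $871858 (351266 + 314012 + 206580). So what I want is a conditional statement that loops over the corresbankloc columns, checks whether each location is New York or somewhere else, and then adds up the corresponding corresdepoin values to get each bankname's total investment in that city.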
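A minimal base R sketch of the conditional sum described above, looping over the numbered column pairs. It assumes the columns are named corresbankloc1, corresbankloc2, ... and corresdepoin1, corresdepoin2, ... with matching suffixes, and it uses a trimmed copy of the reproducible data (only the name, location and deposit columns). `sum_by_city` is a hypothetical helper name, not an existing function.

```r
# Trimmed copy of the question's reproducible data (assumption: only the
# location and deposit columns are needed for the sum)
bankdata <- data.frame(
  bankname = c("The Anchor Savings Bank of Pittsburgh", "The Arsenal Bank",
               "The Ashley Savings Bank", "The Bank of America of Philadelphia",
               "The Bank of Pittsburgh"),
  corresbankloc1 = c("Philadelphia", "Pittsburgh", "New York", "Philadelphia", "New York"),
  corresdepoin1  = c(7250.32, 10243.6, 13357.8, 3.35, 351266),
  corresbankloc2 = c("New York", "New York", "Wilkes-Barre", "Philadelphia", "New York"),
  corresdepoin2  = c(20218.2, 1851.51, 17761.2, 2, 314012),
  corresbankloc3 = c("Philadelphia", NA, NA, NA, "New York"),
  corresdepoin3  = c(29513.4, NA, NA, NA, 206580),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row, add corresdepoin<i> whenever
# corresbankloc<i> equals `city`, for i = 1, ..., n (n = 43 in the real data)
sum_by_city <- function(data, city) {
  n <- sum(grepl("^corresbankloc[0-9]+$", names(data)))
  total <- numeric(nrow(data))
  for (i in seq_len(n)) {
    loc <- data[[paste0("corresbankloc", i)]]
    dep <- data[[paste0("corresdepoin", i)]]
    # the "if" condition, evaluated for all rows at once; NA never matches
    hit <- !is.na(loc) & loc == city & !is.na(dep)
    total[hit] <- total[hit] + dep[hit]
  }
  total
}

bankdata[["total_New York"]] <- sum_by_city(bankdata, "New York")
bankdata[, c("bankname", "total_New York")]
```

For The Bank of Pittsburgh (BankE's counterpart in the reproducible data) this adds 351266 + 314012 + 206580 = 871858; banks with no New York correspondent get 0.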

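Also, in Stata, if I wanted to do this for several cities, I would define
local city "New York" "Philadelphia" "Pittsburgh"
and loop over them. Is there something similar in R?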
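In R, the closest analogue of a Stata local holding several city names is a plain character vector that a `for` loop iterates over. The sketch below, again on a trimmed copy of the reproducible data and assuming consecutively numbered column pairs, collects the location and deposit columns into two matrices and adds one `total_<city>` column per city with `rowSums()`.

```r
bankdata <- data.frame(
  bankname = c("The Anchor Savings Bank of Pittsburgh", "The Arsenal Bank",
               "The Ashley Savings Bank", "The Bank of America of Philadelphia",
               "The Bank of Pittsburgh"),
  corresbankloc1 = c("Philadelphia", "Pittsburgh", "New York", "Philadelphia", "New York"),
  corresdepoin1  = c(7250.32, 10243.6, 13357.8, 3.35, 351266),
  corresbankloc2 = c("New York", "New York", "Wilkes-Barre", "Philadelphia", "New York"),
  corresdepoin2  = c(20218.2, 1851.51, 17761.2, 2, 314012),
  corresbankloc3 = c("Philadelphia", NA, NA, NA, "New York"),
  corresdepoin3  = c(29513.4, NA, NA, NA, 206580),
  stringsAsFactors = FALSE
)

cities <- c("New York", "Philadelphia", "Pittsburgh")  # like Stata's `local city ...`

# Location and deposit columns side by side as matrices; building the names
# with paste0() keeps slot i of both matrices aligned
n <- sum(grepl("^corresbankloc[0-9]+$", names(bankdata)))
locs <- as.matrix(bankdata[paste0("corresbankloc", seq_len(n))])
deps <- as.matrix(bankdata[paste0("corresdepoin", seq_len(n))])

for (city in cities) {
  is_city <- !is.na(locs) & locs == city  # TRUE where slot i is in `city`
  # deposits outside `city` are multiplied by 0; NA deposits are dropped
  bankdata[[paste0("total_", city)]] <- rowSums(deps * is_city, na.rm = TRUE)
}

bankdata[, c("bankname", paste0("total_", cities))]
```

Because the new names contain a space (e.g. `total_New York`), they have to be accessed with `[[...]]` or backticks rather than a bare `$total_New York`.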

Thanks in advance.

【Discussion】:

  • @H Park It is not clear whether total_New York is the sum over all the corresdepoin columns or the sum of each individual corresdepoin1, corresdepoin2, etc.
  • @H Park I updated the solution. I hope it helps.

Tags: r rstudio


【Solution 1】:

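Another option is to reshape the dataset and then use dplyr. You can write a function that returns the totals either for a chosen subset of cities or for all cities in the data. I'm not sure how efficient it is.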

library(dplyr)

fun1 <- function(data, city = NULL, byloc = TRUE, allcity = TRUE) {
    # Stack the numbered corresdepoin*/corresbankloc* columns into long format;
    # "Bankloc" holds the suffix (1, 2, 3, ...) of the original column pair
    data1 <- reshape(data, idvar = "bankname",
                     varying = list(grep("corresdepoin", colnames(data)),
                                    grep("corresbankloc", colnames(data))),
                     timevar = "Bankloc", direction = "long",
                     v.names = c("corresdepoin", "corresbankloc"))

    # Drop empty correspondent slots and renumber the rows
    data1 <- data1[!is.na(data1$corresbankloc), ]
    row.names(data1) <- 1:nrow(data1)

    # Sum deposits per bank and location; with grouploc = TRUE each
    # column pair (Bankloc) is kept as a separate row
    summariseLoc <- function(data, grouploc = TRUE) {
        if (grouploc) {
            dataF1 <- data %>%
                group_by(bankname, corresbankloc, Bankloc) %>%
                summarise(Total = sum(corresdepoin, na.rm = TRUE))
        } else {
            dataF1 <- data %>%
                group_by(bankname, corresbankloc) %>%
                summarise(Total = sum(corresdepoin, na.rm = TRUE))
        }
        # Label the locations, e.g. "Totalbylocation_New York"
        # (pasting onto dataF1[, 2] would paste a whole tibble column)
        dataF1 %>%
            ungroup() %>%
            mutate(corresbankloc = paste("Totalbylocation", corresbankloc, sep = "_"))
    }

    # Only the cities listed in `city`
    funlocorNot <- function(data, city, grouploc = TRUE) {
        summariseLoc(filter(data, corresbankloc %in% city), grouploc)
    }

    # Every city in the data
    funallCity <- function(data, grouploc = TRUE) {
        summariseLoc(data, grouploc)
    }

    if (!allcity) {
        funlocorNot(data1, city, byloc)
    } else {
        funallCity(data1, byloc)
    }
}

as.data.frame(fun1(bankdata, "New York", byloc = TRUE, allcity = FALSE))
#                                bankname            corresbankloc Bankloc
#1 The Anchor Savings Bank of Pittsburgh Totalbylocation_New York       2
#2                      The Arsenal Bank Totalbylocation_New York       2
#3               The Ashley Savings Bank Totalbylocation_New York       1
#4                The Bank of Pittsburgh Totalbylocation_New York       1
#5                The Bank of Pittsburgh Totalbylocation_New York       2
#6                The Bank of Pittsburgh Totalbylocation_New York       3
#     Total
#1  20218.20
#2   1851.51
#3  13357.80
#4 351266.00
#5 314012.00
#6 206580.00

as.data.frame(fun1(bankdata, "New York", byloc = FALSE, allcity = FALSE))
#                               bankname            corresbankloc     Total
#1 The Anchor Savings Bank of Pittsburgh Totalbylocation_New York  20218.20
#2                      The Arsenal Bank Totalbylocation_New York   1851.51
#3               The Ashley Savings Bank Totalbylocation_New York  13357.80
#4                The Bank of Pittsburgh Totalbylocation_New York 871858.00

# One row per bank and city, for New York and Pittsburgh only
as.data.frame(fun1(bankdata, c("New York", "Pittsburgh"), byloc = FALSE, allcity = FALSE))

# All cities, keeping each correspondent slot (Bankloc) separate
as.data.frame(fun1(bankdata, byloc = TRUE, allcity = TRUE))
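The same reshape-then-summarise idea can also be sketched in base R alone, with `reshape()`, `aggregate()` and `xtabs()` in place of dplyr. This is an alternative to the answer's code, not the answer's own method; it assumes the same consecutive column numbering and uses a trimmed copy of the reproducible data.

```r
bankdata <- data.frame(
  bankname = c("The Anchor Savings Bank of Pittsburgh", "The Arsenal Bank",
               "The Ashley Savings Bank", "The Bank of America of Philadelphia",
               "The Bank of Pittsburgh"),
  corresbankloc1 = c("Philadelphia", "Pittsburgh", "New York", "Philadelphia", "New York"),
  corresdepoin1  = c(7250.32, 10243.6, 13357.8, 3.35, 351266),
  corresbankloc2 = c("New York", "New York", "Wilkes-Barre", "Philadelphia", "New York"),
  corresdepoin2  = c(20218.2, 1851.51, 17761.2, 2, 314012),
  corresbankloc3 = c("Philadelphia", NA, NA, NA, "New York"),
  corresdepoin3  = c(29513.4, NA, NA, NA, 206580),
  stringsAsFactors = FALSE
)

# Long format: one row per bank and correspondent slot ("slot" = 1, 2, 3, ...)
n <- sum(grepl("^corresbankloc[0-9]+$", names(bankdata)))
long <- reshape(bankdata, direction = "long", idvar = "bankname",
                varying = list(paste0("corresdepoin", seq_len(n)),
                               paste0("corresbankloc", seq_len(n))),
                v.names = c("corresdepoin", "corresbankloc"),
                timevar = "slot")
long <- long[!is.na(long$corresbankloc), ]  # drop empty slots

# Total deposits per bank and city (comparable to byloc = FALSE)
totals <- aggregate(corresdepoin ~ bankname + corresbankloc, data = long, FUN = sum)

# One column per city; cells where a bank has no deposits are 0
wide <- xtabs(corresdepoin ~ bankname + corresbankloc, data = long)

totals
wide
```

`totals` has one row per bank and city, while `wide` spreads the same sums into one column per city, which is close to the "total_<city>" layout asked for in the question.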

【Discussion】:

  • Thanks for your help. I have made my question clearer.