【Question title】: Complex Data Frame calculations in R
【Posted】: 2018-10-31 19:30:43
【Question description】:

I am currently importing two tables that look like this (in their most basic form):

Table 1
State Month Account           Value
NY    Jan   Expected Sales    1.04
NY    Jan   Expected Expenses 1.02

Table 2
State Month Account    Value
NY    Jan   Sales      1,000
NY    Jan   Customers  500
NY    Jan   F Expenses 1,000
NY    Jan   V Expenses 100

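My end goal is to create a third data frame that keeps the first two columns (State and Month) and computes a fourth column (Value) from the following formulas: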

NextYearExpenses = (t2 F Expenses + t2 V Expenses)* t1 Expected Expenses
NextYearSales = (t2 sales) * t1 Expected Sales

So my desired output would look like this:

State Month New Account Value
NY    Jan   Sales       1,040
NY    Jan   Expenses    1,122

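I am relatively new to R, and I thought an ifelse statement might be my best option. I have tried merging the tables and computing the result with simple column functions, but have made no real progress.

Any suggestions?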

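【Question comments】:

  • Please share your data and a reproducible example so people can help. What have you tried so far? stackoverflow.com/questions/5963269/…
  • Thank you. I will try to reformat my question and provide the necessary information.

Tags: r dplyr data-manipulation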


【Solution 1】:

You will probably need to do some data wrangling, but nothing fancy.

library(dplyr)
Table1<-tibble(State=c("NY","NY"), Month=c("Jan","Jan"), Account=c("Expected Sales", "Expected Expenses"), Value=c(1.04,1.02))

Table2<-tibble(State=c("NY","NY","NY","NY"), Month=c("Jan","Jan","Jan","Jan"), Account=c("Sales", "Customers", "F Expenses","V Expenses"), Value=c(1000,500,1000,100))

The first thing I would do is rename the accounts to a common name, i.e. Expenses, which will help when I join to Table1 later.

Table2$Account[Table2$Account=="F Expenses"]<-"Expenses"
Table2$Account[Table2$Account=="V Expenses"]<-"Expenses"

Then I use group_by to group by State, Month, and Account, and sum the values.

Table2 <- Table2 %>%
  group_by(State, Month, Account) %>%
  summarise(Tot_Value = sum(Value)) %>%
  ungroup()
head(Table2)

## State Month Account   Tot_Value
##  <chr> <chr> <chr>         <dbl>
## 1 NY    Jan   Customers       500
## 2 NY    Jan   Expenses       1100
## 3 NY    Jan   Sales          1000

Then rename the accounts in Table1 in a similar way.

Table1$Account[Table1$Account=="Expected Sales"]<-"Sales"
Table1$Account[Table1$Account=="Expected Expenses"]<-"Expenses"

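Join them into a third table, Table3. The join keys are spelled out so the join does not depend on dplyr guessing the shared columns.

Table3 <- left_join(Table1, Table2, by = c("State", "Month", "Account"))

Use mutate to do the required calculation.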

Table3 <- Table3 %>% mutate(Value2=Value*Tot_Value)
head(Table3)

## # A tibble: 2 x 6
##   State Month Account  Value Tot_Value Value2
##   <chr> <chr> <chr>    <dbl>     <dbl>  <dbl>
## 1 NY    Jan   Sales     1.04      1000   1040
## 2 NY    Jan   Expenses  1.02      1100   1122
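
The steps above can also be sketched as one pipeline that leaves the input tables unmodified. This is only a sketch under a few assumptions: the names totals and multipliers are made up for illustration, the sub() patterns assume the account labels are exactly those shown in the question, and the .groups argument needs dplyr 1.0.0 or later.

```r
library(dplyr)

Table1 <- tibble(State = c("NY", "NY"), Month = c("Jan", "Jan"),
                 Account = c("Expected Sales", "Expected Expenses"),
                 Value = c(1.04, 1.02))
Table2 <- tibble(State = rep("NY", 4), Month = rep("Jan", 4),
                 Account = c("Sales", "Customers", "F Expenses", "V Expenses"),
                 Value = c(1000, 500, 1000, 100))

# Collapse "F Expenses" and "V Expenses" into "Expenses", then total per group
# (totals is a hypothetical helper name)
totals <- Table2 %>%
  mutate(Account = sub("^[FV] Expenses$", "Expenses", Account)) %>%
  group_by(State, Month, Account) %>%
  summarise(Tot_Value = sum(Value), .groups = "drop")

# Strip the "Expected " prefix so the multipliers share the same Account key
# (multipliers is a hypothetical helper name)
multipliers <- Table1 %>%
  mutate(Account = sub("^Expected ", "", Account))

# inner_join drops accounts without a multiplier, such as Customers
Table3 <- multipliers %>%
  inner_join(totals, by = c("State", "Month", "Account")) %>%
  mutate(Value = Value * Tot_Value) %>%
  select(State, Month, `New Account` = Account, Value)

print(Table3)
```

Because the grouping and the join both use State and Month, the same code should carry over to more states and months, as long as every state/month pair has its multipliers in Table1.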

【Comments】:

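    【Solution 2】:

    Here is what I did with dplyr and tidyr. First, I combined your initial tables into one long-format table with rbind. Since each Account value is a unique identifier, they do not need to be separate tables. Next I grouped by State and Month with group_by, assuming that eventually you will have a variety of states and months. Then I used summarise to create two new columns from the Account values you specified. Finally, to get the long format you wanted, I used gather from tidyr to convert from wide to long. You can break these commands into smaller chunks by cutting the pipeline off after any %>%, to get a better sense of what each step does.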

    library(dplyr)
    library(tidyr)
    # df and df2 are the two tables from the question (Table 1 and Table 2)
    rbind(df,df2) %>%
      group_by(State,Month) %>%
      summarise(Expenses = (Value[which(Account == "F Expenses")] + Value[which(Account == "V Expenses")]) * Value[which(Account == "Expected Expenses")],
                Sales = Value[which(Account == "Sales")] * Value[which(Account == "Expected Sales")]) %>%
      gather(New_Account,Value, c(Expenses,Sales))
    
    
    # A tibble: 2 x 4
    # Groups:   State [1]
    #  State Month New_Account Value
    #  <chr> <chr> <chr>       <dbl>
    #1 NY    Jan   Expenses     1122
    #2 NY    Jan   Sales        1040
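
The answer above does not show how df and df2 are built, and gather has since been superseded in tidyr by pivot_longer. A self-contained variant might look like the following. It is a sketch, not the answerer's code: the df and df2 definitions are assumed stand-ins for the asker's tables, result is a made-up name, and pivot_longer needs tidyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Assumed stand-ins for the asker's two tables (the answer calls them df and df2)
df <- data.frame(State = "NY", Month = "Jan",
                 Account = c("Expected Sales", "Expected Expenses"),
                 Value = c(1.04, 1.02), stringsAsFactors = FALSE)
df2 <- data.frame(State = "NY", Month = "Jan",
                  Account = c("Sales", "Customers", "F Expenses", "V Expenses"),
                  Value = c(1000, 500, 1000, 100), stringsAsFactors = FALSE)

result <- bind_rows(df, df2) %>%
  group_by(State, Month) %>%
  summarise(
    # sum() over both expense accounts, scaled by the expected-expenses factor
    Expenses = sum(Value[Account %in% c("F Expenses", "V Expenses")]) *
      Value[Account == "Expected Expenses"],
    Sales = Value[Account == "Sales"] * Value[Account == "Expected Sales"],
    .groups = "drop"
  ) %>%
  # Wide to long: one row per new account
  pivot_longer(c(Expenses, Sales), names_to = "New_Account", values_to = "Value")

print(result)
```

Note that this relies on each State/Month group containing exactly one row per account. A group with a missing or duplicated account would make the logical subsetting return zero or several values.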
    

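    【Comments】:

      【Solution 3】:

      I would suggest you look into the concept of "tidy data", because there are some real challenges in working with data in the structure you currently have. For example, creating t3 should only take 2-3 lines of code; everything else here exists only to work around your data layout: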

      library(tidyverse)
      
      t1 <- data.frame(State = rep("NY", 2),
                       Month = rep(as.Date("2018-01-01"), 2),
                       Account = c("Expected Sales", "Expected Expenses"),
                       Value = c(1.04, 1.02),
                       stringsAsFactors = FALSE)
      
      t2 <- data.frame(State = rep("NY", 4),
                       Month = rep(as.Date("2018-01-01"), 4),
                       Account = c("Sales", "Customers", "F Expenses", "V Expenses"),
                       Value = c(1000, 500, 1000, 100),
                       stringsAsFactors = FALSE)
      
      t3 <- t2 %>% 
        spread(Account, Value) %>% 
        inner_join({
          t1 %>% 
            spread(Account, Value)
        }, by = c("State" = "State", "Month" = "Month")) %>% 
        mutate(NewExpenses = (`F Expenses` + `V Expenses`) * `Expected Expenses`,
               NewSales = Sales * `Expected Sales`) %>% 
        select(State, Month, Sales = NewSales, Expenses = NewExpenses) %>% 
        gather(Sales, Expenses, key = `New Account`, value = Value)
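
In current tidyr, spread and gather are superseded by pivot_wider and pivot_longer. A roughly equivalent sketch using the newer functions follows. It assumes tidyr 1.0.0 or later, and it stacks t1 and t2 before widening instead of joining two wide tables, which is a deliberate simplification and not what the answer above does.

```r
library(dplyr)
library(tidyr)

t1 <- data.frame(State = rep("NY", 2),
                 Month = rep(as.Date("2018-01-01"), 2),
                 Account = c("Expected Sales", "Expected Expenses"),
                 Value = c(1.04, 1.02),
                 stringsAsFactors = FALSE)

t2 <- data.frame(State = rep("NY", 4),
                 Month = rep(as.Date("2018-01-01"), 4),
                 Account = c("Sales", "Customers", "F Expenses", "V Expenses"),
                 Value = c(1000, 500, 1000, 100),
                 stringsAsFactors = FALSE)

t3 <- bind_rows(t1, t2) %>%
  # One row per State/Month, one column per Account
  pivot_wider(names_from = Account, values_from = Value) %>%
  # Keep only the keys and the two projected values
  transmute(State, Month,
            Sales = Sales * `Expected Sales`,
            Expenses = (`F Expenses` + `V Expenses`) * `Expected Expenses`) %>%
  # Back to long format, one row per new account
  pivot_longer(c(Sales, Expenses), names_to = "New Account", values_to = "Value")

print(t3)
```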
      

      【Comments】:
