【Question title】: Multiplying a year-based vector with a year- and month-based matrix in R
【Posted】: 2019-03-18 18:08:19
【Question description】:

I have two data frames:

df1 

Year  Farm 1  Farm 2  Farm 3
2015    1000    2000    1500
2016    500     2000    1000

df2

Year  Month  Farm 1  Farm 2  Farm 3
2015  Jan    1       1       3
2015  Feb    1       2       1
2016  Jan    2       2       2
2016  Feb    2       1       2

I want to multiply each farm's monthly values in df2 by that farm's annual value from df1 for the same year, so that the output is:

df3

Year    Month   Farm 1      Farm 2      Farm 3
2015    Jan     1000        2000        4500
2015    Feb     1000        4000        1500
2016    Jan     1000        4000        2000
2016    Feb     1000        2000        2000

My years are already in the correct format, but I have been struggling to find a solution using group_by in dplyr. Should I try a different approach?

【Question comments】:

    Tags: r sorting date split dplyr


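    【Solution 1】:

    1) Base R. Assume df1 and df2 are as shown reproducibly in the Note at the end. Merge the two data frames to get the data frame m. Then create a new data frame df3 by replacing every column of df2 except the first two with the product of those df2 columns and the corresponding columns of m. Here m[-(1:ncol(df2))] drops the columns that came from df2, which leaves the Farm columns from df1. No packages are used. This relies on the rows of m lining up with the rows of df2. merge() sorts its result by Year, and df2 is already in Year order here.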

    m <- merge(df2, df1, by = 1)
    df3 <- replace(df2, -(1:2), df2[-(1:2)] * m[-(1:ncol(df2))] )
    

    giving:

    > df3
      Year Month Farm1 Farm2 Farm3
    1 2015   Jan  1000  2000  4500
    2 2015   Feb  1000  4000  1500
    3 2016   Jan  1000  4000  2000
    4 2016   Feb  1000  2000  2000
    

    2) sqldf. If you only have a few farms, you can simply write them all out:

    library(sqldf)
    
    sqldf("select 
             Year, 
             b.Month, 
             a.Farm1 * b.Farm1 Farm1,
             a.Farm2 * b.Farm2 Farm2,
             a.Farm3 * b.Farm3 Farm3
           from df2 b left join df1 a using (Year)")
    

    giving:

      Year Month Farm1 Farm2 Farm3
    1 2015   Jan  1000  2000  4500
    2 2015   Feb  1000  4000  1500
    3 2016   Jan  1000  4000  2000
    4 2016   Feb  1000  2000  2000
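    With many farms, you can generate the select list from the column names instead of typing it out. The sketch below is not part of the original answer. It assumes the column names used in the Note (Year, Farm1, Farm2, ...), and the helper variables farms, cols and query are illustrative names:

```r
library(sqldf)

# Data as in the Note
df1 <- data.frame(Year = c(2015, 2016),
                  Farm1 = c(1000, 500), Farm2 = c(2000, 2000), Farm3 = c(1500, 1000))
df2 <- data.frame(Year = c(2015, 2015, 2016, 2016),
                  Month = c("Jan", "Feb", "Jan", "Feb"),
                  Farm1 = c(1, 1, 2, 2), Farm2 = c(1, 2, 2, 1), Farm3 = c(3, 1, 2, 2))

# Every column of df1 except Year is treated as a farm
farms <- setdiff(names(df1), "Year")

# Build "a.Farm1 * b.Farm1 Farm1, a.Farm2 * b.Farm2 Farm2, ..." from the names
cols <- paste0("a.", farms, " * b.", farms, " ", farms, collapse = ", ")
query <- paste("select Year, b.Month,", cols,
               "from df2 b left join df1 a using (Year)")

df3 <- sqldf(query)
df3
```

    The query is the same one written by hand above. Only the repetitive part of the select list is built with paste0().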
    

    Note

    Lines1 <- "
    Year  Farm1  Farm2  Farm3
    2015    1000    2000    1500
    2016    500     2000    1000"
    
    Lines2 <- "
    Year Month  Farm1 Farm2 Farm3
    2015  Jan    1        1      3
    2015  Feb    1        2      1
    2016  Jan    2        2      2
    2016  Feb    2        1      2"
    
    df1 <- read.table(text = Lines1, header = TRUE)
    df2 <- read.table(text = Lines2, header = TRUE)
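    The merge() step in approach 1 depends on the rows of m lining up with those of df2. The variant below does not depend on row order. It is a sketch, not part of the original answer, and farms and idx are illustrative names. It looks up each df2 row's year in df1 with match() and then multiplies the farm columns directly:

```r
# Data as in the Note above
df1 <- data.frame(Year = c(2015, 2016),
                  Farm1 = c(1000, 500), Farm2 = c(2000, 2000), Farm3 = c(1500, 1000))
df2 <- data.frame(Year = c(2015, 2015, 2016, 2016),
                  Month = c("Jan", "Feb", "Jan", "Feb"),
                  Farm1 = c(1, 1, 2, 2), Farm2 = c(1, 2, 2, 1), Farm3 = c(3, 1, 2, 2))

farms <- setdiff(names(df1), "Year")   # farm columns to multiply
idx <- match(df2$Year, df1$Year)       # matching df1 row for each df2 row

# Element-wise product of equally sized data frames, written back into a copy of df2
df3 <- df2
df3[farms] <- df2[farms] * df1[idx, farms]
df3
```

    Rows of df3 stay in the same order as df2. A year that appears in df2 but not in df1 gives NA products.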
    

    【Discussion】:

    • Works perfectly. Thanks!
    【Solution 2】:

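    Here is an option using a data.table join. Join the second dataset ('df2') with the first ('df1') on the 'Year' column. Multiply .SD (the subset of the data.table for the columns listed in .SDcols) by the corresponding columns from the first dataset, and assign (:=) the result to update the 'Farm' columns of the second dataset. Note that setDT() and := modify df2 by reference, so df2 itself holds the result.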

    library(data.table)
    nm1 <- grep("Farm", names(df1), value = TRUE)
    setDT(df2)[df1, (nm1) := .SD * mget(paste0("i.", names(.SD))), 
               on = .(Year), .SDcols = nm1]
    df2
    #   Year Month Farm1 Farm2 Farm3
    #1: 2015   Jan  1000  2000  4500
    #2: 2015   Feb  1000  4000  1500
    #3: 2016   Jan  1000  4000  2000
    #4: 2016   Feb  1000  2000  2000
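    Because setDT() and := update df2 in place, the original monthly values are overwritten. To keep the input untouched, you can run the same update join on data.table::copy(). The sketch below is not part of the original answer, and dt1, dt2 and dt3 are illustrative names:

```r
library(data.table)

dt1 <- data.table(Year = c(2015, 2016),
                  Farm1 = c(1000, 500), Farm2 = c(2000, 2000), Farm3 = c(1500, 1000))
dt2 <- data.table(Year = c(2015, 2015, 2016, 2016),
                  Month = c("Jan", "Feb", "Jan", "Feb"),
                  Farm1 = c(1, 1, 2, 2), Farm2 = c(1, 2, 2, 1), Farm3 = c(3, 1, 2, 2))

nm1 <- grep("Farm", names(dt1), value = TRUE)

# copy() makes a deep copy, so the update join below leaves dt2 unchanged
dt3 <- copy(dt2)
dt3[dt1, (nm1) := .SD * mget(paste0("i.", names(.SD))),
    on = .(Year), .SDcols = nm1]
dt3
```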
    

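    【Discussion】:

      【Solution 3】:

      I would solve this by converting both data frames to long format, joining them, and then doing the calculation. Here is an example: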

      # Load packages
      library(dplyr)
      library(tidyr)
      
      # Made-up data (random values; set a seed so the example is reproducible)
      set.seed(123)
      df1 = data.frame(Year = 2008:2018,
                       Farm1 = runif(n = 11, min = 0, max = 2000),
                       Farm2 = runif(n = 11, min = 0, max = 2000),
                       Farm3 = runif(n = 11, min = 0, max = 2000))
      
      df2 = expand.grid(Year = 2008:2018,
                        Month = month.abb[1:12]) %>% 
        mutate(Farm1 = runif(n = 132, min = 0, max = 10),
               Farm2 = runif(n = 132, min = 0, max = 10),
               Farm3 = runif(n = 132, min = 0, max = 10))
      
      # Transform data into long format
      df1.long = df1 %>%
        gather(key = Farm, value = AnnualValue, Farm1:Farm3)
      
      df2.long = df2 %>%
        gather(key = Farm, value = Value, Farm1:Farm3)
      
      # Now left_join on Year and Farm, then multiply the columns
      df.comb = left_join(df1.long, df2.long, by = c("Year", "Farm")) %>% 
        mutate(NewValue = Value * AnnualValue)
      
      # Transform back to wide format (if necessary)
      df.comb.wide = df.comb %>% 
        select(-AnnualValue, -Value) %>% # drop values not included in wide format
        spread(key = Farm, value = NewValue)
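      gather() and spread() still work, but since tidyr 1.0.0 they are superseded by pivot_longer() and pivot_wider(). The sketch below shows the same long-format approach with those functions. It uses the question's data rather than random values, and annual_long and monthly_long are illustrative names:

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(Year = c(2015, 2016),
                  Farm1 = c(1000, 500), Farm2 = c(2000, 2000), Farm3 = c(1500, 1000))
df2 <- data.frame(Year = c(2015, 2015, 2016, 2016),
                  Month = c("Jan", "Feb", "Jan", "Feb"),
                  Farm1 = c(1, 1, 2, 2), Farm2 = c(1, 2, 2, 1), Farm3 = c(3, 1, 2, 2))

# One row per Year/Farm (annual) and per Year/Month/Farm (monthly)
annual_long <- pivot_longer(df1, -Year, names_to = "Farm", values_to = "AnnualValue")
monthly_long <- pivot_longer(df2, -c(Year, Month), names_to = "Farm", values_to = "Value")

# Monthly data on the left, so the result keeps df2's row order
df3 <- monthly_long %>%
  left_join(annual_long, by = c("Year", "Farm")) %>%
  mutate(NewValue = Value * AnnualValue) %>%
  select(Year, Month, Farm, NewValue) %>%
  pivot_wider(names_from = Farm, values_from = NewValue)
df3
```

      No group_by() is needed. The join attaches the matching annual value to every monthly row, and the multiplication is then a plain column-wise mutate().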
      

      【Discussion】:

      • Perfect! Thanks!