【Question title】: How to join tables and generate sums of columns?
【Posted】: 2015-06-30 11:58:19
【Question】:

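I have several tables with the same structure (two, to be specific). I want to join them on ID_Position and ID_Name and produce the sums of jan and feb in the output table (both columns may contain some NA values).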

ID_Position <- c(1,2,3,4,5,6,7,8,9,10)
Position <- c("A","B","C","D","E","H","I","J","X","W")
ID_Name <- c(11,12,13,14,15,16,17,18,19,20)
Name <- c("Michael","Tobi","Chris","Hans","Likas","Martin","Seba","Li","Sha","Susi")
jan <- c(10,20,30,22,23,2,22,24,26,28)
feb <- c(10,30,20,12,NA,3,NA,22,24,26)

df1 <- data.frame(ID_Position,Position,ID_Name,Name,jan,feb)


ID_Position <- c(1,2,3,4,5,6,7,8,9,10)
Position <- c("A","B","C","D","E","H","I","J","X","W")
ID_Name <- c(11,12,13,14,15,16,17,18,19,20)
Name <- c("Michael","Tobi","Chris","Hans","Likas","Martin","Seba","Li","Sha","Susi")
jan <- c(10,20,30,22,NA,NA,22,24,26,28)
feb <- c(10,30,20,12,23,3,3,22,24,26)

df2 <- data.frame(ID_Position,Position,ID_Name,Name,jan,feb)

I tried an inner join and a full join, but neither seems to do what I want:

library(plyr)

test <- join(df1, df2, by = c("ID_Position","ID_Name"), type = "inner", match = "all")

Desired output:

  ID_Position   Position    ID_Name       Name         jan  feb
      1            A          11          Michael        20 20
      2            B          12          Tobi           40 60
      3            C          13          Chris          60 40
      4            D          14          Hans           44 24
      5            E          15          Likas          23 23
      6            H          16          Martin         2  6
      7            I          17          Seba           44 22
      8            J          18          Li             48 44
      9            X          19          Sha            52 48
     10            W          20          Susi           56 52

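【Comments】:

  • So do you want an inner join or a full join? Also, your data sets are identical. Can you provide your desired output? For example, does the following work? library(data.table) ; setkey(setDT(df1), ID_Position, ID_Name) ; setkey(setDT(df2), ID_Position, ID_Name) ; df2[df1, .(jan = sum(jan, i.jan, na.rm = TRUE), feb = sum(feb, i.feb, na.rm = TRUE)), by = .EACHI]
  • Your data set has no information in feb in six of the rows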

Tags: r


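【Solution 1】:

Your desired output doesn't seem to be exactly right, but here is an example of how to do this efficiently with a data.table binary join, which lets you run functions while joining by using the by = .EACHI option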

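library(data.table)
# Convert both data frames to data.tables by reference and key them on the join columns
setkey(setDT(df1), ID_Position, ID_Name, Name)
setkey(setDT(df2), ID_Position, ID_Name, Name)
# For each row of df1 (by = .EACHI), add its values (i.jan, i.feb) to the matching df2 values
df2[df1, .(jan = sum(jan, i.jan, na.rm = TRUE),
           feb = sum(feb, i.feb, na.rm = TRUE)),
    by = .EACHI]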
#     ID_Position ID_Name    Name jan feb
#  1:           1      11 Michael  20  20
#  2:           2      12    Tobi  40  60
#  3:           3      13   Chris  60  40
#  4:           4      14    Hans  44  24
#  5:           5      15   Likas  23  23
#  6:           6      16  Martin   2   6
#  7:           7      17    Seba  44   3
#  8:           8      18      Li  48  44
#  9:           9      19     Sha  52  48
# 10:          10      20    Susi  56  52
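
For readers who prefer not to depend on data.table, roughly the same result can be sketched in base R. This is only an illustration, not part of the original answer: it assumes both tables share all four identifying columns, and the suffixes `.1`/`.2` as well as the names `ids`, `keys`, `merged` and `result` are arbitrary choices made here. `merge()` performs the inner join and `rowSums(..., na.rm = TRUE)` adds the paired month columns, treating NA as 0.

```r
# Rebuild the two example tables from the question.
ids <- data.frame(
  ID_Position = 1:10,
  Position    = c("A","B","C","D","E","H","I","J","X","W"),
  ID_Name     = 11:20,
  Name        = c("Michael","Tobi","Chris","Hans","Likas",
                  "Martin","Seba","Li","Sha","Susi")
)
df1 <- cbind(ids, jan = c(10,20,30,22,23,2,22,24,26,28),
                  feb = c(10,30,20,12,NA,3,NA,22,24,26))
df2 <- cbind(ids, jan = c(10,20,30,22,NA,NA,22,24,26,28),
                  feb = c(10,30,20,12,23,3,3,22,24,26))

keys <- c("ID_Position", "Position", "ID_Name", "Name")

# Inner join; the clashing month columns get the suffixes .1 and .2.
merged <- merge(df1, df2, by = keys, suffixes = c(".1", ".2"))

# Add the paired columns row by row, treating NA as 0.
merged$jan <- rowSums(merged[c("jan.1", "jan.2")], na.rm = TRUE)
merged$feb <- rowSums(merged[c("feb.1", "feb.2")], na.rm = TRUE)

# Keep the identifiers and the sums, ordered explicitly by ID_Position
# (merge() does not sort several key columns numerically).
result <- merged[order(merged$ID_Position), c(keys, "jan", "feb")]
rownames(result) <- NULL
print(result)
```

As with `sum(..., na.rm = TRUE)` in the data.table version, a pair in which both values are NA would come out as 0 rather than NA.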

【Comments】:

  • @David. Thanks for your approach!