【Question title】: Join two dataframes where column names are in rows
【Posted】: 2013-07-09 13:58:51
【Question description】:

I have the following data structure:

x <- read.table(header=T, text="
variable class value
a a1 1
a a2 2
a a3 3
b b1 4
b b2 5
b b3 6
c c1 7
c c2 8
c a3 9")

y <- read.table(header=T, text="
a b c
a1 b2 c2
a2 b1 c1
a3 b3 a3"
)

Now I need to add three variables to df y (a_out, b_out, c_out), mapping the values in x$value onto df y according to column name and class. The output should look like this:

a b c a_out b_out c_out
a1 b2 c2 1 5 8
a2 b1 c1 2 4 7
a3 b3 a3 3 6 9

I can do this with sqldf:

library(sqldf)

sqldf("select y.*, x1.value as a_out, x2.value as b_out, x3.value as c_out
        from 
          y
          join x as x1 on (x1.class=y.a and x1.variable='a')
          join x as x2 on (x2.class=y.b and x2.variable='b')
          join x as x3 on (x3.class=y.c and x3.variable='c')
      ")

In the real world I have many columns (50+), so I am looking for something more elegant.

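【Discussion】:

    Tags: r


    【Solution 1】:

    I'm sure there is a more elegant way to do this, and I'm not 100% sure I understand what you are trying to do, but I think this should do the trick: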

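    for (col in names(y)) {
      ## Rows of "x" that belong to the current column of "y"
      tmp <- x[x$variable == col, c("class", "value")]
      ## Look up each class in that subset and store the value as "<col>_out"
      y[, paste0(col, "_out")] <- tmp$value[match(as.character(y[, col]), as.character(tmp$class))]
    }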
    
       a  b  c a_out b_out c_out
    1 a1 b2 c2     1     5     8
    2 a2 b1 c1     2     4     7
    3 a3 b3 a3     3     6     9
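
The same lookup can also be written without a loop by keying on variable and class together. This is only a sketch built on the question's sample data; the names `lookup`, `out` and `result` are made up for illustration. The variable has to be part of the key because "a3" appears under both a and c.

```r
# Sample data from the question
x <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
variable class value
a a1 1
a a2 2
a a3 3
b b1 4
b b2 5
b b3 6
c c1 7
c c2 8
c a3 9")

y <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
a b c
a1 b2 c2
a2 b1 c1
a3 b3 a3")

# Named lookup vector keyed on "variable class", e.g. "c a3" maps to 9
lookup <- setNames(x$value, paste(x$variable, x$class))

# For each column of y, build the same kind of key and index into the lookup;
# keys that do not exist in x come back as NA
out <- lapply(names(y), function(col) unname(lookup[paste(col, y[[col]])]))
names(out) <- paste0(names(y), "_out")

result <- cbind(y, as.data.frame(out))
result
```

Because the key is built from the column names of y, nothing here depends on how many columns there are, so it should carry over to the 50+ column case as long as every column name of y appears in x$variable.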
    

    【Discussion】:

      【Solution 2】:

      Here is another approach:

      ## Convert "y" to a long data.frame
      y2 <- stack(lapply(y, as.character))
      
      ## Reorder "x" according to "y2": look up each key of "y2" in "x"
      x2 <- x[match(do.call(paste, rev(y2)), do.call(paste, x[1:2])), ]
      
      ## Use ave to generate an integer "id" variable (row number within each variable)
      x2$id <- ave(seq_along(x2$variable), x2$variable, FUN = seq_along)
      
      ## "x2" now looks like this
      x2
      #   variable class value id
      # 1        a    a1     1  1
      # 2        a    a2     2  2
      # 3        a    a3     3  3
      # 5        b    b2     5  1
      # 4        b    b1     4  2
      # 6        b    b3     6  3
      # 8        c    c2     8  1
      # 7        c    c1     7  2
      # 9        c    a3     9  3
      
      ## Use reshape to get your data in the wide format that you are looking for
      reshape(x2, direction = "wide", idvar = "id", timevar = "variable")
      #   id class.a value.a class.b value.b class.c value.c
      # 1  1      a1       1      b2       5      c2       8
      # 2  2      a2       2      b1       4      c1       7
      # 3  3      a3       3      b3       6      a3       9
      

      From there it is mostly cosmetic work: rename the columns with some sub/gsub, and reorder them if necessary.
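
One possible version of that cosmetic step is sketched below. It rebuilds the wide data frame from the question's sample data so that it runs on its own; the names `wide` and `result` are made up for illustration, and the regular expressions assume the default "class.<name>" / "value.<name>" column names that reshape produces here.

```r
# Sample data from the question
x <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
variable class value
a a1 1
a a2 2
a a3 3
b b1 4
b b2 5
b b3 6
c c1 7
c c2 8
c a3 9")

y <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
a b c
a1 b2 c2
a2 b1 c1
a3 b3 a3")

# Long version of y, then the rows of x in the order in which they occur in y
y2 <- stack(lapply(y, as.character))
x2 <- x[match(do.call(paste, rev(y2)), do.call(paste, x[1:2])), ]
x2$id <- ave(seq_along(x2$variable), x2$variable, FUN = seq_along)
wide <- reshape(x2, direction = "wide", idvar = "id", timevar = "variable")

# Rename: "class.a" becomes "a", "value.a" becomes "a_out"
names(wide) <- sub("^class\\.", "", names(wide))
names(wide) <- sub("^value\\.(.*)$", "\\1_out", names(wide))

# Reorder: original columns first, then the *_out columns; the helper id is dropped
result <- wide[c(names(y), paste0(names(y), "_out"))]
rownames(result) <- NULL
result
```

Selecting the columns by name handles both the reordering and the removal of the id column in one step.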

      【Discussion】:
