【Question title】: Mathematical operations between grouped data and a data frame in R
【Posted】: 2020-10-04 08:50:35
【Question description】:

I have simplified the two data frames relevant to this question as follows:

ss <- structure(list(country = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 
7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), .Label = c("a", "b", "c", 
"d", "e", "f", "g", "h", "k", "v"), class = "factor"), year = c(1961L, 
1962L, 1963L, 1961L, 1962L, 1963L, 1961L, 1962L, 1963L, 1961L, 
1962L, 1963L, 1961L, 1962L, 1963L, 1961L, 1962L, 1963L, 1961L, 
1962L, 1963L, 1961L, 1962L, 1963L, 1961L, 1962L, 1963L, 1961L, 
1962L, 1963L), x = c(19L, 4L, 3L, 23L, 24L, 16L, 28L, 9L, 29L, 
20L, 14L, 21L, 30L, 1L, 12L, 17L, 25L, 26L, 13L, 8L, 2L, 7L, 
10L, 11L, 6L, 22L, 27L, 5L, 15L, 18L), y = c(23L, 20L, 28L, 7L, 
4L, 25L, 5L, 8L, 10L, 13L, 9L, 1L, 21L, 11L, 26L, 16L, 27L, 2L, 
29L, 24L, 3L, 15L, 6L, 19L, 14L, 22L, 12L, 18L, 17L, 30L), z = c(22L, 
4L, 23L, 16L, 29L, 14L, 11L, 13L, 27L, 26L, 5L, 12L, 2L, 9L, 
10L, 25L, 7L, 21L, 6L, 20L, 3L, 30L, 18L, 8L, 1L, 24L, 17L, 15L, 
28L, 19L)), class = "data.frame", row.names = c(NA, -30L))

zz <- structure(list(country = structure(c(1L, 1L, 1L), .Label = "w", class = "factor"), 
    year = 1961:1963, x = c(2L, 1L, 3L), y = c(3L, 1L, 2L), z = 1:3), class = "data.frame", row.names = c(NA, 
-3L))

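The data frame ss holds 3 years of data for 10 countries, and the data frame zz holds the world data for the same years. Is there a way to apply an operation such as ss (for each country group) / zz, so that each country's values are expressed as a ratio to the world data? The first two columns (country and year) should be kept as they are in ss.

Can we avoid reshaping the data with dplyr/tidyverse, which only adds more lines of code? Thanks.

【Question discussion】:

    Tags: r dataframe sum divide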


    【Solution 1】:

    Use match to line up each row of ss with the zz row for the same year, then divide the value columns:

    cbind(ss[1:2], ss[-(1:2)] / zz[match(ss$year, zz$year), -(1:2)])
    #   country year          x         y         z
    # 1        a 1961  9.5000000  7.666667 22.000000
    # 2        b 1962  4.0000000 20.000000  2.000000
    # 3        c 1963  1.0000000 14.000000  7.666667
    # 4        d 1961 11.5000000  2.333333 16.000000
    # 5        e 1962 24.0000000  4.000000 14.500000
    # 6        f 1963  5.3333333 12.500000  4.666667
    # 7        g 1961 14.0000000  1.666667 11.000000
    # 8        h 1962  9.0000000  8.000000  6.500000
    # 9        k 1963  9.6666667  5.000000  9.000000
    # 10       v 1961 10.0000000  4.333333 26.000000
    # 11       a 1962 14.0000000  9.000000  2.500000
    # 12       b 1963  7.0000000  0.500000  4.000000
    # 13       c 1961 15.0000000  7.000000  2.000000
    # 14       d 1962  1.0000000 11.000000  4.500000
    # 15       e 1963  4.0000000 13.000000  3.333333
    # 16       f 1961  8.5000000  5.333333 25.000000
    # 17       g 1962 25.0000000 27.000000  3.500000
    # 18       h 1963  8.6666667  1.000000  7.000000
    # 19       k 1961  6.5000000  9.666667  6.000000
    # 20       v 1962  8.0000000 24.000000 10.000000
    # 21       a 1963  0.6666667  1.500000  1.000000
    # 22       b 1961  3.5000000  5.000000 30.000000
    # 23       c 1962 10.0000000  6.000000  9.000000
    # 24       d 1963  3.6666667  9.500000  2.666667
    # 25       e 1961  3.0000000  4.666667  1.000000
    # 26       f 1962 22.0000000 22.000000 12.000000
    # 27       g 1963  9.0000000  6.000000  5.666667
    # 28       h 1961  2.5000000  6.000000 15.000000
    # 29       k 1962 15.0000000 17.000000 14.000000
    # 30       v 1963  6.0000000 15.000000  6.333333
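
To make the role of match clearer, here is a minimal step-by-step sketch of the same idea in base R. The toy data frames ss_demo and zz_demo, and the helper names id_cols, value_cols, idx and world, are hypothetical and only mirror the layout of ss and zz; the check for unmatched years is an added assumption, not part of the answer above.

```r
# Hypothetical toy data with the same layout as ss and zz
ss_demo <- data.frame(
  country = c("a", "b", "a", "b"),
  year    = c(1961L, 1961L, 1962L, 1962L),
  x       = c(10, 20, 30, 40),
  y       = c(4, 8, 12, 16)
)
zz_demo <- data.frame(
  country = "w",
  year    = c(1961L, 1962L),
  x       = c(2, 10),
  y       = c(4, 2)
)

id_cols    <- c("country", "year")
value_cols <- setdiff(names(ss_demo), id_cols)

# For every row of ss_demo, the position of its year in zz_demo
idx <- match(ss_demo$year, zz_demo$year)

# Stop early if some year in ss_demo has no world row
stopifnot(!anyNA(idx))

# Repeat the world rows so they align row by row with ss_demo, then divide
world <- zz_demo[idx, value_cols, drop = FALSE]
ratio <- cbind(ss_demo[id_cols], ss_demo[value_cols] / world)
ratio
```

Because the value columns are selected by name rather than by position, the same lines should work unchanged when there are more than two or three value columns, provided both data frames use the same column names.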
    

    【Discussion】:

      【Solution 2】:

      This can also be done with the data.table package:

      library(data.table)

      # join ss to zz on year; the i. prefix refers to columns of zz
      as.data.table(ss)[zz, .(country, year, x = x/i.x, y = y/i.y, z = z/i.z), on = .(year)]
      #     country year          x         y         z
      #  1:       a 1961  9.5000000  7.666667 22.000000
      #  2:       d 1961 11.5000000  2.333333 16.000000
      #  3:       g 1961 14.0000000  1.666667 11.000000
      #  4:       v 1961 10.0000000  4.333333 26.000000
      #  5:       c 1961 15.0000000  7.000000  2.000000
      #  6:       f 1961  8.5000000  5.333333 25.000000
      #  7:       k 1961  6.5000000  9.666667  6.000000
      #  8:       b 1961  3.5000000  5.000000 30.000000
      #  9:       e 1961  3.0000000  4.666667  1.000000
      # 10:       h 1961  2.5000000  6.000000 15.000000
      # 11:       b 1962  4.0000000 20.000000  2.000000
      # 12:       e 1962 24.0000000  4.000000 14.500000
      # 13:       h 1962  9.0000000  8.000000  6.500000
      # 14:       a 1962 14.0000000  9.000000  2.500000
      # 15:       d 1962  1.0000000 11.000000  4.500000
      # 16:       g 1962 25.0000000 27.000000  3.500000
      # 17:       v 1962  8.0000000 24.000000 10.000000
      # 18:       c 1962 10.0000000  6.000000  9.000000
      # 19:       f 1962 22.0000000 22.000000 12.000000
      # 20:       k 1962 15.0000000 17.000000 14.000000
      # 21:       c 1963  1.0000000 14.000000  7.666667
      # 22:       f 1963  5.3333333 12.500000  4.666667
      # 23:       k 1963  9.6666667  5.000000  9.000000
      # 24:       b 1963  7.0000000  0.500000  4.000000
      # 25:       e 1963  4.0000000 13.000000  3.333333
      # 26:       h 1963  8.6666667  1.000000  7.000000
      # 27:       a 1963  0.6666667  1.500000  1.000000
      # 28:       d 1963  3.6666667  9.500000  2.666667
      # 29:       g 1963  9.0000000  6.000000  5.666667
      # 30:       v 1963  6.0000000 15.000000  6.333333
      #     country year          x         y         z
      

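      【Discussion】:

      • Thank you very much, this widens the options further. It works well for 3 columns, but if the operation has to cover, say, a range of 10 columns, the code would get long, I guess.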
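
One possible way around the length concern raised in this comment is to loop over the value columns instead of naming each one. The sketch below is not from the answer: it assumes the data.table package is installed, uses hypothetical toy data (ss_demo, zz_demo) with the same layout as ss and zz, and combines match with data.table's set() as a swapped-in technique.

```r
library(data.table)

# Hypothetical toy data with the same layout as ss and zz
ss_demo <- data.frame(
  country = c("a", "b", "a", "b"),
  year    = c(1961L, 1961L, 1962L, 1962L),
  x       = c(10, 20, 30, 40),
  y       = c(4, 8, 12, 16)
)
zz_demo <- data.frame(
  country = "w",
  year    = c(1961L, 1962L),
  x       = c(2, 10),
  y       = c(4, 2)
)

dt <- as.data.table(ss_demo)

# Every column except the identifiers is treated as a value column
value_cols <- setdiff(names(dt), c("country", "year"))

# Row of zz_demo that matches each row of dt by year
idx <- match(dt$year, zz_demo$year)

# Overwrite each value column with its ratio to the world value
for (col in value_cols) {
  set(dt, j = col, value = dt[[col]] / zz_demo[[col]][idx])
}
dt
```

The loop body stays the same whether there are 3 value columns or 10, since value_cols is derived from the column names.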