【Question title】: How can I calculate the difference between 2 non-date values in the same column in the same group?
【Posted】: 2020-02-21 12:19:48
【Question description】:

This is a follow-up to this post: In R how can I count the number of grouped pairs in which one row's column value is greater than another?

Here is the dput output for my dataset df1:

df1 <- structure(list(Name = c("A.J. Ellis", "A.J. Ellis", "A.J. Pierzynski", 
"A.J. Pierzynski", "Aaron Boone", "Adam Kennedy", "Adam Melhuse", 
"Adrian Beltre", "Adrian Beltre", "Adrian Gonzalez", "Alan Zinter", 
"Albert Pujols", "Albert Pujols"), Age = c(37, 36, 37, 36, 36, 
36, 36, 37, 36, 36, 36, 37, 36), Year = c(2018, 2017, 2014, 2013, 
2009, 2012, 2008, 2016, 2015, 2018, 2004, 2017, 2016), Tm = c("SDP", 
"MIA", "TOT", "TEX", "HOU", "LAD", "TOT", "TEX", "TEX", "NYM", 
"ARI", "LAA", "LAA"), Lg = c("NL", "NL", "ML", "AL", "NL", "NL", 
"ML", "AL", "AL", "NL", "NL", "AL", "AL"), G = c(66, 51, 102, 
134, 10, 86, 15, 153, 143, 54, 28, 149, 152), PA = c(183, 163, 
362, 529, 14, 201, 32, 640, 619, 187, 40, 636, 650)), row.names = c(NA, 
13L), class = "data.frame")

Here is the code from my previous question that correctly matches the pairs:

library(dplyr)

df1 %>%
  arrange(Name, Age) %>%
  group_by(Name) %>%
  filter(last(G) < first(G))

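Each grouped pair has two observations. Each observation also has a column named G and a Year column.

Here is what the data looks like after grouping with the code above: https://www.dropbox.com/s/hh2qgkbn4cy4k4l/Data%20after%20grouping.png?dl=0

Now, for each matched pair, I would like to know the difference in the G column between the age-36 value and the age-37 value: (age-36 value) - (age-37 value). Negative results are fine.

I would also like the sum of these differences over all matched pairs in the dataset.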

【Question discussion】:

    Tags: r dplyr


    【Solution 1】:

    If I understand correctly:

    library(dplyr)

    df <- structure(list(Name = c("A.J. Ellis", "A.J. Ellis", "A.J. Pierzynski",
    "A.J. Pierzynski", "Aaron Boone", "Adam Kennedy", "Adam Melhuse",
    "Adrian Beltre", "Adrian Beltre", "Adrian Gonzalez", "Alan Zinter",
    "Albert Pujols", "Albert Pujols"), Age = c(37, 36, 37, 36, 36,
    36, 36, 37, 36, 36, 36, 37, 36), Year = c(2018, 2017, 2014, 2013,
    2009, 2012, 2008, 2016, 2015, 2018, 2004, 2017, 2016), Tm = c("SDP",
    "MIA", "TOT", "TEX", "HOU", "LAD", "TOT", "TEX", "TEX", "NYM",
    "ARI", "LAA", "LAA"), Lg = c("NL", "NL", "ML", "AL", "NL", "NL",
    "ML", "AL", "AL", "NL", "NL", "AL", "AL"), G = c(66, 51, 102,
    134, 10, 86, 15, 153, 143, 54, 28, 149, 152), PA = c(183, 163,
    362, 529, 14, 201, 32, 640, 619, 187, 40, 636, 650)), row.names = c(NA,
    13L), class = "data.frame")
    df1 <- df %>%
      arrange(Name, Age) %>%
      group_by(Name) %>%
      filter(last(G) < first(G)) %>% 
      mutate(g_diff = G[1] - G[2]) %>% 
      ungroup() %>% 
      mutate(sum_g_diff = sum(unique(g_diff)))
    
    > df1
    # A tibble: 4 x 9
      Name              Age  Year Tm    Lg        G    PA g_diff sum_g_diff
      <chr>           <dbl> <dbl> <chr> <chr> <dbl> <dbl>  <dbl>      <dbl>
    1 A.J. Pierzynski    36  2013 TEX   AL      134   529     32         35
    2 A.J. Pierzynski    37  2014 TOT   ML      102   362     32         35
    3 Albert Pujols      36  2016 LAA   AL      152   650      3         35
    4 Albert Pujols      37  2017 LAA   AL      149   636      3         35
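
One caveat worth checking on a larger dataset: `sum(unique(g_diff))` counts each distinct difference once, so two pairs that happen to share the same difference would be counted only once. Separately, `G[1] - G[2]` relies on the rows being sorted by Age. The sketch below is one possible way to guard against both. It uses made-up toy data (the names `toy`, `P1` to `P3`, `sum_unique` and `sum_by_pair` are illustrative and not from the question) and assumes every retained pair has exactly one age-36 row and one age-37 row.

```r
library(dplyr)

# Hypothetical toy data (not from the question): P1 and P2 share the same difference.
toy <- tibble(
  Name = c("P1", "P1", "P2", "P2", "P3", "P3"),
  Age  = c(36, 37, 36, 37, 36, 37),
  G    = c(100, 90, 50, 40, 30, 25)
)

res <- toy %>%
  arrange(Name, Age) %>%
  group_by(Name) %>%
  filter(last(G) < first(G)) %>%
  # Match on Age explicitly instead of relying on row position.
  mutate(g_diff = G[Age == 36] - G[Age == 37]) %>%
  ungroup() %>%
  mutate(
    sum_unique  = sum(unique(g_diff)),            # 10 + 5: the repeated 10 is dropped
    sum_by_pair = sum(g_diff[!duplicated(Name)])  # 10 + 10 + 5: one value per Name
  )
```

Here `sum_unique` is 15 while `sum_by_pair` is 25. On the data in the question the two agree (35), because the two retained differences, 32 and 3, are distinct.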
    

    Alternatively, if a cumulative sum (running total) of g_diff is the desired output (without summarizing the data):

    df1 %>%
      group_by(Name) %>%
      mutate(cols = c(g_diff[1], rep(0, n() -1))) %>%
      ungroup() %>%
      mutate(cum_sum = cumsum(cols)) %>%
      select(-cols)
    
    # A tibble: 4 x 9
      Name              Age  Year Tm    Lg        G    PA g_diff cum_sum
      <chr>           <dbl> <dbl> <chr> <chr> <dbl> <dbl>  <dbl>   <dbl>
    1 A.J. Pierzynski    36  2013 TEX   AL      134   529     32      32
    2 A.J. Pierzynski    37  2014 TOT   ML      102   362     32      32
    3 Albert Pujols      36  2016 LAA   AL      152   650      3      35
    4 Albert Pujols      37  2017 LAA   AL      149   636      3      35
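
To see why the helper column `cols` is needed, it may help to look at the vectors on their own. The sketch below uses the four g_diff values from the output above in plain base R. The variable names `first_only`, `naive` and `running` are illustrative, and `duplicated(Name)` is used here as a stand-in for the grouped `c(g_diff[1], rep(0, n() - 1))` expression.

```r
# Values copied from the output above: one g_diff per row, two rows per pair.
g_diff <- c(32, 32, 3, 3)
Name   <- c("A.J. Pierzynski", "A.J. Pierzynski", "Albert Pujols", "Albert Pujols")

# Keep g_diff on the first row of each Name and put 0 on the remaining rows,
# which is the effect of c(g_diff[1], rep(0, n() - 1)) inside each group.
first_only <- ifelse(duplicated(Name), 0, g_diff)

naive   <- cumsum(g_diff)      # 32 64 67 70: every pair is counted twice
running <- cumsum(first_only)  # 32 32 35 35: every pair is counted once
```

The zero-padding is what lets the running total stay flat within a pair and step up only when a new pair starts.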
    

    (This solution is based on this question.)

    【Discussion】:

    • Your result looks good. Just curious: is there a way to get a running total in the sum_g_diff column?
    • I have edited my answer to include a running total. If summarizing the data is acceptable, it is much simpler: change mutate(sum_g_diff = sum(unique(g_diff))) to summarize(cumsum_g_diff = cumsum(unique(g_diff)))
    • When I try the simpler approach, I get an error: df2 %>% arrange(Name, Age) %>% group_by(Name) %>% filter(last(G) < first(G)) %>% mutate(g_diff = G[1] - G[2]) %>% ungroup() %>% summarize(cumsum_g_diff = cumsum(unique(g_diff))) This is the error: "Error: Column cumsum_g_diff must be length 1 (a summary value), not 62"
    • My bad, you also have to remove ungroup() before that. This should work: df1 <- df %>% arrange(Name, Age) %>% group_by(Name) %>% filter(last(G) < first(G)) %>% mutate(g_diff = G[1] - G[2]) %>% summarize(sum_g_diff = cumsum(unique(g_diff)))
    • Thanks. How can I put the sum_g_diff column in descending order?
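
A note on the last two comments: while `group_by(Name)` is still active, `cumsum()` runs inside each Name, so the summarized column holds each pair's own difference and is not a running total across pairs. A minimal sketch of one way to get one row per pair, a running total across pairs, and a descending sort is shown below. It retypes the two retained pairs from the answer's output so that it runs on its own; the names `pairs` and `per_pair` are illustrative, and sorting by the running total is an assumption about what the last comment is asking for.

```r
library(dplyr)

# The two retained pairs from the answer's output, retyped so the block is self-contained.
pairs <- tibble(
  Name = c("A.J. Pierzynski", "A.J. Pierzynski", "Albert Pujols", "Albert Pujols"),
  Age  = c(36, 37, 36, 37),
  G    = c(134, 102, 152, 149)
)

per_pair <- pairs %>%
  arrange(Name, Age) %>%
  group_by(Name) %>%
  summarize(g_diff = G[1] - G[2]) %>%          # one row per Name: (age-36 G) - (age-37 G)
  ungroup() %>%                                # harmless if the result is already ungrouped
  mutate(cumsum_g_diff = cumsum(g_diff)) %>%   # running total across pairs
  arrange(desc(cumsum_g_diff))                 # largest running total first
```

With these two pairs, `per_pair` lists Albert Pujols first (running total 35) and A.J. Pierzynski second (running total 32). To sort by the per-pair difference instead, `arrange(desc(g_diff))` would be the analogous call.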