Summarise groups across columns and rows: answers

【Question title】: Summarise groups across columns and rows
【Posted】: 2020-05-20 22:05:41
【Question description】:

I have a data frame df that looks like this:

 Year      Score    x1   x2   x3 
 2006      102      K    P    8   
 2006      89       L    K    P   
 2006      46       P    3    0   
 2007      76       L    2    1  
 2007      29       L    K    6   
 2008      690      P    4    4   
 2008      301      K    0    1   
 ...       ...      ..   ..   ..

However, I would like it to look like this:

 Year     K    P    L    K_prop  P_prop  L_prop 
 2006     191  191  135  0.37    0.37    0.26    
 2007     29        105  0.22            0.78
 2008     301  690       0.30    0.70
 ...      ..   ..   ..   ..      ..      ..

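where each letter found in the x columns becomes its own column, holding the sum of Score for that letter, grouped by year. I would also like further columns giving each of those columns' proportion of the combined total:

K_prop = K/(K+P+L); P_prop = P/(K+P+L) ; L_prop = L/(K+P+L)

Apologies if this isn't descriptive enough; I appreciate any and all help!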

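【Question comments】:

Is the expected output based on the input shown?
I think if you explain how K_prop, P_prop and L_prop are calculated, maybe someone can help you.
Hi, sorry: K_prop = K/(K+P+L), and likewise for the others!

Tags: r dataframe multiple-columns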

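【Solution 1】:

We can use pivot_longer to reshape the data into "long" format, do the calculation there, and then reshape the result back into "wide" format with pivot_wider.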

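library(dplyr)
library(tidyr)
library(stringr)
df %>% 
    # one row per (original row, x column); the cell contents go into `value`
    pivot_longer(cols = starts_with('x')) %>% 
    # keep only the letter codes (K, L, P) and drop the numeric entries
    filter(str_detect(value, '[A-Za-z]')) %>% 
    # total Score for each letter within each year
    group_by(Year, value) %>%
    summarise(Score = sum(Score)) %>%
    ungroup() %>%
    # each letter's share of that year's combined total
    group_by(Year) %>%
    mutate(prop = Score/sum(Score)) %>% 
    # back to one row per year, with Score_<letter> and prop_<letter> columns
    pivot_wider(names_from = value, values_from = c(Score, prop))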
# A tibble: 3 x 7
# Groups:   Year [3]
#   Year Score_K Score_L Score_P prop_K prop_L prop_P
#  <int>   <int>   <int>   <int>  <dbl>  <dbl>  <dbl>
#1  2006     191      89     237  0.369  0.172  0.458
#2  2007      29     105      NA  0.216  0.784 NA    
#3  2008     301      NA     690  0.304 NA      0.696
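As a hand check of the 2006 row, the arithmetic can be written out as below. It assumes, as the code above does, that a row's Score is counted once toward every letter appearing in x1 to x3 of that row, so the Score 102 row (K, P) counts toward both K and P. This is why P comes out as 237 here rather than the 191 shown in the question.

```latex
\begin{aligned}
K_{2006} &= 102 + 89 = 191 \\
L_{2006} &= 89 \\
P_{2006} &= 102 + 89 + 46 = 237 \\
\mathrm{K\_prop}_{2006} &= \frac{191}{191 + 89 + 237} = \frac{191}{517} \approx 0.369
\end{aligned}
```

Because one row can contribute to several letters, K + L + P (517) is larger than the plain sum of the 2006 scores (237).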

Data

df <- structure(list(Year = c(2006L, 2006L, 2006L, 2007L, 2007L, 2008L, 
2008L), Score = c(102L, 89L, 46L, 76L, 29L, 690L, 301L), x1 = c("K", 
"L", "P", "L", "L", "P", "K"), x2 = c("P", "K", "3", "2", "K", 
"4", "0"), x3 = c("8", "P", "0", "1", "6", "4", "1")), 
class = "data.frame", row.names = c(NA, 
-7L))
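
If the column names should follow the question's layout (K, L, P, K_prop, L_prop, P_prop) rather than Score_K and prop_K, one possible variation is sketched below. It is not part of the original answer: it assumes reasonably recent dplyr (1.0 or later, for `.groups` and `rename_with`) and a tidyr version that supports the `names_glue` argument of pivot_wider, and the object name `res` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the "Data" section of the answer.
df <- data.frame(
  Year  = c(2006L, 2006L, 2006L, 2007L, 2007L, 2008L, 2008L),
  Score = c(102L, 89L, 46L, 76L, 29L, 690L, 301L),
  x1 = c("K", "L", "P", "L", "L", "P", "K"),
  x2 = c("P", "K", "3", "2", "K", "4", "0"),
  x3 = c("8", "P", "0", "1", "6", "4", "1"),
  stringsAsFactors = FALSE
)

res <- df %>%
  pivot_longer(cols = starts_with("x")) %>%
  # keep only the letter codes
  filter(grepl("[A-Za-z]", value)) %>%
  group_by(Year, value) %>%
  # .groups = "drop" returns an ungrouped result and avoids the grouping message
  summarise(Score = sum(Score), .groups = "drop") %>%
  group_by(Year) %>%
  mutate(prop = Score / sum(Score)) %>%
  ungroup() %>%
  # names_glue puts the letter first: K_Score, ..., K_prop, ...
  pivot_wider(names_from = value, values_from = c(Score, prop),
              names_glue = "{value}_{.value}") %>%
  # strip the "_Score" suffix so the totals are simply K, L, P
  rename_with(~ sub("_Score$", "", .x))

res
```

The numbers are the same as in the output above; only the column names and the grouping of the result differ.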

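【Discussion】:

Thanks! The structure looks right, but I am getting values like [...] and [...] in the new columns. Any ideas?
@keaton Is the expected output based on the input shown? I ask because I think the ... means you have more data.
@keaton Not a problem. If you can dput an example, i.e. 10-15 rows in the post along with the correct expected output, it would be easier to cross-check.
@keaton With the example you showed, I don't get 'P' = 191 for 2006.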
That did it! Amazing, akrun!