【Question Title】: How do I take an average over multiple columns based on conditions in R?
【Posted】: 2021-07-29 03:04:03
【Question】:
tconst            averageRating language startYear
1 tt0000001           5.7       en      1894
2 tt0000002           6.0       de      1892
3 tt0000003           6.5       ja      1892
4 tt0000004           6.1       es      1892
5 tt0000007           5.4       de      1894
6 tt0000008           5.4       ja      1894

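How do I find the average rating per language per year? For example, the average of all ja entries for each year? I want to end up with one data frame per language with two columns: one holding all the years and the other holding the mean averageRating for that year (example below).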

Year Rating
1990   6.0
1991   5.7
1992   6.2
1993   5.5
1994   6.5
1995   6.7

The only way I can think of is three nested for loops, but that seems hopelessly inefficient. Surely there is a better way?

Thanks

【Comments】:

    Tags: r performance average mean


    【Solution 1】:

    First group by startYear and language. Then summarise with mean(averageRating), and finally use pivot_wider() to combine the output for all languages into one table:

    library(tidyr)
    library(dplyr)

    # Mean rating per (startYear, language), then one column per language
    df <- df %>%
      group_by(startYear, language) %>%
      summarise(Rating = mean(averageRating), .groups = "drop") %>%
      pivot_wider(names_from = language, values_from = Rating)
    > df 
      startYear    de    es    ja    en
          <dbl> <dbl> <dbl> <dbl> <dbl>
    1      1892   6     6.1   6.5  NA  
    2      1894   5.4  NA     5.4   5.7
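
    If a long table (one row per year and language) is enough, the same grouped mean can be sketched in base R with aggregate() instead of dplyr. This is an alternative to the approach above, not part of the original answer; the name long_means is illustrative only, and the data frame is the sample from the Data section.

```r
# Sample data copied from the "Data" section of this answer.
df <- data.frame(
  tconst = c("tt0000001", "tt0000002", "tt0000003",
             "tt0000004", "tt0000007", "tt0000008"),
  averageRating = c(5.7, 6.0, 6.5, 6.1, 5.4, 5.4),
  language = c("en", "de", "ja", "es", "de", "ja"),
  startYear = c(1894, 1892, 1892, 1892, 1894, 1894)
)

# Mean rating for every (startYear, language) pair, in long format.
# long_means is a hypothetical name chosen for this sketch.
long_means <- aggregate(averageRating ~ startYear + language,
                        data = df, FUN = mean)

# Rename the value column to match the "Rating" column used above.
names(long_means)[names(long_means) == "averageRating"] <- "Rating"
long_means
```

    The long format keeps one row per combination that actually occurs, so there are no NA cells for years in which a language has no titles.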
    

    More concisely (thanks to @LMc):

    df %>% tidyr::pivot_wider(id_cols = startYear, names_from = language, values_from = averageRating,
                              values_fn = function(x) mean(x, na.rm = TRUE))
      startYear    de    es    ja    en
          <dbl> <dbl> <dbl> <dbl> <dbl>
    1      1892   6     6.1   6.5  NA  
    2      1894   5.4  NA     5.4   5.7
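
    The question asks for one two-column data frame (Year, Rating) per language rather than a single wide table. A minimal base R sketch of that shape, assuming the sample data from the Data section, is shown below; by_language is a hypothetical name, and the result is a named list with one data frame per language.

```r
# Sample data copied from the "Data" section of this answer.
df <- data.frame(
  tconst = c("tt0000001", "tt0000002", "tt0000003",
             "tt0000004", "tt0000007", "tt0000008"),
  averageRating = c(5.7, 6.0, 6.5, 6.1, 5.4, 5.4),
  language = c("en", "de", "ja", "es", "de", "ja"),
  startYear = c(1894, 1892, 1892, 1892, 1894, 1894)
)

# Split the rows by language, then average the ratings per year
# inside each piece. by_language is a hypothetical name.
by_language <- lapply(split(df, df$language), function(d) {
  out <- aggregate(averageRating ~ startYear, data = d, FUN = mean)
  setNames(out, c("Year", "Rating"))
})

# Each list element is a data frame with the columns Year and Rating.
by_language$ja
```

    Keeping the pieces in a named list avoids creating a separate variable per language; a single language is then available as by_language$ja or by_language[["ja"]].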
    

    Data:

    df <- data.frame(tconst = c("tt0000001","tt0000002","tt0000003","tt0000004","tt0000007","tt0000008"), averageRating=c(5.7,6.0,6.5,6.1,5.4,5.4),language=c("en","de","ja","es","de","ja"), startYear = c(1894,1892,1892,1892,1894,1894))
    

    【Comments】:

    • You can simplify this to: df %>% tidyr::pivot_wider(id_cols = startYear, names_from = language, values_from = averageRating, values_fn = function(x) mean(x, na.rm = T))