【Question title】: R conditional rowSums to replace with sums based on percentage
【Posted】: 2021-05-20 22:03:46
【Question】:

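I want to merge rows into a single "Other" row if those rows represent less than 1% of the grand total.

Example data: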

name    Year1  Year2  Year3  Total  Percent
John        1      2      1      4  0.7029877
Paul      230    100    150    480  84.358524
George     41     30     10     81  14.235501
Ringo       2      1      1      4  0.7029877
# Code for example data
library(dplyr)  # select() comes from dplyr
name <- c("John", "Paul", "George", "Ringo")
Year1 <- c(1, 230, 41, 2)
Year2 <- c(2, 100, 30, 1)
Year3 <- c(1, 150, 10, 1)
df <- data.frame(name, Year1, Year2, Year3)
df$Total <- rowSums(select(df,Year1:Year3))
df$Percent <- df$Total/sum(df$Total)*100

In the solution, John and Ringo would be merged into a single "Other" row, because each of them has a Percent below 1.

# Code for example solution
name <- c("Paul", "George", "Other(n=2)")
Year1 <- c(230, 41, 3)
Year2 <- c(100, 30, 3)
Year3 <- c(150, 10, 2)
df2 <- data.frame(name, Year1, Year2, Year3)
df2$Total <- rowSums(select(df2,Year1:Year3))
df2$Percent <- df2$Total/sum(df2$Total)*100

Example solution:

name        Year1  Year2  Year3  Total  Percent
Paul          230    100    150    480  84.358524
George         41     30     10     81  14.235501
Other(n=2)      3      3      2      8  1.405975

【Question comments】:

    Tags: r dataframe replace conditional-statements rowsum


    【Solution 1】:
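    library(tidyverse) # or use forcats::fct_lump(...
    df %>% 
      # lump names whose share of the Percent weights is below 1% into "Other"
      mutate(name_lumped = fct_lump(name, w = Percent, prop = 0.01)) %>%
      group_by(name_lumped) %>%
      # sum every numeric column within each lumped group
      summarize(across(Year1:Percent, sum))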
    
    # A tibble: 3 x 6
      name_lumped Year1 Year2 Year3 Total Percent
      <fct>       <dbl> <dbl> <dbl> <dbl>   <dbl>
    1 George         41    30    10    81   14.2 
    2 Paul          230   100   150   480   84.4 
    3 Other           3     3     2     8    1.41
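
The output above labels the merged row "Other", while the example solution in the question uses "Other(n=2)". A minimal sketch of one way to get the count into the label, using only dplyr and a plain comparison instead of fct_lump(). The `threshold`, `small`, `n_small` and `result` names are assumptions introduced here for illustration, and the 1 percent cutoff is assumed to match prop = 0.01 above:

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  name  = c("John", "Paul", "George", "Ringo"),
  Year1 = c(1, 230, 41, 2),
  Year2 = c(2, 100, 30, 1),
  Year3 = c(1, 150, 10, 1),
  stringsAsFactors = FALSE
)
df$Total   <- rowSums(df[, c("Year1", "Year2", "Year3")])
df$Percent <- df$Total / sum(df$Total) * 100

threshold <- 1                      # cutoff in percent (assumed)
small     <- df$Percent < threshold # rows to be merged
n_small   <- sum(small)             # how many rows end up in "Other"

result <- df %>%
  # relabel the small rows so that they share one group
  mutate(name = if_else(small, paste0("Other(n=", n_small, ")"), name)) %>%
  group_by(name) %>%
  # sum the year columns, Total and Percent within each group
  summarize(across(Year1:Percent, sum), .groups = "drop") %>%
  # largest groups first, so the merged row lands at the bottom here
  arrange(desc(Total))

print(result)
```

Because Percent is summed along with the other columns, the merged row carries the combined share of the rows it replaced and the column still adds up to 100.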
    

    【Comments】:

    • Thanks!! I had never heard of fct_lump(); thank you for introducing me to it!