【Question title】: Reshaping a table with dplyr in R
【Posted】: 2018-05-07 19:33:43
【Question】:

Any tips on applying dplyr correctly in R are welcome. We have the following data:

   City            Amount    Category
1  Los Angeles     100       Film
2  Los Angeles     200       Film
3  Los Angeles     400       Music 
4  Seattle         300       Coffee
5  Boston          600       Books
...

The final result should look like this:

                        Film   Coffee   Books   ...
City  
Los Angeles, CA         Sum    Sum      Sum     Sum 
Seattle, WA             Sum    Sum      Sum     Sum 
Boston, MA              Sum    Sum      Sum     Sum  

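I want a pivot table that sums the Amount for each Category within each City, with the cities in the left-hand column and all the categories across the top as a header row.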

I tried:

data %>%
  group_by(City, Category) %>%
  summarise(Amount = sum(Amount))

which gives something more like

   City            Amount    Category
1  Los Angeles     300       Film
3  Los Angeles     400       Music 
4  Seattle         300       Coffee
5  Boston          600       Books

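The calculation is correct, but as described above, we need City and Category laid out as a matrix, with the summed Amount in the corresponding cells.

Thanks for your help!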

【Comments】:

    Tags: r dplyr transform reshape


    【Solution 1】:

    What you are looking for is tidyr::spread, which reshapes your data.frame from long format to wide format:

    library(tidyverse)
    
    # recreate the data
    data <- tribble(
      ~City,             ~Amount,   ~Category,
      "Los Angeles",     100,       "Film",
      "Los Angeles",     200,       "Film",
      "Los Angeles",     400,       "Music", 
      "Seattle",         300,       "Coffee",
      "Boston",          600,       "Books"
    )
    
    # using your code to get the data in the long-format
    data_long <- data %>% 
      group_by(City, Category) %>%
      summarise(Amount = sum(Amount))
    
    data_long
    #> # A tibble: 4 x 3
    #> # Groups:   City [?]
    #>          City Category Amount
    #>         <chr>    <chr>  <dbl>
    #> 1      Boston    Books    600
    #> 2 Los Angeles     Film    300
    #> 3 Los Angeles    Music    400
    #> 4     Seattle   Coffee    300
    
    # spread to wide using the tidyr-package (in tidyverse)
    data_wide <- spread(data_long, key = "Category", value = "Amount", fill = 0)
    
    data_wide
    #> # A tibble: 3 x 5
    #> # Groups:   City [3]
    #>          City Books Coffee  Film Music
    #> *       <chr> <dbl>  <dbl> <dbl> <dbl>
    #> 1      Boston   600      0     0     0
    #> 2 Los Angeles     0      0   300   400
    #> 3     Seattle     0    300     0     0
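
For readers on a newer tidyr, the same long-to-wide step can be sketched with `pivot_wider()`, which superseded `spread()` in tidyr 1.0.0. This is a minimal sketch rather than part of the original answer; the name `data_wide2` is made up for illustration, and the list form of `values_fill` is used on the assumption that it is accepted across tidyr 1.x releases.

```r
library(dplyr)
library(tidyr)   # pivot_wider() requires tidyr >= 1.0.0
library(tibble)

# recreate the data from the question
data <- tribble(
  ~City,         ~Amount, ~Category,
  "Los Angeles", 100,     "Film",
  "Los Angeles", 200,     "Film",
  "Los Angeles", 400,     "Music",
  "Seattle",     300,     "Coffee",
  "Boston",      600,     "Books"
)

# sum Amount per City/Category, then drop the grouping before reshaping
data_long <- data %>%
  group_by(City, Category) %>%
  summarise(Amount = sum(Amount)) %>%
  ungroup()

# one column per Category; City/Category pairs with no rows become 0
data_wide2 <- pivot_wider(
  data_long,
  names_from  = Category,
  values_from = Amount,
  values_fill = list(Amount = 0)
)

data_wide2
```

Unlike `spread()`, `pivot_wider()` orders the new columns by first appearance in the data rather than alphabetically, so the column order may differ from the output shown above.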
    

    On to a matrix:

    mat <- as.matrix(data_wide %>% ungroup %>% select(-City))
    rownames(mat) <- data_wide$City
    
    mat
    #>             Books Coffee Film Music
    #> Boston        600      0    0     0
    #> Los Angeles     0      0  300   400
    #> Seattle         0    300    0     0
    
    str(mat)
    #>  num [1:3, 1:4] 600 0 0 0 0 300 0 300 0 0 ...
    #>  - attr(*, "dimnames")=List of 2
    #>   ..$ : chr [1:3] "Boston" "Los Angeles" "Seattle"
    #>   ..$ : chr [1:4] "Books" "Coffee" "Film" "Music"
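
If only the numeric matrix is needed, base R can probably produce it in one step, without the intermediate wide table. The sketch below is an alternative to the approach above, not the answerer's method; `mat2` is a hypothetical name, and it relies on the `default` argument of `tapply()`, which I believe was added in R 3.4.0.

```r
# recreate the data from the question as a plain data.frame
data <- data.frame(
  City     = c("Los Angeles", "Los Angeles", "Los Angeles", "Seattle", "Boston"),
  Amount   = c(100, 200, 400, 300, 600),
  Category = c("Film", "Film", "Music", "Coffee", "Books"),
  stringsAsFactors = FALSE
)

# cross-tabulate: rows are cities, columns are categories, cells hold sum(Amount);
# default = 0 fills City/Category combinations that never occur
mat2 <- tapply(data$Amount, list(data$City, data$Category), sum, default = 0)

mat2
str(mat2)
```

Row and column names come from the sorted unique values of `City` and `Category`, so the layout should match `mat` above.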
    

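    【Discussion】:

    • Is there a way to convert the output into a matrix whose sums are numeric values? That would be the final challenge.
    • See the edit above. If you don't need the col-names, you can omit that step.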