【Question title】: Reshaping the dataframe using dcast()
【Posted】: 2019-08-12 11:04:35
【Question】:

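I am trying to reshape my data frame using dcast(), but I get this error:

object 'newid' not found

The error is not clear to me. Here is the original data frame: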

 Grade    Week     Subject    Location    Marks
   6      January   English     IND        76.50
   6      January   English     US         52.50
   7      January   English     IND        24.00
   7      January   English     US         5.00
   8      February  English     IND        63.00
   8      February  English     US         40.25
   9      February  English     IND        63.00
   9      February  English     US         32.50
   10     March     English     IND        27.00
   10     March     English     US         4.50
   11     March     English     IND        10.00



tmp <- plyr::ddply(monthTotalDataFinal, .(Subject, Grade), 
          transform,newid = paste(Subject))
d2 <- dcast(tmp, formula = Subject+newid ~ Grade+Location+Week, 
              value.var  = 'Marks')

The desired data frame is as follows:

Subject 6_IND 7_IND 6_US 7_US 8_IND 9_IND 8_US 9_US 10_IND 11_IND 10_US

English  77    24    53   5    63    63    40   33   27     10     5

Please suggest a suitable solution for this.

【Comments】:

    Tags: r reshape2 dcast


    【Solution 1】:

    Using dplyr and tidyr, we can unite the Grade and Location columns into a single key and then use spread to get the data in wide format.

    library(dplyr)
    library(tidyr)
    
    df %>%
      unite(key, Grade, Location) %>%
      select(-Week) %>%
      spread(key, Marks)
    
    #  Subject 10_IND 10_US 11_IND 6_IND 6_US 7_IND 7_US 8_IND  8_US 9_IND 9_US
    #1 English     27   4.5     10  76.5 52.5    24    5    63 40.25    63 32.5
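
In tidyr 1.0.0 and later, spread is superseded by pivot_wider, which can build the column names from several columns at once, so the unite step is not needed. The following is a minimal sketch under that assumption, not part of the original answer; the data frame is rebuilt inline from the question's sample (Week is left out because it does not appear in the desired result).

```r
library(tidyr)

# Sample data from the question (Week omitted; it is not used in the result)
df <- data.frame(
  Grade    = c(6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L),
  Subject  = "English",
  Location = c("IND", "US", "IND", "US", "IND", "US", "IND", "US",
               "IND", "US", "IND"),
  Marks    = c(76.5, 52.5, 24, 5, 63, 40.25, 63, 32.5, 27, 4.5, 10)
)

# One row per Subject; column names are Grade and Location joined by "_"
wide <- pivot_wider(
  df,
  id_cols     = Subject,
  names_from  = c(Grade, Location),
  values_from = Marks
)

print(wide)
```

Unlike spread, which sorts the new columns alphabetically (hence 10_IND first in the output above), pivot_wider keeps them in order of first appearance by default.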
    

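    Based on the comments, we may need to create an identifier column when there are multiple Subject values and repeated key/Subject combinations: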

    df %>%
      unite(key, Grade, Location) %>%
      select(-Week) %>%
      group_by(key, Subject) %>%
      mutate(row = row_number()) %>%
      spread(key, Marks)
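
To see what the row identifier does, here is a small sketch on made-up data (the `dup` data frame below is hypothetical, not from the question). English has two marks for the same Grade and Location, which spread cannot place in a single cell; numbering the rows within each key/Subject pair gives every value its own output row.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: English has two marks for Grade 6 / IND
dup <- data.frame(
  Grade    = c(6L, 6L, 6L, 6L),
  Subject  = c("English", "English", "Maths", "Maths"),
  Location = c("IND", "IND", "IND", "US"),
  Marks    = c(76.5, 70, 52.5, 40)
)

res <- dup %>%
  unite(key, Grade, Location) %>%
  group_by(key, Subject) %>%
  mutate(row = row_number()) %>%  # 1, 2, ... within each key/Subject pair
  ungroup() %>%
  spread(key, Marks)

print(res)
```

The result has one row per Subject/row pair, so the duplicated English mark ends up in a second English row, and cells with no matching value are filled with NA.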
    

    【Comments】:

    • I get this error: "Do you need to create unique ID with tibble::rowid_to_column()?" What if there are multiple subjects?
    • @NevedhaAyyanar It still works with more than one subject. Do you have any other columns? Can you update your post with the dput output of data containing multiple subjects? Maybe you need df %>% unite(key, Grade, Location) %>% select(-Week) %>% group_by(key, Subject) %>% mutate(row = row_number()) %>% spread(key, Marks)
    【Solution 2】:

    Since this is a dcast question, we can use

    library(data.table)
    dcast(setDT(df), Subject ~ Grade + Location, value.var = 'Marks')
    #   Subject 6_IND 6_US 7_IND 7_US 8_IND  8_US 9_IND 9_US 10_IND 10_US 11_IND
    #1: English  76.5 52.5    24    5    63 40.25    63 32.5     27   4.5     10
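
The desired output in the question shows whole numbers (77, 53, 33, 5), which looks like halves rounded up. That is an inference from the sample, not something the asker states. R's round() rounds halves to even, so round(76.5) gives 76; the sketch below uses floor(x + 0.5) instead, which is adequate for non-negative marks like these. The name `mark_cols` is illustrative.

```r
library(data.table)

# Sample data from the question (Week omitted; it is not used in the result)
df <- data.frame(
  Grade    = c(6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L),
  Subject  = "English",
  Location = c("IND", "US", "IND", "US", "IND", "US", "IND", "US",
               "IND", "US", "IND"),
  Marks    = c(76.5, 52.5, 24, 5, 63, 40.25, 63, 32.5, 27, 4.5, 10)
)

# setDT() converts df to a data.table by reference
wide <- dcast(setDT(df), Subject ~ Grade + Location, value.var = "Marks")

# Round halves up in every marks column (all columns except Subject)
mark_cols <- setdiff(names(wide), "Subject")
wide[, (mark_cols) := lapply(.SD, function(x) floor(x + 0.5)),
     .SDcols = mark_cols]

print(wide)
```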
    

    Data

    df <- structure(list(Grade = c(6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 
    10L, 11L), Week = c("January", "January", "January", "January", 
    "February", "February", "February", "February", "March", "March", 
    "March"), Subject = c("English", "English", "English", "English", 
    "English", "English", "English", "English", "English", "English", 
    "English"), Location = c("IND", "US", "IND", "US", "IND", "US", 
    "IND", "US", "IND", "US", "IND"), Marks = c(76.5, 52.5, 24, 5, 
    63, 40.25, 63, 32.5, 27, 4.5, 10)), class = "data.frame",
    row.names = c(NA, 
    -11L))
    
