DataTable Reformat: Answers

[Question title]: DataTable Reformat
[Posted]: 2020-04-18 13:57:17
[Question description]:

    d1=data.frame("Student"=c(1,1,1,2,2,2,3,3,4,4,4),
    "Score"=c(1,1,1,1,2,2,1,3,1,2,3),
    "Grade"=c(5,6,7,3,4,5,2,4,7,8,9),
    "Class"=c(1,1,1,1,1,1,2,2,1,1,1),
    "School"=c(100,100,100,100,100,100,92,92,81,81,81))


    d2=data.frame("Student"=c(1,2,3,4),
    "Q1"=c(0,1,0,1),
    "VX"=c(0,0,1,1),
    "A"=c(5,3,2,7),
    "B"=c(7,3,4,7),
    "C"=c(7,4,4,8),
    "D"=c(7,5,4,9),
    "Class"=c(1,1,2,1),
    "School"=c(100,100,92,81))

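I have data 'd1' and would like to build data 'd2' from it using these rules:

Student: just the Student from d1

Q1: equals 1 if the student's Score in d1 ever equals 2; otherwise 0.

VX: equals 1 if the student's Score in d1 ever equals 3; otherwise 0.

A: equals the student's first/lowest Grade in d1

B: if the student's Score in d1 ever equals 2, the Grade from just before that happens; if not, the last/highest Grade. Importantly, a student's first Score can never equal 2, so there is no missing data to worry about.

C: if the student's Score in d1 ever equals 2, the Grade when that happens; if not, the last/highest Grade

D: if the student's Score in d1 ever equals 3, the Grade when that happens; if not, the last/highest Grade

Class: just the Class from d1

School: the School from d1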

[Question comments]:

Tags: r data.table

[Solution 1]:

With data.table, we can do:

library(data.table)

setDT(d1)[,.(Q1 = as.integer(any(Score == 2)), 
        VX = as.integer(any(Score == 3)), 
         A = first(Grade), 
         B = if(any(Score == 2)) Grade[which.max(Score == 2) - 1] else max(Grade),
         C = if(any(Score == 2)) Grade[which.max(Score == 2)] else max(Grade),
         D = if(any(Score == 3)) Grade[which.max(Score == 3)] else max(Grade)), 
     .(Student, Class, School)]


#   Student Class School Q1 VX A B C D
#1:       1     1    100  0  0 5 7 7 7
#2:       2     1    100  1  0 3 3 4 5
#3:       3     2     92  0  1 2 4 4 4
#4:       4     1     81  1  1 7 7 8 9
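
A note on why the `any()` guard matters in the code above: `which.max()` applied to a logical vector returns the position of the first `TRUE`, but it also returns 1 when every element is `FALSE`. The small base R sketch below illustrates this with the Score and Grade values of Student 2 from the question; the variable names `score` and `grade` are only for illustration.

```r
# Score and Grade values of Student 2, copied from d1 in the question
score <- c(1, 2, 2)
grade <- c(3, 4, 5)

# First position where the condition holds
which.max(score == 2)   # 2

# No element matches, so every value is FALSE and which.max() falls back to 1.
# Without the any() guard this would silently pick the first Grade.
which.max(score == 3)   # 1

# Rule B subtracts 1 from the position. If the very first Score were 2,
# the index would be 0 and the lookup would return an empty vector,
# which is why the question's guarantee about the first Score matters.
length(grade[0])        # 0
```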

Using dplyr here might have a slight advantage in terms of typing, since we can refer to the previously created Q1 and VX columns.

library(dplyr)

d1 %>%
  group_by(Student, Class, School) %>%
  summarise(Q1 = as.integer(any(Score == 2)), 
            VX = as.integer(any(Score == 3)), 
             A = first(Grade), 
             B = if(Q1) Grade[which.max(Score == 2) - 1] else max(Grade), 
             C = if(Q1) Grade[which.max(Score == 2)] else max(Grade), 
             D = if(VX) Grade[which.max(Score == 3)] else max(Grade))
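
As a cross-check that does not need either package, the same rules can be sketched in base R. This is not part of the original answer: `summarise_student` is a hypothetical helper name, and `match()` is used here in place of `which.max()` because it returns `NA` when the value never occurs.

```r
# d1 as given in the question
d1 <- data.frame(Student = c(1,1,1,2,2,2,3,3,4,4,4),
                 Score   = c(1,1,1,1,2,2,1,3,1,2,3),
                 Grade   = c(5,6,7,3,4,5,2,4,7,8,9),
                 Class   = c(1,1,1,1,1,1,2,2,1,1,1),
                 School  = c(100,100,100,100,100,100,92,92,81,81,81))

# Hypothetical helper: collapse the rows of one student into one row
summarise_student <- function(g) {
  i2 <- match(2, g$Score)   # first row with Score == 2, NA if none
  i3 <- match(3, g$Score)   # first row with Score == 3, NA if none
  data.frame(Student = g$Student[1],
             Q1 = as.integer(!is.na(i2)),
             VX = as.integer(!is.na(i3)),
             A  = g$Grade[1],
             # Relies on the question's guarantee that i2 is never 1
             B  = if (!is.na(i2)) g$Grade[i2 - 1] else max(g$Grade),
             C  = if (!is.na(i2)) g$Grade[i2] else max(g$Grade),
             D  = if (!is.na(i3)) g$Grade[i3] else max(g$Grade),
             Class  = g$Class[1],
             School = g$School[1])
}

# Split by student, summarise each piece, and stack the one-row results
res <- do.call(rbind, lapply(split(d1, d1$Student), summarise_student))
rownames(res) <- NULL
res
```

The columns of `res` come out in the same order as the `d2` shown in the question, whereas both solutions above place the grouping columns (Student, Class, School) first.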

[Comments]: