[Posted]: 2019-08-29 10:22:49
[Question]:
My data:
mydata=structure(list(ID_WORKES = c(1000561L, 1000561L, 1000561L, 1000561L,
1000561L, 1000561L, 1000562L, 1000562L, 1000562L, 1000562L, 1000562L,
1000562L), ID_SP_0R = c(21L, 463L, 465L, 500L, 600L, 1951L, 21L,
463L, 465L, 500L, 600L, 1951L), KOD_DEPO = c(0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), KOD_DOR = c(0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), COLUMN_MASH = c(0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), prop_violations = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), mash_score = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("ID_WORKES",
"ID_SP_0R", "KOD_DEPO", "KOD_DOR", "COLUMN_MASH", "prop_violations",
"mash_score"), class = "data.frame", row.names = c(NA, -12L))
The second dataset has this format:
mydata2=structure(list(ID_SP_NAR = c(146L, 1088L, 1612L, 30L, 745L, 905L
), KOD_DEPO = c(4575L, 8998L, 8134L, 4038L, 9540L, 683L), KOD_DOR = c(94L,
94L, 76L, 76L, 94L, 94L), ID_MASH = c(1000561L, 1000561L, 1000561L,
1000561L, 1000562L, 1000562L), COLUMN_MASH = c(10L, 2L, 1L, 1L,
17L, 5L), n_routes_total = c(15L, 14L, 25L, 11L, 18L, 4L), n_violations = c(15L,
10L, 13L, 8L, 7L, 4L), is_violation = c(1L, 1L, 1L, 1L, 1L, 1L
), prop_violations = structure(c(3L, 4L, 1L, 5L, 2L, 6L), .Label = c("0.04000000",
"0.05555556", "0.06666667", "0.07142857", "0.09090909", "0.25000000"
), class = "factor")), .Names = c("ID_SP_NAR", "KOD_DEPO", "KOD_DOR",
"ID_MASH", "COLUMN_MASH", "n_routes_total", "n_violations", "is_violation",
"prop_violations"), class = "data.frame", row.names = c(NA, -6L
))
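How can I replace the zero values of the variables KOD_DEPO, KOD_DOR, and COLUMN_MASH in mydata
with the last values of those variables in mydata2 for each ID_WORKES?
ID_WORKES = ID_MASH is the key for the join.
The desired output is shown below.
For ID_MASH = 1000561, the last KOD_DEPO in mydata2 is 4038, KOD_DOR is 76, and COLUMN_MASH is 1.
For ID_MASH = 1000562, the last KOD_DEPO in mydata2 is 683, KOD_DOR is 94, and COLUMN_MASH is 5.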
   ID_WORKES ID_SP_0R KOD_DEPO KOD_DOR COLUMN_MASH prop_violations mash_score
1    1000561       21     4038      76           1               0          0
2    1000561      463     4038      76           1               0          0
3    1000561      465     4038      76           1               0          0
4    1000561      500     4038      76           1               0          0
5    1000561      600     4038      76           1               0          0
6    1000561     1951     4038      76           1               0          0
7    1000562       21      683      94           5               0          0
8    1000562      463      683      94           5               0          0
9    1000562      465      683      94           5               0          0
10   1000562      500      683      94           5               0          0
11   1000562      600      683      94           5               0          0
12   1000562     1951        1       1           1               0          0
How can I do this? A simple merge does not work. prop_violations and mash_score must not be replaced.
[Discussion]:
Tags: r dplyr data.table tidyr
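For reference, here is a minimal base R sketch of the replacement described above. It is not a definitive solution. It assumes that "last" means the last row for each ID_MASH in the current row order of mydata2, and that a 0 in mydata is always a placeholder to be filled. The data frames are condensed copies of the ones in the question (only the columns used here), and the helper names cols, last_rows, idx, and result are made up for illustration.

```r
# Condensed copies of the question's data (only the columns needed here).
mydata <- data.frame(
  ID_WORKES       = rep(c(1000561L, 1000562L), each = 6),
  ID_SP_0R        = rep(c(21L, 463L, 465L, 500L, 600L, 1951L), times = 2),
  KOD_DEPO        = c(rep(0L, 11), 1L),
  KOD_DOR         = c(rep(0L, 11), 1L),
  COLUMN_MASH     = c(rep(0L, 11), 1L),
  prop_violations = 0L,
  mash_score      = 0L
)
mydata2 <- data.frame(
  ID_MASH     = c(1000561L, 1000561L, 1000561L, 1000561L, 1000562L, 1000562L),
  KOD_DEPO    = c(4575L, 8998L, 8134L, 4038L, 9540L, 683L),
  KOD_DOR     = c(94L, 94L, 76L, 76L, 94L, 94L),
  COLUMN_MASH = c(10L, 2L, 1L, 1L, 17L, 5L)
)

# Columns whose zeros should be filled; prop_violations and mash_score stay as is.
cols <- c("KOD_DEPO", "KOD_DOR", "COLUMN_MASH")

# Keep only the last row of mydata2 for each ID_MASH (assumption: row order defines "last").
last_rows <- mydata2[!duplicated(mydata2$ID_MASH, fromLast = TRUE), ]

# For each row of mydata, the matching row in last_rows (NA if the worker is absent).
idx <- match(mydata$ID_WORKES, last_rows$ID_MASH)

result <- mydata
for (col in cols) {
  # Replace only zeros, and only where a matching ID_MASH exists.
  replace_here <- result[[col]] == 0 & !is.na(idx)
  result[[col]][replace_here] <- last_rows[[col]][idx[replace_here]]
}
result
```

Because the lookup uses match() on a one-row-per-ID table rather than merge(), the row count and row order of mydata are unchanged, and non-zero values such as those in row 12 are left alone.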