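【Posted】: 2024-01-19 14:30:02
【Question】:
Good evening everyone. I want to merge rows that share the same id into a single row, adding columns as needed. Here is part of my data.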
sample=structure(list(crsp_fundno = c(18021, 18021, 18021, 18021, 22436,
22436, 22436, 22436, 22436, 22436, 49805, 49805, 49805, 55603,
55603, 93362), seq = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L, 6L,
1L, 2L, 3L, 1L, 2L, 1L), begdt = structure(c(13513, 14298, 15027,
16149, 12417, 13969, 14910, 14918, 15042, 15644, 14782, 14910,
15544, 15505, 15531, 17571), class = "Date"), enddt = structure(c(14297,
15026, 16148, 17621, 13968, 14909, 14917, 15041, 15643, 17621,
14909, 15543, 17621, 15530, 17621, 17621), class = "Date"), crsp_obj_cd = c("EDYG",
"EDYG", "EDYG", "EDYG", "EDYG", "EDYG", "EDYG", "EDYG", "EDYG",
"EDYG", "EF", "EF", "EF", "EDYB", "EDYB", "M"), lipper_class = c("MLGE",
"MCCE", "MCVE", "MLCE", "MLVE", "MLVE", "MLCE", "MLVE", "MLCE",
"MLVE", "IMLC", "IMLG", "IMLC", "MTAM", "MTAC", "MATJ"), lipper_obj_cd = c("G",
"G", "G", "G", "G", "G", "G", "G", "G", "G", "IF", "IF", "IF",
"GI", "GI", "I"), lipper_asset_cd = c("EQ", "EQ", "EQ", "EQ",
"EQ", "EQ", "EQ", "EQ", "EQ", "EQ", "EQ", "EQ", "EQ", "EQ", "EQ",
"EQ")), class = "data.frame", row.names = c(NA, -16L))
I tried to merge the rows with the same ID into one row. Here is my code.
temp=list()
dn=unique(sample$crsp_fundno)
for(i in 1:length(dn) ){
part=sample[which(sample$crsp_fundno %in% dn[i]),]
part=reshape(part,idvar='crsp_fundno',timevar='seq',direction='wide')
temp[[i]]=part
}
library(plyr)
sum=rbind.fill(temp[[1]],temp[[2]])
for (i in 3 :length(dn)){sum=rbind.fill(sum,temp[[i]])}
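The code works, but it is too slow on my full dataset (94,000 observations take almost 2 hours).
I think I should not rely so heavily on for loops with large datasets.
Does anyone know how I can improve my code or my logic?
Thanks for your help.
【Comments】: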
-
Isn't this enough?
sum <- reshape(sample, direction = "wide", idvar = "crsp_fundno", timevar = "seq")
It gives the same result as sum at the end of your code!
-
@VitaliAvagyan Oh... interesting. So I did a simple thing in an overly complicated way? Thanks for your help!
-
You're welcome :). I will add it as an answer, and I would ask you to accept and upvote it.
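The single reshape() call suggested in the comments can be tried on a small example. This is only a sketch: the data frame funds and its values are made up for illustration and keep just three of the question's columns. The names funds and wide are used instead of sample and sum, since those two names also belong to base R functions.

```r
# Hypothetical miniature of the question's data (not the real dataset).
funds <- data.frame(
  crsp_fundno  = c(18021, 18021, 22436, 22436, 22436, 93362),
  seq          = c(1L, 2L, 1L, 2L, 3L, 1L),
  lipper_class = c("MLGE", "MCCE", "MLVE", "MLVE", "MLCE", "MATJ"),
  stringsAsFactors = FALSE
)

# One call over the whole data frame, no per-id loop:
# one row per crsp_fundno, one lipper_class.<seq> column per seq value.
wide <- reshape(funds, direction = "wide",
                idvar = "crsp_fundno", timevar = "seq")

# Row names are carried over from the first row of each id; reset them.
rownames(wide) <- NULL

print(wide)
```

Ids with fewer seq values than the longest one get NA in the columns they do not have, which is the padding that rbind.fill provided in the loop version.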
Tags: r for-loop reshape rbind coding-efficiency