Posted: 2018-07-26 17:49:22
Question:
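Background: this runs inside a swap optimization algorithm. This particular line sits in the inner while loop, so it is executed a great many times. Everything else in the loop runs very fast.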
Below is a sample data.table, "Inventory_test":
```r
library(data.table)

NestCount2 <- c(
"1","1","1","1","1","1","1","1","2","2","3","3","3","3","3","3",
"3","3","3","4","4","4","5","5","5","5","5","5","5","5","5","6",
"6","6","6","6","6","6","6","6","",""
)
Part2 <- c(
"Shroud","Shroud","Shroud","Shroud","Shroud","Shroud","Shroud",
"Shroud","S1Nozzle","S1Nozzle","Shroud","Shroud","Shroud","Shroud",
"Shroud","Shroud","Shroud","Shroud","Shroud","S2Nozzle","S2Nozzle",
"S2Nozzle","Shroud","Shroud","Shroud","Shroud","Shroud","Shroud",
"Shroud","Shroud","Shroud","Shroud","Shroud","Shroud","Shroud",
"Shroud","Shroud","Shroud","Shroud","Shroud","*","*"
)
Inventory_test <- data.table(data.frame(NestCount2,Part2))
# Methods already tried (both perform about the same in the profiler):
# 1) keep distinct (Part2, NestCount2) rows, then tabulate by Part2
ptcts <- table(unique(Inventory_test[,c("Part2","NestCount2")])$Part2)
# 2) count distinct NestCount2 values within each Part2 group
ptcts2 <- Inventory_test[, .(count = uniqueN(NestCount2)), by = Part2]$count
```
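Using the RStudio profiler, I noticed that about half of the time spent on the ptcts line goes just to indexing the columns with Inventory_test[,c("Part2","NestCount2")]. I have been looking for a faster approach but haven't found one :(. Any help would be greatly appreciated!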
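One way to skip the explicit column subset is sketched below, assuming the goal is the number of distinct NestCount2 values per Part2. `unique()` on a data.table accepts a `by` argument, so duplicate (Part2, NestCount2) rows can be dropped without first building the two-column copy, and `.N` then counts the remaining rows per part. The sample data are rebuilt compactly with `rep()` (element for element the same 42 rows as above); `sample_dt` and `count_unique_nests` are hypothetical names. Whether this is actually faster is untested here: `unique(..., by =)` still returns every column of the surviving rows, so on a wider inventory table it may copy more than the two-column subset does.

```r
library(data.table)

# Compact rebuild of the question's sample data (same 42 rows, same order)
sample_dt <- data.table(
  NestCount2 = c(rep("1", 8), rep("2", 2), rep("3", 9), rep("4", 3),
                 rep("5", 9), rep("6", 9), rep("", 2)),
  Part2      = c(rep("Shroud", 8), rep("S1Nozzle", 2), rep("Shroud", 9),
                 rep("S2Nozzle", 3), rep("Shroud", 18), rep("*", 2))
)

# Hypothetical helper: number of distinct NestCount2 values per Part2.
# unique(..., by =) deduplicates on the two columns directly, skipping the
# separate DT[, c("Part2", "NestCount2")] column-subset step.
count_unique_nests <- function(dt) {
  unique(dt, by = c("Part2", "NestCount2"))[, .N, by = Part2]
}

res <- count_unique_nests(sample_dt)
print(res)
```

Groups come back in order of first appearance (Shroud, S1Nozzle, S2Nozzle, *), unlike `table()`, which sorts its names.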
Comments:
- This probably doesn't matter for performance, but for sanity you may want to use data.table(NestCount2,Part2) instead of data.table(data.frame(NestCount2,Part2)). For speed, maybe Inventory_test[, .N, by=.(Part2, NestCount2)][, .N, by=Part2]?
- There is also setkey for data tables.
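- Thanks! I'll look into setkey. Just to clarify: the only line from my actual code is the ptcts one; everything above it is just there to give people here a sample dt to play with.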
- Maybe define ptcts2 as Inventory_test[, uniqueN(NestCount2), by = Part2]$V1; skipping the `=` and the `.()` seems to speed it up slightly.
- For ptcts, skipping the [] column subset and using Inventory_test[, uniqueN(paste(Part2, NestCount2)), by=Part2] might give a marginal speed gain.
Tags: r data.table