[Posted]: 2013-06-12 13:56:10
[Question]:
I need to assign a "second" id that groups some of the values of my original id. Here is my sample data:
dt<-structure(list(id = c("aaaa", "aaaa", "aaas", "aaas", "bbbb", "bbbb"),
period = c("start", "end", "start", "end", "start", "end"),
date = structure(c(15401L, 15401L, 15581L, 15762L, 15430L, 15747L), class = c("IDate", "Date"))),
class = c("data.table", "data.frame"),
.Names = c("id", "period", "date"),
sorted = "id")
> dt
     id period       date
1: aaaa  start 2012-03-02
2: aaaa    end 2012-03-02
3: aaas  start 2012-08-29
4: aaas    end 2013-02-26
5: bbbb  start 2012-03-31
6: bbbb    end 2013-02-11
The id column needs to be grouped according to this list (ids in the same group get the same value in id2):
> groups
[[1]]
[1] "aaaa" "aaas"
[[2]]
[1] "bbbb"
I used the following code, which seems to work but gives the following warning:
> dt[, id2 := which(vapply(groups, function(x,y) any(x==y), .BY[[1]], FUN.VALUE=T)), by=id]
Warning message:
In `[.data.table`(dt, , `:=`(id2, which(vapply(groups, function(x, :
Invalid .internal.selfref detected and fixed by taking a copy of the whole table,
so that := can add this new column by reference. At an earlier point, this data.table has
been copied by R (or been created manually using structure() or similar). Avoid key<-,
names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use
set* syntax instead to avoid copying: setkey(), setnames() and setattr(). Also,
list (DT1,DT2) will copy the entire DT1 and DT2 (R's list() copies named objects),
use reflist() instead if needed (to be implemented). If this message doesn't help,
please report to datatable-help so the root cause can be fixed.
> dt
     id period       date id2
1: aaaa  start 2012-03-02   1
2: aaaa    end 2012-03-02   1
3: aaas  start 2012-08-29   1
4: aaas    end 2013-02-26   1
5: bbbb  start 2012-03-31   2
6: bbbb    end 2013-02-11   2
Could someone briefly explain the nature of this warning and what effect, if any, it has on the final result? Thanks.
Edit:
The following code shows how dt is actually created and how it is passed to the function that gives the warning:
f.main <- function(){
  f2 <- function(x){
    groups <- list(c("aaaa", "aaas"), "bbbb") # actually generated depending on the similarity between values of x$id
    x <- x[, id2 := which(vapply(groups, function(x,y) any(x==y), .BY[[1]], FUN.VALUE=T)), by=id]
    return(x)
  }
  x <- f1()
  if(!is.null(x[["res"]])){
    x <- f2(x[["res"]])
    return(x)
  } else {
    # something else
  }
}
f1 <- function(){
  dt <- data.table(id = c("aaaa", "aaaa", "aaas", "aaas", "bbbb", "bbbb"),
                   period = c("start", "end", "start", "end", "start", "end"),
                   date = structure(c(15401L, 15401L, 15581L, 15762L, 15430L, 15747L), class = c("IDate", "Date")))
  return(list(res=dt, other_results=""))
}
> f.main()
     id period       date id2
1: aaaa  start 2012-03-02   1
2: aaaa    end 2012-03-02   1
3: aaas  start 2012-08-29   1
4: aaas    end 2013-02-26   1
5: bbbb  start 2012-03-31   2
6: bbbb    end 2013-02-11   2
Warning message:
In `[.data.table`(x, , `:=`(id2, which(vapply(groups, function(x, :
Invalid .internal.selfref detected and fixed by taking a copy of the whole table,
so that := can add this new column by reference. At an earlier point, this data.table
has been copied by R (or been created manually using structure() or similar).
Avoid key<-, names<- and attr<- which in R currently (and oddly) may copy the whole
data.table. Use set* syntax instead to avoid copying: setkey(), setnames() and setattr().
Also, list(DT1,DT2) will copy the entire DT1 and DT2 (R's list() copies named objects),
use reflist() instead if needed (to be implemented). If this message doesn't help,
please report to datatable-help so the root cause can be fixed.
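The warning text names list() as one way a data.table gets copied, and f1() above returns the table inside list(res=dt, ...). The sketch below shows one possible workaround, assuming that this is what invalidates the self-reference here: take an explicit copy() of the extracted table before using := on it. make_result is a hypothetical stand-in for f1(), not a name from the original program. On recent R and data.table versions the warning may not appear at all, so treat this as a defensive pattern rather than a confirmed fix.

```r
library(data.table)

groups <- list(c("aaaa", "aaas"), "bbbb")

# Hypothetical stand-in for f1(): returns a data.table wrapped in a list
make_result <- function() {
  dt <- data.table(id = c("aaaa", "aaaa", "aaas", "aaas", "bbbb", "bbbb"),
                   period = rep(c("start", "end"), 3))
  list(res = dt, other_results = "")
}

out <- make_result()

# copy() returns a fresh data.table with its own valid self-reference,
# so := can add the column by reference without the fallback copy
x <- copy(out[["res"]])

# Same grouping logic as in the question; .BY[[1]] is the current id
x[, id2 := which(vapply(groups, function(g) .BY[[1]] %in% g, logical(1))), by = id]
```

The explicit copy costs one full duplication of the table, which is the same work the warning says data.table does on its own; the difference is that it happens at a known place and the table stored in the list is left unmodified.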
[Comments]:
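- The warning says: "created manually using structure() or similar". Use the function data.table() to create your data.table. That said, this is only a warning and you should not run into major problems (apart from slower performance). Also, you can replace .BY[[1]] with id.
- @Roland Thanks for your reply, but in the real case the table is not created with structure(). That is just the (modified) output of print(dput(x)), which I used to see what happens to the table in my program. I double-checked: dt is generated with data.table() inside a function, returned with return() to the main function, and the main function passes it as an argument to another function, where the warning occurs.
- Well, make your code represent your real problem. Show us how the DT is passed between functions.
- @Roland Done. Please see the edit.
- fwiw, a shorter expression to achieve the above is dt[melt(groups)], using reshape2::melt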
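The grouping can also be written as a join against a lookup table built from groups. This is a minimal sketch, assuming data.table 1.9.6 or later for the on= argument; the name lookup is introduced here for illustration and does not appear in the thread.

```r
library(data.table)

dt <- data.table(id = c("aaaa", "aaaa", "aaas", "aaas", "bbbb", "bbbb"),
                 period = rep(c("start", "end"), 3))
groups <- list(c("aaaa", "aaas"), "bbbb")

# One row per id, with the index of the group that contains it
lookup <- data.table(id  = unlist(groups),
                     id2 = rep(seq_along(groups), lengths(groups)))

# Update join: add id2 to dt by reference for every row whose id matches
dt[lookup, id2 := i.id2, on = "id"]
```

Compared with calling vapply() once per group of rows, the join does the matching in a single step, and ids missing from groups would simply get NA in id2.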
Tags: r data.table