There are a few problems. One of them is in the tstrsplit function itself, which is defined as:
function (x, ..., fill = NA, type.convert = FALSE, keep, names = FALSE)
{
    if (!isTRUEorFALSE(names) && !is.character(names))
        stop("'names' must be TRUE/FALSE or a character vector.")
    ans = transpose(strsplit(as.character(x), ...), fill = fill,
        ignore.empty = FALSE)
    if (!missing(keep)) {
        keep = suppressWarnings(as.integer(keep))
        chk = min(keep) >= min(1L, length(ans)) & max(keep) <=
            length(ans)
        if (!isTRUE(chk))
            stop("'keep' should contain integer values between ",
                min(1L, length(ans)), " and ", length(ans),
                ".")
        ans = ans[keep]
    }
    if (type.convert)
        ans = lapply(ans, type.convert, as.is = TRUE)
    if (isFALSE(names))
        return(ans)
    else if (isTRUE(names))
        names = paste0("V", seq_along(ans))
    if (length(names) != length(ans)) {
        str = if (missing(keep))
            "ans"
        else "keep"
        stop("length(names) (= ", length(names), ") is not equal to length(",
            str, ") (= ", length(ans), ").")
    }
    setattr(ans, "names", names)
    ans
}
<bytecode: 0x0000019bffd6da98>
<environment: namespace:data.table>
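The important thing to notice is the if block that checks whether your keep fits within the size of the result. In your example, the first row returns NA. The reason this works in your hard-coded example is that strsplit is vectorized: the NA row is processed together with the rows that do have enough pieces, so length(ans) is the largest number of pieces in any row and the if block is not triggered. You can verify this by changing the 4 to 40, which gives Error in tstrsplit(ValueId, "-", fixed = TRUE, keep = 40) : 'keep' should contain integer values between 1 and 9., because in that case no row has enough pieces.
So what you need to do is redefine the tstrsplit function so that it returns NA instead of stopping: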
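To make the vectorization point concrete, here is a minimal base R sketch. The toy input `x` and the names `parts`, `n_cols` and `cols` are made up for illustration, and the `lapply`/`vapply` step is only a hand-rolled stand-in for what `transpose(..., fill = NA)` does, not data.table's actual code.

```r
# Two toy inputs: the first has one piece, the second has three.
x <- c("a", "a-b-c")

# strsplit is vectorized: one call splits every element of x.
parts <- strsplit(x, "-", fixed = TRUE)

# The transposed result has as many columns as the longest row has pieces.
n_cols <- max(lengths(parts))

# Stand-in for transpose(parts, fill = NA): column i holds the i-th piece
# of every row, with NA where a row is too short.
cols <- lapply(seq_len(n_cols), function(i) {
  vapply(parts, function(p) p[i], character(1))
})

print(n_cols)     # number of columns, driven by the longest row
print(cols[[2]])  # second piece of each row; NA for the short row
```

Because the range check compares keep against the number of columns of the whole result, a short row does not trigger the error as long as some other row is long enough.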
library(data.table)  # provides transpose()

tstrsplitNA <- function (x, ..., fill = NA, type.convert = FALSE, keep)
{
    ans = transpose(strsplit(as.character(x), ...), fill = fill,
        ignore.empty = FALSE)
    if (!missing(keep)) {
        keep = suppressWarnings(as.integer(keep))
        chk = min(keep) >= min(1L, length(ans)) & max(keep) <=
            length(ans)
        # keep is out of range: fall back to NA instead of calling stop()
        if (!isTRUE(chk))
            ans <- NA_character_
        ans = ans[keep]
    }
    if (type.convert)
        ans = lapply(ans, type.convert, as.is = TRUE)
    ans
}
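That is still not enough. Because strsplit is vectorized, foo[, newvar := tstrsplitNA(ValueId, split="-", fixed = TRUE, keep = Level)] does not run the function row by row. Instead it feeds the whole ValueId column to strsplit and then transposes the result, which returns garbage relative to what you want.
You can tell data.table to do the operation row by row simply by using the by argument with Level and ValueId: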
foo[, newvar := tstrsplitNA(ValueId, split="-", fixed = TRUE, keep = Level), by=c('Level','ValueId')]
foo
Level ValueId newvar
1: 2 11983:1055521 NA
2: 2 11983:1055521-5168:290668-198:100798 5168:290668
3: 3 11983:1055521-5168:290668-198:100798-92:91604-139:94569-135:94719-5161:290771-5162:290728-5166:290620 198:100798
4: 4 11983:1055521-5168:290668-198:100798-92:91604-139:94569-135:94719-5161:290771 92:91604
5: 3 11983:1055521-5168:290676-198:100794-92:91781-139:95090-135:95353 198:100794
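For comparison, the row-by-row pick can also be sketched in base R with `mapply`, which pairs each split row with its own Level. This is a different technique from the `by` approach above, shown only to illustrate what "row by row" means; the vectors below copy rows 1, 2 and 5 of the example, and `pieces` and `picked` are illustrative names.

```r
# Rows 1, 2 and 5 of the example data.
Level <- c(2L, 2L, 3L)
ValueId <- c(
  "11983:1055521",
  "11983:1055521-5168:290668-198:100798",
  "11983:1055521-5168:290676-198:100794-92:91781-139:95090-135:95353"
)

# Split every row once, then take the Level-th piece of each row.
# Indexing past the end of a character vector yields NA, so short rows
# need no special handling.
pieces <- strsplit(ValueId, "-", fixed = TRUE)
picked <- mapply(function(p, k) p[k], pieces, Level)

print(picked)
```

The first element is NA because that row has only one piece, which matches the newvar column in the output above.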