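【Posted】: 2021-10-11 00:57:24
【Question】:
I have a particular use case where I frequently need to set the value of a single row in a keyed data.table object in R. Currently I'm using the := notation, but the help page there says that set() can be faster in some cases.
Is that true for keyed data.tables? Or is there a way to use set() with keyed data.tables? I guess I'm not sure what is happening under the hood.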
library(data.table)
#> Warning: package 'data.table' was built under R version 4.0.2
mt <- as.data.table(mtcars, keep.rownames = TRUE)
setkey(mt, rn)
head(mt)
#> rn mpg cyl disp hp drat wt qsec vs am gear carb
#> 1: AMC Javelin 15.2 8 304 150 3.15 3.435 17.30 0 0 3 2
#> 2: Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
#> 3: Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
#> 4: Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
#> 5: Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> 6: Dodge Challenger 15.5 8 318 150 2.76 3.520 16.87 0 0 3 2
mt["AMC Javelin", mpg := -10] # want to do this, but faster?
head(mt)
#> rn mpg cyl disp hp drat wt qsec vs am gear carb
#> 1: AMC Javelin -10.0 8 304 150 3.15 3.435 17.30 0 0 3 2
#> 2: Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
#> 3: Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
#> 4: Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
#> 5: Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> 6: Dodge Challenger 15.5 8 318 150 2.76 3.520 16.87 0 0 3 2
set(mt, "AMC Javelin", 2L, -10) # this doesn't work
#> Error in set(mt, "AMC Javelin", 2L, -10): i is type 'character'. Must be integer, or numeric is coerced with warning. If i is a logical subset, simply wrap with which(), and take the which() outside the loop if possible for efficiency.
set(mt, 1L, 2L, -10) # this would work if I could get the row number of a given key...
Created on 2021-08-06 by the reprex package (v0.3.0)
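A minimal sketch for the single-key case, assuming the value is actually present in the key column: data.table can report the matching row number itself via `which = TRUE`, which does the same keyed lookup as `mt["AMC Javelin", ...]` but returns row indices instead of a subset. That integer can then be passed to `set()`. `chmatch()` on the key column is shown as an alternative; it does not use the key at all.

```r
library(data.table)

mt <- as.data.table(mtcars, keep.rownames = TRUE)
setkey(mt, rn)

# Keyed lookup that returns the matching row number(s) instead of the rows;
# .() wraps the value as a one-column join table (NA if there is no match).
row <- mt[.("AMC Javelin"), which = TRUE]

# set() takes an integer row index and bypasses [.data.table entirely.
set(mt, i = row, j = "mpg", value = -10)

# Alternative lookup on the character column that ignores the key:
row2 <- chmatch("AMC Javelin", mt$rn)
```

The `which = TRUE` call still goes through `[.data.table`, so it is worth doing once and reusing the row number rather than repeating it before every `set()`.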
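Update: The answers and comments from Ronak Shah and sindri_baldur work well for the question as I asked it (see the benchmarks below). Unfortunately, I don't think my simple example matched my actual use case. In my case there are multiple key columns, so match and chmatch don't work. Is there a solution that works for data.tables with multiple key columns?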
library(data.table)
#> Warning: package 'data.table' was built under R version 4.0.2
library(microbenchmark)
# Original question
mt <- as.data.table(mtcars, keep.rownames = TRUE)
setkey(mt, rn)
key <- "AMC Javelin"
microbenchmark(
mt[key, mpg := -10],
set(mt, 1L, 2L, -10),
set(mt, match(key, mt$rn), 2L, -10),
set(mt, chmatch(key, mt$rn), 2L, -10)
)
#> Unit: microseconds
#> expr min lq mean median
#> mt[key, `:=`(mpg, -10)] 490.129 568.7480 746.67525 619.0085
#> set(mt, 1L, 2L, -10) 1.597 1.8980 4.17609 2.8475
#> set(mt, match(key, mt$rn), 2L, -10) 3.104 3.7130 6.60660 4.9275
#> set(mt, chmatch(key, mt$rn), 2L, -10) 2.740 3.3025 5.27118 4.3200
#> uq max neval cld
#> 701.094 8996.071 100 b
#> 4.298 87.451 100 a
#> 7.726 45.807 100 a
#> 7.002 11.811 100 a
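The `set()` error message earlier recommends taking any lookup outside the loop. A sketch of that pattern, assuming the updates can be collected up front (the `keys` and `new_mpg` vectors are hypothetical): resolve all row numbers with one vectorised `chmatch()` call, then keep only `set()` inside the loop, so the `[.data.table` overhead seen in the benchmark is not paid on every iteration.

```r
library(data.table)

mt <- as.data.table(mtcars, keep.rownames = TRUE)
setkey(mt, rn)

# Hypothetical batch of single-row updates arriving as key/value pairs.
keys    <- c("AMC Javelin", "Camaro Z28", "Datsun 710")
new_mpg <- c(-10, -20, -30)

# Resolve every key -> row lookup once, outside the loop (vectorised).
rows <- chmatch(keys, mt$rn)

# Inside the loop only the cheap set() call remains.
for (k in seq_along(rows)) {
  set(mt, i = rows[k], j = "mpg", value = new_mpg[k])
}
```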
My situation is closer to this, with multiple key columns:
dt <- CJ(a = 1:10, b = 1:10, c = 1:60)
setkey(dt)
dt[, d := NA_real_]  # add a numeric column by reference (dt$d <- NA would make it logical)
key <- list(a = 2, b = 7, c = 35)
microbenchmark(
{ dt[key, d := 1] },
{ set(dt, 1L, 4L, 1)}
)
#> Unit: microseconds
#> expr min lq mean median uq
#> { dt[key, `:=`(d, 1)] } 634.125 666.5825 768.59937 756.9030 819.7585
#> { set(dt, 1L, 4L, 1) } 2.019 2.5355 3.95986 3.9325 4.6590
#> max neval cld
#> 1171.794 100 b
#> 22.945 100 a
match(key, dt[, .(a, b, c)]) # doesn't work
#> [1] NA NA NA
chmatch(key, dt[, .(a, b, c)]) # doesn't work
#> Error in chmatch(key, dt[, .(a, b, c)]): table is type 'list' (must be 'character' or NULL)
Created on 2021-08-06 by the reprex package (v0.3.0)
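`match()` returns `NA NA NA` because it treats `key` as a list of three elements and looks each one up among the columns of the table (whole vectors), not among its rows. For a multi-column key, `which = TRUE` inside `[` gives the row number directly, since a list in `i` is joined against the key columns in order. A sketch reusing the question's setup, with integer key values and a numeric `d` (the name `target` is only illustrative):

```r
library(data.table)

dt <- CJ(a = 1:10, b = 1:10, c = 1:60)
setkey(dt)                 # key on a, b, c
dt[, d := NA_real_]        # numeric placeholder column, added by reference

target <- list(a = 2L, b = 7L, c = 35L)

# A list in i is joined against the key columns (a, b, c) in order;
# which = TRUE returns the matching row number(s) instead of the rows.
row <- dt[target, which = TRUE]

# Base-R alternative that ignores the key and compares every row:
row_scan <- which(dt$a == target$a & dt$b == target$b & dt$c == target$c)

set(dt, i = row, j = "d", value = 1)
```

The lookup itself still costs one `[.data.table` call, so this mainly helps when the row numbers can be computed once and reused across many `set()` calls.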
【Comments】:
-
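The analogue of match here is
dt[key, which=TRUE], fwiw. If you have to do this one row at a time, it will be much slower than batching many rows together and updating them all at once (they can have different d values; just store the a, b, c -> d mapping in a single table).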
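The batched alternative from the comment can be sketched as an update join, assuming all pending changes can be collected into one table first (`updates` and its values are hypothetical): every row of `dt` whose `(a, b, c)` matches a row of `updates` receives that row's `d`, in a single `[` call.

```r
library(data.table)

dt <- CJ(a = 1:10, b = 1:10, c = 1:60)
setkey(dt)
dt[, d := NA_real_]

# Hypothetical table of pending updates: one row per (a, b, c) -> d mapping.
updates <- data.table(
  a = c(2L, 5L, 9L),
  b = c(7L, 1L, 10L),
  c = c(35L, 60L, 1L),
  d = c(1, 2, 3)
)

# One update join instead of many single-row assignments:
# matching rows of dt take d from updates (referenced as i.d).
dt[updates, on = .(a, b, c), d := i.d]
```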
Tags: r data.table