【Question title】: Alternate way for prettyNum to process faster
【Posted】: 2021-12-05 10:43:09
【Question】:

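I am working with a dataset larger than the one attached below, and I need to re-encode the columns of type double. I tried using prettyNum inside a function called encoder, but it runs very slowly on my data. Here is what I tried: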

library(data.table)

set.seed(1453)

sample_data <- data.frame(a=sample(1:1000,100,replace=T),
                          b=sample(1:1000,100,replace=T),
                          c=sample(seq(1,1000,0.01),100,replace=T),
                          d=sample(seq(1,1000,0.01),100,replace=T),
                          e=sample(seq(1,1000,0.01),100,replace=T),
                          f=sample(seq(1,1000,0.01),100,replace=T),
                          g=sample(seq(1,1000,0.01),100,replace=T),
                          h=sample(seq(1,1000,0.01),100,replace=T),
                          i=sample(LETTERS,1000,replace=T),
                          j=sample(letters,1000,replace=T))
setDT(sample_data)

options(warn=-1)

double_cols <- which(sapply(sample_data,is.double))

encoder <- function(x) prettyNum(x*1e4,big.mark = '.')

sample_data[,(double_cols):=lapply(.SD,encoder),.SDcols=double_cols]

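It works, but I believe there is a faster solution.

Thanks in advance.

【Comments】:

  • This can be written more concisely as sample_data[,(double_cols):=lapply(.SD,encoder),.SDcols=is.double], but that will not make it any faster.

Tags: r types data.table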


【Solution 1】:

You can use format instead of prettyNum. In the timings below, elapsed time drops from 1.26 to 0.08 seconds.

library(data.table)

setDT(sample_data)

sample_data1 <- copy(sample_data)
sample_data2 <- copy(sample_data)

options(warn=-1)

encoder1 <- function(x) prettyNum(x*1e4,big.mark = '.')
encoder2 <- function(x) format(x*1e4,big.mark = '.', trim = TRUE)

system.time(sample_data1[,(double_cols):=lapply(.SD,encoder1),.SDcols=double_cols])

       user      system     elapsed 
       1.27        0.01        1.26

system.time(sample_data2[,(double_cols):=lapply(.SD,encoder2),.SDcols=double_cols])

       user      system     elapsed 
       0.08        0.00        0.08 

【Comments】:

  • Note the use of trim = TRUE to get the same result.
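
The comment above can be checked on a small vector. The following is a minimal sketch in base R, assuming a hand-picked vector x that is not part of the original post: by default format right-justifies numbers to a common width, and trim = TRUE suppresses that padding. The last line prints whether the trimmed result is identical to what prettyNum returns, so the claim can be verified rather than taken on trust.

```r
# Hypothetical test values (not from the original post), already scaled by 1e4.
x <- c(1.5, 123.45, 999.99) * 1e4

# big.mark and decimal.mark are both "." here, which can trigger a warning;
# the question silences it globally with options(warn = -1).
padded  <- suppressWarnings(format(x, big.mark = "."))               # common width
trimmed <- suppressWarnings(format(x, big.mark = ".", trim = TRUE))  # no padding
pretty  <- suppressWarnings(prettyNum(x, big.mark = "."))            # reference

print(nchar(padded))                # widths of the padded strings
print(trimmed)
print(identical(trimmed, pretty))   # does trim = TRUE reproduce prettyNum?
```

Note that format chooses one representation for the whole vector, whereas prettyNum formats each number on its own, so the two may still disagree on inputs where a single element would switch to scientific notation (for example a value of 1e7 mixed with smaller ones).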

【Solution 2】:

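Maybe try sprintf. The gain seems substantial: in the timings below, elapsed time drops from 2.86 to 0.16 seconds.

1. Using your function

  • Code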
set.seed(1453)

sample_data <- data.frame(a=sample(1:1000,100,replace=T),
                          b=sample(1:1000,100,replace=T),
                          c=sample(seq(1,1000,0.01),100,replace=T),
                          d=sample(seq(1,1000,0.01),100,replace=T),
                          e=sample(seq(1,1000,0.01),100,replace=T),
                          f=sample(seq(1,1000,0.01),100,replace=T),
                          g=sample(seq(1,1000,0.01),100,replace=T),
                          h=sample(seq(1,1000,0.01),100,replace=T),
                          i=sample(LETTERS,1000,replace=T),
                          j=sample(letters,1000,replace=T))



double_cols <- which(sapply(sample_data,is.double))

encoder <- function(x) prettyNum(x*1e4, big.mark = '.')
system.time(setDT(sample_data)[,(double_cols):=lapply(.SD,encoder),.SDcols=is.double][])
  • Output
utilisateur     système      écoulé 
       2.75        0.00        2.86 

2. With the sprintf function

  • Code
set.seed(1453)
sample_data <- data.frame(a=sample(1:1000,100,replace=T),
                          b=sample(1:1000,100,replace=T),
                          c=sample(seq(1,1000,0.01),100,replace=T),
                          d=sample(seq(1,1000,0.01),100,replace=T),
                          e=sample(seq(1,1000,0.01),100,replace=T),
                          f=sample(seq(1,1000,0.01),100,replace=T),
                          g=sample(seq(1,1000,0.01),100,replace=T),
                          h=sample(seq(1,1000,0.01),100,replace=T),
                          i=sample(LETTERS,1000,replace=T),
                          j=sample(letters,1000,replace=T))


double_cols <- which(sapply(sample_data,is.double))

encoder2 <- function(x) prettyNum(sprintf("%.9g", 1e4 * x), big.mark = '.')
system.time(setDT(sample_data)[,(double_cols):=lapply(.SD,encoder2),.SDcols=is.double][])
  • Output
utilisateur     système      écoulé 
       0.16        0.00        0.16 
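
Before swapping encoders on real data it is worth confirming that both produce the same strings. The sketch below is base R only and assumes a small hand-picked vector x with at most two decimals, like the sample data; x is an illustration and not part of the original answer. As far as I understand base R, prettyNum calls format once per element when it is given numbers, while sprintf converts the whole vector in one vectorised call and leaves prettyNum only the job of inserting the marks, which would explain the speed-up.

```r
# Hypothetical test values (not from the original answer).
x <- c(1.5, 123.45, 999.99)

# Encoder from the question: prettyNum() receives a numeric vector.
encoder <- function(x) prettyNum(x * 1e4, big.mark = ".")

# Encoder from this answer: prettyNum() receives ready-made strings.
encoder2 <- function(x) prettyNum(sprintf("%.9g", 1e4 * x), big.mark = ".")

# big.mark and decimal.mark are both ".", which can trigger a warning;
# the question silences it with options(warn = -1).
res1 <- suppressWarnings(encoder(x))
res2 <- suppressWarnings(encoder2(x))

print(res2)
print(identical(res1, res2))   # do the two encoders agree on this input?
```

One caveat: "%.9g" keeps up to 9 significant digits, while format defaults to 7, so the two encoders can diverge on values that still have a fractional part after multiplying by 1e4.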
