【Question title】: data.table: How do I pass a character vector to a function and get data.table to treat its contents as column names?
【Posted】: 2025-12-14 05:00:02
【Question description】:

Here is a data.table:

library(data.table)
DT <- data.table(airquality)

This example produces the output I want:

DT[, `:=`(New_Ozone= log(Ozone), New_Wind=log(Wind))]

How can I write a function log_those_columns so that the following code snippet produces the same result?

old_names <- c("Ozone", "Wind")
new_names <- c("New_Ozone", "New_Wind")
log_those_columns(DT, old_names, new_names)

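Note that I need old_names and new_names to be flexible enough to contain any number of columns.

(From similar questions on this topic I gather that the answer probably involves some combination of .SD, with=F, parse(), eval() and/or substitute(), but I can't seem to work out which to use and where.)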

【Comments】:

    Tags: r function data.table


    【Solution 1】:

    Based on MichaelChirico's comment, the function definition can be written as:

    log_those_columns <- function(DT, cols_in, cols_new) {
      DT[, (cols_new) := lapply(.SD, log), .SDcols = cols_in]
    }
    

    This returns:

    log_those_columns(DT, old_names, new_names)
    DT
    
         Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
      1:    41     190  7.4   67     5   1  3.713572 2.001480
      2:    36     118  8.0   72     5   2  3.583519 2.079442
      3:    12     149 12.6   74     5   3  2.484907 2.533697
      4:    18     313 11.5   62     5   4  2.890372 2.442347
      5:    NA      NA 14.3   56     5   5        NA 2.660260
     ---                                                     
    149:    30     193  6.9   70     9  26  3.401197 1.931521
    150:    NA     145 13.2   77     9  27        NA 2.580217
    151:    14     191 14.3   75     9  28  2.639057 2.660260
    152:    18     131  8.0   76     9  29  2.890372 2.079442
    153:    20     223 11.5   68     9  30  2.995732 2.442347
    

    As expected.
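
Because `:=` updates a data.table by reference, the function above changes the caller's DT in place. Where that is not wanted, a minimal sketch along the following lines may help. The name `log_those_columns_copy` and the input checks are assumptions added for illustration and are not part of the original answer.

```r
library(data.table)

# Hypothetical variant: checks its inputs and works on a copy of the table
log_those_columns_copy <- function(DT, cols_in, cols_new) {
  stopifnot(
    is.data.table(DT),
    length(cols_in) == length(cols_new),  # one new name per input column
    all(cols_in %in% names(DT))           # every input column must exist
  )
  out <- copy(DT)  # deep copy, so the caller's table is left untouched
  # The parentheses around cols_new make data.table use the vector's
  # contents as column names instead of creating a column called "cols_new"
  out[, (cols_new) := lapply(.SD, log), .SDcols = cols_in]
  out[]
}

DT <- data.table(airquality)
res <- log_those_columns_copy(DT, c("Ozone", "Wind"), c("New_Ozone", "New_Wind"))
names(DT)   # the original table keeps only its original columns
names(res)  # the copy carries the two additional columns
```

The price of this variant is a full copy of the table, which gives up the memory efficiency that update by reference provides.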

    A more flexible approach

    The function used to transform the data can also be passed as an argument:

    fct_those_columns <- function(DT, cols_in, cols_new, fct) {
      DT[, (cols_new) := lapply(.SD, fct), .SDcols = cols_in]
    }
    

    The call:

    fct_those_columns(DT, old_names, new_names, log)
    head(DT)
    

    works as expected:

       Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
    1:    41     190  7.4   67     5   1  3.713572 2.001480
    2:    36     118  8.0   72     5   2  3.583519 2.079442
    3:    12     149 12.6   74     5   3  2.484907 2.533697
    4:    18     313 11.5   62     5   4  2.890372 2.442347
    5:    NA      NA 14.3   56     5   5        NA 2.660260
    6:    28      NA 14.9   66     5   6  3.332205 2.701361
    

    The function name can also be passed as a character string:

    fct_those_columns(DT, old_names, new_names, "sqrt")
    head(DT)
    
       Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
    1:    41     190  7.4   67     5   1  6.403124 2.720294
    2:    36     118  8.0   72     5   2  6.000000 2.828427
    3:    12     149 12.6   74     5   3  3.464102 3.549648
    4:    18     313 11.5   62     5   4  4.242641 3.391165
    5:    NA      NA 14.3   56     5   5        NA 3.781534
    6:    28      NA 14.9   66     5   6  5.291503 3.860052
    

    Or as an anonymous function:

    fct_those_columns(DT, old_names, new_names, function(x) x^(1/2))
    head(DT)
    
       Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
    1:    41     190  7.4   67     5   1  6.403124 2.720294
    2:    36     118  8.0   72     5   2  6.000000 2.828427
    3:    12     149 12.6   74     5   3  3.464102 3.549648
    4:    18     313 11.5   62     5   4  4.242641 3.391165
    5:    NA      NA 14.3   56     5   5        NA 3.781534
    6:    28      NA 14.9   66     5   6  5.291503 3.860052
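
If the transforming function needs extra arguments (for example a logarithm base), one possible extension is to accept `...` and bind those arguments in a small wrapper before calling `lapply()`. This is only a sketch: the name `fct_those_columns_args` and the wrapper are assumptions for illustration, not part of the original answer.

```r
library(data.table)

# Hypothetical extension: forwards extra arguments to the transforming function
fct_those_columns_args <- function(DT, cols_in, cols_new, fct, ...) {
  f <- match.fun(fct)             # accepts a function or its name as a string
  wrapped <- function(x) f(x, ...)  # bind the extra arguments outside of j
  DT[, (cols_new) := lapply(.SD, wrapped), .SDcols = cols_in]
  invisible(DT)
}

DT <- data.table(airquality)
old_names <- c("Ozone", "Wind")
new_names <- c("New_Ozone", "New_Wind")
fct_those_columns_args(DT, old_names, new_names, "log", base = 2)
head(DT)
```

Building the wrapper first keeps `...` out of the `j` expression, so the data.table call itself stays the same as in the answer.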
    

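    An even more flexible approach

    The function below derives the names of the new columns automatically by prepending the name of the function to the names of the input columns: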

    fct_those_columns <- function(DT, cols_in, fct) {
      fct_name <- substitute(fct)
      cols_new <- paste(if (class(fct_name) == "name") fct_name else fct_name[3], cols_in, sep = "_")
      DT[, (cols_new) := lapply(.SD, fct), .SDcols = cols_in]
    }
    
    DT <- data.table(airquality)
    fct_those_columns(DT, old_names, sqrt)
    fct_those_columns(DT, old_names, data.table::as.IDate)
    fct_those_columns(DT, old_names, function(x) x^(1/2))
    DT
    
         Ozone Solar.R Wind Temp Month Day sqrt_Ozone sqrt_Wind as.IDate_Ozone as.IDate_Wind x^(1/2)_Ozone x^(1/2)_Wind
      1:    41     190  7.4   67     5   1   6.403124  2.720294     1970-02-11    1970-01-08      6.403124     2.720294
      2:    36     118  8.0   72     5   2   6.000000  2.828427     1970-02-06    1970-01-09      6.000000     2.828427
      3:    12     149 12.6   74     5   3   3.464102  3.549648     1970-01-13    1970-01-13      3.464102     3.549648
      4:    18     313 11.5   62     5   4   4.242641  3.391165     1970-01-19    1970-01-12      4.242641     3.391165
      5:    NA      NA 14.3   56     5   5         NA  3.781534           <NA>    1970-01-15            NA     3.781534
     ---                                                                                                               
    149:    30     193  6.9   70     9  26   5.477226  2.626785     1970-01-31    1970-01-07      5.477226     2.626785
    150:    NA     145 13.2   77     9  27         NA  3.633180           <NA>    1970-01-14            NA     3.633180
    151:    14     191 14.3   75     9  28   3.741657  3.781534     1970-01-15    1970-01-15      3.741657     3.781534
    152:    18     131  8.0   76     9  29   4.242641  2.828427     1970-01-19    1970-01-09      4.242641     2.828427
    153:    20     223 11.5   68     9  30   4.472136  3.391165     1970-01-21    1970-01-12      4.472136     3.391165
    

    Note that x^(1/2)_Ozone is not a syntactically valid name in R and has to be enclosed in backticks:

    DT$`x^(1/2)_Ozone`
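
To avoid column names that need backticks, one option is to let the caller supply a prefix and to pass the result through `make.names()`. The following is a sketch under that assumption. The function name `fct_those_columns_prefix`, the `prefix` argument and the fallback prefix "fct" are hypothetical and do not appear in the original answer.

```r
library(data.table)

# Hypothetical variant: optional explicit prefix, names sanitized by make.names()
fct_those_columns_prefix <- function(DT, cols_in, fct, prefix = NULL) {
  if (is.null(prefix)) {
    fct_name <- substitute(fct)
    # Use the function's name when a plain name was passed, else fall back to "fct"
    prefix <- if (is.name(fct_name)) as.character(fct_name) else "fct"
  }
  cols_new <- make.names(paste(prefix, cols_in, sep = "_"), unique = TRUE)
  DT[, (cols_new) := lapply(.SD, fct), .SDcols = cols_in]
  invisible(DT)
}

DT <- data.table(airquality)
old_names <- c("Ozone", "Wind")
fct_those_columns_prefix(DT, old_names, sqrt)                              # sqrt_Ozone, sqrt_Wind
fct_those_columns_prefix(DT, old_names, function(x) x^(1/2), prefix = "root")  # root_Ozone, root_Wind
names(DT)
```

With this variant the anonymous function yields columns that can be accessed as `DT$root_Ozone` without backticks.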
    

    【Comments】:

      【Solution 2】:

      You only need to write a function:

      log_those_columns <- function(DT, old_names, new_names)
        DT[, (new_names) := lapply(mget(old_names), log)]
      log_those_columns(DT,old_names,new_names)
      DT
           Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
        1:    41     190  7.4   67     5   1  3.713572 2.001480
        2:    36     118  8.0   72     5   2  3.583519 2.079442
        3:    12     149 12.6   74     5   3  2.484907 2.533697
        4:    18     313 11.5   62     5   4  2.890372 2.442347
        5:    NA      NA 14.3   56     5   5        NA 2.660260
       ---                                                     
      

      【Comments】:

      • I would use .SDcols here rather than mget
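
The suggestion in the comment can be checked with a short side-by-side sketch. The table names `DT_mget` and `DT_sd` are made up for this illustration. Both calls are applied to fresh copies of the data and the results are compared.

```r
library(data.table)

old_names <- c("Ozone", "Wind")
new_names <- c("New_Ozone", "New_Wind")

# Variant from Solution 2: look the columns up by name with mget()
DT_mget <- data.table(airquality)
DT_mget[, (new_names) := lapply(mget(old_names), log)]

# Variant suggested in the comment: restrict .SD to the wanted columns
DT_sd <- data.table(airquality)
DT_sd[, (new_names) := lapply(.SD, log), .SDcols = old_names]

# Compare the two results
all.equal(DT_mget, DT_sd)
```

The `.SDcols` form states explicitly which columns are used and matches the approach taken in Solution 1.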