【Question title】: Summarize columns where names have a specific pattern in data.table
【Posted】: 2020-09-14 08:18:24
【Question description】:

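I have a large data.table and want to summarize, by group, the columns whose names start with a certain pattern.

The columns I am interested in always follow the same format, namely: f<X>_<Y>, m<X>_<Y>, f<X>, m<X>

Here is the list of all possible column names: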

ageColsPossible <- c("m0_9", "m10_19", "m20_29", "m30_39", "m40_49", "m50_59", "m60_69",
                   "f0_9", "f10_19", "f20_29", "f30_39", "f40_49", "f50_59", "f60_69") 

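If there is not enough data available, my data.table will contain only some of these columns. I would like to obtain a vector of the column names that are actually present in the data: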

>   names(myData)
 [1] "clientID"             "policyID"             "startYear"            "product"              "NOplans"              "grp"                 
 [7] "policyid"             "personid"             "age"                  "gender"               "dependant"            "location"            
[13] "region"               "exposure"             "startMonth"           "cover_effective_date" "endexposuredate"      "fromdate"            
[19] "enddate"              "planHistSufficiency"  "productRank"          "claim10month"         "claim11month"         "claim12month"        
[25] "claim9month"          "NA20_29"              "NA30_39"              "NA40_49"              "NA50_59"              "f0_9"                
[31] "f10_19"               "f20_29"               "f30_39"               "f40_49"               "f50_59"               "f60_69"              
[37] "m0_9"                 "m10_19"               "m20_29"               "m30_39"               "m40_49"               "m50_59"              
[43] "m60_69"               "u0_9"                 "u10_19"               "u20_29"               "u30_39"               "u40_49"              
[49] "u50_59"               "u60_69"               "uNA" 

I know regex and was thinking of something like regex = "(m|f)(\\d+)_?(\\d+)?", but I also saw a pattern() function somewhere. Unfortunately, I can't find it any more.

Any ideas?

【Comments】:

  • .SDcols supports patterns(), so you can use a regular expression to select the columns for .SD.
  • grep("^[mf]\\d+(?:_\\d+)?$", names(myData), value=TRUE)?
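To illustrate the two comments above, here is a minimal sketch of how the vector of available age columns could be built. The `cols` vector is a hypothetical, shortened stand-in for `names(myData)`; the regex uses a plain capturing group instead of `(?:...)`, which should match the same names. `intersect()` is shown as an alternative, on the assumption that the full list of possible names (`ageColsPossible`) is known up front.

```r
# Hypothetical, shortened stand-in for names(myData)
cols <- c("clientID", "age", "NA20_29", "f0_9", "f10_19",
          "m0_9", "m60_69", "u0_9", "uNA")

# All possible age columns, as listed in the question
ageColsPossible <- c("m0_9", "m10_19", "m20_29", "m30_39", "m40_49", "m50_59", "m60_69",
                     "f0_9", "f10_19", "f20_29", "f30_39", "f40_49", "f50_59", "f60_69")

# Option 1: regex. Starts with m or f, then digits, optionally "_" plus digits.
# The anchors ^ and $ keep names such as "NA20_29" or "u0_9" out.
ageColsRegex <- grep("^[mf]\\d+(_\\d+)?$", cols, value = TRUE)

# Option 2: keep only the known names that are present in the data
ageColsKnown <- intersect(ageColsPossible, cols)

print(ageColsRegex)
print(ageColsKnown)
```

Both options select the same set of columns here. The regex returns them in the order they appear in the data, while `intersect()` follows the order of `ageColsPossible`.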

Tags: r string data.table


【Solution 1】:

Something like this will most likely work, assuming you only need a single summary function (median() in this example):

DT[, lapply( .SD, median), by=.(group), .SDcols = patterns( "^[mf]\\d+" ) ]
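For a self-contained check of the line above, here is a small sketch on made-up data. The table `DT`, its values, and the grouping column `grp` are hypothetical and only stand in for the real data; `patterns()` inside `.SDcols` assumes a reasonably recent data.table version.

```r
library(data.table)

# Hypothetical toy data: one grouping column, one unrelated numeric column,
# two age columns that should be summarized, and one "u" column that should not
DT <- data.table(
  grp  = c("a", "a", "b", "b"),
  age  = c(30, 40, 50, 60),
  f0_9 = c(1, 3, 5, 7),
  m0_9 = c(2, 4, 6, 8),
  u0_9 = c(9, 9, 9, 9)
)

# Median per group, restricted to columns whose names start with m or f
# followed by at least one digit
res <- DT[, lapply(.SD, median), by = .(grp), .SDcols = patterns("^[mf]\\d+")]

print(res)
```

The result contains only `grp`, `f0_9` and `m0_9`; the columns `age` and `u0_9` are left out because their names do not match the pattern.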

【Comments】:
