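【Posted】: 2020-09-14 08:18:24
【Question】:
I have a large data.table and I want to aggregate columns by group, where the column names start with a certain pattern.
The columns I am interested in always have the same format, namely: f<X>_<Y>, m<X>_<Y>, f<X>, m<X>.
Here is the list of all possible column names: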
ageColsPossible <- c("m0_9", "m10_19", "m20_29", "m30_39", "m40_49", "m50_59", "m60_69",
"f0_9", "f10_19", "f20_29", "f30_39", "f40_49", "f50_59", "f60_69")
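If there is not enough data available, my data.table will contain only some of these columns. I would like to get a vector of the column names that are actually present in the data: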
> names(myData)
[1] "clientID" "policyID" "startYear" "product" "NOplans" "grp"
[7] "policyid" "personid" "age" "gender" "dependant" "location"
[13] "region" "exposure" "startMonth" "cover_effective_date" "endexposuredate" "fromdate"
[19] "enddate" "planHistSufficiency" "productRank" "claim10month" "claim11month" "claim12month"
[25] "claim9month" "NA20_29" "NA30_39" "NA40_49" "NA50_59" "f0_9"
[31] "f10_19" "f20_29" "f30_39" "f40_49" "f50_59" "f60_69"
[37] "m0_9" "m10_19" "m20_29" "m30_39" "m40_49" "m50_59"
[43] "m60_69" "u0_9" "u10_19" "u20_29" "u30_39" "u40_49"
[49] "u50_59" "u60_69" "uNA"
I know about regex and was thinking of something like: regex = "(m|f)(\\d+)_?(\\d+)?", but I also saw a patterns() function somewhere. Unfortunately, I can't find it any more.
Any ideas?
【Comments】:
- .SDcols supports patterns(), so you can use a regular expression to select the columns for .SD.
- grep("^[mf]\\d+(?:_\\d+)?$", names(myData), value=TRUE)?
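The two suggestions above can be sketched together as follows. This is only a minimal illustration on a made-up stand-in for myData (the values and the reduced set of columns are assumptions, not the asker's data), and it assumes a data.table version whose .SDcols accepts patterns(). The regex uses a plain group instead of (?:...) so that it does not depend on the regex engine; the anchors ^ and $ matter because the unanchored pattern from the question would also match names such as "claim10month".

```r
library(data.table)

# Hypothetical toy stand-in for myData; only some of the age columns exist.
myData <- data.table(
  grp          = c("a", "a", "b"),
  claim10month = c(1, 0, 1),
  NA20_29      = c(0, 0, 1),
  f0_9         = c(1, 2, 3),
  f10_19       = c(0, 1, 0),
  m0_9         = c(2, 0, 1),
  m60_69       = c(1, 1, 1),
  u0_9         = c(0, 0, 1)
)

ageColsPossible <- c("m0_9", "m10_19", "m20_29", "m30_39", "m40_49", "m50_59", "m60_69",
                     "f0_9", "f10_19", "f20_29", "f30_39", "f40_49", "f50_59", "f60_69")

# Anchored pattern: "m" or "f", digits, then an optional "_<digits>" suffix.
agePattern <- "^[mf]\\d+(_\\d+)?$"

# Option 1: select the available column names with an anchored regex.
ageCols <- grep(agePattern, names(myData), value = TRUE)

# Option 2: no regex at all, keep only the known names that are present.
ageCols2 <- intersect(names(myData), ageColsPossible)

# Option 3: let .SDcols pick the columns via patterns() while aggregating by group.
sums <- myData[, lapply(.SD, sum), by = grp, .SDcols = patterns(agePattern)]

# The unanchored regex from the question also matches unrelated columns.
unanchoredHit <- grepl("(m|f)(\\d+)_?(\\d+)?", "claim10month")
```

Here ageCols and ageCols2 hold the same four names (f0_9, f10_19, m0_9, m60_69), and sums has one row per grp with those columns summed. When the full list of possible names is already known, as in the question, intersect() is arguably the simplest choice; the regex route is more useful when the age bands are not known in advance.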
Tags: r string data.table