Selecting a subset of columns when using col_type() in readr: answers

【Question title】: select subset of columns when using col_type() in readr
【Posted】: 2021-10-06 18:13:43
【Question description】:

I am trying to read a file with read_delim() and assign specific types to subsets of columns (long runs of consecutive columns).

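For example, I have a file with 6 columns. I want to read column 1 ("name") as character and columns 2-6 as integer. I can do this by specifying the column names manually: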

df <- read_delim(file = "data.txt", col_type = list(name = col_character(), id_1 = col_integer(), id_2 = col_integer(), id_3 = col_integer(), id_4 = col_integer(), id_5 = col_integer()), delim = " ")

But my data has 100 columns, and I want to select subsets/runs of columns without writing each one out by hand.

I tried:

df <- read_delim(file = "data.txt", col_type = list(name = col_character(), id_1:id_5 = col_integer()), delim = " ")

and

df <- read_delim(file = "data.txt", col_type = list(name = col_character(), select('id_1':'id_5') = col_integer()), delim = " ")

But I get an error:

Error: unexpected '=' in:
"col_type = list(name = col_character(), select('id_1':'id_5') ="

I'm sure this is simple, but I've already spent hours trying to solve it!

【Question comments】:

Tags: r readr

【Solution 1】:

One option is to use setNames to build a named list and pass that:

# Build a named list of parsers: one character column, then five integer columns.
df <- read_delim(file = "data.txt", delim = " ",
     col_types = setNames(
           c(list(col_character()), rep(list(col_integer()), 5)),
           c("name", paste0("id_", 1:5))))
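The attempts in the question fail at parse time because an argument name inside list() has to be a single name or string, not an expression such as id_1:id_5. Besides setNames, two other routes may be worth a look when nearly every column shares one type: the .default argument of cols(), and readr's compact string specification. The sketch below assumes the file layout described in the question; the temporary file and the names df_a and df_b are made up for illustration.

```r
library(readr)

# Hypothetical sample file with the layout described in the question.
path <- tempfile(fileext = ".txt")
writeLines(c("name id_1 id_2 id_3 id_4 id_5",
             "a 1 2 3 4 5",
             "b 6 7 8 9 10"), path)

# Option A: name only the exception and let .default cover every other column.
df_a <- read_delim(path, delim = " ",
                   col_types = cols(name = col_character(),
                                    .default = col_integer()))

# Option B: compact string, one letter per column in file order
# ("c" = character, "i" = integer).
df_b <- read_delim(path, delim = " ",
                   col_types = paste0("c", strrep("i", 5)))
```

Option A does not depend on column order or count, so it should carry over to the 100-column case unchanged; Option B depends on knowing the exact number and order of columns.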

【Comments】:

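Great, thank you!
Can this be adapted to use a regular expression in the named parsers? For example, if the column names all start with "id_" but are followed by complex codes. I tried: col_type = setNames( c(list(col_character()), rep(list(col_integer()), 5)), c("name", rep(regex("id_"), 5))), delim = " "), but got an error saying the named parsers don't match the column names.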
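On the regular-expression follow-up: names in the parser list are compared with the column names exactly, not as patterns, so repeating "id_" five times gives five names that match no column. One possible workaround, sketched here under the assumption that the header line can be read on its own first, is to apply the pattern to the real column names and build the named list from the matches. The sample file, the column codes, and the variable names (hdr, id_cols, parsers) are hypothetical.

```r
library(readr)

# Hypothetical sample file: "id_" columns with irregular suffixes.
path <- tempfile(fileext = ".txt")
writeLines(c("name id_A7x id_B2 other",
             "a 1 2 x",
             "b 3 4 y"), path)

# Read only the header line to get the real column names
# (assumes a single-space delimiter and no quoted names).
hdr <- strsplit(readLines(path, n = 1), " ", fixed = TRUE)[[1]]

# Keep the column names that start with "id_".
id_cols <- grep("^id_", hdr, value = TRUE)

# Named parser list: character for "name", integer for every matched column.
parsers <- c(list(name = col_character()),
             setNames(rep(list(col_integer()), length(id_cols)), id_cols))

# Columns not named in the list fall back to readr's type guessing.
df <- read_delim(path, delim = " ", col_types = do.call(cols, parsers))
```

Because the parser names are taken from the file itself, they match the column names by construction, whatever follows the "id_" prefix.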