R: Index data frame columns by ranges of their names

【Question title】: R: Index data frame columns by ranges of their names
【Posted】: 2015-02-16 22:19:16
【Question】:

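I have a large number of huge data frames. They often contain sets of similarly named columns that appear consecutively. Here is a simplified version of such a data frame: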

> tmp <- data.frame(ID = 1:25,
    Item1 = sample(x = 1:4, size = 25, replace = TRUE),
    Item2 = sample(x = 1:4, size = 25, replace = TRUE),
    Item3 = sample(x = 1:4, size = 25, replace = TRUE),
    Item4 = sample(x = 1:4, size = 25, replace = TRUE),
    Item5 = sample(x = 1:4, size = 25, replace = TRUE),
    Item6 = sample(x = 1:4, size = 25, replace = TRUE),
    Item7 = sample(x = 1:4, size = 25, replace = TRUE),
    Quest = rep(x = 20, times = 25))

I need a way to index these columns by ranges of their names rather than by their positions. Suppose I need the columns from Item4 to Item7. I can do the following:

> tmp[ , c("Item4", "Item5", "Item6", "Item7")]

That does not scale well when there are hundreds of similarly named columns. I would like to do something like:

> tmp[ , c("Item4":"Item7")]

But it throws an error:

Error in "Item4":"Item7" : NA/NaN argument
In addition: Warning messages:
1: In `[.data.frame`(tmp, , c("Item4":"Item7")) :
  NAs introduced by coercion
2: In `[.data.frame`(tmp, , c("Item4":"Item7")) :
  NAs introduced by coercion

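In addition, I would like to use this kind of indexing to manipulate the attributes of the columns, for example (using the former approach of listing every column name):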

> labels.Item4to7 <- c("Disagree", "Somewhat disagree",
  "Somewhat agree", "Agree")
> tmp[ , c("Item4", "Item5", "Item6", "Item7")] <- lapply(tmp[ , c("Item4",
  "Item5", "Item6", "Item7")], factor, labels = labels.Item4to7)

but with the range of column names specified as Item4:Item7.

Thank you in advance.

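【Comments】:

What do you mean by your second question - do you want to rename those columns? You could store the names of the columns you want in the subset with cols <- paste0("Item", 4:7) and use tmp[, cols] as a shortcut.
@lukeA: No, I don't want to rename them, I want to change their attributes. The kind of indexing you suggest is also useful for other cases I have in mind. Thanks!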

Tags: r indexing dataframe range columnname

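【Solution 1】:

Use the which() function to look up the positions of the first and last column names, then take the range between them: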

tmp[,which(names(tmp)=="Item4"):which(names(tmp)=="Item7")]

Changing the values of Item4 through Item7 can be done as follows:

labels.Item4to7 <- c("Disagree", "Somewhat disagree",
  "Somewhat agree", "Agree")
tmp[,which(names(tmp)=="Item4"):which(names(tmp)=="Item7")]<-
   lapply(tmp[,which(names(tmp)=="Item4"):which(names(tmp)=="Item7")],
   factor,labels=labels.Item4to7)
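
The original attempt fails because `:` only builds numeric sequences, so "Item4" and "Item7" are coerced to NA; looking up their positions first avoids that. To avoid repeating the lookup, it could be wrapped in a small helper. The following is only a sketch: `name_range` is a hypothetical function name, not something from this answer, and it uses match() in place of which(), which gives the same positions as long as the column names are unique. The call to factor() also passes `levels = 1:4` (an addition to the answer's code) so that the four labels still fit if a sampled column happens to miss one of the values.

```r
# Hypothetical helper: names of all columns from `from` to `to`,
# inclusive, in the order they appear in the data frame.
name_range <- function(df, from, to) {
  pos <- match(c(from, to), names(df))   # positions of the two endpoints
  if (anyNA(pos)) {
    stop("column(s) not found: ",
         paste(c(from, to)[is.na(pos)], collapse = ", "))
  }
  names(df)[pos[1]:pos[2]]
}

# Toy data in the shape of the question's tmp
items <- replicate(7, sample(1:4, size = 25, replace = TRUE), simplify = FALSE)
names(items) <- paste0("Item", 1:7)
tmp <- data.frame(ID = 1:25, items, Quest = rep(20, 25))

cols <- name_range(tmp, "Item4", "Item7")

labels.Item4to7 <- c("Disagree", "Somewhat disagree",
                     "Somewhat agree", "Agree")
# Convert only the selected columns to labelled factors
tmp[cols] <- lapply(tmp[cols], factor, levels = 1:4, labels = labels.Item4to7)
```

Because the helper returns names rather than positions, the same `cols` vector works both for reading (`tmp[cols]`) and for assignment.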

【Comments】:

Exactly what I needed! Thank you very much!

【Solution 2】:

You can build the column names with paste0:

tmp[, paste0("Item", 4:7)]
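
A slightly expanded sketch of the same idea, under a few assumptions: the small data frame below is a stand-in for the question's tmp, the zero-padded names (Item04 and so on) are a hypothetical variant that does not appear in the question, and the check before subsetting is an optional guard rather than part of this answer.

```r
# Names that follow a prefix + number pattern can be generated
cols <- paste0("Item", 4:7)

# Hypothetical variant: zero-padded names such as "Item04" need sprintf()
padded <- sprintf("Item%02d", 4:7)

# Small stand-in for the question's data frame
tmp <- data.frame(ID = 1:3, Item3 = 1:3, Item4 = 1:3, Item5 = 1:3,
                  Item6 = 1:3, Item7 = 1:3, Quest = 20)

# Optional guard: report generated names that are not actual columns
missing_cols <- setdiff(cols, names(tmp))
if (length(missing_cols) > 0) {
  stop("not in data frame: ", paste(missing_cols, collapse = ", "))
}

sub <- tmp[, cols]
```

This approach depends on the numeric suffixes, so it selects by generated name rather than by where the columns sit in the data frame.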

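【Comments】:

For the second part of the question, I think something like cols <- paste0("Item", 4:7); tmp[cols] <- lapply(tmp[cols], factor, labels=labels.Item4to7) would work.
@lukeA & thelatemail: Yes, these work, although I wanted to avoid using numeric indices. They are still very useful for other cases I run into often, thank you very much!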
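
For readers who, like the asker, would rather not touch numeric indices at all: base R's subset() accepts a range of bare column names in its `select` argument. This was not proposed in the thread, so treat the following as a sketch of an alternative. The toy data frame is an assumption standing in for tmp, and `levels = 1:4` is added so the labels always line up with the four possible values.

```r
# Toy data frame: ID, Item1..Item7 (each holding 1:4), Quest
tmp <- data.frame(ID = 1:4,
                  setNames(rep(list(1:4), 7), paste0("Item", 1:7)),
                  Quest = 20)

# `select` lets column names be used as if they were positions,
# so a name range can be written directly
sub <- subset(tmp, select = Item4:Item7)

# Reuse the selected names for assignment
cols <- names(sub)
labels.Item4to7 <- c("Disagree", "Somewhat disagree",
                     "Somewhat agree", "Agree")
tmp[cols] <- lapply(tmp[cols], factor, levels = 1:4, labels = labels.Item4to7)
```

The documentation for subset() describes it as a convenience function intended for interactive use, so inside functions or packages the which()-based lookup from Solution 1 may be the safer choice.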