Use column names from a vector in a for loop in dplyr: answers

【Question title】: Use column names from vector in for loop in dplyr
【Posted】: 2017-11-21 10:15:08
【Question】:

This should be simple, but I am struggling to get it to work. I currently have a vector of column names:

columns <- c('product1', 'product2', 'product3', 'support4')

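I now want to use dplyr inside a for loop to mutate certain columns, but I am struggling to make it treat col as the column name stored in the variable rather than as a column literally called col.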

for (col in columns) {
  cross.sell.val <- cross.sell.val %>%
    dplyr::mutate(col = ifelse(col == 6, 6, col)) %>%
    dplyr::mutate(col = ifelse(col == 5, 6, col))
}

Can I use %>% in these circumstances? Thanks.

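【Comments】:

Can you explain your end goal?
The for loop here is shortened for the purposes of Stack Overflow, but I essentially have a large data frame in which certain columns (the ones in the vector) need to be modified depending on the value in each cell. There are a lot of rules; the ones above are simplified.
No, I was just wondering why you need to loop over the columns and then detect that particular column, since you have already found the column while iterating.

Tags: r dplyr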

【Solution 1】:

You should be able to do this without using a for loop at all.
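For reference, the loop in the question fails because mutate(col = ...) creates or looks for a column literally named col. If the loop is kept anyway, one possible fix is sketched below. It assumes a dplyr version with tidy evaluation (0.7 or later) and uses a small made-up data frame as a stand-in for cross.sell.val, since the real data was not posted; only the 5-to-6 rule from the question is shown.

```r
library(dplyr)

# Hypothetical stand-in for cross.sell.val; the real data was not posted.
cross.sell.val <- data.frame(
  product1 = c(1, 5, 6),
  product2 = c(5, 2, 3),
  product3 = c(6, 6, 4),
  support4 = c(0, 5, 1)
)
columns <- c("product1", "product2", "product3", "support4")

for (col in columns) {
  cross.sell.val <- cross.sell.val %>%
    # !!col := ... names the output column after the string held in col;
    # .data[[col]] reads the column with that name.
    dplyr::mutate(!!col := ifelse(.data[[col]] == 5, 6, .data[[col]]))
}

cross.sell.val
```

So %>% itself is not the problem here; the column has to be referred to through the string in col rather than by the bare name col.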

Since you did not provide any data, I will use the built-in iris dataset. The top of it looks like this:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

First, I save the columns to be analyzed:

columns <- names(iris)[1:4]

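Then use mutate_at on those columns, once per rule. In each call, . stands for the vector of values in the current column. Your example implies that the rules are the same for every column; if that is not the case, you may need more flexibility.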

mod_iris <-
  iris %>%
  mutate_at(columns, funs(ifelse(. > 5, 6, .))) %>%
  mutate_at(columns, funs(ifelse(. < 1, 1, .)))

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          6.0         3.5          1.4           1  setosa
2          4.9         3.0          1.4           1  setosa
3          4.7         3.2          1.3           1  setosa
4          4.6         3.1          1.5           1  setosa
5          5.0         3.6          1.4           1  setosa
6          6.0         3.9          1.7           1  setosa
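On newer dplyr releases the same two rules can also be written with across(), which supersedes the scoped mutate_at() verbs. The sketch below assumes dplyr 1.0.0 or later; the name mod_iris_across is made up for this example.

```r
library(dplyr)

columns <- names(iris)[1:4]

# all_of() selects the columns named in the character vector;
# .x is the vector of values in the current column.
mod_iris_across <- iris %>%
  mutate(across(all_of(columns), ~ ifelse(.x > 5, 6, .x))) %>%
  mutate(across(all_of(columns), ~ ifelse(.x < 1, 1, .x)))

head(mod_iris_across)
```

The rules are applied in the same order as above, so the result should match the mutate_at() version.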

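If you prefer, you can instead write a function that makes all of the changes to a column. This also lets you set different cutoffs for each column. For example, you may want to set the bottom and top of the data equal to a threshold (to rein in outliers for some reason), or you may know that each variable uses a dummy value as a placeholder (one that differs by column but is always the most common value). This way you can easily add any arbitrary rule of interest, and it gives you more flexibility than chaining separate rules together (for example, if a rule uses the mean, the mean changes once an earlier rule has altered some of the values).

An example function: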

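modColumns <- function(x){
  # Thresholds: the first and third quartiles of the original values
  botThresh <- quantile(x, 0.25)
  topThresh <- quantile(x, 0.75)

  # sort() is ascending, so [1] takes the least frequent value of x;
  # use sort(table(x), decreasing = TRUE) to take the most frequent one
  dummyVal <- as.numeric(names(sort(table(x)))[1])
  dummyReplace <- NA

  # Clamp to the quartiles, then blank out any value still equal to dummyVal
  x <- ifelse(x < botThresh, botThresh, x)
  x <- ifelse(x > topThresh, topThresh, x)
  x <- ifelse(x == dummyVal, dummyReplace, x)

  return(x)
}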

And in use:

iris %>%
  mutate_at(columns, modColumns) %>%
  head

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.3          1.6         0.3  setosa
2          5.1         3.0          1.6         0.3  setosa
3          5.1         3.2          1.6         0.3  setosa
4          5.1         3.1          1.6         0.3  setosa
5          5.1         3.3          1.6         0.3  setosa
6          5.4         3.3          1.7         0.4  setosa
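If the cutoffs need to be supplied per column rather than computed from the data, one way to sketch that is a named vector of limits looked up with cur_column(). This assumes dplyr 1.0.0 or later; the caps values and the name capped are invented purely for illustration.

```r
library(dplyr)

columns <- names(iris)[1:4]

# Hypothetical per-column upper caps, chosen only for illustration.
caps <- c(Sepal.Length = 6, Sepal.Width = 3.5, Petal.Length = 5, Petal.Width = 2)

capped <- iris %>%
  # cur_column() returns the name of the column currently being processed,
  # so each column is capped at its own entry in caps.
  mutate(across(all_of(columns), ~ pmin(.x, caps[[cur_column()]])))

head(capped)
```

The same lookup pattern would work for lower limits or replacement values, as long as the vector has one named entry per column.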

【Comments】: