Creating new columns with mutate() and across()

[Question title]: Creating new columns with mutate() and across()
[Posted]: 2021-06-14 16:08:25
[Question description]:

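This is a simplified version of the actual problem I am working on. This example uses four columns, whereas the actual problem involves roughly 20-30 columns.

Consider the iris dataset. Suppose that, for some reason, I want to append new columns equal to twice the .Length and .Width columns. The following code changes the existing columns instead: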

library(dplyr)
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

df_iris <- iris %>% mutate(across(matches("(\\.)(Length|Width)"), 
                                  function(x) { x * 2 }))
head(df_iris)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1         10.2         7.0          2.8         0.4  setosa
2          9.8         6.0          2.8         0.4  setosa
3          9.4         6.4          2.6         0.4  setosa
4          9.2         6.2          3.0         0.4  setosa
5         10.0         7.2          2.8         0.4  setosa
6         10.8         7.8          3.4         0.8  setosa

However, I want this doubling computation to create NEW columns instead, named, say, .Length.2 and .Width.2. One way to do this is as follows:

double <- function(x) {
  x * 2
}

df_iris <- iris %>%
  mutate(Sepal.Length.2 = double(Sepal.Length),
         Sepal.Width.2 = double(Sepal.Width),
         Petal.Length.2 = double(Petal.Length),
         Petal.Width.2 = double(Petal.Width))

head(df_iris)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length.2 Sepal.Width.2 Petal.Length.2 Petal.Width.2
1          5.1         3.5          1.4         0.2  setosa           10.2           7.0            2.8           0.4
2          4.9         3.0          1.4         0.2  setosa            9.8           6.0            2.8           0.4
3          4.7         3.2          1.3         0.2  setosa            9.4           6.4            2.6           0.4
4          4.6         3.1          1.5         0.2  setosa            9.2           6.2            3.0           0.4
5          5.0         3.6          1.4         0.2  setosa           10.0           7.2            2.8           0.4
6          5.4         3.9          1.7         0.4  setosa           10.8           7.8            3.4           0.8

Is there a way to do this in dplyr without:

relying on superseded or deprecated functions?
having to specify each column name manually?

[Question comments]:

Tags: r dplyr

[Solution 1]:

We can use across() with the .names argument (tested with dplyr version 1.0.6):

library(dplyr)

# double() is the helper defined in the question (x * 2), not base::double()
df_iris <- iris %>%
    mutate(across(where(is.numeric), double, .names = '{.col}.2'))

-output

head(df_iris, 3)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length.2 Sepal.Width.2 Petal.Length.2 Petal.Width.2
1          5.1         3.5          1.4         0.2  setosa           10.2           7.0            2.8           0.4
2          4.9         3.0          1.4         0.2  setosa            9.8           6.0            2.8           0.4
3          4.7         3.2          1.3         0.2  setosa            9.4           6.4            2.6           0.4
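
A variant of the same idea, offered as a sketch rather than a definitive recipe: it selects columns with a matches() pattern like the one in the question instead of where(is.numeric), and passes an anonymous function so that nothing depends on a user-defined double() masking base::double(). The name df_iris2 and the anchored regex are illustrative assumptions, not part of the original answer.

```r
library(dplyr)

# Select columns by name pattern (names ending in ".Length" or ".Width")
# and write the doubled values to new columns with the suffix ".2".
# The original columns are left untouched because .names is supplied.
df_iris2 <- iris %>%
  mutate(across(matches("\\.(Length|Width)$"),
                function(x) x * 2,
                .names = "{.col}.2"))

names(df_iris2)
```

With 20-30 columns, only the selection pattern needs to change; no column names have to be written out by hand.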

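[Comments]:

Am I interpreting the code correctly: new columns are created only when the .names argument is supplied, and if .names is not supplied, mutate() overwrites the existing contents?
@Clarinetist Yes. If .names is not specified, it updates the columns selected in across (here I assume you want to do this on the numeric columns). With .names, {.col} is the original column name, and the suffix .2 is appended to that name to create a new column.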
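
The behavior described in this exchange can be checked directly. The following is a minimal sketch; the helper name twice and the data frame names overwritten and appended are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Hypothetical helper; named "twice" to avoid masking base::double()
twice <- function(x) x * 2

# Without .names: the selected columns are overwritten in place
overwritten <- iris %>%
  mutate(across(where(is.numeric), twice))

# With .names: the originals are kept and new columns are appended
appended <- iris %>%
  mutate(across(where(is.numeric), twice, .names = "{.col}.2"))

ncol(overwritten)  # same number of columns as iris
ncol(appended)     # iris columns plus one new column per numeric column
```

Comparing the column counts of the two results against ncol(iris) shows that .names is what decides whether across() replaces columns or adds to them.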