How to subtract multiple .y columns from .x columns with the same prefix

【Question title】: How to subtract multiple .y columns from .x columns with the same prefix
【Posted】: 2021-10-02 18:23:38
【Question】:

I have this tibble:

# A tibble: 2 x 8
    a.x   b.x   c.x   d.x   a.y   b.y   c.y   d.y
  <int> <int> <int> <int> <int> <int> <int> <int>
1    13    13    12    11     7     1     4     2
2    17    11     0     0    16     2     0     0

df <- structure(list(a.x = c(13L, 17L), b.x = c(13L, 11L), c.x = c(12L, 
0L), d.x = c(11L, 0L), a.y = c(7L, 16L), b.y = 1:2, c.y = c(4L, 
0L), d.y = c(2L, 0L)), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"))

I want to calculate a.x - a.y, b.x - b.y, c.x - c.y, and so on.

My desired output:

    a.x   b.x   c.x   d.x   a.y   b.y   c.y   d.y     a     b     c     d
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1    13    13    12    11     7     1     4     2     6    12     8     9
2    17    11     0     0    16     2     0     0     1     9     0     0

I can achieve this with:

df %>% 
    mutate(a = a.x-a.y,
           b = b.x-b.y,
           c = c.x-c.y,
           d = d.x-d.y)

I would like to learn:

How to extract the prefix to use as the new column name.
How to compute .x - .y automatically.

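【Question comments】:

Tags: r dplyr difference

【Solution 1】:

One approach uses cur_column(): loop across the columns that end with .x, build the matching column name by replacing "x" with "y" in the current column name (cur_column()), fetch that column's values with get(), subtract, and set the new column names through .names.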

library(dplyr)
library(stringr)
df %>% 
   mutate(across(ends_with('.x'),
     ~ . - get(str_replace(cur_column(), 'x', 'y')), 
         .names = "{str_remove(.col, fixed('.x'))}"))

Output:

# A tibble: 2 x 12
    a.x   b.x   c.x   d.x   a.y   b.y   c.y   d.y     a     b     c     d
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1    13    13    12    11     7     1     4     2     6    12     8     9
2    17    11     0     0    16     2     0     0     1     9     0     0
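
To see what the two name transformations in this approach do, here is a minimal base R sketch. It uses sub() in place of the stringr helpers, and the vectors x_cols, y_cols and new_names are illustrative names that do not appear in the answer. Anchoring the pattern to the end of the name is an assumption that goes slightly beyond the answer's code; it avoids touching an "x" elsewhere in a column name.

```r
# Columns that ends_with('.x') selects in the example data
x_cols <- c("a.x", "b.x", "c.x", "d.x")

# Partner column looked up for each .x column (the name passed to get())
y_cols <- sub("\\.x$", ".y", x_cols)
y_cols
#> [1] "a.y" "b.y" "c.y" "d.y"

# Name of each new difference column (what the .names glue produces)
new_names <- sub("\\.x$", "", x_cols)
new_names
#> [1] "a" "b" "c" "d"
```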

Or reshape with pivot_longer:

library(tidyr)
df %>%
     mutate(rn = row_number()) %>%
     pivot_longer(cols = -rn, names_to = c(".value"), 
          names_pattern = "(.)\\..*") %>% 
     group_by(rn) %>% 
     summarise(across(everything(), ~ -diff(.))) %>%
     select(-rn) %>%
     bind_cols(df, .)
# A tibble: 2 x 12
    a.x   b.x   c.x   d.x   a.y   b.y   c.y   d.y     a     b     c     d
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1    13    13    12    11     7     1     4     2     6    12     8     9
2    17    11     0     0    16     2     0     0     1     9     0     0
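
The -diff(.) step relies on each group holding the .x value first and the .y value second after reshaping. A small sketch of that arithmetic, using the a.x and a.y values from the first row (the vector name x_then_y is illustrative):

```r
# First-row values of a.x and a.y, in the order the reshaped data holds them
x_then_y <- c(13L, 7L)

# diff() subtracts the earlier element from the later one: y - x
diff(x_then_y)
#> [1] -6

# Negating flips the sign, giving x - y
-diff(x_then_y)
#> [1] 6
```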

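【Comments】:

@TarJae It does work. As I recall, you may have been testing with tidymodels loaded, and functions from those packages may have masked it. I tried a fresh R session with only these packages, and fixed works fine.

【Solution 2】:

A base R approach (the \(x, y) lambda and the |> pipe need R 4.1 or later):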

cbind(df, mapply(\(x, y) x - y, df[endsWith(names(df), ".x")],
                 df[endsWith(names(df), ".y")]) |>
        as.data.frame() |>
        setNames(letters[seq_len(ncol(df)/2)]))

  a.x b.x c.x d.x a.y b.y c.y d.y a  b c d
1  13  13  12  11   7   1   4   2 6 12 8 9
2  17  11   0   0  16   2   0   0 1  9 0 0

A tidyverse-style solution:

library(dplyr)
library(purrr)

df %>%
  bind_cols(
    map2_df(".x", ".y", ~ df[grepl(.x, names(df))] - df[grepl(.y, names(df))]) %>%
      rename_with(~ gsub(".x", "", .), everything())
  )

A very simple and compact approach suggested by dear @Henrik:

cbind(df, setNames(df[endsWith(names(df), ".x")] - df[endsWith(names(df), ".y")], 
                   sub("\\..*","", names(df[endsWith(names(df), ".x")]))))
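
The base R versions above subtract the .y block from the .x block by position, so they assume both blocks list the prefixes in the same order. Here is a minimal sketch that pairs the columns by prefix instead, assuming every .x column has a .y partner. The names x_cols, prefix, y_cols, diffs and result are illustrative, and df is rebuilt as a plain data frame so the block runs on its own.

```r
# Example data from the question, as a plain data frame
df <- data.frame(a.x = c(13L, 17L), b.x = c(13L, 11L),
                 c.x = c(12L, 0L),  d.x = c(11L, 0L),
                 a.y = c(7L, 16L),  b.y = 1:2,
                 c.y = c(4L, 0L),   d.y = c(2L, 0L))

# Derive the prefixes from the .x columns, then build the partner names
x_cols <- names(df)[endsWith(names(df), ".x")]
prefix <- sub("\\.x$", "", x_cols)
y_cols <- paste0(prefix, ".y")

# Assumption: each .x column has a matching .y column
stopifnot(all(y_cols %in% names(df)))

# Subtract name-matched columns and label the result with the prefixes
diffs <- df[x_cols] - df[y_cols]
names(diffs) <- prefix
result <- cbind(df, diffs)
```

Because y_cols is built from the prefixes, the subtraction stays correct even if the .y columns appear in a different order than the .x columns.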

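【Comments】:

Thank you, Anoushiravan. Would map help in this case? :-)
Yes, it can probably be done with map2 this way, although I don't think it is the best option. Arun's first solution would be my first choice without hesitation. In fact, I learned it from him, haha.
Simple and elegant.

【Solution 3】:

Another approach: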

library(dplyr)
library(stringr)

df %>%
  mutate(across(ends_with('x'),  .names = "{str_remove(.col, '.x')}")
         - across(ends_with('y')))
# A tibble: 2 x 12
    a.x   b.x   c.x   d.x   a.y   b.y   c.y   d.y     a     b     c     d
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1    13    13    12    11     7     1     4     2     6    12     8     9
2    17    11     0     0    16     2     0     0     1     9     0     0

In base R, you can use split.default:

a <- do.call('-', split.default(df, sub('.', '', names(df)))) 
cbind(df, setNames(a, sub('..$', '', names(a))))
  a.x b.x c.x d.x a.y b.y c.y d.y a  b c d
1  13  13  12  11   7   1   4   2 6 12 8 9
2  17  11   0   0  16   2   0   0 1  9 0 0
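
split.default() treats the data frame as a list of columns and groups them by the key given as its second argument. Here is a short sketch of that grouping, assuming a key built from the text after the last dot. This is a variation on the answer's sub('.', '', ...) call, which drops only the first character and therefore assumes one-character prefixes. The names key, parts and diffs are illustrative.

```r
# Example data from the question, as a plain data frame
df <- data.frame(a.x = c(13L, 17L), b.x = c(13L, 11L),
                 c.x = c(12L, 0L),  d.x = c(11L, 0L),
                 a.y = c(7L, 16L),  b.y = 1:2,
                 c.y = c(4L, 0L),   d.y = c(2L, 0L))

# Grouping key: everything after the last dot in each column name
key <- sub("^.*\\.", "", names(df))
key
#> [1] "x" "x" "x" "x" "y" "y" "y" "y"

# One data frame per key value
parts <- split.default(df, key)
names(parts)
#> [1] "x" "y"
names(parts$x)
#> [1] "a.x" "b.x" "c.x" "d.x"

# Subtract the two blocks and keep only the prefix as the column name
diffs <- parts$x - parts$y
names(diffs) <- sub("\\.x$", "", names(diffs))
```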

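【Comments】:

This is great too. across - across. Thanks!
Great lesson, Onyambu. Already upvoted :)
Onyambu, I'm waiting for your approach on this one: stackoverflow.com/q/68564367/2884859. Please take a look when you have time.

【Solution 4】:

I have a package on GitHub, {dplyover}, for this kind of operation. We can use dplyover::across2 for the calculation. If we specify "{pre}" in the .names argument, we can extract the common prefix of each pair of variables.

The main advantage over a regular {dplyr} solution is that we don't necessarily need columns with similar names. The drawback is that across2 is not as performant as dplyr::across.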

library(dplyr)
library(dplyover) # https://github.com/TimTeaFan/dplyover


df %>%
  mutate(across2(ends_with(".x"),
                 ends_with(".y"),
                 ~ .x - .y,
                 .names = "{pre}"))

#> # A tibble: 2 x 12
#>     a.x   b.x   c.x   d.x   a.y   b.y   c.y   d.y     a     b     c     d
#>   <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1    13    13    12    11     7     1     4     2     6    12     8     9
#> 2    17    11     0     0    16     2     0     0     1     9     0     0

Created on 2021-07-26 by the reprex package (v0.3.0)

【Comments】:

Very interesting. Thank you for sharing this. I will download your package and try it out!