How to calculate the difference between values in one column based on another column? Answers

【Question title】: How to calculate the difference between values in one column based on another column?
【Posted】: 2021-09-27 20:58:18
【Question】:

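I am trying to calculate the difference in abundance between time points C1 and C0. I want to do this separately for each gene, so I used group_by on gene, but I can't work out how to get the difference in abundance between the two time points.

Here is one of my attempts: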


IgH_CDR3_post_challenge_unique_vv <- IgH_CDR3_post_challenge_unique_v %>% 
  group_by(gene ) %>% 
  mutate(increase_in_abundance = (abundance[Timepoint=='C1'])-(abundance[Timepoint=='C0'])) %>% 
  ungroup()

My data looks like this:

gene	Timepoint	abundance
1	C0	5
2	C1	3
1	C1	6
3	C0	2

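【Comments】:

Can you share a minimal reproducible example? stackoverflow.com/help/minimal-reproducible-example Also, please clarify the exact structure of your data: for example, your table shows two entries for gene 1 at timepoint c0. I assume that doesn't happen in your real data, i.e. there is only one entry per gene per timepoint?

Tags: r dataframe group-by subtraction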

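【Solution 1】:

Assuming (!) there is exactly one entry per gene and timepoint (contrary to the table posted in the question), you can pivot_wider your data and then compute the difference for each gene. Admittedly, the current example isn't very helpful, because most gene/timepoint combinations are missing.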
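The one-entry-per-gene-and-timepoint assumption can be checked before reshaping. This is a minimal sketch, assuming dplyr is installed. The object names `toy` and `duplicates` are illustrative and not from the original post; the data is the toy table used in this answer.

```r
library(dplyr)

# Toy data copied from this answer's example (illustrative name `toy`).
toy <- data.frame(gene = c(1, 2, 1, 3),
                  Timepoint = c("c0", "c1", "c1", "c0"),
                  abundance = c(5, 3, 6, 2))

# Count rows per gene/timepoint pair and keep only pairs seen more than once.
duplicates <- toy %>%
  count(gene, Timepoint) %>%
  filter(n > 1)

# Zero rows means every gene/timepoint pair occurs at most once.
nrow(duplicates)
```

If `duplicates` has any rows, pivot_wider warns that the values are not uniquely identified and returns list-columns, so a plain subtraction of the two timepoint columns would no longer work.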

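library(tidyverse)

# Toy data mirroring the table in the question (timepoint labels in lowercase)
df <- data.frame(gene = c(1, 2, 1, 3),
                 Timepoint = c("c0", "c1", "c1", "c0"),
                 abundance = c(5, 3, 6, 2))

# One row per gene, one column per timepoint, then subtract
df %>%
  pivot_wider(names_from = Timepoint,
              values_from = abundance,
              id_cols = gene) %>%
  mutate(increase_in_abundance = c1 - c0)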

# A tibble: 3 x 4
   gene    c0    c1 increase_in_abundance
  <dbl> <dbl> <dbl>                 <dbl>
1     1     5     6                     1
2     2    NA     3                    NA
3     3     2    NA                    NA
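For comparison, the same difference can be computed without reshaping, which is closer to the group_by attempt in the question. This is only a sketch under the same one-entry-per-gene-and-timepoint assumption; it uses `match()` rather than logical subsetting so that a missing timepoint yields NA instead of a zero-length vector. The result name `res` is illustrative.

```r
library(dplyr)

# Same toy data as above.
df <- data.frame(gene = c(1, 2, 1, 3),
                 Timepoint = c("c0", "c1", "c1", "c0"),
                 abundance = c(5, 3, 6, 2))

res <- df %>%
  group_by(gene) %>%
  summarise(
    # match() returns NA when the timepoint is absent for a gene,
    # and indexing with NA gives NA, so the difference becomes NA.
    increase_in_abundance =
      abundance[match("c1", Timepoint)] - abundance[match("c0", Timepoint)],
    .groups = "drop"
  )

res
# gene 1 gets 6 - 5 = 1; genes 2 and 3 get NA (one timepoint missing)
```

If a gene had several rows for the same timepoint, `match()` would silently pick the first one, so this version does not remove the need to check for duplicates.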

【Comments】: