Extracting data from different columns in a data frame based on row values: answers

[Question title]: Extracting data from different columns in a data frame based on row values
[Posted]: 2020-03-21 00:48:31
[Question]:

From each row of the data frame df, I want to extract the values in the columns described below and create a new data frame, output.

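When Year equals 2003, I need the values in columns Y_2001 and Y_2002 as Year1 and Year2 in the output data frame; these are the values for the two years preceding the year given in the Year column. Likewise, if Year equals 2006, I need the values from Y_2004 and Y_2005 in the output data frame, and so on for every year in the Year column.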

> df
     ID Year Y_2001 Y_2002 Y_2003 Y_2004 Y_2005
[1,]  1 2003      2      4      6      4      3
[2,]  2 2004      5      9      7      1      2
[3,]  3 2006      4      3      5      7      8
[4,]  4 2004      7      6      4      8      9

> output
     ID Year Year1 Year2
[1,]  1 2003     2     4
[2,]  2 2004     9     7
[3,]  3 2006     7     8
[4,]  4 2004     6     4

Can someone help me write code that produces the output above? Any support is much appreciated.
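To make the answers below easy to try, here is a sketch that rebuilds the example as a data.frame and spells out, per row, which columns the requested Year1 and Year2 come from. It assumes df is meant to be a data.frame; the [1,] row labels in the printout suggest a matrix, which as.data.frame() would convert. The names wanted, Year1_from and Year2_from are made up for illustration.

```r
# Example data from the question, rebuilt as a data.frame
df <- data.frame(ID = 1:4, Year = c(2003L, 2004L, 2006L, 2004L),
                 Y_2001 = c(2L, 5L, 4L, 7L), Y_2002 = c(4L, 9L, 3L, 6L),
                 Y_2003 = c(6L, 7L, 5L, 4L), Y_2004 = c(4L, 1L, 7L, 8L),
                 Y_2005 = c(3L, 2L, 8L, 9L))

# The rule from the question: Year1 comes from Y_<Year - 2>, Year2 from Y_<Year - 1>
wanted <- data.frame(ID = df$ID, Year = df$Year,
                     Year1_from = paste0("Y_", df$Year - 2),
                     Year2_from = paste0("Y_", df$Year - 1),
                     stringsAsFactors = FALSE)
wanted
```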

[Question comments]:

Tags: r dataframe dplyr reshape

[Solution 1]:

Here is a tidyverse solution:

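This takes the data and puts it into long format with pivot_longer. The values of interest are those whose column year is 1 or 2 years less than the row's Year, so you can filter on that difference (the filter here explicitly keeps differences of 1 or 2 years).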

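mutate then creates an extra column holding the new column names Year1 and Year2. Note that Year1 corresponds to a 2-year difference and Year2 to a 1-year difference, so the difference is subtracted from 3 to reverse the order. Finally, pivot_wider puts the data back into wide format.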

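library(tidyverse)

df %>%
  # Long format: "Y_2001" splits into the value column Y and Year_Sep = 2001
  pivot_longer(cols = -c(ID, Year), names_to = c(".value", "Year_Sep"), names_sep = "_",
               names_transform = list(Year_Sep = as.numeric)) %>%
  # Keep only the two years immediately before Year
  filter(Year - Year_Sep == 1 | Year - Year_Sep == 2) %>%
  # A 2-year gap becomes Year1, a 1-year gap becomes Year2
  mutate(YearCol = paste0("Year", 3 - (Year - Year_Sep))) %>%
  # Back to wide format: one row per ID/Year with Year1 and Year2 columns
  pivot_wider(id_cols = c(ID, Year), names_from = YearCol, values_from = Y)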

Output

# A tibble: 4 x 4
     ID  Year Year1 Year2
  <int> <int> <int> <int>
1     1  2003     2     4
2     2  2004     9     7
3     3  2006     7     8
4     4  2004     6     4
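The same four steps (lengthen, filter on the year gap, label the gap, widen) can be mirrored in base R for anyone without the tidyverse installed. This is a rough sketch rather than the answer's code: it assumes ID is unique per row and every value column is named Y_<year>, and the names year_cols, long and output are only illustrative. Note the parentheses in the filter step, because %in% binds more tightly than binary minus in R.

```r
# Example data from the question
df <- data.frame(ID = 1:4, Year = c(2003L, 2004L, 2006L, 2004L),
                 Y_2001 = c(2L, 5L, 4L, 7L), Y_2002 = c(4L, 9L, 3L, 6L),
                 Y_2003 = c(6L, 7L, 5L, 4L), Y_2004 = c(4L, 1L, 7L, 8L),
                 Y_2005 = c(3L, 2L, 8L, 9L))

# 1. Lengthen: one row per (ID, Year, column year) with its value Y.
#    unlist() stacks the Y_ columns one after another, hence times/each below.
year_cols <- grep("^Y_", names(df), value = TRUE)
long <- data.frame(
  ID       = rep(df$ID, times = length(year_cols)),
  Year     = rep(df$Year, times = length(year_cols)),
  Year_Sep = rep(as.integer(sub("^Y_", "", year_cols)), each = nrow(df)),
  Y        = unlist(df[year_cols], use.names = FALSE)
)

# 2. Filter: keep the two column years just before Year (parentheses required)
long <- long[(long$Year - long$Year_Sep) %in% 1:2, ]

# 3. Label: a 2-year gap becomes Year1, a 1-year gap becomes Year2
long$YearCol <- paste0("Year", 3 - (long$Year - long$Year_Sep))

# 4. Widen: look each label's value up by ID (assumes ID is unique)
output <- df[c("ID", "Year")]
for (col in c("Year1", "Year2")) {
  part <- long[long$YearCol == col, ]
  output[[col]] <- part$Y[match(output$ID, part$ID)]
}
output
```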

[Comments]:

[Solution 2]:

A somewhat clunky solution, but...

i.col <- function(data, n) { # Returns the column index corresponding to the year
  sapply(data$Year-n, function(x) grep(x, names(data)))
}

df$Year1 <- diag(as.matrix(df[, i.col(df, n=2)]))
df$Year2 <- diag(as.matrix(df[, i.col(df, n=1)]))

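Edit: apparently using diag is very slow, since it extracts an n x n block of columns only to keep its diagonal. Accessing the individual matrix elements with an index matrix built by cbind is preferred.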

df$Year1 <- df[cbind(seq_len(nrow(df)), i.col(df, n=2))] # one (row, column) pair per row
df$Year2 <- df[cbind(seq_len(nrow(df)), i.col(df, n=1))]

df
  ID Year Y_2001 Y_2002 Y_2003 Y_2004 Y_2005 Year1 Year2
1  1 2003      2      4      6      4      3     2     4
2  2 2004      5      9      7      1      2     9     7
3  3 2006      4      3      5      7      8     7     8
4  4 2004      7      6      4      8      9     6     4
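To see what the cbind indexing does compared with diag, the sketch below extracts Year1 both ways. diag() first builds an n x n block of selected columns and keeps only its diagonal, while a two-column (row, column) index matrix picks exactly one cell per row. col_index is a hypothetical stand-in for i.col that uses match() on exact column names rather than grep() pattern matching.

```r
# Example data from the question
df <- data.frame(ID = 1:4, Year = c(2003L, 2004L, 2006L, 2004L),
                 Y_2001 = c(2L, 5L, 4L, 7L), Y_2002 = c(4L, 9L, 3L, 6L),
                 Y_2003 = c(6L, 7L, 5L, 4L), Y_2004 = c(4L, 1L, 7L, 8L),
                 Y_2005 = c(3L, 2L, 8L, 9L))

# Hypothetical helper: column index of Y_<Year - n> for every row
col_index <- function(data, n) match(paste0("Y_", data$Year - n), names(data))

rows <- seq_len(nrow(df))

# diag() approach: selects an nrow x nrow block, then keeps its diagonal
via_diag <- diag(as.matrix(df[, col_index(df, 2)]))

# Index-matrix approach: each (row, col) pair picks a single cell
via_index <- as.matrix(df)[cbind(rows, col_index(df, 2))]

identical(via_diag, via_index)
```

The index-matrix version touches one cell per row, so it scales with the number of rows rather than its square.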

[Comments]:

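I was wondering about something similar. This was very helpful, thanks.
Yes, I don't like the diag bit, but I didn't know how to select individual elements of a matrix.
Thanks very much @Edward. For a larger dataset, say I need to extract 50 years of data, is there a way to extract everything with one piece of code, without writing a line for each year (df$Year49, df$Year50, etc.)?
Possibly, using lapply or similar. But I'd need to see a sample of the data and the expected output before suggesting actual code.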
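The lapply idea from the last comment might look roughly like the following. lagged_years is a hypothetical helper, not code from any answer; it assumes the value columns are named Y_<year> and returns NA wherever a requested year has no column. With k = 50 it would produce Year1 (oldest) through Year50 (most recent) in one call.

```r
# Example data from the question
df <- data.frame(ID = 1:4, Year = c(2003L, 2004L, 2006L, 2004L),
                 Y_2001 = c(2L, 5L, 4L, 7L), Y_2002 = c(4L, 9L, 3L, 6L),
                 Y_2003 = c(6L, 7L, 5L, 4L), Y_2004 = c(4L, 1L, 7L, 8L),
                 Y_2005 = c(3L, 2L, 8L, 9L))

# Hypothetical helper: for each row, take the k years before Year
lagged_years <- function(data, k) {
  m <- as.matrix(data[grep("^Y_", names(data))])
  rows <- seq_len(nrow(data))
  picked <- lapply(seq_len(k), function(j) {
    # Year j sits (k - j + 1) years before Year, so Year1 is the oldest
    cols <- match(paste0("Y_", data$Year - (k - j + 1)), colnames(m))
    # One (row, col) cell per row; an NA column index yields NA
    m[cbind(rows, cols)]
  })
  names(picked) <- paste0("Year", seq_len(k))
  res <- data[c("ID", "Year")]
  res[names(picked)] <- picked
  res
}

lagged_years(df, 2)
```

Because every lag is handled by the same function, changing k is the only edit needed for a wider window.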

[Solution 3]:

Here is an approach using row-wise apply, assuming you can work out the starting year (2001).

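# apply() runs row by row over every column except ID, so x[1] is Year and x[2] is
# Y_2001; position Year - 2001 therefore holds the value two years before Year
cbind(df[1:2], t(apply(df[-1], 1, function(x) {
  vals <- x[1] - 2001
  x[vals:(vals + 1)]
})))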

#  ID Year 1 2
#1  1 2003 2 4
#2  2 2004 9 7
#3  3 2006 7 8
#4  4 2004 6 4
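The hard-coded 2001 can be derived from the column names instead. This sketch keeps the row-wise apply above but assumes the Y_ columns are consecutive years in ascending order, placed directly after Year, since the position arithmetic depends on it; it also names the result columns Year1 and Year2. The names start_year, vals and output are illustrative.

```r
# Example data from the question
df <- data.frame(ID = 1:4, Year = c(2003L, 2004L, 2006L, 2004L),
                 Y_2001 = c(2L, 5L, 4L, 7L), Y_2002 = c(4L, 9L, 3L, 6L),
                 Y_2003 = c(6L, 7L, 5L, 4L), Y_2004 = c(4L, 1L, 7L, 8L),
                 Y_2005 = c(3L, 2L, 8L, 9L))

# First year taken from the column names rather than typed in
year_cols  <- grep("^Y_", names(df), value = TRUE)
start_year <- min(as.integer(sub("^Y_", "", year_cols)))

vals <- t(apply(df[-1], 1, function(x) {
  # x[1] is Year and x[2] is Y_<start_year>, so Y_<Year - 2> sits at Year - start_year
  pos <- x[1] - start_year
  x[pos:(pos + 1)]
}))
colnames(vals) <- c("Year1", "Year2")

output <- cbind(df[1:2], vals)
output
```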

[Comments]: