Create a new column based on the values and headings of another dataset: answers

【Question title】: Create a new column based on the values and heading of another dataset
【Posted】: 2021-07-24 19:47:08
【Question description】:

Suppose I have an original dataset, df1, whose first column holds the letters a to e:

a x1
b x2
c x3
d x4
e x5

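Then I have another dataset, df2, which has several columns whose entries refer to the values in the first column of the dataset above: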

---------
A | B | C
---------
a   b   c
    d   e

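I want an R function that takes the unique values in df2 (a, b, c, d and e above) and adds a new column to df1 holding, for each value, the heading of the df2 column in which it appears, i.e. df3: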

a x1 A
b x2 B
c x3 C
d x4 B
e x5 C

Working example:

> # data frame with numbers and characters
> df1 = data.frame(unique_values=letters[1:5], other_col=paste(rep("x",5), 1:5, sep=""))
> print(df1)
  unique_values other_col
1             a        x1
2             b        x2
3             c        x3
4             d        x4
5             e        x5
> #  Create dataset that is then used to create new column
> df2 = data.frame(A = c("a",NA), B=c("b","d"), C=c("c","e") )
> df2
     A B C
1    a b c
2 <NA> d e

# Using df1 and the columns of df2 that reference df1, create df3
library(dplyr)
#df3?

【Question discussion】:

Tags: r dataframe dplyr tidyr data-wrangling

【Solution 1】:

A base R option using merge + stack

merge(df1, setNames(na.omit(stack(df2)), c("unique_values", "names")))

gives

  unique_values other_col names
1             a        x1     A
2             b        x2     B
3             c        x3     C
4             d        x4     B
5             e        x5     C
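To see why the one-liner works, it may help to run its steps one at a time. The sketch below rebuilds the question's df1 and df2; `stringsAsFactors = FALSE` is an assumption added here so that the columns are plain character vectors on older R versions too, and the intermediate name `long` is used only for illustration.

```r
# Rebuild the example data from the question
df1 <- data.frame(unique_values = letters[1:5],
                  other_col = paste0("x", 1:5),
                  stringsAsFactors = FALSE)
df2 <- data.frame(A = c("a", NA), B = c("b", "d"), C = c("c", "e"),
                  stringsAsFactors = FALSE)

# Step 1: stack() puts every column of df2 into a single "values" column
# and records the source column name in "ind"
long <- stack(df2)

# Step 2: drop the row that only holds the NA padding from column A
long <- na.omit(long)

# Step 3: rename so that the key column has the same name as in df1
long <- setNames(long, c("unique_values", "names"))

# Step 4: merge() joins on the shared column name, unique_values
df3 <- merge(df1, long)
print(df3)
```

Note that `merge()` sorts the result by the key column by default; here df1 is already sorted, so the row order does not change.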

【Discussion】:

【Solution 2】:

Reshape the second dataset into "long" format, then do a join

library(dplyr)
library(tidyr)
pivot_longer(df2, everything(), values_to = 'unique_values', 
    values_drop_na = TRUE) %>%
  left_join(df1)

-output

# A tibble: 5 x 3
#  name  unique_values other_col
#  <chr> <chr>         <chr>    
#1 A     a             x1       
#2 B     b             x2       
#3 C     c             x3       
#4 B     d             x4       
#5 C     e             x5
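If the result should keep df1's column order, as in the df3 sketched in the question, one possible variant is to start the join from df1 instead. This is a minimal sketch, not the answerer's code: the names `lookup` and `heading` are hypothetical, and `by` is spelled out so the join key is explicit.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question
df1 <- data.frame(unique_values = letters[1:5],
                  other_col = paste0("x", 1:5),
                  stringsAsFactors = FALSE)
df2 <- data.frame(A = c("a", NA), B = c("b", "d"), C = c("c", "e"),
                  stringsAsFactors = FALSE)

# Long lookup table: one row per non-NA cell of df2,
# with the df2 column name stored in "heading" (hypothetical name)
lookup <- pivot_longer(df2, everything(),
                       names_to = "heading",
                       values_to = "unique_values",
                       values_drop_na = TRUE)

# Join from df1 so its rows and columns come first
df3 <- left_join(df1, lookup, by = "unique_values")
print(df3)
```

Because this is a left join from df1, any value of df1 that does not appear in df2 would be kept with `NA` in `heading`.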

【Discussion】:

【Solution 3】:

data.table version:

library(data.table)

merge(setDT(df1), melt(setDT(df2), measure.vars = names(df2)), 
      by.x = 'unique_values', by.y = 'value')

#   unique_values other_col variable
#1:             a        x1        A
#2:             b        x2        B
#3:             c        x3        C
#4:             d        x4        B
#5:             e        x5        C
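An alternative within data.table is an update join, which adds the column to the first table by reference instead of building a merged copy. The following is only a sketch under assumptions: the tables are created directly as data.tables named `dt1` and `dt2`, and the new column name `heading` is hypothetical.

```r
library(data.table)

# Example data from the question, built directly as data.tables
dt1 <- data.table(unique_values = letters[1:5],
                  other_col = paste0("x", 1:5))
dt2 <- data.table(A = c("a", NA), B = c("b", "d"), C = c("c", "e"))

# Long format: "variable" holds the dt2 column name, "value" the letter;
# na.rm = TRUE drops the NA padding in column A
long <- melt(dt2, measure.vars = names(dt2), na.rm = TRUE)

# Update join: for each row of long, find the matching row of dt1
# and write the column name into a new column, by reference
dt1[long, on = .(unique_values = value), heading := i.variable]
print(dt1)
```

As with `setDT()` in the answer above, `:=` modifies `dt1` in place; the new column is a factor because `melt()` returns `variable` as a factor by default.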

【Discussion】: