R: remove common columns in data frames (answers)

【Question title】: R remove common columns in dataframes
【Posted】: 2021-06-15 01:45:04
【Question】:

I have 2 dfs (simplified example):

    df1 a b c g ... 
        1 0 0 0
        2 0 0 1

and

    df2 a b d e f ...
        1 1 0 0 0
        2 0 0 0 1

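I want to combine the 2 dfs, but before joining them I want to drop the columns that df1 and df2 have in common. So I would keep columns (c,d,e,f,g), because a and b appear in both df1 and df2.

So basically the opposite of the answer here: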

delete columns in data frame not in common with another (R)

【Comments】:

Tags: r tidyverse

【Solution 1】:

Using set operations, i.e. union, intersect and setdiff, on the names of the two dfs, we can do this:

df1 <- read.table(header = TRUE, text = 'a b c g
        1 0 0 0
        2 0 0 1')

df2 <- read.table(header = TRUE, text = 'a b d e f
        1 1 0 0 0
        2 0 0 0 1')

# column names that appear in only one of the two data frames
x <- setdiff(union(names(df1), names(df2)), intersect(names(df1), names(df2)))

# keep only those columns from each data frame, then bind side by side
cbind(df1[names(df1) %in% x], df2[names(df2) %in% x])
#>   c g d e f
#> 1 0 0 0 0 0
#> 2 0 1 0 0 1

Created on 2021-06-15 by the reprex package (v2.0.0)
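The set expression above amounts to the symmetric difference of the two name vectors. As a minimal sketch, the same idea can be wrapped in a small reusable function; `drop_common_cols` is a hypothetical name introduced here for illustration and is not part of the answer or of base R.

```r
# Hypothetical helper (the name is an assumption, not from the answer):
# drops every column whose name occurs in both data frames, then binds
# the remaining columns side by side.
drop_common_cols <- function(x, y) {
  common <- intersect(names(x), names(y))
  cbind(
    x[, setdiff(names(x), common), drop = FALSE],
    y[, setdiff(names(y), common), drop = FALSE]
  )
}

df1 <- read.table(header = TRUE, text = "a b c g
1 0 0 0
2 0 0 1")

df2 <- read.table(header = TRUE, text = "a b d e f
1 1 0 0 0
2 0 0 0 1")

res <- drop_common_cols(df1, df2)
names(res)
#> [1] "c" "g" "d" "e" "f"
```

Using `drop = FALSE` keeps each piece a data frame even when only one column survives.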

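【Comments】:

【Solution 2】:

In base R, you can first use the duplicated function to work out which column names the two data frames share. From there, simply select and bind the columns from each data frame that are not in this list.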

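# column names that occur in both data frames
dupes <- c(names(df1), names(df2))[duplicated(c(names(df1), names(df2)))]

# logical indexing with drop = FALSE also works when nothing is shared or a single column remains
df3 <- cbind(df1[, !(names(df1) %in% dupes), drop = FALSE], df2[, !(names(df2) %in% dupes), drop = FALSE])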

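Following your example, this produces the data frame below, which contains only the columns that are unique to one of the two data frames. It assumes that both data frames have the same number of rows.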

df3 c g d e f ...
    0 0 0 0 0
    0 1 0 0 1
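The equal-row-count assumption can be made explicit before binding. The following is a small sketch with the example data rebuilt via `data.frame()` (an assumption for self-containment); note that `cbind` pairs rows purely by position, so once column `a` is dropped nothing checks that row 1 of `df1` really belongs with row 1 of `df2`.

```r
# Example data rebuilt for a self-contained sketch
df1 <- data.frame(a = 1:2, b = c(0, 0), c = c(0, 0), g = c(0, 1))
df2 <- data.frame(a = 1:2, b = c(1, 0), d = c(0, 0), e = c(0, 0), f = c(0, 1))

# Fail early instead of relying on how cbind handles a mismatch
stopifnot(nrow(df1) == nrow(df2))

# Column names shared by both data frames
dupes <- intersect(names(df1), names(df2))

# Keep the non-shared columns of each and bind them by row position
df3 <- cbind(df1[!(names(df1) %in% dupes)], df2[!(names(df2) %in% dupes)])
names(df3)
#> [1] "c" "g" "d" "e" "f"
```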

【Comments】:

【Solution 3】:

You can do this with the dplyr package:

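library(dplyr)
# df1 only contains c and g; d, e and f come from df2
df3 <- bind_cols(df1 %>% select(c, g), df2 %>% select(d, e, f))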

select() from the dplyr package keeps the columns you want to keep.

【Comments】:
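Typing the column names by hand does not scale to wide data frames. As a hedged sketch, the shared names can be computed with `intersect()` and excluded with the tidyselect helper `any_of()`; this assumes dplyr 1.0.0 or later, and the example data is rebuilt here with `data.frame()` for self-containment.

```r
library(dplyr)

# Example data rebuilt for a self-contained sketch
df1 <- data.frame(a = 1:2, b = c(0, 0), c = c(0, 0), g = c(0, 1))
df2 <- data.frame(a = 1:2, b = c(1, 0), d = c(0, 0), e = c(0, 0), f = c(0, 1))

# Column names present in both data frames
common <- intersect(names(df1), names(df2))

# Drop the shared columns from each side, then bind by row position
df3 <- bind_cols(
  select(df1, -any_of(common)),
  select(df2, -any_of(common))
)
names(df3)
#> [1] "c" "g" "d" "e" "f"
```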