How to combine items from two columns in two separate files? Answers

[Question title]: How to combine items from two columns in two separate files?
[Posted]: 2015-02-13 07:44:42
[Question description]:

I have two tables that I need to compare.

Table 1: XLOC ID

Column A: Xloc id 
Column B: gene id

Table 2: Ensembl ID

Column A: Ensembl id
Column B: gene Id

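Both tables contain the same gene IDs (names such as cpa6). Table 1 has 25,000 entries and Table 2 has 46,000 entries.

When the gene IDs in column B of the two tables match, I need to insert the Ensembl ID from column A of Table 2 into column C of Table 1 and write the result to a new output file. For example: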

Table 1:

ENS0002   cpa6

Table 2:

Xloc0014  cpa6

Output file, Table 3:

ENS0002   cpa6   Xloc0014

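The columns are in different orders and cannot simply be sorted alphabetically to line them up. I would drop the remaining 21,000 entries that have no corresponding Xloc (but that is easy to do after the output has been created).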
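A minimal R sketch of the join being asked for, assuming the tables are already loaded as data frames. The column names ensembl_id, xloc_id and gene_id are hypothetical, and apart from the cpa6 row from the example above the IDs are made up so that an unmatched gene is visible.

```r
# Hypothetical stand-ins for the two tables; column names are assumptions.
ensembl <- data.frame(
  ensembl_id = c("ENS0002", "ENS0005", "ENS0009"),
  gene_id    = c("cpa6", "abc1", "xyz2"),
  stringsAsFactors = FALSE
)
xloc <- data.frame(
  xloc_id = c("Xloc0014", "Xloc0020"),
  gene_id = c("cpa6", "xyz2"),
  stringsAsFactors = FALSE
)

# Inner join on the shared gene_id column: the row order of either table does
# not matter, and genes with no Xloc partner (here abc1) are dropped.
table3 <- merge(ensembl, xloc, by = "gene_id")

# Reorder the columns to match the layout of Table 3 in the question.
table3 <- table3[, c("ensembl_id", "gene_id", "xloc_id")]
print(table3)
```

Because merge() matches rows by key value rather than by position, the two tables never need to be sorted into the same order.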

Does anyone know a relatively easy way to do this in R, Excel or other software?

Note that the two tables cannot be sorted into the same order, so I really need to do this with a formula, script or bash.

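[Question comments]:

Possible duplicate of How to match/merge data from two different files in R?
Hi, are the GeneIDs duplicated? I mean, for example, are all GeneIDs in Table 2 unique?
See merge. You need to read the files in first, then search for how to merge data.frames; there are many ways.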
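A sketch of the file-based workflow the last comment describes, which also checks for the repeated gene IDs the previous comment asks about. It assumes tab-delimited files with a header row; the file contents and column names (xloc_id, ensembl_id, gene_id) are hypothetical, and temporary files stand in for the real paths.

```r
# Hypothetical input: two small tab-delimited files written to temporary
# paths so the sketch runs on its own; substitute the real file paths.
xloc_file    <- tempfile(fileext = ".txt")
ensembl_file <- tempfile(fileext = ".txt")
writeLines(c("xloc_id\tgene_id", "Xloc0014\tcpa6", "Xloc0020\txyz2"), xloc_file)
writeLines(c("ensembl_id\tgene_id", "ENS0002\tcpa6", "ENS0003\tcpa6",
             "ENS0005\tabc1"), ensembl_file)

# read.delim() reads tab-separated text with a header row.
xloc    <- read.delim(xloc_file, stringsAsFactors = FALSE)
ensembl <- read.delim(ensembl_file, stringsAsFactors = FALSE)

# merge() returns one row per matching pair, so a gene ID listed twice in one
# table produces two output rows; list any repeated IDs before joining.
dup_genes <- unique(ensembl$gene_id[duplicated(ensembl$gene_id)])
if (length(dup_genes) > 0) {
  message("Repeated gene IDs in the Ensembl table: ",
          paste(dup_genes, collapse = ", "))
}

# Inner join on the shared column, then write a new tab-delimited file.
out <- merge(ensembl, xloc, by = "gene_id")
out_file <- tempfile(fileext = ".txt")
write.table(out, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

In this made-up data cpa6 appears twice in the Ensembl file, so the output contains two cpa6 rows, both carrying Xloc0014.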

Tags: r excel merge

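[Solution 1]:

Try this. I created example data frames to show how to merge them and keep only the values that exist in both tables.

As you can see, the new table contains only the values present in both tables, and it now has 3 columns, the third holding the values from the second table.

To keep only the rows present in both tables, merge on the gene ID column so that only gene IDs found in both are kept, for example newTable <- merge(tab1,tab2,by = "gen_id").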

tab1 <- data.frame(col1=c("id1","id2","id3","id4"),col2=c(1,2,3,4))
tab2 <- data.frame(col1=c("id1","id2","id3","id5","id7"),col2=c(1,3,3,5,6))
newTable <- merge(tab1,tab2,by = "col1")
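Running the example above, merge() keeps only id1, id2 and id3 (the keys present in both tables) and renames the two clashing col2 columns to col2.x and col2.y. The suffixes argument can give them clearer names; the suffix strings used below are only illustrative.

```r
tab1 <- data.frame(col1 = c("id1", "id2", "id3", "id4"), col2 = c(1, 2, 3, 4))
tab2 <- data.frame(col1 = c("id1", "id2", "id3", "id5", "id7"), col2 = c(1, 3, 3, 5, 6))

# Inner join: only keys found in both tables survive.
newTable <- merge(tab1, tab2, by = "col1")
names(newTable)  # "col1" "col2.x" "col2.y"

# Same join, with explicit suffixes for non-key columns that share a name.
named <- merge(tab1, tab2, by = "col1", suffixes = c("_tab1", "_tab2"))
names(named)     # "col1" "col2_tab1" "col2_tab2"
```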

If you want to keep everything in table1, including rows that may not exist in table2, use this option:

newTable <- merge(tab1,tab2,by = "col1",all.x=TRUE)

This keeps every row of table1: where a match exists, col2.y holds the value from table2; otherwise it is NA.
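A short sketch of that left join on the same example data, followed by one way to drop the unmatched rows afterwards, which is the clean-up step the question mentions. Filtering on is.na() assumes the table2 column has no genuine missing values of its own.

```r
tab1 <- data.frame(col1 = c("id1", "id2", "id3", "id4"), col2 = c(1, 2, 3, 4))
tab2 <- data.frame(col1 = c("id1", "id2", "id3", "id5", "id7"), col2 = c(1, 3, 3, 5, 6))

# Left join: every row of tab1 is kept; id4 has no partner, so col2.y is NA.
left <- merge(tab1, tab2, by = "col1", all.x = TRUE)

# Removing the rows without a match gives back the inner join.
matched <- left[!is.na(left$col2.y), ]
```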

[Discussion]:

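[Solution 2]:

In R, I would use the merge function: merge(table1, table2, by = "gene_id"), where by names the shared gene ID column (gene_id is a placeholder for whatever that column is called), not a value such as cpa6.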
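If the gene ID column has a different name in each file, merge() accepts by.x and by.y instead of by. A minimal sketch; the column names and IDs here are hypothetical.

```r
table1 <- data.frame(xloc_id = c("Xloc0014", "Xloc0015"),
                     gene = c("cpa6", "abc1"), stringsAsFactors = FALSE)
table2 <- data.frame(ensembl_id = c("ENS0002", "ENS0008"),
                     gene_name = c("cpa6", "def3"), stringsAsFactors = FALSE)

# by.x names the key column in table1, by.y the key column in table2;
# the key column in the result takes the by.x name ("gene").
res <- merge(table1, table2, by.x = "gene", by.y = "gene_name")
```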

However, I have also done this in Excel before, and the VLOOKUP function works well for it. You just need to use Excel's IF function with VLOOKUP nested inside it:

=IF(ISERROR(VLOOKUP(cell with the gene name in Table 1, range of cells containing the gene names in Table 2, column number within that range to return, FALSE so that only exact matches count)), output if not found, output if found)

Example:

=IF(ISERROR(VLOOKUP(C4,List1!A$2:A$1000,1,FALSE)), "Does NOT exist in List 1","Exists in List 1")
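For comparison, the closest R analogue of an exact-match VLOOKUP is match() combined with indexing. This sketch uses hypothetical column names and IDs; it fills a third column in table1's original row order and builds a found/not-found flag like the IF(ISERROR(...)) formula above.

```r
table1 <- data.frame(xloc_id = c("Xloc0014", "Xloc0015"),
                     gene_id = c("cpa6", "abc1"), stringsAsFactors = FALSE)
table2 <- data.frame(ensembl_id = c("ENS0002", "ENS0008"),
                     gene_id = c("cpa6", "def3"), stringsAsFactors = FALSE)

# match() gives, for each gene in table1, the row of its first exact match
# in table2, or NA when there is none.
idx <- match(table1$gene_id, table2$gene_id)

# Indexing by those positions fills a new third column in table1's order.
table1$ensembl_id <- table2$ensembl_id[idx]

# TRUE where the gene was found in table2, FALSE otherwise.
table1$in_table2 <- !is.na(idx)
```

Unlike merge(), this keeps every row of table1 in place and takes only the first match when a gene ID is repeated in table2.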

[Discussion]: