[Posted]: 2016-08-07 07:29:49
[Question]:
I want to subset some columns of one data frame based on the rows of another data frame. The two data frames are shown below:
df1 <- structure(list(ID = structure(c(3L, 1L, 2L, 5L, 4L), .Label = c("cg08", "cg09", "cg29", "cg36", "cg65"), class = "factor"), chr = c(16L, 3L, 3L, 1L, 8L), gene = c(534L, 376L, 171L, 911L, 422L), GS12 = c(0.15, 0.87, 0.6, 0.1, 0.72), GS32 = c(0.44, 0.93, 0.92, 0.07, 0.91), GS56 = c(0.46, 0.92, 0.62, 0.06, 0.87), GS87 = c(0.79, 0.93, 0.86, 0.08, 0.88)), .Names = c("ID", "chr", "gene", "GS12", "GS32", "GS56", "GS87"), class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
df2 <- structure(list(samples = structure(c(1L, 2L, 4L, 3L, 6L, 5L), .Label = c("GS32", "GS33", "GS55", "GS56", "GS68", "GS87"), class = "factor"), ID2 = structure(c(1L, 6L, 3L, 4L, 5L, 2L), .Label = c("GM1", "GM10", "GM17", "GM18", "GM19", "GM7"), class = "factor")), .Names = c("samples", "ID2" ), class = "data.frame", row.names = c(NA, -6L))
Data:
df1:
ID chr gene GS12 GS32 GS56 GS87
1 cg29 16 534 0.15 0.44 0.46 0.79
2 cg08 3 376 0.87 0.93 0.92 0.93
3 cg09 3 171 0.60 0.92 0.62 0.86
4 cg65 1 911 0.10 0.07 0.06 0.08
5 cg36 8 422 0.72 0.91 0.87 0.88
df2:
samples ID2
GS32 GM1
GS33 GM7
GS56 GM17
GS55 GM18
GS87 GM19
GS68 GM10
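I want to subset all columns of df1 whose names also appear in the sample ID column of df2 (df2$samples), while keeping all rows in the final output. In short, I want to subset the columns of one data frame based on the rows of another data frame. Is there a function that does this?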
[Comments]:
-
What is your expected result?
-
Try df1[intersect(names(df1), df2$samples)]. If df2$samples is a factor, use as.character(df2$samples) instead.
-
I would look at the data.table package and its foverlaps function. Maybe this answer, which was given to me, will help you too: stackoverflow.com/questions/35719047/…
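The intersect() suggestion from the comments can be sketched as follows. This is a minimal base R example, not a confirmed answer from the thread: the data frames are rebuilt compactly from the values shown in the question, and the names common, result, and result_with_ids are introduced here for illustration only. The last step assumes the annotation columns (ID, chr, gene) should be kept alongside the matched samples, which the question does not state.

```r
# Data from the question, rebuilt compactly with the same values.
df1 <- data.frame(
  ID   = factor(c("cg29", "cg08", "cg09", "cg65", "cg36")),
  chr  = c(16L, 3L, 3L, 1L, 8L),
  gene = c(534L, 376L, 171L, 911L, 422L),
  GS12 = c(0.15, 0.87, 0.60, 0.10, 0.72),
  GS32 = c(0.44, 0.93, 0.92, 0.07, 0.91),
  GS56 = c(0.46, 0.92, 0.62, 0.06, 0.87),
  GS87 = c(0.79, 0.93, 0.86, 0.08, 0.88)
)
df2 <- data.frame(
  samples = factor(c("GS32", "GS33", "GS56", "GS55", "GS87", "GS68")),
  ID2     = factor(c("GM1", "GM7", "GM17", "GM18", "GM19", "GM10"))
)

# Column names of df1 that also occur as values in df2$samples.
# as.character() makes the comparison explicit when samples is a factor.
common <- intersect(names(df1), as.character(df2$samples))

# Selecting columns by name leaves every row of df1 in place.
result <- df1[common]

# Assumption (not stated in the question): keep the annotation columns too.
result_with_ids <- df1[c("ID", "chr", "gene", common)]

print(result)
```

Because intersect() keeps the order of its first argument, the selected columns appear in the same order as in df1. Samples listed in df2 but absent from df1 (GS33, GS55, GS68) are silently ignored.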
Tags: r subset bioinformatics