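【Posted】:2023-03-17 00:33:01
【Question】:
I have two data frames in R and need to count element matches row by row, ending up with a column whose length equals the Cartesian product of the two tables, alongside the IDs of both rows. The tables are large and have different numbers of rows but the same number of columns.
I have the following code, but it is slow when run many times.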
library(data.table)
table_1<-data.table(matrix(c(1:24),nrow = 4))
table_2<-data.table(matrix(c(11:34),nrow = 4))
names(table_1)<-c("s1", "s2","s3","s4","s5","s6")
names(table_2)<-c("a1","a2","a3","a4","a5","a6")
table_1$ID<-seq.int(nrow(table_1))
table_2$ID_ap<-seq.int(nrow(table_2))
setcolorder(table_1, c("ID", "s1", "s2","s3","s4","s5","s6"))
setcolorder(table_2, c("ID_ap","a1","a2","a3","a4","a5","a6"))
# Cross join: add a constant key k = 1 to both tables, join on it, then drop k
CJ.table<-function(X,Y) setkey(X[,c(k=1,.SD)],k)[Y[,c(k=1,.SD)],allow.cartesian=TRUE][,k:=NULL]
join<-CJ.table(table_1,table_2)
R<-subset(join, select=c("ID_ap","ID"))
R$Ac<- (join$s1 == join$a1) + (join$s1 ==join$a2) + (join$s1 ==join$a3) + (join$s1 ==join$a4) + (join$s1 ==join$a5) + (join$s1 ==join$a6)+
(join$s2 == join$a1) + (join$s2 ==join$a2) + (join$s2 ==join$a3) + (join$s2 ==join$a4) + (join$s2 ==join$a5) + (join$s2 ==join$a6)+
(join$s3 == join$a1) + (join$s3 ==join$a2) + (join$s3 ==join$a3) + (join$s3 ==join$a4) + (join$s3 ==join$a5) + (join$s3 ==join$a6)+
(join$s4 == join$a1) + (join$s4 ==join$a2) + (join$s4 ==join$a3) + (join$s4 ==join$a4) + (join$s4 ==join$a5) + (join$s4 ==join$a6)+
(join$s5 == join$a1) + (join$s5 ==join$a2) + (join$s5 ==join$a3) + (join$s5 ==join$a4) + (join$s5 ==join$a5) + (join$s5 ==join$a6)+
(join$s6 == join$a1) + (join$s6 ==join$a2) + (join$s6 ==join$a3) + (join$s6 ==join$a4) + (join$s6 ==join$a5) + (join$s6 ==join$a6)
which gives
R
    ID_ap ID Ac
 1:     1  1  0
 2:     1  2  0
 3:     1  3  4
 4:     1  4  0
 5:     2  1  0
 6:     2  2  0
 7:     2  3  0
 8:     2  4  4
 9:     3  1  3
10:     3  2  0
11:     3  3  0
12:     3  4  0
13:     4  1  0
14:     4  2  3
15:     4  3  0
16:     4  4  0
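For reference, the 36 hand-written comparisons can be generated with a loop over the column names. This is only a minimal sketch that reproduces the output above on the example data; it performs the same comparisons on the same joined table, so it is not expected to be faster. The names s_cols and a_cols are introduced here for illustration.

```r
library(data.table)

# Rebuild the example tables from the question.
table_1 <- data.table(matrix(1:24, nrow = 4))
table_2 <- data.table(matrix(11:34, nrow = 4))
setnames(table_1, paste0("s", 1:6))
setnames(table_2, paste0("a", 1:6))
table_1[, ID := .I]
table_2[, ID_ap := .I]

# Cross join exactly as in the question (constant key k, dropped afterwards).
CJ.table <- function(X, Y) {
  setkey(X[, c(k = 1, .SD)], k)[Y[, c(k = 1, .SD)], allow.cartesian = TRUE][, k := NULL]
}
join <- CJ.table(table_1, table_2)

# Hypothetical helper names: the value columns of each table.
s_cols <- paste0("s", 1:6)
a_cols <- paste0("a", 1:6)

# Accumulate every s-column vs a-column comparison instead of typing them out.
Ac <- integer(nrow(join))
for (s in s_cols) {
  for (a in a_cols) {
    Ac <- Ac + (join[[s]] == join[[a]])
  }
}

R <- data.table(ID_ap = join$ID_ap, ID = join$ID, Ac = Ac)
print(R)
```

The loop works for any number of columns, as long as s_cols and a_cols list the value columns of the two tables.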
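【Comments】:
-
What are the dimensions of your "data.frame"s? What values do they contain?
-
Matrices of around 10k rows and 100 rows, filled with small non-zero positive integers.
-
Are the values within a row always distinct?
-
Yes, the values within a row are always distinct, and the rows within a matrix are always distinct. But the two matrices may contain equal rows @Frank
-
Regarding your CJ.table, you may be interested in this question: stackoverflow.com/q/25888706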
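Going by the comments above (values within a row are distinct, and all values are small positive integers), the match count for a pair of rows is simply the number of values the two rows share. Under those assumptions, one possible sketch is to one-hot encode each row and take a matrix product, which avoids building the wide cross join altogether. This is base R only; the helper incidence and the matrices m1 and m2 are illustrative names, not part of the original code, and the result is not benchmarked here.

```r
# Example data from the question, kept as plain matrices.
m1 <- matrix(1:24, nrow = 4)   # values of table_1 (one row per ID)
m2 <- matrix(11:34, nrow = 4)  # values of table_2 (one row per ID_ap)

# Hypothetical helper: inc[i, v] is 1 when value v occurs in row i of m.
# Assumes m holds positive integers no larger than max_value.
incidence <- function(m, max_value) {
  inc <- matrix(0L, nrow = nrow(m), ncol = max_value)
  inc[cbind(as.vector(row(m)), as.vector(m))] <- 1L
  inc
}

max_value <- max(m1, m2)

# counts[j, i] = number of values shared by row j of m2 and row i of m1.
# This equals the pairwise-equality count only when values within each row
# are distinct, as stated in the comments.
counts <- tcrossprod(incidence(m2, max_value), incidence(m1, max_value))

# Flatten to the layout of R in the question: ID_ap varies slowest.
R <- data.frame(
  ID_ap = rep(seq_len(nrow(m2)), each = nrow(m1)),
  ID    = rep(seq_len(nrow(m1)), times = nrow(m2)),
  Ac    = as.vector(t(counts))
)
print(R)
```

On the example data this reproduces the Ac column shown in the question. The incidence matrices have one column per possible value, so the approach only makes sense while the largest value stays small.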
Tags: r performance optimization data.table match