【发布时间】:2015-04-02 04:42:13
【问题描述】:
这是我的树。第一列是分支的标识符,其中0 是主干,L 是左侧的第一个分支,R 是右侧的第一个分支。 LL 是第二次分叉后最左边的分支,等等。变量length 包含每个分支的长度。
> tree
branch length
1 0 20
2 L 12
3 LL 19
4 R 19
5 RL 12
6 RLL 10
7 RLR 12
8 RR 17
这是这棵树的图画
这是这棵树上的两个位置
pos1 = tree[3,]; pos1$length = 12
pos2 = tree[6,]; pos2$length = 3
我构建了这个算法来计算树上任意两点之间沿着树枝的最短距离。
distance = function(tree, pos1, pos2){
if (identical(pos1$branch, pos2$branch)){Dist=pos1$length-pos2$length;return(Dist)}
pos1path = strsplit(pos1$branch, "")[[1]]
if (pos1path[1]!="0") {pos1path = c("0", pos1path)}
pos2path = strsplit(pos2$branch, "")[[1]]
if (pos2path[1]!="0") {pos2path = c("0", pos1path)}
CommonTrace="included"; for (i in 1:min(length(pos1path), length(pos2path))) {if (pos1path[i] != pos2path[i]) {CommonTrace = i-1; break}}
if(CommonTrace=="included"){
CommonTrace = min(length(pos1path), length(pos2path))
if (length(pos1path) > length(pos2path)) {longerpos = pos1; shorterpos = pos2; longerpospath = pos1path} else {longerpos = pos2; shorterpos = pos1; longerpospath = pos2path}
distToNode = 0
if ((CommonTrace+1) != length(longerpospath)){
for (i in (CommonTrace+1):(length(longerpospath)-1)){ distToNode = distToNode + tree$length[tree$branch == paste(longerpospath[2:i], collapse='')]}
}
Dist = distToNode + longerpos$length + (tree[tree$branch == shorterpos$branch,]$length-shorterpos$length)
if (identical(shorterpos, pos1)){Dist=-Dist}
return(Dist)
} else { # if they are sisterbranch
Dist=0
if((CommonTrace+1) != length(pos1path)){
for (i in (CommonTrace+1):(length(pos1path)-1)){ Dist = Dist + tree$length[tree$branch == paste(pos1path[2:i], collapse='')]}
}
if((CommonTrace+1) != length(pos2path)){
for (i in (CommonTrace+1):(length(pos2path)-1)){ Dist = Dist + tree$length[tree$branch == paste(pos2path[2:i], collapse='')]}
}
Dist = Dist + pos1$length + pos2$length # signdistance does not apply!
return(Dist)
}
}
我认为该算法运行良好。然后,我将遍历所有感兴趣的位置。
for (i in allpositions){
for (j in allpositions){
mat[i,j] = distance(tree, i, j)
}
}
问题是我有大约 50000 个位置的非常大的树,我想计算任意两个位置之间的距离,即我要计算数倍于 50000^2 的距离。它需要永远!你能帮我改进我的代码吗?
【问题讨论】:
-
你能展示你到目前为止所做的事情吗?
-
你可能做过,但我需要问一下......你确定你已经探索了可能有助于做到这一点的包的所有可能性吗?我在想也许是 igraph 或类似的东西?
标签: r performance tree