【问题标题】:Calculate time utilized from interval data从间隔数据计算使用的时间
【发布时间】:2015-10-22 01:56:32
【问题描述】:

我有这四个以秒为单位的数字:

maxtime = 240.0333
mintime = 181.1333
times1 = c(179.1333, 183.8000, 192.3000, 194.0000, 196.2500, 198.8333, 203.4333, 217.8167)
times2 = c(183.1333, 187.8000, 196.3000, 198.0000, 200.2500, 202.8333, 207.4333, 221.8167)

您会注意到times1times2 的长度相同。每个对应的元素相隔 4 秒。也就是说,times1times2 早 4 秒。

说明我的问题的最佳方式是像这样绘制这些数据:

library(ggplot2)
library(dplyr)
dfplot<- data.frame(ymin=times1, ymax=times2) 
data.frame(x=c(rep("min",length(times1)), rep("max",length(times1))), 
           y=c(times1,times2), 
           id=1:length(times1)) %>%
  ggplot(., aes(id,y,group=id)) + 
  geom_path(lwd=2) +
  coord_flip() +
  geom_hline(yintercept = as.numeric(mintime), lty=2,color='red', lwd=1)+
  geom_hline(yintercept = as.numeric(maxtime), lty=2,color='red', lwd=1)+
geom_rect(data=dfplot,aes(xmin=0,ymin=ymin,xmax=length(times1),ymax=ymax,fill="red"),alpha=0.2,inherit.aes=FALSE) +
  theme(panel.background = element_blank(), plot.background = element_blank())

我要做的是计算times1times2 中每对元素之间的间隔所涵盖的时间。这些由黑色水平线和红色矩形表示。如您所见,其中一些可能重叠。实际上,我想计算两条红色虚线之间的时间比例被黑线/红色矩形覆盖,什么比例没有被覆盖(即白色间隙)。

我希望这是有道理的。

【问题讨论】:

标签: r


【解决方案1】:

以下代码似乎对我有用。

代码的想法是将重叠的段合并成更大的段,计算它们,然后计算它们的长度。

library(dplyr)
library(assertthat)

# Make sure times1 is sorted
assert_that(identical(sort(times1), times1))

# Create segments from specified times
segments <- lapply(seq_along(times1),
                   function(x) {
                     assert_that( times1[[x]] < times2[[x]] )
                     list(c(times1[[x]],times2[[x]]))
                    })

DASHED_BEGIN <- mintime
DASHED_END <-  maxtime

# Cut off the segments based on dashed lines
segments_cut_off <- lapply(segments, function(xx) {
  x <- xx[[1]]

  # Is it within dashed interval?
  if ((x[2] < DASHED_BEGIN) || (x[1] > DASHED_END))
    return (NULL) # No

  # Yes
  (list(c(max(DASHED_BEGIN,x[1]),
              min(DASHED_END,x[2]))))

}) %>% Filter(f = Negate(is.null))

# Function for determining the union of two segments
seg_union <- function(xx,yy) {
  prev_x <- xx[1:length(xx)-1]
  x <- xx[[length(xx)]]
  y <- yy[[1]]

  # Do they intersect
  if (x[[2]]<y[[1]] || y[[2]] < x[[1]]) {
    # No. Return each separately as well as 
    # attach the previous segments
    return (c(prev_x,list(x,y)))
  }

  # Yes. Calculate the union 
  # (and attach the previous segments too)
  (c(prev_x,
     list(c(min(x[[1]],y[[1]]), 
            max(x[[2]],y[[2]])))
  ))

}

# Create the full list of conglomerated segments
union_lst <- Reduce(f = seg_union, x = segments_cut_off)

union_lst

这给了我:

[[1]]
[1] 181.1333 183.1333

[[2]]
[1] 183.8 187.8

[[3]]
[1] 192.3000 202.8333

[[4]]
[1] 203.4333 207.4333

[[5]]
[1] 217.8167 221.8167

现在我们只是将它们的长度相加

vapply(union_lst, function(x) (x[2] - x[1]), 
                  FUN.VALUE = numeric(1)) %>%
   sum

【讨论】:

    【解决方案2】:

    使用来自 BioConductor 的 GenomicRanges 库,对上一个答案有一个帽子提示:https://stackoverflow.com/a/27576114/496803

    由于它只处理整数数据,因此您必须将值相乘以覆盖小数点后的部分。

    df <- data.frame(times1=times1*10000, times2=times2*10000, id=1)
    total <- data.frame(times1=mintime*10000,times2=maxtime*10000, id=1)
    
    #source("http://bioconductor.org/biocLite.R")
    #biocLite("GenomicRanges")
    library(GenomicRanges)
    
    dfR <- makeGRangesFromDataFrame(
     df, start.field="times1", end.field="times2", 
     seqnames.field="id"
    )
    
    totalR <- makeGRangesFromDataFrame(
     total, start.field="times1", end.field="times2", 
     seqnames.field="id"
    )
    
    result <- intersect(totalR, dfR)
    result
    #GRanges object with 5 ranges and 0 metadata columns:
    #      seqnames             ranges strand
    #         <Rle>          <IRanges>  <Rle>
    #  [1]        1 [1811333, 1831333]      *
    #  [2]        1 [1838000, 1878000]      *
    #  [3]        1 [1923000, 2028333]      *
    #  [4]        1 [2034333, 2074333]      *
    #  [5]        1 [2178167, 2218167]      *
    
    sum(round(as.data.frame(result)$width/10000,3))
    #[1] 24.533
    

    【讨论】:

      猜你喜欢
      • 1970-01-01
      • 2017-08-31
      • 2014-02-18
      • 1970-01-01
      • 2016-11-14
      • 2012-08-14
      • 2011-07-02
      • 1970-01-01
      • 2018-07-08
      相关资源
      最近更新 更多