Count number of objects in a region: answers

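【Question title】: Count number of objects in a region
【Posted】: 2021-03-03 02:04:17
【Question】:

For many images, I have a tibble with the coordinates of the objects on each image. I want to count the number of objects inside a box of a specified size around each object (similar to a number of neighbors). So far I have come up with a for loop that subsets the tibble and counts the rows.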

raw.data <- structure(list(ImageNumber = c(67, 67, 67, 67, 67, 67, 67, 67, 
67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 
67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 
67), ObjectNumber = c(1, 2, 5, 6, 7, 10, 11, 13, 16, 34, 35, 
42, 44, 46, 54, 58, 67, 77, 82, 90, 94, 107, 153, 158, 169, 201, 
223, 254, 294, 315, 386, 493, 508, 553, 599, 606, 612, 625, 676, 
678, 697), Location_Center_X.nuc = c(46.3557910673732, 189.630407911001, 
238.322766570605, 253.236234458259, 134.482566248257, 45.7193336698637, 
136.949320148331, 292.452631578947, 238.591869918699, 147.364275668073, 
93.859943977591, 169.394435351882, 253.794247787611, 97.1797752808989, 
258.430194805195, 233.346428571429, 202.378378378378, 297.966403162055, 
229.343333333333, 298.730679156909, 243.604806408545, 256.607266435986, 
279.823886639676, 288.966666666667, 278.035714285714, 264.86592178771, 
161.519230769231, 280.364672364672, 299.832929782082, 271.572481572482, 
7.72075471698113, 5.81395348837209, 284.742857142857, 291.826747720365, 
5.4331983805668, 295.924778761062, 198.463709677419, 282.083094555874, 
248.316239316239, 281.019867549669, 19.6458333333333), Location_Center_Y.nuc = c(237.48145344436, 
56.1885043263288, 175.412103746398, 144.548845470693, 199.902370990237, 
122.95406360424, 23.9406674907293, 266.46015037594, 116.671544715447, 
122.617440225035, 20.5756302521008, 152.31914893617, 93.3495575221239, 
167.223314606742, 195.261363636364, 26.0714285714286, 123.351351351351, 
227.009881422925, 85.19, 41.9789227166276, 290.567423230975, 
34.9671280276817, 164.975708502024, 91.5090909090909, 39.7205882352941, 
222.66852886406, 238.157692307692, 73.1880341880342, 191.019370460048, 
128.415233415233, 107.4, 37.5488372093023, 210.244155844156, 
131.577507598784, 150.072874493927, 152.650442477876, 3.77016129032258, 
110.702005730659, 2.28205128205128, 3.02649006622517, 2.59027777777778
)), row.names = c(NA, -41L), class = c("tbl_df", "tbl", "data.frame"
))

radius = 80
raw.data$Density.80 = NA;

for (i in 1:nrow(raw.data)){
  x = raw.data$Location_Center_X.nuc[i]
  y = raw.data$Location_Center_Y.nuc[i]
  imN = raw.data$ImageNumber[i]
  sub_samp = raw.data[which(raw.data$Location_Center_X.nuc >= x-radius &
                              raw.data$Location_Center_X.nuc <= x+radius &
                              raw.data$Location_Center_Y.nuc >= y-radius &
                              raw.data$Location_Center_Y.nuc <= y+radius &
                              raw.data$ImageNumber == imN),]
  raw.data$Density.80[i] = nrow(sub_samp) - 1
}

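The problem is that for large datasets (tens to hundreds of thousands of objects across hundreds or thousands of images) this takes hours, so optimizing the box size would take a very long time.

I would like to write a function that speeds this up. My attempt below returns a single number per image instead of a number per object. I am also struggling with how to apply such a function using purrr::map_*.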

count_neighbors <- function(.data, radius, ...){
  .data %>%
    group_by(ImageNumber) %>%
    filter(between(Location_Center_X.nuc, Location_Center_X.nuc - radius, Location_Center_X.nuc + radius) &
             between(Location_Center_Y.nuc, Location_Center_Y.nuc - radius, Location_Center_Y.nuc + radius)) %>%
    tally()
    
}

count_neighbors(raw.data, radius = 80)
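A likely reason the attempt above gives one number per image: the filter condition compares each coordinate with a range built from that same coordinate, so there is no reference object, and tally() after group_by() collapses each image to a single row. The sketch below illustrates this on a small made-up tibble (`toy` and its columns `x` and `y` are illustrative, not the question's data), writing the range test with plain comparisons rather than between().

```r
library(dplyr)

# Hypothetical toy data: image 1 has three objects, image 2 has two
toy <- tibble(
  ImageNumber = c(1, 1, 1, 2, 2),
  x = c(0, 10, 500, 0, 5),
  y = c(0, 10, 500, 0, 5)
)
radius <- 80

# Comparing a column with itself +/- radius is TRUE for every row
always_true <- toy$x >= toy$x - radius & toy$x <= toy$x + radius

# filter() therefore drops nothing, and tally() returns one row per image
per_image <- toy %>%
  group_by(ImageNumber) %>%
  filter(x >= x - radius & x <= x + radius,
         y >= y - radius & y <= y + radius) %>%
  tally()

per_image$n  # group sizes, not neighbor counts
```

To get one value per object, each object has to be compared against all the other objects in its image, which is what the answers below do.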

【Comments】:

Tags: r dplyr purrr

【Solution 1】:

You can write a function that counts, for a single object, how many other objects fall inside the box around it.

count_values <- function(x, y, xVal, yVal, radius) {
  sum(xVal >= x-radius & xVal <= x+radius &
    yVal >= y-radius & yVal <= y+radius) - 1
}

You can then apply this function to every object within each image.

library(dplyr)
library(purrr)

raw.data %>%
  group_by(ImageNumber) %>%
  mutate(result = map2_dbl(Location_Center_X.nuc, Location_Center_Y.nuc, 
                       ~count_values(.x, .y, Location_Center_X.nuc, 
                                     Location_Center_Y.nuc, 80))) -> raw.data

raw.data
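As a possible variation that is not part of the answer above, the same box count can be computed without a per-object loop by taking all pairwise coordinate differences with base R's outer(). This is only a sketch: `count_box_neighbors` is a hypothetical helper name, the toy coordinates are made up, and it assumes each image is small enough that an n x n logical matrix fits in memory.

```r
# Count objects inside the square box (half-width `radius`) centered on each
# object, using pairwise differences instead of an explicit loop.
count_box_neighbors <- function(x, y, radius) {
  in_x <- abs(outer(x, x, "-")) <= radius  # n x n logical matrix
  in_y <- abs(outer(y, y, "-")) <= radius
  rowSums(in_x & in_y) - 1                 # subtract 1 for the object itself
}

# Hypothetical toy coordinates for one image
toy_x <- c(0, 10, 500, 20)
toy_y <- c(0, 10, 500, 200)
toy_counts <- count_box_neighbors(toy_x, toy_y, radius = 80)
```

Inside the grouped pipeline it could be used as mutate(result = count_box_neighbors(Location_Center_X.nuc, Location_Center_Y.nuc, 80)). Because it tests abs(difference) <= radius rather than the two-sided comparison, results could differ from count_values for points lying exactly on the box edge due to floating-point rounding.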

【Comments】:

【Solution 2】:

One solution is to use 1:nrow(df) as the main argument to a purrr map.

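library(dplyr)  # provides %>%, pull() and filter()

# For each row of df, count the rows whose center lies strictly inside the
# box of half-width `distance` around it (the row itself is included).
get_image_counts <- function(df, distance){
  purrr::map(1:nrow(df), function(idx){
    x <- df[idx,] %>% pull(Location_Center_X.nuc)
    y <- df[idx,] %>% pull(Location_Center_Y.nuc)

    df %>%
      filter(Location_Center_X.nuc > x - distance & Location_Center_X.nuc < x + distance &
               Location_Center_Y.nuc > y - distance & Location_Center_Y.nuc < y + distance) %>%
      nrow
  }) %>% unlist
}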

raw.data %>% tibble::add_column(neighbs = get_image_counts(raw.data, radius))

One nice thing about this solution is that it handles multiple images well.

raw.data %>% group_split(ImageNumber) %>% purrr::map(function(df){
  
  df %>% tibble::add_column(neighbs = get_image_counts(df, radius))
  
})

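This gives you a list of tibbles with a new column, neighbs, that holds the count of neighboring objects in the image, which I think is what you are looking for. Without seeing the full data I cannot say whether this will solve your problem. If it is too slow, you may want to use the furrr package, which provides parallel map functions.

【Comments】:

Thanks for the suggestion, but the results differ from those of my for loop. I am not sure why, but the other answer solved my problem anyway.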
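One plausible explanation for the difference, judging only from the code shown: get_image_counts uses strict inequalities and does not subtract 1, so every count includes the object itself, whereas the for loop uses inclusive bounds and subtracts 1. The ungrouped call on raw.data would also mix objects from different images if more than one image were present. A small sketch on made-up coordinates (the names `loop_style` and `map_style` are illustrative):

```r
# Hypothetical toy coordinates for one image
toy_x <- c(0, 10, 500, 20)
toy_y <- c(0, 10, 500, 200)
radius <- 80

# Inclusive box, self excluded (the convention of the question's for loop)
loop_style <- sapply(seq_along(toy_x), function(i) {
  sum(toy_x >= toy_x[i] - radius & toy_x <= toy_x[i] + radius &
        toy_y >= toy_y[i] - radius & toy_y <= toy_y[i] + radius) - 1
})

# Strict box, self included (the convention of get_image_counts)
map_style <- sapply(seq_along(toy_x), function(i) {
  sum(toy_x > toy_x[i] - radius & toy_x < toy_x[i] + radius &
        toy_y > toy_y[i] - radius & toy_y < toy_y[i] + radius)
})

offset <- map_style - loop_style  # 1 for every object in this toy example
```

If this is the cause, subtracting 1 from the result of get_image_counts should bring the two approaches into agreement, except for objects lying exactly on a box edge.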