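【Posted】:2026-02-08 07:40:02
【Question】:
I have the following problem. I have many plots that cover a biotic gradient. From these plots I want to select the 25 that best cover the gradient. To do this, I extracted the minimum and maximum values and calculated the evenly spaced values that would cover the gradient optimally. Then I selected the plots closest to those ideal values. This works fine. However, sometimes one plot is the closest to two theoretical values, so I end up with duplicates in my list, which I want to avoid. Obviously I could increase length.out, but from my point of view that is not an optimal solution. I want to end up with 25 unique plots.
The following code illustrates the problem: length.out is set to 25, but only 19 plots are selected.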
data <- structure(list(Plot = c("3", "4", "5", "6", "8", "12", "14",
"15", "17", "18", "19", "20", "21", "22", "23", "25", "26", "28",
"29", "30", "32", "33", "34", "35", "36", "37", "38", "39", "40",
"41", "42", "43", "44", "45", "46", "47", "48", "49"), Value = c(2.19490722347427,
0.817884294633935, 0.834577676660982, 1.19923035999043, 0.293146158435238,
1.93237941781986, 1.74536845664897, 2.22904916731729, 0.789604037117133,
0.439716474953651, 0.834321473446987, 1.07386786707173, 0.977203815084214,
0.539717907433468, 0.950019385036826, 1.10794069639141, 1.41499437622422,
1.12933520841724, 1.99342508363262, 1.05715847816517, 2.27711128641038,
1.9766526350752, 2.16657914911448, 2.01955890337827, 1.1080527140292,
1.16614766657035, 1.04478527637105, 0.980792736677819, 0.818000882117776,
0.656157422806534, 1.07223822052094, 0.799912719334531, 0.4365715090508,
0.824331627537106, 1.19478221856558, 1.06047128780385, 1.54822823084764,
0.582397279167692)), class = "data.frame", row.names = c("3",
"4", "5", "6", "8", "12", "14", "15", "17", "18", "19", "20",
"21", "22", "23", "25", "26", "28", "29", "30", "32", "33", "34",
"35", "36", "37", "38", "39", "40", "41", "42", "43", "44", "45",
"46", "47", "48", "49"))
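# 25 equally spaced target values spanning the observed gradient
opt_seq <- seq(min(data$Value), max(data$Value), length.out = 25)
# For each target value, the row index of the plot with the nearest Value
sel_plots <- sapply(opt_seq, function(i) which.min(abs(data$Value - i)))
# Number of distinct plots selected (fewer than 25 because of duplicates)
length(unique(sel_plots))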
I would really appreciate any help!
【Comments】:
-
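What do you want to do when one plot is the closest to two values? I.e. if your values are 0, 4, 16, 20 and your plots are 0, 2, 6, 7, 8, 9, 10, 14, 18, 20. What if the plot with value 20 is missing? What then? You need to decide how to handle cases where there is no "clear" match.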
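One possible tie-handling rule, sketched here purely as an illustration and not as something proposed in the thread, is a greedy assignment: process the target values in order and give each one the nearest plot that has not been taken yet. The function name `select_unique_nearest` and the toy vectors below are hypothetical. The result depends on the order in which targets are processed, and it does not minimise the total distance.

```r
# Greedy sketch (an assumption, not the asker's method): each target gets
# the nearest plot that has not already been assigned to an earlier target.
select_unique_nearest <- function(values, targets) {
  stopifnot(length(values) >= length(targets))
  chosen <- integer(length(targets))
  available <- rep(TRUE, length(values))
  for (k in seq_along(targets)) {
    d <- abs(values - targets[k])
    d[!available] <- Inf            # already-used plots can no longer win
    chosen[k] <- which.min(d)       # on exact ties the first index wins
    available[chosen[k]] <- FALSE
  }
  chosen
}

# Toy data (hypothetical): the plain which.min approach picks plot 5 twice.
values  <- c(0, 1, 2, 3, 10)
targets <- seq(min(values), max(values), length.out = 5)

naive  <- sapply(targets, function(i) which.min(abs(values - i)))
greedy <- select_unique_nearest(values, targets)

length(unique(naive))    # fewer than 5 because of the duplicate
anyDuplicated(greedy)    # 0 when all selections are distinct
```

With the data from the question, the same call would be `select_unique_nearest(data$Value, opt_seq)`, and `data$Plot[...]` on the result gives the plot labels.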
-
If I understand correctly, you can use a data.table rolling join, as described e.g. here: Find closest value in a vector with binary search.
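A minimal sketch of what such a rolling join could look like, assuming the data.table package is installed. The table and column names (`plots`, `targets`, `target`) and the toy values are made up for illustration. Note that a nearest-value join on its own only finds the closest plot for each target; it can still return the same plot for two targets, so duplicates would need separate handling.

```r
library(data.table)

# Toy data (hypothetical), kept small so the matches are easy to check by hand.
plots   <- data.table(Plot = c("a", "b", "c", "d"),
                      Value = c(0.2, 1.0, 1.9, 3.1))
targets <- data.table(target = c(0, 1.2, 3))

# For each target, the row number in `plots` whose Value is nearest.
# roll = "nearest" rolls to the closest Value in either direction;
# which = TRUE returns row numbers instead of the joined table.
idx <- plots[targets, on = .(Value = target), roll = "nearest", which = TRUE]

# Selected plots alongside the targets they were matched to.
result <- data.table(target = targets$target, plots[idx])
result
```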
Tags: r function duplicates sapply seq