Using Collections.binarySearch() for predicate search (i.e., not complete match): answers

【Question title】: Using Collections.binarySearch() for predicate search (i.e., not complete match)
【Posted】: 2019-01-09 10:38:15
【Question description】:

I have a list of timestamps sorted in ascending order:

List<Instant> timestamps = ...; // note: sorted in ascending order

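Now, given an input timestamp Instant inputTs, I want to find an entry t in timestamps that satisfies t.isBefore(inputTs) && inputTs.isBefore(t.plusMillis(SOME_CONSTANT)). In other words, I am simply looking for a t such that inputTs falls within a fixed-length duration starting at t. I acknowledge that in theory there can be several such t's, so the search may pick any one of them.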
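For reference, here is a minimal sketch of the predicate as a plain O(n) linear scan, assuming SOME_CONSTANT is a window length in milliseconds (10,000, matching the question's code). The class and method names (LinearScanFinder, inWindow, findLinear) are hypothetical; a scan like this is mainly useful as a baseline to check faster lookups against.

```java
import java.time.Instant;
import java.util.List;

public class LinearScanFinder {
    // Window length in milliseconds (assumed; mirrors SOME_CONSTANT in the question).
    static final long SOME_CONSTANT = 10_000;

    // True when inputTs lies strictly inside the open interval (t, t + SOME_CONSTANT ms).
    static boolean inWindow(Instant t, Instant inputTs) {
        return t.isBefore(inputTs) && inputTs.isBefore(t.plusMillis(SOME_CONSTANT));
    }

    // O(n) reference lookup: returns the first t satisfying the predicate, or null.
    static Instant findLinear(List<Instant> timestamps, Instant inputTs) {
        for (Instant t : timestamps) {
            if (inWindow(t, inputTs)) {
                return t;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Instant> ts = List.of(Instant.ofEpochMilli(0), Instant.ofEpochMilli(20_000));
        System.out.println(findLinear(ts, Instant.ofEpochMilli(5_000)));
    }
}
```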

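The Collections.binarySearch(...) overloads take a key, which suggests that the common/expected use case is searching for a "complete match", i.e. an identical entry (for lack of a better term, sorry). In my case, however, inputTs will differ from every entry in timestamps, because inputTs is expected to be a point in time shortly after some entry t in timestamps.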

My idea is to simply make the Comparator<Instant> I pass to Collections.binarySearch(...) return 0 when the predicate holds:

import java.time.Instant;
import java.util.Collections;
import java.util.List;

public class TimestampFinder {
    private static final long SOME_CONSTANT = 10_000;
    private List<Instant> timestamps = ... ; // initialize and sort

    public Instant find(Instant inputTs) {
        int index = Collections.binarySearch(timestamps, inputTs, (t, key) -> {
            if (t.isBefore(key) && key.isBefore(t.plusMillis(SOME_CONSTANT))) {
                // inputTs is part of the duration after t
                // return 0 to indicate that we've found a match
                return 0;
            }
            // inputTs not in interval
            // use Instant's natural ordering (Instant implements Comparable) to tell binarySearch whether to continue in the upper or lower half of the list
            return t.compareTo(key);
        });
        return index >= 0 ? timestamps.get(index) : null;
    }
}

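Is this a correct (and efficient) way to solve this problem, or am I overlooking a better alternative? Note that calls to find(Instant) will vastly outnumber the elements in timestamps, which is why I consider the overhead of sorting timestamps to be justified.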

【Question discussion】:

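I don't see a problem. The binary search's worst-case performance will be O(log n) in the number of timestamps, and your Comparator runs in constant O(1) time.
@gtgaxiola Cool, thanks for your comment! I wasn't quite sure whether I was "abusing" binarySearch (i.e., whether there is another API better suited to this task).
Can this list contain duplicates? If not, a TreeSet would make this easier.
@VGR No, timestamps will (should) have no duplicates. So you would implement it with set.floor(inputTs) and then do a follow-up check to see whether inputTs is no more than SOME_CONSTANT milliseconds after the element returned by floor?
Pretty much, yes.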
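A minimal sketch of the TreeSet idea, assuming no duplicate timestamps and a 10,000 ms window; TreeSetFinder is a hypothetical name. It deviates from the suggestion above in one detail: it uses NavigableSet.lower instead of floor, because the question's predicate requires t to be strictly before inputTs. On an exact hit, floor would return inputTs itself, which can never satisfy the predicate even when an earlier element does.

```java
import java.time.Instant;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class TreeSetFinder {
    static final long SOME_CONSTANT = 10_000; // window length in ms (assumed value)

    private final NavigableSet<Instant> timestamps;

    TreeSetFinder(List<Instant> source) {
        // TreeSet keeps the Instants sorted by natural order (and would drop duplicates).
        this.timestamps = new TreeSet<>(source);
    }

    Instant find(Instant inputTs) {
        // lower() returns the greatest element strictly less than inputTs, or null if none.
        Instant t = timestamps.lower(inputTs);
        // t is the closest candidate: if it is already too old, every earlier element is too.
        if (t != null && inputTs.isBefore(t.plusMillis(SOME_CONSTANT))) {
            return t;
        }
        return null;
    }

    public static void main(String[] args) {
        TreeSetFinder finder = new TreeSetFinder(List.of(
                Instant.ofEpochMilli(0), Instant.ofEpochMilli(5_000), Instant.ofEpochMilli(30_000)));
        System.out.println(finder.find(Instant.ofEpochMilli(12_000)));
    }
}
```

Both lower and the follow-up check are O(log n) and O(1) respectively, so this has the same asymptotic cost as the binarySearch version while keeping a well-behaved natural ordering.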

Tags: java collections binary-search java-time

【Solution 1】:

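Collections.binarySearch doesn't have to be used only for exact matches. As described in the documentation, if no exact match is found it returns -1 - i, where i is the index of the next higher element in the list (or list.size() if every element is lower than the key).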
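A small sketch (the InsertionPointDemo class and insertionPoint helper are hypothetical) showing how the negative result encodes the insertion point under Instant's natural ordering:

```java
import java.time.Instant;
import java.util.Collections;
import java.util.List;

public class InsertionPointDemo {
    // Decodes binarySearch's result: a non-negative value is the index of an exact match;
    // a negative value is -(insertion point) - 1, so -1 - result recovers the insertion point.
    static int insertionPoint(List<Instant> sorted, Instant key) {
        int result = Collections.binarySearch(sorted, key);
        return result >= 0 ? result : -1 - result;
    }

    public static void main(String[] args) {
        List<Instant> ts = List.of(
                Instant.ofEpochMilli(0), Instant.ofEpochMilli(20_000), Instant.ofEpochMilli(40_000));
        int raw = Collections.binarySearch(ts, Instant.ofEpochMilli(25_000));
        System.out.println(raw + " -> insertion point " + (-1 - raw)); // prints "-3 -> insertion point 2"
    }
}
```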

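Just search for inputTs using the natural ordering. If it isn't found, you can derive the insertion point, i.e. the index of the next higher Instant, from the result (just compute -1 - resultOfBinarySearch). The element just before that index is the latest Instant t that is before inputTs: if inputTs.isBefore(t.plusMillis(CONSTANT)), you're done; otherwise, no such Instant exists. (If inputTs is found exactly, take the element just before the match instead, since t must be strictly before inputTs.)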
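A minimal sketch of the natural-ordering approach, assuming timestamps is a sorted, duplicate-free, random-access list (for example an ArrayList) and a 10,000 ms window; NaturalOrderFinder is a hypothetical name. Because the question's predicate requires t.isBefore(inputTs), the candidate is the greatest element strictly before inputTs, which sits just before the insertion point (or just before an exact match).

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NaturalOrderFinder {
    static final long SOME_CONSTANT = 10_000; // window length in ms (assumed value)

    // Assumes `timestamps` is sorted ascending with no duplicates.
    static Instant find(List<Instant> timestamps, Instant inputTs) {
        int result = Collections.binarySearch(timestamps, inputTs); // natural ordering, no custom Comparator
        // Index of the greatest element strictly before inputTs:
        // exact match -> result - 1; otherwise (insertion point) - 1 == -2 - result.
        int candidate = result >= 0 ? result - 1 : -2 - result;
        if (candidate < 0) {
            return null; // no element is strictly before inputTs
        }
        Instant t = timestamps.get(candidate);
        // The closest earlier element is the only one worth checking.
        return inputTs.isBefore(t.plusMillis(SOME_CONSTANT)) ? t : null;
    }

    public static void main(String[] args) {
        List<Instant> ts = new ArrayList<>(List.of(
                Instant.ofEpochMilli(0), Instant.ofEpochMilli(5_000), Instant.ofEpochMilli(30_000)));
        Collections.sort(ts); // already sorted here; binarySearch requires sorted input
        System.out.println(find(ts, Instant.ofEpochMilli(12_000))); // prints 1970-01-01T00:00:05Z
    }
}
```

When several elements satisfy the predicate, this returns the most recent one, which the question allows. The random-access assumption matters: per the Collections.binarySearch documentation, a large list that does not implement RandomAccess gets an iterator-based search with O(n) link traversals.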

For what it's worth, I do think your existing solution abuses binarySearch in some respects.

【Discussion】:

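Thanks for sharing your thoughts; +1. Could you elaborate on why you consider this use of binarySearch concerning?
I'm curious as well.
Mainly, it violates the Comparator contract: it isn't a transitive Comparator. Especially since this can be solved with a perfectly valid, normal Comparator, it seems wrong.
Hmm, this is embarrassing, but I don't see why it isn't transitive... According to the docs, transitivity is defined as "((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0". With the Comparator defined above, compare(val1, val2) is only greater than zero when val1 and val2 lie in different intervals. So for the left-hand side to hold in the first place, x, y and z must all be in different intervals, and then compare(x, z) will also be greater than 0.
compare(x, y) can be 0 and compare(y, z) can be 0, yet compare(x, z) can be nonzero, e.g. with y = x + SOME_CONSTANT - 1 and z = y + SOME_CONSTANT - 1 (in milliseconds). That breaks the part of the Comparator contract requiring that compare(x, y) == 0 implies sgn(compare(x, z)) == sgn(compare(y, z)) for all z.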
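The counterexample can be checked directly. This sketch copies the question's comparator into a hypothetical ContractViolationDemo class (assuming the same 10,000 ms window) and prints the signs of the relevant comparisons:

```java
import java.time.Instant;
import java.util.Comparator;

public class ContractViolationDemo {
    static final long SOME_CONSTANT = 10_000; // same window length as the question

    // The question's comparator, extracted so it can be called directly.
    static final Comparator<Instant> WINDOW_COMPARATOR = (t, key) -> {
        if (t.isBefore(key) && key.isBefore(t.plusMillis(SOME_CONSTANT))) {
            return 0;
        }
        return t.compareTo(key);
    };

    public static void main(String[] args) {
        Instant x = Instant.ofEpochMilli(0);
        Instant y = x.plusMillis(SOME_CONSTANT - 1);
        Instant z = y.plusMillis(SOME_CONSTANT - 1);
        System.out.println(Integer.signum(WINDOW_COMPARATOR.compare(x, y))); // prints 0
        System.out.println(Integer.signum(WINDOW_COMPARATOR.compare(y, z))); // prints 0
        // compare(x, y) == 0, but compare(x, z) and compare(y, z) have different signs.
        System.out.println(Integer.signum(WINDOW_COMPARATOR.compare(x, z))); // prints -1
        // Also not antisymmetric: compare(x, y) == 0 while compare(y, x) is positive.
        System.out.println(Integer.signum(WINDOW_COMPARATOR.compare(y, x))); // prints 1
    }
}
```

The last line shows a second violation: the contract requires sgn(compare(x, y)) == -sgn(compare(y, x)), which fails here because the "match" case only triggers when the list element comes first.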