How to detect overflow of "unsigned" long multiplication in Java? (Answers)

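【Question title】: How to detect overflow of "unsigned" long multiplication in Java?
【Posted】: 2013-11-18 22:41:54
【Question】:

Java does not have "unsigned" long values, of course, but sometimes signed long values are effectively treated as unsigned long values (the result of System.nanoTime(), for example). In this sense, arithmetic overflow means overflow of the 64-bit representation rather than overflow of the signed value. Examples: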

Long.MAX_VALUE * 2L  // overflows the signed product but not the unsigned product
Long.MAX_VALUE * 4L  // overflows the signed product and the unsigned product
-1L * 2L             // overflows the unsigned product but not the signed product

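Testing whether a multiplication overflows seems somewhat complicated, because the signedness of the operation gets in the way. Note that any negative value multiplied by any value other than 0 or 1 will overflow the unsigned product, because the highest bit of the negative value is set.

What is the best way to determine whether the product of two "unsigned" long values (really signed long values) will overflow the 64-bit representation? Using instances of BigInteger is one obvious solution, and I have derived a convoluted test involving only primitive operations, but I feel like I am missing something obvious.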
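For reference, the BigInteger baseline mentioned in the question might look like the sketch below. The class and method names are hypothetical. The second method is not from the question: it assumes Java 8 or later, where `Long.divideUnsigned` and `Long.compareUnsigned` are available, and uses the fact that a * b overflows 64 bits exactly when b exceeds floor((2^64 - 1) / a) in unsigned terms.

```java
import java.math.BigInteger;

final class UnsignedOverflowReference {
    private static final BigInteger TWO_POW_64 = BigInteger.ONE.shiftLeft(64);

    // Reinterprets the 64 bits of a signed long as a non-negative BigInteger.
    static BigInteger toUnsigned(long value) {
        BigInteger signed = BigInteger.valueOf(value);
        return value >= 0 ? signed : signed.add(TWO_POW_64);
    }

    // Reference check: the exact product needs more than 64 bits.
    static boolean overflowsBigInteger(long a, long b) {
        return toUnsigned(a).multiply(toUnsigned(b)).bitLength() > 64;
    }

    // Primitive check (assumes Java 8+): overflow iff b > floor((2^64 - 1) / a), unsigned.
    static boolean overflowsUnsignedDivide(long a, long b) {
        return a != 0L && Long.compareUnsigned(b, Long.divideUnsigned(-1L, a)) > 0;
    }

    public static void main(String[] args) {
        // The three examples from the question.
        long[][] samples = { {Long.MAX_VALUE, 2L}, {Long.MAX_VALUE, 4L}, {-1L, 2L} };
        for (long[] s : samples) {
            System.out.println(s[0] + " * " + s[1] + ": "
                    + overflowsBigInteger(s[0], s[1]) + " / "
                    + overflowsUnsignedDivide(s[0], s[1]));
        }
    }
}
```

On the question's three examples both methods should agree: no unsigned overflow for Long.MAX_VALUE * 2L, and unsigned overflow for the other two.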

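【Comments on the question】:

I don't think System.nanoTime() is required to always return positive values.
My point exactly. It returns values relative to some unspecified origin, which means negative values are just as legitimate as positive ones. In effect, the value returned by System.nanoTime() can be considered unsigned (or sign-agnostic).
Oh, I see. That adds another dimension to the problem; I had only considered the positive half...

Tags: java long-integer multiplication integer-overflow unsigned-integer

【Solution 1】:

Given two signed long values that we are pretending are unsigned long values, here is how to determine whether the unsigned product will overflow, using only signed primitive operations (please forgive the pedantry):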

boolean unsignedMultiplyOverflows(final long a, final long b) {
    if ((a == 0L) || (b == 0L)) {
        // Unsigned overflow of a * b will not occur, since the result would be 0.
        return false;
    }
    if ((a == 1L) || (b == 1L)) {
        // Unsigned overflow of a * b will not occur, since the result would be a or b.
        return false;
    }
    if ((a < 0L) || (b < 0L)) {
        // Unsigned overflow of a * b will occur, since the highest bit of one argument is set, and a bit higher than the lowest bit of the other argument is set.
        return true;
    }
    /*
     * 1 < a <= Long.MAX_VALUE
     * 1 < b <= Long.MAX_VALUE
     *
     * Let n == Long.SIZE (> 2), the number of bits of the primitive representation.
     * Unsigned overflow of a * b will occur if and only if a * b >= 2^n.
     * Each side of the comparison must be re-written such that signed overflow will not occur:
     *
     *     [a.01]  a * b >= 2^n
     *     [a.02]  a * b > 2^n - 1
     *     [a.03]  a * b > ((2^(n-1) - 1) * 2) + 1
     *
     * Let M == Long.MAX_VALUE == 2^(n-1) - 1, and substitute:
     *
     *     [a.04]  a * b > (M * 2) + 1
     *
     * Assume the following identity for non-negative integer X and positive integer Y:
     *
     *     [b.01]  X == ((X / Y) * Y) + (X % Y)
     *
     * Let X == M and Y == b, and substitute:
     *
     *     [b.02]  M == ((M / b) * b) + (M % b)
     *
     * Substitute for M:
     *
     *     [a.04]  a * b > (M * 2) + 1
     *     [a.05]  a * b > ((((M / b) * b) + (M % b)) * 2) + 1
     *     [a.06]  a * b > ((M / b) * b * 2) + ((M % b) * 2) + 1
     *
     * Assume the following identity for non-negative integer X and positive integer Y:
     *
     *     [c.01]  X == ((X / Y) * Y) + (X % Y)
     *
     * Let X == ((M % b) * 2) + 1 and Y == b, and substitute:
     *
     *     [c.02]  ((M % b) * 2) + 1 == (((((M % b) * 2) + 1) / b) * b) + ((((M % b) * 2) + 1) % b)
     *
     * Substitute for ((M % b) * 2) + 1:
     *
     *     [a.06]  a * b > ((M / b) * b * 2) + ((M % b) * 2) + 1
     *     [a.07]  a * b > ((M / b) * b * 2) + (((((M % b) * 2) + 1) / b) * b) + ((((M % b) * 2) + 1) % b)
     *
     * Divide each side by b (// represents real division):
     *
     *     [a.08]  (a * b) // b > (((M / b) * b * 2) + (((((M % b) * 2) + 1) / b) * b) + ((((M % b) * 2) + 1) % b)) // b
     *     [a.09]  (a * b) // b > (((M / b) * b * 2) // b) + ((((((M % b) * 2) + 1) / b) * b) // b) + (((((M % b) * 2) + 1) % b) // b)
     *
     * Reduce each b-divided term that otherwise has a known factor of b:
     *
     *     [a.10]  a > ((M / b) * 2) + ((((M % b) * 2) + 1) / b) + (((((M % b) * 2) + 1) % b) // b)
     *
     * Let c == ((M % b) * 2) + 1, and substitute:
     *
     *     [a.11]  a > ((M / b) * 2) + (c / b) + ((c % b) // b)
     *
     * Assume the following tautology for integers X, Y and real Z such that 0 <= Z < 1:
     *
     *     [d.01]  X > Y + Z <==> X > Y
     *
     * Assume the following tautology for non-negative integer X and positive integer Y:
     *
     *     [e.01]  0 <= (X % Y) // Y < 1
     *
     * Let X == c and Y == b, and substitute:
     *
     *     [e.02]  0 <= (c % b) // b < 1
     *
     * Let X == a, Y == ((M / b) * 2) + (c / b), and Z == ((c % b) // b), and substitute:
     *
     *     [d.01]  X > Y + Z <==> X > Y
     *     [d.02]  a > ((M / b) * 2) + (c / b) + ((c % b) // b) <==> a > ((M / b) * 2) + (c / b)
     *
     * Drop the last term of the right-hand side:
     *
     *     [a.11]  a > ((M / b) * 2) + (c / b) + ((c % b) // b)
     *     [a.12]  a > ((M / b) * 2) + (c / b)
     *
     * Substitute for c:
     *
     *     [a.13]  a > ((M / b) * 2) + ((((M % b) * 2) + 1) / b)
     *
     * The first term of the right-hand side is clearly non-negative.
     * Determine the upper bound for the first term of the right-hand side (note that the least possible value of b == 2 produces the greatest possible value of (M / b) * 2):
     *
     *     [f.01]  (M / b) * 2 <= (M / 2) * 2
     *
     * Assume the following tautology for odd integer X:
     *
     *     [g.01]  (X / 2) * 2 == X - 1
     *
     * Let X == M and substitute:
     *
     *     [g.02]  (M / 2) * 2 == M - 1
     *
     * Substitute for (M / 2) * 2:
     *
     *     [f.01]  (M / b) * 2 <= (M / 2) * 2
     *     [f.02]  (M / b) * 2 <= M - 1
     *
     * The second term of the right-hand side is clearly non-negative.
     * Determine the upper bound for the second term of the right-hand side (note that the <= relation is preserved across positive integer division):
     *
     *     [h.01]  M % b < b
     *     [h.02]  M % b <= b - 1
     *     [h.03]  (M % b) * 2 <= (b - 1) * 2
     *     [h.04]  ((M % b) * 2) + 1 <= (b * 2) - 1
     *     [h.05]  (((M % b) * 2) + 1) / b <= ((b * 2) - 1) / b
     *     [h.06]  (((M % b) * 2) + 1) / b <= 1
     *
     * Since the upper bound of the first term is M - 1, and the upper bound of the second term is 1, the upper bound of the right-hand side is M.
     * Each side of the comparison has been re-written such that signed overflow will not occur.
     */
    final boolean unsignedMultiplyOverflows = (a > ((Long.MAX_VALUE / b) * 2L) + ((((Long.MAX_VALUE % b) * 2L) + 1L) / b));
    return unsignedMultiplyOverflows;
}
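The derivation in the comment holds for any word size n > 2, so one way to gain confidence in it is to replay it on a small word where every pair can be enumerated. The sketch below is an illustration only, with hypothetical names: it assumes n = 8 (so M = 127 stands in for Long.MAX_VALUE) and compares the right-hand side of [a.13] against the direct definition a * b >= 2^n for all 1 < a, b <= M.

```java
final class EightBitIdentityCheck {
    static final int N = 8;                    // assumed word size for the illustration
    static final int M = (1 << (N - 1)) - 1;   // 127, plays the role of Long.MAX_VALUE

    // Right-hand side of [a.13], evaluated for an n-bit word.
    static int limit(int b) {
        return ((M / b) * 2) + ((((M % b) * 2) + 1) / b);
    }

    // Counts pairs where the derived test disagrees with a * b >= 2^n.
    static int countMismatches() {
        int mismatches = 0;
        for (int a = 2; a <= M; a++) {
            for (int b = 2; b <= M; b++) {
                boolean derived = a > limit(b);
                boolean direct = a * b >= (1 << N); // int arithmetic cannot overflow here
                if (derived != direct) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("limit(3) = " + limit(3));             // floor(255 / 3)
        System.out.println("mismatches = " + countMismatches());
    }
}
```

For b = 3 the two terms are (127 / 3) * 2 = 84 and ((127 % 3) * 2 + 1) / 3 = 1, which sum to 85 = floor(255 / 3), so a * 3 overflows eight bits exactly when a > 85.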

【Comments】:

【Solution 2】:

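If you start by determining which value is larger, there are ranges of the larger and smaller values that guarantee that overflow either will or will not occur. Assume X is the larger and Y is the smaller.

If X is less than 2^31, or Y is less than 2, overflow is impossible; otherwise, if X is greater than 2^62 or Y is not less than 2^32, overflow is certain. If either condition applies, return.

Otherwise, because of the lower limit on X, V = (X>>31)<<31 is known to lie between X/2 and X. Because X>>31 is no greater than 2^31 and Y is less than 2^32, T = (V>>31)*Y (which also equals (X>>31)*Y) can be computed without overflow. Because V is a multiple of 2^31, T also equals (V*Y)>>31, so we know that T lies between (X*Y)>>32 and (X*Y)>>31.

If T is less than 2^31, then X*Y must be less than 2^63 and overflow is impossible. If T is not less than 2^32, then X*Y must be at least 2^63 and overflow is certain.

If neither condition applies, the product is in the range 2^62 to 2^64. Overflow can then be determined by performing the multiplication directly and checking the sign of the result. Unlike C, where signed integer overflow yields undefined behavior, Java guarantees that if x and y are positive and x*y is less than 2^64, arithmetic overflow will produce a negative result.

In summary, the code should start by ordering X and Y, then perform four comparisons with conditional returns. If none of them is decisive, it computes (X>>31)*Y and performs two more comparisons. If those are not decisive either, one more multiplication and test yields the final answer, using in the worst case eight comparisons, one shift, and two multiplications (add another comparison to rank X and Y if their order is not already known).

Note that if the original numbers may be negative, more checks are needed to handle some additional cases. Even so, the approach above should be faster than methods that require one or more divisions.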
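The steps above can be sketched as follows. This is one reading of the description, not the answerer's own code, and the names are hypothetical. Note the assumption made explicit here: the thresholds in the text (2^63) mean that "overflow" is a product of two non-negative values that no longer fits in the positive range of a signed long, so the sketch rejects negative operands instead of handling the extra cases the answer mentions.

```java
final class RangeCheckOverflow {
    // Returns true when x * y >= 2^63, for non-negative x and y only (assumption of this sketch).
    static boolean productExceedsSignedRange(long x, long y) {
        if (x < 0L || y < 0L) {
            throw new IllegalArgumentException("sketch covers non-negative operands only");
        }
        long larger = Math.max(x, y);
        long smaller = Math.min(x, y);

        // Cheap range checks that settle most inputs.
        if (larger < (1L << 31) || smaller < 2L) {
            return false;               // product < 2^62, or product <= larger
        }
        if (larger > (1L << 62) || smaller >= (1L << 32)) {
            return true;                // product > 2^63, or product >= 2^64
        }

        // larger >> 31 <= 2^31 and smaller < 2^32, so t itself cannot overflow.
        long t = (larger >> 31) * smaller;
        if (t < (1L << 31)) {
            return false;               // product <= t * 2^32 < 2^63
        }
        if (t >= (1L << 32)) {
            return true;                // product >= t * 2^31 >= 2^63
        }

        // 2^62 <= product < 2^64: the sign bit of the wrapped product decides.
        return larger * smaller < 0L;
    }
}
```

For example, 3037000499 * 3037000499 stays below 2^63, while 3037000500 * 3037000500 does not; both pairs reach the final multiply-and-check step.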

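【Comments】:

Nice. I will have to double-check the bounds, but it seems to work. The short-circuit returns for large ranges of inputs are also good, since they avoid most of the computation entirely. (A more general result could use the number of leading zeros of X and Y rather than the intermediate computations of V and T, but counting the leading zeros alone would probably take more cycles than your solution.) Don't worry about the negative cases; as stated in the question, those are trivial.
@NathanRyan: On processors and frameworks that can count leading zeros faster than they can multiply, the multiplication step could be avoided. Although some processors include fast count-leading-zeros instructions, I don't know of any way to exploit them from Java. On some platforms, it may be fastest to convert both arguments to float or double and multiply them; if the result is well below 2^63, the integer result will not overflow, and if it is well above, it definitely will. When it is close to 2^63, the integer result will be either correct or negative.
The "proper" way to count leading zeros in Java is Long.numberOfLeadingZeros(long), which uses the baseline Hacker's Delight implementation. The only way to take advantage of processor-specific optimizations would be through JNI, which, even after inlining by the JIT compiler during JVM optimization, would incur more overhead than it is probably worth. I like the cast-and-verify solution, though. I would guess that a single multiply takes only about 10 clock cycles, and its implementation is easy to understand.
@NathanRyan: If you use the cast-to-float approach, a product above 9.223373E+18 should be considered an overflow; one below 9.223371E+18 should be considered safe. Only the few values in between actually need to be tested.
Thanks for the detailed explanation.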
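The two ideas raised in these comments can be sketched as below; the class and method names are hypothetical and neither method is code from the thread. The first method applies the leading-zero idea to the unsigned 64-bit question, with a fallback for the single undecided case. The second follows the cast-and-verify suggestion using the thresholds quoted above, with two assumptions that should be read as mine: it uses double rather than float, because double's rounding error is far smaller than the margin those thresholds leave, and it covers non-negative operands only (overflow meaning a product of at least 2^63).

```java
final class EstimateThenVerify {
    // Unsigned 64-bit overflow test driven by leading-zero counts.
    static boolean unsignedOverflowsByLeadingZeros(long a, long b) {
        int zeros = Long.numberOfLeadingZeros(a) + Long.numberOfLeadingZeros(b);
        if (zeros >= 64) {
            return false;               // product < 2^(128 - zeros) <= 2^64
        }
        if (zeros <= 62) {
            return true;                // product >= 2^(126 - zeros) >= 2^64
        }
        // zeros == 63: a * (b >>> 1) is below 2^64, so it fits in 64 unsigned bits.
        long half = a * (b >>> 1);
        if (half < 0L) {
            return true;                // half >= 2^63, so doubling reaches 2^64
        }
        long product = half << 1;
        if ((b & 1L) != 0L) {
            product += a;               // add back the term dropped by b >>> 1
            return Long.compareUnsigned(product, a) < 0; // sum wrapped past 2^64
        }
        return false;
    }

    // Signed-range test (product >= 2^63) for non-negative operands via a double estimate.
    static boolean exceedsSignedRangeByEstimate(long x, long y) {
        if (x < 0L || y < 0L) {
            throw new IllegalArgumentException("sketch covers non-negative operands only");
        }
        double estimate = (double) x * (double) y;
        if (estimate > 9.223373E18) {
            return true;                // clearly above 2^63
        }
        if (estimate < 9.223371E18) {
            return false;               // clearly below 2^63
        }
        return x * y < 0L;              // near 2^63: the wrapped sign bit decides
    }
}
```

In the leading-zero method, only inputs whose counts sum to exactly 63 need a multiplication; everything else is settled by the two counts.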

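【Solution 3】:

EDIT
As promised in one of my comments on the original post, I am now posting a strict and correct solution. It has fewer divisions than Nathan's own formula (for those interested, see the last line of code in his answer), but it has additional branching, so I am not sure it will perform better.
And, alas, it is not a one-liner. Here it is: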

    static boolean unsignedMultiplyOverflows(final long a, final long b) {
        if (a == 0 || b == 0) {
            return false;
        }

        // now proceed with non-zero input
        // we branch based upon parity of a and b
        final long aHalf = a >>> 1;
        final long bHalf = b >>> 1;
        final byte aLastBit = (byte) (a & 1);
        final byte bLastBit = (byte) (b & 1);
        if (aLastBit == 0) { // a = 2 * aHalf, meaning unsigned representation of a
            return Long.MAX_VALUE / b < aHalf;
        } else if (bLastBit == 0) { // b = 2 * bHalf
            return Long.MAX_VALUE / a < bHalf; // symmetrical to previous case
        } else { // a = 2 * aHalf + 1; b = 2 * bHalf + 1
            return (Long.MAX_VALUE - bHalf) / b < aHalf;
        }
    }

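The formal proof is based on examining two cases: 1. at least one of the multipliers is even; 2. a and b are both odd. I can add it if anyone is interested.
I have unit-tested it over the full range of char (0 ~ 0xffff) for overflow of 16-bit numbers, as well as on some random long inputs, comparing the results against Nathan's method and against the BigInteger solution as a reference.

Hope this helps.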
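The proof itself was not posted. The following is a reconstruction of how the two cases might be argued, not the author's own proof. It assumes the divisor is in the range 1 to Long.MAX_VALUE when read as a signed value; the symbols M, h_a and h_b are introduced here for illustration and correspond to Long.MAX_VALUE, aHalf and bHalf in the code.

```latex
Let $M = 2^{63} - 1$, $h_a = \lfloor a/2 \rfloor$, $h_b = \lfloor b/2 \rfloor$, and $1 \le b \le M$.
For integers $h \ge 0$ and $K \ge 0$: \quad $h\,b > K \iff h > \lfloor K/b \rfloor$.

Case 1 ($a$ even, $a = 2h_a$):
\begin{align*}
ab \ge 2^{64}
  &\iff h_a b \ge 2^{63} \\
  &\iff h_a b > M \\
  &\iff h_a > \lfloor M/b \rfloor .
\end{align*}

Case 2 ($a = 2h_a + 1$ and $b = 2h_b + 1$, both odd), using $ab = 2h_a b + 2h_b + 1$:
\begin{align*}
ab \ge 2^{64}
  &\iff 2\,(h_a b + h_b) \ge 2^{64} - 1 \\
  &\iff h_a b + h_b \ge 2^{63} \\
  &\iff h_a b > M - h_b \\
  &\iff h_a > \lfloor (M - h_b)/b \rfloor .
\end{align*}
```

The case where a is odd and b is even follows from Case 1 with the roles of a and b swapped.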

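【Comments】:

Checking for negative is not enough.
Well, the question is only about overflowing the unsigned representation. I agree with the statement in the question that "any negative value multiplied by any value other than 0 or 1 will overflow the unsigned product".
My point is that overflow when multiplying a pair of positive numbers (signed or unsigned) will not necessarily set the highest bit. In other words, your test gives both false positives and false negatives. Doesn't it?
@StephenC No, it doesn't. It checks whether unsigned 0xfffffffffffffff divided by a is less than b, which would mean that 64 bits are not enough to hold a * b. However, as of my edit, some results are inaccurate; I will try to post a better solution when I get to work.
@kiruwka Accepted, although I think both of our answers are complicated enough that the right thing to do is to implement this with BigInteger, performance be damned. I have not done anything resembling a timing test, though I suspect the BigInteger approach would be slower. Still, my general philosophy is to implement the simple approach first, then profile and re-implement if performance becomes a problem.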