让我首先通过引用 Donald Knuth 的文章“使用 go to 语句进行结构化编程”来重复其他人已经说过的话:
我们应该忘记小的效率,比如大约 97% 的时间:过早优化是万恶之源。
然而,我们不应该放弃那关键的 3% 的机会。一个好的程序员不会因为这样的推理而自满,他会明智地仔细查看关键代码;但只有在识别出该代码之后。
(我强调)
因此,如果您发现像计算三个数字的最小值这样看似微不足道的操作是应用程序中的实际瓶颈(即“关键 3%”),那么您可以考虑优化一下。
而在这种情况下,这实际上是可能的:Java 中的Math#min(double,double) 方法具有非常特殊的语义:
返回两个双精度值中较小的一个。也就是说,结果是更接近负无穷大的值。如果参数具有相同的值,则结果是相同的值。如果任一值为 NaN,则结果为 NaN。与数值比较运算符不同,此方法认为负零严格小于正零。如果一个参数为正零而另一个为负零,则结果为负零。
可以看一下实现,发现它实际上相当复杂:
public static double min(double a, double b) {
if (a != a)
return a; // a is NaN
if ((a == 0.0d) &&
(b == 0.0d) &&
(Double.doubleToRawLongBits(b) == negativeZeroDoubleBits)) {
// Raw conversion ok since NaN can't map to -0.0.
return b;
}
return (a <= b) ? a : b;
}
现在,指出这种行为与简单比较不同可能很重要。这可以通过以下示例轻松检查:
public class MinExample
{
public static void main(String[] args)
{
test(0.0, 1.0);
test(1.0, 0.0);
test(-0.0, 0.0);
test(Double.NaN, 1.0);
test(1.0, Double.NaN);
}
private static void test(double a, double b)
{
double minA = Math.min(a, b);
double minB = a < b ? a : b;
System.out.println("a: "+a);
System.out.println("b: "+b);
System.out.println("minA "+minA);
System.out.println("minB "+minB);
if (Double.doubleToRawLongBits(minA) !=
Double.doubleToRawLongBits(minB))
{
System.out.println(" -> Different results!");
}
System.out.println();
}
}
但是:如果NaN 和正/负零的处理与您的应用程序无关,您可以将基于Math.min 的解决方案替换为基于简单比较的解决方案,并查看是否它有所作为。
当然,这将取决于应用程序。这是一个简单的人工微基准测试(持保留态度!)
import java.util.Random;
public class MinPerformance
{
public static void main(String[] args)
{
bench();
}
private static void bench()
{
int runs = 1000;
for (int size=10000; size<=100000; size+=10000)
{
Random random = new Random(0);
double data[] = new double[size];
for (int i=0; i<size; i++)
{
data[i] = random.nextDouble();
}
benchA(data, runs);
benchB(data, runs);
}
}
private static void benchA(double data[], int runs)
{
long before = System.nanoTime();
double sum = 0;
for (int r=0; r<runs; r++)
{
for (int i=0; i<data.length-3; i++)
{
sum += minA(data[i], data[i+1], data[i+2]);
}
}
long after = System.nanoTime();
System.out.println("A: length "+data.length+", time "+(after-before)/1e6+", result "+sum);
}
private static void benchB(double data[], int runs)
{
long before = System.nanoTime();
double sum = 0;
for (int r=0; r<runs; r++)
{
for (int i=0; i<data.length-3; i++)
{
sum += minB(data[i], data[i+1], data[i+2]);
}
}
long after = System.nanoTime();
System.out.println("B: length "+data.length+", time "+(after-before)/1e6+", result "+sum);
}
private static double minA(double a, double b, double c)
{
return Math.min(a, Math.min(b, c));
}
private static double minB(double a, double b, double c)
{
if (a < b)
{
if (a < c)
{
return a;
}
return c;
}
if (b < c)
{
return b;
}
return c;
}
}
(免责声明:Java 中的微基准测试是一门艺术,为了获得更可靠的结果,应该考虑使用JMH 或Caliper)。
使用 JRE 1.8.0_31 运行它可能会导致类似
....
A: length 90000, time 545.929078, result 2.247805342620906E7
B: length 90000, time 441.999193, result 2.247805342620906E7
A: length 100000, time 608.046928, result 2.5032781001456387E7
B: length 100000, time 493.747898, result 2.5032781001456387E7
这至少表明可能会在这里挤出几个百分点(再次,在一个非常人为的例子中)。
通过查看创建的热点反汇编输出来进一步分析
java -server -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:+PrintAssembly MinPerformance
可以看到minA和minB这两种方法的优化版本。
首先,使用Math.min的方法的输出:
Decoding compiled method 0x0000000002992310:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} {0x000000001c010910} 'minA' '(DDD)D' in 'MinPerformance'
# parm0: xmm0:xmm0 = double
# parm1: xmm1:xmm1 = double
# parm2: xmm2:xmm2 = double
# [sp+0x60] (sp of caller)
0x0000000002992480: mov %eax,-0x6000(%rsp)
0x0000000002992487: push %rbp
0x0000000002992488: sub $0x50,%rsp
0x000000000299248c: movabs $0x1c010cd0,%rsi
0x0000000002992496: mov 0x8(%rsi),%edi
0x0000000002992499: add $0x8,%edi
0x000000000299249c: mov %edi,0x8(%rsi)
0x000000000299249f: movabs $0x1c010908,%rsi ; {metadata({method} {0x000000001c010910} 'minA' '(DDD)D' in 'MinPerformance')}
0x00000000029924a9: and $0x3ff8,%edi
0x00000000029924af: cmp $0x0,%edi
0x00000000029924b2: je 0x00000000029924e8 ;*dload_0
; - MinPerformance::minA@0 (line 58)
0x00000000029924b8: vmovsd %xmm0,0x38(%rsp)
0x00000000029924be: vmovapd %xmm1,%xmm0
0x00000000029924c2: vmovapd %xmm2,%xmm1 ;*invokestatic min
; - MinPerformance::minA@4 (line 58)
0x00000000029924c6: nop
0x00000000029924c7: callq 0x00000000028c6360 ; OopMap{off=76}
;*invokestatic min
; - MinPerformance::minA@4 (line 58)
; {static_call}
0x00000000029924cc: vmovapd %xmm0,%xmm1 ;*invokestatic min
; - MinPerformance::minA@4 (line 58)
0x00000000029924d0: vmovsd 0x38(%rsp),%xmm0 ;*invokestatic min
; - MinPerformance::minA@7 (line 58)
0x00000000029924d6: nop
0x00000000029924d7: callq 0x00000000028c6360 ; OopMap{off=92}
;*invokestatic min
; - MinPerformance::minA@7 (line 58)
; {static_call}
0x00000000029924dc: add $0x50,%rsp
0x00000000029924e0: pop %rbp
0x00000000029924e1: test %eax,-0x27623e7(%rip) # 0x0000000000230100
; {poll_return}
0x00000000029924e7: retq
0x00000000029924e8: mov %rsi,0x8(%rsp)
0x00000000029924ed: movq $0xffffffffffffffff,(%rsp)
0x00000000029924f5: callq 0x000000000297e260 ; OopMap{off=122}
;*synchronization entry
; - MinPerformance::minA@-1 (line 58)
; {runtime_call}
0x00000000029924fa: jmp 0x00000000029924b8
0x00000000029924fc: nop
0x00000000029924fd: nop
0x00000000029924fe: mov 0x298(%r15),%rax
0x0000000002992505: movabs $0x0,%r10
0x000000000299250f: mov %r10,0x298(%r15)
0x0000000002992516: movabs $0x0,%r10
0x0000000002992520: mov %r10,0x2a0(%r15)
0x0000000002992527: add $0x50,%rsp
0x000000000299252b: pop %rbp
0x000000000299252c: jmpq 0x00000000028ec620 ; {runtime_call}
0x0000000002992531: hlt
0x0000000002992532: hlt
0x0000000002992533: hlt
0x0000000002992534: hlt
0x0000000002992535: hlt
0x0000000002992536: hlt
0x0000000002992537: hlt
0x0000000002992538: hlt
0x0000000002992539: hlt
0x000000000299253a: hlt
0x000000000299253b: hlt
0x000000000299253c: hlt
0x000000000299253d: hlt
0x000000000299253e: hlt
0x000000000299253f: hlt
[Stub Code]
0x0000000002992540: nop ; {no_reloc}
0x0000000002992541: nop
0x0000000002992542: nop
0x0000000002992543: nop
0x0000000002992544: nop
0x0000000002992545: movabs $0x0,%rbx ; {static_stub}
0x000000000299254f: jmpq 0x000000000299254f ; {runtime_call}
0x0000000002992554: nop
0x0000000002992555: movabs $0x0,%rbx ; {static_stub}
0x000000000299255f: jmpq 0x000000000299255f ; {runtime_call}
[Exception Handler]
0x0000000002992564: callq 0x000000000297b9e0 ; {runtime_call}
0x0000000002992569: mov %rsp,-0x28(%rsp)
0x000000000299256e: sub $0x80,%rsp
0x0000000002992575: mov %rax,0x78(%rsp)
0x000000000299257a: mov %rcx,0x70(%rsp)
0x000000000299257f: mov %rdx,0x68(%rsp)
0x0000000002992584: mov %rbx,0x60(%rsp)
0x0000000002992589: mov %rbp,0x50(%rsp)
0x000000000299258e: mov %rsi,0x48(%rsp)
0x0000000002992593: mov %rdi,0x40(%rsp)
0x0000000002992598: mov %r8,0x38(%rsp)
0x000000000299259d: mov %r9,0x30(%rsp)
0x00000000029925a2: mov %r10,0x28(%rsp)
0x00000000029925a7: mov %r11,0x20(%rsp)
0x00000000029925ac: mov %r12,0x18(%rsp)
0x00000000029925b1: mov %r13,0x10(%rsp)
0x00000000029925b6: mov %r14,0x8(%rsp)
0x00000000029925bb: mov %r15,(%rsp)
0x00000000029925bf: movabs $0x515db148,%rcx ; {external_word}
0x00000000029925c9: movabs $0x2992569,%rdx ; {internal_word}
0x00000000029925d3: mov %rsp,%r8
0x00000000029925d6: and $0xfffffffffffffff0,%rsp
0x00000000029925da: callq 0x00000000512a9020 ; {runtime_call}
0x00000000029925df: hlt
[Deopt Handler Code]
0x00000000029925e0: movabs $0x29925e0,%r10 ; {section_word}
0x00000000029925ea: push %r10
0x00000000029925ec: jmpq 0x00000000028c7340 ; {runtime_call}
0x00000000029925f1: hlt
0x00000000029925f2: hlt
0x00000000029925f3: hlt
0x00000000029925f4: hlt
0x00000000029925f5: hlt
0x00000000029925f6: hlt
0x00000000029925f7: hlt
可以看出,处理特殊情况需要付出一些努力——与使用简单比较的输出相比,这相当简单:
Decoding compiled method 0x0000000002998790:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} {0x000000001c0109c0} 'minB' '(DDD)D' in 'MinPerformance'
# parm0: xmm0:xmm0 = double
# parm1: xmm1:xmm1 = double
# parm2: xmm2:xmm2 = double
# [sp+0x20] (sp of caller)
0x00000000029988c0: sub $0x18,%rsp
0x00000000029988c7: mov %rbp,0x10(%rsp) ;*synchronization entry
; - MinPerformance::minB@-1 (line 63)
0x00000000029988cc: vucomisd %xmm0,%xmm1
0x00000000029988d0: ja 0x00000000029988ee ;*ifge
; - MinPerformance::minB@3 (line 63)
0x00000000029988d2: vucomisd %xmm1,%xmm2
0x00000000029988d6: ja 0x00000000029988de ;*ifge
; - MinPerformance::minB@22 (line 71)
0x00000000029988d8: vmovapd %xmm2,%xmm0
0x00000000029988dc: jmp 0x00000000029988e2
0x00000000029988de: vmovapd %xmm1,%xmm0 ;*synchronization entry
; - MinPerformance::minB@-1 (line 63)
0x00000000029988e2: add $0x10,%rsp
0x00000000029988e6: pop %rbp
0x00000000029988e7: test %eax,-0x27688ed(%rip) # 0x0000000000230000
; {poll_return}
0x00000000029988ed: retq
0x00000000029988ee: vucomisd %xmm0,%xmm2
0x00000000029988f2: ja 0x00000000029988e2 ;*ifge
; - MinPerformance::minB@10 (line 65)
0x00000000029988f4: vmovapd %xmm2,%xmm0
0x00000000029988f8: jmp 0x00000000029988e2
0x00000000029988fa: hlt
0x00000000029988fb: hlt
0x00000000029988fc: hlt
0x00000000029988fd: hlt
0x00000000029988fe: hlt
0x00000000029988ff: hlt
[Exception Handler]
[Stub Code]
0x0000000002998900: jmpq 0x00000000028ec920 ; {no_reloc}
[Deopt Handler Code]
0x0000000002998905: callq 0x000000000299890a
0x000000000299890a: subq $0x5,(%rsp)
0x000000000299890f: jmpq 0x00000000028c7340 ; {runtime_call}
0x0000000002998914: hlt
0x0000000002998915: hlt
0x0000000002998916: hlt
0x0000000002998917: hlt
很难说这种优化是否真的会对应用程序产生影响。但至少,底线是:
-
Math#min(double,double)方法不和简单的比较一样,特殊情况的处理也不是免费的
- 在某些情况下,
Math#min 所做的特殊情况处理不是必需的,然后基于比较的方法可能更有效
- 正如其他答案中已经指出的那样:在大多数情况下,性能差异并不重要。然而,对于这个特定的例子,一个人可能应该创建一个实用方法
min(double,double,double),为了更好的方便和可读性,然后很容易用不同的实现进行两次运行,看看它是否真的影响性能。
(旁注:整数类型的方法,例如Math.min(int,int) 实际上是一个简单的比较 - 所以我希望这些没有差异)。