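【Posted】: 2014-09-16 18:33:53
【Question】:
While trying out the JDK 8 streams feature, I decided to run a parallel vs. sequential stream performance test. The test estimates the value of pi by throwing random darts at the unit square and checking how many land inside the unit circle. I took the example from apache-spark.
Here is the code.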
package org.sample;

import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Benchmark)
public class MyBenchmark {

    // Number of random darts thrown per benchmark invocation.
    @Param({"1000000", "10000000"})
    int MAX_COUNT;

    @Benchmark
    public double parallelPiTest() {
        // range(0, MAX_COUNT) yields exactly MAX_COUNT samples,
        // matching the divisor used below.
        long count = IntStream.range(0, MAX_COUNT).parallel().filter(i -> {
            double x = Math.random();
            double y = Math.random();
            // Keep the dart if it lands inside the unit circle.
            return (x * x + y * y) < 1.0;
        }).count();
        // Fraction of hits approximates pi / 4.
        return 4.0 * count / MAX_COUNT;
    }

    @Benchmark
    public double sequentialPiTest() {
        // Same computation as above, without .parallel().
        long count = IntStream.range(0, MAX_COUNT).filter(i -> {
            double x = Math.random();
            double y = Math.random();
            return (x * x + y * y) < 1.0;
        }).count();
        return 4.0 * count / MAX_COUNT;
    }
}
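In a simple test on my 8-core machine (a Windows 7 laptop), the parallel run took almost 5 times as long as the sequential run, with CPU utilization close to 100% on all cores. The sequential run, on the other hand, used only about 20% of the cores! Because the result was so puzzling, I benchmarked it with both JMH (the code above) and JunitBenchmarks. The results were nearly the same: sequential execution was consistently about 5 times faster than parallel execution. I also tried 100 iterations, but the results were still similar to the 5-iteration run below. Am I missing something basic here?
JMH benchmark results: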
C:\Users\local\lunaeeworkspace\benchmarktest>mvn clean install
"******::" C:\Progra~1\Java\jdk1.8.0_20
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Auto-generated JMH benchmark 1.0
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ benchmarktest ---
[INFO] Deleting C:\Users\local\lunaeeworkspace\benchmarktest\target
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ benchmarktest ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory C:\Users\local\lunaeeworkspace\benchmarktest\src\main\resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ benchmarktest ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 1 source file to C:\Users\local\lunaeeworkspace\benchmarktest\target\classes
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ benchmarktest ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory C:\Users\local\lunaeeworkspace\benchmarktest\src\test\resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ benchmarktest ---
[INFO] No sources to compile
[INFO]
[INFO] --- maven-surefire-plugin:2.17:test (default-test) @ benchmarktest ---
[INFO] No tests to run.
[INFO]
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ benchmarktest ---
[INFO] Building jar: C:\Users\local\lunaeeworkspace\benchmarktest\target\benchmarktest-1.0.jar
[INFO]
[INFO] --- maven-shade-plugin:2.2:shade (default) @ benchmarktest ---
[INFO] Including org.openjdk.jmh:jmh-core:jar:1.1 in the shaded jar.
[INFO] Including net.sf.jopt-simple:jopt-simple:jar:4.6 in the shaded jar.
[INFO] Including org.apache.commons:commons-math3:jar:3.2 in the shaded jar.
[INFO] Replacing C:\Users\local\lunaeeworkspace\benchmarktest\target\benchmarks.jar with C:\Users\local\lunaeeworkspace\benchmarktest\target\benchmarktest-1.0-shaded.jar
[INFO]
[INFO] --- maven-install-plugin:2.5.1:install (default-install) @ benchmarktest ---
[INFO] Installing C:\Users\local\lunaeeworkspace\benchmarktest\target\benchmarktest-1.0.jar to C:\Users\local\.m2\repository\org\sample\benchmarktest\1.0\benchmarktest-1.0.jar
[INFO] Installing C:\Users\local\lunaeeworkspace\benchmarktest\pom.xml to C:\Users\local\.m2\repository\org\sample\benchmarktest\1.0\benchmarktest-1.0.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16.070 s
[INFO] Finished at: 2014-09-15T13:25:03-07:00
[INFO] Final Memory: 22M/221M
[INFO] ------------------------------------------------------------------------
C:\Users\local\lunaeeworkspace\benchmarktest>java -jar target/benchmarks.jar
# VM invoker: C:\Program Files\Java\jre1.8.0_20\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.sample.MyBenchmark.parallelPiTest
# Parameters: (MAX_COUNT = 1000000)
# Run progress: 0.00% complete, ETA 00:00:40
# Fork: 1 of 1
# Warmup Iteration 1: 2810219990.000 ns/op
# Warmup Iteration 2: 679604930.000 ns/op
# Warmup Iteration 3: 708517299.500 ns/op
# Warmup Iteration 4: 613861141.500 ns/op
# Warmup Iteration 5: 747273386.500 ns/op
Iteration 1: 636085288.500 ns/op
Iteration 2: 726300915.500 ns/op
Iteration 3: 720032270.000 ns/op
Iteration 4: 758523073.500 ns/op
Iteration 5: 776964284.500 ns/op
Result: 723581166.400 ±(99.9%) 208666306.733 ns/op [Average]
Statistics: (min, avg, max) = (636085288.500, 723581166.400, 776964284.500), stdev = 54189977.210
Confidence interval (99.9%): [514914859.667, 932247473.133]
# VM invoker: C:\Program Files\Java\jre1.8.0_20\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.sample.MyBenchmark.parallelPiTest
# Parameters: (MAX_COUNT = 10000000)
# Run progress: 25.00% complete, ETA 00:00:52
# Fork: 1 of 1
# Warmup Iteration 1: 9589247518.000 ns/op
# Warmup Iteration 2: 8049867519.000 ns/op
# Warmup Iteration 3: 7864790757.000 ns/op
# Warmup Iteration 4: 7766442122.000 ns/op
# Warmup Iteration 5: 7723210219.000 ns/op
Iteration 1: 7525308107.000 ns/op
Iteration 2: 8067847130.000 ns/op
Iteration 3: 7647547652.000 ns/op
Iteration 4: 6964833740.000 ns/op
Iteration 5: 7471811305.000 ns/op
Result: 7535469586.800 ±(99.9%) 1523035846.762 ns/op [Average]
Statistics: (min, avg, max) = (6964833740.000, 7535469586.800, 8067847130.000), stdev = 395527572.797
Confidence interval (99.9%): [6012433740.038, 9058505433.562]
# VM invoker: C:\Program Files\Java\jre1.8.0_20\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.sample.MyBenchmark.sequentialPiTest
# Parameters: (MAX_COUNT = 1000000)
# Run progress: 50.00% complete, ETA 00:01:37
# Fork: 1 of 1
# Warmup Iteration 1: 208653523.167 ns/op
# Warmup Iteration 2: 171440852.571 ns/op
# Warmup Iteration 3: 176369103.714 ns/op
# Warmup Iteration 4: 172637171.571 ns/op
# Warmup Iteration 5: 168770237.714 ns/op
Iteration 1: 171262591.714 ns/op
Iteration 2: 168976818.714 ns/op
Iteration 3: 174889950.143 ns/op
Iteration 4: 171272031.714 ns/op
Iteration 5: 167857761.571 ns/op
Result: 170851830.771 ±(99.9%) 10391714.091 ns/op [Average]
Statistics: (min, avg, max) = (167857761.571, 170851830.771, 174889950.143), stdev = 2698695.149
Confidence interval (99.9%): [160460116.681, 181243544.862]
# VM invoker: C:\Program Files\Java\jre1.8.0_20\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.sample.MyBenchmark.sequentialPiTest
# Parameters: (MAX_COUNT = 10000000)
# Run progress: 75.00% complete, ETA 00:00:37
# Fork: 1 of 1
# Warmup Iteration 1: 1898167075.000 ns/op
# Warmup Iteration 2: 1734706264.000 ns/op
# Warmup Iteration 3: 1705265893.000 ns/op
# Warmup Iteration 4: 1704804614.000 ns/op
# Warmup Iteration 5: 1781362794.000 ns/op
Iteration 1: 1725992648.000 ns/op
Iteration 2: 1721125803.000 ns/op
Iteration 3: 1714455544.000 ns/op
Iteration 4: 1719110033.000 ns/op
Iteration 5: 1719564255.000 ns/op
Result: 1720049656.600 ±(99.9%) 15980153.846 ns/op [Average]
Statistics: (min, avg, max) = (1714455544.000, 1720049656.600, 1725992648.000), stdev = 4149995.207
Confidence interval (99.9%): [1704069502.754, 1736029810.446]
# Run complete. Total time: 00:02:10
Benchmark                         (MAX_COUNT)  Mode  Samples           Score     Score error  Units
o.s.MyBenchmark.parallelPiTest        1000000  avgt        5   723581166.400   208666306.733  ns/op
o.s.MyBenchmark.parallelPiTest       10000000  avgt        5  7535469586.800  1523035846.762  ns/op
o.s.MyBenchmark.sequentialPiTest      1000000  avgt        5   170851830.771    10391714.091  ns/op
o.s.MyBenchmark.sequentialPiTest     10000000  avgt        5  1720049656.600    15980153.846  ns/op
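【Discussion】:
- It looks like I found my problem. Math.random() is thread-safe through a single generator shared by all threads, so parallel execution is bound to suffer heavy contention when generating random numbers. I reran the same test with ThreadLocalRandom.current().nextDouble(), and the parallel run is now about 3 times faster than the sequential one.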
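For illustration, the ThreadLocalRandom variant described in the comment above might look roughly like the following. This is a minimal sketch, not the poster's actual revised benchmark: the class name PiEstimate, the method estimatePi, and its parallel flag are hypothetical names chosen for this example, and the JMH annotations are left out so the snippet runs on its own.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

// Hypothetical stand-alone class; the original benchmark class is MyBenchmark.
final class PiEstimate {

    // Estimates pi by counting random points that fall inside the unit circle.
    static double estimatePi(int samples, boolean parallel) {
        IntStream indices = IntStream.range(0, samples);
        if (parallel) {
            indices = indices.parallel();
        }
        long hits = indices.filter(i -> {
            // ThreadLocalRandom.current() returns the calling thread's own
            // generator, so worker threads do not compete for a shared seed.
            ThreadLocalRandom rnd = ThreadLocalRandom.current();
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            return x * x + y * y < 1.0;
        }).count();
        // The hit fraction approximates pi / 4.
        return 4.0 * hits / samples;
    }

    public static void main(String[] args) {
        System.out.println("parallel:   " + estimatePi(1_000_000, true));
        System.out.println("sequential: " + estimatePi(1_000_000, false));
    }
}
```

Calling ThreadLocalRandom.current() inside the lambda, rather than capturing one instance outside the stream, matters here: a captured instance would belong to the submitting thread and should not be shared with the pool's worker threads. Any speedup will depend on the machine and JVM, so the 3x figure above should be read as the poster's own measurement.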
Tags: java parallel-processing java-8 java-stream microbenchmark