【Posted】: 2011-12-20 00:54:01
【Question】:
Edit: My sample size was too small. When I ran real data on 8 CPUs, I saw a 7.2x speedup. Not too shabby for adding 4 characters to my code ;)
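I'm currently trying to "sell" management on the benefits of using Scala, especially when it comes to scaling with CPUs. To that end, I created a simple test application that does a lot of vector math, and I was a bit surprised to find that the runtime was not noticeably better on my quad-core machine. Interestingly, the runtime is worst the first time you go through the collection and gets better on subsequent calls. Is there something lazy in the parallel collections that causes this, or am I just doing it wrong? It should be noted that I come from the C++/C# world, so it's entirely possible that I have messed up my configuration somehow. Regardless, here's my setup: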
IntelliJ Scala plugin
Scala 2.9.1.final
Windows 7 64-bit, quad-core processor (no hyperthreading)
import util.Random

// simple Vector3D class that has final x,y,z components, a length, and a '-' function
class Vector3D(val x:Double, val y:Double, val z:Double)
{
  def length = math.sqrt(x*x+y*y+z*z)
  def -(rhs : Vector3D ) = new Vector3D(x - rhs.x, y - rhs.y, z - rhs.z)
}

object MainClass {
  def main(args : Array[String]) =
  {
    println("Available CPU's: " + Runtime.getRuntime.availableProcessors())
    println("Parallelism Degree set to: " + collection.parallel.ForkJoinTasks.defaultForkJoinPool.getParallelism);
    // my position
    val myPos = new Vector3D(0,0,0);
    val r = new Random(0);
    // define a function nextRand that gets us a random between 0 and 100
    def nextRand = r.nextDouble() * 100;
    // make 10 million random targets
    val targets = (0 until 10000000).map(_ => new Vector3D(nextRand, nextRand, nextRand)).toArray
    // take the .par hit before we start profiling
    val parTargets = targets.par
    println("Created " + targets.length + " vectors")
    // define a range function
    val rangeFunc : (Vector3D => Double) = (targetPos) => (targetPos - myPos).length
    // we'll select ones that are <50
    val within50 : (Vector3D => Boolean) = (targetPos) => rangeFunc(targetPos) < 50
    // time it sequentially
    val startTime_sequential = System.currentTimeMillis()
    val numTargetsInRange_sequential = targets.filter(within50)
    val endTime_sequential = System.currentTimeMillis()
    println("Sequential (ms): " + (endTime_sequential - startTime_sequential))
    // do the parallel version 10 times
    for(i <- 1 to 10)
    {
      val startTime_par = System.currentTimeMillis()
      val numTargetsInRange_parallel = parTargets.filter(within50)
      val endTime_par = System.currentTimeMillis()
      val ms = endTime_par - startTime_par;
      println("Iteration[" + i + "] Executed in " + ms + " ms")
    }
  }
}
The output of this program is:
Available CPU's: 4
Parallelism Degree set to: 4
Created 10000000 vectors
Sequential (ms): 216
Iteration[1] Executed in 227 ms
Iteration[2] Executed in 253 ms
Iteration[3] Executed in 76 ms
Iteration[4] Executed in 78 ms
Iteration[5] Executed in 77 ms
Iteration[6] Executed in 80 ms
Iteration[7] Executed in 78 ms
Iteration[8] Executed in 78 ms
Iteration[9] Executed in 79 ms
Iteration[10] Executed in 82 ms
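For reference, the steady-state speedup implied by these numbers can be worked out as in the sketch below. It assumes that iterations 3 through 10 represent the warmed-up parallel time and compares them against the single sequential measurement, which was itself taken cold, so the ratio is only a rough estimate. The object and value names are made up for illustration.

```scala
// Rough speedup estimate from the timings printed above.
// Assumption: the first two parallel iterations are warm-up and are dropped.
object SpeedupEstimate {
  val sequentialMs = 216.0
  val parallelMs = Seq(227, 253, 76, 78, 77, 80, 78, 78, 79, 82)

  // Iterations 3..10 only
  val steadyState = parallelMs.drop(2)
  val steadyMeanMs = steadyState.sum.toDouble / steadyState.length

  // Ratio of the (cold) sequential run to the warmed-up parallel mean
  val speedup = sequentialMs / steadyMeanMs

  def main(args: Array[String]): Unit = {
    println("Steady-state parallel mean (ms): " + steadyMeanMs)
    println("Estimated speedup: " + speedup)
  }
}
```

With these figures the warmed-up mean is 78.5 ms, which gives a ratio of roughly 2.75 on four cores.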
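So what's going on here? The first 2 times we run the filter it's slow, and then things speed up? I understand that parallelism has an inherent startup cost. I'm just trying to figure out where it makes sense to express parallelism in my application, and specifically I want to be able to show management a program that runs 3-4 times faster on a quad-core box. Is this just not a good problem for that?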
Thoughts?
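One way to separate warm-up effects from the comparison itself is to run both versions a few times untimed before measuring. The following is a minimal sketch under that assumption; `WarmupTiming` and `timed` are hypothetical names, not part of the Scala library, and the slow first iterations are only presumed to come from JVM warm-up such as JIT compilation.

```scala
// Hypothetical helper: run a block untimed a few times, then time it.
object WarmupTiming {
  // Evaluates `body` `warmups` times without timing, then `runs` more times,
  // returning the elapsed milliseconds of each timed run.
  def timed[A](warmups: Int, runs: Int)(body: => A): Seq[Long] = {
    // Keep the last result in a variable so the work is not trivially discarded
    var sink: Any = null
    for (_ <- 1 to warmups) sink = body
    (1 to runs).map { _ =>
      val start = System.nanoTime()
      sink = body
      (System.nanoTime() - start) / 1000000
    }
  }

  def main(args: Array[String]): Unit = {
    // Stand-in workload; in the program above this would be
    // targets.filter(within50) or parTargets.filter(within50)
    val data = Array.tabulate(1000000)(i => i.toDouble)
    val times = timed(5, 10) { data.filter(_ % 2 == 0) }
    println("Timed runs (ms): " + times.mkString(", "))
  }
}
```

Timing `targets.filter(within50)` and `parTargets.filter(within50)` through the same helper would put the sequential and parallel versions on an equal footing, since both get the same number of untimed passes first.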
【Comments】:
-
If you're looking for ideas on how to sell this to management, you can check out scala-boss.heroku.com/#1 (use the arrow keys to see the next slide).
-
In general, parallel arrays perform better than parallel vectors, at least until concats are added to vectors.
-
@huynhjl - When I saw my life depicted in the first two comics, I knew this presentation would be worthwhile. Thanks!
Tags: scala runtime scalability multicore parallel-processing