【Posted】: 2013-07-03 17:01:37
【Question】:
I have implemented a convex hull algorithm in C++ using OpenMP.
The code can be found here: http://codepad.org/VVQdSdfM
Below are the results from testing on my MacBook Pro:
Processor Name: Intel Core i5
Processor Speed: 2.5 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per Core): 256 KB
L3 Cache: 3 MB
Memory: 4 GB
Time taken to run the code on this processor:
With two threads:
(Here, size is the number of points in the input and time is in seconds.)
Average Sequential Time Elapsed in seconds for size:10=8.29697e-06
Average Parallel Time Elapsed in seconds for size:10=5.0807e-05
Average Sequential Time Elapsed in seconds for size:100=5.18084e-05
Average Parallel Time Elapsed in seconds for size:100=8.13007e-05
Average Sequential Time Elapsed in seconds for size:1000=0.000471377
Average Parallel Time Elapsed in seconds for size:1000=0.000283003
Average Sequential Time Elapsed in seconds for size:10000=0.00483506
Average Parallel Time Elapsed in seconds for size:10000=0.0032198
Average Sequential Time Elapsed in seconds for size:100000=0.0471328
Average Parallel Time Elapsed in seconds for size:100000=0.0333489
Average Sequential Time Elapsed in seconds for size:1000000=0.460131
Average Parallel Time Elapsed in seconds for size:1000000=0.267305
With four threads:
Average Sequential Time Elapsed in seconds for size:10=1.00136e-05
Average Parallel Time Elapsed in seconds for size:10=0.000106597
Average Sequential Time Elapsed in seconds for size:100=5.91993e-05
Average Parallel Time Elapsed in seconds for size:100=0.000114727
Average Sequential Time Elapsed in seconds for size:1000=0.000503755
Average Parallel Time Elapsed in seconds for size:1000=0.000302839
Average Sequential Time Elapsed in seconds for size:10000=0.00478158
Average Parallel Time Elapsed in seconds for size:10000=0.00235724
Average Sequential Time Elapsed in seconds for size:100000=0.0465738
Average Parallel Time Elapsed in seconds for size:100000=0.0223478
Average Sequential Time Elapsed in seconds for size:1000000=0.466074
Average Parallel Time Elapsed in seconds for size:1000000=0.221905
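I see four slots in the CPU view of Activity Monitor, and I know this Intel processor supports Hyper-Threading.
If that is the case, shouldn't I get a speedup of 4 when using 4 threads?
Please suggest anything that would help me make use of Hyper-Threading on Intel processors.
Thanks, Vijay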
【Discussion】:
-
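I think an important question here is: "Is the convex hull algorithm suited to parallelization with linear speedup?" (I don't know the answer; just wondering.) Also, there is no mention of which algorithm is in question (there seem to be several convex hull algorithms). And there is no code either, so if we don't know what you have, how can we tell whether what we suggest is an improvement on it?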
-
Yes, the algorithm takes linear time. It operates on a sorted list of points.
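For readers unfamiliar with hull algorithms of this kind: one well-known example that does a linear-time scan over sorted points is Andrew's monotone chain. The sequential sketch below is only an illustration of that technique; it is an assumption that it resembles the poster's code, and the names `Point`, `cross` and `convex_hull` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Point { long long x, y; };

// Cross product of (a - o) and (b - o); positive means a counter-clockwise turn.
long long cross(const Point& o, const Point& a, const Point& b) {
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// Returns the hull vertices in counter-clockwise order, starting from the
// lexicographically smallest point, without repeating it at the end.
std::vector<Point> convex_hull(std::vector<Point> pts) {
    // Sorting costs O(n log n); everything after it is a linear scan.
    std::sort(pts.begin(), pts.end(), [](const Point& a, const Point& b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    pts.erase(std::unique(pts.begin(), pts.end(),
                          [](const Point& a, const Point& b) {
                              return a.x == b.x && a.y == b.y;
                          }),
              pts.end());

    const std::size_t n = pts.size();
    if (n < 3) return pts;

    std::vector<Point> hull(2 * n);
    std::size_t k = 0;

    // Lower hull: scan left to right, dropping points that make a non-left turn.
    for (std::size_t i = 0; i < n; ++i) {
        while (k >= 2 && cross(hull[k - 2], hull[k - 1], pts[i]) <= 0) --k;
        hull[k++] = pts[i];
    }

    // Upper hull: scan right to left, skipping the rightmost point already stored.
    for (std::size_t i = n - 1, t = k + 1; i > 0; --i) {
        while (k >= t && cross(hull[k - 2], hull[k - 1], pts[i - 1]) <= 0) --k;
        hull[k++] = pts[i - 1];
    }

    hull.resize(k - 1);  // the last point equals the first one
    return hull;
}
```

Each point is pushed and popped at most once per scan, which is where the linear bound on pre-sorted input comes from.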
-
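I know nothing about the algorithm in question, but I have seen algorithms run faster on lesser hardware because the program and the hardware "matched" each other better. For example, a computer with a weaker processor can still run faster if its memory layout leads to a lower cache miss rate. Just a thought.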
-
I ran the same code on a 4-core machine under Linux and observed a speedup of 3.7.
-
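Since the code runs 3.7 times faster on a 4-core processor, that rules out the possibility that the algorithm cannot scale beyond 2 cores. Hyper-Threading is not equivalent to additional cores.
Tags: c++ macos openmp processor hyperthreading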