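[Posted]: 2016-03-01 06:04:49
[Question]:
I ran a simple performance test on hash lookups, and the C++ version appears to be slower than both the Perl version and the Go version.
- the Perl version took about 200 ms,
- the C++ version took 280 ms,
- the Go version took 56 ms.
This is on my PC with a Core(TM) i7-2670QM CPU @ 2.20GHz, running Ubuntu 14.04.3 LTS.
Any ideas?
Perl version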
use Time::HiRes qw( usleep ualarm gettimeofday tv_interval nanosleep
clock_gettime clock_getres clock_nanosleep clock
stat );
sub getTS {
my ($seconds, $microseconds) = gettimeofday;
return $seconds + (0.0+ $microseconds)/1000000.0;
}
my %mymap;
$mymap{"U.S."} = "Washington";
$mymap{"U.K."} = "London";
$mymap{"France"} = "Paris";
$mymap{"Russia"} = "Moscow";
$mymap{"China"} = "Beijing";
$mymap{"Germany"} = "Berlin";
$mymap{"Japan"} = "Tokyo";
$mymap{"China"} = "Beijing";
$mymap{"Italy"} = "Rome";
$mymap{"Spain"} = "Madrid";
$x = "";
$start = getTS();
for ($i=0; $i<1000000; $i++) {
$x = $mymap{"China"};
}
printf "took %f sec\n", getTS() - $start;
C++ version
#include <cstdio>   // printf
#include <iostream>
#include <string>
#include <unordered_map>
#include <sys/time.h>
double getTS() {
struct timeval tv;
gettimeofday(&tv, NULL);
return tv.tv_sec + tv.tv_usec/1000000.0;
}
using namespace std;
int main () {
std::unordered_map<std::string,std::string> mymap;
// populating container:
mymap["U.S."] = "Washington";
mymap["U.K."] = "London";
mymap["France"] = "Paris";
mymap["Russia"] = "Moscow";
mymap["China"] = "Beijing";
mymap["Germany"] = "Berlin";
mymap["Japan"] = "Tokyo";
mymap["China"] = "Beijing";
mymap["Italy"] = "Rome";
mymap["Spain"] = "Madrid";
double start = getTS();
string x;
for (int i=0; i<1000000; i++) {
mymap["China"];
}
printf("took %f sec\n", getTS() - start);
return 0;
}
Go version
package main
import "fmt"
import "time"
func main() {
var x string
mymap := make(map[string]string)
mymap["U.S."] = "Washington";
mymap["U.K."] = "London";
mymap["France"] = "Paris";
mymap["Russia"] = "Moscow";
mymap["China"] = "Beijing";
mymap["Germany"] = "Berlin";
mymap["Japan"] = "Tokyo";
mymap["China"] = "Beijing";
mymap["Italy"] = "Rome";
mymap["Spain"] = "Madrid";
t0 := time.Now()
sum := 1
for sum < 1000000 {
x = mymap["China"]
sum += 1
}
t1 := time.Now()
fmt.Printf("The call took %v to run.\n", t1.Sub(t0))
fmt.Println(x)
}
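Update 1
To improve the C++ version, I changed x = mymap["China"]; to mymap["China"];, but the performance difference was very small.
Update 2
I got the original results when compiling without any optimization: g++ -std=c++11 unorderedMap.cc. With -O2 optimization, it takes only about half the time (150 ms).
Update 3
To eliminate a possible char* to string constructor call, I created a string constant. The time dropped to about 220 ms (compiled without optimization). Thanks to @neil-kirk for the suggestion; with optimization (the -O2 flag), the time is about 80 ms.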
double start = getTS();
string x = "China";
for (int i=0; i<1000000; i++) {
mymap[x];
}
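As a minimal sketch of the same idea (not code from the original post): operator[] inserts a default value when the key is missing, and passing a string literal builds a temporary std::string on each call. The hypothetical helper sumLookups below takes a key that was built once and uses find instead. It returns a total so that the lookups have an observable result, although an aggressive optimizer may still hoist loop-invariant work out of the loop.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper (not part of the original benchmark): looks up `key`
// `iterations` times and returns the summed length of the values found.
std::size_t sumLookups(const std::unordered_map<std::string, std::string>& m,
                       const std::string& key,
                       std::size_t iterations) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        // find() takes the existing std::string by reference and never
        // inserts, unlike operator[] on a missing key.
        auto it = m.find(key);
        if (it != m.end()) {
            total += it->second.size();
        }
    }
    return total;
}
```

Calling sumLookups(mymap, x, 1000000) with the string x from the snippet above, and printing the returned total after the timed section, would keep the result in use.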
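Update 4
Thanks to @steffen-ullrich for pointing out a syntax error in the Perl version. I fixed it. The time is now about 150 ms.
Update 5
It seems that the number of instructions executed matters. I used the command valgrind --tool=cachegrind <cmd>.
For the Go version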
$ valgrind --tool=cachegrind ./te1
==2103== Cachegrind, a cache and branch-prediction profiler
==2103== Copyright (C) 2002-2013, and GNU GPL'd, by Nicholas Nethercote et al.
==2103== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==2103== Command: ./te1
==2103==
--2103-- warning: L3 cache found, using its data for the LL simulation.
The call took 1.647099s to run.
Beijing
==2103==
==2103== I refs: 255,763,381
==2103== I1 misses: 3,709
==2103== LLi misses: 2,743
==2103== I1 miss rate: 0.00%
==2103== LLi miss rate: 0.00%
==2103==
==2103== D refs: 109,437,132 (77,838,331 rd + 31,598,801 wr)
==2103== D1 misses: 352,474 ( 254,714 rd + 97,760 wr)
==2103== LLd misses: 149,260 ( 96,250 rd + 53,010 wr)
==2103== D1 miss rate: 0.3% ( 0.3% + 0.3% )
==2103== LLd miss rate: 0.1% ( 0.1% + 0.1% )
==2103==
==2103== LL refs: 356,183 ( 258,423 rd + 97,760 wr)
==2103== LL misses: 152,003 ( 98,993 rd + 53,010 wr)
==2103== LL miss rate: 0.0% ( 0.0% + 0.1% )
For the improved C++ version (no optimization flag)
$ valgrind --tool=cachegrind ./a.out
==2180== Cachegrind, a cache and branch-prediction profiler
==2180== Copyright (C) 2002-2013, and GNU GPL'd, by Nicholas Nethercote et al.
==2180== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==2180== Command: ./a.out
==2180==
--2180-- warning: L3 cache found, using its data for the LL simulation.
took 64.657681 sec
==2180==
==2180== I refs: 5,281,474,482
==2180== I1 misses: 1,710
==2180== LLi misses: 1,651
==2180== I1 miss rate: 0.00%
==2180== LLi miss rate: 0.00%
==2180==
==2180== D refs: 3,170,495,683 (1,840,363,429 rd + 1,330,132,254 wr)
==2180== D1 misses: 12,055 ( 10,374 rd + 1,681 wr)
==2180== LLd misses: 7,383 ( 6,132 rd + 1,251 wr)
==2180== D1 miss rate: 0.0% ( 0.0% + 0.0% )
==2180== LLd miss rate: 0.0% ( 0.0% + 0.0% )
==2180==
==2180== LL refs: 13,765 ( 12,084 rd + 1,681 wr)
==2180== LL misses: 9,034 ( 7,783 rd + 1,251 wr)
==2180== LL miss rate: 0.0% ( 0.0% + 0.0% )
For the C++ version compiled with optimization
$ valgrind --tool=cachegrind ./a.out
==2157== Cachegrind, a cache and branch-prediction profiler
==2157== Copyright (C) 2002-2013, and GNU GPL'd, by Nicholas Nethercote et al.
==2157== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==2157== Command: ./a.out
==2157==
--2157-- warning: L3 cache found, using its data for the LL simulation.
took 9.419447 sec
==2157==
==2157== I refs: 1,451,459,660
==2157== I1 misses: 1,599
==2157== LLi misses: 1,549
==2157== I1 miss rate: 0.00%
==2157== LLi miss rate: 0.00%
==2157==
==2157== D refs: 430,486,197 (340,358,108 rd + 90,128,089 wr)
==2157== D1 misses: 12,008 ( 10,337 rd + 1,671 wr)
==2157== LLd misses: 7,372 ( 6,120 rd + 1,252 wr)
==2157== D1 miss rate: 0.0% ( 0.0% + 0.0% )
==2157== LLd miss rate: 0.0% ( 0.0% + 0.0% )
==2157==
==2157== LL refs: 13,607 ( 11,936 rd + 1,671 wr)
==2157== LL misses: 8,921 ( 7,669 rd + 1,252 wr)
==2157== LL miss rate: 0.0% ( 0.0% + 0.0% )
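To put the three "I refs" totals on a common scale, they can be divided by the roughly one million loop iterations. The sketch below does that arithmetic. The helper and constant names are hypothetical, and the totals cover the whole process (startup and map construction included), so the results are upper bounds on the per-lookup cost and not exact figures.

```cpp
#include <cstdint>

// "I refs" totals copied from the three Cachegrind runs above.
constexpr std::uint64_t kGoIRefs = 255763381ULL;
constexpr std::uint64_t kCppNoFlagIRefs = 5281474482ULL;
constexpr std::uint64_t kCppOptimizedIRefs = 1451459660ULL;
// Loop count used in the benchmarks (the Go loop runs one iteration fewer).
constexpr std::uint64_t kIterations = 1000000ULL;

// Hypothetical helper: average instructions per loop iteration, treating the
// whole-program instruction count as if it all came from the loop.
double instructionsPerIteration(std::uint64_t iRefs, std::uint64_t iterations) {
    return static_cast<double>(iRefs) / static_cast<double>(iterations);
}
```

With these inputs the quotients are about 256 instructions per iteration for Go, about 5,281 for C++ without optimization flags, and about 1,451 for the optimized C++ build. That is roughly 20 times and 6 times the Go figure respectively.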
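[Comments]:
-
Is it possible that the C++ implementation constructs a new std::string on every lookup?
-
Yes; cache the key in a local string variable outside the for loop.
-
Did you turn on optimization?
-
I don't actually care about these benchmarks, because at least for now the Perl code does not do what it is supposed to do, even after it got "fixed". I don't know about the other code, but it might be wrong too, or the compiler might optimize things away. This is definitely far from a reliable benchmark.
-
Apart from that: benchmarks that run for less than a second are useless, because it is pure luck which power mode the processor is currently in and which other processes happen to be running in parallel and taking CPU time. Real benchmarks run for hours to make sure they are not affected too much by such problems.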
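One small step toward the concerns raised in the comments, as a sketch only: repeat the measured work several times and keep the fastest run, using a monotonic clock. The helper name bestOfSeconds is hypothetical and not from the original post. It reduces noise from scheduling and CPU power states, but it does not replace a long-running benchmark.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <limits>

// Hypothetical helper: runs `work` `repeats` times and returns the duration
// of the fastest run in seconds (infinity if `repeats` is 0).
template <typename Work>
double bestOfSeconds(Work work, std::size_t repeats) {
    // steady_clock is monotonic, so wall-clock adjustments cannot skew it.
    using Clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t r = 0; r < repeats; ++r) {
        const auto start = Clock::now();
        work();
        const std::chrono::duration<double> elapsed = Clock::now() - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

The callable passed as work should produce a value that is used afterwards (for example, added to a variable that is printed at the end), so that the compiler has less room to remove the measured loop.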