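【Posted】: 2014-08-08 10:42:03
【Question】:
I was curious how std::inner_product() performs compared with a hand-written dot product, so I ran a test.
std::inner_product() was 4 times faster than the manual implementation. I find this strange, since surely there aren't that many ways to compute it?! I also can't see any SSE/AVX registers being used in the computation.
Setup: VS2013/MSVC (12?), Haswell i7 4770 CPU, 64-bit build, Release mode.
Here is the C++ test code: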
#include <iostream>
#include <functional>
#include <numeric>
#include <cstdint>
#include <intrin.h> // __rdtsc and __rdtscp (MSVC intrinsics)
int main() {
const int arraySize = 1000;
const int numTests = 500;
unsigned int s = 0, f = 0; // receive the processor ID value written by __rdtscp
unsigned long long* array1 = new unsigned long long[arraySize];
unsigned long long* array2 = new unsigned long long[arraySize];
//Initialise arrays
for (int i = 0; i < arraySize; i++){
unsigned long long val = __rdtsc();
array1[i] = val;
array2[i] = val;
}
//std::inner_product test
unsigned long long timingBegin1 = __rdtscp(&s);
for (int i = 0; i < numTests; i++){
volatile unsigned long long result = std::inner_product(array1, array1 + arraySize, array2, static_cast<uint64_t>(0));
}
unsigned long long timingEnd1 = __rdtscp(&s);
f = 0; s = 0;
//Manual Dot Product test
unsigned long long timingBegin2 = __rdtscp(&f);
for (int i = 0; i < numTests; i++){
volatile unsigned long long result = 0;
for (int j = 0; j < arraySize; j++){ // j avoids shadowing the outer loop counter
result += (array1[j] * array2[j]);
}
}
unsigned long long timeEnd2 = __rdtscp(&f);
std::cout << "STL      : " << static_cast<double>(timingEnd1 - timingBegin1) / numTests << " CPU cycles per dot product" << std::endl;
std::cout << "Manually : " << static_cast<double>(timeEnd2 - timingBegin2) / numTests << " CPU cycles per dot product" << std::endl;
delete[] array1;
delete[] array2;
return 0;
}
【Comments】:
-
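You also have to consider that accessing an array through operator[] is slower than iterator-based access, and the STL takes advantage of this as well. If you implement inner_product yourself in a manner similar to the STL, in conjunction with the volatile keyword, you will get code that is even faster than the STL (tested) ;)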
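One way to read this comment is sketched below. It is only an illustration under assumptions, not the commenter's actual code: the function names are hypothetical, and nothing here is timed. The first function walks both ranges with pointers and accumulates in an ordinary local variable, which is how std::inner_product is typically written. The second mirrors the manual loop from the question, where the accumulator itself is volatile, so every iteration has to read and write it in memory. Whether that difference accounts for the measured gap on a given compiler is an assumption that would need checking in the generated assembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: pointer/iterator-style traversal with a plain local
// accumulator, similar in shape to a typical std::inner_product loop.
std::uint64_t dot_iterator_style(const std::uint64_t* first1,
                                 const std::uint64_t* last1,
                                 const std::uint64_t* first2) {
    std::uint64_t acc = 0;
    for (; first1 != last1; ++first1, ++first2) {
        acc += *first1 * *first2;
    }
    return acc;
}

// Hypothetical helper: mirrors the question's manual loop, where the
// accumulator is volatile and is therefore re-read and re-written in memory
// on every iteration.
std::uint64_t dot_volatile_accumulator(const std::uint64_t* a,
                                       const std::uint64_t* b,
                                       std::size_t n) {
    volatile std::uint64_t result = 0;
    for (std::size_t i = 0; i < n; ++i) {
        result = result + a[i] * b[i];
    }
    return result;
}

// Sanity check on made-up data: both variants agree with std::inner_product.
void check_variants_agree() {
    const std::vector<std::uint64_t> a{1, 2, 3, 4};
    const std::vector<std::uint64_t> b{5, 6, 7, 8};
    const std::uint64_t expected =
        std::inner_product(a.begin(), a.end(), b.begin(), std::uint64_t{0});
    assert(expected == 70); // 1*5 + 2*6 + 3*7 + 4*8
    assert(dot_iterator_style(a.data(), a.data() + a.size(), b.data()) == expected);
    assert(dot_volatile_accumulator(a.data(), b.data(), a.size()) == expected);
}
```

To keep a benchmark result from being optimised away without slowing the loop itself, the return value of dot_iterator_style can be stored once into a volatile variable after the call, as the std::inner_product test in the question already does.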
Tags: c++ arrays performance optimization inner-product