Simple test to measure cache line size: answers

【Question title】: Simple test to measure cache line size
【Posted】: 2015-10-05 14:45:43
【Question】:

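Starting from Igor Ostrovsky's article, Gallery of Processor Cache Effects, I wanted to try his examples on my own machine. This is my code for the first example, which looks at how touching different cache lines affects the running time: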

#include <iostream>
#include <time.h>

using namespace std;

int main(int argc, char* argv[])
{
    int step = 1;

    const int length = 64 * 1024 * 1024;
    int* arr = new int[length];

    timespec t0, t1;
    clock_gettime(CLOCK_REALTIME, &t0);
    for (int i = 0; i < length; i += step) 
        arr[i] *= 3;
    clock_gettime(CLOCK_REALTIME, &t1);

    long int duration = (t1.tv_nsec - t0.tv_nsec);
    if (duration < 0)
        duration = 1000000000 + duration;

    cout<< step << ", " << duration / 1000 << endl;

    return 0;
}

Using different values of step, I don't see any jump in the running time:

step, microseconds
1, 451725
2, 334981
3, 287679
4, 261813
5, 254265
6, 246077
16, 215035
32, 207410
64, 202526
128, 197089
256, 195154

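I expected to see something like the plot in the article (image not preserved here), where from a step of 16 onward the running time halves each time the step doubles.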

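I tested this on Ubuntu 13 with a Xeon X5450, compiling with g++ -O0. Is my code flawed, or are these results actually fine? Any insight into what I'm missing would be greatly appreciated.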

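【Comments on the question】:

Measuring performance with -O0 is a waste of time. Don't do it. Try -O2, or even -march=native, especially if you are playing with cache lines.
@erenon - -O0 is needed so that the array assignment isn't optimized away. The result is never used, so the computation could be removed.
@CraigS.Anderson: That is no reason to measure performance without optimization. Just use the result; it's simple.
Try disabling hardware prefetching. Skipping lines may fool the stream-based prefetchers, so you won't save bandwidth the way you expect.
@erenon - I'm not actually trying to measure performance. I just want to "see with my own eyes" the effect of cache lines.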
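The "use the result" suggestion from the comments could look roughly like the sketch below. It is not code from the thread: the names elapsed_ns, touch_with_step and time_stride_ns are made up for illustration. The loop returns a checksum so that an optimizing build (for example -O2) cannot discard the work, and the elapsed time is computed from both tv_sec and tv_nsec, since the original subtraction of tv_nsec alone is only right for runs shorter than one second. CLOCK_MONOTONIC is swapped in for CLOCK_REALTIME as an assumption about what is preferable for interval timing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <time.h>
#include <vector>

// Elapsed nanoseconds between two timespec values, using both fields.
std::int64_t elapsed_ns(const timespec& t0, const timespec& t1) {
    return static_cast<std::int64_t>(t1.tv_sec - t0.tv_sec) * 1000000000LL
         + static_cast<std::int64_t>(t1.tv_nsec - t0.tv_nsec);
}

// Multiply every step-th element by 3 and return the sum of the updated
// elements. Returning the sum gives the caller a result to use.
std::int64_t touch_with_step(std::vector<int>& arr, std::size_t step) {
    assert(step > 0);  // a step of 0 would never terminate
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < arr.size(); i += step) {
        arr[i] *= 3;
        sum += arr[i];
    }
    return sum;
}

// Time one pass over arr with the given step (hypothetical helper).
// The checksum is handed back so the caller can print it.
std::int64_t time_stride_ns(std::vector<int>& arr, std::size_t step,
                            std::int64_t& checksum) {
    timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    checksum = touch_with_step(arr, step);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ns(t0, t1);
}
```

A caller would create the array as std::vector<int> arr(64 * 1024 * 1024, 1), call time_stride_ns for each step, and print both the time and the checksum. Filling the vector up front also means the memory has been written once before timing starts, which is likely to matter on Linux, where freshly allocated pages are typically mapped on first touch.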

Tags: c++ linux performance cpu-cache

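【Answer 1】:

I see that you want to observe the effect of the cache line size. I recommend cachegrind, which is part of the valgrind tool suite. Your approach is right, but it doesn't get you to the result yet.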

#include <iostream>
#include <time.h>
#include <stdlib.h>

using namespace std;

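int main(int argc, char* argv[])
{
    // The step comes from the command line; reject a missing or non-positive value.
    if (argc < 2 || atoi(argv[1]) <= 0) {
        cerr << "usage: " << argv[0] << " <step>" << endl;
        return 1;
    }
    int step = atoi(argv[1]);

    const int length = 64 * 1024 * 1024;
    int* arr = new int[length];

    for (int i = 0; i < length; i += step) 
        arr[i] *= 3;

    delete[] arr;
    return 0;
}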

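Run valgrind --tool=cachegrind ./a.out $cacheline-size (the argument is what the program reads as its step) and you should see the results. Once you plot them, you should get the result you expected. Happy experimenting!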
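To see why the plot should be flat for small steps and then halve, it helps to count how many distinct cache lines the loop touches. The sketch below does that arithmetic. It assumes 64-byte cache lines, 4-byte int elements, and an array that starts on a line boundary, none of which is stated in the thread. The function name lines_touched is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Number of distinct cache lines touched when visiting every step-th
// element of an array of `length` elements of `elem_size` bytes each.
// Assumes the array starts on a line boundary and that an element
// never straddles two lines.
std::size_t lines_touched(std::size_t length, std::size_t step,
                          std::size_t elem_size, std::size_t line_size) {
    assert(step > 0 && elem_size > 0 && line_size > 0);
    std::size_t count = 0;
    std::size_t last_line = static_cast<std::size_t>(-1);  // "no line yet"
    for (std::size_t i = 0; i < length; i += step) {
        // Index of the cache line that holds element i.
        std::size_t line = (i * elem_size) / line_size;
        if (line != last_line) {
            ++count;
            last_line = line;
        }
    }
    return count;
}
```

Under these assumptions, an array of 1024 ints spans 64 lines. Every step from 1 to 16 still touches all 64 of them, a step of 32 touches 32, and a step of 64 touches 16. If memory traffic dominates, the cache-miss counts that cachegrind reports should roughly follow this pattern.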

【Comments】:

【Answer 2】:

public class CacheLine {

    public static void main(String[] args) {
        CacheLine cacheLine = new CacheLine();
        cacheLine.startTesting();
    }

    // Repeat the whole sweep 10 times; later passes are less affected by JIT warm-up.
    private void startTesting() {
        byte[] array = new byte[128 * 1024];
        for (int testIndex = 0; testIndex < 10; testIndex++) {
            testMethod(array);
            System.out.println("--------- // ---------");
        }
    }

    // For each working-set size from 8 KB to 128 KB, write one byte every 64 bytes.
    private void testMethod(byte[] array) {
        for (int len = 8192; len <= array.length; len += 8192) {

            long t0 = System.nanoTime();
            for (int i = 0; i < 10000; i++) {
                for (int k = 0; k < len; k += 64) {
                    array[k] = 1;
                }
            }

            long dT = System.nanoTime() - t0;
            // Total number of writes for this length, so the ratio is time per write.
            long stepCount = 10000L * (len / 64);
            System.out.println("len: " + len / 1024 + " dT: " + dT
                    + " dT/stepCount: " + (double) dT / stepCount);
        }
    }
}

This code helps you determine the size of the L1 data cache. You can read more about it here: https://medium.com/@behzodbekqodirov/threading-in-java-194b7db6c1de#.kzt4w8eul

【Comments】: