【Question title】: Which way is better for array access?
【Posted】: 2013-09-12 13:18:12
【Question description】:

I have a function in which I use constant arrays:

void function(int Id){
    int array1[4] = {4 constants};
    int array2[4] = {4 constants};
    for(int i=0; i<4; i++){
        //accessing the array 1&2 for computation;
    }
}

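function(int Id) will be called from main() nearly a million times.

My question is whether it is better to declare array1 and array2 in a header file and access them inside function(), or to declare them on the fly inside the function, as they are now.

Which way is faster (accessing them from the header file versus declaring them on the fly)?

Edit: The arrays are only read inside function(); they are never modified.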

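【Question discussion】:

The initializers are constants, but are the arrays immutable, or do you change them inside the function?
How about running an experiment? Try both ways. Your compiler may be smart enough to optimize away much of the work on constant values for you.
The arrays are not modified inside the function.
What exactly are you doing in the loop? Have you profiled the code to confirm that this particular bit is actually a problem? Is function visible to the compiler so that it can be inlined?
If the arrays are not modified inside the function, you are better off defining them as static int const array....

Tags: c++ c arrays algorithm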

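【Solution 1】:

If the arrays will not change and are not reused in another function, it is best to make them static. This avoids having to construct the arrays on the stack every time the function is called.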

void function(int Id){
    static const int array1[4] = {4 constants};
    static const int array2[4] = {4 constants};
    for(int i=0; i<4; i++){
        //accessing the array 1&2 for computation;
    }
}

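Edited to add: it is best to avoid the "magic number" 4 in the array declarations and in the loop expression. Otherwise it is easy to change the array size and forget to update the loop expression. You can avoid it either by making the array size a named constant or by using sizeof() in the loop expression, as shown in this Stack Overflow question: How do I determine the size of my array in C?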
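
A minimal sketch of the sizeof() variant, for illustration only. The macro name ARRAY_COUNT, the function name weighted_sum, the array values, and the computation inside the loop are all assumptions, since the question does not say what the real constants or the real computation are.

```c
#include <stddef.h>

/* Hypothetical helper: element count of a true array.
   It gives a wrong answer if passed a pointer instead of an array. */
#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for function(int Id); values are made up. */
int weighted_sum(int id)
{
    /* The size is deduced from the initializer, so no literal 4 appears. */
    static const int array1[] = {1, 2, 3, 4};
    static const int array2[] = {10, 20, 30, 40};
    int sum = 0;

    /* The loop bound follows array1 automatically if elements are added.
       Both arrays are assumed to have the same length. */
    for (size_t i = 0; i < ARRAY_COUNT(array1); i++) {
        sum += array1[i] * array2[i] * id;
    }
    return sum;
}
```

With these made-up values, weighted_sum(1) adds 10 + 40 + 90 + 160. If an element is appended to both arrays, the loop picks it up without any other change.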

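【Discussion】:

Another advantage of this approach is that the arrays are visible only inside the function; they are not needlessly exposed to the rest of the program.

【Solution 2】:

I think the best way is: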

void function(int Id){
    static const int array1[4] = {4 constants};
    static const int array2[4] = {4 constants};
    for(int i=0; i<4; i++){
        //accessing the array 1&2 for computation;
    }
}

But it would be best to just write a small test and see which is fastest. Raxvan.
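
Such a small test could look roughly like the sketch below. It is only an illustration: the function names sum_local, sum_static, time_calls, and print_comparison, the array values, and the computation are assumptions, and clock() from <time.h> is used here simply because it is portable. An optimizing compiler may inline or fold these functions, so the numbers only mean something for your own compiler and flags.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical constants: the question does not give the real values. */
static int sum_local(int id)
{
    int a[4] = {1, 2, 3, 4};   /* automatic arrays, set up on each call */
    int b[4] = {4, 3, 2, 1};
    int sum = 0;
    for (int i = 0; i < 4; i++) sum += a[i] * b[i];
    return sum * id;
}

static int sum_static(int id)
{
    static const int a[4] = {1, 2, 3, 4};   /* initialized once */
    static const int b[4] = {4, 3, 2, 1};
    int sum = 0;
    for (int i = 0; i < 4; i++) sum += a[i] * b[i];
    return sum * id;
}

/* Calls f `calls` times, stores the accumulated result in *checksum,
   and returns the elapsed CPU time in seconds. */
static double time_calls(int (*f)(int), long long calls, long long *checksum)
{
    long long total = 0;
    clock_t start = clock();
    for (long long i = 0; i < calls; i++) {
        total += f((int)(i & 7));   /* argument cycles through 0..7 */
    }
    clock_t end = clock();
    *checksum = total;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Prints one timing line per variant; equal checksums confirm that
   both variants computed the same thing. */
void print_comparison(long long calls)
{
    long long c_local = 0, c_static = 0;
    double t_local = time_calls(sum_local, calls, &c_local);
    double t_static = time_calls(sum_static, calls, &c_static);
    printf("local : %.3f s (checksum %lld)\n", t_local, c_local);
    printf("static: %.3f s (checksum %lld)\n", t_static, c_static);
}
```

Calling print_comparison(1000000) from main() would match the call count mentioned in the question. Running it several times, and with the optimization level you actually ship, gives a fairer picture than a single run.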

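【Discussion】:

Also, if your function is performance-critical, you may want to look into aliasing issues and other compiler-specific optimizations, for example: msdn.microsoft.com/en-us/library/k649tyc7.aspx
Try to add some explanation to your solution. People often come here to learn, not just to get their problem solved. (Probably both.)

【Solution 3】:

I guess there is no difference. You may want to write: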

const int array1[4]

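to better explain to the compiler what you mean. This may give it more optimization options.

【Discussion】:

【Solution 4】:

I tried a test case that compares the three options (global, local, and local static) over about 20 million evaluations of a simple inner product of 4-d vectors. This was done with a VS2010 32-bit release build. Here are the results: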

DPSUM:600000000 TIME:78| DPSUM:600000000 TIME:62| DPSUM:600000000 TIME:63|
DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:46| DPSUM:600000000 TIME:78|
DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:78|
DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:62|
DPSUM:600000000 TIME:62| DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:63|
DPSUM:600000000 TIME:46| DPSUM:600000000 TIME:63| DPSUM:600000000 TIME:62|
DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:78|
DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:46| DPSUM:600000000 TIME:78|
DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:62|
DPSUM:600000000 TIME:63| DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:62|

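The first column is static const, the second is local, and the third is global. Times are in milliseconds, as returned by GetTickCount(). If you want to try this on your platform, I will post the sample code. It looks like static local and local are equally fast, at least for this compiler (possibly due to some internal optimization).

Here is the code: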

#include <stdio.h>
#include <windows.h>

/* Option 3: global arrays. */
int ag[] = {1,2,3,4};
int bg[] = {1,2,3,4};

/* Option 1: function-local static const arrays. */
int dp1(){
    static const int a[] = {1,2,3,4};
    static const int b[] = {1,2,3,4};
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3];
}

/* Option 2: plain local arrays. */
int dp2(){
    int a[] = {1,2,3,4};
    int b[] = {1,2,3,4};
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3];
}

int dp3(){
    return ag[0]*bg[0] + ag[1]*bg[1] + ag[2]*bg[2] + ag[3]*bg[3];
}

int main(){
    int numtrials = 10;
    typedef int (*DP)();
    DP dps[] = {dp1, dp2, dp3};

    /* Each trial prints one row: static const, local, global. */
    for (int t = 0; t < numtrials; ++t){
        int dpsum[] = {0,0,0};
        for (int jj = 0; jj < 3; ++jj){
            DWORD bef, aft;
            bef = GetTickCount();   /* tick count in milliseconds */
            for (int ii = 0; ii < 20000000; ++ii){
                dpsum[jj] += dps[jj]();
            }
            aft = GetTickCount();
            /* DWORD is an unsigned long, so print the elapsed time with %lu. */
            printf("DPSUM:%d TIME:%lu| ", dpsum[jj], (unsigned long)(aft - bef));
        }
        printf("\n");
    }
    getchar();   /* keep the console window open */
    return 0;
}

【Discussion】:

Note that with reasonable optimization (which VC may or may not perform), dp1 and dp2 should both be optimized down to a single return <number> statement.
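
To make that comment concrete, here is a small sketch of what such constant folding would amount to for the data in the benchmark. The names dp_static and dp_folded are made up for this illustration, and whether a given compiler actually performs the reduction depends on the compiler and its flags.

```c
/* Same data and arithmetic as dp1 in the benchmark. */
int dp_static(void)
{
    static const int a[] = {1, 2, 3, 4};
    static const int b[] = {1, 2, 3, 4};
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3];
}

/* What an optimizer is allowed to reduce it to, because every operand
   is known at compile time: 1*1 + 2*2 + 3*3 + 4*4 = 30. */
int dp_folded(void)
{
    return 30;
}
```

The folded value also accounts for the benchmark output: 20,000,000 calls times 30 gives 600,000,000, the DPSUM printed in every column.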