Structure initialization performance: answers

【Question title】: Structure initialization performance
【Posted】: 2013-10-01 15:19:09
【Question description】:

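I am trying to improve the performance of my program (it runs on the ARC platform and is compiled with arc-gcc; that said, I am not expecting a platform-specific answer).

I want to know which of the following approaches is more optimal, and why.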

#include <stdlib.h> /* rand */
#include <string.h> /* memcpy, memset */

typedef struct _MY_STRUCT
{
    int my_height;
    int my_weight;
    char my_data_buffer[1024];
}MY_STRUCT;

int some_function(MY_STRUCT *px_my_struct)
{
    /*Many operations with the structure members done here*/
    return 0;
}

void poorly_performing_function_method_1()
{
    while(1)
    {
        MY_STRUCT x_struct_instance = {0}; /*x_struct_instance is automatic variable under WHILE LOOP SCOPE*/
        x_struct_instance.my_height = rand();
        x_struct_instance.my_weight = rand();
        if(x_struct_instance.my_weight > 100)
        {
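            /* Copy only the literal's own size; copying sizeof(my_data_buffer) bytes would read past the end of the literal. */
            static const char sample[] = "this is just an example string, there could be some binary data here.";
            memcpy(x_struct_instance.my_data_buffer, sample, sizeof(sample));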
        }
        some_function(&x_struct_instance);

        /******************************************************/
        /* No need for memset as it is initialized before use.*/
        /* memset(&x_struct_instance,0,sizeof(x_struct_instance));*/
        /******************************************************/
    }
}

void poorly_performing_function_method_2()
{
    MY_STRUCT x_struct_instance = {0}; /*x_struct_instance is automatic variable under FUNCTION SCOPE*/
    while(1)
    {
        x_struct_instance.my_height = rand();
        x_struct_instance.my_weight = rand();
        if(x_struct_instance.my_weight > 100)
        {
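            /* Copy only the literal's own size; copying sizeof(my_data_buffer) bytes would read past the end of the literal. */
            static const char sample[] = "this is just an example string, there could be some binary data here.";
            memcpy(x_struct_instance.my_data_buffer, sample, sizeof(sample));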
        }
        some_function(&x_struct_instance);
        memset(&x_struct_instance,0,sizeof(x_struct_instance));
    }
}

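In the code above, which performs better: poorly_performing_function_method_1() or poorly_performing_function_method_2()? Why?

A few things to consider:

In method #1, does the de-allocation and re-allocation of the structure's memory add overhead?
In method #1, is any optimization applied during initialization, as with calloc (optimized memory allocation that hands out zero-filled pages)?

To clarify, my question is about WHICH of the two methods is more optimal, not about how to make this code more optimal. The code is only an example.

As for making the code above more optimal, @Skizz has given the right answer.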

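【Comments】:

I'm not familiar with ARC. Is there anything platform-specific that stops you from benchmarking it?
@us2012 Updated the question. No, I am not looking for a platform-specific answer; I mentioned the platform only for completeness.
The function that uses memset() will perform worse, because memset is a costly memory operation.
@CCoder: Have you profiled it?
@nneonneo Yes. I profiled it with oprofile and saw memset show up in the listing for method 2. But I am not sure whether method 1 gives better overall performance, because a lot of other code was running during profiling.

Tags: c linux performance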

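【Solution 1】:

In general, not doing something is faster than doing it.

In your code, you clear the structure and then initialize it with data. That is two sets of memory writes, and the second simply overwrites the first.

Try this: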

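void function_to_try()
{
  MY_STRUCT x_struct_instance; /* deliberately not zeroed: each field is written below before use */
  while(1)
  {
    x_struct_instance.my_height = rand();
    x_struct_instance.my_weight = rand();
    /* Mark the buffer as an empty string instead of clearing all 1024 bytes */
    x_struct_instance.my_data_buffer[0]='\0';
    if(x_struct_instance.my_weight > 100)
    {
        /* strlcpy is a BSD extension, not standard C; pass the array itself, not its address */
        strlcpy(x_struct_instance.my_data_buffer,"Fatty",sizeof(x_struct_instance.my_data_buffer));
    }
    some_function(&x_struct_instance);
  }
}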

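Update

To answer the question of which is more optimal, I would suggest method #1, though the difference is probably marginal and depends on the compiler and other factors. My reasoning is that there is no allocation or de-allocation going on: the data lives on the stack, and the function prologue the compiler generates reserves a stack frame large enough for the whole function, so the frame never needs resizing. In any case, allocating on the stack just moves the stack pointer, so it is not a big overhead.

Also, memset is a general-purpose routine for setting memory, and it may contain extra logic to handle edge cases such as unaligned memory. The compiler can implement the initializer more intelligently than a generic algorithm (at least, one would hope so).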
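Since the answer depends on the compiler, measuring is the only way to settle it for a given toolchain. The following is a minimal sketch of such a comparison, not the asker's real workload: `consume`, `run_method_1`, `run_method_2` and `time_run` are hypothetical names introduced here, and `consume` is an assumed stand-in for `some_function()` that reads the whole struct so the zeroing cannot be optimized away.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef struct {
    int my_height;
    int my_weight;
    char my_data_buffer[1024];
} MY_STRUCT;

/* Hypothetical stand-in for some_function(): reads every byte so the
   compiler has to keep the stores that initialize the struct. */
static unsigned consume(const MY_STRUCT *s)
{
    unsigned sum = (unsigned)s->my_height + (unsigned)s->my_weight;
    for (size_t i = 0; i < sizeof s->my_data_buffer; i++)
        sum += (unsigned char)s->my_data_buffer[i];
    return sum;
}

/* Method 1: loop-scoped variable, zeroed by its initializer on each pass. */
unsigned run_method_1(unsigned iterations)
{
    unsigned total = 0;
    for (unsigned i = 0; i < iterations; i++) {
        MY_STRUCT s = {0};
        s.my_height = (int)i;
        s.my_weight = (int)(i % 200);
        total += consume(&s);
    }
    return total;
}

/* Method 2: function-scoped variable, zeroed by memset after each pass. */
unsigned run_method_2(unsigned iterations)
{
    MY_STRUCT s = {0};
    unsigned total = 0;
    for (unsigned i = 0; i < iterations; i++) {
        s.my_height = (int)i;
        s.my_weight = (int)(i % 200);
        total += consume(&s);
        memset(&s, 0, sizeof s);
    }
    return total;
}

/* Returns the CPU seconds spent in one run and stores its checksum.
   clock() is coarse, so use a large iteration count when measuring. */
double time_run(unsigned (*run)(unsigned), unsigned iterations, unsigned *result)
{
    assert(run != NULL && result != NULL);
    clock_t start = clock();
    *result = run(iterations);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_run(run_method_1, n, &sum)` and `time_run(run_method_2, n, &sum)` with the same large `n` gives two timings to compare; both runs should produce the same checksum. The checksum loop in `consume` costs time of its own, so the numbers only mean something when built with the same optimization flags as the real program and ideally with the real `some_function()` substituted in.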

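【Discussion】:

Sorry, the question was not clear about this. The structure I wrote is only a sample. If the structure holds a buffer for binary data, this may not work.
@CCoder: The basic idea is sound: don't write to memory (clearing it) and then write to it again (setting the data). If you have a binary blob of indeterminate size stored in a fixed-size buffer, add a field holding the number of valid bytes, or pad the rest of the buffer with zeros (or whatever); don't clear the whole buffer and then overwrite the cleared data.
OK, no problem. My question is more about which approach is more optimal than about how to make this code more optimal.
As @Skizz says. If you had a pointer to 10MB of data, you would not memset it every time (security concerns aside). You would just reset the value that says how much data is in the buffer. For a char string, that is done by writing a null terminator to [0]. Leaving the exit case aside, the two versions will be close to identical on any modern compiler, because the only difference between your two code paths is where the memset happens: the first does it at the start of the loop, the second at the end. You might get a sub esp, xxx in case 2 on every iteration.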
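The "valid bytes" idea from the comments can be sketched as follows for binary data. This is one possible shape under stated assumptions, not code from the thread: the `MY_RECORD` type, its `my_data_len` field, and the helpers `record_reset` and `record_set_data` are all hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    int my_height;
    int my_weight;
    size_t my_data_len;          /* hypothetical field: number of valid bytes */
    char my_data_buffer[1024];
} MY_RECORD;

/* "Clear" the record by marking the buffer empty.
   The 1024 buffer bytes themselves are left untouched. */
void record_reset(MY_RECORD *r)
{
    assert(r != NULL);
    r->my_height = 0;
    r->my_weight = 0;
    r->my_data_len = 0;
}

/* Store len bytes of binary data.
   Returns 0 on success, -1 if the data does not fit in the buffer. */
int record_set_data(MY_RECORD *r, const void *data, size_t len)
{
    assert(r != NULL && (data != NULL || len == 0));
    if (len > sizeof r->my_data_buffer)
        return -1;
    if (len > 0)
        memcpy(r->my_data_buffer, data, len);
    r->my_data_len = len;
    return 0;
}
```

With this layout, each loop iteration calls `record_reset` (three scalar stores) instead of zeroing 1032 bytes, and readers of the record must honor `my_data_len` rather than assume the tail of the buffer is zero. Stale bytes remain in the buffer after a reset, so this is not suitable when the old contents are sensitive.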