Image processing further optimization: answers

【Question title】: Image processing further optimization
【Posted】: 2018-07-05 09:28:54
【Question】:

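I'm new to optimization, and I was given the task of optimizing, as much as possible, a function that processes an image. It takes an image, blurs it and saves the blurred image, then goes on to sharpen the image and saves the sharpened image as well.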

Here is my code:

typedef struct {
   unsigned char red;
   unsigned char green;
   unsigned char blue;
} pixel;

// I delete the other struct because we can do the same operations with use of only addresses

//use macro instead of function is more efficient
#define calculateIndex(i, j, n) ((i)*(n)+(j))


// I combine all the functions in one because it is time consuming
void myfunction(Image *image, char* srcImgpName, char* blurRsltImgName, char* sharpRsltImgName) {
    // use variable from type 'register int' is much more efficient from 'int'
    register int i,j, ii, jj, sum_red, sum_green, sum_blue; 
    //using local variable is much more efficient than using pointer to pixels from the original image,and updat its value in each iteration
    pixel current_pixel , p;

    //dst will point on the first pixel in the image
    pixel* dst = (pixel*)image->data;

    int squareN = n*n;
    //instead of multiply by 3 - I used shift 
    register int sizeToAllocate = ((squareN)<<1)+(squareN); // use variable from type 'register int' is much more efficient from 'int'
    pixel* src = malloc(sizeToAllocate);

    register int index;

    //memcpy replace the old functions that converts chars to pixels or pixels to chars. it is very efficient and build-in in c libraries
    memcpy(src, dst, sizeToAllocate);

    ///////////////////////////////////////// first step : smooth //////////////////////////////////////////////////////////////////////


    /**the smooth blur is step that apply the blur-kernel (matrix of ints) over each pixel in the bouns - and make the image more smooth.
*this function was originally used this matrix :
* [1, 1, 1]
* [1, 1, 1]
* [1, 1, 1]
*because the matrix is full of 1 , we don't really need it - the access to the matrix is very expensive . instead of the matrix I used 
*primitive variable.
*/

    //the loops are starting with 1 and not with 0 because we need to check only the pixels with 8 neighbors around them
    index = calculateIndex(1, 1, n);
    for (i = 1 ; i < n - 1; ++i) {
        for (j =  1 ; j < n - 1 ; ++j) {
            // I used this variables as counters to the colors' values around a specific pixel
            sum_red = 0;
            sum_green = 0;
            sum_blue = 0;

            for(ii = i-1; ii <= i+1; ++ii) {
                for(jj =j-1; jj <= j+1; ++jj) {
                    //take care of the [ii,jj] pixel in the matrix
                    //calculate the adrees of the current pixel
                    pixel p = src[calculateIndex(ii, jj, n)];       
                    //sum the colors' values of the neighbors of the current pixel
                    sum_red += p.red;
                    sum_green +=  p.green;
                    sum_blue += p.blue;
                }
            }
            //calculate the avarage of the colors' values around the current pixel - as written in the instructions
            sum_red = (((sum_red) * 0xE38F) >> 19);//instead of dividing by 9 - I used shift because it is more efficient
            sum_green = (((sum_green) * 0xE38F) >> 19);//instead of dividing by 9 - I used shift because it is more efficient
            sum_blue = (((sum_blue) * 0xE38F) >> 19);//instead of dividing by 9 - I used shift because it is more efficient

            current_pixel.red = (unsigned char)sum_red;
            current_pixel.green = (unsigned char)sum_green;
            current_pixel.blue = (unsigned char)sum_blue;
            dst[index++] = current_pixel;
        }
    }
    // write result image to file
    writeBMP(image, srcImgpName, blurRsltImgName);
    
    //memcpy replace the old functions that converts chars to pixels or pixels to chars. it is very efficient and build-in in c libraries
    memcpy(src, dst, sizeToAllocate);


    ///////////////////////////////////////// second step : sharp //////////////////////////////////////////////////////////////////////



    /** I want to sharp the smooth image . In this step I apply the sharpen kernel (matrix of ints) over each pixel in the bouns - and make the image more sharp.
*this function was originally used this matrix :
* [-1, -1, -1]
* [-1, 9, -1]
* [-1, -1, -1]
*because the matrix is full of (-1) , we don't really need it - the access to the matrix is very expensive . instead of the matrix I used 
*primitive variable. I operato like that : insted of multiply in (-1) in the end of the step , I define counter initializes with zero , and
*substruct all te colors' values from it. the result is actually the same as multiply by (-1), in more efficient way.
*/

    //the loops are starting with 1 and not with 0 because we need to check only the pixels with 8 neighbors around them
    for (i = 1 ; i < n-1; ++i) {
        for (j =  1 ; j < n-1 ; ++j) {
            // I used this variables as counters to the colors' values around a specific pixel
            sum_red = 0;
            sum_green = 0;
            sum_blue = 0;

            // Do central pixel first
            p=src[calculateIndex(i,j,n)];
            sum_red   = 10*p.red;
            sum_green = 10*p.green;
            sum_blue  = 10*p.blue;

            for(ii =i-1; ii <= i + 1; ++ii) {
                for(jj = j-1; jj <= j + 1; ++jj) {
                    p = src[calculateIndex(ii, jj, n)];
                    //operate according to the instructions
                    sum_red -= p.red;
                    sum_green -= p.green;
                    sum_blue -= p.blue;
                }
            }

            //each pixel's colors' values must match the range [0,255] - I used the idea from the original code

            //the red value must be in the range [0,255]
            if (sum_red < 0) {
                sum_red = 0;
            } else if (sum_red > 255 ) {
                sum_red = 255;
            }
            current_pixel.red = (unsigned char)sum_red;


            //the green value must be in the range [0,255]
            if (sum_green < 0) {
                sum_green = 0;
            } else if (sum_green > 255 ) {
                sum_green = 255;
            }
            current_pixel.green = (unsigned char)sum_green;


            //the blue value must be in the range [0,255]
            if (sum_blue < 0) {
                sum_blue = 0;
            } else if (sum_blue > 255 ) {
                sum_blue = 255;
            }
            current_pixel.blue = (unsigned char)sum_blue;


            // put the updated pixel in [i,j] in the image
            dst[calculateIndex(i, j, n)] = current_pixel;
        }
    }

    //free the allocated space to prevent memory leaks
    free(src);

    // write result image to file
    writeBMP(image, srcImgpName, sharpRsltImgName);
}

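I'd like to ask about the if statements: is there anything better that could replace them? And more generally, can anyone spot optimization mistakes here, or offer their input?

Thanks a lot!

Updated code: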

typedef struct {
   unsigned char red;
   unsigned char green;
   unsigned char blue;
} pixel;

// I delete the other struct because we can do the same operations with use of only addresses

//use macro instead of function is more efficient
#define calculateIndex(i, j, n) ((i)*(n)+(j))


// I combine all the functions in one because it is time consuming
void myfunction(Image *image, char* srcImgpName, char* blurRsltImgName, char* sharpRsltImgName) {
    // use variable from type 'register int' is much more efficient from 'int'
    register int i,j, ii, jj, sum_red, sum_green, sum_blue; 
    //using local variable is much more efficient than using pointer to pixels from the original image,and updat its value in each iteration
    pixel current_pixel , p;

    //dst will point on the first pixel in the image
    pixel* dst = (pixel*)image->data;

    int squareN = n*n;
    //instead of multiply by 3 - I used shift 
    register int sizeToAllocate = ((squareN)<<1)+(squareN); // use variable from type 'register int' is much more efficient from 'int'
    pixel* src = malloc(sizeToAllocate);

    register int index;

    //memcpy replace the old functions that converts chars to pixels or pixels to chars. it is very efficient and build-in in c libraries
    memcpy(src, dst, sizeToAllocate);

    ///////////////////////////////////////// first step : smooth //////////////////////////////////////////////////////////////////////


    /**the smooth blur is step that apply the blur-kernel (matrix of ints) over each pixel in the bouns - and make the image more smooth.
*this function was originally used this matrix :
* [1, 1, 1]
* [1, 1, 1]
* [1, 1, 1]
*because the matrix is full of 1 , we don't really need it - the access to the matrix is very expensive . instead of the matrix I used 
*primitive variable.
*/

    //the loops are starting with 1 and not with 0 because we need to check only the pixels with 8 neighbors around them
    index = calculateIndex(1, 1, n);
    for (i = 1 ; i < n - 1; ++i) {
        for (j =  1 ; j < n - 1 ; ++j) {
            // I used this variables as counters to the colors' values around a specific pixel
            sum_red = 0;
            sum_green = 0;
            sum_blue = 0;

            for(ii = i-1; ii <= i+1; ++ii) {
                for(jj =j-1; jj <= j+1; ++jj) {
                    //take care of the [ii,jj] pixel in the matrix
                    //calculate the adrees of the current pixel
                    pixel p = src[calculateIndex(ii, jj, n)];       
                    //sum the colors' values of the neighbors of the current pixel
                    sum_red += p.red;
                    sum_green +=  p.green;
                    sum_blue += p.blue;
                }
            }
            //calculate the avarage of the colors' values around the current pixel - as written in the instructions
            sum_red = (((sum_red) * 0xE38F) >> 19);//instead of dividing by 9 - I used shift because it is more efficient
            sum_green = (((sum_green) * 0xE38F) >> 19);//instead of dividing by 9 - I used shift because it is more efficient
            sum_blue = (((sum_blue) * 0xE38F) >> 19);//instead of dividing by 9 - I used shift because it is more efficient

            current_pixel.red = (unsigned char)sum_red;
            current_pixel.green = (unsigned char)sum_green;
            current_pixel.blue = (unsigned char)sum_blue;
            dst[index++] = current_pixel;
        }
        index += 2;
    }
    // write result image to file
    writeBMP(image, srcImgpName, blurRsltImgName);
    
    //memcpy replace the old functions that converts chars to pixels or pixels to chars. it is very efficient and build-in in c libraries
    memcpy(src, dst, sizeToAllocate);


    ///////////////////////////////////////// second step : sharp //////////////////////////////////////////////////////////////////////



    /** I want to sharp the smooth image . In this step I apply the sharpen kernel (matrix of ints) over each pixel in the bouns - and make the image more sharp.
*this function was originally used this matrix :
* [-1, -1, -1]
* [-1, 9, -1]
* [-1, -1, -1]
*because the matrix is full of (-1) , we don't really need it - the access to the matrix is very expensive . instead of the matrix I used 
*primitive variable. I operato like that : insted of multiply in (-1) in the end of the step , I define counter initializes with zero , and
*substruct all te colors' values from it. the result is actually the same as multiply by (-1), in more efficient way.
*/

    index = calculateIndex(1,1,n);
    //the loops are starting with 1 and not with 0 because we need to check only the pixels with 8 neighbors around them
    for (i = 1 ; i < n-1; ++i) {
        for (j =  1 ; j < n-1 ; ++j) {
            // I used this variables as counters to the colors' values around a specific pixel
            sum_red = 0;
            sum_green = 0;
            sum_blue = 0;

            // Do central pixel first
            p=src[index];
            sum_red   = 10*p.red;
            sum_green = 10*p.green;
            sum_blue  = 10*p.blue;

            for(ii =i-1; ii <= i + 1; ++ii) {
                for(jj = j-1; jj <= j + 1; ++jj) {
                    p = src[calculateIndex(ii, jj, n)];
                    //operate according to the instructions
                    sum_red -= p.red;
                    sum_green -= p.green;
                    sum_blue -= p.blue;
                }
                index += 2;
            }

            //each pixel's colors' values must match the range [0,255] - I used the idea from the original code

            //the red value must be in the range [0,255]
            if (sum_red < 0) {
                sum_red = 0;
            } else if (sum_red > 255 ) {
                sum_red = 255;
            }
            current_pixel.red = (unsigned char)sum_red;


            //the green value must be in the range [0,255]
            if (sum_green < 0) {
                sum_green = 0;
            } else if (sum_green > 255 ) {
                sum_green = 255;
            }
            current_pixel.green = (unsigned char)sum_green;


            //the blue value must be in the range [0,255]
            if (sum_blue < 0) {
                sum_blue = 0;
            } else if (sum_blue > 255 ) {
                sum_blue = 255;
            }
            current_pixel.blue = (unsigned char)sum_blue;


            // put the updated pixel in [i,j] in the image
            dst[calculateIndex(i, j, n)] = current_pixel;
        }
    }

    //free the allocated space to prevent memory leaks
    free(src);

    // write result image to file
    writeBMP(image, srcImgpName, sharpRsltImgName);
}

------------------------------------------------------------------------------

Updated code:

typedef struct {
   unsigned char red;
   unsigned char green;
   unsigned char blue;
} pixel;

// I delete the other struct because we can do the same operations with use of only addresses

//use macro instead of function is more efficient
#define calculateIndex(i, j, n) ((i)*(n)+(j))


// I combine all the functions in one because it is time consuming
void myfunction(Image *image, char* srcImgpName, char* blurRsltImgName, char* sharpRsltImgName) {
    // use variable from type 'register int' is much more efficient from 'int'
    register int i,j, ii, jj, sum_red, sum_green, sum_blue; 
    //using local variable is much more efficient than using pointer to pixels from the original image,and updat its value in each iteration
    pixel current_pixel , p;

    //dst will point on the first pixel in the image
    pixel* dst = (pixel*)image->data;

    int squareN = n*n;
    //instead of multiply by 3 - I used shift 
    register int sizeToAllocate = ((squareN)<<1)+(squareN); // use variable from type 'register int' is much more efficient from 'int'
    pixel* src = malloc(sizeToAllocate);

    register int index;

    //memcpy replace the old functions that converts chars to pixels or pixels to chars. it is very efficient and build-in in c libraries
    memcpy(src, dst, sizeToAllocate);

    ///////////////////////////////////////// first step : smooth //////////////////////////////////////////////////////////////////////


    /**the smooth blur is step that apply the blur-kernel (matrix of ints) over each pixel in the bouns - and make the image more smooth.
*this function was originally used this matrix :
* [1, 1, 1]
* [1, 1, 1]
* [1, 1, 1]
*because the matrix is full of 1 , we don't really need it - the access to the matrix is very expensive . instead of the matrix I used 
*primitive variable.
*/

    //the loops are starting with 1 and not with 0 because we need to check only the pixels with 8 neighbors around them
    index = n + 1;
    for (i = 1 ; i < n - 1; ++i) {
        for (j =  1 ; j < n - 1 ; ++j) {
            // I used this variables as counters to the colors' values around a specific pixel
            sum_red = 0;
            sum_green = 0;
            sum_blue = 0;

            for(ii = i-1; ii <= i+1; ++ii) {
                for(jj =j-1; jj <= j+1; ++jj) {
                    //take care of the [ii,jj] pixel in the matrix
                    //calculate the adrees of the current pixel
                    pixel p = src[calculateIndex(ii, jj, n)];       
                    //sum the colors' values of the neighbors of the current pixel
                    sum_red += p.red;
                    sum_green +=  p.green;
                    sum_blue += p.blue;
                }
            }
            //calculate the avarage of the colors' values around the current pixel - as written in the instructions
            sum_red = (((sum_red) * 0xE38F) >> 19);//instead of dividing by 9 - I used shift because it is more efficient
            sum_green = (((sum_green) * 0xE38F) >> 19);//instead of dividing by 9 - I used shift because it is more efficient
            sum_blue = (((sum_blue) * 0xE38F) >> 19);//instead of dividing by 9 - I used shift because it is more efficient

            current_pixel.red = (unsigned char)sum_red;
            current_pixel.green = (unsigned char)sum_green;
            current_pixel.blue = (unsigned char)sum_blue;
            dst[index++] = current_pixel;
        }
        index += 2;
    }
    // write result image to file
    writeBMP(image, srcImgpName, blurRsltImgName);
    
    //memcpy replace the old functions that converts chars to pixels or pixels to chars. it is very efficient and build-in in c libraries
    memcpy(src, dst, sizeToAllocate);


    ///////////////////////////////////////// second step : sharp //////////////////////////////////////////////////////////////////////



    /** I want to sharp the smooth image . In this step I apply the sharpen kernel (matrix of ints) over each pixel in the bouns - and make the image more sharp.
*this function was originally used this matrix :
* [-1, -1, -1]
* [-1, 9, -1]
* [-1, -1, -1]
*because the matrix is full of (-1) , we don't really need it - the access to the matrix is very expensive . instead of the matrix I used 
*primitive variable. I operate like that : instead of multiply in (-1) in the end of the step , I define counter initializes with zero , and
*substruct all te colors' values from it. the result is actually the same as multiply by (-1), in more efficient way.
*/

    index = calculateIndex(1,1,n);
    //the loops are starting with 1 and not with 0 because we need to check only the pixels with 8 neighbors around them
    for (i = 1 ; i < n-1; ++i) {
        for (j =  1 ; j < n-1 ; ++j) {
            // I used this variables as counters to the colors' values around a specific pixel
            sum_red = 0;
            sum_green = 0;
            sum_blue = 0;

            // Do central pixel first
            p=src[index];
            sum_red   = 10*p.red;
            sum_green = 10*p.green;
            sum_blue  = 10*p.blue;

            for(ii =i-1; ii <= i + 1; ++ii) {
                for(jj = j-1; jj <= j + 1; ++jj) {
                    p = src[calculateIndex(ii, jj, n)];
                    //operate according to the instructions
                    sum_red -= p.red;
                    sum_green -= p.green;
                    sum_blue -= p.blue;
                }
            }

            //each pixel's colors' values must match the range [0,255] - I used the idea from the original code

            //the red value must be in the range [0,255]
            if (sum_red < 0) {
                sum_red = 0;
            } else if (sum_red > 255 ) {
                sum_red = 255;
            }
            current_pixel.red = (unsigned char)sum_red;


            //the green value must be in the range [0,255]
            if (sum_green < 0) {
                sum_green = 0;
            } else if (sum_green > 255 ) {
                sum_green = 255;
            }
            current_pixel.green = (unsigned char)sum_green;


            //the blue value must be in the range [0,255]
            if (sum_blue < 0) {
                sum_blue = 0;
            } else if (sum_blue > 255 ) {
                sum_blue = 255;
            }
            current_pixel.blue = (unsigned char)sum_blue;


            // put the updated pixel in [i,j] in the image
            dst[calculateIndex(i, j, n)] = current_pixel;
        }
        index += 2;
    }

    //free the allocated space to prevent memory leaks
    free(src);

    // write result image to file
    writeBMP(image, srcImgpName, sharpRsltImgName);
}

【Comments】:

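Blur filters such as the Gaussian blur filter are separable. That is a very important optimization. In addition, you can perform some of the operations in parallel. In fact, "shift" as an optimization is trivial in this case.
My suggestion: don't bother optimizing anything until you have done what @IharobAlAsimi suggested re: separable filters. Then you need to measure the performance of your code.
If you have working code and you just want people to review it to try to improve it, you should ask on Code Review instead; that is exactly what it was created for. This site is for questions about problems you are having with your code, not for suggestions about improvements or optimizations.
Also, depending on the size of the image, you can use the fast Fourier transform to perform the convolution, since it then becomes a point-wise multiplication rather than a summation. But if the image is not large enough, a small kernel with ordinary convolution may be faster; there is a trade-off between the cost of computing the FFTs of the kernel and the image plus the product, and the cost of direct convolution.
The ifs are inside four nested loops. Even if you optimize them, you will still have a slow algorithm. Rethink the algorithm.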
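
To illustrate the separable-filter suggestion from the comments above: the 3x3 all-ones kernel in the question is itself separable, so it can be applied as a horizontal 3-tap sum followed by a vertical 3-tap sum. The following is only a sketch under stated assumptions, not the asker's code: the name box_blur_separable is hypothetical, it works on a single 8-bit channel of an n x n image rather than on the pixel struct, and src and dst must be distinct buffers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: 3x3 box blur of one 8-bit channel of an n x n image,
 * done as a horizontal pass followed by a vertical pass.
 * Border pixels are copied unchanged, as in the question's code.
 * src and dst must not overlap. Returns 0 on success, -1 if allocation fails. */
int box_blur_separable(const unsigned char *src, unsigned char *dst, int n)
{
    if (n < 1)
        return 0;

    /* A 3-tap sum can reach 3 * 255 = 765, so it needs more than a byte. */
    unsigned short *rowsum = malloc((size_t)n * n * sizeof *rowsum);
    if (rowsum == NULL)
        return -1;

    memcpy(dst, src, (size_t)n * n);   /* keeps the border as-is */

    /* Pass 1: horizontal 3-tap sum for every row, interior columns only. */
    for (int i = 0; i < n; ++i) {
        for (int j = 1; j < n - 1; ++j) {
            const unsigned char *p = src + (size_t)i * n + j;
            rowsum[(size_t)i * n + j] = (unsigned short)(p[-1] + p[0] + p[1]);
        }
    }

    /* Pass 2: vertical 3-tap sum of the row sums, then divide by 9. */
    for (int i = 1; i < n - 1; ++i) {
        for (int j = 1; j < n - 1; ++j) {
            const unsigned short *r = rowsum + (size_t)i * n + j;
            dst[(size_t)i * n + j] = (unsigned char)((r[-n] + r[0] + r[n]) / 9);
        }
    }

    free(rowsum);
    return 0;
}
```

Each output pixel then costs roughly 3 + 3 additions instead of 9, at the price of an extra buffer. Whether that is faster in practice has to be measured.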

Tags: c image image-processing optimization

【Solution 1】:

Some general optimization guidelines:

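1. If you're running on x86, compile as a 64-bit binary. x86 is really a register-starved CPU. In 32-bit mode you have pretty much only 5 or 6 32-bit general-purpose registers available, and you only get "all" 6 if you compile with optimizations like -fomit-frame-pointer on GCC. In 64-bit mode you'll have 13 or 14 64-bit general-purpose registers.
2. Get a good compiler and use the highest general optimization level possible.
3. Profile! Profile! Profile! Actually profile your code so you really know where the performance bottlenecks are. Any guess about where the bottlenecks are is likely to be wrong.
4. Once you find your bottlenecks, examine the actual instructions the compiler produces for the bottleneck areas, just to see what's happening. Perhaps the bottleneck is where the compiler had to do a lot of register spilling and filling because of register pressure. This is really helpful if you can profile down to the instruction level.
5. Use the insights from profiling and from examining the generated instructions to improve your code and your compiler arguments. For example, if you see a lot of register spilling and filling, you need to reduce register pressure, perhaps by manually coalescing loops or by disabling prefetching with a compiler option.
6. Experiment with different page size options. If a single row of pixels is a significant fraction of a page, reaching into other rows is more likely to land on another page and cause a TLB miss. Using larger memory pages may reduce this significantly.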
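
Before reaching for a full profiler, a coarse timing harness can at least confirm whether a change helps at all. The following is a minimal sketch using only the standard clock() function; best_time and work are hypothetical names, and work is merely a stand-in for the function being tuned. It does not replace instruction-level profiling as described in the list above.

```c
#include <time.h>

/* Hypothetical stand-in for the function under test. */
static unsigned long work(unsigned long iterations)
{
    volatile unsigned long acc = 0;   /* volatile keeps the loop from being optimized away */
    for (unsigned long k = 0; k < iterations; ++k)
        acc += k;
    return acc;
}

/* Runs fn `repeats` times and returns the smallest CPU time observed, in seconds.
 * Taking the minimum reduces noise from other activity on the machine.
 * Returns -1.0 if repeats is not positive. */
static double best_time(unsigned long (*fn)(unsigned long), unsigned long arg, int repeats)
{
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        clock_t start = clock();
        fn(arg);
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}
```

A call such as best_time(work, 100000000UL, 5) before and after a change gives a first indication. The resolution of clock() is coarse, so the measured function should run long enough to register.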

Some specific ideas for your code:

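1. Use only one outer loop. You'll have to experiment to find the fastest way to handle the "extra" edge pixels. The fastest way may be to do nothing special at all: roll right over them like "normal" pixels and then ignore the values computed for them.
2. Manually unroll the two inner loops - you are only doing 9 pixels.
3. Don't use calculateIndex() - use the address of the current pixel and find the other pixels by adding or subtracting the appropriate value from it. For example, the address of the upper-left pixel in your inner loops would be something like currentPixelAddress - n - 1.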

These would turn your four-deep nested loops into a single loop with very little index computation.
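
The three ideas above can be sketched roughly as follows for the blur step. This is an illustration under assumptions, not a tested drop-in replacement: blur_single_loop and SUM9 are hypothetical names, n is passed as a parameter rather than taken from wherever the question's code gets it, a plain division by 9 is used, and dst is assumed to already hold a copy of src so that the top and bottom rows stay intact.

```c
#include <stddef.h>

typedef struct {
    unsigned char red;
    unsigned char green;
    unsigned char blue;
} pixel;

/* Sum of one channel over the 3x3 neighborhood centered at c (row length n).
 * This is the manually unrolled form of the two inner loops. */
#define SUM9(c, n, ch) \
    ((c)[-(n) - 1].ch + (c)[-(n)].ch + (c)[-(n) + 1].ch + \
     (c)[-1].ch       + (c)[0].ch    + (c)[1].ch        + \
     (c)[(n) - 1].ch  + (c)[(n)].ch  + (c)[(n) + 1].ch)

/* Hypothetical helper: 3x3 box blur of an n x n image, src -> dst.
 * src and dst must be distinct buffers. */
void blur_single_loop(const pixel *src, pixel *dst, int n)
{
    if (n < 3)
        return;

    /* One flat loop from pixel (1,1) to pixel (n-2,n-2). It also runs over
     * the left/right edge pixels in between; those results are discarded below. */
    const ptrdiff_t first = (ptrdiff_t)n + 1;
    const ptrdiff_t last  = (ptrdiff_t)n * n - n - 2;
    for (ptrdiff_t k = first; k <= last; ++k) {
        const pixel *c = src + k;   /* address of the current pixel */
        dst[k].red   = (unsigned char)(SUM9(c, n, red)   / 9);
        dst[k].green = (unsigned char)(SUM9(c, n, green) / 9);
        dst[k].blue  = (unsigned char)(SUM9(c, n, blue)  / 9);
    }

    /* Restore the left and right edge pixels that the flat loop overwrote. */
    for (int i = 1; i < n - 1; ++i) {
        dst[(ptrdiff_t)i * n]         = src[(ptrdiff_t)i * n];
        dst[(ptrdiff_t)i * n + n - 1] = src[(ptrdiff_t)i * n + n - 1];
    }
}
```

The flat loop never reads outside the buffer, because the smallest neighbor index is first - n - 1 = 0 and the largest is last + n + 1 = n*n - 1. Restoring the edges afterwards is one possible way to "ignore" the values computed for them; whether it beats a two-level loop has to be measured.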

【Comments】:

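Good advice. To be a bit more specific on 2), you might suggest using the following compiler switches: -march=native -O3
@MarkSetchell Well, the OP didn't specify which compiler is being used, so I didn't want to go into that much detail.
Fair enough, it's your answer :-) I often find that some OPs and beginners don't understand the significance of what more experienced folk suggest, so they ignore the advice - I remember wishing, when I was starting out years ago, that people had given concrete examples of everything they knew. We casually say "profile your code" without mentioning how to compile for profiling or how to run it. We say "symlink the latest version of XYZ" without saying how to symlink or where to download it from.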

【Solution 2】:

Some thoughts - untested.

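You have if(ii==i && jj==j) to test for the central pixel in the sharpening loop, and you execute it 9 times for every pixel. I think it would be faster to remove the if and do exactly the same thing for every pixel, then correct outside the loop by adding 10 times the central pixel.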

    // Do central pixel first
    p=src[calculateIndex(i,j,n)];
    sum_red   = 10*p.red;
    sum_green = 10*p.green;
    sum_blue  = 10*p.blue;

    for(ii =i-1; ii <= i + 1; ++ii) {
        for(jj = j-1; jj <= j + 1; ++jj) {
            p = src[calculateIndex(ii, jj, n)];
            //operate according to the instructions
            sum_red -= p.red;
            sum_green -= p.green;
            sum_blue -= p.blue;
        }
    }

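At dst[calculateIndex(i, j, n)] = current_pixel; you can probably compute the index once before the loop starts and then just increment it on each write inside the loop - assuming your array is contiguous and unpadded.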

index=calculateIndex(1,1,n);
for (i = 1 ; i < n - 1; ++i) {
    for (j =  1 ; j < n - 1 ; ++j) {
        ...
        dst[index++] = current_pixel;
    }
    index+=2; // skip over last pixel of this line and first pixel of next line
}

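As you move your 3x3 window of 9 pixels across the image, you can "remember" the left-most column of 3 pixels. Then, instead of 9 additions for each pixel, you do a single subtraction for the left-most column leaving the window and 3 additions for the new column entering the window on the right, i.e. 4 calculations instead of 9.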
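
A rough, untested-in-production sketch of that sliding-window idea follows. It keeps the sums of the window's left and middle columns in local variables and updates a running window sum as the window moves right. The name box_blur_sliding is hypothetical, and for brevity it handles a single 8-bit channel of an n x n image; with the pixel struct the same bookkeeping would be repeated for red, green and blue.

```c
#include <stddef.h>

/* Hypothetical helper: 3x3 box blur of one 8-bit channel, n x n, src -> dst.
 * src and dst must be distinct buffers. Only interior pixels are written. */
void box_blur_sliding(const unsigned char *src, unsigned char *dst, int n)
{
    for (int i = 1; i < n - 1; ++i) {
        const unsigned char *up   = src + (size_t)(i - 1) * n;
        const unsigned char *mid  = src + (size_t)i * n;
        const unsigned char *down = src + (size_t)(i + 1) * n;
        unsigned char *out = dst + (size_t)i * n;

        /* Column sums for columns 0 and 1: the left and middle columns
         * of the first window in this row. */
        int left   = up[0] + mid[0] + down[0];
        int middle = up[1] + mid[1] + down[1];
        int sum    = left + middle;   /* window sum without its right column */

        for (int j = 1; j < n - 1; ++j) {
            /* Three pixels of the column entering on the right. */
            int right = up[j + 1] + mid[j + 1] + down[j + 1];
            sum += right;
            out[j] = (unsigned char)(sum / 9);

            /* One subtraction for the column leaving on the left. */
            sum -= left;
            left = middle;
            middle = right;
        }
    }
}
```

The remembered column sums are what make the single subtraction possible: without them, removing the left-most column would itself cost three operations.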

【Comments】:

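Hi Mark, can you elaborate? I'm not sure I understand how to implement your idea. Thanks a lot!
I tried to implement your first suggestion and my image came out all white, any idea why?
Try running under your debugger, set a breakpoint where the modified code starts, and single-step through it.
Got it. I'm now trying to remove the index calculations... thanks a lot!
Hi, in your second suggestion, I don't understand how we can use calcIndex before the loop, when i and j are not initialized? Also, just to try it, I did that and got an "lvalue required as left operand of assignment" error...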