【问题标题】:image proccessing further optimization图像处理进一步优化
【发布时间】:2018-07-05 09:28:54
【问题描述】:

我是优化新手,我被分配了一项任务来优化一个尽可能多地处理图像的函数。它拍摄一张图像,对其进行模糊处理,然后保存模糊的图像,然后继续锐化图像,同时保存锐化的图像。

这是我的代码:

typedef struct {
   unsigned char red;
   unsigned char green;
   unsigned char blue;
} pixel;

// I delete the other struct because we can do the same operations with     use of only addresses

//use macro instead of function is more efficient
#define calculateIndex(i, j, n) ((i)*(n)+(j))


// I combine all the functions in one because it is time consuming
void myfunction(Image *image, char* srcImgpName, char* blurRsltImgName,    char* sharpRsltImgName) {
    // use variable from type 'register int' is much more efficient from 'int'
    register int i,j, ii, jj, sum_red, sum_green, sum_blue; 
    //using local variable is much more efficient than using pointer to   pixels from the original image,and updat its value in each iteration
    pixel current_pixel , p;

    //dst will point on the first pixel in the image
    pixel* dst = (pixel*)image->data;

    int squareN = n*n;
    //instead of multiply by 3 - I used shift 
    register int sizeToAllocate = ((squareN)<<1)+(squareN); // use variable from type 'register int' is much more efficient from 'int'
    pixel* src = malloc(sizeToAllocate);

    register int index;

    //memcpy replace the old functions that converts chars to pixels or pixels to chars. it is very efficient and build-in in c libraries
    memcpy(src, dst, sizeToAllocate);

    ///////////////////////////////////////// first step : smooth //////////////////////////////////////////////////////////////////////


    /**the smooth blur is step that apply the blur-kernel (matrix of ints) over each pixel in the bouns - and make the image more smooth.
*this function was originally used this matrix :
* [1, 1, 1]
* [1, 1, 1]
* [1, 1, 1]
*because the matrix is full of 1 , we don't really need it - the access to the matrix is very expensive . instead of the matrix I used 
*primitive variable.
*/

    //the loops are starting with 1 and not with 0 because we need to check only the pixels with 8 neighbors around them
    index = calculateIndex(1, 1, n);
    for (i = 1 ; i < n - 1; ++i) {
        for (j =  1 ; j < n - 1 ; ++j) {
            // I used this variables as counters to the colors' values around a specific pixel
            sum_red = 0;
            sum_green = 0;
            sum_blue = 0;

            for(ii = i-1; ii <= i+1; ++ii) {
                for(jj =j-1; jj <= j+1; ++jj) {
                    //take care of the [ii,jj] pixel in the matrix
                    //calculate the adrees of the current pixel
                    pixel p = src[calculateIndex(ii, jj, n)];       
                    //sum the colors' values of the neighbors of the current pixel
                    sum_red += p.red;
                    sum_green +=  p.green;
                    sum_blue += p.blue;
                }
            }
            //calculate the avarage of the colors' values around the current pixel - as written in the instructions
            sum_red = (((sum_red) * 0xE38F) >> 19);//instead of dividing by 9 - I used shift because it is more efficient
            sum_green = (((sum_green) * 0xE38F) >> 19);//instead of dividing by 9 - I used shift because it is more efficient
            sum_blue = (((sum_blue) * 0xE38F) >> 19);//instead of dividing by 9 - I used shift because it is more efficient

            current_pixel.red = (unsigned char)sum_red;
            current_pixel.green = (unsigned char)sum_green;
            current_pixel.blue = (unsigned char)sum_blue;
            dst[index++] = current_pixel;
        }
    }
    // write result image to file
    writeBMP(image, srcImgpName, blurRsltImgName);
    
    //memcpy replace the old functions that converts chars to pixels or pixels to chars. it is very efficient and build-in in c libraries
    memcpy(src, dst, sizeToAllocate);


    ///////////////////////////////////////// second step : sharp //////////////////////////////////////////////////////////////////////



    /** I want to sharp the smooth image . In this step I apply the sharpen kernel (matrix of ints) over each pixel in the bouns - and make the image more sharp.
*this function was originally used this matrix :
* [-1, -1, -1]
* [-1, 9, -1]
* [-1, -1, -1]
*because the matrix is full of (-1) , we don't really need it - the access to the matrix is very expensive . instead of the matrix I used 
*primitive variable. I operato like that : insted of multiply in (-1) in the end of the step , I define counter initializes with zero , and
*substruct all te colors' values from it. the result is actually the same as multiply by (-1), in more efficient way.
*/

    //the loops are starting with 1 and not with 0 because we need to check only the pixels with 8 neighbors around them
    for (i = 1 ; i < n-1; ++i) {
        for (j =  1 ; j < n-1 ; ++j) {
            // I used this variables as counters to the colors' values around a specific pixel
            sum_red = 0;
            sum_green = 0;
            sum_blue = 0;

            // Do central pixel first
            p=src[calculateIndex(i,j,n)];
            sum_red   = 10*p.red;
            sum_green = 10*p.green;
            sum_blue  = 10*p.blue;

            for(ii =i-1; ii <= i + 1; ++ii) {
                for(jj = j-1; jj <= j + 1; ++jj) {
                    p = src[calculateIndex(ii, jj, n)];
                    //operate according to the instructions
                    sum_red -= p.red;
                    sum_green -= p.green;
                    sum_blue -= p.blue;
                }
            }

            //each pixel's colors' values must match the range [0,255] - I used the idea from the original code

            //the red value must be in the range [0,255]
            if (sum_red < 0) {
                sum_red = 0;
            } else if (sum_red > 255 ) {
                sum_red = 255;
            }
            current_pixel.red = (unsigned char)sum_red;


            //the green value must be in the range [0,255]
            if (sum_green < 0) {
            sum_green = 0;
            } else if (sum_green > 255 ) {
            sum_green = 255;
            }
            current_pixel.green = (unsigned char)sum_green;


            //the blue value must be in the range [0,255]
            if (sum_blue < 0) {
                sum_blue = 0;
            } else if (sum_blue > 255 ) {
                sum_blue = 255;
            }
            current_pixel.blue = (unsigned char)sum_blue;


            // put the updated pixel in [i,j] in the image
            dst[calculateIndex(i, j, n)] = current_pixel;
        }
    }

    //free the allocated space to prevent memory leaks
    free(src);

    // write result image to file
    writeBMP(image, srcImgpName, sharpRsltImgName);
}

我想问一下 if 语句,有什么更好的东西可以代替它们吗?而且更一般地说,任何人都可以在这里发现优化错误,或者可以提供他的意见吗?

非常感谢!


更新代码:

typedef struct {
   unsigned char red;
   unsigned char green;
   unsigned char blue;
} pixel;

// I delete the other struct because we can do the same operations with use of only addresses

//use macro instead of function is more efficient
#define calculateIndex(i, j, n) ((i)*(n)+(j))


// I combine all the functions in one because it is time consuming
void myfunction(Image *image, char* srcImgpName, char* blurRsltImgName, char* sharpRsltImgName) {
    // use variable from type 'register int' is much more efficient from 'int'
register int i,j, ii, jj, sum_red, sum_green, sum_blue; 
    //using local variable is much more efficient than using pointer to pixels from the original image,and updat its value in each iteration
    pixel current_pixel , p;

    //dst will point on the first pixel in the image
    pixel* dst = (pixel*)image->data;

    int squareN = n*n;
    //instead of multiply by 3 - I used shift 
    register int sizeToAllocate = ((squareN)<<1)+(squareN); // use variable from type 'register int' is much more efficient from 'int'
    pixel* src = malloc(sizeToAllocate);

    register int index;

    //memcpy replace the old functions that converts chars to pixels or pixels to chars. it is very efficient and build-in in c libraries
    memcpy(src, dst, sizeToAllocate);

    ///////////////////////////////////////// first step : smooth //////////////////////////////////////////////////////////////////////


    /**the smooth blur is step that apply the blur-kernel (matrix of ints) over each pixel in the bouns - and make the image more smooth.
*this function was originally used this matrix :
* [1, 1, 1]
* [1, 1, 1]
* [1, 1, 1]
*because the matrix is full of 1 , we don't really need it - the access to the matrix is very expensive . instead of the matrix I used 
*primitive variable.
*/

    //the loops are starting with 1 and not with 0 because we need to check only the pixels with 8 neighbors around them
    index = calculateIndex(1, 1, n);
    for (i = 1 ; i < n - 1; ++i) {
        for (j =  1 ; j < n - 1 ; ++j) {
            // I used this variables as counters to the colors' values around a specific pixel
            sum_red = 0;
            sum_green = 0;
            sum_blue = 0;

            for(ii = i-1; ii <= i+1; ++ii) {
                for(jj =j-1; jj <= j+1; ++jj) {
                    //take care of the [ii,jj] pixel in the matrix
                    //calculate the adrees of the current pixel
                    pixel p = src[calculateIndex(ii, jj, n)];       
                    //sum the colors' values of the neighbors of the current pixel
                    sum_red += p.red;
                    sum_green +=  p.green;
                    sum_blue += p.blue;
                }
            }
            //calculate the avarage of the colors' values around the current pixel - as written in the instructions
            sum_red = (((sum_red) * 0xE38F) >> 19);//instead of dividing by 9 - I used shift because it is more efficient
            sum_green = (((sum_green) * 0xE38F) >> 19);//instead of dividing by 9 - I used shift because it is more efficient
            sum_blue = (((sum_blue) * 0xE38F) >> 19);//instead of dividing by 9 - I used shift because it is more efficient

            current_pixel.red = (unsigned char)sum_red;
            current_pixel.green = (unsigned char)sum_green;
            current_pixel.blue = (unsigned char)sum_blue;
            dst[index++] = current_pixel;
        }
        index += 2;
    }
    // write result image to file
    writeBMP(image, srcImgpName, blurRsltImgName);
    
    //memcpy replace the old functions that converts chars to pixels or pixels to chars. it is very efficient and build-in in c libraries
    memcpy(src, dst, sizeToAllocate);


    ///////////////////////////////////////// second step : sharp //////////////////////////////////////////////////////////////////////



    /** I want to sharp the smooth image . In this step I apply the sharpen kernel (matrix of ints) over each pixel in the bouns - and make the image more sharp.
*this function was originally used this matrix :
* [-1, -1, -1]
* [-1, 9, -1]
* [-1, -1, -1]
*because the matrix is full of (-1) , we don't really need it - the access to the matrix is very expensive . instead of the matrix I used 
*primitive variable. I operato like that : insted of multiply in (-1) in the end of the step , I define counter initializes with zero , and
*substruct all te colors' values from it. the result is actually the same as multiply by (-1), in more efficient way.
*/

    index = calculateIndex(1,1,n);
    //the loops are starting with 1 and not with 0 because we need to check only the pixels with 8 neighbors around them
    for (i = 1 ; i < n-1; ++i) {
        for (j =  1 ; j < n-1 ; ++j) {
            // I used this variables as counters to the colors' values around a specific pixel
            sum_red = 0;
            sum_green = 0;
            sum_blue = 0;

            // Do central pixel first
            p=src[index];
            sum_red   = 10*p.red;
            sum_green = 10*p.green;
            sum_blue  = 10*p.blue;

            for(ii =i-1; ii <= i + 1; ++ii) {
                for(jj = j-1; jj <= j + 1; ++jj) {
                    p = src[calculateIndex(ii, jj, n)];
                    //operate according to the instructions
                    sum_red -= p.red;
                    sum_green -= p.green;
                    sum_blue -= p.blue;
                }
                index += 2;
            }

            //each pixel's colors' values must match the range [0,255] - I used the idea from the original code

            //the red value must be in the range [0,255]
            if (sum_red < 0) {
                sum_red = 0;
            } else if (sum_red > 255 ) {
                sum_red = 255;
            }
            current_pixel.red = (unsigned char)sum_red;


            //the green value must be in the range [0,255]
            if (sum_green < 0) {
                sum_green = 0;
            } else if (sum_green > 255 ) {
                sum_green = 255;
            }
            current_pixel.green = (unsigned char)sum_green;


            //the blue value must be in the range [0,255]
            if (sum_blue < 0) {
                sum_blue = 0;
            } else if (sum_blue > 255 ) {
                sum_blue = 255;
            }
            current_pixel.blue = (unsigned char)sum_blue;


            // put the updated pixel in [i,j] in the image
            dst[calculateIndex(i, j, n)] = current_pixel;
        }
    }

    //free the allocated space to prevent memory leaks
    free(src);

    // write result image to file
    writeBMP(image, srcImgpName, sharpRsltImgName);
}

----------------------------------- -------------------------------------------更新代码:

typedef struct {
   unsigned char red;
   unsigned char green;
   unsigned char blue;
} pixel;

// I delete the other struct because we can do the same operations with use of only addresses

//use macro instead of function is more efficient
#define calculateIndex(i, j, n) ((i)*(n)+(j))


// I combine all the functions in one because it is time consuming
void myfunction(Image *image, char* srcImgpName, char* blurRsltImgName, char* sharpRsltImgName) {
    // use variable from type 'register int' is much more efficient from 'int'
    register int i,j, ii, jj, sum_red, sum_green, sum_blue; 
    //using local variable is much more efficient than using pointer to pixels from the original image,and updat its value in each iteration
    pixel current_pixel , p;

    //dst will point on the first pixel in the image
    pixel* dst = (pixel*)image->data;

    int squareN = n*n;
    //instead of multiply by 3 - I used shift 
    register int sizeToAllocate = ((squareN)<<1)+(squareN); // use    variable from type 'register int' is much more efficient from 'int'
    pixel* src = malloc(sizeToAllocate);

    register int index;

    //memcpy replace the old functions that converts chars to pixels or pixels to chars. it is very efficient and build-in in c libraries
    memcpy(src, dst, sizeToAllocate);

    ///////////////////////////////////////// first step : smooth //////////////////////////////////////////////////////////////////////


    /**the smooth blur is step that apply the blur-kernel (matrix of ints) over each pixel in the bouns - and make the image more smooth.
*this function was originally used this matrix :
* [1, 1, 1]
* [1, 1, 1]
* [1, 1, 1]
*because the matrix is full of 1 , we don't really need it - the access to the matrix is very expensive . instead of the matrix I used 
*primitive variable.
*/

    //the loops are starting with 1 and not with 0 because we need to check only the pixels with 8 neighbors around them
    index = n + 1;
    for (i = 1 ; i < n - 1; ++i) {
        for (j =  1 ; j < n - 1 ; ++j) {
            // I used this variables as counters to the colors' values around a specific pixel
            sum_red = 0;
            sum_green = 0;
            sum_blue = 0;

            for(ii = i-1; ii <= i+1; ++ii) {
                for(jj =j-1; jj <= j+1; ++jj) {
                    //take care of the [ii,jj] pixel in the matrix
                    //calculate the adrees of the current pixel
                    pixel p = src[calculateIndex(ii, jj, n)];       
                    //sum the colors' values of the neighbors of the current pixel
                    sum_red += p.red;
                    sum_green +=  p.green;
                    sum_blue += p.blue;
                }
            }
            //calculate the avarage of the colors' values around the current pixel - as written in the instructions
            sum_red = (((sum_red) * 0xE38F) >> 19);//instead of dividing by 9 - I used shift because it is more efficient
            sum_green = (((sum_green) * 0xE38F) >> 19);//instead of dividing by 9 - I used shift because it is more efficient
            sum_blue = (((sum_blue) * 0xE38F) >> 19);//instead of dividing by 9 - I used shift because it is more efficient

            current_pixel.red = (unsigned char)sum_red;
            current_pixel.green = (unsigned char)sum_green;
            current_pixel.blue = (unsigned char)sum_blue;
            dst[index++] = current_pixel;
        }
        index += 2;
    }
    // write result image to file
    writeBMP(image, srcImgpName, blurRsltImgName);
    
    //memcpy replace the old functions that converts chars to pixels or pixels to chars. it is very efficient and build-in in c libraries
    memcpy(src, dst, sizeToAllocate);


    ///////////////////////////////////////// second step : sharp //////////////////////////////////////////////////////////////////////



    /** I want to sharp the smooth image . In this step I apply the sharpen kernel (matrix of ints) over each pixel in the bouns - and make the image more sharp.
*this function was originally used this matrix :
* [-1, -1, -1]
* [-1, 9, -1]
* [-1, -1, -1]
*because the matrix is full of (-1) , we don't really need it - the access to the matrix is very expensive . instead of the matrix I used 
*primitive variable. I operate like that : instead of multiply in (-1) in the end of the step , I define counter initializes with zero , and
*substruct all te colors' values from it. the result is actually the same as multiply by (-1), in more efficient way.
*/

    index = calculateIndex(1,1,n);
    //the loops are starting with 1 and not with 0 because we need to check only the pixels with 8 neighbors around them
    for (i = 1 ; i < n-1; ++i) {
        for (j =  1 ; j < n-1 ; ++j) {
            // I used this variables as counters to the colors' values around a specific pixel
            sum_red = 0;
            sum_green = 0;
            sum_blue = 0;

            // Do central pixel first
            p=src[index];
            sum_red   = 10*p.red;
            sum_green = 10*p.green;
            sum_blue  = 10*p.blue;

            for(ii =i-1; ii <= i + 1; ++ii) {
                for(jj = j-1; jj <= j + 1; ++jj) {
                    p = src[calculateIndex(ii, jj, n)];
                    //operate according to the instructions
                    sum_red -= p.red;
                    sum_green -= p.green;
                    sum_blue -= p.blue;
                }
            }

            //each pixel's colors' values must match the range [0,255] - I used the idea from the original code

            //the red value must be in the range [0,255]
            if (sum_red < 0) {
                sum_red = 0;
            } else if (sum_red > 255 ) {
                sum_red = 255;
            }
            current_pixel.red = (unsigned char)sum_red;


            //the green value must be in the range [0,255]
            if (sum_green < 0) {
                sum_green = 0;
            } else if (sum_green > 255 ) {
                sum_green = 255;
            }
            current_pixel.green = (unsigned char)sum_green;


            //the blue value must be in the range [0,255]
            if (sum_blue < 0) {
                sum_blue = 0;
            } else if (sum_blue > 255 ) {
                sum_blue = 255;
            }
            current_pixel.blue = (unsigned char)sum_blue;


            // put the updated pixel in [i,j] in the image
            dst[calculateIndex(i, j, n)] = current_pixel;
        }
        index += 2;
    }

    //free the allocated space to prevent memory leaks
    free(src);

    // write result image to file
    writeBMP(image, srcImgpName, sharpRsltImgName);
}

【问题讨论】:

  • 像高斯模糊滤镜这样的模糊滤镜是可分离的。这是一个非常重要的优化。此外,您可以并行执行一些操作。事实上,在这种情况下,“shift”作为优化是微不足道的。
  • 我的建议,在您执行@IharobAlAsimi 建议的重新:可分离过滤器之前,不要费心优化任何东西。然后,您需要衡量代码的性能。
  • 如果您有工作代码,并且您只是希望人们审查它以尝试改进它,您应该改为询问Code Review;这正是它被创建的原因。本网站是针对您遇到的代码问题提出的问题,而不是关于改进或优化的建议。
  • 另外,根据图像的大小,您可以使用快速傅里叶变换来执行卷积,因为它只是点对点乘法而不是求和。但是如果图像不够大,那么较小的内核和普通卷积可能会更快,在计算内核 + 图像和卷积的 FFT 的成本和直接卷积的成本之间存在权衡。 /跨度>
  • ifs 在四个嵌套循环中。即使您对它们进行了优化,您仍然会遇到缓慢的算法。重新思考算法。

标签: c image image-processing optimization


【解决方案1】:

一些通用优化指南:

  1. 如果您在 x86 上运行,请编译为 64 位二进制文​​件。 x86 实际上是一个寄存器匮乏的 CPU。在 32 位模式下,您几乎只有 5 或 6 个 32 位通用寄存器可用,如果您在 GCC 上使用 -fomit-frame-pointer 之类的优化进行编译,则只能获得“全部”6 个。在 64 位模式下,您将拥有 13 个或 14 个 64 位通用寄存器。

  2. 获得一个好的编译器并使用尽可能高的通用优化级别。

  3. 简介!轮廓!轮廓!实际分析您的代码,以便真正知道性能瓶颈在哪里。任何关于性能瓶颈位置的猜测都可能是错误的。

  4. 一旦找到瓶颈,检查编译器生成的实际指令并查看瓶颈区域,看看发生了什么。也许瓶颈在于编译器由于寄存器压力而不得不做很多register spilling and filling。如果您可以深入到指令级别,这将非常有用。

  5. 使用分析和检查生成的指令的见解来改进您的代码和编译参数。例如,如果您看到大量寄存器溢出和填充,则需要减少寄存器压力,可能通过手动合并循环或使用编译器选项禁用预取。

  6. 尝试不同的页面大小选项。如果单行像素占页面大小的很大一部分,则到达其他行更有可能到达另一页并导致TLB miss。使用更大的内存页面可能会显着减少这种情况。

您的代码的一些具体想法:

  1. 只使用一个外循环。您必须尝试找到处理“额外”边缘像素的最快方法。最快的方法可能是不做任何特殊的事情,像“正常”像素一样直接滚动它们,然后忽略它们中的值。

  2. 手动展开两个内部循环 - 你只做 9 个像素。

  3. 不要使用calculateIndex() - 使用当前像素的地址 并通过从当前像素地址中减去或添加适当的值来查找其他像素。例如,内部循环中左上角像素的地址类似于currentPixelAddress - n - 1

这些会将您的四深嵌套循环转换为单个循环,只需很少的索引计算。

【讨论】:

  • 好建议。在 2) 上更具体一点,您可能会建议在编译器上使用以下开关 -march=native -O3
  • @MarkSetchell 好吧,OP 没有指定正在使用的编译器,所以我不想深入了解。
  • 很公平,这是你的答案 :-) 我经常发现一些 OP 和初学者不理解更有经验的人所建议的重要性,所以他们忽略了这条建议 - 我记得希望多年前我刚开始工作时,人们已经给出了他们所知道的所有东西的具体例子。我们随便说“分析您的代码”,但没有提及如何编译或运行分析。我们说“符号链接最新版本的XYZ”,但不说如何符号链接或从哪里下载。
【解决方案2】:

一些想法 - 未经测试。


您有 if(ii==i &amp;&amp; jj=j) 来测试锐化循环中的中心像素,您对每个像素执行 9 倍。我认为删除 if 并为每个像素执行完全相同的操作会更快,然后在循环外通过添加 10 倍中心像素进行校正。

    // Do central pixel first
    p=src[calculateIndex(i,j,n)];
    sum_red   = 10*p.red;
    sum_green = 10*p.green;
    sum_blue  = 10*p.blue;

    for(ii =i-1; ii <= i + 1; ++ii) {
        for(jj = j-1; jj <= j + 1; ++jj) {
            p = src[calculateIndex(ii, jj, n)];
            //operate according to the instructions
            sum_red -= p.red;
            sum_green -= p.green;
            sum_blue -= p.blue;
        }
    }

dst[calculateIndex(i, j, n)] = current_pixel; 处,您可能可以在循环开始之前计算一次索引,然后在循环内每次写入时递增指针 - 假设您的数组是连续的且未填充。

index=calculateIndex(1,1,n)
for (i = 1 ; i < n - 1; ++i) {
    for (j =  1 ; j < n - 1 ; ++j) {
        ...
        dst[index++] = current_pixel;
    }
    index+=2; // skip over last pixel of this line and first pixel of next line
}

当您在图像上移动 9 像素的 3x3 窗口时,您可以“记住”最左边的 3 像素列,而不是每个像素添加 9 个像素,您将对离开窗口的最左边的列进行一次减法,并为进入右侧窗口的新列进行 3 次加法,即 4 次计算而不是 9 次。

【讨论】:

  • 嗨,马克,你能详细说明一下吗?我不确定我是否了解如何实施您的想法。非常感谢!
  • 我尝试实施您的第一个建议,结果我的图像变成了全白,知道为什么吗?
  • 尝试在你的调试器下运行,并在修改后的代码开始的地方设置一个断点并单步执行。
  • 知道了。我现在正在尝试删除索引计算...非常感谢!
  • 嗨,在你的第二个建议中,我不明白我们如何在 i 和 j 未初始化的循环之前使用 calcIndex ?也只是为了尝试,我尝试这样做并得到一个“需要左值作为赋值的左操作数”错误......
猜你喜欢
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 2021-07-31
  • 1970-01-01
  • 1970-01-01
  • 2020-08-13
  • 1970-01-01
相关资源
最近更新 更多