Extract a page from a uniform background in an image: answers

【Question title】: Extract a page from a uniform background in an image
【Posted】: 2015-08-09 19:32:36
【Question】:

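If I have an image in which a page of text was photographed against a uniform background, how can I automatically detect the boundary between the paper and the background?

An example of the kind of image I want to process is shown below. The images I will be working with consist of a single page on a uniform background, and the page can be rotated by any angle.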

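【Comments】:

StackOverflow is not a code-writing service. What have you tried so far?
I'm just asking for any approach that would work.
I think you need to be more specific about the problem constraints. Can the page be rotated? Can it be any size? Can there be more than one page in an image? Are there other rectangular objects in the image?
Yes, this question is too unconstrained. I agree with @eigenchris about the kinds of scenarios you might run into.
So only one page appears, on a uniform background. The page can be rotated.

Tags: image matlab image-processing computer-vision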

【Solution 1】:

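A simple approach is to convert the image to grayscale and threshold it at some known value. The problem with this is that we are applying a single global threshold, so if the threshold is set too high, some of the paper at the bottom of the image will be lost. If you set it too low, you will certainly capture the paper, but you will also include many background pixels, which may be hard to remove with post-processing.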
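To make the failure mode concrete, here is a minimal sketch of global thresholding. It is not part of the original answer: the synthetic image, the illumination gradient, and the threshold `T` are all assumptions chosen purely for illustration.

```matlab
% Minimal sketch of global thresholding (illustrative, synthetic data).
% A bright "page" sits on a darker background, and an illumination
% gradient darkens the image towards the bottom rows.
rows = 200; cols = 200;
im = 60 * ones(rows, cols);             % dark background
im(40:160, 50:150) = 200;               % bright page
shade = linspace(1, 0.4, rows)';        % lighting falls off towards the bottom
im = im .* repmat(shade, 1, cols);

T = 120;                                % hypothetical, hand-picked threshold
mask = im > T;                          % one cut-off applied to every pixel

% Fraction of the true page area that survives the threshold
page = false(rows, cols);
page(40:160, 50:150) = true;
recovered = nnz(mask & page) / nnz(page);
fprintf('Page pixels recovered: %.0f%%\n', 100 * recovered);
```

With this particular gradient, the lower part of the page falls below `T` and is lost even though no background pixel is picked up, which is the trade-off described above.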

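One thing I can suggest is an adaptive thresholding algorithm. One that has worked for me in the past is the Bradley-Roth adaptive thresholding algorithm. You can read about it in a post that I commented on a while back: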

Bradley Adaptive Thresholding -- Confused (questions)

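If you want the gist of it, though, you first compute the integral image of the grayscale version of the image. The integral image matters because it lets you compute the sum of the pixels inside any window in O(1) time. Computing the integral image itself takes O(n^2) for an n x n image, but you only need to do it once. Using the integral image, you look at the s x s neighbourhood around each pixel and check whether the pixel's intensity is more than t% below the average intensity of that window. If it is, the pixel is classified as background; otherwise it is classified as foreground. This is adaptive because the threshold is derived from each pixel's local neighbourhood rather than from a single global value.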
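The constant-time window sum can be sketched as follows. This is an illustrative example rather than the answer's own code: the small `magic(6)` matrix stands in for a grayscale image, and the zero-padded copy `P` is an assumption made here to avoid special cases at the top and left edges.

```matlab
% Minimal sketch: window sums from an integral image (illustrative only).
im = magic(6);                          % small stand-in for a grayscale image
intImage = cumsum(cumsum(im, 2), 1);    % intImage(r,c) = sum of im(1:r, 1:c)

% Pad with a leading row and column of zeros so that windows touching
% the top or left edge need no special handling.
P = zeros(size(im) + 1);
P(2:end, 2:end) = intImage;

% Sum over rows r1..r2 and columns c1..c2 using only four lookups
r1 = 2; r2 = 4; c1 = 3; c2 = 5;
winSum = P(r2+1, c2+1) - P(r1, c2+1) - P(r2+1, c1) + P(r1, c1);

% Cross-check against the direct sum, then derive the window mean
direct = sum(sum(im(r1:r2, c1:c2)));
assert(winSum == direct);
winMean = winSum / ((r2 - r1 + 1) * (c2 - c1 + 1));
```

The four lookups cost the same regardless of window size, which is what makes a large window such as s = 500 affordable for every pixel.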

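I've written an implementation of the Bradley-Roth algorithm for you here. The default parameters are s equal to 1/8 of the image width and t equal to 15%. You can therefore call it like this to use the defaults: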

out = adaptiveThreshold(im);

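im is the input image and out is a binary image that marks each pixel as foreground (logical true) or background (logical false). You can override the second and third input arguments: s is the size of the thresholding window and t is the percentage discussed above. The call then looks like this: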

out = adaptiveThreshold(im, s, t);

The code for the algorithm looks like this:

function [out] = adaptiveThreshold(im, s, t)

%// Error checking of the input
%// Default value for s is 1/8th the width of the image
%// Must make sure that this is a whole number
if nargin <= 1, s = round(size(im,2) / 8); end

%// Default value for t is 15
%// t is used to determine whether the current pixel is t% lower than the
%// average in the particular neighbourhood
if nargin <= 2, t = 15; end

%// Too few or too many arguments?
if nargin == 0, error('Too few arguments'); end
if nargin >= 4, error('Too many arguments'); end

%// Convert to grayscale if necessary then cast to double to ensure no
%// saturation
if size(im, 3) == 3
    im = double(rgb2gray(im));
elseif size(im, 3) == 1
    im = double(im);
else
    error('Incompatible image: Must be a colour or grayscale image');
end

%// Compute integral image
intImage = cumsum(cumsum(im, 2), 1);

%// Define grid of points
[rows, cols] = size(im);
[X,Y] = meshgrid(1:cols, 1:rows);

%// Ensure s is even so that we are able to index the image properly
s = s + mod(s,2);

%// Access the four corners of each neighbourhood
x1 = X - s/2; x2 = X + s/2;
y1 = Y - s/2; y2 = Y + s/2;

%// Ensure no co-ordinates are out of bounds
x1(x1 < 1) = 1;
x2(x2 > cols) = cols;
y1(y1 < 1) = 1;
y2(y2 > rows) = rows;

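%// Count how many pixels there are in each neighbourhood
%// (approximate: unless the window is clipped at the top or left edge,
%// the sums computed below span one extra row and column, so the mean is
%// slightly overestimated; this is negligible for large s)
count = (x2 - x1) .* (y2 - y1);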

%// Compute row and column co-ordinates to access each corner of the
%// neighbourhood for the integral image
f1_x = x2; f1_y = y2;
f2_x = x2; f2_y = y1 - 1; f2_y(f2_y < 1) = 1;
f3_x = x1 - 1; f3_x(f3_x < 1) = 1; f3_y = y2;
f4_x = f3_x; f4_y = f2_y;

%// Compute 1D linear indices for each of the corners
ind_f1 = sub2ind([rows cols], f1_y, f1_x);
ind_f2 = sub2ind([rows cols], f2_y, f2_x);
ind_f3 = sub2ind([rows cols], f3_y, f3_x);
ind_f4 = sub2ind([rows cols], f4_y, f4_x);

%// Calculate the areas for each of the neighbourhoods
sums = intImage(ind_f1) - intImage(ind_f2) - intImage(ind_f3) + ...
    intImage(ind_f4);

%// Determine whether the summed area surpasses a threshold
%// Set this output to 0 if it doesn't
locs = (im .* count) <= (sums * (100 - t) / 100);
out = true(size(im));
out(locs) = false;

end

Using your image with s = 500 and t = 5, this is the code and the image I get:

im = imread('http://i.stack.imgur.com/MEcaz.jpg');
out = adaptiveThreshold(im, 500, 5);
imshow(out);

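You can see some spurious white pixels at the bottom of the image, and there are some holes inside the paper that need to be filled. So let's use some morphology: declare a 15 x 15 square structuring element, perform an opening to remove the noisy pixels, and then fill the holes: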

se = strel('square', 15);
out = imopen(out, se);
out = imfill(out, 'holes');
imshow(out);

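This is what I get:

Not bad, eh? Now, if you want to see what the segmented image looks like, we can take this mask and multiply it with the original image. That way, any pixels that belong to the paper are kept and those that belong to the background are set to zero: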

out_colour = bsxfun(@times, im, uint8(out));
imshow(out_colour);

We get this:

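You will have to play around with the parameters until it works for you, but the values above are the ones I used to get it working for the particular page you showed us. Image processing is all about trial and error, and about putting the processing steps in the right order until you get something good enough for your purposes.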
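Since the question asks for the boundary between the paper and the background, one possible follow-up is to trace the outline of the cleaned-up mask. The sketch below is an assumption layered on top of the answer, not something it describes: it uses a synthetic rectangular mask in place of the real `out`, assumes the Image Processing Toolbox is available, and assumes the page is the largest connected region.

```matlab
% Minimal sketch (assumes the Image Processing Toolbox and a single page).
% A synthetic mask stands in for the cleaned-up binary image.
out = false(200, 300);
out(50:150, 80:220) = true;

% Keep only the largest connected component in case stray blobs remain
out = bwareafilt(out, 1);

% Trace the outer boundary; each row of B{1} is a [row, col] pair
B = bwboundaries(out, 'noholes');
boundary = B{1};

% Orientation (in degrees) of the region, a rough estimate of page rotation
stats = regionprops(out, 'Orientation', 'BoundingBox');

% Overlay the traced boundary on the mask
imshow(out); hold on;
plot(boundary(:,2), boundary(:,1), 'r', 'LineWidth', 2);
hold off;
```

The `Orientation` value comes from an ellipse fitted to the region, so for a rectangular page it should be treated as an approximation of the rotation angle rather than an exact measurement.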

Happy image filtering!

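【Discussion】:

@roni - Thanks :) :) :)
Great write-up. I guess SO is a code-writing service after all :p
@excaza Haha. The question was interesting... that's why I wrote something up. If I hadn't been interested at all, I would have voted to close. In fact, I wasn't going to touch this until I tried a few things myself and was surprised by the results I got... so I figured I'd share them with everyone!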