Remove remains in a letter image with Python

【Question title】: Remove remains in a letter image with Python
【Posted】: 2019-04-29 11:49:32
【Question description】:

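I have a set of images, each representing a letter extracted from an image of a word. Some of the images contain remnants of the adjacent letters, and I want to remove them, but I don't know how.

Some samples

I am using OpenCV and have tried two approaches, but neither worked.

Using findContours: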

def is_contour_bad(c):
    return len(c) < 50

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edged = cv2.Canny(gray, 50, 100)

contours = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if imutils.is_cv2() else contours[1]

mask = np.ones(image.shape[:2], dtype="uint8") * 255

for c in contours:
    # if the contour is bad, draw it on the mask
    if is_contour_bad(c):
        cv2.drawContours(mask, [c], -1, 0, -1)

# remove the contours from the image and show the resulting images
image = cv2.bitwise_and(image, image, mask=mask)
cv2.imshow("After", image)
cv2.waitKey(0)

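I think it doesn't work because the shapes lie on the image border, so cv2.drawContours cannot compute the area correctly and does not remove the interior points.

Using connectedComponentsWithStats: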

cv2.imshow("Image", img)
cv2.waitKey(0)
nb_components, output, stats, centroids = cv2.connectedComponentsWithStats(img)
sizes = stats[1:, -1]
nb_components = nb_components - 1

min_size = 150

img2 = np.zeros((output.shape))
for i in range(0, nb_components):
    if sizes[i] >= min_size:
        img2[output == i + 1] = 255

cv2.imshow("After", img2)
cv2.waitKey(0)

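In this case, I don't know why the small elements on the sides are not recognized as connected components.

Well... I would really appreciate any help!

【Discussion】:

Invert the image, find the contours, and then select the largest contour.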

Tags: python opencv image-processing cv2 outliers

【Solution 1】:

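At the beginning of the question, you mention that the letters were extracted from an image of a word.

So I think you could have done the extraction properly in the first place, and then you would not have run into this problem. I can offer a solution that works both for extracting letters from the original image and for extracting and separating letters from the images you provided.

Solution:

You can use the convex hull coordinates to separate the characters, like this.

Code: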

import cv2
import numpy as np

img = cv2.imread('test.png', 0)
cv2.bitwise_not(img,img)
img2 = img.copy()

ret, threshed_img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
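# [-2:] keeps the last two return values, so this works on OpenCV 3.x (three values) and on 2.x/4.x (two values)
contours, hier = cv2.findContours(threshed_img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2:]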

#--- Black image to be used to draw individual convex hull ---
black = np.zeros_like(img)
contours = sorted(contours, key=lambda ctr: cv2.boundingRect(ctr)[0])

for cnt in contours:
    hull = cv2.convexHull(cnt)

    img3 = img.copy()
    black2 = black.copy()

    #--- Here is where I am filling the contour after finding the convex hull ---
    cv2.drawContours(black2, [hull], -1, (255, 255, 255), -1)
    r, t2 = cv2.threshold(black2, 127, 255, cv2.THRESH_BINARY)
    masked = cv2.bitwise_and(img2, img2, mask = t2)
    cv2.imshow("masked.jpg", masked)
    cv2.waitKey(0)

cv2.destroyAllWindows()

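Output:

So, as I suggested, it is better to use this solution when extracting the characters from the original image, rather than removing noise after extraction.

【Discussion】:

【Solution 2】:

I would try the following: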

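1. Sum along the columns, so that each image is projected onto a vector.
2. Assuming white = 0 and black = 1, find the index of the first value in that vector that equals 0.
3. Remove the image columns to the left of the index from step 2.
4. Reverse the sum vector from step 1.
5. Find the index of the first value that equals 0 in the reversed vector from step 4.
6. Remove the image columns to the right of the reversed index from step 5.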

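This works well for binary images where white = 0 and black = 1. If that is not the case, there are several ways to handle it, including thresholding the image or setting a tolerance level (e.g., in step 2, find the first index value in the vector > tolerance...).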
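
The steps above can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the input is a 2-D array with white = 0 and black = 1, the function name `trim_side_residue` is hypothetical, and returning the image unchanged when no empty column exists is a choice made here, not something the answer specifies.

```python
import numpy as np


def trim_side_residue(binary):
    """Crop the image at the first empty column found from each side.

    `binary` is a 2-D array with background (white) = 0 and ink (black) = 1.
    """
    col_sums = binary.sum(axis=0)            # step 1: column projection
    empty = np.flatnonzero(col_sums == 0)    # indices of all empty columns
    if empty.size == 0:
        return binary                        # assumption: no gap, nothing to cut
    left = empty[0]     # steps 2-3: first empty column seen from the left
    right = empty[-1]   # steps 4-6: first empty column seen from the right
    return binary[:, left:right + 1]


# Small synthetic example: residue on both sides of a 4-column "letter".
demo = np.zeros((5, 10), dtype=np.uint8)
demo[1:3, 0:2] = 1   # residue touching the left border
demo[:, 3:7] = 1     # the letter itself
demo[2, 9] = 1       # residue touching the right border

cropped = trim_side_residue(demo)
print(cropped.shape)  # (5, 7)
```

For a grayscale image with dark ink, something like `binary = (gray < 128).astype(np.uint8)` would produce the expected input. The sketch relies on there being at least one empty column between the letter and the residue on each side; if the letter itself touches a border, the crop will cut into it.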

【Discussion】: