Pytesseract Not Recognising Text

【Question title】: Pytesseract Not Recognising Text
【Posted】: 2022-01-02 23:47:05
【Question】:

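I am trying to use Pytesseract to read the numbers from the image below:

Unfortunately, even after applying grayscale conversion, thresholding, noise removal, or Canny edge detection, the program returns nothing. When I use a config that whitelists only digits and $/, it stops detecting text even in the high-resolution image.

The code is as follows: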


class NumberAnalyser:

    # boilerplate code to pre-process image
    # get grayscale image
    def get_grayscale(self, image):
        return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # noise removal
    def remove_noise(self, image):
        return cv2.medianBlur(image, 5)

    # thresholding
    def thresholding(self, image):
        gray = self.get_grayscale(image)
        (T, threshInv) = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
        # visualize only the masked regions in the image
        masked = cv2.bitwise_not(gray, gray, mask=threshInv)
        ret, thresh1 = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
        ret, thresh2 = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
        ret, thresh3 = cv2.threshold(gray, 127, 255, cv2.THRESH_TRUNC)
        ret, thresh4 = cv2.threshold(gray, 127, 255, cv2.THRESH_TOZERO)
        ret, thresh5 = cv2.threshold(gray, 127, 255, cv2.THRESH_TOZERO_INV)
        return thresh4

    # dilation
    def dilate(self, image):
        kernel = np.ones((5, 5), np.uint8)
        return cv2.dilate(image, kernel, iterations=1)

    # erosion
    def erode(self, image):
        kernel = np.ones((5, 5), np.uint8)
        return cv2.erode(image, kernel, iterations=1)

    # opening - erosion followed by dilation
    def opening(self, image):
        kernel = np.ones((5, 5), np.uint8)
        return cv2.morphologyEx(image, cv2.MORPH_OPEN, kernel)

    # canny edge detection
    def canny(self, image):
        return cv2.Canny(image, 100, 200)

    # skew correction
    def deskew(self, image):
        coords = np.column_stack(np.where(image > 0))
        angle = cv2.minAreaRect(coords)[-1]
        if angle < -45:
            angle = -(90 + angle)
        else:
            angle = -angle
        # rotate for both branches; previously this ran only in the else branch
        (h, w) = image.shape[:2]
        center = (w // 2, h // 2)
        M = cv2.getRotationMatrix2D(center, angle, 1.0)
        rotated = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
        return rotated

    # template matching
    def match_template(self, image, template):
        return cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)

    def numbers(self, img_path):

        reader = cv2.imread(img_path)
        # reader = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_RGB2BGR)

        gray = self.get_grayscale(reader)
        thresh = self.thresholding(reader)
        opening = self.opening(reader)
        canny = self.canny(reader)
        noiseless = self.remove_noise(reader)

        # cv2.imshow('canny', canny)
        # cv2.waitKey(0)
        # cv2.imshow('gray', gray)
        # cv2.waitKey(0)
        cv2.imshow('threshold', thresh)
        cv2.waitKey(0)
        # cv2.imshow('opening', opening)
        # cv2.waitKey(0)
        # cv2.imshow('noise removal', noiseless)
        # cv2.waitKey(0)
        # cv2.imshow('og', reader)
        # cv2.waitKey(0)

        print('yes')
        print(pt.image_to_string(thresh, config='--psm 11, -c tessedit_char_whitelist=$,0123456789'))

Adding or removing the --psm 11 option makes no difference.

Any help would be greatly appreciated!

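【Comments】:

This is not a minimal reproducible example. The code will not run as-is. Too many functions are missing (self.opening, self.canny, etc.).
Hi @bfris, I've added the necessary functions! Hope this is OK.
@ckyzm. Better, but you still need the imports, an instantiation of the class, and a call to the right function. We should be able to cut and paste your code and run it, so that we can work on the problem directly.

Tags: opencv image-processing computer-vision python-tesseract pytesser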

【Answer 1】:

You are applying several simple thresholds in a row, but you should also test other kinds of thresholding, such as adaptive and inRange.

For the given examples, inRange thresholding gives the following.

Result for the high-resolution image:

Output with version 0.38:

20000
4.000
100

Result for the low-resolution image:

Output with version 0.38:

44.900
16.000
34

Unfortunately, only the middle number is recognised correctly. If you tune the range values, the resulting image may give better results.

Code:

import cv2
import pytesseract
from numpy import array

img = cv2.imread("eO1XG.png")  # Load the images: high-res: l9Zbt.png, low-res: eO1XG.png
img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)  # cv2.cv2 is not available in recent opencv-python builds
msk = cv2.inRange(img, array([94, 0, 196]), array([179, 84, 255]))  # for low resolution
# msk = cv2.inRange(img, array([0, 0, 0]), array([179, 26, 255]))  # for high resolution
krn = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 3))
dlt = cv2.dilate(msk, krn, iterations=1)
thr = 255 - cv2.bitwise_and(dlt, msk)
txt = pytesseract.image_to_string(thr, config='--psm 6 digits')
print(txt)
cv2.imshow("", thr)
cv2.waitKey(0)

【Comments】:

Thank you so much! I'll try it right now, but I just wanted to say thanks for such a comprehensive solution!