Pytesseract fails to recognize '3'

【Question title】: Pytesseract fails to recognize '3'
【Posted】: 2021-05-18 14:56:06
【Question description】:

from PIL import Image
import pytesseract, time, PADBS
pytesseract.pytesseract.tesseract_cmd = r"C:/tesseract/Tesseract-OCR/tesseract.exe"

image = Image.open('3.png')
print(pytesseract.image_to_string(image))

[Image with '3'] [Image with '10']

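When I try to read '3.png', the script finishes without printing anything. When I read '10.png', the number is recognized correctly. I tried running it with different configurations, such as --oem 3 -psm 13, and I also tried --oem 1 through 3, but nothing worked. What could be the reason it fails to recognize this number? What can I change in the code to make it work?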

【Question discussion】:

Tags: python python-tesseract python-3.9

【Solution 1】:

I think you missed page segmentation mode 6:

6 Assume a single uniform block of text. (Source)

With version 4.1.1, the result is 3.

Code:

import cv2
import pytesseract

# Load the image
img = cv2.imread("3.png")

# Convert to the gray-scale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# OCR
txt = pytesseract.image_to_string(gry, config="--psm 6")

# Print
print(pytesseract.get_tesseract_version())
print(txt)

# Display
cv2.imshow("", gry)
cv2.waitKey(0)

Result:

4.1.1
3
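
If mode 6 does not work for a different image, one option is to try several page segmentation modes and compare the output. The sketch below is not part of the original answer. The helper names build_config and try_psm_modes are hypothetical, the list of modes is just a guess at which ones suit short text (6 block, 7 line, 8 word, 10 single character, 13 raw line), and the digit whitelist assumes the image contains only digits. Note that Tesseract 4.x documents the flag as --psm with two dashes, and whether tessedit_char_whitelist is honoured can depend on the Tesseract version and engine.

```python
def build_config(psm, oem=None, whitelist=None):
    """Build a Tesseract config string, e.g. '--oem 3 --psm 10'."""
    parts = []
    if oem is not None:
        parts.append(f"--oem {oem}")
    # Tesseract 4.x expects two dashes here: --psm, not -psm.
    parts.append(f"--psm {psm}")
    if whitelist:
        # Assumption: the image holds only these characters.
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def try_psm_modes(image, modes=(6, 7, 8, 10, 13), whitelist="0123456789"):
    """Run OCR once per page segmentation mode and return {psm: text}."""
    # Imported lazily so build_config can be used without Tesseract installed.
    import pytesseract

    results = {}
    for psm in modes:
        config = build_config(psm, whitelist=whitelist)
        text = pytesseract.image_to_string(image, config=config)
        results[psm] = text.strip()
    return results
```

Calling try_psm_modes(gry) with the gray-scale image from the code above returns a dictionary mapping each mode to the recognized text, so empty results are easy to spot. Which modes succeed depends on the image and on the installed Tesseract version.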

【Discussion】: