Why Tesseract is not recognizing text properly?

【Question title】: Why Tesseract is not recognizing text properly?
【Posted】: 2021-08-31 00:58:21
【Question】:

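I want to recognize the text in the image shown below. I don't know why the library recognizes some letters in the blank area. I have already tried changing the config parameters.

The function I wrote returns this: Legendary X yp

Here is the code: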

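from PIL import Image
import pytesseract

def get_text(image, coord):
    # Open the source image and crop it to the region given by coord.
    # crop_text is the asker's own helper; its code is not shown in the post.
    im = Image.open(image)
    image_cropped = crop_text(im, coord)
    # --psm 7 tells Tesseract to treat the image as a single line of text.
    text = pytesseract.image_to_string(image_cropped, lang='eng', config='--psm 7')
    print(text)
    return text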

The image I want to extract text from:

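【Comments】:

Is the image shown the result of crop_text, or is it the image passed into get_text? If it is the input image, the code for crop_text and the value of coord are needed as well. See "minimal reproducible example".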
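Since crop_text and coord are not shown in the question, the following is only a hypothetical stand-in to illustrate what the comment is asking about. It assumes coord is a (left, upper, right, lower) pixel box, which is the format Pillow's Image.crop expects; the asker's real helper may work differently.

```python
def crop_text(im, coord):
    """Hypothetical stand-in for the asker's crop_text helper (not shown in the post).

    Assumes coord is (left, upper, right, lower) in pixels, the box format
    that Pillow's Image.crop expects.
    """
    left, upper, right, lower = coord
    # Reject empty or inverted boxes early instead of passing them to Pillow.
    if right <= left or lower <= upper:
        raise ValueError(f'empty or inverted crop box: {coord}')
    return im.crop((left, upper, right, lower))


# Possible usage (file name and coordinates are placeholders):
#   from PIL import Image
#   cropped = crop_text(Image.open('screenshot.png'), (10, 20, 300, 60))
#   cropped.save('debug_crop.png')  # inspect exactly what Tesseract will see
```

Saving the cropped region to disk is a quick way to check whether the box includes extra pixels next to the text, which could explain stray characters such as the trailing "yp".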

Tags: python image-processing ocr tesseract python-tesseract

【Solution 1】:

Without any further (pre-)processing, the following works on my machine:

from PIL import Image
import pytesseract

img = Image.open('mIjNm.png')
text = pytesseract.image_to_string(img, config='--psm 6')
print(text.replace('\n', '').replace('\f', ''))
# Legendary X

----------------------------------------
System information
----------------------------------------
Platform:      Windows-10-10.0.19042-SP0
Python:        3.9.6
PyCharm:       2021.2
Pillow:        8.3.1
pytesseract:   5.0.0-alpha.20201127
----------------------------------------
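The question uses `--psm 7` (treat the image as a single text line) while this answer uses `--psm 6` (assume a single uniform block of text). A minimal sketch for comparing several page segmentation modes on the same image might look like the following. The helper names clean_ocr_text and compare_psm_modes are made up for illustration, and the chosen list of modes is an assumption, not something tested in the thread.

```python
def clean_ocr_text(text):
    """Remove the newline and form-feed characters found in Tesseract output,
    then trim leading and trailing whitespace."""
    return text.replace('\n', '').replace('\f', '').strip()


def compare_psm_modes(image_path, modes=(6, 7, 8, 13)):
    """Run OCR on one image under several --psm values and collect the results.

    Returns a dict mapping each mode number to the cleaned recognized text.
    """
    # Imported lazily so clean_ocr_text stays usable without Tesseract installed.
    from PIL import Image
    import pytesseract

    img = Image.open(image_path)
    results = {}
    for psm in modes:
        raw = pytesseract.image_to_string(img, lang='eng', config=f'--psm {psm}')
        results[psm] = clean_ocr_text(raw)
    return results


# Possible usage with the image from this answer:
#   for psm, text in compare_psm_modes('mIjNm.png').items():
#       print(f'--psm {psm}: {text!r}')
```

Results can differ between Tesseract versions and images, so this is a way to explore the modes rather than a guarantee that any one of them removes the stray characters.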

【Discussion】: