【Question title】: TesseractNotFound - Pytesser
【Posted】: 2013-04-30 15:19:38
【Question】:

I am trying to do OCR with pytesser, which I downloaded from HERE.

Here is the code of pytesser.py:

try:
    import cv2.cv as cv
    OPENCV_AVAILABLE = True
except ImportError:
    OPENCV_AVAILABLE = False

from subprocess import Popen, PIPE
import os

PROG_NAME = 'tesseract'
TEMP_IMAGE = 'tmp.bmp'
TEMP_FILE = 'tmp'

#All the PSM arguments as a variable name (avoid having to know them)
PSM_OSD_ONLY = 0
PSM_SEG_AND_OSD = 1
PSM_SEG_ONLY = 2
PSM_AUTO = 3
PSM_SINGLE_COLUMN = 4
PSM_VERTICAL_ALIGN = 5
PSM_UNIFORM_BLOCK = 6
PSM_SINGLE_LINE = 7
PSM_SINGLE_WORD = 8
PSM_SINGLE_WORD_CIRCLE = 9
PSM_SINGLE_CHAR = 10

class TesseractException(Exception): #Raised when tesseract does not return 0
    pass

class TesseractNotFound(Exception): #When tesseract is not found in the path
    pass

def check_path(): #Check if tesseract is in the path raise TesseractNotFound otherwise
    for path in os.environ.get('PATH', '').split(';'):
        filepath = os.path.join(path, PROG_NAME)
        if os.path.exists(filepath) and not os.path.isdir(filepath):
            return True
    raise TesseractNotFound

def process_request(input_file, output_file, lang=None, psm=None):
    args = [PROG_NAME, input_file, output_file] #Create the arguments
    if lang is not None:
        args.append("-l")
        args.append(lang)
    if psm is not None:
        args.append("-psm")
        args.append(str(psm))
    proc = Popen(args, stdout=PIPE, stderr=PIPE) #Open process
    ret = proc.communicate() #Launch it

    code = proc.returncode
    if code != 0:
        if code == 2:
            raise TesseractException, "File not found"
        if code == -11:
            raise TesseractException, "Language code invalid: "+ret[1]
        else:
            raise TesseractException, ret[1]

def iplimage_to_string(im, lang=None, psm=None):
    if not OPENCV_AVAILABLE:
        print "OpenCV not Available"
        return -1
    else:
        cv.SaveImage(TEMP_IMAGE, im)
        txt = image_to_string(TEMP_IMAGE, lang, psm)
        os.remove(TEMP_IMAGE)
        return txt

def image_to_string(file,lang=None, psm=None):
    check_path() #Check if tesseract available in the path
    process_request(file, TEMP_FILE, lang, psm) #Process command
    f = open(TEMP_FILE+".txt","r") #Open back the file
    txt = f.read()
    os.remove(TEMP_FILE+".txt")
    return txt


if __name__ =='__main__':
    print image_to_string("image.jpg", "fra", PSM_AUTO) #Example

The problem is that when I try to run the example snippet provided at the link above, I get a `TesseractNotFound` error:

>>> import pytesser
>>> txt = pytesser.image_to_string('C:/output.png')

Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    txt = pytesser.image_to_string('C:/output.png')
  File "C:\Python27\lib\site-packages\pytesser.py", line 71, in image_to_string
    check_path() #Check if tesseract available in the path
  File "C:\Python27\lib\site-packages\pytesser.py", line 38, in check_path
    raise TesseractNotFound
TesseractNotFound
>>> 

My Tesseract-OCR is installed in C:\Tesseract-OCR

I have set TESSDATA_PREFIX=C:\Tesseract-OCR\ and also Path=C:\Tesseract-OCR

Why do I get TesseractNotFound even though the environment variables are set correctly?

Thanks.

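【Question comments】:

  • What do you get when you run print os.environ.get('PATH', '')?
  • @RobWatts Thanks for the reply. The problem is actually solved; I just had to make a few changes to pytesser.py, which I will post shortly.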

Tags: python python-2.7 ocr tesseract


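【Solution 1】:

Changing pytesser.py solved my problem; there was nothing wrong with the path I had set.

The changes are as follows:

Changed PROG_NAME = 'tesseract' to PROG_NAME = 'tesseract.exe'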

In the image_to_string() function, added f.close() after txt = f.read()

That's it :)
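Why these two changes help can be read off the code in the question: check_path() joins each PATH entry with the bare name 'tesseract' and tests os.path.exists(), but on Windows the file on disk is tesseract.exe, so the test never succeeds; and image_to_string() calls os.remove() on a file it still holds open, which Windows normally refuses. Below is a minimal sketch of both fixes written so that it does not hard-code the .exe suffix. The helper names find_program and read_and_remove, and the extensions parameter, are hypothetical and not part of pytesser; treat it as an illustration rather than a drop-in patch.

```python
import os


def find_program(name, search_path=None, extensions=None):
    """Return the full path of `name` on the search path, or None if absent.

    Hypothetical helper, not part of pytesser.
    """
    if search_path is None:
        search_path = os.environ.get('PATH', '')
    if extensions is None:
        if os.name == 'nt':
            # On Windows, executables carry a suffix such as .EXE, listed in PATHEXT.
            extensions = [''] + os.environ.get('PATHEXT', '.EXE').split(os.pathsep)
        else:
            extensions = ['']
    # os.pathsep is ';' on Windows and ':' elsewhere, so nothing is hard-coded.
    for directory in search_path.split(os.pathsep):
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                return candidate
    return None


def read_and_remove(path):
    """Read a text file, close it, then delete it (hypothetical helper)."""
    with open(path, 'r') as f:
        txt = f.read()
    # The handle is closed once the with-block ends, so the delete can proceed.
    os.remove(path)
    return txt
```

With helpers like these, check_path() could raise TesseractNotFound when find_program(PROG_NAME) returns None, and image_to_string() could return read_and_remove(TEMP_FILE + ".txt"). On Python 3.3 and later, shutil.which() from the standard library performs a similar lookup.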

【Comments】:
