Detect text regions in an image using OpenCV: answers

[Question title]: Detect text region in image using Opencv
[Posted]: 2014-08-14 15:52:29
[Question]:

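I have an image and want to detect the text regions in it.

I tried the TiRG_RAW_20110219 project, but the results are not satisfactory. If the input image is http://imgur.com/yCxOvQS,GD38rCa, it produces http://imgur.com/yCxOvQS,GD38rCa#1 as output.

Can anyone suggest some alternatives? I want to improve Tesseract's output by sending it only the text regions as input.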

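[Discussion]:

Link to TiRG_RAW_20110219: ftp.jaist.ac.jp/pub//sourceforge/t/ti/tirg. I am using Python code.
What about OpenCV's scene text detection functionality?
I have already tried the code sample provided at google-melange.com/gsoc/project/details/google/gsoc2013/…, which is an implementation of OpenCV's scene text detection, and it performed even worse than the result above.
Could stackoverflow.com/questions/10206526/… and stackoverflow.com/questions/10255013/… help?
Thanks. I had already tried those before posting this question, but I did not find them useful.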

Tags: python image opencv image-processing python-tesseract

[Solution 1]:

import cv2


def captch_ex(file_name):
    img = cv2.imread(file_name)
    img_final = cv2.imread(file_name)  # untouched copy, used for cropping

    img2gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Keep only bright pixels (gray value above 180)
    ret, mask = cv2.threshold(img2gray, 180, 255, cv2.THRESH_BINARY)
    image_final = cv2.bitwise_and(img2gray, img2gray, mask=mask)
    # For black text on a light background, use cv2.THRESH_BINARY_INV instead
    ret, new_img = cv2.threshold(image_final, 180, 255, cv2.THRESH_BINARY)

    # Dilate to merge neighbouring characters into word/line blobs.
    # The kernel size (x, y) controls the direction of dilation: a larger x
    # dilates more horizontally, a larger y dilates more vertically.
    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
    # The more iterations, the more dilation
    dilated = cv2.dilate(new_img, kernel, iterations=9)

    # cv2.findContours returns (image, contours, hierarchy) in OpenCV 3.x but
    # (contours, hierarchy) in OpenCV 2.4.x and 4.x; this works with both
    cnts = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    contours = cnts[0] if len(cnts) == 2 else cnts[1]

    index = 0  # counter for the optional crop file names below
    for contour in contours:
        # get rectangle bounding contour
        [x, y, w, h] = cv2.boundingRect(contour)

        # Don't plot small false positives that aren't text
        if w < 35 and h < 35:
            continue

        # draw rectangle around contour on original image
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 255), 2)

        # Optionally crop each region and send it to OCR; false detections
        # will simply return no text :)
        # cropped = img_final[y:y + h, x:x + w]
        # cv2.imwrite('crop_' + str(index) + '.jpg', cropped)
        # index = index + 1

    # show the original image with the detected regions drawn on it
    cv2.imshow('captcha_result', img)
    cv2.waitKey()


file_name = 'your_image.jpg'
captch_ex(file_name)
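The comments above describe how the kernel size and the iteration count steer the dilation. As a dependency-free illustration (not how OpenCV implements it), here is a minimal pure-Python binary dilation on a 2-D list of 0/1 values. It assumes a rectangular kernel given as (width, height), like the `ksize` argument of `cv2.getStructuringElement`, rather than the `MORPH_CROSS` shape used in the answer; the function name `dilate` is hypothetical.

```python
def dilate(binary, kernel_w, kernel_h, iterations=1):
    """Binary dilation of a 2-D list of 0/1 values with a centred
    kernel_w x kernel_h rectangle (odd sizes assumed)."""
    rows, cols = len(binary), len(binary[0])
    rx, ry = kernel_w // 2, kernel_h // 2
    out = binary
    for _ in range(iterations):
        src = out
        # A pixel becomes 1 if any source pixel under the kernel window is 1;
        # the window is clipped at the image border.
        out = [[1 if any(src[yy][xx]
                         for yy in range(max(0, y - ry), min(rows, y + ry + 1))
                         for xx in range(max(0, x - rx), min(cols, x + rx + 1)))
                else 0
                for x in range(cols)]
               for y in range(rows)]
    return out


# Two "letters" on the middle row, separated by a 3-pixel gap
grid = [[0, 0, 0, 0, 0],
        [1, 0, 0, 0, 1],
        [0, 0, 0, 0, 0]]
print(dilate(grid, 3, 1)[1])                # [1, 1, 0, 1, 1]
print(dilate(grid, 3, 1, iterations=2)[1])  # [1, 1, 1, 1, 1]
print(dilate(grid, 1, 3)[0])                # [1, 0, 0, 0, 1]
```

A wide, flat kernel closes the horizontal gap without growing the rows above and below, while a tall kernel does the opposite; each extra iteration grows the blobs by another kernel radius, which is why many iterations can bridge the spaces between characters.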

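[Discussion]:

@AmitKushwaha +1 Great answer! I'm using OpenCV 3.1.0, and cv2.findContours() returns three values: image, contours, hierarchy. The only change your example needs is an extra variable in front of contours.
Interestingly, I find that using RETR_LIST and CHAIN_APPROX_SIMPLE eliminates most of these problems. Alternatively, check the x and y coordinates of each box and look for overlaps within a margin of error. If you have already tried a HAAR cascade (with its false positives and negatives) or LBMP, try OCR on each box and discard it if the result is poor.
Hey @MichaelDausmann, just use "_, contours, hierarchy = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE) # get contours"
One of the simplest and best answers I've seen online! ...Thanks!
cv2.findContours() no longer returns the image (as of OpenCV 4.x), so the statement must be changed to contours, hierarchy = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE). For more details, see: Want to find contours -> ValueError: not enough values to unpack (expected 3, got 2), this appears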
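The comments above show that the number of values returned by `cv2.findContours()` depends on the OpenCV version. Rather than editing the unpacking line for each install, a small helper can pick the contour list out of whatever tuple comes back. This is a minimal sketch; the name `grab_contours` is chosen for illustration (the third-party imutils package ships a similar `imutils.grab_contours`).

```python
def grab_contours(result):
    """Return the contour list from a cv2.findContours() result.

    OpenCV 2.4.x and 4.x return (contours, hierarchy);
    OpenCV 3.x returns (image, contours, hierarchy).
    """
    if len(result) == 2:
        return result[0]
    if len(result) == 3:
        return result[1]
    raise ValueError("unexpected findContours() result of length %d" % len(result))
```

Usage: contours = grab_contours(cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)). Raising on any other length makes an unexpected API change fail loudly instead of silently unpacking the wrong element.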

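[Solution 2]:

Since nobody has posted a complete solution, here is one approach. Observing that the desired text is white and the words are horizontally aligned, we can use color segmentation to extract the letters and then OCR them.

1. Perform color segmentation. We load the image, convert it to HSV format, define lower/upper ranges, and perform color segmentation with cv2.inRange() to obtain a binary mask.
2. Dilate to connect text characters. We create a horizontally shaped kernel with cv2.getStructuringElement() and then dilate with cv2.dilate() to merge the individual letters into a single contour.
3. Remove non-text contours. We find contours with cv2.findContours() and filter them by aspect ratio to remove non-text characters. Since the text is horizontal, any contour whose aspect ratio is below a predefined threshold is treated as non-text and removed by filling it in with cv2.drawContours().
4. Perform OCR. We bitwise-AND the dilated image with the initial mask to isolate only the text characters, then invert the image so the text is black on a white background. Finally, we pass the image to Pytesseract for OCR.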
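The aspect-ratio test in step 3 can be sketched without OpenCV by operating on bounding boxes in the (x, y, w, h) form returned by `cv2.boundingRect`. The threshold of 5 comes from the answer's code; the function name and the sample boxes are hypothetical.

```python
def filter_text_boxes(boxes, min_aspect_ratio=5.0):
    """Keep (x, y, w, h) boxes wide enough to be a horizontal line of text."""
    kept = []
    for x, y, w, h in boxes:
        # width / height below the threshold is treated as non-text
        if h > 0 and w / float(h) >= min_aspect_ratio:
            kept.append((x, y, w, h))
    return kept


boxes = [(10, 10, 300, 40),   # ratio 7.5: a line of text
         (50, 80, 30, 30),    # ratio 1.0: a blob, e.g. a logo
         (10, 150, 120, 20)]  # ratio 6.0: a short line of text
print(filter_text_boxes(boxes))  # [(10, 10, 300, 40), (10, 150, 120, 20)]
```

The threshold only makes sense after dilation has joined letters into line-shaped blobs; single undilated characters are roughly square and would all be rejected.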

Here is a visualization of each step:

Input image

Mask generated by color segmentation

import cv2
import numpy as np

# Load image, convert to HSV format, define lower/upper ranges, and perform
# color segmentation to create a binary mask
image = cv2.imread('1.jpg')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([0, 0, 218])
upper = np.array([157, 54, 255])
mask = cv2.inRange(hsv, lower, upper)
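To make the `cv2.inRange()` call concrete: a pixel is set to 255 only when every channel lies within its lower and upper bound (inclusive), otherwise 0. The pure-Python sketch below reproduces that rule on a tiny hypothetical image stored as a 2-D list of (h, s, v) tuples, using the answer's bounds; it illustrates the semantics and is not a replacement for the OpenCV call.

```python
def in_range(hsv_pixels, lower, upper):
    """Mimic cv2.inRange on a 2-D list of (h, s, v) tuples: 255 where every
    channel lies within [lower, upper] (inclusive), else 0."""
    return [[255 if all(lo <= c <= hi for c, lo, hi in zip(px, lower, upper)) else 0
             for px in row]
            for row in hsv_pixels]


lower = (0, 0, 218)
upper = (157, 54, 255)
pixels = [[(0, 0, 255), (100, 200, 250)],  # pure white vs a saturated color
          [(30, 20, 230), (0, 0, 100)]]    # near-white vs dark grey
print(in_range(pixels, lower, upper))  # [[255, 0], [255, 0]]
```

The low saturation ceiling (54) and high value floor (218) are what select "white": bright pixels of any hue qualify as long as they are nearly colorless.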

Dilate the image to connect text contours, and remove non-text contours with aspect-ratio filtering

# Create horizontal kernel and dilate to connect text characters
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,3))
dilate = cv2.dilate(mask, kernel, iterations=5)

# Find contours and filter using aspect ratio
# Remove non-text contours by filling in the contour
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    ar = w / float(h)
    if ar < 5:
        cv2.drawContours(dilate, [c], -1, (0,0,0), -1)

Bitwise-AND with the mask and invert to get a result ready for OCR

# Bitwise dilated image with mask, invert, then OCR
result = 255 - cv2.bitwise_and(dilate, mask)
data = pytesseract.image_to_string(result, lang='eng',config='--psm 6')
print(data)
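To see why the dilated image is ANDed with the original mask before inverting, here is a pure-Python version of that line on 0/255 images stored as 2-D lists (the function name and the one-row sample images are hypothetical). Dilation alone would hand Tesseract fattened blobs, and the mask alone would keep non-text pixels; ANDing keeps only the original stroke pixels inside regions that survived the aspect-ratio filter, and the inversion makes the text black on white.

```python
def and_invert(dilated, mask):
    """Compute 255 - (dilated AND mask) pixel by pixel for 0/255 images."""
    return [[255 - (d & m) for d, m in zip(drow, mrow)]
            for drow, mrow in zip(dilated, mask)]


# Pixel 0 is a text stroke inside a kept region; pixel 1 is the dilation
# halo around it; pixels 2-3 are mask pixels in a region filled in as non-text
mask = [[255, 0, 255, 255]]
dilated = [[255, 255, 0, 0]]
print(and_invert(dilated, mask))  # [[0, 255, 255, 255]]
```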

Result from Pytesseract OCR, using the --psm 6 configuration setting, which assumes a single uniform block of text. See here for more configuration options.

All women become
like their mothers.
That is their tragedy.
No man does.

That's his.

OSCAR WILDE

Full code

import cv2
import numpy as np
import pytesseract

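# Only needed when tesseract is not on PATH (typically on Windows); adjust to your install location
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"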

# Load image, convert to HSV format, define lower/upper ranges, and perform
# color segmentation to create a binary mask
image = cv2.imread('1.jpg')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([0, 0, 218])
upper = np.array([157, 54, 255])
mask = cv2.inRange(hsv, lower, upper)

# Create horizontal kernel and dilate to connect text characters
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,3))
dilate = cv2.dilate(mask, kernel, iterations=5)

# Find contours and filter using aspect ratio
# Remove non-text contours by filling in the contour
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    ar = w / float(h)
    if ar < 5:
        cv2.drawContours(dilate, [c], -1, (0,0,0), -1)

# Bitwise dilated image with mask, invert, then OCR
result = 255 - cv2.bitwise_and(dilate, mask)
data = pytesseract.image_to_string(result, lang='eng',config='--psm 6')
print(data)

cv2.imshow('mask', mask)
cv2.imshow('dilate', dilate)
cv2.imshow('result', result)
cv2.waitKey()

Use this HSV color thresholding script to determine the lower/upper HSV color ranges

import cv2
import numpy as np

def nothing(x):
    pass

# Load image
image = cv2.imread('1.jpg')

# Create a window
cv2.namedWindow('image')

# Create trackbars for color change
# Hue is from 0-179 for Opencv
cv2.createTrackbar('HMin', 'image', 0, 179, nothing)
cv2.createTrackbar('SMin', 'image', 0, 255, nothing)
cv2.createTrackbar('VMin', 'image', 0, 255, nothing)
cv2.createTrackbar('HMax', 'image', 0, 179, nothing)
cv2.createTrackbar('SMax', 'image', 0, 255, nothing)
cv2.createTrackbar('VMax', 'image', 0, 255, nothing)

# Set default value for Max HSV trackbars
cv2.setTrackbarPos('HMax', 'image', 179)
cv2.setTrackbarPos('SMax', 'image', 255)
cv2.setTrackbarPos('VMax', 'image', 255)

# Initialize HSV min/max values
hMin = sMin = vMin = hMax = sMax = vMax = 0
phMin = psMin = pvMin = phMax = psMax = pvMax = 0

while(1):
    # Get current positions of all trackbars
    hMin = cv2.getTrackbarPos('HMin', 'image')
    sMin = cv2.getTrackbarPos('SMin', 'image')
    vMin = cv2.getTrackbarPos('VMin', 'image')
    hMax = cv2.getTrackbarPos('HMax', 'image')
    sMax = cv2.getTrackbarPos('SMax', 'image')
    vMax = cv2.getTrackbarPos('VMax', 'image')

    # Set minimum and maximum HSV values to display
    lower = np.array([hMin, sMin, vMin])
    upper = np.array([hMax, sMax, vMax])

    # Convert to HSV format and color threshold
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower, upper)
    result = cv2.bitwise_and(image, image, mask=mask)

    # Print if there is a change in HSV value
    if((phMin != hMin) | (psMin != sMin) | (pvMin != vMin) | (phMax != hMax) | (psMax != sMax) | (pvMax != vMax) ):
        print("(hMin = %d , sMin = %d, vMin = %d), (hMax = %d , sMax = %d, vMax = %d)" % (hMin , sMin , vMin, hMax, sMax , vMax))
        phMin = hMin
        psMin = sMin
        pvMin = vMin
        phMax = hMax
        psMax = sMax
        pvMax = vMax

    # Display result image
    cv2.imshow('image', result)
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()
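The trackbars above use OpenCV's 8-bit HSV scale, in which hue runs 0-179 (degrees halved so it fits in a byte) while saturation and value run 0-255. If you picked a color in a tool that reports H in degrees and S/V in percent, a small conversion like this sketch gives a starting point for the trackbar ranges; the function name is hypothetical.

```python
def to_opencv_hsv(h_deg, s_pct, v_pct):
    """Convert H in degrees [0, 360] and S, V in percent [0, 100] to OpenCV's
    8-bit HSV scale (H in [0, 179], S and V in [0, 255])."""
    h = int(round(h_deg / 2.0)) % 180  # 360 degrees wraps around to 0
    s = int(round(s_pct * 255 / 100.0))
    v = int(round(v_pct * 255 / 100.0))
    return h, s, v


print(to_opencv_hsv(240, 100, 100))  # (120, 255, 255): pure blue
print(to_opencv_hsv(0, 0, 100))      # (0, 0, 255): white
```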

[Discussion]:

[Solution 3]:

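If you don't mind getting your hands dirty, you could try growing those text regions into one larger rectangular region that you can feed to Tesseract in a single pass.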
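One way to read this suggestion: take the union of the detected boxes, pad it, and crop that single region for Tesseract. The sketch below works on (x, y, w, h) boxes as returned by `cv2.boundingRect`; the function name and the padding value are assumptions, not part of the answer.

```python
def enclosing_region(boxes, pad, img_w, img_h):
    """Smallest (x, y, w, h) rectangle covering all boxes, grown by `pad`
    pixels on every side and clamped to the image bounds."""
    if not boxes:
        return None
    x0 = max(0, min(x for x, y, w, h in boxes) - pad)
    y0 = max(0, min(y for x, y, w, h in boxes) - pad)
    x1 = min(img_w, max(x + w for x, y, w, h in boxes) + pad)
    y1 = min(img_h, max(y + h for x, y, w, h in boxes) + pad)
    return x0, y0, x1 - x0, y1 - y0


boxes = [(10, 10, 50, 20), (15, 40, 80, 20)]
print(enclosing_region(boxes, pad=5, img_w=100, img_h=100))  # (5, 5, 95, 60)
```

With a NumPy image from `cv2.imread`, the crop would then be `img[y:y + h, x:x + w]`. Padding gives Tesseract some background margin around the glyphs, and clamping keeps the slice inside the image.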

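I would also suggest thresholding the image several times (with different threshold values) and feeding each result to Tesseract separately to see whether that helps. You can compare the output against dictionary words to decide automatically whether a particular OCR result is good.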
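The dictionary check can be sketched as a score: the fraction of word tokens that appear in a vocabulary, then picking the highest-scoring OCR result. In practice each candidate string would come from `pytesseract.image_to_string` on a differently thresholded image; here the candidates and the tiny vocabulary are hypothetical (a real list could be loaded from a word file such as /usr/share/dict/words on many Linux systems), and the vocabulary is assumed to be lowercase.

```python
import re


def dictionary_score(text, vocabulary):
    """Fraction of alphabetic tokens in `text` that appear in `vocabulary`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(1 for w in words if w in vocabulary) / float(len(words))


def best_ocr_result(candidates, vocabulary):
    """Pick the OCR output (e.g. one per threshold value) with the highest score."""
    return max(candidates, key=lambda t: dictionary_score(t, vocabulary))


vocabulary = {"no", "man", "does"}
candidates = ["No man does.", "N0 rnan d0es"]
print(best_ocr_result(candidates, vocabulary))  # No man does.
```

Digits are excluded by the token pattern, so misreads such as "0" for "o" split words into fragments that fail the lookup and pull the score down.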

[Discussion]: