Simple Digit Recognition OCR in OpenCV-Python: Answers

【Question title】: Simple Digit Recognition OCR in OpenCV-Python
【Posted】: 2012-03-13 20:50:09
【Question description】:

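I am trying to implement a "digit recognition OCR" in OpenCV-Python (cv2). It is just for learning purposes. I would like to learn both the KNearest and SVM features in OpenCV.

I have 100 samples (i.e. images) of each digit. I would like to train with them.

There is a sample, letter_recog.py, that comes with the OpenCV samples, but I still could not figure out how to use it. I don't understand what the samples, responses, etc. are. Also, it loads a txt file at the start, which I didn't understand either.

Later, after searching a bit, I found a letter_recognition.data file in the cpp samples. I used it and wrote code for cv2.KNearest modeled on letter_recog.py (just for testing):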

import numpy as np
import cv2

fn = 'letter-recognition.data'
a = np.loadtxt(fn, np.float32, delimiter=',', converters={ 0 : lambda ch : ord(ch)-ord('A') })
samples, responses = a[:,1:], a[:,0]

model = cv2.KNearest()
retval = model.train(samples,responses)
retval, results, neigh_resp, dists = model.find_nearest(samples, k = 10)
print(results.ravel())

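It gave me an array of size 20000, and I don't understand what it is.

Questions:

1) What is the letter_recognition.data file? How can I build that file from my own data set?

2) What does results.ravel() denote?

3) How can we write a simple digit recognition tool using the letter_recognition.data file (with either KNearest or SVM)?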

【Question comments】:

Tags: python opencv numpy computer-vision ocr

【Solution 1】:

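Well, I decided to work through my own question to solve the problem above. What I wanted was to implement a simple OCR using the KNearest or SVM features in OpenCV. Below is what I did and how. (It is only for learning how to use KNearest for simple OCR.)

1) My first question was about the letter_recognition.data file that comes with the OpenCV samples. I wanted to know what is inside that file.

It contains a letter, along with 16 features of that letter.

This SOF post helped me find it. The 16 features are explained in the paper Letter Recognition Using Holland-Style Adaptive Classifiers. (I did not understand some of the features near the end, though.)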
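To make the file layout concrete, here is a minimal sketch of reading one record, assuming each line is a capital letter followed by 16 comma-separated numeric features. The sample line and the helper name `parse_record` are made up for illustration.

```python
# Hypothetical record in the "letter,f1,...,f16" layout described above.
line = "T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8"

def parse_record(text):
    """Split one record into (class index, list of 16 float features)."""
    fields = text.strip().split(",")
    label = ord(fields[0]) - ord("A")          # 'A' -> 0, 'B' -> 1, ...
    features = [float(v) for v in fields[1:]]
    return label, features

label, features = parse_record(line)
print(label, len(features))
```

The label conversion mirrors the `converters` argument used with `np.loadtxt` in the question: the classifier needs numeric responses, so each letter is mapped to its offset from 'A'.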

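2) I knew it would be difficult to use that method without understanding all of those features. I tried some other papers, but they were all a little difficult for a beginner.

So I just decided to take all the pixel values as my features. (I was not worried about accuracy or performance; I just wanted it to work, at least with minimal accuracy.)

I took the image below for my training data:

(I know the amount of training data is small. But since all the letters are of the same font and size, I decided to try with this.)

To prepare the training data, I wrote a small piece of code in OpenCV. It does the following:

It loads the image.
It selects the digits (by contour finding, with constraints on the area and height of the letters to avoid false detections).
It draws a bounding rectangle around one letter and waits for a manual key press. At this point we press the digit key that corresponds to the letter in the box.
Once the corresponding digit key is pressed, it resizes the box to 10x10, saves the 100 pixel values in one array (here, samples), and saves the manually entered digit in another array (here, responses).
It then saves both arrays in separate txt files.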
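The last two steps can be sketched without OpenCV, assuming a thresholded 10x10 crop is already available. The stand-in crop, the temporary file names, and the `ndmin=2` argument are illustrative choices, not part of the original script.

```python
import os
import tempfile
import numpy as np

# Stand-in for one thresholded 10x10 digit crop (values 0 or 255).
roi_small = np.zeros((10, 10), np.uint8)
roi_small[2:8, 4:6] = 255                     # a crude vertical stroke, like a "1"

samples = np.empty((0, 100), np.float32)      # one row of 100 pixels per labelled digit
responses = []                                # one manually entered label per row

samples = np.append(samples, roi_small.reshape(1, 100).astype(np.float32), axis=0)
responses.append(1)

responses_arr = np.array(responses, np.float32).reshape(-1, 1)

# Round-trip through two separate text files, as the steps above describe.
out_dir = tempfile.mkdtemp()
samples_path = os.path.join(out_dir, "samples.data")
responses_path = os.path.join(out_dir, "responses.data")
np.savetxt(samples_path, samples)
np.savetxt(responses_path, responses_arr)

# ndmin=2 keeps the 2-D shape even when only one row was saved.
loaded_samples = np.loadtxt(samples_path, np.float32, ndmin=2)
loaded_responses = np.loadtxt(responses_path, np.float32, ndmin=2)
print(loaded_samples.shape, loaded_responses.shape)
```

Each row of `samples` lines up with the same row of `responses`, which is the pairing the classifier relies on during training.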

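At the end of the manual classification, all the digits in the training data (train.png) have been labeled by hand, as in the image below:

Below is the code I used for this purpose (of course, it is not very clean):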

import sys

import numpy as np
import cv2

im = cv2.imread('pitrain.png')
im3 = im.copy()

gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray,(5,5),0)
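# the two magic 1s were the adaptive method and the threshold type; named here for readability
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,cv2.THRESH_BINARY_INV,11,2)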

#################      Now finding Contours         ###################

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

samples =  np.empty((0,100))
responses = []
keys = [i for i in range(48,58)]

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)

        if  h>28:
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,0,255),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            cv2.imshow('norm',im)
            key = cv2.waitKey(0)

            if key == 27:  # (escape to quit)
                sys.exit()
            elif key in keys:
                responses.append(int(chr(key)))
                sample = roismall.reshape((1,100))
                samples = np.append(samples,sample,0)

responses = np.array(responses,np.float32)
responses = responses.reshape((responses.size,1))
print("training complete")

np.savetxt('generalsamples.data',samples)
np.savetxt('generalresponses.data',responses)

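Now we move on to the training and testing part.

For the testing part I used the image below, which has the same type of letters I used for training.

For training, we do the following:

Load the txt files we saved earlier.
Create an instance of the classifier we are using (here, KNearest).
Use the KNearest.train function to train on the data.

For testing, we do the following:

Load the image used for testing.
Process the image as before and extract each digit using contour methods.
Draw a bounding box for it, resize it to 10x10, and store its pixel values in an array as before.
Use the KNearest.find_nearest() function to find the training item nearest to the one we gave. (If we are lucky, it recognizes the correct digit.)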
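What the nearest-neighbour lookup does with k = 1 can be illustrated in plain NumPy. This is a brute-force sketch for intuition only, not OpenCV's implementation; the function name `nearest_labels` and the toy data are made up.

```python
import numpy as np

def nearest_labels(train_samples, train_labels, queries):
    """Return, for each query row, the label of the closest training row."""
    # Squared Euclidean distance between every query and every training row.
    diff = queries[:, None, :] - train_samples[None, :, :]
    dists = (diff ** 2).sum(axis=2)
    return train_labels[dists.argmin(axis=1)]

train_samples = np.array([[0, 0], [10, 10], [0, 10]], np.float32)
train_labels = np.array([0, 1, 2], np.float32)
queries = np.array([[1, 1], [9, 8], [2, 9]], np.float32)

predicted = nearest_labels(train_samples, train_labels, queries)
print(predicted)
```

The output has one entry per query row. That is also why the question's experiment, which queried the model with all 20000 training rows, produced a results array of size 20000.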

I included the last two steps (training and testing) in the single piece of code below:

import cv2
import numpy as np

#######   training part    ############### 
samples = np.loadtxt('generalsamples.data',np.float32)
responses = np.loadtxt('generalresponses.data',np.float32)
responses = responses.reshape((responses.size,1))

model = cv2.KNearest()
model.train(samples,responses)

############################# testing part  #########################

im = cv2.imread('pi.png')
out = np.zeros(im.shape,np.uint8)
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
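# same adaptive threshold as in the training script, with the constants named
thresh = cv2.adaptiveThreshold(gray,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,cv2.THRESH_BINARY_INV,11,2)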

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)
        if  h>28:
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,255,0),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            roismall = roismall.reshape((1,100))
            roismall = np.float32(roismall)
            retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1)
            string = str(int((results[0][0])))
            cv2.putText(out,string,(x,y+h),0,1,(0,255,0))

cv2.imshow('im',im)
cv2.imshow('out',out)
cv2.waitKey(0)

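It worked. Below is the result I got:

Here it worked with 100% accuracy. I assume this is because all the digits are of the same kind and the same size.

But anyway, this is a good start for beginners (I hope so).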

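【Comments】:

+1 Long post, but very educational. This should go into the opencv tag info.
In case anyone is interested, I made a proper OO engine from this code, along with some bells and whistles: github.com/goncalopp/simple-ocr-opencv
Note that there is no need to use SVM and KNN when you have a well-defined, perfect font. For example, the digits 0, 4, 6, 9 form one group, the digits 1, 2, 3, 5, 7 form another, and 8 forms another. This grouping is given by the Euler number. Then "0" has no endpoints, "4" has two, and "6" and "9" are distinguished by centroid position. "3" is the only one in the other group with 3 endpoints. "1" and "7" are distinguished by skeleton length. When considering the convex hull together with the digit, "5" and "2" have two holes, and they can be distinguished by the centroid of the largest hole.
First-class tutorial. Thank you! A few changes were needed to make it work with the latest (3.1) version of OpenCV: contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE) => _,contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE), model = cv2.KNearest() => model = cv2.ml.KNearest_create(), model.train(samples,responses) => model.train(samples,cv2.ml.ROW_SAMPLE,responses), retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1) => retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1)
@JohannesBrodwall Thanks for the update. Quick note: your last correction is slightly off. It should be: retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1) => retval, results, neigh_resp, dists = model.findNearest(roismall, k = 1)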
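The Euler-number comment above describes its grouping only in words. Here is a minimal sketch of that first step alone (Euler number = connected components minus holes), assuming digits are given as tiny '#'/'.' bitmaps. The bitmap format and both helper names are made up for illustration; endpoints, skeletons, and centroids are not covered.

```python
from collections import deque

FOUR = [(-1, 0), (1, 0), (0, -1), (0, 1)]
EIGHT = FOUR + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def count_regions(grid, value, neighbours):
    """Count connected regions of cells equal to `value` using breadth-first search."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in neighbours:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and grid[nr][nc] == value):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
    return regions

def euler_number(bitmap):
    """Foreground components minus holes for a list of equal-length '#'/'.' rows."""
    # Pad with background so everything outside the glyph is a single region.
    width = len(bitmap[0]) + 2
    grid = ["." * width] + ["." + row + "." for row in bitmap] + ["." * width]
    components = count_regions(grid, "#", EIGHT)   # strokes: 8-connected
    holes = count_regions(grid, ".", FOUR) - 1     # background: 4-connected, minus the outside
    return components - holes

zero = ["###", "#.#", "#.#", "###"]
eight = ["###", "#.#", "###", "#.#", "###"]
one = [".#.", ".#.", ".#."]
print(euler_number(zero), euler_number(eight), euler_number(one))
```

Glyphs with one hole get 0, the double-holed "8" gets -1, and hole-free digits get 1, which matches the three groups named in the comment.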

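【Solution 2】:

Those interested in C++ code can refer to the code below. Thanks to Abid Rahman for the nice explanation.

The procedure is the same as above, but the contour finding uses only the first hierarchy level, so the algorithm uses only the outer contour of each digit.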
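The idea of visiting only first-level contours can be sketched in Python with a hand-written hierarchy table. It assumes the usual OpenCV layout of [next, previous, first_child, parent] per contour with -1 meaning "none", and, like the C++ loop in this answer, that contour 0 is a top-level contour. The function name and the table are made up for illustration.

```python
def top_level_indices(hierarchy):
    """Follow the 'next' links from contour 0 and collect the visited indices."""
    indices = []
    i = 0 if hierarchy else -1
    while i >= 0:                 # -1 marks the end of the chain
        indices.append(i)
        i = hierarchy[i][0]       # index of the next contour on the same level
    return indices

# Hypothetical hierarchy: three outer contours, plus one hole inside contour 0.
hierarchy = [
    [2, -1, 1, -1],    # 0: outer contour, next outer is 2, first child is 1
    [-1, -1, -1, 0],   # 1: hole inside contour 0 (skipped by the walk)
    [3, 0, -1, -1],    # 2: outer contour
    [-1, 2, -1, -1],   # 3: outer contour, last in the chain
]
print(top_level_indices(hierarchy))
```

With the Python bindings, the hierarchy array returned by cv2.findContours has an extra leading axis, so `hierarchy[0]` would be the table to walk.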

Code for creating the sample and label data

//Process image to extract contour
Mat thr,gray,con;
Mat src=imread("digit.png",1);
cvtColor(src,gray,CV_BGR2GRAY);
threshold(gray,thr,200,255,THRESH_BINARY_INV); //Threshold to find contour
thr.copyTo(con);

// Create sample and label data
vector< vector <Point> > contours; // Vector for storing contour
vector< Vec4i > hierarchy;
Mat sample;
Mat response_array;  
findContours( con, contours, hierarchy,CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE ); //Find contour

for( int i = 0; i >= 0 && i < (int)contours.size(); i=hierarchy[i][0] ) // iterate through first hierarchy level contours; hierarchy[i][0] is -1 after the last one
{
    Rect r= boundingRect(contours[i]); //Find bounding rect for each contour
    rectangle(src,Point(r.x,r.y), Point(r.x+r.width,r.y+r.height), Scalar(0,0,255),2,8,0);
    Mat ROI = thr(r); //Crop the image
    Mat tmp1, tmp2;
    resize(ROI,tmp1, Size(10,10), 0,0,INTER_LINEAR ); //resize to 10X10
    tmp1.convertTo(tmp2,CV_32FC1); //convert to float
    sample.push_back(tmp2.reshape(1,1)); // Store  sample data
    imshow("src",src);
    int c=waitKey(0); // Read corresponding label for contour from keyboard
    c-=0x30;     // Convert ASCII to integer value
    response_array.push_back(c); // Store label to a mat
    rectangle(src,Point(r.x,r.y), Point(r.x+r.width,r.y+r.height), Scalar(0,255,0),2,8,0);    
}

// Store the data to file
Mat response,tmp;
tmp=response_array.reshape(1,1); //make continuous
tmp.convertTo(response,CV_32FC1); // Convert  to float

FileStorage Data("TrainingData.yml",FileStorage::WRITE); // Store the sample data in a file
Data << "data" << sample;
Data.release();

FileStorage Label("LabelData.yml",FileStorage::WRITE); // Store the label data in a file
Label << "label" << response;
Label.release();
cout<<"Training and Label data created successfully....!! "<<endl;

imshow("src",src);
waitKey();

Training and testing code

Mat thr,gray,con;
Mat src=imread("dig.png",1);
cvtColor(src,gray,CV_BGR2GRAY);
threshold(gray,thr,200,255,THRESH_BINARY_INV); // Threshold to create input
thr.copyTo(con);


// Read stored sample and label for training
Mat sample;
Mat response,tmp;
FileStorage Data("TrainingData.yml",FileStorage::READ); // Read training data to a Mat
Data["data"] >> sample;
Data.release();

FileStorage Label("LabelData.yml",FileStorage::READ); // Read label data to a Mat
Label["label"] >> response;
Label.release();


KNearest knn;
knn.train(sample,response); // Train with sample and responses
cout<<"Training completed.....!!"<<endl;

vector< vector <Point> > contours; // Vector for storing contour
vector< Vec4i > hierarchy;

//Create input sample by contour finding and cropping
findContours( con, contours, hierarchy,CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE );
Mat dst(src.rows,src.cols,CV_8UC3,Scalar::all(0));

for( int i = 0; i >= 0 && i < (int)contours.size(); i=hierarchy[i][0] ) // iterate through each contour of the first hierarchy level; hierarchy[i][0] is -1 after the last one
{
    Rect r= boundingRect(contours[i]);
    Mat ROI = thr(r);
    Mat tmp1, tmp2;
    resize(ROI,tmp1, Size(10,10), 0,0,INTER_LINEAR );
    tmp1.convertTo(tmp2,CV_32FC1);
    float p=knn.find_nearest(tmp2.reshape(1,1), 1);
    char name[4];
    snprintf(name,sizeof(name),"%d",(int)p); // bounded write, so an unexpected label cannot overflow name
    putText( dst,name,Point(r.x,r.y+r.height) ,0,1, Scalar(0, 255, 0), 2, 8 );
}

imshow("src",src);
imshow("dst",dst);
imwrite("dest.jpg",dst);
waitKey();

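Result

In the result, the dot in the first line is detected as 8, since we have not trained for dots. Also, I am treating every contour in the first hierarchy level as sample input; the user can avoid this by computing the contour area and filtering on it.

【Comments】: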

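I tried to run this code. I was able to create the sample and label data. But when I run the testing and training file, it fails with the error *** stack smashing detected ***:, so I do not get the final image that you show above (the digits in green).
I changed char name[4]; in your code to char name[7]; and I no longer get the stack-related error, but I am still not getting the correct results. I get an image like this one: i.imgur.com/qRkV2B4.jpg
@skm Make sure the number of contours you get is the same as the number of digits in the image. Also try printing the result on the console.
Hello, can we load a trained network to use?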

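【Solution 3】:

I had some problems generating the training data, because it was sometimes hard to tell which letter had been selected last, so I rotated the image by 1.5 degrees. Now the characters are selected in order, and the test still shows 100% accuracy after training. Here is the code: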

import numpy as np
import cv2

def rotate_image(image, angle):
  image_center = tuple(np.array(image.shape[1::-1]) / 2)
  rot_mat = cv2.getRotationMatrix2D(image_center, angle, 1.0)
  result = cv2.warpAffine(image, rot_mat, image.shape[1::-1], flags=cv2.INTER_LINEAR)
  return result

img = cv2.imread('training_image.png')
cv2.imshow('orig image', img)
whiteBorder = [255,255,255]
# extend the image border
image1 = cv2.copyMakeBorder(img, 80, 80, 80, 80, cv2.BORDER_CONSTANT, None, whiteBorder)
# rotate the image 1.5 degrees clockwise for ease of data entry
image_rot = rotate_image(image1, -1.5)
#crop_img = image_rot[y:y+h, x:x+w]
cropped = image_rot[70:350, 70:710]
cv2.imwrite('rotated.png', cropped)
cv2.imshow('rotated image', cropped)
cv2.waitKey(0)

For the sample data, I made some changes to the script, as follows:

import sys
import numpy as np
import cv2

def sort_contours(contours, x_axis_sort='LEFT_TO_RIGHT', y_axis_sort='TOP_TO_BOTTOM'):
    # initialize the reverse flag
    x_reverse = False
    y_reverse = False
    if x_axis_sort == 'RIGHT_TO_LEFT':
        x_reverse = True
    if y_axis_sort == 'BOTTOM_TO_TOP':
        y_reverse = True
    
    boundingBoxes = [cv2.boundingRect(c) for c in contours]
    
    # sorting on x-axis 
    sortedByX = zip(*sorted(zip(contours, boundingBoxes),
    key=lambda b:b[1][0], reverse=x_reverse))
    
    # sorting on y-axis 
    (contours, boundingBoxes) = zip(*sorted(zip(*sortedByX),
    key=lambda b:b[1][1], reverse=y_reverse))
    # return the list of sorted contours and bounding boxes
    return (contours, boundingBoxes)

im = cv2.imread('rotated.png')
im3 = im.copy()

gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray,(5,5),0)
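# the two magic 1s were the adaptive method and the threshold type; named here for readability
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,cv2.THRESH_BINARY_INV,11,2)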

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)
contours, boundingBoxes = sort_contours(contours, x_axis_sort='LEFT_TO_RIGHT', y_axis_sort='TOP_TO_BOTTOM')

samples =  np.empty((0,100))
responses = []
keys = [i for i in range(48,58)]

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)

        if  h>28 and h < 40:
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,0,255),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            cv2.imshow('norm',im)
            key = cv2.waitKey(0)

            if key == 27:  # (escape to quit)
                sys.exit()
            elif key in keys:
                responses.append(int(chr(key)))
                sample = roismall.reshape((1,100))
                samples = np.append(samples,sample,0)

responses = np.array(responses,np.ubyte)
responses = responses.reshape((responses.size,1))
print("training complete")

np.savetxt('generalsamples.data',samples,fmt='%i')
np.savetxt('generalresponses.data',responses,fmt='%i')
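The script above orders contours by sorting on x and then, stably, on y. A different approach, not taken from this answer, is to group bounding boxes into rows with a tolerance before sorting each row left to right, which tolerates small vertical jitter within a line. A minimal sketch, where the function name, the `row_tolerance` parameter, and the boxes are all made up for illustration:

```python
def reading_order(boxes, row_tolerance):
    """Sort (x, y, w, h) boxes top-to-bottom by row, then left-to-right within a row."""
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):
        # A box joins the current row if its top edge is close to that row's first box.
        if rows and abs(box[1] - rows[-1][0][1]) <= row_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [box for row in rows for box in sorted(row, key=lambda b: b[0])]

# Three boxes on one text line with slightly different y values, plus one on the next line.
boxes = [(60, 12, 20, 30), (10, 10, 20, 30), (35, 11, 20, 30), (10, 60, 20, 30)]
ordered = reading_order(boxes, row_tolerance=15)
print(ordered)
```

The tolerance would need to be chosen relative to the digit height; a value of about half the height is a reasonable starting guess.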

【Comments】: