【Question title】: Digit recognition with OpenCV and Python
【Posted】: 2017-03-13 21:33:53
【Question】:

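I am trying to implement a digit recognition program for video capture in OpenCV. It works with ordinary (still) pictures as input, but once I add video capture it gets stuck while recording if I move the camera. My code is here: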

import numpy as np
import cv2
from sklearn.externals import joblib
from skimage.feature import hog


# Load the classifier
clf = joblib.load("digits_cls.pkl")

# Default camera has index 0 and externally(USB) connected cameras have
# indexes ranging from 1 to 3
cap = cv2.VideoCapture(0)

while(True):


  # Capture frame-by-frame
  ret, frame = cap.read()

  # Convert to grayscale and apply Gaussian filtering
  im_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

  im_gray = cv2.GaussianBlur(im_gray, (5, 5), 0)


  # Threshold the image
  ret, im_th = cv2.threshold(im_gray.copy(), 120, 255, cv2.THRESH_BINARY_INV)

  # Find contours in the binary image 'im_th'

  _, contours0, hierarchy  = cv2.findContours(im_th, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

  # Draw contours in the original image 'im' with contours0 as input

  # cv2.drawContours(frame, contours0, -1, (0,0,255), 2, cv2.LINE_AA, hierarchy, abs(-1))


  # Rectangular bounding box around each number/contour
  rects = [cv2.boundingRect(ctr) for ctr in contours0]

  # Draw the bounding box around the numbers
  for rect in rects:

   cv2.rectangle(frame, (rect[0], rect[1]), (rect[0] + rect[2], rect[1] + rect[3]), (0, 255, 0), 3)

   # Make the rectangular region around the digit
   leng = int(rect[3] * 1.6)
   pt1 = int(rect[1] + rect[3] // 2 - leng // 2)
   pt2 = int(rect[0] + rect[2] // 2 - leng // 2)
   roi = im_th[pt1:pt1+leng, pt2:pt2+leng]



   # Resize the image
   roi = cv2.resize(roi, (28, 28), im_th, interpolation=cv2.INTER_AREA)
   roi = cv2.dilate(roi, (3, 3))
   # Calculate the HOG features
   roi_hog_fd = hog(roi, orientations=9, pixels_per_cell=(14, 14), cells_per_block=(1, 1), visualise=False)
   nbr = clf.predict(np.array([roi_hog_fd], 'float64'))
   cv2.putText(frame, str(int(nbr[0])), (rect[0], rect[1]),cv2.FONT_HERSHEY_DUPLEX, 2, (0, 255, 255), 3)



   # Display the resulting frame
   cv2.imshow('frame', frame)
   cv2.imshow('Threshold', im_th)



   # Press 'q' to exit the video stream
   if cv2.waitKey(1) & 0xFF == ord('q'):
      break


# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

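The error I get says there is no input at the point where the ROI (region of interest) is resized. I find this strange, because it works as long as I don't move things around too much in the picture. I'm sure the camera is not at fault, because I have tried many different cameras. Here is the exact error message: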

Traceback (most recent call last):
File "C:\Users\marti\Desktop\Code\Python\digitRecognition\Video_cap.py", line 55, in <module>
 roi = cv2.resize(roi, (28, 28), im_th, interpolation=cv2.INTER_AREA)
cv2.error: D:\Build\OpenCV\opencv-3.2.0\modules\imgproc\src\imgwarp.cpp:3492: error: (-215) ssize.width > 0 && ssize.height > 0 in function cv::resize

Picture of the program in action; if I move the numbers around, the program freezes.

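【Comments】:

  • Please correct the indentation of your code.
  • The indentation has been fixed.
  • I suggest you also simplify your code and keep only the parts needed to reproduce the problem.

Tags: python opencv image-processing scikit-learn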


【Answer 1】:

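You are preprocessing with a fixed threshold before trying to find contours. Because cv2.resize() has to resize something, it expects the roi matrix to have non-zero width and height. My guess is that at some point, as you move the camera, you no longer detect any digits because of your non-adaptive preprocessing.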
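One way an empty roi can arise in the question's code is worth spelling out. This is an observation about slicing, not something stated in the answer: when a bounding box sits near the frame edge, the 1.6x padded square can start at a negative row or column, and a slice with a negative start counts from the end and may come back empty. The sketch below is pure Python and reproduces the question's arithmetic; the helper names `padded_square` and `safe_roi_bounds` are made up for illustration.

```python
def padded_square(rect, scale=1.6):
    """Top-left corner (row, col) and side length of the square cut around a box.

    rect is (x, y, w, h) as returned by cv2.boundingRect; the arithmetic
    mirrors the question's pt1 / pt2 / leng computation.
    """
    x, y, w, h = rect
    side = int(h * scale)
    top = y + h // 2 - side // 2
    left = x + w // 2 - side // 2
    return top, left, side


def safe_roi_bounds(rect, img_h, img_w, scale=1.6):
    """Clamp the square to the frame; return (y0, y1, x0, x1), or None if empty."""
    top, left, side = padded_square(rect, scale)
    y0, y1 = max(top, 0), min(top + side, img_h)
    x0, x1 = max(left, 0), min(left + side, img_w)
    if y0 >= y1 or x0 >= x1:
        return None
    return y0, y1, x0, x1


# A negative start index counts from the end, so this slice is empty.
# Plain lists are used here; NumPy basic slicing treats negative starts the same way.
row = list(range(10))
empty = row[-3:5]
```

In the loop this could be used as `bounds = safe_roi_bounds(rect, *im_th.shape)`, skipping the rectangle with `continue` when it returns None and otherwise slicing `im_th[y0:y1, x0:x1]`. Note that a clamped region is no longer guaranteed to be square, which may matter for the classifier.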

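I suggest displaying the thresholded image and the frame with the contours overlaid while you move the camera. That way you can debug the algorithm. Also, make sure to print(len(rects)) to see whether any rectangles are detected at all.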
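The print(len(rects)) idea can be extended into a small per-frame summary. The following is only a sketch: the function name `frame_stats` and the choice of statistics are assumptions made for illustration, and the rectangles are expected as (x, y, w, h) tuples like those returned by cv2.boundingRect.

```python
def frame_stats(rects, frame_w, frame_h):
    """Summarise the detected rectangles of one frame for debugging output."""
    # Boxes that reach the frame border are often clipped or spurious contours.
    touching = [
        (x, y, w, h)
        for (x, y, w, h) in rects
        if x <= 0 or y <= 0 or x + w >= frame_w or y + h >= frame_h
    ]
    return {
        "count": len(rects),
        "touching_border": len(touching),
        "largest_area": max((w * h for (_, _, w, h) in rects), default=0),
    }


stats = frame_stats([(0, 10, 5, 5), (100, 100, 20, 40), (630, 470, 10, 10)], 640, 480)
print(stats)
```

Printing this once per frame makes it easy to see whether the crash coincides with zero detections, with a burst of tiny contours, or with boxes pressed against the edge of the frame.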

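Another trick is to save the frames and then run the algorithm on the last frame saved before the crash, to find out why that frame causes the error.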
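A minimal sketch of that frame-saving trick, assuming it is acceptable to keep only the most recent frames in memory and write them out when something fails. The class name `FrameRecorder`, the buffer size, and the file-name pattern are all hypothetical; the `save` callable is injected so that the example does not depend on OpenCV being installed.

```python
from collections import deque


class FrameRecorder:
    """Keep the last few frames so they can be written out after a failure."""

    def __init__(self, maxlen=30):
        # A bounded deque silently drops the oldest frame once it is full.
        self._frames = deque(maxlen=maxlen)

    def push(self, frame):
        self._frames.append(frame)

    def dump(self, save, prefix="crash_frame"):
        """Call save(path, frame) for every buffered frame, oldest first."""
        paths = []
        for i, frame in enumerate(self._frames):
            path = f"{prefix}_{i:03d}.png"
            save(path, frame)
            paths.append(path)
        return paths


# Stand-in frames and a stand-in writer, just to show the behaviour.
written = []
recorder = FrameRecorder(maxlen=2)
for fake_frame in ("a", "b", "c"):
    recorder.push(fake_frame)
paths = recorder.dump(lambda path, frame: written.append((path, frame)))
```

In the capture loop one would call `recorder.push(frame.copy())` after each `cap.read()`, wrap the per-frame processing in `try` / `except cv2.error`, and call `recorder.dump(cv2.imwrite)` in the handler before re-raising. The last file written is then the frame to replay offline.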

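All in all, you really need to be in control of your code if you want it to produce meaningful results. Depending on your data, a solution might be to apply some kind of contrast enhancement before the thresholding step and/or to use Otsu's method or adaptive thresholding, plus some additional filtering.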
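To illustrate what Otsu's method does differently from the fixed value 120, here is a pure-Python sketch that picks the threshold maximising the between-class variance of the grey-level histogram. It is written for clarity rather than speed and is not the answerer's code; the function name `otsu_threshold` is an assumption, and pixel values are assumed to be integers in 0..255.

```python
def otsu_threshold(pixels):
    """Return the grey level t that best separates pixels <= t from pixels > t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of grey levels in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break             # bright class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Two clearly separated brightness groups: the threshold falls between them.
t = otsu_threshold([10] * 50 + [200] * 50)
```

In OpenCV itself the same selection is available by passing `cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU` as the type flag to `cv2.threshold` (the numeric threshold argument is then ignored), while `cv2.adaptiveThreshold` computes a separate threshold for each pixel neighbourhood, which can cope better with uneven lighting.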

【Comments】:

    【Answer 2】:

    How about trying this:

    if roi.any():
        roi = cv2.resize(roi, (28, 28), frame, interpolation=cv2.INTER_AREA)
        roi = cv2.dilate(roi, (3, 3))
    

    I think this is what you want (I simplified your example):

    import cv2

    cap = cv2.VideoCapture(0)

    while True:
        # Capture frame-by-frame
        ret, frame = cap.read()
        if not ret:
            # No frame was delivered; stop instead of failing in cvtColor
            break
        # Convert to grayscale and apply Gaussian filtering
        im_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        im_gray = cv2.GaussianBlur(im_gray, (5, 5), 0)
        ret, im_th = cv2.threshold(im_gray.copy(), 120, 255, cv2.THRESH_BINARY_INV)
        # Find contours in the binary image 'im_th'
        # (three return values, as in the OpenCV 3.x version used in the question)
        _, contours0, hierarchy = cv2.findContours(im_th, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        # Rectangular bounding box around each number/contour
        rects = [cv2.boundingRect(ctr) for ctr in contours0]
        # Draw the bounding box around the numbers
        for rect in rects:
            cv2.rectangle(frame, (rect[0], rect[1]), (rect[0] + rect[2], rect[1] + rect[3]), (255, 0, 255), 3)
            # Make the rectangular region around the digit
            leng = int(rect[3] * 1.6)
            pt1 = int(rect[1] + rect[3] // 2 - leng // 2)
            pt2 = int(rect[0] + rect[2] // 2 - leng // 2)
            roi = im_th[pt1:pt1+leng, pt2:pt2+leng]

            # Resize only when the ROI is non-empty and has foreground pixels.
            # Kept inside the loop so every rectangle is checked and 'roi'
            # is never used before it has been assigned.
            if roi.any():
                roi = cv2.resize(roi, (28, 28), frame, interpolation=cv2.INTER_AREA)
                roi = cv2.dilate(roi, (3, 3))
    
        # Display the resulting frame
        cv2.imshow('frame', frame)
        #cv2.imshow('Threshold', im_th)
    
        # Press 'q' to exit the video stream
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    
    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()
    

    【Comments】:
