How to properly detect words in an image with OpenCV and Tesseract

【Question title】: How to detect words in an image with OpenCV and Tesseract properly
【Posted】: 2021-12-29 08:42:26
【Question description】:

I am developing an application that reads an image file with OpenCV and processes the text in it with Tesseract. With the following code, Tesseract detects extra rectangles that contain no text.

void Application::Application::OpenAndProcessImageFile(void)
{
    OPENFILENAMEA ofn;
    ZeroMemory(&ofn, sizeof(OPENFILENAMEA));

    char szFile[260] = { 0 };
    // Initialize remaining fields of OPENFILENAMEA structure
    ofn.lStructSize     = sizeof(ofn);
    ofn.hwndOwner       = mWindow->getHandle();
    ofn.lpstrFile       = szFile;
    ofn.nMaxFile        = sizeof(szFile);
    ofn.lpstrFilter     = "JPG\0*.JPG\0PNG\0*.PNG\0";
    ofn.nFilterIndex    = 1;
    ofn.lpstrFileTitle  = NULL;
    ofn.nMaxFileTitle   = 0;
    ofn.lpstrInitialDir = NULL;
    ofn.Flags           = OFN_PATHMUSTEXIST | OFN_FILEMUSTEXIST;

    //open the picture dialog and select the image
    if (GetOpenFileNameA(&ofn) == TRUE) {
        std::string filePath = ofn.lpstrFile;
        
        //load image
        mImage = cv::imread(filePath.c_str());

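        //process image
        tesseract::TessBaseAPI ocr;

        // Init returns non-zero on failure (e.g. eng.traineddata not found)
        if (mImage.empty() || ocr.Init(NULL, "eng") != 0)
            return;
        ocr.SetImage(mImage.data, mImage.cols, mImage.rows, 3, mImage.step);

        // GetWords returns a Leptonica Boxa that the caller must free
        Boxa* bounds = ocr.GetWords(NULL);
        if (bounds != NULL) {
            for (int i = 0; i < bounds->n; ++i) {
                Box* b = bounds->box[i];
                cv::rectangle(mImage, cv::Rect(b->x, b->y, b->w, b->h), cv::Scalar(0, 255, 0), 2);
            }
            boxaDestroy(&bounds); // release the boxes to avoid a leak
        }

        ocr.End();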
        
        //show image
        cv::destroyAllWindows();
        cv::imshow("İşlenmiş Resim", mImage);
    }
}

This is the output image.

As you can see, Tesseract processes regions that contain no words at all. How can I fix this?

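【Question comments】:

Tags: c++ opencv tesseract

【Solution 1】:

Tesseract is built for character recognition, not text detection. Even where a region has no text, Tesseract can interpret some image features as text.

What you need to do is detect the text regions first with a text detection algorithm, and then apply Tesseract to those regions. Here is a tutorial on a DNN model for text detection, and it is very good.

I quickly ran your image through it, and this is the output:

You can get better results by changing the model's input parameters. I just used the defaults.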
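One step of the detect-then-recognize approach described above can be sketched in plain C++: a text detector typically returns four-point quadrilaterals, while Tesseract's `TessBaseAPI::SetRectangle(left, top, width, height)` expects an axis-aligned rectangle inside the image. The helper below is a minimal sketch of that conversion. The names `Point2i`, `RectI`, `quadToOcrRect`, and the `pad` parameter are hypothetical and used only for illustration; they come from neither library.

```cpp
#include <algorithm>
#include <array>

// Hypothetical stand-ins for the detector's point type and an OCR rectangle.
struct Point2i { int x, y; };
struct RectI { int left, top, width, height; };

// Clamp v into the closed range [lo, hi].
static int clampInt(int v, int lo, int hi) {
    return std::max(lo, std::min(v, hi));
}

// Axis-aligned bounding box of a 4-point quad, grown by `pad` pixels on each
// side and clamped to the image, so it can be handed to an OCR engine.
RectI quadToOcrRect(const std::array<Point2i, 4>& quad, int pad,
                    int imgW, int imgH) {
    int minX = quad[0].x, maxX = quad[0].x;
    int minY = quad[0].y, maxY = quad[0].y;
    for (const Point2i& p : quad) {
        minX = std::min(minX, p.x);
        maxX = std::max(maxX, p.x);
        minY = std::min(minY, p.y);
        maxY = std::max(maxY, p.y);
    }
    const int left   = clampInt(minX - pad, 0, imgW);
    const int top    = clampInt(minY - pad, 0, imgH);
    const int right  = clampInt(maxX + pad, 0, imgW);
    const int bottom = clampInt(maxY + pad, 0, imgH);
    return { left, top, right - left, bottom - top };
}
```

With a rectangle `r` from this helper, the question's code could call `ocr.SetRectangle(r.left, r.top, r.width, r.height)` after `SetImage` and read the text for that region only, instead of asking Tesseract to find words across the whole image. A little padding tends to help because detectors often crop tightly around the glyphs.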

【Discussion】:

Thanks for your answer. Could you explain what the confThreshold and nmsThreshold parameters are for?
It is actually a score_threshold; check here
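To make the two parameters concrete, here is a simplified sketch of how a score threshold and a non-maximum suppression (NMS) threshold are commonly used together. It assumes axis-aligned boxes and a plain greedy NMS; it is not OpenCV's implementation, and the `Detection` struct and `filterDetections` function are hypothetical names. `confThreshold` discards low-confidence candidates, and `nmsThreshold` is the overlap (intersection over union) above which a weaker box is treated as a duplicate of a stronger one.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned detection; real text detectors may use rotated boxes.
struct Detection {
    float x, y, w, h;  // top-left corner and size, in pixels
    float score;       // detector confidence
};

// Intersection over union of two axis-aligned boxes.
float iou(const Detection& a, const Detection& b) {
    const float x1 = std::max(a.x, b.x);
    const float y1 = std::max(a.y, b.y);
    const float x2 = std::min(a.x + a.w, b.x + b.w);
    const float y2 = std::min(a.y + a.h, b.y + b.h);
    const float inter = std::max(0.0f, x2 - x1) * std::max(0.0f, y2 - y1);
    const float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Returns indices into `dets` of the surviving boxes, highest score first.
std::vector<std::size_t> filterDetections(const std::vector<Detection>& dets,
                                          float confThreshold,
                                          float nmsThreshold) {
    // Step 1: drop candidates whose score does not exceed confThreshold.
    std::vector<std::size_t> order;
    for (std::size_t i = 0; i < dets.size(); ++i)
        if (dets[i].score > confThreshold) order.push_back(i);

    // Step 2: visit the remaining candidates from best to worst score.
    std::stable_sort(order.begin(), order.end(),
                     [&dets](std::size_t a, std::size_t b) {
                         return dets[a].score > dets[b].score;
                     });

    // Step 3: keep a box only if it does not overlap an already kept box
    // by more than nmsThreshold.
    std::vector<std::size_t> kept;
    for (std::size_t idx : order) {
        bool suppressed = false;
        for (std::size_t k : kept) {
            if (iou(dets[idx], dets[k]) > nmsThreshold) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) kept.push_back(idx);
    }
    return kept;
}
```

Raising `confThreshold` removes more weak detections (fewer false rectangles, but faint text may be missed). Lowering `nmsThreshold` merges overlapping boxes more aggressively, while raising it lets near-duplicates through.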