Tesseract OCR not working as expected, unable to get the complete text from image. C#

【Question title】: Tesseract OCR not working as expected, unable to get the complete text from image. c#
【Posted】: 2016-06-28 13:36:22
【Question】:

My image contains only digits (please find the image attached), but not all of the digits appear in the output text. The text I get after running the code below is:

75491024385252003967

I downloaded my training data from https://github.com/tesseract-ocr/langdata

Can anyone point out what I am doing wrong here?



        // Requires: using System.Text.RegularExpressions; using Tesseract;
        string file = @"C:\Images\image.jpg";
        char[] textArray = null;
        using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default))
        using (var img = Pix.LoadFromFile(file))
        using (var page = engine.Process(img))
        {
            var text = page.GetText();
            // \s already matches tabs, newlines, carriage returns, and spaces,
            // so a single pattern strips all whitespace; no extra Trim is needed.
            text = Regex.Replace(text, @"\s+", "");
            textArray = text.ToCharArray();
        }

【Comments】:

Tags: c# ocr tesseract text-extraction

【Solution 1】:

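If you still haven't found a solution, you might want to try our LEADTOOLS OCR, which is licensed by LEAD Technologies, where I work. I was able to run that image through our .NET OCR demo and get all of the digits in a single string. I did not need any training data at all. The extracted text includes spaces between the characters, but you can use the same Regex command to fix that. Here is a screenshot of the resulting PDF: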

Result exported to PDF
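
The whitespace cleanup mentioned above can be sketched independently of either OCR engine. This is only an illustration: `OcrTextCleanup` and `StripWhitespace` are hypothetical names that do not belong to Tesseract or LEADTOOLS, and the sample string is made up rather than actual OCR output.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OcrTextCleanup
{
    // Hypothetical helper: removes every whitespace character
    // (spaces, tabs, carriage returns, line feeds) from OCR output.
    public static string StripWhitespace(string ocrText)
    {
        // Treat null as empty so callers do not have to check first.
        return Regex.Replace(ocrText ?? string.Empty, @"\s+", "");
    }

    public static void Main()
    {
        // Made-up sample input, not the output of any OCR engine.
        string raw = "1 2 3\t4\r\n5 6";
        Console.WriteLine(StripWhitespace(raw)); // prints 123456
    }
}
```

Because `\s` already covers tabs and line breaks, a single pattern is enough; listing `\t`, `\n`, and `\r` separately, as in the question, does the same work.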

【Comments】: