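【Posted】: 2014-05-01 19:57:56
【Question】:
I am using Tesseract OCR to extract text from an image. Preserving the structure of the document is very important to me. Currently Tesseract does not preserve the structure; in fact, it changes the order of the text. My input is the image below.
The output I get is as follows: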
Someto the left
Someto the left
Some in the middle
Some in the middle
Some with some tab
Some with some tab
Some with some space between them
Some with some space between them
Sometext here
Sometext here
this much
this much
How can I get the desired output, with the same structure as in the image?
That is, as follows:
Some text here
Some text here
Some to the left
Some to the left
Some in the middle
Some in the middle
Some with some tab
Some with some tab
Some with some space between them this much
Some with some space between them this much
【Discussion】:
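One possible direction, offered only as a minimal sketch and not as Tesseract's own behavior: if word-level bounding boxes are available (for example from Tesseract's hOCR output), the layout can be approximated by grouping words into lines by vertical position and padding each word with spaces according to its horizontal position. The `Word` type, the `render_layout` function, and the `char_width` and `line_tolerance` parameters below are hypothetical names introduced for illustration; the sketch assumes a roughly fixed character width in pixels and does not call Tesseract itself.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    """A recognized word with its bounding-box position in pixels (hypothetical type)."""
    text: str
    left: int    # x coordinate of the left edge
    top: int     # y coordinate of the top edge
    height: int  # height of the bounding box


def render_layout(words: List[Word],
                  char_width: float = 10.0,
                  line_tolerance: float = 0.5) -> str:
    """Rebuild a plain-text layout from word boxes.

    Assumptions: one character is about `char_width` pixels wide, and two
    words belong to the same line when their tops differ by at most
    `line_tolerance` times the word height.
    """
    # Group words into lines, scanning from top to bottom.
    lines: List[List[Word]] = []
    for w in sorted(words, key=lambda w: (w.top, w.left)):
        if lines and abs(w.top - lines[-1][0].top) <= line_tolerance * w.height:
            lines[-1].append(w)
        else:
            lines.append([w])

    # Place each word at the text column that matches its x position.
    rendered = []
    for line in lines:
        text = ""
        for w in sorted(line, key=lambda w: w.left):
            col = int(round(w.left / char_width))
            if text and col <= len(text):
                col = len(text) + 1  # keep at least one space between words
            text = text.ljust(col) + w.text
        rendered.append(text)
    return "\n".join(rendered)
```

Calling `render_layout` on a list of `Word` values returns a single string in which lines appear in top-to-bottom order and horizontal gaps become runs of spaces. The result is only as good as the `char_width` estimate, so proportional fonts or skewed scans would need a more careful mapping.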