Extracting text from several images

【Question title】: Extracting text from several images
【Posted】: 2021-03-30 16:54:17
【Question】:

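I want to extract text from several images.
I want to do it in Colab.
I know how to do this for a single image: https://github.com/bhadreshpsavani/ExploringOCR/blob/master/OCRusingTesseract.ipynb
But how do I do it in a loop, since I have more than a hundred images?
Thanks in advance!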

【Comments】:

Are you not familiar with the concept of a loop?

Tags: cycle text-extraction

【Solution 1】:

I uploaded my images to the root directory in colab.research and solved the task with the following code:

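import os

import pytesseract
from PIL import Image

# File extensions treated as images (compared in lowercase).
image_ext = ['.jpg', '.png', '.jpeg']
# Directory the images were uploaded to.
directory = '/'
for file in os.listdir(directory):
  ext = os.path.splitext(file)[-1].lower()
  # Skip anything that is not an image.
  if ext not in image_ext:
    continue
  filename = os.path.join(directory, file)

  # Run Tesseract OCR on the image and print the recognized text.
  extracted_information = pytesseract.image_to_string(Image.open(filename))
  print(extracted_information)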
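
With more than a hundred images, it may be handier to keep each result next to its file name than to print everything in one stream. The following is a minimal sketch of that variation, not the answer's exact method: the helper names `find_images`, `tesseract_ocr`, and `extract_texts` are hypothetical, and the only library calls used are `Image.open` and `pytesseract.image_to_string` from the code above. It assumes Pillow, pytesseract, and the Tesseract binary are installed in the runtime.

```python
from pathlib import Path

# Extensions treated as images (compared in lowercase), as in the answer above.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_images(directory):
    """Return the image files in `directory`, sorted by path for a stable order."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def tesseract_ocr(path):
    """Default OCR step; imports lazily so the helpers above work without it."""
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(img)


def extract_texts(directory, ocr=tesseract_ocr):
    """Map each image file name to its extracted text.

    A file that fails to open or to OCR is recorded in `errors`
    instead of aborting the whole batch.
    """
    texts, errors = {}, {}
    for path in find_images(directory):
        try:
            texts[path.name] = ocr(path)
        except Exception as exc:  # keep going; one bad file should not stop the loop
            errors[path.name] = str(exc)
    return texts, errors
```

Calling `texts, errors = extract_texts('/')` would then process the same directory as the answer's loop. The `ocr` parameter is only there so a different OCR function can be swapped in.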
