【Question title】: Can I get data from all the images in a folder with Python Tesseract?
【Posted】: 2020-12-19 00:51:18
【Question】:

import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
print(pytesseract.image_to_string(r'D:\examplepdf2image.png'))

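I don't want to read just one image. I want to read all the images in a folder and, if possible, process them quickly one at a time (for example, with a 1-second cooldown between images, for 100 images in total).

[My other wild idea is to watch for photos live: when a photo lands in the folder, the program reads it and outputs the text. Live watching would be nice, but it is not essential.]

Can anyone help me?

Thanks...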

{https://towardsdatascience.com/how-to-extract-text-from-images-with-python-db9b87fe432b}

Edit:

Extracting text from all the images in a folder

# storing the text in a single file 
from PIL import Image 
import pytesseract as pt 
import os  

def main(): 
    # path for the folder for getting the raw images 
    path ="C:\\Users\\USER\\Desktop\\Masaüstü\\Test\\Input"
  
    # link to the file in which output needs to be kept 
    fullTempPath ="C:\\Users\\USER\\Desktop\\Masaüstü\\Test\\Output\\outputFile.txt"
  
    # iterating the images inside the folder 
    for imageName in os.listdir(path): 
        inputPath = os.path.join(path, imageName) 
        img = Image.open(inputPath) 

        # applying ocr using pytesseract for python 
        text = pt.image_to_string(img, lang ="eng") 
  
        # saving the  text for appending it to the output.txt file 
        # a + parameter used for creating the file if not present 
        # and if present then append the text content 
        file1 = open(fullTempPath, "a+") 
  
        # providing the name of the image 
        file1.write(imageName+"\n") 
  
        # providing the content in the image 
        file1.write(text+"\n") 
        file1.close()  
  
    # for printing the output file 
    file2 = open(fullTempPath, 'r') 
    print(file2.read()) 
    file2.close()         

if __name__ == '__main__': 
    main()

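I found this code. It reads the images, creates a text file, and writes the extracted text to it.

【Comments】:

Tags: python image path tesseract python-tesseract

【Solution 1】:

To scan a folder and collect all the files in it, you can use glob or os.walk.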

import glob
import os

folder = "your/folder/path"

# to get all *.png files directly under your folder:
files = glob.glob(os.path.join(folder, "*.png"))
# files is a list of all *.png files directly under folder, not including subfolders.

# or use os.walk:
result = []
for root, _, filenames in os.walk(folder):
    # os.walk yields a list of file names for each directory, so loop over it
    for filename in filenames:
        if filename.endswith('.png'):
            result.append(os.path.join(root, filename))
# result is a list of all *.png files in your folder, including subfolders.
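
To tie the file listing back to the question, here is a minimal sketch of running OCR over every image in a folder with an optional cooldown. The names `ocr_folder`, `ocr`, `pattern`, and `delay` are made up for this illustration. By default it calls `pytesseract.image_to_string` with a file path, as in the question, and it assumes Tesseract is installed and configured.

```python
import time
from pathlib import Path


def ocr_folder(folder, ocr=None, pattern="*.png", delay=0.0):
    """Run OCR on every matching image in `folder`; return {file name: text}."""
    if ocr is None:
        # Imported lazily so a different OCR callable can be passed in instead.
        import pytesseract
        ocr = pytesseract.image_to_string

    results = {}
    # sorted() gives a stable, predictable processing order.
    for image_path in sorted(Path(folder).glob(pattern)):
        results[image_path.name] = ocr(str(image_path))
        if delay:
            time.sleep(delay)  # optional cooldown after each image
    return results
```

Calling `ocr_folder("your/folder/path", delay=1.0)` would pause for one second after each image, which matches the cooldown described in the question. Only files directly under the folder are matched; `Path.rglob` could be swapped in to include subfolders.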

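If you want to monitor the folder in real time and trigger some action whenever a new .png file is written to it, there are two options.

If you don't need an instant response to file creation and the folder isn't too crowded, the simplest thing you can do is scan the same folder every few seconds, compare the new file list with the old one, and process the new files.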
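
The polling approach can be sketched roughly as follows. `find_new_files`, `poll_folder`, `handle`, `interval`, and `max_polls` are hypothetical names chosen for this example, and the first scan deliberately treats files already in the folder as new.

```python
import os
import time


def find_new_files(folder, seen, suffix=".png"):
    """Return paths of files in `folder` not yet in `seen`, and record them."""
    current = {name for name in os.listdir(folder)
               if name.lower().endswith(suffix)}
    new_names = sorted(current - seen)
    seen.update(new_names)
    return [os.path.join(folder, name) for name in new_names]


def poll_folder(folder, handle, interval=2.0, max_polls=None):
    """Call `handle(path)` for each new file, rescanning every `interval` seconds."""
    seen = set()
    polls = 0
    # max_polls=None means "run until interrupted".
    while max_polls is None or polls < max_polls:
        for path in find_new_files(folder, seen):
            handle(path)
        polls += 1
        time.sleep(interval)
```

Here `handle` could be a function that passes the path to `pytesseract.image_to_string` and prints or stores the result.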

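If you want an event-listener style response, where the action fires as soon as a file is created, take a look at the Python library called watchdog.

It is available on PyPI as the watchdog package.

With watchdog, you can create a file watcher like this: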

import time

from watchdog.events import PatternMatchingEventHandler
from watchdog.observers import Observer

class PNG_Handler(PatternMatchingEventHandler):
    def __init__(self):
        super().__init__(patterns=["*.png"], ignore_directories=False)

    def on_created(self, event):
        newfilepath = event.src_path
        # newfilepath is the path to the newly created .png file
        # you can implement your handler method here.
        # the other methods have the same principle.

    def on_deleted(self, event):
        pass

    def on_modified(self, event):
        pass

    def on_moved(self, event):
        pass

observer = Observer()
observer.schedule(PNG_Handler(), "path/to/folder", recursive=True)
observer.start()  # the observer runs in a background thread
try:
    while True:
        time.sleep(1)  # keep the main thread alive while watching
except KeyboardInterrupt:
    observer.stop()
observer.join()

Once the observer is running, the on_created function is called whenever a "*.png" file is created.

【Comments】:
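
One practical caveat: a creation event can arrive before the program writing the image has finished, so running OCR immediately may hit a partial file. A minimal sketch of one workaround is to wait until the file size stops changing. `wait_until_stable` and its parameters are hypothetical names for this illustration, not part of watchdog, and a stable size is only a heuristic.

```python
import os
import time


def wait_until_stable(path, checks=2, interval=0.5, timeout=30.0):
    """Return True once the size of `path` is unchanged for `checks` reads in a row."""
    deadline = time.monotonic() + timeout
    last_size = -1
    stable = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file is missing or not accessible yet
        if size == last_size and size > 0:
            stable += 1
            if stable >= checks:
                return True
        else:
            # Size changed (or file is empty/missing): start counting again.
            stable = 0
            last_size = size
        time.sleep(interval)
    return False  # gave up after `timeout` seconds
```

Inside `on_created`, one could call `wait_until_stable(event.src_path)` and only pass the path to `pytesseract.image_to_string` when it returns True.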