【Question Title】: Can I get data from all images in a folder with Python Tesseract?
【Posted】: 2020-12-19 00:51:18
【Question Description】:
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
print(pytesseract.image_to_string(r'D:\examplepdf2image.png'))

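I don't want to read just one image. I want to read all the images in a folder, one after another, and quickly if possible (for example, with a 1-second cooldown between images, 100 images in total).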
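A minimal sketch of that loop, assuming pytesseract is installed and configured as in the snippet above. `ocr_folder`, `default_ocr`, the `delay` parameter and the extension list are hypothetical names chosen for illustration, not part of pytesseract. The OCR function is passed in as a parameter so the folder loop can be tried out even without Tesseract.

```python
import os
import time


def default_ocr(path):
    # Imported lazily so the folder loop itself does not require pytesseract.
    import pytesseract
    return pytesseract.image_to_string(path)


def ocr_folder(folder, ocr=default_ocr, delay=1.0,
               extensions=(".png", ".jpg", ".jpeg")):
    """Run `ocr` on every image file directly inside `folder`, one at a time.

    Returns a dict mapping file name to extracted text and sleeps `delay`
    seconds between images (the "cooldown").
    """
    names = sorted(
        name for name in os.listdir(folder)
        if name.lower().endswith(extensions)              # images only
        and os.path.isfile(os.path.join(folder, name))    # skip subfolders
    )
    results = {}
    for i, name in enumerate(names):
        results[name] = ocr(os.path.join(folder, name))
        if i < len(names) - 1:
            time.sleep(delay)  # pause between images, not after the last one
    return results
```

Usage would be something like `texts = ocr_folder(r"D:\images", delay=1.0)` (the folder path is a placeholder); with 100 images and `delay=1.0`, the sleeps alone add roughly 99 seconds on top of the OCR time.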

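[Another wicked idea of mine is to watch for photos live: when a photo arrives in the folder, the program reads it and enters its text. Watching live is what matters, but it doesn't have to be done this way.]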

Can anyone help me?

Thanks...

{https://towardsdatascience.com/how-to-extract-text-from-images-with-python-db9b87fe432b}

Edit:

Extracting text from all images in a folder

# storing the text in a single file
from PIL import Image
import pytesseract as pt
import os

# file extensions treated as images; adjust as needed
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp")

def main():
    # path for the folder containing the raw images
    path = "C:\\Users\\USER\\Desktop\\Masaüstü\\Test\\Input"

    # path to the file in which the output is kept
    fullTempPath = "C:\\Users\\USER\\Desktop\\Masaüstü\\Test\\Output\\outputFile.txt"

    # "a" mode creates the file if it is not present and appends to it
    # if it is; the file is opened once instead of once per image
    with open(fullTempPath, "a", encoding="utf-8") as file1:
        # iterating the images inside the folder (sorted for a stable order)
        for imageName in sorted(os.listdir(path)):
            # skip subfolders and non-image files, which Image.open cannot read
            if not imageName.lower().endswith(IMAGE_EXTENSIONS):
                continue
            inputPath = os.path.join(path, imageName)

            # applying OCR using pytesseract
            with Image.open(inputPath) as img:
                text = pt.image_to_string(img, lang="eng")

            # write the name of the image, then the text found in it
            file1.write(imageName + "\n")
            file1.write(text + "\n")

    # print the whole output file
    with open(fullTempPath, "r", encoding="utf-8") as file2:
        print(file2.read())

if __name__ == '__main__':
    main()

I found this code. It reads the images, creates a text file, and writes the extracted data into it.

【Question Comments】:

    Tags: python image path tesseract python-tesseract


    【Solution 1】:

    To conveniently scan a folder and collect all the files in it, you can use glob or os.walk:

    import glob, os
    folder = "your/folder/path"

    # to get all *.png files directly under your folder:
    files = glob.glob(os.path.join(folder, "*.png"))
    # files is a list of all *.png files directly under folder, not including subfolders.

    # or use os.walk, which yields (root, dirnames, filenames) for every folder:
    result = []
    for root, _, filenames in os.walk(folder):
        for filename in filenames:
            if filename.endswith('.png'):
                result.append(os.path.join(root, filename))
    # result is a list of all *.png files in your folder, including subfolders.
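A small sketch that folds both options above into one helper, assuming you also want to match several extensions and ignore letter case (on case-sensitive file systems, `*.png` does not match `photo.PNG`). `find_images` and its parameters are hypothetical names for illustration.

```python
import os


def find_images(folder, extensions=(".png",), recursive=False):
    """Return sorted paths of files in `folder` whose names end with one of
    `extensions` (compared case-insensitively). With recursive=True,
    subfolders are searched too."""
    extensions = tuple(ext.lower() for ext in extensions)
    found = []
    for root, _dirs, filenames in os.walk(folder):
        for filename in filenames:
            if filename.lower().endswith(extensions):
                found.append(os.path.join(root, filename))
        if not recursive:
            break  # os.walk yields the top folder first; stop after it
    return sorted(found)
```

For example, `find_images("your/folder/path", (".png", ".jpg"), recursive=True)` would return every PNG and JPEG in the tree, in a stable order.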
    
    

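    If you want to monitor the folder in real time and trigger some action whenever a new .png file is written to it, there are two approaches.

    If you don't need an instant response to file creation and the folder isn't too crowded, the simplest thing you can do is rescan the folder every few seconds, compare the new file list with the old one, and process the new files.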
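A minimal sketch of that polling approach, using only the standard library. `scan`, `poll_folder`, `handle` and `max_rounds` are hypothetical names for illustration; `handle` is where you would call pytesseract on the new file.

```python
import os
import time


def scan(folder, extensions=(".png",)):
    """Return the set of matching file names currently in `folder`."""
    return {
        name for name in os.listdir(folder)
        if name.lower().endswith(extensions)
    }


def poll_folder(folder, handle, interval=2.0, max_rounds=None):
    """Rescan `folder` every `interval` seconds and call `handle(path)` for
    each file that was not present in the previous scan.

    Files already present at start-up are treated as seen. `max_rounds`
    limits the number of rescans (None means run forever).
    """
    seen = scan(folder)
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        time.sleep(interval)
        current = scan(folder)
        for name in sorted(current - seen):  # set difference = new files only
            handle(os.path.join(folder, name))
        seen = current
        rounds += 1
```

Comparing sets keeps each rescan cheap as long as the folder is not huge; the trade-off against an event-based watcher is a delay of up to `interval` seconds before a new file is noticed.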

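    If you want an event-listener style response, where the action fires as soon as a file is created, take a look at the Python library called watchdog.

    Here is its PyPI home page: watchdog package home page

    With watchdog, you can create a file monitor like this: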

    import time
    from watchdog.events import PatternMatchingEventHandler
    from watchdog.observers import Observer

    class PNG_Handler(PatternMatchingEventHandler):
        def __init__(self):
            super().__init__(patterns=["*.png"], ignore_directories=True)

        def on_created(self, event):
            newfilepath = event.src_path
            # newfilepath is the path to the newly created .png file;
            # implement your handler here (e.g. run pytesseract on it).
            # The other methods follow the same principle.
            print("created:", newfilepath)

        def on_deleted(self, event):
            pass

        def on_modified(self, event):
            pass

        def on_moved(self, event):
            pass

    observer = Observer()
    observer.schedule(PNG_Handler(), "path/to/folder", recursive=True)
    observer.start()  # the observer watches the folder in a background thread
    try:
        while True:
            time.sleep(1)  # keep the main thread alive
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
    
    

    The on_created method is called whenever a "*.png" file is created in the watched folder.
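One caveat, offered as an assumption rather than something the answer covers: on_created can fire as soon as the file appears, possibly before whatever is writing it (a camera app, a copy operation) has finished, so OCR might see a partial image. A common heuristic is to wait until the file size stops changing before reading it. `wait_until_stable` and its parameters are hypothetical names; a stable size is a good hint, not a guarantee, that writing is done.

```python
import os
import time


def wait_until_stable(path, interval=0.5, checks=3, timeout=30.0):
    """Block until the size of `path` has stayed the same for `checks`
    consecutive polls, `interval` seconds apart.

    Returns True once the size is stable, or False if `timeout` seconds
    pass first (for example, because the file never appeared).
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    stable = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file not visible yet (or already removed)
        if size >= 0 and size == last_size:
            stable += 1
            if stable >= checks:
                return True
        else:
            stable = 0  # size changed or file missing: start counting again
        last_size = size
        time.sleep(interval)
    return False
```

Inside on_created you would call `wait_until_stable(event.src_path)` first and only run pytesseract if it returns True. Note that this blocks the observer's thread while waiting, so with many simultaneous files you may prefer to hand paths off to a worker queue.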

    【Comments】:
