【Posted】:2020-12-19 00:51:18
【Question】:
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
print(pytesseract.image_to_string(r'D:\examplepdf2image.png'))
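I don't want to read just one image. I want to read all the images in a folder and, if possible, process them quickly one at a time (for example, with a 1-second cooldown between images, 100 images in total).
[My other evil idea is to wait for photos live: when a photo arrives in the folder, the program reads it and outputs the text. Watching it live is what matters to me, but it is not strictly required.]
Can anyone help me?
Thanks...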
{https://towardsdatascience.com/how-to-extract-text-from-images-with-python-db9b87fe432b}
Edit:
Extract text from all the images in a folder
# storing the text in a single file
from PIL import Image
import pytesseract as pt
import os

def main():
    # path for the folder for getting the raw images
    path = "C:\\Users\\USER\\Desktop\\Masaüstü\\Test\\Input"
    # link to the file in which output needs to be kept
    fullTempPath = "C:\\Users\\USER\\Desktop\\Masaüstü\\Test\\Output\\outputFile.txt"
    # iterating the images inside the folder
    for imageName in os.listdir(path):
        inputPath = os.path.join(path, imageName)
        img = Image.open(inputPath)
        # applying ocr using pytesseract for python
        text = pt.image_to_string(img, lang="eng")
        # saving the text for appending it to the output.txt file
        # a + parameter used for creating the file if not present
        # and if present then append the text content
        file1 = open(fullTempPath, "a+")
        # providing the name of the image
        file1.write(imageName + "\n")
        # providing the content in the image
        file1.write(text + "\n")
        file1.close()
    # for printing the output file
    file2 = open(fullTempPath, 'r')
    print(file2.read())
    file2.close()

if __name__ == '__main__':
    main()
I found this code. It reads the images in the folder, creates a text file, and writes the extracted text to it.
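For the two things asked above (a one-second pause between passes, and picking up images as they arrive in the folder), one possible approach is plain polling. The sketch below is only an illustration under stated assumptions, not a definitive solution: the names `default_ocr`, `ocr_new_images`, `watch_folder`, and `IMAGE_EXTENSIONS`, the extension list, and the `max_polls` parameter are all hypothetical. The only library calls it relies on are `pytesseract.image_to_string`, `PIL.Image.open`, `os.listdir`, and `time.sleep`.

```python
import os
import time

# Assumed list of image extensions; adjust to whatever the folder actually holds.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp")


def default_ocr(image_path):
    """Run Tesseract on one image file and return the recognized text."""
    # Imported lazily so the polling logic can be tried without Tesseract installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang="eng")


def ocr_new_images(folder, seen, ocr=default_ocr):
    """OCR every image in `folder` whose name is not in `seen`.

    Returns a dict {file name: text} and adds the processed names to `seen`.
    """
    results = {}
    for name in sorted(os.listdir(folder)):
        if name in seen or not name.lower().endswith(IMAGE_EXTENSIONS):
            continue
        results[name] = ocr(os.path.join(folder, name))
        seen.add(name)
    return results


def watch_folder(folder, output_file, interval=1.0, max_polls=None, ocr=default_ocr):
    """Poll `folder` every `interval` seconds and append new OCR text to `output_file`.

    Runs forever when `max_polls` is None; otherwise stops after that many polls.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for name, text in ocr_new_images(folder, seen, ocr).items():
            with open(output_file, "a", encoding="utf-8") as out:
                out.write(name + "\n" + text + "\n")
        polls += 1
        time.sleep(interval)  # the "cooldown" between passes
    return seen
```

Calling `watch_folder(input_folder, output_file)` with the two paths from the code above would keep checking the folder once per second until interrupted. One caveat of polling this way is that a file could be picked up while it is still being copied into the folder, so a short delay or a retry around the OCR call may be needed in practice.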
【Discussion】:
Tags: python image path tesseract python-tesseract