python-pdf files (continuously updated)

Introduction

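Python has many modules for working with PDFs. Most of them are a hassle, so I am not covering them for now and will update this when I have time (honestly, I am just lazy..)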

The other modules are fairly tedious, so I recommend the first one below.

pdfplumber

It is simple to use. I use it together with re to find specific text in a PDF.

A detailed write-up is here: https://zhuanlan.zhihu.com/p/90991510

Finding text in a given file

import re

import pdfplumber

with pdfplumber.open("test.pdf") as pdf:
	first_page = pdf.pages[0]  # take the first page
	text = first_page.extract_text()
	print(text)  # print the text of the first page
	# first run of 11 digits on the page; raises IndexError if there is none
	file_name = re.findall(r'\d{11}', text)[0]

Use a regular expression to match the data you want..
That's it.
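
The snippet above assumes the first page has text and contains a match. Below is a slightly more defensive sketch that scans every page and returns None when nothing is found. The helper names find_first and find_in_pdf are made up for illustration and are not part of pdfplumber; the only pdfplumber calls used are pdfplumber.open, pdf.pages and page.extract_text, as in the code above.

```python
import re
from typing import Optional, Tuple


def find_first(pattern: str, text: Optional[str]) -> Optional[str]:
    """Return the first match of pattern in text, or None if nothing matches."""
    if not text:  # a page may yield no text at all, e.g. a scanned image
        return None
    match = re.search(pattern, text)
    return match.group(0) if match else None


def find_in_pdf(path: str, pattern: str) -> Optional[Tuple[int, str]]:
    """Return (page number, matched text) for the first page with a match."""
    import pdfplumber  # imported here so find_first can be used without it

    with pdfplumber.open(path) as pdf:
        for page_number, page in enumerate(pdf.pages, start=1):
            found = find_first(pattern, page.extract_text())
            if found is not None:
                return page_number, found
    return None  # no page contained a match
```

For example, find_in_pdf("test.pdf", r"\d{11}") would return the page number and the first run of 11 digits, assuming test.pdf exists and has extractable text.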

2022-12-23