Extracting all text from a PPT file
Language: Python 3.8
IDE: PyCharm
python-pptx lets you create new PowerPoint (.pptx) files and modify existing ones.
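Installing pptx from PyCharm
PyCharm -> Preferences -> Project (xxxx) -> Project Interpreter
Click the "+" in the lower-left corner and search for "python-pptx".
Click "Install Package" in the lower-left corner (installation takes a while) and wait for it to finish successfully.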
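To confirm that the installation took effect for the interpreter the project actually uses, a quick check along these lines may help. This is only a minimal sketch: `is_installed` is a hypothetical helper name, and it relies solely on the standard library's `importlib.util.find_spec`. Note that the python-pptx package is imported under the module name `pptx`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


# python-pptx is imported under the module name "pptx".
print("pptx available:", is_installed("pptx"))
```

If this prints False, the package was probably installed into a different interpreter than the one running the script.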
Import the package in your program
from pptx import Presentation
Load the PPT file
m_ppt = Presentation('test.pptx')
Count the number of slides in the loaded PPT
print(len(m_ppt.slides))
Get the text contained on each slide
for slide in m_ppt.slides:
    for shape in slide.shapes:
        if not shape.has_text_frame:
            continue
        for paragraph in shape.text_frame.paragraphs:
            for content in paragraph.runs:
                print(content.text)
Running this prints the text of each run on its own line.
The complete code:
# -*- coding: utf-8 -*-
from pptx import Presentation

# Load the presentation from disk
m_ppt = Presentation('test.pptx')

# Number of slides
print(len(m_ppt.slides))

# Print the text of every run in every shape that has a text frame
for slide in m_ppt.slides:
    for shape in slide.shapes:
        if not shape.has_text_frame:
            continue
        for paragraph in shape.text_frame.paragraphs:
            for content in paragraph.runs:
                print(content.text)
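As a variation on the loop above, the text can be collected into a data structure instead of printed, which is often more convenient for later processing. The following is a sketch under stated assumptions, not part of the original post: `extract_text_by_slide` and `extract_text_from_file` are hypothetical helper names, and the choice to join the runs of a paragraph into one string and to skip blank paragraphs is mine. It uses only the python-pptx attributes already shown (`slides`, `shapes`, `has_text_frame`, `text_frame.paragraphs`, `runs`, `text`).

```python
from typing import Dict, List


def extract_text_by_slide(presentation) -> Dict[int, List[str]]:
    """Map 1-based slide numbers to the non-empty paragraph texts on each slide.

    `presentation` is expected to be a python-pptx Presentation object.
    """
    result: Dict[int, List[str]] = {}
    for number, slide in enumerate(presentation.slides, start=1):
        texts: List[str] = []
        for shape in slide.shapes:
            # Shapes such as pictures carry no text frame; skip them.
            if not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                # Join the runs so that one paragraph yields one string.
                line = "".join(run.text for run in paragraph.runs).strip()
                if line:
                    texts.append(line)
        result[number] = texts
    return result


def extract_text_from_file(path: str) -> Dict[int, List[str]]:
    """Load a .pptx file from `path` and extract its text slide by slide."""
    # Imported here so the helper above has no import-time dependency.
    from pptx import Presentation
    return extract_text_by_slide(Presentation(path))
```

For example, `extract_text_from_file('test.pptx')` would return a dictionary keyed by slide number. Like the original loop, this only visits shapes that have a text frame, so text inside tables or grouped shapes is not collected.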