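【Posted】: 2021-10-07 15:12:49
【Question】:
I followed https://www.py4u.net/discuss/1545220 to extract the highlighted parts of a PDF file. I use Flask to upload the PDF, then extract the highlight annotations from it and do some further processing. However, I get this error: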
UnboundLocalError: local variable 'lst' referenced before assignment
My Flask app.py looks like this:
@app.route('/converted', methods=['GET', 'POST'])
def convert():
    global f1
    fi = request.files['pdf']
    f1 = fi.filename
    fi.save(f1)
    process_file(f1)
    return render_template('converted.html')
And script.py is as follows:
def process_file(file):
    def _parse_highlight(annot: fitz.Annot, wordlist: List[Tuple[float, float, float, float, str, int, int, int]]) -> str:
        points = annot.vertices
        pno = ''
        quad_count = int(len(points) / 4)
        sentences = []
        for i in range(quad_count):
            # where the highlighted part is
            r = fitz.Quad(points[i * 4 : i * 4 + 4]).rect
            words = [w for w in wordlist if fitz.Rect(w[:4]).intersects(r)]
            sentences.append(" ".join(w[4] for w in words))
        pno = re.findall(r'^\D*(\d+)', str(annot))
        pno = ",".join(pno)
        sentences.append(pno)
        sentences.append(annot.colors["stroke"])
        return sentences

    def handle_page(page):
        wordlist = page.getText("words")  # list of words on page
        wordlist.sort(key=lambda w: (w[3], w[0]))  # ascending y, then x
        highlights = []
        annot = page.firstAnnot
        while annot:
            if annot.type[0] == 8:
                highlights.append(_parse_highlight(annot, wordlist))
            annot = annot.next
        return highlights

    def main(filepath: str) -> List:
        doc = fitz.open(filepath)
        highlights = []
        for page in doc:
            highlights += (handle_page(page))
        return highlights

    if __name__ == "__main__":
        lst = main(file)
    # converting page column to integer type
    for x in lst:
        x[-2] = int(x[-2])
How can I fix this? Also, are there any other annotation types in a PDF?
【Discussion】:
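The error most likely comes from the `if __name__ == "__main__":` check sitting inside `process_file`. When app.py imports script.py, `__name__` is the module's name rather than `"__main__"`, so `lst = main(file)` is skipped and the `for x in lst` loop that follows reads a local variable that was never assigned. Below is a minimal sketch of that pattern and one possible fix, not a drop-in replacement. It avoids PyMuPDF so it can run on its own: `process_file_buggy`, `is_main`, `Extractor`, and `fake_extract` are hypothetical names introduced for illustration, with `fake_extract` standing in for `main(file)`.

```python
from typing import Callable, List

# Hypothetical type alias: anything that maps a file path to highlight rows.
Extractor = Callable[[str], List[list]]


def process_file_buggy(path: str, extract: Extractor, is_main: bool) -> List[list]:
    """Same shape as process_file in the question.

    is_main is a hypothetical stand-in for `__name__ == "__main__"`.
    """
    if is_main:
        lst = extract(path)  # skipped when the module is imported
    for row in lst:  # UnboundLocalError when the branch above was skipped
        row[-2] = int(row[-2])
    return lst


def process_file(path: str, extract: Extractor) -> List[list]:
    """One possible fix: assign lst unconditionally and return it."""
    lst = extract(path)
    for row in lst:
        row[-2] = int(row[-2])  # page column: str -> int
    return lst


def fake_extract(path: str) -> List[list]:
    """Hypothetical stand-in for main(path); returns [text, page, color] rows."""
    return [["some highlighted text", "3", (1.0, 1.0, 0.0)]]


if __name__ == "__main__":
    print(process_file("example.pdf", fake_extract))
    try:
        process_file_buggy("example.pdf", fake_extract, is_main=False)
    except UnboundLocalError as exc:
        print("reproduced:", exc)
```

Applied to script.py, this would mean removing the `if __name__ == "__main__":` line inside `process_file` so that `lst = main(file)` always runs, and returning `lst` so that `convert()` can pass the result on to the template. The `__main__` check is normally only useful at module level, to guard code that should run when the file is executed directly.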
Tags: python flask annotations