【Question title】: Dependency parse large text file with python
【Posted】: 2019-01-31 02:49:24
【Question description】:

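I am trying to parse a large txt file (about 2000 sentences). When I try to set the model path, I get this message:

NLTK was unable to find stanford-parser.jar! Set the CLASSPATH environment variable. And when I set CLASSPATH to this file, another message appears: NLTK was unable to find stanford-parser-(\d+)(.(\d+))+-models.jar! Set the CLASSPATH environment variable.

Can you help me solve it? Here is my code:

import nltk

from nltk.parse.stanford import StanfordDependencyParser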

dependency_parser = StanfordDependencyParser(model_path="edu\stanford\lp\models\lexparser\englishPCFG.ser.gz")

================================================ ============================== NLTK was unable to find stanford-parser.jar! Set the CLASSPATH environment variable.

For more information on stanford-parser.jar, see:

lex-parser.shtml

import os

os.environ['CLASSPATH'] = "stanford-corenlp-full-2018-10-05/*"

dependency_parser = StanfordDependencyParser(model_path="edu\stanford\lp\models\lexparser\englishPCFG.ser.gz")

================================================ ============================== NLTK was unable to find stanford-parser.jar! Set the CLASSPATH environment variable.

For more information on stanford-parser.jar, see:

https://nlp.stanford.edu/software/lex-parser.shtml

os.environ['CLASSPATH'] = "stanford-corenlp-full-2018-10-05/stanford-parser-full-2018-10-17/stanford-parser.jar"

>>> dependency_parser = StanfordDependencyParser(model_path="stanford-corenlp-full-2018-10-05/stanford-parser-full-2018-10-17/edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")

NLTK was unable to find stanford-parser-(\d+)(.(\d+))+-models.jar! Set the CLASSPATH environment variable.

For more information on stanford-parser-(\d+)(.(\d+))+-models.jar, see: https://nlp.stanford.edu/software/lex-parser.shtml

【Question discussion】:

Tags: parsing dependencies stanford-nlp

【Solution 1】:

You should get the new Python-native stanfordnlp dependency parser! It runs slower on a CPU than on a GPU, but it should still run fairly fast.

Just run pip install stanfordnlp to install it.

import stanfordnlp
stanfordnlp.download('en')   # This downloads the English models for the neural pipeline
nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English
doc = nlp("Barack Obama was born in Hawaii.  He was elected president in 2008.")
doc.sentences[0].print_dependencies()

There is also a useful command-line tool:

python -m stanfordnlp.run_pipeline -l en example.txt

For details, see: https://stanfordnlp.github.io/stanfordnlp/

GitHub: https://github.com/stanfordnlp/stanfordnlp
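For a file of roughly 2000 sentences, it may help to feed the pipeline in batches rather than as one string. The following is only a minimal sketch, not part of the original answer. It assumes that each non-empty line of the file holds one or more complete sentences, that the file is named example.txt as in the command above, and that each parsed sentence exposes a dependencies list of (governor, relation, dependent) triples whose words carry a text attribute. The helper names iter_batches, parse_file and main, and the batch size of 100, are hypothetical.

```python
from itertools import islice


def iter_batches(path, batch_size=100):
    """Yield lists of non-empty, stripped lines, at most batch_size per list."""
    with open(path, encoding="utf-8") as handle:
        stripped = (line.strip() for line in handle)
        non_empty = (line for line in stripped if line)
        while True:
            batch = list(islice(non_empty, batch_size))
            if not batch:
                return
            yield batch


def parse_file(path, nlp, batch_size=100):
    """Yield one list of (head, relation, dependent) text triples per sentence.

    nlp is any callable that turns a string into a document with a
    .sentences list, for example a stanfordnlp.Pipeline instance.
    """
    for batch in iter_batches(path, batch_size):
        # Assumption: a blank line between input lines keeps them in
        # separate paragraphs for the tokenizer.
        doc = nlp("\n\n".join(batch))
        for sentence in doc.sentences:
            yield [(gov.text, rel, dep.text)
                   for gov, rel, dep in sentence.dependencies]


def main():
    # Imported lazily so the helpers above work without the package installed.
    import stanfordnlp

    nlp = stanfordnlp.Pipeline()  # default English pipeline, as in the answer
    for triples in parse_file("example.txt", nlp):
        print(triples)
```

Calling main() after stanfordnlp.download('en') has been run once should print one list of triples per sentence. Building the pipeline a single time and reusing it for every batch avoids reloading the models.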

【Discussion】: