Title: How to define tokens in spaCy NLP in Python?
Posted: 2017-05-25 14:50:08
Question:

I want to use spaCy's NLP features in my FlaskApp. I have been looking through the various examples on the official sites: (for spaCy) https://spacy.io/docs/usage/tutorials

and (for Flask) https://realpython.com/blog/python/flask-by-example-part-3-text-processing-with-requests-beautifulsoup-nltk/

In MyWebapp, I have code that posts the result of the NLP analysis from parse_news_from:

@app.route('/submit', methods=['POST'])
def submit_textarea():
    if parse_news_from(format(request.form["text"])):
        print("The news was parsed successfully!")
    return talk_title

Currently parse_news_from works with the NLTK library, but I am switching to spaCy. Here is the spaCy code I took from the official sources:

from spacy.en import English
import _regex
parser = English()

# Test Data
multiSentence = "There is an art, it says, or rather, a knack to flying." \
                 "The knack lies in learning how to throw yourself at the ground and miss." \
                 "In the beginning the Universe was created. This has made a lot of people "\
                 "very angry and been widely regarded as a bad move."
# all you have to do to parse text is this:
#note: the first time you run spaCy in a file it takes a little while to load up its modules
parsedData = parser(multiSentence)

# Let's look at the tokens
# All you have to do is iterate through the parsedData
# Each token is an object with lots of different properties
# A property with an underscore at the end returns the string representation
# while a property without the underscore returns an index (int) into spaCy's vocabulary
# The probability estimate is based on counts from a 3 billion word
# corpus, smoothed using the Simple Good-Turing method.
for i, token in enumerate(parsedData):
    print("original:", token.orth, token.orth_)
    print("lowercased:", token.lower, token.lower_)
    print("lemma:", token.lemma, token.lemma_)
    print("shape:", token.shape, token.shape_)
    print("prefix:", token.prefix, token.prefix_)
    print("suffix:", token.suffix, token.suffix_)
    print("log probability:", token.prob)
    print("Brown cluster id:", token.cluster)
    print("----------------------------------------")
    if i > 1:
        break

After running it I get this error:

File "/home/xxx/anaconda3/lib/python3.6/site-packages/_regex_core.py", line 21, in <module>
    import _regex
ImportError: /home/xxx/anaconda3/lib/python3.6/site-packages/_regex.cpython-36m-x86_64-linux-gnu.so: undefined symbol: PySlice_AdjustIndices

Is there any working example of how to get started? Where is my mistake? Thanks.
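For diagnosis (this check is not part of the original question): `PySlice_AdjustIndices` was added to the CPython C API in version 3.6.1, so a compiled `_regex` extension that references it cannot load on Python 3.6.0. A minimal version-check sketch, with illustrative messages:

```python
import sys

# PySlice_AdjustIndices entered the CPython C API in 3.6.1; an older
# 3.6.0 interpreter loading a _regex extension built against >= 3.6.1
# fails with exactly this kind of "undefined symbol" ImportError.
REQUIRED = (3, 6, 1)

if sys.version_info < REQUIRED:
    print("interpreter too old:", sys.version.split()[0])
else:
    print("interpreter ok:", sys.version.split()[0])
```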

Comments:

    Tags: python ios web-applications nlp


    Solution 1:

    I found the cause of the error above, and it was quite unexpected for me. It is described here: How to fix a python spaCy error: "undefined symbol: PySlice_AdjustIndices"?
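    As a complement to the linked fix, whether the running interpreter actually exports the missing symbol can be probed with `ctypes`; this is a diagnostic sketch, not part of the linked answer:

```python
import ctypes
import sys

print("interpreter:", sys.version.split()[0])

# ctypes.pythonapi resolves names against the interpreter's own C API;
# a missing attribute here would mean this build predates CPython 3.6.1,
# matching the "undefined symbol" failure from the compiled _regex module.
available = hasattr(ctypes.pythonapi, "PySlice_AdjustIndices")
print("PySlice_AdjustIndices available:", available)
```

    If this prints False, the fix is to upgrade the interpreter rather than to reinstall spaCy or regex.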

    Discussion:
