【Question Title】: spaCy nlp - replace string
【Posted】: 2020-04-10 23:44:13
【Question Description】:
import en_core_web_sm
from spacy.matcher import Matcher

nlp = en_core_web_sm.load()
document_string = 'Electronically signed by : John Douglas.; Jun 13 2018 11:13AM CST, Adam Smith.'
nlp_doc = nlp(document_string)

matcher = Matcher(nlp.vocab)
# Two consecutive proper-noun tokens, e.g. a first and a last name
pattern = [{'POS': 'PROPN'}, {'POS': 'PROPN'}]
# spaCy v3 form; older v2 releases used matcher.add('FULL_NAME', None, pattern)
matcher.add('FULL_NAME', [pattern])
matches = matcher(nlp_doc)
for match_id, start, end in matches:
    span = nlp_doc[start:end]
    print(span.text)

Output:

John Douglas
Adam Smith

I need to replace these names (the output above) with [hidden] and print document_string with the names hidden.

Desired output:

Electronically signed by : [hidden].; Jun 13 2018 11:13AM CST, [hidden].
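For reference, the pattern [{'POS': 'PROPN'}, {'POS': 'PROPN'}] matches every pair of adjacent PROPN tokens, and spaCy's Matcher reports overlapping matches, so a three-token name such as "John Douglas Smith" would produce two matches. The sketch below is a pure-Python imitation of that sliding window, not spaCy itself: the token list and its POS tags are assumptions for illustration (not actual en_core_web_sm output), and match_propn_pairs is a hypothetical helper.

```python
# Hand-tagged tokens: the POS tags are assumed for illustration,
# not produced by en_core_web_sm.
tokens = [("signed", "VERB"), ("by", "ADP"), ("John", "PROPN"),
          ("Douglas", "PROPN"), ("Smith", "PROPN"), (".", "PUNCT")]


def match_propn_pairs(tagged):
    """Return (start, end) token indices for every pair of adjacent PROPN
    tokens, imitating the two-token pattern from the question."""
    return [(i, i + 2) for i in range(len(tagged) - 1)
            if tagged[i][1] == "PROPN" and tagged[i + 1][1] == "PROPN"]


matches = match_propn_pairs(tokens)
for start, end in matches:
    print(" ".join(word for word, _ in tokens[start:end]))
# John Douglas
# Douglas Smith
```

The two printed matches share the token "Douglas", which matters when the matched text is later used to hide names.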

【Question Discussion】:

  • What exactly is the problem? Have you tried debugging it?

Tags: python replace nlp spacy matcher


【Solution 1】:

You can simply use str.replace() here:

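from spacy.matcher import Matcher

# Reuses nlp and nlp_doc from the question; a fresh Matcher avoids
# adding the FULL_NAME pattern twice to the same matcher
new_doc = nlp_doc.text
names = []

matcher = Matcher(nlp.vocab)
pattern = [{'POS': 'PROPN'}, {'POS': 'PROPN'}]
matcher.add('FULL_NAME', [pattern])
matches = matcher(nlp_doc)

# Collect the text of every matched span
for match_id, start, end in matches:
    span = nlp_doc[start:end]
    names.append(span.text)

# Replace every occurrence of each matched name in the plain string
for name in names:
    new_doc = new_doc.replace(name, '[hidden]')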

Result:

In [114]: new_doc
Out[114]: 'Electronically signed by : [hidden].; Jun 13 2018 11:13AM CST, [hidden].'
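Note that str.replace() works on the matched strings rather than their positions. With overlapping matches such as "John Douglas" and "Douglas Smith", the first replacement removes "Douglas", the second finds nothing, and "Smith" stays visible. A minimal sketch of an offset-based alternative follows: it merges overlapping character ranges and rebuilds the string once. mask_spans is a hypothetical helper; with spaCy the ranges would come from each matched Span's start_char and end_char attributes, while here they are found with str.index so the example runs without a model.

```python
def mask_spans(text, spans, placeholder="[hidden]"):
    """Replace each (start_char, end_char) range in text with placeholder.
    Overlapping or touching ranges are merged first so a name is masked once."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Extend the previous range instead of masking twice
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # Rebuild the string from the unmasked pieces between merged ranges
    pieces, last = [], 0
    for start, end in merged:
        pieces.append(text[last:start])
        pieces.append(placeholder)
        last = end
    pieces.append(text[last:])
    return "".join(pieces)


document_string = ("Electronically signed by : John Douglas.; "
                   "Jun 13 2018 11:13AM CST, Adam Smith.")
# Stand-in for [(span.start_char, span.end_char) for each matched span]
names = ["John Douglas", "Adam Smith"]
spans = [(document_string.index(n), document_string.index(n) + len(n))
         for n in names]
print(mask_spans(document_string, spans))

# Overlapping matches are merged into a single [hidden]
print(mask_spans("John Douglas Smith signed", [(0, 12), (5, 18)]))
```

Working on offsets also avoids hiding text the Matcher never flagged, since only the matched character ranges are touched.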

【Discussion】:
