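【Posted】: 2018-12-27 13:11:42
【Question】:
I am trying to apply the "extract.subject_verb_object_triples" function from textacy to my dataset. However, the code I have written is very slow and memory-intensive. Is there a more efficient way to implement this?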
import spacy
import textacy

def extract_SVO(text):
    nlp = spacy.load('en_core_web_sm')
    doc = nlp(text)
    tuples = textacy.extract.subject_verb_object_triples(doc)
    tuples_to_list = list(tuples)
    if tuples_to_list != []:
        tuples_list.append(tuples_to_list)

tuples_list = []
sp500news['title'].apply(extract_SVO)
print(tuples_list)
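One direction that may help, offered as a sketch rather than a tested fix for this dataset: the function above calls `spacy.load` once per title, so loading the model a single time and streaming the titles through spaCy's `nlp.pipe` in batches should cut most of the overhead. The names `extract_svo_batch` and `run_on_titles` and the `batch_size` default are illustrative assumptions, not part of spaCy or textacy.

```python
from typing import Any, Callable, Iterable, List


def extract_svo_batch(
    texts: Iterable[str],
    nlp: Any,
    extract: Callable[[Any], Iterable[Any]],
    batch_size: int = 256,  # assumed value; tune for your data
) -> List[List[Any]]:
    """Collect the non-empty triple lists for each text, parsing in batches."""
    results = []
    # nlp.pipe streams the texts through the pipeline in batches instead of
    # running one nlp(text) call per row.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        triples = list(extract(doc))
        if triples:
            results.append(triples)
    return results


def run_on_titles(titles: Iterable[str]) -> List[List[Any]]:
    """Hypothetical driver: load the model once, then process every title."""
    # Imported here so the helper above has no hard dependency on either library.
    import spacy
    import textacy

    nlp = spacy.load('en_core_web_sm')  # loaded once, not once per title
    return extract_svo_batch(
        titles, nlp, textacy.extract.subject_verb_object_triples
    )
```

With the question's data this would be called as `tuples_list = run_on_titles(sp500news['title'])`. Returning the list instead of appending to a global also keeps the function reusable.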
Sample data (sp500news)
    date_publish         title
0   2013-05-14 17:17:05  Italy will not dismantle Montis labour reform minister
1   2014-05-09 20:15:57  Exclusive US agency FinCEN rejected veterans in bid to hire lawyers
4   2018-07-19 10:29:54  Xis campaign to draw people back to graying rural China faces uphill battle
6   2012-04-17 21:02:54  Romney begins to win over conservatives
8   2012-12-12 20:17:56  Oregon mall shooting survivor in serious condition
9   2018-11-08 10:51:49  Polands PGNiG to sign another deal for LNG supplies from US CEO
11  2013-08-25 07:13:31  Australias opposition leader pledges stronger economy if elected PM
12  2015-01-09 00:54:17  New York shifts into Code Blue to get homeless off frigid streets
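【Comments】:
-
Could you provide some sample data?
-
Hi @VivekKalyanarangan, I have added sample data
-
Can you copy-paste it and format it as code? That is easier than reading and retyping it from an image
-
@VivekKalyanarangan -- done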
Tags: python pandas nlp spacy textacy