【Posted】: 2020-06-30 01:05:48
【Question】:
I want to split this paragraph into sentences so that I can process it with spaCy:
Finally, on 1595 July 22 at 2h 40m am, when the sun was at 7° 59' 52" Leo, 101,487 distant from earth, Mars's mean longitude 11s 14° 9' 5", and anomaly 164° 48' 55", and consequent eccentric position from the vicarious hypothesis 17° 16' 36" Pisces: the apparent position of Mars, from the most select observations, was 4° 11' 10" Taurus, lat. 2° 30' S ^37. Thus we twice have Mars in the most opportune position, in quadrature with the sun, while the positions of earth and Mars are also distant by a quadrant.\n
I want the result to look like this:
[
Finally, on 1595 July 22 at 2h 40m am, when the sun was at 7° 59' 52" Leo, 101,487 distant from earth, Mars's mean longitude 11s 14° 9' 5", and anomaly 164° 48' 55", and consequent eccentric position from the vicarious hypothesis 17° 16' 36" Pisces: the apparent position of Mars, from the most select observations, was 4° 11' 10" Taurus, lat. 2° 30' S ^37. ,
Thus we twice have Mars in the most opportune position, in quadrature with the sun, while the positions of earth and Mars are also distant by a quadrant.\n ]
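In other words, I want two sentences, and the first one should end after "lat. 2° 30' S ^37.". But because "lat." contains a period, the text gets split right after "lat.".
So far I have not found a solution. I tried these two approaches: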
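import spacy

nlp = spacy.load("en_core_web_sm")  # assumed: any English pipeline that has a parser

def set_custom_boundaries(doc):
    for token in doc[:-1]:
        # ("lat.") is the plain string "lat.", not a tuple, so this is a substring test
        if token.text in ("lat."):
            # print("Detected:", token.text)
            # this flags the matched token itself, not the token that follows it
            doc[token.i].is_sent_start = False
    return doc

nlp.add_pipe(set_custom_boundaries, before="parser")  # spaCy v2 signature
nlp.pipeline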
and
a.split('.')
I think the first snippet contains some small mistake.
Neither of the approaches above splits the paragraph the way I need!
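A possible fix for the first snippet, as a minimal sketch assuming spaCy v3 (the snippet uses the v2 `add_pipe` signature). Two things look wrong there: `("lat.")` is just the string `"lat."` (a one-element tuple needs a trailing comma, `("lat.",)`), so `in` performs a substring test, and `is_sent_start = False` is set on the matched token rather than on the token after it. spaCy's English tokenizer may also split `lat.` into `lat` and `.`, so the sketch registers a tokenizer special case. The abbreviation list and the component name `prevent_abbrev_breaks` are hypothetical, and a blank pipeline with the rule-based `sentencizer` is used so it runs without downloading a model:

```python
import spacy
from spacy.language import Language

# Hypothetical list of abbreviations that must not end a sentence.
ABBREVIATIONS = ("lat.", "long.")


@Language.component("prevent_abbrev_breaks")
def prevent_abbrev_breaks(doc):
    # Mark the token *after* each abbreviation as "not a sentence start".
    # The sentencizer (default overwrite=False) keeps values set earlier.
    for token in doc[:-1]:
        if token.text in ABBREVIATIONS:
            doc[token.i + 1].is_sent_start = False
    return doc


def build_nlp():
    nlp = spacy.blank("en")
    # Keep each abbreviation as a single token instead of "lat" + ".".
    for abbrev in ABBREVIATIONS:
        nlp.tokenizer.add_special_case(abbrev, [{"ORTH": abbrev}])
    nlp.add_pipe("prevent_abbrev_breaks")
    nlp.add_pipe("sentencizer")
    return nlp


if __name__ == "__main__":
    nlp = build_nlp()
    doc = nlp("Mars was at 4° 11' 10\" Taurus, lat. 2° 30' S ^37. Thus we twice have Mars.")
    for sent in doc.sents:
        print(sent.text)
```

With a trained pipeline you would add the component with `nlp.add_pipe("prevent_abbrev_breaks", before="parser")` instead of the sentencizer; the parser should respect `is_sent_start` values that are already set, but it is worth checking the result on your own text.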
In general, what would you recommend for splitting a paragraph into sentences, especially when abbreviations like "lat." are present?
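Outside spaCy, `a.split('.')` breaks at every period, including the one in `lat.`, and also drops the periods themselves. A minimal pure-Python sketch, assuming that a sentence ends at `.`, `!` or `?` followed by whitespace and an uppercase letter, and that a hand-maintained (hypothetical) abbreviation list covers cases like `lat.`:

```python
import re

# Hypothetical abbreviation list; extend it for your own corpus.
ABBREVIATIONS = {"lat.", "long."}

# A boundary candidate: ., ! or ? followed by whitespace and an uppercase letter.
BOUNDARY = re.compile(r"[.!?]\s+(?=[A-Z])")


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split text into sentences, skipping boundaries right after a known abbreviation."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        end = match.start() + 1  # keep the punctuation mark in the sentence
        last_word = text[start:end].split()[-1]
        if last_word in abbreviations:
            continue  # e.g. "lat. North" should not start a new sentence
        sentences.append(text[start:end].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

On the question's paragraph this heuristic should give two sentences, because "lat." is followed by a digit rather than a capital letter; the abbreviation list only matters when an abbreviation is followed by a capitalized word.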
【Discussion】:
-
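Quoting "In other words, two sentences, the first one ending after lat. 2° 30' S ^37.": could you share the text in a better, clearer format? "Neither of them works!": what does that mean? "In general, what would you recommend for splitting a paragraph into sentences?": use a library designed for working with natural language, which you are already doing.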
-
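I have edited the text. Basically, the problem is that words like "lat." are abbreviations that cause unwanted sentence breaks. How would you split the paragraph so that the sentences come out correctly?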
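Another option, not used in the question, is NLTK's Punkt sentence tokenizer, which accepts a set of known abbreviations without any training. A minimal sketch, assuming NLTK is installed; the abbreviation set is hypothetical, and Punkt expects the entries lowercased and without the trailing period:

```python
from nltk.tokenize.punkt import PunktParameters, PunktSentenceTokenizer

# Hypothetical abbreviation set: lowercase, no trailing period.
ABBREVIATIONS = {"lat", "long"}


def make_splitter(abbreviations=ABBREVIATIONS):
    params = PunktParameters()
    params.abbrev_types = set(abbreviations)
    # Passing PunktParameters instead of training text uses them as-is.
    return PunktSentenceTokenizer(params)


def split_with_punkt(text, abbreviations=ABBREVIATIONS):
    return make_splitter(abbreviations).tokenize(text)
```

Because nothing is trained here, Punkt relies only on the listed abbreviations, so other abbreviations in your texts have to be added to the set.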