【Posted】: 2013-12-07 05:58:37
【Question】:
A 24-year-old youth died on the spot, after his motorcycle
rammed a divider near Golf market on <LOCATION>BelAir</LOCATION> road
Thursday night. The deceased has been identified as
John(24) hailing from <LOCATION>UK</LOCATION>.
He was originally from <LOCATION>Usa</LOCATION>.
These sentences form 2 different paragraphs. I would like the output to look like:
Para 1:BelAir
UK
Para 2:Usa
I have worked out the regex for the tags as:
<(?<tag>\w*)>(?<text>.*)</\k<tag>>
And for the paragraphs:
(\n|^).*?(?=\n|$)
Is there a way to combine these? Or should I use split instead?
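For illustration, here is a minimal sketch of the split-based route, not a definitive answer. It assumes each paragraph sits on a single line, as the paragraph regex above implies, and it changes the tag pattern slightly: a reluctant `.*?` instead of `.*`, so that two tags in the same paragraph are matched separately, and `\w+` instead of `\w*`. The class name `TagsByParagraph` and the method `extract` are made-up names for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagsByParagraph {
    // Reluctant .*? keeps each tag pair separate when a line holds several tags.
    private static final Pattern TAG =
            Pattern.compile("<(?<tag>\\w+)>(?<text>.*?)</\\k<tag>>");

    /**
     * Splits the input on line breaks (assumption: one paragraph per line)
     * and collects the tagged texts found in each non-blank paragraph.
     */
    public static List<List<String>> extract(String input) {
        List<List<String>> result = new ArrayList<>();
        for (String paragraph : input.split("\\R")) {
            if (paragraph.trim().isEmpty()) {
                continue; // skip blank lines between paragraphs
            }
            List<String> texts = new ArrayList<>();
            Matcher m = TAG.matcher(paragraph);
            while (m.find()) {
                texts.add(m.group("text"));
            }
            result.add(texts);
        }
        return result;
    }

    public static void main(String[] args) {
        String input =
                "A 24-year-old youth died on the spot, after his motorcycle "
                + "rammed a divider near Golf market on <LOCATION>BelAir</LOCATION> road "
                + "Thursday night. The deceased has been identified as "
                + "John(24) hailing from <LOCATION>UK</LOCATION>.\n"
                + "He was originally from <LOCATION>Usa</LOCATION>.";
        List<List<String>> paragraphs = extract(input);
        for (int i = 0; i < paragraphs.size(); i++) {
            // Prints "Para N:" followed by that paragraph's tagged texts, one per line.
            System.out.println("Para " + (i + 1) + ":" + String.join("\n", paragraphs.get(i)));
        }
    }
}
```

Splitting first and then running the tag matcher on each piece keeps the two patterns independent, which tends to be easier to reason about than a single combined regex. If paragraphs can span several lines, the split delimiter would need to change (for example to a blank line).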
【Comments】:
-
Is this embedded in some kind of HTML or other markup, or is it standalone?
-
It is not standalone. It is actually the output of the Stanford NER tagger.
Tags: java regex text-extraction