How to extract a string from a paragraph?

【Question title】: How to extract a string from a paragraph?
【Posted】: 2013-12-07 05:58:37
【Question】:

A 24-year-old youth died on the spot, after his motorcycle
 rammed a divider near Golf market on <LOCATION>BelAir</LOCATION> road 
 Thursday night. The deceased has been identified as
 John(24) hailing from <LOCATION>UK</LOCATION>.

He was originally from <LOCATION>Usa</LOCATION>.

These sentences form 2 separate paragraphs. I want the output to look like:

Para 1:BelAir 
       UK

Para 2:Usa

I have worked out the regex for the tags as:

<(?<tag>\w*)>(?<text>.*)</\k<tag>>

And for the paragraphs:

(\n|^).*?(?=\n|$)

Is there a way to combine these two? Or should I use split instead?
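One detail worth checking in the tag regex above: `.*` is greedy, so on a line that contains two tags it would match from the first opening tag to the last closing tag. A minimal sketch of the same named-group idea with a reluctant quantifier, assuming Java 7 or later (which supports `(?<name>...)` and `\k<name>`); the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagTextExtractor {
    // "tag" captures the tag name; \k<tag> requires the same name in the closing tag.
    // (?<text>.*?) is reluctant, so it stops at the first matching closing tag.
    private static final Pattern TAGGED =
            Pattern.compile("<(?<tag>\\w+)>(?<text>.*?)</\\k<tag>>");

    // Returns the text inside every <TAG>...</TAG> pair, in order of appearance.
    static List<String> extractTaggedText(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TAGGED.matcher(input);
        while (m.find()) {
            result.add(m.group("text"));
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "on <LOCATION>BelAir</LOCATION> road, hailing from <LOCATION>UK</LOCATION>.";
        System.out.println(extractTaggedText(line)); // prints [BelAir, UK]
    }
}
```

Since `.` does not match line breaks by default, this only finds tags that open and close on the same line, which is the case in the sample text.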

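【Question comments】:

Is this embedded in some kind of HTML or other markup, or is it standalone?
It is not standalone. It is actually the output of the Stanford NER tagger.

Tags: java regex text-extraction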

【Solution 1】:

Try this

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractLocations {
    public static void main(String[] args) {
        String str = "A 24-year-old youth died on the spot, after his motorcycle " +
                "rammed a divider near Golf market on <LOCATION>BelAir</LOCATION> road" +
                " Thursday night. The deceased has been identified as  John(24) hailing from <LOCATION>UK</LOCATION>." +
                "\n He was originally from <LOCATION>Usa</LOCATION>.";
        String[] paras = str.split("\n"); // divide the string into two paragraphs
        // (.*?) is reluctant, so each <LOCATION>...</LOCATION> pair is matched separately
        Pattern pattern = Pattern.compile("<LOCATION>(.*?)</LOCATION>");
        for (int i = 0; i < paras.length; i++) {
            System.out.print("Para " + (i + 1) + ": ");
            Matcher matcher = pattern.matcher(paras[i]);
            while (matcher.find()) {
                System.out.println(matcher.group(1)); // text between the tags
            }
        }
    }
}

The output will be

Para 1: BelAir
UK
Para 2: Usa

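【Comments】:

Thanks, I will try this.
The thing is, I cannot insert a \n by hand. I read the content from a text file, and I want to identify all the locations in each paragraph.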
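For text that comes from a file, one option is to split the whole content on blank lines rather than on a hand-inserted `\n`. The following is only a sketch under assumptions: paragraphs are separated by at least one blank line (as in the sample in the question), each tag opens and closes on one line, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParagraphLocations {
    private static final Pattern LOCATION = Pattern.compile("<LOCATION>(.*?)</LOCATION>");
    // A paragraph break: two line breaks with nothing but whitespace between them.
    private static final Pattern PARAGRAPH_BREAK = Pattern.compile("\\r?\\n\\s*\\r?\\n");

    // Returns one list of locations per paragraph, in document order.
    static List<List<String>> locationsByParagraph(String text) {
        List<List<String>> result = new ArrayList<>();
        // trim() avoids an empty first paragraph when the text starts with blank lines
        for (String para : PARAGRAPH_BREAK.split(text.trim())) {
            List<String> found = new ArrayList<>();
            Matcher m = LOCATION.matcher(para);
            while (m.find()) {
                found.add(m.group(1));
            }
            result.add(found);
        }
        return result;
    }

    public static void main(String[] args) {
        // In practice the text could be loaded from a file, for example with
        // new String(Files.readAllBytes(Paths.get("input.txt"))); the path is hypothetical.
        String text = "rammed a divider near Golf market on <LOCATION>BelAir</LOCATION> road\n"
                + " Thursday night. John(24) hailing from <LOCATION>UK</LOCATION>.\n"
                + "\n"
                + "He was originally from <LOCATION>Usa</LOCATION>.\n";
        List<List<String>> paras = locationsByParagraph(text);
        for (int i = 0; i < paras.size(); i++) {
            System.out.println("Para " + (i + 1) + ": " + String.join(", ", paras.get(i)));
        }
    }
}
```

A single line break inside a paragraph does not trigger a split here, so a paragraph that wraps over several lines stays in one piece.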

【Solution 2】:

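Read the file line by line and check whether each line is blank. (readLine() strips the line terminator, so a line never starts with '\n'; a paragraph break shows up as an empty line.)

// Assumes reader is a BufferedReader over the input file and
// pattern is the compiled <LOCATION> regex from Solution 1.
List<String> locations = new ArrayList<>();
String line;
while ((line = reader.readLine()) != null) { // read line by line
    if (!line.trim().isEmpty()) {
        // apply the tag regex and store every match in the list
        Matcher matcher = pattern.matcher(line);
        while (matcher.find()) {
            locations.add(matcher.group(1));
        }
    } else {
        // blank line: paragraph break, so add a null marker to the list
        locations.add(null);
    }
}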

So your list will look like

BelAir
UK
null
Usa

So a new paragraph starts after each null.
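To get back to per-paragraph output from such a list, the null markers can be used as separators. A minimal sketch, assuming the flat list was built as described above; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NullSeparatedGroups {
    // Splits a flat list into sub-lists, starting a new one at every null marker.
    // Consecutive nulls produce empty paragraphs.
    static List<List<String>> groupByNull(List<String> flat) {
        List<List<String>> paragraphs = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String item : flat) {
            if (item == null) {
                paragraphs.add(current);
                current = new ArrayList<>();
            } else {
                current.add(item);
            }
        }
        paragraphs.add(current); // the last paragraph has no trailing null
        return paragraphs;
    }

    public static void main(String[] args) {
        List<String> flat = Arrays.asList("BelAir", "UK", null, "Usa");
        List<List<String>> paras = groupByNull(flat);
        for (int i = 0; i < paras.size(); i++) {
            System.out.println("Para " + (i + 1) + ": " + String.join(", ", paras.get(i)));
        }
    }
}
```

An alternative design would be to keep a list of lists from the start and skip the null markers entirely; the marker approach just keeps the reading loop simple.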
