Java regex not working across multiple lines

【Question title】: Java Regex not working multiline
【Posted】: 2015-11-17 15:46:59
【Question】:

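Last night I got some help coming up with a regex that captures the smallest possible group. I need to take a string of song lyrics and find a search phrase in it. The problem I'm having is that I can't get it to look across multiple lines.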

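I have a text file containing the lyrics that I read in; this is just part of the song. (The brackets are not in the text file, I'm only using them to show the group I want to capture.)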

 The first [time we fall in love. 
 Love can be exciting, it can be a bloody bore. 
 Love can be a pleasure or nothing but a chore.
 Love can be like a dull routine, 
 It can run you around until you're out of steam. 
 It can treat you well, it can treat you mean, 
 Love can mess you around, 
 Love can pick you up, it can bring you down]. 
 But they'll never know The feelings we show

The phrase I'm using for the regex is

 time can bring you down

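I use a StringBuilder to create the lyrics string, so the lyrics contain \n characters. I tried a replaceAll to strip them out, but it still didn't work. If I go into the text file and write a single line saying time can bring you down, it works, but if I split that across two lines it doesn't.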

I tried using \n in my regex, but it ended up capturing most of the song, because time is the second word. This is the regex I'm currently trying to use:

(?is)(\bTime\b)(?:(?!\n\b(?:time|can|bring|you|down)\b\n).)*(\bcan\b)(?:(?!\b(?:time|can|bring|you|down)\b).)*(\bbring\b)(?:(?!\b(?:time|can|bring|you|down)\b).)*(\byou\b)(?:(?!\b(?:time|can|bring|you|down)\b).)*(\bdown\b)

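I'm trying to capture what is inside the brackets in the lyrics. Here is the method I'm using; it takes the lyrics and the search phrase and returns the length of the string it found.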

    static int rankPhrase(String lyrics, String lyricsPhrase){
        //This takes in song lyrics and the phrase we are searching for

        //Split the phrase up into separate words
        String[] phrase = lyricsPhrase.split("[^a-zA-Z]+");

        //Helper string for regex so we can get smallest grouping
        String regexHelper = lyricsPhrase.replaceAll(" ","|").toLowerCase();

        //Start to build the regex
        StringBuilder regex = new StringBuilder("(?im)"+"(\\" + "b" + phrase[0] + "\\b)");

        //loop through each word in the phrase
        for(int i = 1; i < phrase.length; i++){
            //add this to the regex we will search for
            regex.append("(?:(?!\\b(?:" + regexHelper + ")\\b).)*(\\b" + phrase[i] + "\\b)");
        }

        //Create the pattern
        Pattern p = Pattern.compile(regex.toString(), Pattern.DOTALL);
        Matcher m = p.matcher(lyrics);

        //string for regex match found
        String regexMatch = "";
        while(m.find()){
            regexMatch = m.group();
            System.out.println(regexMatch);
        }

        return regexMatch.length();
    }

I'll keep trying to figure it out. I think the problem is in how the regex works, but I'm not 100% sure. Thanks!

【Comments】:

Still can't get the regex to work, any help?

Tags: java regex

【Solution 1】:

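You are trying to find a combination of words in a string. That is easy to do with a regex of the form word1.*?word2: any number of characters may appear between word one and word two, and the ? makes the match lazy, so it consumes as few characters as possible.
The problem here is that you are searching for the pattern across multiple lines. By default the . metacharacter only works within a single line, because it matches any character except a line terminator.
You can get around this by using (.|\n)* instead of .* (compiling the pattern with Pattern.DOTALL has the same effect on . itself).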
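To make the difference concrete, here is a minimal sketch that runs the lazy pattern three ways: with default flags, with the (.|\n) alternation, and with Pattern.DOTALL. The class name DotNewlineDemo, the helper firstMatch, and the two-line sample text are made up for illustration and are not part of the question's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question or answer.
public class DotNewlineDemo {

    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Made-up sample: "time" and "can" sit on different lines.
        String text = "time we fall\nit can bring";

        // Default flags: '.' stops at the line break, so this prints null.
        System.out.println(firstMatch("time.*?can", 0, text));

        // Explicit alternation with \n lets the gap span the line break.
        System.out.println(firstMatch("time(.|\\n)*?can", 0, text));

        // Pattern.DOTALL makes '.' itself match line terminators.
        System.out.println(firstMatch("time.*?can", Pattern.DOTALL, text));
    }
}
```

The second and third calls should both return the text from "time" through "can", line break included; only the first one fails to match.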

I have updated your code below.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regexa2 {

    // Takes the song lyrics and the phrase we are searching for, and
    // returns the length of the last match found (0 if there is none).
    static int rankPhrase(String lyrics, String lyricsPhrase) {
        // Build the regex: allow any run of characters, newlines included,
        // between consecutive words. replace() is used instead of replaceAll()
        // because replaceAll() reads "\\n" in the replacement as a plain 'n'.
        String regex = lyricsPhrase.toLowerCase().replace(" ", "(.|\\n)*?");
        System.out.println(regex);

        // Create the pattern; DOTALL also lets '.' match line terminators
        Pattern p = Pattern.compile(regex, Pattern.DOTALL);
        Matcher m = p.matcher(lyrics);

        // String for the regex match found
        String regexMatch = "";
        while (m.find()) {
            regexMatch = m.group();
            System.out.println(regexMatch);
        }

        return regexMatch.length();
    }

    public static void main(String[] args) {
        String lyrics = "The first time we fall in love. \n" +
                "Love can be exciting, it can be a bloody bore. \n" +
                "Love can be a pleasure or nothing but a chore.\n" +
                "Love can be like a dull routine, \n" +
                "It can run you around until you're out of steam. \n" +
                "It can treat you well, it can treat you mean, \n" +
                "Love can mess you around, \n" +
                "Love can pick you up, it can bring you down. \n" +
                "But they'll never know The feelings we show ";
        String phrase = "time can bring you down";
        Regexa2.rankPhrase(lyrics, phrase);
    }
}
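
One detail worth checking when the regex is built by string replacement: String.replaceAll treats a backslash in the replacement text as an escape, so a replacement of "(.|\\n)*?" comes out as (.|n)*?, while String.replace copies it literally. The sketch below compares the two; the class name ReplacementEscapeDemo, both helper methods, and the sample strings are hypothetical and only serve the illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question or answer.
public class ReplacementEscapeDemo {

    // replaceAll: the backslash in the replacement escapes the 'n',
    // so the resulting regex contains a literal letter n.
    static String viaReplaceAll(String phrase) {
        return phrase.replaceAll(" ", "(.|\\n)*?");
    }

    // replace: the replacement is copied literally, so the resulting
    // regex keeps the \n newline escape.
    static String viaReplace(String phrase) {
        return phrase.replace(" ", "(.|\\n)*?");
    }

    public static void main(String[] args) {
        String phrase = "time can";
        String lyrics = "time we fall\nit can bring"; // made-up sample

        System.out.println(viaReplaceAll(phrase)); // time(.|n)*?can
        System.out.println(viaReplace(phrase));    // time(.|\n)*?can

        // Without DOTALL, only the second pattern can cross the line break.
        System.out.println(Pattern.compile(viaReplaceAll(phrase)).matcher(lyrics).find()); // false
        System.out.println(Pattern.compile(viaReplace(phrase)).matcher(lyrics).find());    // true
    }
}
```

When the pattern is compiled with Pattern.DOTALL, as in the code above, both variants match, because . already covers the newline and the alternation no longer matters.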

【Comments】:

Thanks for the reply. When I try it with the regex, it seems to get stuck in the while loop.
@CjWeber Did you try the code posted here? It works.