【Question Title】: I need help getting Regex expression correct
【Posted】: 2011-08-05 04:19:47
【Question】:

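I'm trying to write a regular expression that finds multiple occurrences of my pattern in a single line. Note: I've been using regex for about an hour... =/

For example: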

<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>

should match twice:

1) <a href="G2532" id="1">back</a>
2) <a href="G2564" id="2">next</a>

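I think the answer lies in properly understanding greedy, reluctant, and possessive quantifiers, but I can't seem to get it to work...

I think I'm close. The regex string I've come up with so far is: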

(<a href=").*(" id="1">).*(</a>)

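But the regex matcher returns 1 match, the entire string...

Below is a (compilable) Java regex test harness. Here are my latest (futile) attempts at getting it right with that program; the output should be fairly self-explanatory.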

Enter your regex: (<a href=").*(" id="1">).*(</a>)
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.

Enter your regex: (<a href=").*(" id="1">).*(</a>)?
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.

Enter your regex: (<a href=").*(" id="1">).*(</a>)+
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.

Enter your regex: (<a href=").*(" id="1">).*(</a>)?
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.

Enter your regex: ((<a href=").*(" id="1">).*(</a>))?
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.
I found the text "" starting at index 63 and ending at index 63.

Enter your regex: ((<a href=").*(" id="1">).*(</a>))+?
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.

Enter your regex: (((<a href=").*(" id="1">).*(</a>))+?)
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.

Here is the Java:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class RegexTestHarness {

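    public static void main(String[] args){
        // Create the reader once; a fresh BufferedReader per iteration can lose already-buffered input.
        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
        try{
            while (true) {

                System.out.print("\nEnter your regex: ");
                String regex = reader.readLine();
                if (regex == null) break;   // end of input
                Pattern pattern = Pattern.compile(regex);

                System.out.print("Enter input string to search: ");
                String input = reader.readLine();
                if (input == null) break;   // end of input
                Matcher matcher = pattern.matcher(input);
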
                boolean found = false;
                while (matcher.find()) {
                    System.out.println("I found the text \"" + matcher.group() + "\" starting at " +
                       "index " + matcher.start() + " and ending at index " + matcher.end() + ".");
                    found = true;
                }
                if(!found){
                    System.out.println("No match found.");
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(-1);
        }

    }
}

【Discussion】:

Tags: java regex reluctant-quantifiers


【Solution 1】:

Try this:

<a href=".*?" id="1">.*?</a>

I made the quantifiers non-greedy (reluctant) by adding ? after each .*
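To make the difference concrete, here is a minimal sketch that runs the greedy and the reluctant pattern against the sample line from the question. The class name `GreedyVsReluctantDemo` and the helper `matchSpans` are hypothetical names introduced only for this illustration; the capture groups from the question are dropped because they do not affect what is matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctantDemo {

    // Hypothetical helper: returns every match as "start-end" so the spans are easy to compare.
    static List<String> matchSpans(String regex, String input) {
        List<String> spans = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            spans.add(m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        String input = "<a href=\"G2532\" id=\"1\">back</a> <a href=\"G2564\" id=\"2\">next</a>";

        // Greedy: the second .* runs to the end of the line and only backtracks to the LAST </a>.
        System.out.println(matchSpans("<a href=\".*\" id=\"1\">.*</a>", input));

        // Reluctant: each .*? grows only until the rest of the pattern can match.
        System.out.println(matchSpans("<a href=\".*?\" id=\"1\">.*?</a>", input));
    }
}
```

On the sample line this prints `[0-63]` for the greedy pattern and `[0-31]` for the reluctant one. The reluctant pattern still reports a single match, because the literal `id="1"` only occurs in the first link.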

But when in doubt, you can use this trick:

<a href="[^"]*" id="1">[^<]*</a>

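[^"]* means any number of characters that are not a double quote
[^<]* means any number of characters that are not a left angle bracket

So you don't have to worry about greedy vs. non-greedy.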
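As a rough illustration of the negated-class trick, the sketch below uses the same pattern with two capture groups added to pull out the href value and the link text. The capture groups, the class name `NegatedClassDemo`, and the helper `firstLink` are assumptions made for this example and are not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegatedClassDemo {

    // [^"]* cannot run past the closing quote and [^<]* cannot run past the next tag,
    // so plain greedy quantifiers are safe here.
    static final Pattern LINK = Pattern.compile("<a href=\"([^\"]*)\" id=\"1\">([^<]*)</a>");

    // Hypothetical helper: returns "href|text" for the first match, or null if there is none.
    static String firstLink(String input) {
        Matcher m = LINK.matcher(input);
        return m.find() ? m.group(1) + "|" + m.group(2) : null;
    }

    public static void main(String[] args) {
        String input = "<a href=\"G2532\" id=\"1\">back</a> <a href=\"G2564\" id=\"2\">next</a>";
        System.out.println(firstLink(input));
    }
}
```

For the sample line this prints `G2532|back`. Because each character class excludes its own terminator, the match cannot spill into the second link even though the quantifiers are greedy.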

【Comments】:

  • Bohemian, you put me on the right track. I used your technique, but I realized I had to change id="1" to id="[1-9]+". It finally works now. Thanks.
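The adjustment described in this comment can be sketched as follows: the negated-class pattern with `id="[1-9]+"` in place of the literal `id="1"`. The class name `AllLinksDemo` and the helper `findLinks` are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllLinksDemo {

    // id="[1-9]+" accepts any id made of the digits 1 through 9 instead of only id="1".
    static final Pattern LINK = Pattern.compile("<a href=\"[^\"]*\" id=\"[1-9]+\">[^<]*</a>");

    // Hypothetical helper: collects the full text of every match in the input.
    static List<String> findLinks(String input) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(input);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    public static void main(String[] args) {
        String input = "<a href=\"G2532\" id=\"1\">back</a> <a href=\"G2564\" id=\"2\">next</a>";
        for (String link : findLinks(input)) {
            System.out.println(link);
        }
    }
}
```

On the sample line this prints both links, one per line. Note that `[1-9]+` excludes the digit 0, so an id such as `10` would not match; if such ids can occur, `[0-9]+` is probably the safer choice, though that goes beyond what the comment states.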