Question title: RegEx for capturing particular digits
Posted: 2019-10-08 11:34:41
Question:

From the log below, how can I grep just the '951792' value?

2019 May 22 03:32:17.952296 france1v4 sh[4937]: 190522-03:32:17.951792 [mod=REC, lvl=INFO] [tid=26130] Recording A8602096210405800406L200218680503121519 size is 4145956224 bytes
2019 May 22 03:32:17.952387 france1v4 sh[4937]: 190522-03:32:17.951895 [mod=REC, lvl=INFO] [tid=26130] RecordingInfo = fffocap://0x401e
2019 May 22 03:32:17.952466 france1v4 sh[4937]: 190522-03:32:17.951934 [mod=REC, lvl=INFO] [tid=26130] recording_dvr_from_recording_info:physicalSegmentCount=10   

I tried Java split/substring operations, but that takes too many lines of code. How can I get the '951792' value with a regex?

The expected output is:

951792
951895
951934 
075041

Question comments:

  • You could split the string on "." and then split again at the first space. Does it really need to be a regex?
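A minimal sketch of the split-based alternative from the comment, assuming every line has the same layout as the sample (the wanted digits follow the second "." on the line). Splitting on "." yields three pieces, so the digits are the start of the third piece, cut at the first space. The class name `SplitExtractor` and method `extract` are hypothetical.

```java
// Split-based extraction: no regex lookarounds, just String.split and indexOf.
class SplitExtractor {

    // Returns the digits after the second '.' of a log line, up to the next space,
    // or null when the line does not contain two dots.
    // Assumes the sample layout: "<date> hh:mm:ss.ffffff host ...: yymmdd-hh:mm:ss.ffffff [mod=...".
    static String extract(String line) {
        // String.split takes a regex, so the dot is escaped; limit 3 keeps the rest of the line intact.
        String[] parts = line.split("\\.", 3);
        if (parts.length < 3) {
            return null;
        }
        // parts[2] starts with e.g. "951792 [mod=REC, ...", so cut at the first space.
        int space = parts[2].indexOf(' ');
        return space < 0 ? parts[2] : parts[2].substring(0, space);
    }

    public static void main(String[] args) {
        String line = "2019 May 22 03:32:17.952296 france1v4 sh[4937]: 190522-03:32:17.951792 [mod=REC, lvl=INFO] [tid=26130] Recording A8602096210405800406L200218680503121519 size is 4145956224 bytes";
        System.out.println(extract(line)); // prints 951792
    }
}
```

This stays short, but it silently depends on the position of the dots; any extra "." before the wanted field would shift the result.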

Tags: java regex regex-lookarounds regex-group regex-greedy


Answer 1:

You can try the following regex:

(?<=[0-9]{6}-[0-9]{2}:[0-9]{2}:[0-9]{2}\.)[0-9]+

Don't forget to double-escape the dot (\\.) when you put the regex in your Java code.
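Why the double escape matters, as a small sketch: in Java source the string literal "\\." is the two-character regex \. (a literal dot), while "\." does not compile (illegal escape character). If the backslash is dropped entirely, the dot becomes the "any character" metacharacter and the lookbehind accepts input it should reject. The class name `EscapeDemo` and the test string with a colon instead of a dot are made up for illustration.

```java
import java.util.regex.Pattern;

class EscapeDemo {

    // Correct: "\\." in Java source is the regex \. and matches only a literal '.'.
    static final Pattern STRICT = Pattern.compile("(?<=[0-9]{6}-[0-9]{2}:[0-9]{2}:[0-9]{2}\\.)[0-9]+");

    // Escape forgotten: a bare '.' matches any character, so ':' is accepted too.
    static final Pattern LOOSE = Pattern.compile("(?<=[0-9]{6}-[0-9]{2}:[0-9]{2}:[0-9]{2}.)[0-9]+");

    public static void main(String[] args) {
        String wellFormed = "190522-03:32:17.951792";
        String colon = "190522-03:32:17:951792"; // hypothetical malformed timestamp

        System.out.println(STRICT.matcher(wellFormed).find()); // prints true
        System.out.println(STRICT.matcher(colon).find());      // prints false
        System.out.println(LOOSE.matcher(colon).find());       // prints true
    }
}
```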

Input:

2019 May 22 03:32:17.952296 france1v4 sh[4937]: 190522-03:32:17.951792 [mod=REC, lvl=INFO] [tid=26130] Recording A8602096210405800406L200218680503121519 size is 4145956224 bytes
2019 May 22 03:32:17.952387 france1v4 sh[4937]: 190522-03:32:17.951895 [mod=REC, lvl=INFO] [tid=26130] RecordingInfo = fffocap://0x401e
2019 May 22 03:32:17.952466 france1v4 sh[4937]: 190522-03:32:17.951934 [mod=REC, lvl=INFO] [tid=26130] recording_dvr_from_recording_info:physicalSegmentCount=10 

Matches:

951792
951895
951934 

Demo 1

For a stricter regex that uses both a lookbehind and a lookahead, use:

(?<=[0-9]\]:\s[0-9]{6}-[0-9]{2}:[0-9]{2}:[0-9]{2}\.)[0-9]+(?=\s\[mod=REC)

Demo 2
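A sketch of the stricter pattern in Java, written in the same style as the Java example for the first regex. The lookbehind has a fixed length, which Java's regex engine accepts; the lookahead additionally requires " [mod=REC" after the digits, so lines from other modules are skipped. The class name `StrictLookaroundExample` and method `extractAll` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StrictLookaroundExample {

    // The stricter pattern from the answer, double-escaped for Java source.
    // Lookbehind: "<digit>]: yymmdd-hh:mm:ss." must precede the digits.
    // Lookahead: " [mod=REC" must follow them.
    static final Pattern STRICT = Pattern.compile(
            "(?<=[0-9]\\]:\\s[0-9]{6}-[0-9]{2}:[0-9]{2}:[0-9]{2}\\.)[0-9]+(?=\\s\\[mod=REC)");

    // Collects every match; lookarounds consume nothing, so group() holds only the digits.
    static List<String> extractAll(String log) {
        List<String> values = new ArrayList<>();
        Matcher m = STRICT.matcher(log);
        while (m.find()) {
            values.add(m.group());
        }
        return values;
    }

    public static void main(String[] args) {
        String log = String.join("\n",
                "2019 May 22 03:32:17.952296 france1v4 sh[4937]: 190522-03:32:17.951792 [mod=REC, lvl=INFO] [tid=26130] Recording A8602096210405800406L200218680503121519 size is 4145956224 bytes",
                "2019 May 22 03:32:17.952387 france1v4 sh[4937]: 190522-03:32:17.951895 [mod=REC, lvl=INFO] [tid=26130] RecordingInfo = fffocap://0x401e",
                "2019 May 22 03:32:17.952466 france1v4 sh[4937]: 190522-03:32:17.951934 [mod=REC, lvl=INFO] [tid=26130] recording_dvr_from_recording_info:physicalSegmentCount=10");
        System.out.println(extractAll(log)); // prints [951792, 951895, 951934]
    }
}
```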

Java code example:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String input = "2019 May 22 03:32:17.952296 france1v4 sh[4937]: 190522-03:32:17.951792 [mod=REC, lvl=INFO] [tid=26130] Recording A8602096210405800406L200218680503121519 size is 4145956224 bytes\n" +
                "2019 May 22 03:32:17.952387 france1v4 sh[4937]: 190522-03:32:17.951895 [mod=REC, lvl=INFO] [tid=26130] RecordingInfo = fffocap://0x401e\n" +
                "2019 May 22 03:32:17.952466 france1v4 sh[4937]: 190522-03:32:17.951934 [mod=REC, lvl=INFO] [tid=26130] recording_dvr_from_recording_info:physicalSegmentCount=10   ";
        List<String> matches = new ArrayList<>();
        // "\\." in Java source is the regex \. (a literal dot)
        Matcher m = Pattern.compile("(?<=[0-9]{6}-[0-9]{2}:[0-9]{2}:[0-9]{2}\\.)[0-9]+")
                .matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        System.out.println(matches);
    }
}

Code output:

[951792, 951895, 951934]


    Answer 2:

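    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            // Run this for each line of the log, e.g. inside a loop over the lines.
            String line = "2019 May 22 03:32:17.952296 france1v4 sh[4937]: 190522-03:32:17.951792 [mod=REC, lvl=INFO] [tid=26130] Recording A8602096210405800406L200218680503121519 size is 4145956224 bytes";
            // Greedy ".+" backtracks to the last '.' followed by digits; group 1 captures those digits
            String pattern = "^.+\\.(\\d+)";

            // Create a Pattern object
            Pattern r = Pattern.compile(pattern);

            // Now create a Matcher object
            Matcher m = r.matcher(line);
            if (m.find()) {
                System.out.println("Found value: " + m.group(1)); // This gives 951792
            } else {
                System.out.println("NO MATCH");
            }
        }
    }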
    

    See the regex here: https://regex101.com/r/8F0D4w/1
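The answer's comment says to iterate line by line; a minimal sketch of that loop, assuming the log is available as one string with "\n" line breaks. It reuses the answer's pattern unchanged. Because ".+" is greedy, group 1 is the digit run after the last "." on the line that is followed by a digit, which is correct for the sample lines but would pick up a later decimal (for example a hypothetical "size is 4.5 GB" would yield "5"). The class name `LineByLineExample` and method `extractPerLine` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineByLineExample {

    // Same pattern as the answer: digits after the LAST '.' (followed by a digit) on the line.
    static final Pattern LAST_FRACTION = Pattern.compile("^.+\\.(\\d+)");

    // Applies the pattern to each line separately and collects group 1 of every matching line.
    static List<String> extractPerLine(String log) {
        List<String> values = new ArrayList<>();
        for (String line : log.split("\n")) {
            Matcher m = LAST_FRACTION.matcher(line);
            if (m.find()) {
                values.add(m.group(1));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String log = String.join("\n",
                "2019 May 22 03:32:17.952296 france1v4 sh[4937]: 190522-03:32:17.951792 [mod=REC, lvl=INFO] [tid=26130] Recording A8602096210405800406L200218680503121519 size is 4145956224 bytes",
                "2019 May 22 03:32:17.952387 france1v4 sh[4937]: 190522-03:32:17.951895 [mod=REC, lvl=INFO] [tid=26130] RecordingInfo = fffocap://0x401e",
                "2019 May 22 03:32:17.952466 france1v4 sh[4937]: 190522-03:32:17.951934 [mod=REC, lvl=INFO] [tid=26130] recording_dvr_from_recording_info:physicalSegmentCount=10");
        System.out.println(extractPerLine(log)); // prints [951792, 951895, 951934]
    }
}
```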


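      Answer 3:

      Here we might simply use the right boundary [mod next to the digits we want and collect the digits in our first capturing group, maybe with something like this: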

      ([0-9]+)\s\[m 
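A sketch of this first, boundary-only pattern with `Matcher.find()`, assuming the whole log is searched at once. The wanted digits are in group 1, while group 0 would also contain " [m". In the sample lines " [m" occurs only once per line, right after the wanted digits. The class name `RightBoundaryExample` and method `extractAll` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RightBoundaryExample {

    // Group 1 captures the digits; "\\s\\[m" is the right boundary " [m" (start of " [mod=").
    static final Pattern BEFORE_MOD = Pattern.compile("([0-9]+)\\s\\[m");

    // Collects group 1 of every match in the log.
    static List<String> extractAll(String log) {
        List<String> values = new ArrayList<>();
        Matcher m = BEFORE_MOD.matcher(log);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String log = String.join("\n",
                "2019 May 22 03:32:17.952296 france1v4 sh[4937]: 190522-03:32:17.951792 [mod=REC, lvl=INFO] [tid=26130] Recording A8602096210405800406L200218680503121519 size is 4145956224 bytes",
                "2019 May 22 03:32:17.952387 france1v4 sh[4937]: 190522-03:32:17.951895 [mod=REC, lvl=INFO] [tid=26130] RecordingInfo = fffocap://0x401e",
                "2019 May 22 03:32:17.952466 france1v4 sh[4937]: 190522-03:32:17.951934 [mod=REC, lvl=INFO] [tid=26130] recording_dvr_from_recording_info:physicalSegmentCount=10");
        System.out.println(extractAll(log)); // prints [951792, 951895, 951934]
    }
}
```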
      

      If we want, we can add more boundaries, for example:

      (.+?)([0-9]+)\s\[m.+
      

      DEMO

      Test

      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

      public class Example {
          public static void main(String[] args) {
              final String regex = "(.+?)([0-9]+)\\s\\[m.+";
              final String string = "2019 May 22 03:32:17.952296 france1v4 sh[4937]: 190522-03:32:17.951792 [mod=REC, lvl=INFO] [tid=26130] Recording A8602096210405800406L200218680503121519 size is 4145956224 bytes\n"
                   + "2019 May 22 03:32:17.952387 france1v4 sh[4937]: 190522-03:32:17.951895 [mod=REC, lvl=INFO] [tid=26130] RecordingInfo = fffocap://0x401e\n"
                   + "2019 May 22 03:32:17.952466 france1v4 sh[4937]: 190522-03:32:17.951934 [mod=REC, lvl=INFO] [tid=26130] recording_dvr_from_recording_info:physicalSegmentCount=10   \n";
              // Java replacement strings reference groups as $2; "\\2" would insert a literal "2"
              final String subst = "$2";

              final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
              final Matcher matcher = pattern.matcher(string);

              // The substituted value will be contained in the result variable
              final String result = matcher.replaceAll(subst);

              System.out.println("Substitution result: " + result);
          }
      }
      

      Demo

      const regex = /(.+?)([0-9]+)\s\[m.+/gm;
      const str = `2019 May 22 03:32:17.952296 france1v4 sh[4937]: 190522-03:32:17.951792 [mod=REC, lvl=INFO] [tid=26130] Recording A8602096210405800406L200218680503121519 size is 4145956224 bytes
      2019 May 22 03:32:17.952387 france1v4 sh[4937]: 190522-03:32:17.951895 [mod=REC, lvl=INFO] [tid=26130] RecordingInfo = fffocap://0x401e
      2019 May 22 03:32:17.952466 france1v4 sh[4937]: 190522-03:32:17.951934 [mod=REC, lvl=INFO] [tid=26130] recording_dvr_from_recording_info:physicalSegmentCount=10   
      `;
      const subst = `$2`;
      
      // The substituted value will be contained in the result variable
      const result = str.replace(regex, subst);
      
      console.log('Substitution result: ', result);

      RegEx

      If this expression wasn't desired, it can be modified or changed in regex101.com.

      RegEx Circuit

      jex.im visualizes regular expressions (diagram not included).
