【Question Title】: Match the sequence between a word and the last occurring space
【Posted】: 2026-01-27 09:15:02
【Question Description】:

I want to extract the characters that follow a particular word, up to the last space that occurs in the sequence.

Example:

FAILED on portal HTTP (10.1.1.1)
FAILED on portal TELNET 0 SSH (10.1.1.1)

The output I want is:

HTTP
TELNET 0 SSH

I am currently using the following regex and still working on it:

.+((?<=portal)[^\s]]+

It would be a great help if any of you could help me solve this :)

Update based on the comments:

Text:

1368028793000 10.3.1.4 CISCO X AUTHENTICATION:SESSION User authentication attempt FAILED on portal TELNET 0 SSH (10.1.2.8:64940) 

Regex:

^(\d+).* (\S+\d) ([\w\s]+) (\w* ?AUTHENTICATION:SESSION) (.+) (([\w.]+):(\d+)).* 

The groups I would like to get from my example string are:

#1 - 1368028793000 
#2 - 10.3.1.4 
#3 - CISCO X 
#4 - AUTHENTICATION:SESSION 
#5 - User authentication attempt FAILED on portal 
#6 - TELNET 0 SSH 
#7 - 10.1.2.8 
#8 - 64940

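【Question Comments】:

  • OK.. I will describe the whole thing I am trying to do
  • Text: 1368028793000 10.3.1.4 CISCO X AUTHENTICATION:SESSION User authentication attempt FAILED on portal TELNET 0 SSH (10.1.2.8:64940) Regex: ^(\d+).* (\S+\d) ([\w\s]+) (\w* ?AUTHENTICATION:SESSION) (.+) (([\w.]+):(\d+)).*
  • Do you really want to capture all of these groups, or only what comes after "portal"?
  • I have managed to capture all the groups... but I want to capture what comes after "portal" in a separate group, whether it is HTTP, TELNET 0 SSH, FTP, etc...
  • Could you explain, based on your example string, which groups you want to keep and what their values should be?

Tags: regex regex-negation regex-greedy regex-lookarounds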


【Solution 1】:

You can try this:

(?<=portal\s)(.+)\s\(

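Note that your pattern is missing a closing parenthesis ) and an opening square bracket [, which I assume are typos. You also need to escape the opening parenthesis that starts the (10.1.1.1) part.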
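
As a minimal sketch of how this pattern behaves on the two sample lines from the question, here it is in Python. The function name `extract_service` is hypothetical and only used for illustration. Because `.+` is greedy, the capture runs up to the last whitespace-plus-`(` pair on the line.

```python
import re

# Pattern from the answer: look-behind for "portal" plus one whitespace
# character, then capture greedily up to the last whitespace + "(" pair.
PORTAL_RE = re.compile(r"(?<=portal\s)(.+)\s\(")


def extract_service(line):
    """Return the text between 'portal ' and the final ' (', or None."""
    m = PORTAL_RE.search(line)
    return m.group(1) if m else None


samples = [
    "FAILED on portal HTTP (10.1.1.1)",
    "FAILED on portal TELNET 0 SSH (10.1.1.1)",
]
for s in samples:
    print(repr(extract_service(s)))
```

The look-behind here has a fixed width (seven characters), which Python's `re` module requires; other regex engines may be more permissive.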

【Comments】:

    【Solution 2】:

    Completely rewritten to match the new requirements.

    Try this:

    ^(\d+)\s+([\d.]+)\s+([\w\s]+?)\s+(AUTHENTICATION:SESSION)\s+(.+?portal)\s(.+?)\(([\d.]+)(?::(\d+))?\)$
    

    Here is a Perl script that runs it:

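    use strict;
    use warnings;
    use Data::Dump qw(dump);  # provides dump(); otherwise Perl's built-in dump would be called

    # Same pattern as above, precompiled with qr//
    my $re = qr/^(\d+)\s+([\d.]+)\s+([\w\s]+?)\s+(AUTHENTICATION:SESSION)\s+(.+?portal)\s(.+?)\(([\d.]+)(?::(\d+))?\)$/;
    while (<DATA>) {
        chomp;
        my @l = ($_ =~ $re);  # the eight capture groups (empty list if no match)
        dump @l;              # in void context Data::Dump prints to STDERR
    }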
    __DATA__
    1368028793000 10.3.1.4 CISCO X AUTHENTICATION:SESSION User authentication attempt FAILED on portal HTTP (10.1.1.1)
    1368028793000 10.3.1.4 CISCO X AUTHENTICATION:SESSION User authentication attempt FAILED on portal TELNET 0 SSH (10.1.2.8:64940)
    

    Output:

    (
      1368028793000,
      "10.3.1.4",
      "CISCO X",
      "AUTHENTICATION:SESSION",
      "User authentication attempt FAILED on portal",
      "HTTP ",
      "10.1.1.1",
      undef,
    )
    (
      1368028793000,
      "10.3.1.4",
      "CISCO X",
      "AUTHENTICATION:SESSION",
      "User authentication attempt FAILED on portal",
      "TELNET 0 SSH ",
      "10.1.2.8",
      64940,
    )
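
Note that group 6 in the output above keeps a trailing space ("HTTP ", "TELNET 0 SSH "), because `(.+?)` runs right up to the `\(`. The following is a rough Python port of the script, offered as a sketch rather than as the answerer's own code. It makes two assumptions that are not in the original: a `\s*` is added before `\(` so that group 6 comes out trimmed, and the input line is stripped first because the sample in the question ends with a space. The name `parse_line` is hypothetical.

```python
import re

# The answer's pattern with one change (an assumption, not in the original):
# "\s*" before "\(" so that group 6 does not keep the trailing space.
LOG_RE = re.compile(
    r"^(\d+)\s+([\d.]+)\s+([\w\s]+?)\s+(AUTHENTICATION:SESSION)\s+"
    r"(.+?portal)\s(.+?)\s*\(([\d.]+)(?::(\d+))?\)$"
)


def parse_line(line):
    """Return the eight capture groups as a tuple, or None if no match."""
    # strip() is an addition: trailing whitespace would otherwise defeat "\)$".
    m = LOG_RE.match(line.strip())
    return m.groups() if m else None


lines = [
    "1368028793000 10.3.1.4 CISCO X AUTHENTICATION:SESSION "
    "User authentication attempt FAILED on portal HTTP (10.1.1.1)",
    "1368028793000 10.3.1.4 CISCO X AUTHENTICATION:SESSION "
    "User authentication attempt FAILED on portal TELNET 0 SSH (10.1.2.8:64940) ",
]
for line in lines:
    print(parse_line(line))
```

Unlike the Perl output, every captured value here is a string, and the missing port in the first line comes back as `None` rather than `undef`.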
    

    Regex explanation:

    The regular expression:
    
    (?-imsx:^(\d+)\s+([\d.]+)\s+([\w\s]+?)\s+(AUTHENTICATION:SESSION)\s+(.+?portal)\s(.+?)\(([\d.]+)(?::(\d+))?\)$)
    
    matches as follows:
    
    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      ^                        the beginning of the string
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        \d+                      digits (0-9) (1 or more times (matching
                                 the most amount possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      (                        group and capture to \2:
    ----------------------------------------------------------------------
        [\d.]+                   any character of: digits (0-9), '.' (1
                                 or more times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of \2
    ----------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      (                        group and capture to \3:
    ----------------------------------------------------------------------
        [\w\s]+?                 any character of: word characters (a-z,
                                 A-Z, 0-9, _), whitespace (\n, \r, \t,
                                 \f, and " ") (1 or more times (matching
                                 the least amount possible))
    ----------------------------------------------------------------------
      )                        end of \3
    ----------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      (                        group and capture to \4:
    ----------------------------------------------------------------------
        AUTHENTICATION:SES       'AUTHENTICATION:SESSION'
        SION
    ----------------------------------------------------------------------
      )                        end of \4
    ----------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      (                        group and capture to \5:
    ----------------------------------------------------------------------
        .+?                      any character except \n (1 or more times
                                 (matching the least amount possible))
    ----------------------------------------------------------------------
        portal                   'portal'
    ----------------------------------------------------------------------
      )                        end of \5
    ----------------------------------------------------------------------
      \s                       whitespace (\n, \r, \t, \f, and " ")
    ----------------------------------------------------------------------
      (                        group and capture to \6:
    ----------------------------------------------------------------------
        .+?                      any character except \n (1 or more times
                                 (matching the least amount possible))
    ----------------------------------------------------------------------
      )                        end of \6
    ----------------------------------------------------------------------
      \(                       '('
    ----------------------------------------------------------------------
      (                        group and capture to \7:
    ----------------------------------------------------------------------
        [\d.]+                   any character of: digits (0-9), '.' (1
                                 or more times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of \7
    ----------------------------------------------------------------------
      (?:                      group, but do not capture (optional
                               (matching the most amount possible)):
    ----------------------------------------------------------------------
        :                        ':'
    ----------------------------------------------------------------------
        (                        group and capture to \8:
    ----------------------------------------------------------------------
          \d+                      digits (0-9) (1 or more times
                                   (matching the most amount possible))
    ----------------------------------------------------------------------
        )                        end of \8
    ----------------------------------------------------------------------
      )?                       end of grouping
    ----------------------------------------------------------------------
      \)                       ')'
    ----------------------------------------------------------------------
      $                        before an optional \n, and the end of the
                               string
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
    

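    【Comments】:

    • Thanks.. but this one gives the following output: "portal TELNET 0 SSH (" and "TELNET 0 SSH"
    • @Designerztouch: Sorry, I don't follow.
    • OK. I have now stated exactly what I need :) I hope it is clear now. Thanks!
    • Dude! This works 100% for me! :D Thanks a lot for your timely and great help. You saved my day...
    【Solution 3】:

    You can use this regex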

    (?<=portal).+(?=\s)
    

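    .+ is greedy, so it first matches to the end of the line and then backtracks as far as necessary until the lookahead for a whitespace character succeeds.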
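
The backtracking behaviour can be checked with a small Python sketch (the function name `greedy_match` is hypothetical). Two details are worth noticing: the match keeps the space right after "portal", since the look-behind does not consume it, and the result depends on where the last whitespace in the line is, so a trailing space changes what is matched.

```python
import re

# Pattern from the answer: everything after "portal" up to the position
# just before the LAST whitespace character on the line.
GREEDY_RE = re.compile(r"(?<=portal).+(?=\s)")


def greedy_match(line):
    """Return the matched text, or None if the pattern does not match."""
    m = GREEDY_RE.search(line)
    return m.group(0) if m else None


# Last whitespace is the one before "(", so the match stops there.
print(repr(greedy_match("FAILED on portal TELNET 0 SSH (10.1.1.1)")))
# With a trailing space, the last whitespace is at the end of the line,
# so the parenthesised address is included in the match.
print(repr(greedy_match("FAILED on portal TELNET 0 SSH (10.1.1.1) ")))
```

Calling `.strip()` on the result, or using `(?<=portal\s)` as in Solution 1, would drop the leading space.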

    【Comments】:
