【Question title】: Perl regex for multiple pattern grouping and multiline regex
【Posted】: 2019-06-08 18:36:04
【Question】:

I have an input txt file that contains multiple lines in the following format.

JMOD_01 :: This is starting of grouping 2nd KFGJHFG RTIRT DFB SFJKF ERIEFF FJDKF OIOIISD SDJKD 
last line ______________ 5564 numerical digits.

This is second starting of grouping 2nd KFGJHFG RTIRT FSFJKF  
ERIEFF FJDKF OIOIISD SDJKD 
till this end ___________ 021542 some random digits.

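I am trying to read this file and extract the matched patterns as groups.

Here is what I have tried. The first group is captured correctly. The problem is with the second group: it does not take the following lines into account.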

open(IFH,'<',"file.txt");

while ($line = <IFH>) {
if ($line =~ /^\s*(\w+\_\d*.*)\s*::(.*)/s) {
print "$1\n";
print "$2\n";
}
}
close(IFH);

Expected result:

print $1; # this should give me

JMOD_01
fdgh_6765_546/456

and print $2; # should then give me

"This is starting of grouping 2nd KFGJHFG RTIRT DFB SFJKF ERIEFF FJDKF OIOIISD SDJKD last line"

"This is second starting of grouping 2nd KFGJHFG RTIRT FSFJKF  
ERIEFF FJDKF OIOIISD SDJKD till this end"

and print $3; # should then give

"5564 numerical digits"
"021542 some random digits"

But the actual output for the second group is different: print $2; # actual output

"This is first starting of grouping 2nd KFGJHFG RTIRT DFB SFJKF"

"This is second starting of grouping 2nd KFGJHFG RTIRT FSFJKF"

【Comments】:

  • Yes, please ignore that. Consider the following input: JMOD_01 :: This is starting of grouping 2nd KFGJHFG RTIRT DFB SFJKF ERIEFF FJDKF OIOIISD SDJKD last line ______________ 5564 numerical digits. fdgh_6765_546/456 :: This is second starting of grouping 2nd KFGJHFG RTIRT FSFJKF ERIEFF FJDKF OIOIISD SDJKD till this end ___________ 021542 some random digits. Thanks for pointing it out.

Tags: regex perl grouping file-handling multiline


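【Solution 1】:

If I understand the question correctly, we could use two simple expressions to extract the data we want. The first captures the name before :: and everything after it, including line breaks: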

([A-Z_0-9]+)\s+::\s+([\s\S]+)

Demo 1

Test

use strict;

my $str = 'JMOD_01 :: This is starting of grouping 2nd KFGJHFG RTIRT DFB SFJKF ERIEFF FJDKF OIOIISD SDJKD 
last line ______________ 5564 numerical digits.

This is second starting of grouping 2nd KFGJHFG RTIRT FSFJKF  
ERIEFF FJDKF OIOIISD SDJKD 
till this end ___________ 021542 some random digits.

';
my $regex = qr/([A-Z_0-9]+)\s+::\s+([\s\S]+)/mp;

if ( $str =~ /$regex/g ) {
  print "Whole match is ${^MATCH} and its start/end positions can be obtained via \$-[0] and \$+[0]\n";
  # print "Capture Group 1 is $1 and its start/end positions can be obtained via \$-[1] and \$+[1]\n";
  # print "Capture Group 2 is $2 ... and so on\n";
}

# ${^POSTMATCH} and ${^PREMATCH} are also available with the use of '/p'
# Named capture groups can be called via $+{name}
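
The loop in the question reads the file one line at a time, so no pattern can ever see the continuation lines of a record. A minimal sketch of one way around that, assuming records are separated by blank lines and each record starts with its own `name ::` prefix (as in the corrected input from the comments): set `$/` to the empty string so Perl reads one paragraph per iteration. The helper name `read_records` and the use of `\S+` for the record name are illustrative assumptions, and an in-memory filehandle stands in for file.txt.

```perl
use strict;
use warnings;

# Sample input held in memory; a real script would open file.txt instead.
my $data = <<'END';
JMOD_01 :: This is starting of grouping 2nd KFGJHFG RTIRT DFB SFJKF ERIEFF FJDKF OIOIISD SDJKD
last line ______________ 5564 numerical digits.

fdgh_6765_546/456 :: This is second starting of grouping 2nd KFGJHFG RTIRT FSFJKF
ERIEFF FJDKF OIOIISD SDJKD
till this end ___________ 021542 some random digits.
END

# Hypothetical helper: returns one [name, body] pair per blank-line-separated record.
sub read_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = "";    # paragraph mode: each read returns text up to the next blank line
    my @records;
    while ( my $record = <$fh> ) {
        # \S+ (an assumption) also accepts names such as fdgh_6765_546/456;
        # /s lets . cross the line breaks inside a record
        if ( $record =~ /^\s*(\S+)\s*::\s*(.*\S)/s ) {
            push @records, [ $1, $2 ];
        }
    }
    close $fh;
    return @records;
}

my @records = read_records($data);
print "$_->[0]\n$_->[1]\n\n" for @records;
```

With a real file, the same loop would apply after `open my $fh, '<', 'file.txt' or die ...;`. The only change from the question's approach is the value of `$/`.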

And to extract our numbers:

([0-9]+\snumerical digits|[0-9]+\ssome random digits)

Demo 2

Test

use strict;

my $str = 'JMOD_01 :: This is starting of grouping 2nd KFGJHFG RTIRT DFB SFJKF ERIEFF FJDKF OIOIISD SDJKD 
last line ______________ 5564 numerical digits.

This is second starting of grouping 2nd KFGJHFG RTIRT FSFJKF  
ERIEFF FJDKF OIOIISD SDJKD 
till this end ___________ 021542 some random digits.

';
my $regex = qr/([0-9]+\snumerical digits|[0-9]+\ssome random digits)/mp;

# Loop with /g so that every number in the input is reported, not only the first
while ( $str =~ /$regex/g ) {
  print "Whole match is ${^MATCH} and its start/end positions can be obtained via \$-[0] and \$+[0]\n";
  # print "Capture Group 1 is $1 and its start/end positions can be obtained via \$-[1] and \$+[1]\n";
  # print "Capture Group 2 is $2 ... and so on\n";
}

# ${^POSTMATCH} and ${^PREMATCH} are also available with the use of '/p'
# Named capture groups can be called via $+{name}
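
Since the input holds more than one number, it can be convenient to collect them all in one step. A minimal sketch, assuming the same two literal phrases as the expression above: in list context, a match with `/g` returns every capture. The shortened `$str` and the array name `@numbers` are only for illustration.

```perl
use strict;
use warnings;

# Shortened sample: only the lines that carry the numbers.
my $str = "last line ______________ 5564 numerical digits.\n"
        . "till this end ___________ 021542 some random digits.\n";

# Same alternation as above; a list-context match with /g returns all captures.
my @numbers = $str =~ /([0-9]+\s(?:numerical digits|some random digits))/g;

print "$_\n" for @numbers;
```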

RegEx Circuit

jex.im can be used to visualize regular expressions.

【Comments】:

  • I combined the regexes mentioned above in my code, and it now captures all three groups correctly. Thanks, Emma.
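
The asker's combined code is not shown. The following is only a sketch of what a single three-group pattern might look like, not their actual solution. It assumes that records are separated by blank lines, that each record starts with `name ::`, and that the run of underscores separates the description from the trailing number text. The names `$record_re` and `@rows` are hypothetical.

```perl
use strict;
use warnings;

# Sample input as corrected in the question's comment.
my $data = <<'END';
JMOD_01 :: This is starting of grouping 2nd KFGJHFG RTIRT DFB SFJKF ERIEFF FJDKF OIOIISD SDJKD
last line ______________ 5564 numerical digits.

fdgh_6765_546/456 :: This is second starting of grouping 2nd KFGJHFG RTIRT FSFJKF
ERIEFF FJDKF OIOIISD SDJKD
till this end ___________ 021542 some random digits.
END

# One pattern, three groups; /x allows layout and comments, /s lets . match newlines.
my $record_re = qr/
    ^\s* (\S+) \s* :: \s*   # group 1: record name before the double colon
    (.*?) \s* _{2,} \s*     # group 2: description up to the underscore run
    (.*?) \.? \s* $         # group 3: text after it, without the final dot
/xs;

my @rows;
for my $record ( split /\n{2,}/, $data ) {
    # A list-context match returns the three captures, or an empty list on failure
    my ( $name, $body, $digits ) = $record =~ $record_re or next;
    $body =~ s/\s+/ /g;    # fold the line breaks inside the description
    push @rows, [ $name, $body, $digits ];
}

print join( "\n", @$_ ), "\n\n" for @rows;
```

Copying the captures into lexical variables before running `s///` matters here, because a later successful match would replace `$1`, `$2` and `$3`.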