【Question title】: Convert regexp from JavaScript to PCRE format
【Posted】: 2011-07-11 06:00:03
【Question】:

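I have seen many questions on this same topic here, but found no answer :(

I have this regular expression: {{Infobox[^]*?({{[^{}]*?}}[^]*?)*}}. It works like a charm in JavaScript and at http://www.gskinner.com/RegExr/

In PHP it produces the error "Compilation failed: missing terminating ] for character class at offset 41".

I know about delimiters, and I also tried putting \ before { and }, but it did not help.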

{{For|the title track of the album|Master of Puppets (song)}}
{{Infobox Album <!-- See Wikipedia:WikiProject_Albums -->
| Name        = Master of Puppets
| Type        = Studio album
| Artist      = [[Metallica]]
| Cover       = Metallica - Master of Puppets.jpg
| Released    = {{Start date|1986|3|3}}
| Recorded    = September 1 – December 27, 1985 at [[Sweet Silence Studios]], [[Copenhagen]], [[Denmark]]
| Genre       = [[Thrash metal]]
| Length      = 54:41
| Label       = [[Elektra Records|Elektra]], [[Music for Nations]], [[Vertigo Records|Vertigo]]
| Producer    = Metallica, [[Flemming Rasmussen]]
| Last album  = ''[[Ride the Lightning]]''<br/>(1984)
| This album  = '''''Master of Puppets'''''<br/>(1986)
| Next album  = ''[[...And Justice for All (album)|...And Justice for All]]''<br/>(1988)
| Misc        = {{Singles
  | Name           = Master of Puppets
  | Type           = Studio
  | single 1       = [[Master of Puppets (song)|Master of Puppets]]
  | single 1 date  = July 2, 1986
  | single 2       = [[Battery (song)|Battery]]
  | single 2 date  = 1986
  | single 3       = [[Welcome Home (Sanitarium)]]
  | single 3 date  = 1986
  }}
}}
'''''Master of Puppets''''' is the third studio album by the American [[heavy metal music|heavy metal]] band [[Metallica]]. It was released on March 3, 1986 through [[Elektra Records]]. The album reached #29<ref>{{Cite news 
  | last = Pareles
  | first = Jon
  | coauthors = 
  | title = HEAVY METAL, WEIGHTY WORDS
  | work = [[The New York Times]]
  | place = USA
  | page = 8
  | language = 
  | publisher = The New York Times Company
  | date = 10 July 1988
  | url = http://www.nytimes.com/1988/07/10/magazine/heavy-metal-weighty-words.html?pagewanted=8
  | accessdate = 14 November 2010}}</ref> on the U.S. [[Billboard 200|''Billboard'' 200]] album chart and was the band's first gold record for sales of over 500,000 copies. This was done without any radio airplay or the release of a music video. The album eventually was certified 6x platinum by the [[Recording Industry Association of America|RIAA]].<ref>{{cite web|title=Gold & Platinum|url=http://riaa.com/goldandplatinumdata.php?resultpage=1&table=SEARCH_RESULTS&action=&title=Master%20of%20Puppets&artist=Metallica&format=&debutLP=&category=&sex=&releaseDate=&requestNo=&type=&level=&label=&company=&certificationDate=&awardDescription=&catalogNo=&aSex=&rec_id=&charField=&gold=&platinum=&multiPlat=&level2=&certDate=&album=&id=&after=&before=&startMonth=1&endMonth=1&startYear=1958&endYear=2008&sort=Artist&perPage=25|publisher=RIAA|accessdate=2009-12-31}}</ref> 

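【Comments】:

  • Can you show the string you are trying to match?
  • Offset 41? Your RE isn't even that long. Is something missing?
  • I'm trying to parse Wikipedia articles. I added a sample to the initial post. The word "album" is missing from the regex, but that doesn't matter: "Compilation failed: missing terminating ] for character class at offset 35"
  • Are you trying to parse HTML with a regular expression?

Tags: php regex pcre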


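【Solution 1】:

I think the use of [^] is misleading and unclear, and I'm not sure what it is supposed to mean.

(Since you get an error about a missing ], I assume PCRE treats [ as opening a character class and ^ as negating it, so the following ] does not close the class but becomes its first member. RegExr evidently handles it differently, so I think the meaning of [^] varies between regex engines.)

You can try this: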

{{Infobox[^{]*?({{[^}]*?}}[\s\S]*?)*}}

here on Regexr
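
To illustrate the point about [^]: in JavaScript an empty negated class matches any character, newlines included, whereas PCRE reads the ] after [^ as the first member of the class. [\s\S] says "any character" in both engines. The sketch below is JavaScript only and assumes a shortened, made-up sample (`sample` is hypothetical, not the full article from the question). Escaping the braces is an extra assumption made for portability. It only checks that the two spellings match the same text.

```javascript
// Shortened, hypothetical stand-in for the wikitext in the question.
const sample = [
  "{{For|the title track of the album|Master of Puppets (song)}}",
  "{{Infobox Album",
  "| Name     = Master of Puppets",
  "| Released = {{Start date|1986|3|3}}",
  "| Misc     = {{Singles",
  "  | single 1 = [[Battery (song)|Battery]]",
  "  }}",
  "}}",
  "'''''Master of Puppets''''' is the third studio album.",
].join("\n");

// Original JavaScript-only pattern: here [^] means "any character".
const jsOnly = /{{Infobox[^]*?({{[^{}]*?}}[^]*?)*}}/;

// Portable spelling: [\s\S] instead of [^], with the braces escaped.
const portable = /\{\{Infobox[\s\S]*?(\{\{[^{}]*?\}\}[\s\S]*?)*\}\}/;

const a = sample.match(jsOnly);
const b = sample.match(portable);

// Both patterns should capture the whole infobox, nested templates included.
const sameMatch = a !== null && b !== null && a[0] === b[0];
console.log(sameMatch); // true
```

The portable pattern should carry over to PHP's preg_match once delimiters are added, although that has not been verified here.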

【Comments】:

    【Solution 2】:

    I would say the problem is that the ^, { and } characters are not escaped. You should try:

    \{\{Infobox[\^]*?(\{\{[\^\{\}]*?\}\}\[^]*?)*\}\}
    

    Edit, to match any character where [^] was used:

    \{\{Infobox.*?(\{\{[.\{\}]*?\}\}\.*?)*\}\}
    

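    【Comments】:

    • If I escape ^, the regex does not match anything, either in PHP or on gskinner.com/RegExr. As for {}, I tried escaping them and it did not help.
    • I don't think the OP wants the caret escaped. The expression [^] is one way of writing "any character", which is what is being attempted here: any character up to the next {{ (which does need escaping).
    • Any character should be matched with a dot (.). A ^ inside [] means NOT the characters that follow, and he has no characters after it.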
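
The comments above turn on what ^ and . mean inside and outside a character class. As a small sketch in JavaScript (the `text` string is a hypothetical two-line sample): an escaped caret in a class is a literal "^", a dot in a class is a literal ".", and a bare dot outside a class does not cross a newline by default. That may be why neither pattern in this answer behaves like the original [^].

```javascript
// Hypothetical two-line sample, loosely modelled on the infobox fields.
const text = "| Name = Master of Puppets\n| Type = Studio album";

// Inside a class, an escaped caret is only a literal "^".
const escapedCaret = /^[\^]+$/;
// Inside a class, "." is a literal dot, not "any character".
const dotInClass = /^[.]+$/;
// Outside a class, "." matches any character except line terminators (no s flag).
const plainDot = /^.+$/;
// [\s\S] matches any character, newlines included.
const anyChar = /^[\s\S]+$/;

const results = {
  escapedCaretMatchesText: escapedCaret.test(text),    // false
  escapedCaretMatchesCarets: escapedCaret.test("^^^"), // true
  dotInClassMatchesText: dotInClass.test(text),        // false
  dotInClassMatchesDots: dotInClass.test("..."),       // true
  plainDotMatchesText: plainDot.test(text),            // false: stops at the newline
  anyCharMatchesText: anyChar.test(text),              // true
};
console.log(results);
```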