JavaScript Regular Expressions and Capture Groups (Answers)

【Question Title】: JavaScript Regular Expressions and Capture Groups
【Posted】: 2016-02-10 19:34:50
【Question】:

I'm new to regular expressions in JavaScript and can't get an array of matches out of a text string like this:

Sentence would go here
-foo
-bar
Another sentence would go here
-baz
-bat

I want to get an array of matches like this:

match[0] = [
    'foo',
    'bar'
]
match[1] = [
    'baz',
    'bat'
]

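To summarize, what I'm looking for is:

"Any dash + word (-foo, -bar, etc.) that appears after a sentence"

Can anyone provide a pattern that captures every iteration rather than just the last one? A repeated capture group apparently keeps only its last iteration. Forgive me if this is a silly question. I'm using regex101, in case anyone wants to send me some tests.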
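
To illustrate the problem described above, here is a minimal sketch. The pattern and the `sample` string are made up for illustration and are not the exact regex from regex101:

```javascript
// Hypothetical sample input mirroring the first block of the text above.
const sample = "Sentence would go here\n-foo\n-bar\n";

// (\w+) sits inside a repeated group, so each repetition overwrites the capture.
const m = /^.*\n(?:-(\w+)\n)+/.exec(sample);

console.log(m[1]); // "bar" (the "foo" iteration is lost)
```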

【Comments】:

It may be easier to loop over all the lines and collect the data as you go.
Is the hyphen always at the start of the line?

Tags: javascript regex

【Solution 1】:

Regex captures do not work well for an unbounded number of groups. Splitting works better here:

var text = document.getElementById('text').textContent;
var blocks = text.split(/^(?!-)/m);
var result = blocks.map(function(block) {
  return block.split(/^-/m).slice(1).map(function(line) {
      return line.trim();
    });
});
document.getElementById('text').textContent = JSON.stringify(result);

<div id="text">Sentence would go here
-foo
-bar
Another sentence would go here
-baz
-bat
</div>
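
For trying the same two-step split outside a browser, here is a rough sketch without the DOM. The function name `groupDashItems` and the `sampleText` variable are hypothetical, and the final `filter` is an added assumption (dropping blocks that contain no dash lines), not part of the snippet above:

```javascript
// Hypothetical helper: same two-step split as above, without the DOM.
function groupDashItems(text) {
  // Step 1: cut the text in front of every line that does not start with "-".
  const blocks = text.split(/^(?!-)/m);
  return blocks
    // Step 2: inside each block, cut at every line-leading "-" and drop the sentence part.
    .map((block) => block.split(/^-/m).slice(1).map((line) => line.trim()))
    // Assumption: blocks without any "-" lines (e.g. blank lines) are not wanted.
    .filter((items) => items.length > 0);
}

const sampleText = [
  "Sentence would go here", "-foo", "-bar",
  "Another sentence would go here", "-baz", "-bat",
].join("\n");

console.log(JSON.stringify(groupDashItems(sampleText)));
// [["foo","bar"],["baz","bat"]]
```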

【Comments】:

【Solution 2】:

The first regular expression I came up with is:

/([^-]+)(-\w+)/g

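The first group, ([^-]+), grabs everything that is not a dash. It is followed by the capture group we actually want, (-\w+). The g flag makes the regex object remember where its last match ended (in its lastIndex property). As a result, each call to regex.exec(search) returns the next match, the same ones you see in regex101.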
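
A minimal sketch of how the g flag moves lastIndex forward between exec() calls. The pattern and the `input` string are simplified stand-ins chosen for illustration, not the regex above:

```javascript
const re = /-(\w+)/g;
const input = "-foo\n-bar";

const first = re.exec(input);
console.log(first[1], re.lastIndex);  // "foo" 4

const second = re.exec(input);
console.log(second[1], re.lastIndex); // "bar" 9

// No matches remain: exec() returns null and lastIndex resets to 0.
const third = re.exec(input);
console.log(third, re.lastIndex);     // null 0
```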

Note: JavaScript's \w is equivalent to [a-zA-Z0-9_]. So if you only want letters, use [a-zA-Z] instead of \w.
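
A quick way to see the difference, using a made-up token that contains an underscore and digits:

```javascript
// Hypothetical token, chosen to include characters that \w accepts but [a-zA-Z] does not.
const token = "-item_42";

const withWordClass = token.match(/-\w+/)[0];
const lettersOnly = token.match(/-[a-zA-Z]+/)[0];

console.log(withWordClass); // "-item_42"
console.log(lettersOnly);   // "-item"
```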

Here is code that uses this regular expression.

<p id = "input">
    Sentence would go here
    -foo
    -bar
    Another sentence would go here
    -baz
    -bat
</p>

<p id = "output">

</p>

<script>
    // Returns true when the text before a dash contains word characters, i.e. it is a sentence.
    function check_for_word(search) {return search.split(/\w/).length > 1}
    function capture(regex, search) {
        var 
        // The initial match.
            match  = regex.exec(search),
        // Stores all of the results from the search.
            result = [],
        // Used to gather results.
            gather;
        while(match) {
            // Create something empty.
            gather = [];
            // Push onto the gather.
            gather.push(match[2]);
            // Get the next match.
            match = regex.exec(search);
            // While we have more dashes...
            while(match && !check_for_word(match[1])) {
                // Push result on!
                gather.push(match[2]);
                // Get the next match to be checked.
                match = regex.exec(search);
            };
            // Push what was gathered onto the result.
            result.push(gather);
        }
        // Hand back the result.
        return result;
    };
    var output = capture(/([^-]+)(-\w+)/g, document.getElementById("input").innerHTML);
    document.getElementById("output").innerHTML = JSON.stringify(output);
</script>

With a slightly modified regular expression, you may get something closer to what you are looking for.

/[^-]+((?:-\w+[^-\w]*)+)/g

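The extra [^-\w]* allows some kind of separator between the dash words. The non-capturing group (?:) is then added so that + can repeat one or more dash words. We also no longer need the () around [^-]+ because, as you will see below, that data is no longer used. The first regex is more flexible about what may come between the dashes, but I find this one cleaner.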
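
To see what the single capture group holds, here is a small sketch using matchAll(). The `text` string is a stand-in for the page content, and pulling the words out with an inner match(/-\w+/g) is a shortcut swapped in for illustration, not the splitting approach used in the code that follows:

```javascript
const text =
  "Sentence would go here\n-foo\n-bar\nAnother sentence would go here\n-baz\n-bat\n";
const pattern = /[^-]+((?:-\w+[^-\w]*)+)/g;

// m[1] is the whole run of dash words for one sentence, separators included.
const rawRuns = Array.from(text.matchAll(pattern), (m) => m[1]);
console.log(JSON.stringify(rawRuns));
// ["-foo\n-bar\n","-baz\n-bat\n"]

// Extract the individual dash words from each run.
const dashGroups = rawRuns.map((run) => run.match(/-\w+/g));
console.log(JSON.stringify(dashGroups));
// [["-foo","-bar"],["-baz","-bat"]]
```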

function capture(regex, search) {
    var 
	// The initial match.
	    match  = regex.exec(search),
	// Stores all of the results from the search.
	    result = [],
	// Used to gather results.
		gather;
	while(match) {
	    // Create something empty.
	    gather = [];
		
	    // Break up the large match.
	    var temp = match[1].split('-');
	    for (var i = 0; i < temp.length; i++) {
	        // Strip every non-word character (newlines, spaces, "!").
	        temp[i] = temp[i].replace(/\W+/g, "");
	        // Makes sure there was actually something to gather.
	        if (temp[i].length > 0)
	            gather.push("-" + temp[i]);
	    }
		
		// Push what was gathered onto the result.
		result.push(gather);
		
		// Get the next match.
		match = regex.exec(search);	
	};
	// Hand back the result.
	return result;
};
var output = capture(/[^-]+((?:-\w+[^-\w]*)+)/g, document.getElementById("input").innerHTML);
document.getElementById("output").innerHTML = JSON.stringify(output);

<p id = "input">
Sentence would go here
-foo
-bar
Another sentence would go here
-baz
-bat
My very own sentence!
-get
-all
-of
  -these!
</p>

<p id = "output">

</p>

【Comments】:

【Solution 3】:

If that is enough for you, simply match two consecutive lines that start with -.

\n-(.*)\r?\n-(.*)

See the regex demo at regex101. To get the matches, use the exec() method.

var re = /\n-(.*)\r?\n-(.*)/g; var m;

var str = 'Sentence would go here\n-foo\n-bar\nAnother sentence would go here\n-baz\n-bat';

while ((m = re.exec(str)) !== null) {
  if (m.index === re.lastIndex) re.lastIndex++;
  document.write(m[1] + ',' + m[2] + '<br>');
}
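
If the number of dash lines is not fixed at two, one possible variation is to match a whole run of dash lines first and then pull the items out of each run. This is only a sketch: the names `blockRe` and `dashBlocks` and the extra -qux line are made up for illustration, and it relies on matchAll() being available (ES2020):

```javascript
const str =
  'Sentence would go here\n-foo\n-bar\nAnother sentence would go here\n-baz\n-bat\n-qux';

// One or more consecutive lines that each start with "-".
const blockRe = /(?:\r?\n-.*)+/g;

const dashBlocks = Array.from(str.matchAll(blockRe), (block) =>
  // Inside one run, capture the text after each line-leading "-".
  Array.from(block[0].matchAll(/\n-(.*)/g), (item) => item[1])
);

console.log(JSON.stringify(dashBlocks));
// [["foo","bar"],["baz","bat","qux"]]
```

This assumes every list is preceded by a line break, so a list on the very first line of the string would be missed.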

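【Comments】:

Thanks. With your help I eventually solved it with: (^\"[\s\S]*?\")\n-(.*)\r?\n-(.*)
The only catch is that this "matches two lines" and I need to "match N lines". I'll figure it out though, thanks again.
@bobblebubble. You don't need to increment lastIndex, because using the g flag does that for you automatically.
@Coldstar You're welcome! Oh, I thought you wanted pairs. Maybe you just need \r?\n-(.*) like this, but it seems you have already found a solution.
@TMKelleher Thanks for your comment! I used the regex101 code generator, which produced this. I'll leave it as is and upvote your comment.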