【Question title】: Best Javascript string search and return techniques in this case?
【Posted】: 2014-01-22 04:48:38
【Question description】:

I have a string coming into a function, which could be either:

I am a "somevalue"

or

I am a "somevalue" of "anothervalue"

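In each case, I need to recognize the "I am a" part and then return the value inside the quotes, or both values if there are two. There are several ways to do this, but I'm looking for the most efficient one under high usage.

Interested to hear from anyone with an opinion on this - thanks!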

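【Question discussion】:

Regex can do the job, yes, but you have to define "high usage". Can you batch-process your strings server-side? What are you trying to do?
Thanks, any suggestions for getting started with the regex? @Vache "High usage" means it will be called a large number of times. This is a nodejs background application running on Linux.
The two examples you gave... are those the complete strings? And do you just need all the values inside quotes?
How are quotes inside the quotes escaped?
@basilikum Yes, they are complete strings. I'm looking for all values inside quotes, but only if the exact pattern 'i am a' is found first, and if ' of ' is found after the first quoted value, then the second quoted value as well.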

Tags: javascript regex string search

【Solution 1】:

Since your format is fixed, you can match and capture in the same regular expression.

var str1 = 'I am a "somevalue" of "anothervalue"',
    str2 = 'I am a "somevalue"',
    str3 = 'I am a "value with \\"escaped\\" quotes"',
    regex = /^I am a "((?:\\"|[^"])*)"(?: of "((?:\\"|[^"])*)")?/;

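function match(str) {
    // match() returns null when the string does not start with 'I am a "..."'
    var matches = str.match(regex);
    if (matches !== null) {
        // matches[0] is the full match; slice(1) keeps only the two captures
        console.log(matches.slice(1)); // e.g. ["somevalue", "anothervalue"]
    }
}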

match(str1); // ["somevalue", "anothervalue"]
match(str2); // ["somevalue", undefined]
match(str3); // ["value with \"escaped\" quotes", undefined]

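The slice call removes the first element of the match array, which contains the whole matched string. If the optional ` of "..."` part is absent, you get undefined as the second value.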
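
The question asks for the values to be returned rather than logged. A minimal sketch of that, assuming the same regex as above; the name `extractValues` and the choice to return `null` on no match are illustrative and not part of the original answer:

```javascript
// Same pattern as in the answer, compiled once and reused on every call.
var PATTERN = /^I am a "((?:\\"|[^"])*)"(?: of "((?:\\"|[^"])*)")?/;

// Hypothetical helper: returns an array with one or two captured values,
// or null when the input does not start with 'I am a "..."'.
function extractValues(str) {
    var matches = PATTERN.exec(str);
    if (matches === null) {
        return null;
    }
    // matches[0] is the full match; drop it, then drop the second capture
    // when the optional ` of "..."` part was not present.
    return matches.slice(1).filter(function (value) {
        return value !== undefined;
    });
}

console.log(extractValues('I am a "somevalue" of "anothervalue"')); // ["somevalue", "anothervalue"]
console.log(extractValues('I am a "somevalue"'));                   // ["somevalue"]
console.log(extractValues('You are a "somevalue"'));                // null
```

Callers can then branch on the length of the returned array instead of checking for undefined.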

Depending on how quotes inside the quotes are escaped, you may need to tweak the regex slightly. I assumed that \ is the escape character.
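
Note that the captured value still contains the escape characters. If the unescaped text is needed, something like the following sketch could be applied to each capture; it assumes backslash escaping as above, and `unescapeValue` is a hypothetical helper name:

```javascript
// Assumption: a backslash escapes the following quote or backslash.
function unescapeValue(value) {
    // Replace \" with " and \\ with \ in a single left-to-right pass.
    return value.replace(/\\(["\\])/g, '$1');
}

// The capture from str3, i.e. the characters: value with \"escaped\" quotes
var captured = 'value with \\"escaped\\" quotes';
console.log(unescapeValue(captured)); // value with "escaped" quotes
```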

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  I am a "                 'I am a "'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
--------------------------------------------------------------------------------
      \\                       '\'
--------------------------------------------------------------------------------
      "                        '"'
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      [^"]                     any character except: '"'
--------------------------------------------------------------------------------
    )*                       end of grouping
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  "                        '"'
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
     of "                    ' of "'
--------------------------------------------------------------------------------
    (                        group and capture to \2:
--------------------------------------------------------------------------------
      (?:                      group, but do not capture (0 or more
                               times (matching the most amount
                               possible)):
--------------------------------------------------------------------------------
        \\                       '\'
--------------------------------------------------------------------------------
        "                        '"'
--------------------------------------------------------------------------------
       |                        OR
--------------------------------------------------------------------------------
        [^"]                     any character except: '"'
--------------------------------------------------------------------------------
      )*                       end of grouping
--------------------------------------------------------------------------------
    )                        end of \2
--------------------------------------------------------------------------------
    "                        '"'
--------------------------------------------------------------------------------
  )?                       end of grouping
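
The breakdown above shows that the same quoted-value sub-pattern appears twice. As an illustration only, the regex can be assembled from that repeated piece; the names `QUOTED` and `built` are made up for this sketch:

```javascript
// One quoted value: an opening quote, then any run of \" pairs or
// non-quote characters (captured), then the closing quote.
// Backslashes are doubled once more because this is a string literal.
var QUOTED = '"((?:\\\\"|[^"])*)"';

// ^I am a "<value>", optionally followed by  of "<value>"
var built = new RegExp('^I am a ' + QUOTED + '(?: of ' + QUOTED + ')?');

var m = built.exec('I am a "somevalue" of "anothervalue"');
console.log(m[1]); // somevalue
console.log(m[2]); // anothervalue
```

The resulting pattern is the same as the literal in the answer; the literal form is easier to read and is the one to prefer in practice.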

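That said, this solution is only suitable for certain values of "high usage". If we are talking about millions of queries at a very high rate, you would be better off with a technology more suited to parsing text (which probably would not be JavaScript/node).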
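
Whether the regex is fast enough is best settled by measuring with realistic input. A rough timing sketch, assuming Node and made-up sample strings; `timeMatches` is a hypothetical helper, and the numbers it reports will vary by machine:

```javascript
var regex = /^I am a "((?:\\"|[^"])*)"(?: of "((?:\\"|[^"])*)")?/;

// Hypothetical helper: runs the regex `iterations` times, cycling through
// the sample inputs, and returns the elapsed wall-clock time in
// milliseconds together with the number of successful matches.
function timeMatches(inputs, iterations) {
    var matched = 0;
    var start = Date.now();
    for (var i = 0; i < iterations; i++) {
        if (regex.exec(inputs[i % inputs.length]) !== null) {
            matched++;
        }
    }
    return { elapsedMs: Date.now() - start, matched: matched };
}

var samples = [
    'I am a "somevalue"',
    'I am a "somevalue" of "anothervalue"',
    'no match here'
];
var result = timeMatches(samples, 300000);
console.log(result.matched + ' matches in ' + result.elapsedMs + ' ms');
```

This is a crude loop, not a proper benchmark harness, so treat the result as an order-of-magnitude estimate only.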

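【Discussion】:

Great explanation of the regex :)
This may fail if there is an escaped backslash right before the closing quote - the simple fix is to add \\\\| as a prefix inside the non-capturing group
@Vache Thanks, I was able to use this solution and base other behavior on the values in the returned array. The quote escaping works well (the input strings come from the twitter api).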
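
The escaped-backslash fix suggested in the comments above can be sketched as follows. The variable names and the test input are illustrative; the only change to the pattern is the `\\\\|` alternative at the start of each non-capturing group:

```javascript
var original = /^I am a "((?:\\"|[^"])*)"(?: of "((?:\\"|[^"])*)")?/;

// Variant with `\\\\|` placed first in each non-capturing group, so that a
// pair of backslashes is consumed as a unit before \" is tried.
var fixed = /^I am a "((?:\\\\|\\"|[^"])*)"(?: of "((?:\\\\|\\"|[^"])*)")?/;

// The first value ends with an escaped backslash.
// Actual characters: I am a "a\\" of "b"
var input = 'I am a "a\\\\" of "b"';

var before = original.exec(input);
var after = fixed.exec(input);

// Original: the second backslash plus the closing quote is read as \",
// so the first capture runs on into ` of ` and the second is undefined.
console.log(before[1], before[2]);

// Fixed: the captures are the two intended values.
console.log(after[1], after[2]);
```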