Answers: How to parse the substring between the last set of parentheses in a string in Ruby

【Question title】: How to parse substring between last set of parentheses in string in ruby
【Posted】: 2009-03-29 00:22:35
【Question】:

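In my Ruby on Rails app, I'm trying to build a parser to extract some metadata from a string.

Say the sample string is:

The quick red fox (frank,10) jumped over the lazy brown dog (ralph, 20).

I want to extract the substring from the last occurrence of ( ).

So, no matter how many ( ) there are in the string, I want to get "ralph, 20".

Is there a best way to write this Ruby string extraction... a regexp?

Thanks,

John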

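【Question comments】:

Tags: ruby-on-rails ruby regex

【Solution 1】:

It looks like you want a sexeger. Sexegers work by reversing the string, running a reversed regex against it, and then reversing the result. Here is an example (please excuse the code, I don't really know Ruby):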

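#!/usr/bin/ruby

s = "The quick red fox (frank,10) jumped over the lazy brown dog (ralph, 20)."

# Reverse the string so the last ")" is the first one the regex meets.
reversed_s = s.reverse
reversed_s =~ /^.*?\)(.*?)\(/
# The captured text is backwards too, so reverse it again.
result = $1.reverse
puts result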
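
The reverse, match, reverse steps described above could also be wrapped in a small helper. This is only a minimal sketch: the method name `last_parenthesized` is hypothetical (it does not appear in the answer), and it assumes the parentheses are not nested. It returns nil when no pair is found, instead of failing on `$1.reverse`.

```ruby
# Hypothetical helper; the name is an assumption, not part of the answer.
# Assumes parentheses are not nested.
def last_parenthesized(str)
  # In the reversed string the last ")" of the original comes first,
  # so the non-greedy capture runs from it to the nearest "(".
  match = str.reverse.match(/\)(.*?)\(/)
  # The capture is reversed text, so flip it back; nil if nothing matched.
  match && match[1].reverse
end

s = "The quick red fox (frank,10) jumped over the lazy brown dog (ralph, 20)."
puts last_parenthesized(s)
```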

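The fact that this isn't getting upvoted tells me nobody clicked through to read why you would want to use a sexeger, so here are the results of a benchmark: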

do they all return the same thing?
ralph, 20
ralph, 20
ralph, 20
ralph, 20
                        user     system      total        real
scan greedy         0.760000   0.000000   0.760000 (  0.772793)
scan non greedy     0.750000   0.010000   0.760000 (  0.760855)
right index         0.760000   0.000000   0.760000 (  0.770573)
sexeger non greedy  0.400000   0.000000   0.400000 (  0.408110)

Here is the benchmark:

#!/usr/bin/ruby

require 'benchmark'

def scan_greedy(s)
    result = s.scan(/\([^)]*\)/x)[-1]
    result[1 .. result.length - 2]
end

def scan_non_greedy(s)
    result = s.scan(/\(.*?\)/)[-1]
    result[1 .. result.length - 2]
end

def right_index(s)
    s[s.rindex('(') + 1 .. s.rindex(')') -1]
end

def sexeger_non_greedy(s)
    s.reverse =~ /^.*?\)(.*?)\(/
    $1.reverse
end

s = "The quick red fox (frank,10) jumped over the lazy brown dog (ralph, 20).";

puts "do they all return the same thing?", 
    scan_greedy(s), scan_non_greedy(s), right_index(s), sexeger_non_greedy(s)

n = 100_000
Benchmark.bm(18) do |x|
    x.report("scan greedy")        { n.times do; scan_greedy(s); end }
    x.report("scan non greedy")    { n.times do; scan_non_greedy(s); end }
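    # Note: the timings shown above were produced with scan_greedy(s) on this
    # line, so the "right index" row there measures scan_greedy.
    x.report("right index")        { n.times do; right_index(s); end }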
    x.report("sexeger non greedy") { n.times do; sexeger_non_greedy(s); end }
end

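【Comments】:

Interesting (and thorough!)... does the benchmark show how long it takes to get the answer?
Yes, specifically the time to run each function 100,000 times. If you want to see an impressive difference, change s to "(foo)(foo)(foo)(foo)(foo)(bar)" and n to 10000. The sexeger is an order of magnitude faster.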
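
To try the variation suggested in the comment above, a sketch along these lines could be used. It reuses two of the methods from the benchmark with the input string and iteration count from the comment. No timings are shown here, since they depend on the machine and the Ruby version.

```ruby
require 'benchmark'

# Same approach as scan_non_greedy in the benchmark: collect every
# parenthesized group, take the last, and strip the outer parentheses.
def scan_non_greedy(s)
  result = s.scan(/\(.*?\)/)[-1]
  result[1..-2]
end

# Same approach as sexeger_non_greedy in the benchmark.
def sexeger_non_greedy(s)
  s.reverse =~ /^.*?\)(.*?)\(/
  $1.reverse
end

s = "(foo)(foo)(foo)(foo)(foo)(bar)"
n = 10_000

Benchmark.bm(18) do |x|
  x.report("scan non greedy")    { n.times { scan_non_greedy(s) } }
  x.report("sexeger non greedy") { n.times { sexeger_non_greedy(s) } }
end
```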

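【Solution 2】:

I would try this (my regex assumes the first value is alphanumeric and the second is a number; adjust accordingly). Here, scan returns all occurrences as an array, and -1 grabs only the last one, which seems to be exactly what you asked for: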

>> foo = "The quick red fox (frank,10) jumped over the lazy brown dog (ralph, 20)."
=> "The quick red fox (frank,10) jumped over the lazy brown dog (ralph, 20)."
>> foo.scan(/\(\w+, ?\d+\)/)[-1]
=> "(ralph, 20)"
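
If only the text inside the parentheses is wanted, a capture group could be added so that `scan` returns the inner text directly. A sketch, assuming non-nested parentheses; the pattern here is a looser variant chosen for illustration, not the one from the answer.

```ruby
foo = "The quick red fox (frank,10) jumped over the lazy brown dog (ralph, 20)."

# With a capture group, scan returns an array of arrays: one inner
# array per match, holding that match's captured groups.
groups = foo.scan(/\(([^()]*)\)/)

# groups.last is nil when nothing matched, so guard before taking
# the single captured group.
last = groups.last
inner = last && last.first
puts inner
```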

【Comments】:

Awesome! ...I ended up changing it to foo.scan(/(.*,*.*)/)[-1] because I don't really need to restrict it to the character types in the example. Thanks
Wouldn't s.scan(/\(.*?\)/)[-1] be easier?
Or s.scan(/\([^)]*\)/)[-1] if you don't like non-greedy matching.
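
The patterns in these comments differ in how much text they consume. Here is a short sketch of the difference on the sample string, assuming the parentheses in the patterns are escaped so that they match literal characters:

```ruby
s = "The quick red fox (frank,10) jumped over the lazy brown dog (ralph, 20)."

# Greedy .* runs from the first "(" to the last ")", so the whole
# span between them comes back as a single match.
greedy = s.scan(/\(.*\)/)

# Non-greedy .*? stops at the nearest ")", giving one match per group.
non_greedy = s.scan(/\(.*?\)/)

# A negated character class also stops at the nearest ")".
negated = s.scan(/\([^)]*\)/)

puts greedy.length
puts non_greedy.last
puts negated.last
```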

【Solution 3】:

A simple non-regex solution:

string = "The quick red fox (frank,10) jumped over the lazy brown dog (ralph, 20)."
string[string.rindex('(')..string.rindex(')')]

Example:

irb(main):001:0> string =  "The quick red fox (frank,10) jumped over the lazy brown dog (ralph, 20)."
=> "The quick red fox (frank,10) jumped over the lazy brown dog (ralph, 20)."
irb(main):002:0> string[string.rindex('(')..string.rindex(')')]
=> "(ralph, 20)"

And without the parentheses:

irb(main):007:0> string[string.rindex('(')+1..string.rindex(')')-1]
=> "ralph, 20"
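
`rindex` returns nil when the character is not found, so the one-liner raises NoMethodError on a string with no parentheses. A guarded variant might look like the sketch below. The method name `last_paren_contents` is hypothetical, and the sketch assumes non-nested parentheses.

```ruby
# Hypothetical helper; the name is an assumption, not part of the answer.
def last_paren_contents(string)
  close_idx = string.rindex(')')
  return nil unless close_idx

  # Search backwards for "(" starting at the last ")", so a stray "("
  # after the final ")" is ignored.
  open_idx = string.rindex('(', close_idx)
  return nil unless open_idx

  # Exclusive range: everything between the two parentheses.
  string[(open_idx + 1)...close_idx]
end

string = "The quick red fox (frank,10) jumped over the lazy brown dog (ralph, 20)."
puts last_paren_contents(string)
```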

【Comments】: