Suppose str = File.read(<filename>), where
str = <<~BITTER_END
---
layout: page
title: "WE07S-AWE"
date: 2018-10-21 01:31:26.000000000 -0600
---
Humpty Dumpty sat
on a wall
---
layout: page
title: "WE08RS-WEA"
date: 2018-10-22 07:31:26.000000000 -0600
---
Little Miss
Muffet sat on
her tuffet
---
layout: page
title: "AR91G-HUH"
date: 2017-03-13 01:30:26.000000000 -0800
---
Three blind mice
See how they run
BITTER_END
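Rather than use a single regular expression, you might prefer a sequence of operations. This improves readability and makes testing easier. I will use two regular expressions: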
r1 = /^---\r?\n/
r2 = /^title: +"([^"]+)/
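r1 reads, "match a line consisting of three hyphens". ^ is the beginning-of-line anchor and \r?\n is the line terminator, optionally preceded by a carriage return (\r) in case the file was created on Windows.
r2 reads, "match 'title:' at the beginning of a line, followed by one or more spaces (+), a double quote, and then one or more characters other than a double quote (as many as possible), which are saved in capture group 1". [^"] is a character class that matches any character other than a double quote.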
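Both regexes can be checked in isolation before they are used on the whole file. A small sketch; the sample strings below are made up for illustration:

```ruby
# The two regexes from the text.
r1 = /^---\r?\n/
r2 = /^title: +"([^"]+)/

# r1 accepts a separator line ending in "\n" (Unix) or "\r\n" (Windows).
p "---\n".match?(r1)    # => true
p "---\r\n".match?(r1)  # => true
p "----\n".match?(r1)   # => false: the fourth hyphen is not a line terminator

# String#[](regexp, 1) returns only capture group 1, i.e. the title text.
header = "layout: page\ntitle: \"WE07S-AWE\"\n"
p header[r2, 1]            # => "WE07S-AWE"
p "layout: page\n"[r2, 1]  # => nil, because there is no title line
```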
We can write:
str.split(r1).
drop(1).
each_slice(2).
with_object({}) { |(header,body),h| h[header[r2,1]] = body }
#=> {"WE07S-AWE"=>"Humpty Dumpty sat\non a wall\n",
# "WE08RS-WEA"=>"Little Miss\nMuffet sat on\nher tuffet\n",
# "AR91G-HUH"=>"Three blind mice\nSee how they run\n"}
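Because each step is an ordinary method call, the chain drops straight into a method that can be tested on small inputs. A minimal sketch; the method name titles_to_bodies and the sample text are hypothetical, not part of the question:

```ruby
R1 = /^---\r?\n/
R2 = /^title: +"([^"]+)/

# Hypothetical helper (not from the original answer): maps each page's
# title to its body, using the same chain of steps as the text.
def titles_to_bodies(str)
  str.split(R1).
    drop(1).
    each_slice(2).
    with_object({}) { |(header, body), h| h[header[R2, 1]] = body }
end

sample = <<~TEXT
  ---
  layout: page
  title: "A1"
  ---
  first body
  ---
  layout: page
  title: "B2"
  ---
  second body
TEXT

result = titles_to_bodies(sample)
# result == {"A1"=>"first body\n", "B2"=>"second body\n"}
```

Enumerator#with_object returns the memo object when given a block, so the method needs no explicit return value.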
The steps are as follows.
a = str.split(r1)
#=> ["",
# "layout: page\ntitle: \"WE07S-AWE\"\ndate: 2018-10-21 01:31:26.000000000 -0600\n",
# "Humpty Dumpty sat\non a wall\n",
# "layout: page\ntitle: \"WE08RS-WEA\"\ndate: 2018-10-22 07:31:26.000000000 -0600\n",
# "Little Miss\nMuffet sat on\nher tuffet\n",
# "layout: page\ntitle: \"AR91G-HUH\"\ndate: 2017-03-13 01:30:26.000000000 -0800\n",
# "Three blind mice\nSee how they run\n"]
b = a.drop(1)
#=> ["layout: page\ntitle: \"WE07S-AWE\"\ndate: 2018-10-21 01:31:26.000000000 -0600\n",
# ...
# "Three blind mice\nSee how they run\n"]
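The empty first element of a, which drop(1) discards, comes from the separator at the very start of the string. A tiny made-up input shows the same behavior:

```ruby
r1 = /^---\r?\n/

# The string starts with a separator, so split emits the (empty) text that
# precedes it as its first element.
a = "---\nhead\n---\nbody\n".split(r1)
# a == ["", "head\n", "body\n"]

# drop(1) returns a new array without that first element; a is unchanged.
b = a.drop(1)
# b == ["head\n", "body\n"]
```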
c = b.each_slice(2)
#=> #<Enumerator: ["layout: page\ntitle: \"WE07S-AWE\"\ndate: 2018-10-21 01:31:26.000000000 -0600\n",..., "Three blind mice\nSee how they run\n"]:each_slice(2)>
We can see the elements that the enumerator c will generate and pass to with_object by converting it to an array.
c.to_a
#=> [["layout: page\ntitle: \"WE07S-AWE\"\ndate: 2018-10-21 01:31:26.000000000 -0600\n",
# "Humpty Dumpty sat\non a wall\n"],
# ["layout: page\ntitle: \"WE08RS-WEA\"\ndate: 2018-10-22 07:31:26.000000000 -0600\n",
# "Little Miss\nMuffet sat on\nher tuffet\n"],
# ["layout: page\ntitle: \"AR91G-HUH\"\ndate: 2017-03-13 01:30:26.000000000 -0800\n",
# "Three blind mice\nSee how they run\n"]]
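each_slice(2) simply pairs consecutive elements, which is why headers and bodies line up. The placeholders below are made up; the second case is a caveat worth knowing rather than something that happens with the data in the question:

```ruby
# Made-up placeholders standing in for alternating headers and bodies.
flat = ["h1", "b1", "h2", "b2", "h3", "b3"]
pairs = flat.each_slice(2).to_a
# pairs == [["h1", "b1"], ["h2", "b2"], ["h3", "b3"]]

# With an odd count the last slice is short. String#split drops trailing
# empty strings, so this is what a final page with an empty body would give;
# in the block, body would then be nil.
short = ["h1", "b1", "h2"].each_slice(2).to_a
# short == [["h1", "b1"], ["h2"]]
```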
Continuing,
d = c.with_object({})
#=> #<Enumerator: #<Enumerator: ["layout:...]:each_slice(2)>:with_object({})>
d could be thought of as a compound enumerator, though Ruby has no such concept. Continuing,
(header,body),h = d.next
#=> [["layout: page\ntitle: \"WE07S-AWE\"\ndate: 2018-10-21 01:31:26.000000000 -0600\n",
# "Humpty Dumpty sat\non a wall\n"],
# {}]
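The shape of what d.next returns, a slice from c paired with the memo object, is easier to see with small made-up data (the arrays here are illustrative only):

```ruby
# A "compound" enumerator over made-up data.
en = [1, 2, 3, 4].each_slice(2).with_object([])

first  = en.next  # [[1, 2], []]: a slice paired with the memo object
second = en.next  # [[3, 4], []]

# Both results carry the very same memo object, so anything added to it
# while handling one slice is still there when the next slice arrives.
p first.last.equal?(second.last)  # => true
```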
Ruby uses array decomposition to break d.next into three objects, which become the values of the three block variables header, body and h. Let's examine those values.
header
#=> "layout: page\ntitle: \"WE07S-AWE\"\ndate: 2018-10-21 01:31:26.000000000 -0600\n"
body
#=> "Humpty Dumpty sat\non a wall\n"
h #=> {}
This is the initial value of h; it will be built up as the calculation proceeds. Now examine the block calculation.
s = header[r2,1]
#=> "WE07S-AWE"
h[s] = body
#=> "Humpty Dumpty sat\non a wall\n"
Now
h #=> {"WE07S-AWE"=>"Humpty Dumpty sat\non a wall\n"}
The remaining calculations are similar.
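For readers who want to step through those remaining calculations by hand, a sketch that keeps calling next until the enumerator is exhausted; the two-page input string is made up, and this is only a manual walk-through, not a replacement for the block form shown earlier:

```ruby
r1 = /^---\r?\n/
r2 = /^title: +"([^"]+)/
str = "---\ntitle: \"A\"\n---\nbody A\n---\ntitle: \"B\"\n---\nbody B\n"

memo = {}
d = str.split(r1).drop(1).each_slice(2).with_object(memo)

# Each d.next returns [[header, body], memo]. Kernel#loop rescues the
# StopIteration raised after the last pair, which ends the loop.
loop do
  (header, body), h = d.next
  h[header[r2, 1]] = body  # h is memo itself, so memo is updated in place
end
# memo == {"A"=>"body A\n", "B"=>"body B\n"}
```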
See String#split, Array#drop, Enumerable#each_slice and Enumerator#with_object.