【Posted】: 2014-11-11 04:22:29
【Question】:
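I'm new to regular expressions in R. I have a character vector, and I want to extract the first number that appears in each of its strings.
The vector is called "shootsummary" and looks like this: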
> head(shootsummary)
[1] Aaron Alexis, 34, a military veteran and contractor from Texas, opened fire in the Navy installation, killing 12 people and wounding 8 before being shot dead by police.
[2] Pedro Vargas, 42, set fire to his apartment, killed six people in the complex, and held another two hostages at gunpoint before a SWAT team stormed the building and fatally shot him.
[3] John Zawahri, 23, armed with a homemade assault rifle and high-capacity magazines, killed his brother and father at home and then headed to Santa Monica College, where he was eventually killed by police.
[4] Dennis Clark III, 27, shot and killed his girlfriend in their shared apartment, and then shot two witnesses in the building's parking lot and a third victim in another apartment, before being killed by police.
[5] Kurt Myers, 64, shot six people in neighboring towns, killing two in a barbershop and two at a car care business, before being killed by officers in a shootout after a nearly 19-hour standoff.
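The first number in each string is the person's age. I want to extract these ages without mixing them up with the other numbers that appear in the same line.
I used: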
as.numeric(gsub("\\D", "", shootsummary))
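Result (every digit in a string is kept and glued together, so 34, 12 and 8 in the first sentence become 34128):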
[1] 34128 42 23 27 6419
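To see why this happens, the same call can be applied to one shortened sentence from the question. `"\\D"` matches any non-digit character, so replacing all of them with `""` leaves only the digits, concatenated in order. This is just an illustration of the original attempt; the variable names are chosen for the example.

```r
# Shortened version of the first sentence; it still contains 34, 12 and 8
x <- "Aaron Alexis, 34, opened fire, killing 12 people and wounding 8."

# "\\D" = any non-digit; deleting every one of them glues the digits together
digits_only <- gsub("\\D", "", x)
print(digits_only)  # "34128"
```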
I'm looking for a result like the one below, containing only the age extracted from each sentence, without the other numbers that appear after it:
[1] 34 42 23 27 64
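One possible approach, sketched here under the assumption that every string contains at least one digit, is to capture only the first run of digits with `sub()` instead of deleting non-digits with `gsub()`. The sentences below are shortened copies of the ones in the question, keeping all of their numbers.

```r
# Shortened copies of the five sentences from the question (all numbers kept)
shootsummary <- c(
  "Aaron Alexis, 34, opened fire, killing 12 people and wounding 8.",
  "Pedro Vargas, 42, set fire to his apartment and killed six people.",
  "John Zawahri, 23, killed his brother and father at home.",
  "Dennis Clark III, 27, shot and killed his girlfriend.",
  "Kurt Myers, 64, shot six people after a nearly 19-hour standoff."
)

# "[^0-9]*" consumes everything before the first digit,
# "([0-9]+)" captures the first run of digits,
# ".*" consumes the rest; the whole string is replaced by the capture
ages <- as.numeric(sub("[^0-9]*([0-9]+).*", "\\1", shootsummary))
print(ages)  # 34 42 23 27 64
```

If a string has no digit at all, `sub()` leaves it unchanged and `as.numeric()` turns it into `NA` with a coercion warning.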
【Discussion】:
-
What do you want returned if one of the vector elements contains no number? In my solution, it returns NA.
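A minimal base-R sketch of that NA behaviour, assuming the input is a character vector without missing values: `regexpr()` reports where the first digit run starts (or `-1` if there is none), and `regmatches()` returns only the matched pieces, so they are written back into a vector pre-filled with `NA`. The helper name `first_number` is hypothetical and not from the original discussion.

```r
# Hypothetical helper: first number in each string, NA when a string has no digits
first_number <- function(x) {
  m <- regexpr("[0-9]+", x)            # start of first digit run, -1 if none
  out <- rep(NA_real_, length(x))      # default result is NA for every element
  hit <- m != -1                       # which elements actually matched
  out[hit] <- as.numeric(regmatches(x, m))  # regmatches() drops the non-matches
  out
}

res <- first_number(c("Kurt Myers, 64, shot six people", "no digits here"))
print(res)  # 64 NA
```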