【Posted】: 2014-06-15 10:51:24
【Question】:
My data contains text messages like the ones shown below. I want to extract the age of the block from each message.
x:
my block is 8 years old and I am happy with it. I had been travelling since 2 years and that’s fun too…..
He invested in my 1 year block and is happy with the returns
He re-invested in my 1.5 year old block
i had come to U.K for 4 years and when I reach Germany my block will be of 5 years
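I extracted the numbers that are followed by the word "year" or "years", but I realized that I should instead pick the number that is closest to the word "block".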
library(stringr)
> str_extract_all(x, "[0-9.]{1,3}.year|[0-9.]{1,3}.years")
[[1]]
[1] "8 years" "2 years"
[[2]]
[1] "1 year"
[[3]]
[1] "1.5 year"
[[4]]
[1] "4 years" "5 years"
I would like the output to be a list containing
8 years
1 year
1.5 year
5 years
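I am thinking of extracting the part of each sentence that contains words such as "block", "old", etc., but I am not sure how to implement this. Any ideas or suggestions for improving this process would be helpful.
Thanks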
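One way the "number closest to the word block" idea could be sketched is shown below. This is only a rough illustration, not a definitive solution: it uses base R (`gregexpr`, `regmatches`) instead of stringr, the helper name `closest_age` and its `keyword` argument are hypothetical, the sample messages are retyped from the question with plain ASCII punctuation, and "closest" is assumed to mean the smallest gap in characters between a match and an occurrence of "block".

```r
# Sample messages retyped from the question (ASCII punctuation only).
x <- c(
  "my block is 8 years old and I am happy with it. I had been travelling since 2 years and that's fun too",
  "He invested in my 1 year block and is happy with the returns",
  "He re-invested in my 1.5 year old block",
  "i had come to U.K for 4 years and when I reach Germany my block will be of 5 years"
)

# Hypothetical helper: return the "<number> year(s)" match that lies
# nearest (in characters) to any occurrence of `keyword` in string `s`.
closest_age <- function(s, keyword = "block") {
  # Locate every "<number> year" / "<number> years" candidate.
  m <- gregexpr("[0-9]+(\\.[0-9]+)? years?", s)
  ages <- regmatches(s, m)[[1]]
  if (length(ages) == 0) return(NA_character_)
  starts <- as.integer(m[[1]])
  ends <- starts + attr(m[[1]], "match.length") - 1L

  # Locate every occurrence of the keyword (literal match).
  k <- gregexpr(keyword, s, fixed = TRUE)[[1]]
  if (k[1] == -1) return(NA_character_)
  k_starts <- as.integer(k)
  k_ends <- k_starts + nchar(keyword) - 1L

  # Gap between candidate i and its nearest keyword occurrence,
  # whether the keyword comes before or after the candidate.
  gap <- function(i) {
    as.numeric(min(pmax(k_starts - ends[i], starts[i] - k_ends, 0L)))
  }
  d <- vapply(seq_along(ages), gap, numeric(1))
  ages[which.min(d)]
}

ages <- vapply(x, closest_age, character(1), USE.NAMES = FALSE)
print(ages)
```

Character distance is a crude heuristic. It happens to pick the intended number in these four messages, but it could choose the wrong one when an unrelated duration sits nearer to "block" than the block's own age does.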
【Comments】:
- @David- I only want to extract the age of the block. I have edited my post to include the name of the library.