【Posted】: 2015-12-29 11:33:37
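【Question】:
I want to extract numeric information from a character vector in R. The elements of the vector follow the same repeating structure, as shown below: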
[1] "Capturing tweets..."
[2] "Connection to Twitter stream was closed after 1 seconds with up to 1 tweets downloaded."
[3] "Capturing tweets..."
[4] "Connection to Twitter stream was closed after 1 seconds with up to 1 tweets downloaded."
[5] "Capturing tweets..."
[6] "Connection to Twitter stream was closed after 1 seconds with up to 1 tweets downloaded."
[7] "Capturing tweets..."
[8] "Connection to Twitter stream was closed after 1 seconds with up to 1 tweets downloaded."
[9] "Capturing tweets..."
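As you can see, two kinds of numeric information recur in this vector. One gives how long the connection was open, i.e. the number followed by "seconds"; the other gives the number of tweets downloaded. I only need the number of tweets, so I want to build a new numeric vector that contains, for each such line, only the number followed by "tweets".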
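To make the goal concrete, here is a minimal sketch of one possible way to pull out only the number that precedes "tweets". It is not taken from the thread: the vector name `output` follows the name used in the comments, and the second "Connection" line with its values 12 and 345 is made up purely so the two numbers can be told apart.

```r
# Sample input modelled on the vector above. The last element is hypothetical,
# added only so that the seconds and the tweet count differ.
output <- c(
  "Capturing tweets...",
  "Connection to Twitter stream was closed after 1 seconds with up to 1 tweets downloaded.",
  "Capturing tweets...",
  "Connection to Twitter stream was closed after 12 seconds with up to 345 tweets downloaded."
)

# Match a run of digits only when it is immediately followed by " tweets".
# The lookahead (?= ...) requires perl = TRUE.
m <- regexpr("[0-9]+(?= tweets)", output, perl = TRUE)

# regmatches() keeps only the elements that matched, so the
# "Capturing tweets..." lines drop out automatically.
tweetnumbers <- as.numeric(regmatches(output, m))
```

With this input, `tweetnumbers` holds one value per "Connection ..." line; the lines without a digit before "tweets" contribute nothing.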
【Comments】:
-
Have you tried anything yet?
-
Hi @Heroka, I have tried several variants of 'gsub', for example:
tweetnumbers <- as.numeric(gsub("[^\\d]+", "", output, perl=TRUE))
However, this just leaves "11" for each line.
-
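@nikUoM, I suggest editing your question to include the most promising attempt you have made so far (the comment above is fine, but it works best when added to the question), so that people have something more concrete to help you with.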
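For readers wondering why the 'gsub' attempt in the comments produces "11": the pattern deletes every non-digit character, so the seconds and the tweet count end up glued together. The sketch below reproduces that and shows one possible alternative using a capture group. The third element of `output` and its values 3 and 27 are hypothetical, added only for illustration.

```r
output <- c(
  "Capturing tweets...",
  "Connection to Twitter stream was closed after 1 seconds with up to 1 tweets downloaded.",
  # Hypothetical line, not from the original post:
  "Connection to Twitter stream was closed after 3 seconds with up to 27 tweets downloaded."
)

# The attempt from the comments: strip everything that is not a digit.
# Both numbers in a line are concatenated, and lines without digits become "".
attempt <- gsub("[^\\d]+", "", output, perl = TRUE)

# Alternative sketch: keep only the lines that report a download,
# then replace the whole line with the digits captured before " tweets".
closed <- grep("tweets downloaded", output, value = TRUE)
tweetnumbers <- as.numeric(sub(".*up to ([0-9]+) tweets.*", "\\1", closed))
```

Here `attempt` mixes the two numbers of each line into one string, while `tweetnumbers` keeps only the tweet counts.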