Search Twitter and obtain tweets by hashtag, maximizing the number of returned search results

【Question title】: Search twitter and obtain tweets by hashtag, maximizing number of returned search results
【Posted】: 2014-04-14 08:08:26
【Question description】:

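I'm trying to compile a corpus of all World Cup-related tweets on Twitter from their API, using the twitteR package in R.

I use the following code for a single hashtag (as an example). However, my problem is that I seem to be "authorized" to access only a limited set of tweets (in this case, only the 32 most recent tweets).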

library(twitteR)

reqURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "http://api.twitter.com/oauth/authorize"
#consumerKey <- Omitted
#consumerSecret <- Omitted
twitCred <- OAuthFactory$new(consumerKey=consumerKey,
                             consumerSecret=consumerSecret,
                             requestURL=reqURL,
                             accessURL=accessURL,
                             authURL=authURL)
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package =  "RCurl")))
twitCred$handshake()

#setwd("/Users/user/FIFA")

#save(twitCred, file="twitterAuthentication.Rdata")
#load("twitterAuthentication.Rdata")
registerTwitterOAuth(twitCred)

FIFA<-searchTwitter("#WorldCup", n=9999, since='2007-10-30')

This returns the following warning:

Warning message:
In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit,  :
  9999 tweets were requested but the API can only return 32

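My question is: how can I access the maximum number of tweets for a specific hashtag? (Also, can someone clarify what the "maximum" limit actually is, and why I can't seem to get anywhere near that value (~1500 tweets)?)

I have tested OAuth on the Twitter developer site and obtained the signature base string, authorization header, and cURL command, which suggests that I have the appropriate permissions and authorization to pull the relevant data from Twitter's servers. Please let me know or correct me if I'm wrong, or if you need more information.

My API permissions are currently set to: Read, Write and Access direct messages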

Session Info:

R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RJSONIO_1.0-3  twitteR_1.1.7  rjson_0.2.12   ROAuth_0.9.3   digest_0.6.3   RCurl_1.95-4.1 bitops_1.0-5  
[8] foreign_0.8-55

loaded via a namespace (and not attached):
[1] tools_3.0.2

Other resources/sources:

twitter package in R maximum tweets using searchTwitter()

This source states that the maximum is 1500.

Twitter api searching tweets for hashtags

This source states that the maximum is 3200.

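【Question discussion】:

Strange. FIFA<-searchTwitter("#WorldCup", n=60) yields the expected 60 tweets here, ranging from "2014-03-10 23:15:52 UTC" to "2014-03-11 00:18:44 UTC". Have you also tried the streaming API? (github.com/pablobarbera/streamR)
@lukeA The sample code provided tries to pull all tweets from Twitter from 30 October 2007 until today. I'm also confused about why you say it yields the "expected" 60 tweets. Why should anyone expect only 60 tweets today, or on any other day? No, I haven't tried the streaming API or the streamR package yet.
I don't think you'll get historical tweets from Twitter; that's a common misconception about their search API (read lists.hexdump.org/pipermail/twitter-users-hexdump.org/…, dev.twitter.com/docs/using-search, dev.twitter.com/docs/api/1.1/get/search/tweets)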
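A quick way to see how far back a search actually reaches, like the date range reported in the first comment, is to flatten the result with twitteR's twListToDF() and inspect the `created` timestamps. The helper below is a hypothetical sketch (the name `tweet_date_range` is made up); it only assumes a data frame with a POSIXct `created` column, which is what twListToDF() returns.

```r
# Hypothetical helper: report how many tweets came back and the time span
# they cover. Expects a data frame with a POSIXct `created` column, as
# produced by twitteR::twListToDF() from a list of status objects.
tweet_date_range <- function(df) {
  stopifnot(is.data.frame(df), "created" %in% names(df))
  # Avoid min()/max() warnings (and +/-Inf) on an empty result set.
  if (nrow(df) == 0) {
    return(list(n = 0L, oldest = NA, newest = NA))
  }
  list(n      = nrow(df),
       oldest = min(df$created),
       newest = max(df$created))
}
```

For example, `tweet_date_range(twitteR::twListToDF(FIFA))` would show whether the 32 returned tweets all fall within the last few days.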

Tags: r twitter oauth twitter-oauth data-mining

【Solution 1】:

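This isn't possible. From Using the Twitter Search API:

"The Search API is not a complete index of all Tweets, but instead an index of recent Tweets. At the moment that index includes between 6-9 days of Tweets."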
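Because the index only covers the last several days, a `since` date in 2007 cannot widen the results. A minimal sketch, assuming a 7-day window (a hypothetical value picked from the quoted 6-9 day range, not a documented constant), computes a `since` date the index can actually satisfy and passes it to twitteR's searchTwitter(), which takes `since` as a YYYY-MM-DD string. Both function names are hypothetical, the default `n = 1500` simply mirrors the figure cited in the question, and the wrapper assumes OAuth has already been registered as in the question's code.

```r
# Hypothetical helper: format a `since` date inside the Search API's
# recent-tweet index (assumed here to be ~7 days; the docs quote 6-9 days).
search_since_date <- function(days_back = 7, today = Sys.Date()) {
  stopifnot(is.numeric(days_back), days_back >= 0)
  format(as.Date(today) - days_back, "%Y-%m-%d")
}

# Hypothetical wrapper around twitteR::searchTwitter(). Assumes
# registerTwitterOAuth() has already been called; only asks for tweets
# the index can actually contain.
search_recent_hashtag <- function(hashtag, n = 1500, days_back = 7) {
  twitteR::searchTwitter(hashtag,
                         n = n,
                         since = search_since_date(days_back))
}
```

For example, after `registerTwitterOAuth(twitCred)`, `FIFA <- search_recent_hashtag("#WorldCup")` requests up to 1500 tweets from the last 7 days; the API may still return fewer.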

【Discussion】:

It is possible if you're lucky enough to get a Twitter Data Grant: blog.twitter.com/2014/introducing-twitter-data-grants
@lukeA Provided the grant proposal is finished within the next 48 hours.

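【Solution 2】:

This answer is for anyone still looking into a similar problem: you can include an additional parameter, resultType, and specify whether you want "popular" or "recent" posts (the API's default is "mixed").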

FIFA <- searchTwitter("#WorldCup", n=9999, since='2007-10-30', resultType = 'recent')

This should do the trick.
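Since resultType only accepts a few fixed values, one option is to validate it before calling the API. The sketch below is hypothetical (`build_search_args` is a made-up name); it assumes a twitteR version whose searchTwitter() accepts `resultType`, which the answer's call relies on as well. It builds the argument list with `match.arg()`, so a typo fails locally instead of being sent to Twitter.

```r
# Hypothetical helper: build arguments for twitteR::searchTwitter(),
# restricting resultType to the values the Search API accepts.
# The first choice ("recent") is used when none is given.
build_search_args <- function(query, n = 1500,
                              result_type = c("recent", "popular", "mixed")) {
  result_type <- match.arg(result_type)
  list(searchString = query, n = n, resultType = result_type)
}
```

Usage: `FIFA <- do.call(twitteR::searchTwitter, build_search_args("#WorldCup"))`. Note that resultType chooses which kind of results the API ranks and returns; it does not by itself extend the index window quoted in Solution 1.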

【Discussion】: