Retrieving user timeline with maxID giving unexpected result using twitteR package: answers

【Question title】: Retrieving user timeline with maxID giving unexpected result using twitteR package
【Posted】: 2016-09-15 22:20:25
【Question】:

I'm trying to retrieve an account's entire timeline. After reading the Twitter API docs, I wrote the following code:

healthLeadersTimeline <- twListToDF(userTimeline("HealthLeaders", n=200, includeRts=TRUE, retryOnRateLimit=180))
write.table(healthLeadersTimeline, "health.csv", sep=",", row.names=FALSE)
maxID <- getMaxID(last(healthLeadersTimeline)$id)
healthLeadersTimeline <- twListToDF(userTimeline("HealthLeader", n=200, maxID=maxID, includeRts=TRUE, retryOnRateLimit=180))
write.table(healthLeadersTimeline, "health.csv", sep=",", append=TRUE, col.names=FALSE, row.names=FALSE)

getMaxID is implemented as follows:

getMaxID <- function (tweetID) {
  lastID <- as.numeric(tweetID)
  maxID <- toString(lastID -1)
  return(maxID)
}

This Twitter account clearly has more than 400 tweets. Yet the second call to userTimeline returns only 35 tweets. What am I doing wrong here?

【Comments】:

Tags: r twitter

【Answer 1】:

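You're not doing anything wrong, except perhaps not reading the docs :). 3200 is the maximum number of posts you can officially retrieve from a timeline.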
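As a rough illustration of that cap, the sketch below asks for the whole retrievable timeline in a single call instead of paging by hand. It assumes the twitteR package is installed and that `setup_twitter_oauth()` has already been run; `fetch_full_timeline` is a hypothetical helper name, not part of twitteR. Whether one call really returns everything depends on how `userTimeline` pages internally, which is not confirmed here.

```r
# Hypothetical helper (not part of twitteR): request a user's timeline in one
# call and return it as a data frame. Assumes twitteR is installed and
# setup_twitter_oauth() has already been called in this session.
fetch_full_timeline <- function(user, n = 3200) {
  tweets <- twitteR::userTimeline(
    user,
    n = n,                   # 3200 is the cap mentioned in the answer
    includeRts = TRUE,       # same options as in the question
    retryOnRateLimit = 180
  )
  twitteR::twListToDF(tweets)
}

# Usage (needs network access and valid credentials):
# timeline <- fetch_full_timeline("HealthLeaders")
# nrow(timeline)             # how many tweets actually came back
# range(timeline$created)    # time span covered by the result
```

Comparing `nrow(timeline)` with the tweet count on the profile is a quick way to see whether anything was dropped.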

【Discussion】:

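Yes, I just found out with Python that you can never get past the 3200 limit, but my R code here only sets n=200 per call. By the time of the second userTimeline call, only 200 tweets had been retrieved.
It looks like userTimeline handles pagination internally, so you don't need to specify maxID unless you want to exclude tweets after a certain time. You could just set n=3200. At least that's what the twitteR documentation implies. Also, if you do choose to supply maxID, it means "less than" rather than "less than or equal to", so don't subtract one. None of this explains why you only get 35 tweets instead of 200, though. I would drop maxID, try different values of n, and inspect the tweets you get back to see which ones are being dropped.
The docs page you linked says maxID is inclusive, and I'm subtracting one as instructed there. Thanks for the reply; yes, I'll try checking the actual tweets retrieved.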
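If the subtract-one step is kept, one thing worth checking (a possible factor, not a confirmed diagnosis of the 35-tweet result) is that tweet IDs are 64-bit integers, while `as.numeric()` yields a double that is exact only up to 2^53. For an 18-digit ID the `- 1` in getMaxID is lost to rounding. The sketch below decrements the ID as a decimal string instead; `decrement_id` is a hypothetical helper, and the example ID is made up.

```r
# Hypothetical helper: subtract one from a positive integer given as a
# decimal string, without ever converting it to a double.
decrement_id <- function(id) {
  stopifnot(is.character(id), length(id) == 1L, grepl("^[0-9]+$", id))
  digits <- as.integer(strsplit(id, "")[[1]])
  i <- length(digits)
  # Borrow through trailing zeros: ...100 becomes ...099
  while (i > 0L && digits[i] == 0L) {
    digits[i] <- 9L
    i <- i - 1L
  }
  if (i == 0L) stop("id must be greater than zero")
  digits[i] <- digits[i] - 1L
  # Drop leading zeros, but keep a lone "0"
  sub("^0+(?=[0-9])", "", paste(digits, collapse = ""), perl = TRUE)
}

example_id <- "776543210987654321"   # made-up 18-digit ID

# Doubles near 7.7e17 are spaced 128 apart, so subtracting 1 changes nothing:
as.numeric(example_id) - 1 == as.numeric(example_id)   # TRUE

decrement_id(example_id)             # "776543210987654320"
```

The result can be passed as `maxID` directly, since it is already a string.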