Retrieving more than (~3000) tweets per userId from the Twitter API: answers

【Question title】: Retrieving more than (~3000) tweets per userId from twitter API.
【Posted】: 2015-11-10 18:52:59
【Question description】:

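I am new to Twitter development. I am trying to download the tweets of major news agencies, and I followed the guide at http://www.karambelkar.info/2015/01/how-to-use-twitters-search-rest-api-most-effectively to do so. I know the Twitter API limits the number of requests (180 requests per 15 minutes) and that each request can fetch at most 100 tweets, so I expected the code below to retrieve 18K tweets (180 × 100) on its first run. However, I only get about 3000 tweets per news agency, for example 3234 tweets for nytimes and 3207 for cnn. I would appreciate it if you could look at my code and tell me what the problem is.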

import jsonpickle
import tweepy


def get_tweets(api, username, sinceId):
    max_id = -1  # -1 means "no upper bound yet" (first request)
    maxTweets = 1000000  # Some arbitrary large number
    tweetsPerReq = 100  # the Search API maximum; user_timeline accepts up to 200
    tweetCount = 0

    print("writing to {0}_tweets.txt".format(username))
    with open("{0}_tweets.txt".format(username), 'w') as f:
        while tweetCount < maxTweets:
            try:
                if max_id <= 0:
                    # First page: no max_id yet
                    if not sinceId:
                        new_tweets = api.user_timeline(screen_name=username, count=tweetsPerReq)
                    else:
                        new_tweets = api.user_timeline(screen_name=username, count=tweetsPerReq, since_id=sinceId)
                else:
                    # Later pages: only tweets older than the last one seen
                    if not sinceId:
                        new_tweets = api.user_timeline(screen_name=username, count=tweetsPerReq, max_id=str(max_id - 1))
                    else:
                        # The original called api.search here; user_timeline matches the other branches
                        new_tweets = api.user_timeline(screen_name=username, count=tweetsPerReq, max_id=str(max_id - 1), since_id=sinceId)

                if not new_tweets:
                    print("no new tweet")
                    break

                # Write each tweet's raw JSON, one tweet per line
                for tweet in new_tweets:
                    f.write(jsonpickle.encode(tweet._json, unpicklable=False) + '\n')

                tweetCount += len(new_tweets)
                print("Downloaded {0} tweets".format(tweetCount))
                max_id = new_tweets[-1].id
            except tweepy.TweepError as e:  # named TweepyException in tweepy 4 and later
                # Just exit if any error
                print("some error : " + str(e))
                break

    print("Downloaded {0} tweets, Saved to {1}_tweets.txt".format(tweetCount, username))

【Question comments】:

Tags: api twitter

【Solution 1】:

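These are limits imposed by the API, not a bug in your code.

If you read the documentation, you'll see it says:

This method can only return up to 3,200 of a user's most recent Tweets.

So the answer is that normal API users cannot access that data. The rate limit (180 requests per 15 minutes) only controls how fast you can page; the 3,200 cap controls how far back the timeline goes.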
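To illustrate why the download loop stops near 3,200 no matter how much rate limit is left, here is a minimal sketch of `max_id` pagination run against a fake timeline. Everything in it is hypothetical and for illustration only: `fetch_all`, `make_fake_timeline`, `TIMELINE_CAP` and `PAGE_SIZE` are names invented here, the fake fetcher merely imitates the documented cap, and the page size of 200 is an assumption about the per-request maximum of `user_timeline`.

```python
from typing import Callable, Dict, List, Optional

TIMELINE_CAP = 3200  # documented ceiling on reachable timeline tweets
PAGE_SIZE = 200      # assumed per-request maximum for user_timeline


def fetch_all(fetch_page: Callable[..., List[Dict]],
              page_size: int = PAGE_SIZE) -> List[Dict]:
    """Walk a timeline backwards with max_id until a page comes back empty."""
    tweets: List[Dict] = []
    max_id: Optional[int] = None
    while True:
        kwargs = {"count": page_size}
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = fetch_page(**kwargs)
        if not page:
            break  # nothing older is reachable
        tweets.extend(page)
        # Next page: strictly older than the oldest tweet seen so far
        max_id = page[-1]["id"] - 1
    return tweets


def make_fake_timeline(total: int, cap: int = TIMELINE_CAP):
    """Hypothetical stand-in for a timeline endpoint.

    The account has `total` tweets, but only the newest `cap` are reachable.
    """
    reachable = list(range(total, 0, -1))[:cap]  # newest first

    def fetch_page(count: int, max_id: Optional[int] = None) -> List[Dict]:
        visible = [i for i in reachable if max_id is None or i <= max_id]
        return [{"id": i} for i in visible[:count]]

    return fetch_page


fetched = fetch_all(make_fake_timeline(total=10000))
print(len(fetched))  # 3200, although the fake account has 10000 tweets

# Ceiling division: requests needed to reach the cap at this page size
requests_needed = -(-TIMELINE_CAP // PAGE_SIZE)
print(requests_needed)  # 16
```

Under these assumptions, 16 requests exhaust everything the timeline will return, well inside a single 15-minute window. With tweepy, the `fetch_page` argument could be a small wrapper around `api.user_timeline` that returns each status's `_json` dict, but the cap would still apply.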

【Comments】: