Twitter API - not collecting all tweets using Tweepy

【Question title】: Twitter API - not collecting all tweets using Tweepy
【Posted】: 2016-06-29 11:49:29
【Question】:

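I am using Tweepy to collect tweets from the Twitter API by their tweet ID.
I am trying to read in a file full of IDs, get the previous tweet in the conversation thread for each one, and then store that tweet, its author's screen name and so on in a text file. Some of the tweets have been deleted, or the user's profile has been set to private; in those cases I want to ignore the tweet and move on to the next one. However, for some reason I am not collecting all of the accessible tweets. It stores maybe 3/4 of the tweets that are neither private nor deleted. Any ideas why it is not catching everything?

Thanks in advance.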

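from time import sleep

# `api` is assumed to be an authenticated tweepy.API instance (setup not shown in the post).

def getTweet(tweetID, tweetObj, callTweetObj, i):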
    tweet = callTweetObj.text.encode("utf8")
    callUserName = callTweetObj.user.screen_name
    callTweetID = tweetObj.in_reply_to_status_id_str

    with open("call_tweets.txt", "a") as calltweets:
        output = (callTweetObj.text.encode('utf-8')+ "\t" + callTweetID + "\t" + tweetID)
        calltweets.write(output)
        print output 

    with open("callauthors.txt", "a") as callauthors:
        cauthors = (callUserName+ "\t" + "\t" + callTweetID + "\n")
        callauthors.write(cauthors)

    with open("callIDs.txt", "a") as callIDs:
        callIDs.write(callTweetID + "\n")

    with open("newResponseIDs.txt", "a") as responseIDs:
        responseIDs.write(tweetID)      

count = 0

file = "Response_IDs.txt"
with open(file, 'r+') as f:
    lines = f.readlines()
    for i in range(0, len(lines)):
        tweetID = lines[i]
        sleep(5)
        try:
            tweetObj = api.get_status(tweetID)
            callTweetID = tweetObj.in_reply_to_status_id_str
            callTweetObj = api.get_status(callTweetID)
            getTweet(tweetID, tweetObj, callTweetObj, i)
            count = count+1
            print count
        except:
            pass

【Comments on the question】:

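Is there any discernible pattern? Are you missing older tweets? Retweets? Non-English tweets? Tweets containing Unicode characters?
It would be useful to log the exceptions you catch and the reason for each failure. You are currently throwing away the data that matters most for debugging your code (except: pass).
Where does Response_IDs.txt come from, and why do you believe its contents are accurate?
Well, I can manually paste an ID into the browser address bar and it takes me to the tweet, so I can see which ones are accessible, i.e. neither private nor deleted.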
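To make the suggestion about logging concrete, here is a minimal sketch of the loop that records why each ID was skipped instead of silently passing. It is written for Python 3 and is not the asker's code: `fetch_parent_tweets`, the logger name and the `get_status` parameter are hypothetical names chosen for illustration, and `get_status` is assumed to behave like `api.get_status` (it returns an object with `text`, `user.screen_name` and `in_reply_to_status_id_str`).

```python
import logging

logger = logging.getLogger("tweet_collector")  # hypothetical logger name


def fetch_parent_tweets(tweet_ids, get_status):
    """Return (collected, failures) for an iterable of reply tweet IDs.

    get_status is any callable that takes a tweet ID and returns a status
    object, for example api.get_status from an authenticated tweepy.API.
    """
    collected, failures = [], []
    for raw_id in tweet_ids:
        tweet_id = raw_id.strip()  # readlines() keeps the trailing newline
        if not tweet_id:
            continue  # skip blank lines in the ID file
        try:
            reply = get_status(tweet_id)
            parent_id = reply.in_reply_to_status_id_str
            if parent_id is None:
                # Concatenating None into a string would fail later anyway,
                # so report the real reason up front.
                raise ValueError("tweet is not a reply")
            parent = get_status(parent_id)
            collected.append(
                (tweet_id, parent_id, parent.user.screen_name, parent.text)
            )
        except Exception as exc:
            # Keep the exception type and message instead of discarding them.
            logger.warning("skipping %s: %s: %s", tweet_id, type(exc).__name__, exc)
            failures.append((tweet_id, type(exc).__name__, str(exc)))
    return collected, failures
```

Called as `fetch_parent_tweets(open("Response_IDs.txt"), api.get_status)`, the `failures` list should show whether the missing tweets come from API errors, from tweets that are not replies, or from an encoding problem when building the output string.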

Tags: python twitter tweepy

【Solution 1】:

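You have not given any information about the response returned by api.get_status, so it is hard to tell what the error is.

However, you may have hit the rate limit for statuses/show/:id requests. The API documentation limits this endpoint to 180 requests per window.

You can use Tweepy to call application/rate_limit_status: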

response = api.rate_limit_status()
remaining = response['resources']['statuses']['/statuses/show/:id']['remaining']
assert remaining > 0
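Rather than asserting, the same response can be used to decide how long to pause. The question's loop makes two get_status calls per ID with a 5-second sleep, so, assuming the standard 15-minute window, it could attempt up to roughly 360 calls per window, which is above the 180 quoted here. The sketch below is only an illustration: `seconds_to_wait` is a hypothetical helper, and it assumes the response contains the `remaining` and `reset` (Unix epoch seconds) fields for the endpoint.

```python
import time

ENDPOINT = "/statuses/show/:id"


def seconds_to_wait(rate_limit_response, now=None):
    """Return how many seconds to sleep before the next statuses/show call.

    rate_limit_response is the dict returned by api.rate_limit_status().
    now is injectable so the function can be checked without a real clock.
    """
    now = time.time() if now is None else now
    info = rate_limit_response["resources"]["statuses"][ENDPOINT]
    if info["remaining"] > 0:
        return 0.0  # calls are still available in the current window
    # Window exhausted: wait until it resets (never a negative duration).
    return max(0.0, info["reset"] - now)
```

In the loop this would be used as `time.sleep(seconds_to_wait(api.rate_limit_status()))` before each batch of requests. Tweepy can also do the waiting itself if the API object is created with `wait_on_rate_limit=True`.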

【Comments】: