【Posted】: 2015-11-10 18:52:59
【Question】:
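I'm new to Twitter development. I'm trying to download tweets from major news organizations, following the guide at http://www.karambelkar.info/2015/01/how-to-use-twitters-search-rest-api-most-effectively. I know the Twitter API limits the number of requests (180 requests per 15 minutes) and that each request can fetch at most 100 tweets, so I expected the code below to retrieve 18K tweets the first time I ran it. However, I only get about 3,000 tweets per news organization: for example, 3,234 tweets for nytimes and 3,207 for cnn. I'd appreciate it if you could look at my code and tell me what the problem is.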
import jsonpickle
import tweepy

def get_tweets(api, username, sinceId):
    max_id = -1
    maxTweets = 1000000  # Some arbitrary large number
    tweetsPerReq = 100  # the max the API permits
    tweetCount = 0
    print("writing to {0}_tweets.txt".format(username))
    with open("{0}_tweets.txt".format(username), 'w') as f:
        while tweetCount < maxTweets:
            try:
                if max_id <= 0:
                    # First request: start from the newest tweet
                    if not sinceId:
                        new_tweets = api.user_timeline(screen_name=username, count=tweetsPerReq)
                    else:
                        new_tweets = api.user_timeline(screen_name=username, count=tweetsPerReq, since_id=sinceId)
                else:
                    # Later requests: page backwards from the oldest tweet seen so far
                    if not sinceId:
                        new_tweets = api.user_timeline(screen_name=username, count=tweetsPerReq, max_id=str(max_id - 1))
                    else:
                        # Was api.search(...), which does not take screen_name
                        new_tweets = api.user_timeline(screen_name=username, count=tweetsPerReq, max_id=str(max_id - 1), since_id=sinceId)
                if not new_tweets:
                    print("no new tweet")
                    break
                # Write each tweet's raw JSON on its own line
                for tweet in new_tweets:
                    f.write(jsonpickle.encode(tweet._json, unpicklable=False) + '\n')
                tweetCount += len(new_tweets)
                print("Downloaded {0} tweets".format(tweetCount))
                max_id = new_tweets[-1].id
            except tweepy.TweepError as e:
                # Just exit if any error
                print("some error : " + str(e))
                break
    print("Downloaded {0} tweets, Saved to {1}_tweets.txt".format(tweetCount, username))
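To make the expectation in the question concrete, here is a minimal sketch of the arithmetic. The figures of 180 requests per 15-minute window and 100 tweets per request are the ones quoted in the question; the variable names are illustrative only, and whether those limits actually apply to the `user_timeline` endpoint used in the code is an assumption that has not been verified here.

```python
# Figures quoted in the question (assumed, not verified for the endpoint used)
REQUESTS_PER_WINDOW = 180   # requests allowed per 15-minute window
TWEETS_PER_REQUEST = 100    # maximum tweets returned per request

# Expected ceiling for a single rate-limit window
expected_per_window = REQUESTS_PER_WINDOW * TWEETS_PER_REQUEST
print("Expected ceiling per window: {0} tweets".format(expected_per_window))

# Counts actually observed, as reported in the question
observed = {"nytimes": 3234, "cnn": 3207}
for name, count in sorted(observed.items()):
    share = 100.0 * count / expected_per_window
    print("{0}: {1} tweets ({2:.1f}% of the expected ceiling)".format(name, count, share))
```

Both observed counts fall far short of the expected ceiling and are close to each other, which suggests the loop ends for some reason other than the request rate limit.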
【Discussion】: