【Question title】: Retrieving all tweets from a hashtag. Problem recovering from rate limit with tweepy
【Posted】: 2019-11-12 15:54:11
【Question】:

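I am trying to scrape all tweets for the hashtag #nationaldoughnutday, but the rate limit keeps me from getting them all.

As the code below shows, I put the search inside a while loop so that, once the rate limit resets, I can resume scraping from the date of the last tweet crawled (until_date).

However, I keep getting the error below over and over, and after sleeping for a long time my crawler does not seem to start crawling again.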

TweepError Failed to send request: ('Connection aborted.', error (10054, 'An existing connection was forcibly closed by the remote host'))
Sleeping...
TweepError Failed to send request: ('Connection aborted.', error (10054, 'An existing connection was forcibly closed by the remote host'))
Sleeping...
TweepError Failed to send request: ('Connection aborted.', error (10054, 'An existing connection was forcibly closed by the remote host'))

I tried removing the inner try/except block, but that did not help either.

import datetime
import time

import tweepy

# consumer_key, consumer_secret, access_token and access_token_secret are assumed to be defined elsewhere
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
query = '#nationaldoughnutday'
until_date = '01-07-2019'  # value as posted; the search API documents `until` as YYYY-MM-DD

while True:
    try:  # outer try/except
        # note the space before -filter:retweets, so the operator is not glued to the hashtag
        tweets = tweepy.Cursor(api.search, q=query + ' -filter:retweets', count=100, lang='en',
                               tweet_mode='extended', until=until_date).items()
        for tweet in tweets:
            try:  # inner try/except
                print("tweet : {}".format(tweet.created_at))
                # so that if I reconnect with the cursor, I start from the day before the last crawled tweet
                until_date = tweet.created_at.date() - datetime.timedelta(days=1)
            except tweepy.TweepError as e:
                print('Inner TweepError: {}'.format(e))
                time.sleep(17 * 60)
                break
    except tweepy.TweepError as e:
        print('Outer TweepError: {}'.format(e))
        print("Sleeping...")
        time.sleep(17 * 60)
        continue
    except StopIteration:
        break

Thank you in advance!
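
For reference, a minimal sketch of the date checkpoint described above. It assumes that `until` is exclusive (it returns tweets created before the given date) and expects a `YYYY-MM-DD` string, as the Twitter search documentation describes. Under that assumption, subtracting a day skips the rest of the day on which the crawl stopped, so this sketch moves the checkpoint one day forward instead and leaves it to the caller to drop tweets already seen. The helper name `until_for_resume` is hypothetical and not part of tweepy.

```python
import datetime


def until_for_resume(last_created_at):
    """Return an `until` value (YYYY-MM-DD) that still covers the day of the last tweet seen.

    Assumption: `until` is exclusive, so the day after the last tweet is
    needed to include the remaining tweets from that same day.
    """
    next_day = last_created_at.date() + datetime.timedelta(days=1)
    return next_day.strftime("%Y-%m-%d")


# A tweet last seen on 2019-06-07 at 15:30 resumes with until=2019-06-08
print(until_for_resume(datetime.datetime(2019, 6, 7, 15, 30)))
```

A date only gives day-level resolution, so a resumed crawl may return tweets that were already collected; filtering on the tweet id is one way to discard those.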

【Comments】:

    Tags: web-scraping twitter tweepy ratelimit


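    【Solution 1】:

    Try adding wait_on_rate_limit=True. It does not solve the underlying problem, because the rate limit is enforced by the Twitter API and cannot be removed on the client side, but it does stop the rate-limit errors from being shown.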
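
The errors quoted in the question are connection resets (error 10054) rather than rate-limit responses, so waiting on the rate limit alone may not be enough. Below is a rough sketch of one possible approach: retry the failed request with a growing delay and resume from an id checkpoint instead of restarting the whole cursor. Everything here is an assumption made for illustration: `fetch_all` and `fetch_page` are hypothetical names, and the built-in `ConnectionError` stands in for `tweepy.TweepError`. With tweepy, `fetch_page` could wrap a call to `api.search` that passes the checkpoint as `max_id`.

```python
import time


def fetch_all(fetch_page, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Collect items from a paged, newest-first source, resuming after errors.

    fetch_page(max_id) is a hypothetical callable. It returns a list of
    (item_id, payload) tuples with item_id <= max_id (the newest page when
    max_id is None) and an empty list when nothing is left.
    """
    collected = []
    max_id = None   # checkpoint: only ask for ids at or below this value
    failures = 0    # consecutive failed attempts for the current page
    while True:
        try:
            page = fetch_page(max_id)
        except ConnectionError:
            failures += 1
            if failures > max_retries:
                raise
            # exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * 2 ** (failures - 1))
            continue  # retry from the same checkpoint
        failures = 0
        if not page:
            return collected
        collected.extend(page)
        # the next request starts just below the oldest id seen so far
        max_id = min(item_id for item_id, _ in page) - 1
```

Because the checkpoint is an id rather than a date, a retry picks up at the item after the last one stored, without skipping or repeating anything within a day.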

    【Comments】:
