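【Question title】: Twitter streaming: stop collecting data
【Posted】: 2016-07-13 10:06:38
【Question description】:

I have the following code, which retrieves Twitter streaming data and writes it to a JSON file. I would like data collection to stop after, for example, 1000 tweets. How can I set up the code to do that?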

#Import the necessary methods from tweepy library
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

# Other libs
import json

#Variables that contains the user credentials to access Twitter API
access_token = "XXX"
access_token_secret = "XXX"
consumer_key = "XXX"
consumer_secret = "XXX"

#This is a basic listener that appends each received tweet to a JSON file.
class StdOutListener(StreamListener):

    def on_data(self, data):

        try:
            tweet = json.loads(data)
            with open('your_data.json', 'a') as my_file:
                json.dump(tweet, my_file)


        except BaseException:
            print('Error')
            pass

    def on_error(self, status):
        print ("Error " + str(status))
        if status == 420:
            print("Rate Limited")
            return False


if __name__ == '__main__':

    #This handles Twitter authentication and the connection to the Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)


    stream.filter(track=['Euro2016', 'FRA', 'POR'], languages=['en'])

【Question discussion】:

    Tags: python api python-3.x twitter


    【Solution 1】:

    Here is one possible solution:

    import sys

    class StdOutListener(StreamListener):

        tweet_number = 0   # class variable; each instance gets its own copy on the first increment

        def __init__(self, max_tweets):
            super().__init__()  # initialize the StreamListener base class
            self.max_tweets = max_tweets  # max number of tweets

        def on_data(self, data):
            self.tweet_number += 1
            try:
                tweet = json.loads(data)
                with open('your_data.json', 'a') as my_file:
                    json.dump(tweet, my_file)
            except Exception:  # Exception, so Ctrl+C and sys.exit are not swallowed
                print('Error')
            if self.tweet_number >= self.max_tweets:
                sys.exit('Limit of ' + str(self.max_tweets) + ' tweets reached.')

        def on_error(self, status):
            print("Error " + str(status))
            if status == 420:
                print("Rate Limited")
                return False

    l = StdOutListener(1000) # Here you can set your maximum number of tweets (1000 in this example)
    

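    After defining the class variable tweet_number, I use the __init__() method to initialize a new StdOutListener object with the maximum number of tweets you want to collect. tweet_number is incremented by 1 on every call to on_data(data), and the program terminates once tweet_number >= max_tweets.

    P.S. The code relies on the sys module, so make sure it is imported.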
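
The counting logic described above can be tried without a Twitter connection. The sketch below is only an illustration under stated assumptions: `CountingListener` and `fake_stream` are hypothetical stand-ins (not part of tweepy), and the listener returns `False` instead of calling `sys.exit`. As far as I know, tweepy 3.x also stops the stream when `on_data` returns `False`, but check that against the version you have installed. The sketch also writes one JSON object per line, which makes the file easier to read back.

```python
import json
import os
import tempfile


class CountingListener:
    """Tweepy-free stand-in for a StreamListener subclass (hypothetical name)."""

    def __init__(self, max_tweets, path):
        self.max_tweets = max_tweets  # stop after this many saved tweets
        self.path = path              # output file, one JSON object per line
        self.tweet_number = 0         # per-instance counter

    def on_data(self, data):
        try:
            tweet = json.loads(data)
        except ValueError:
            # Malformed payload: skip it and keep listening.
            print("Skipping malformed payload")
            return True
        with open(self.path, "a") as my_file:
            my_file.write(json.dumps(tweet) + "\n")
        self.tweet_number += 1
        # False means "stop", True means "keep going".
        return self.tweet_number < self.max_tweets


def fake_stream(listener, payloads):
    """Hypothetical driver: feeds payloads until the listener returns False."""
    for payload in payloads:
        if listener.on_data(payload) is False:
            break


# Feed ten fake payloads but keep only the first three.
path = os.path.join(tempfile.mkdtemp(), "your_data.json")
listener = CountingListener(3, path)
fake_stream(listener, ['{"id": %d}' % i for i in range(10)])

with open(path) as f:
    saved = [json.loads(line) for line in f]
print(len(saved))
```

Returning a value rather than exiting leaves the rest of the script free to run after collection ends, for example to post-process the file.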

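    【Discussion】:

      【Solution 2】:

      This is the 2.7 code I would use. Sorry, I don't know 3.0 either... I think what you want is the .items(1000) part of the for loop line...?

      Stack Overflow messed up the indentation in my code. I am also using tweepy.

      Code: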

      import pandas as pd
      import tweepy

      # Assumes `api` is an already-authenticated tweepy.API instance.
      results = []
      for tweet in tweepy.Cursor(api.search, q='%INSERT_SEARCH_VARIABLE HERE').items(1000):  # THE 1000 IS WHERE YOU SAY SEARCH FOR 1000 TWEETS.
          results.append(tweet)

      print(type(results))
      print(len(results))

      def toDataFrame(tweets):
          # Build one column per tweet attribute, then one per user attribute
          DataSet = pd.DataFrame()

          DataSet['tweetID'] = [tweet.id for tweet in tweets]
          DataSet['tweetText'] = [tweet.text for tweet in tweets]
          DataSet['tweetRetweetCt'] = [tweet.retweet_count for tweet in tweets]
          DataSet['tweetFavoriteCt'] = [tweet.favorite_count for tweet in tweets]
          DataSet['tweetSource'] = [tweet.source for tweet in tweets]
          DataSet['tweetCreated'] = [tweet.created_at for tweet in tweets]

          DataSet['userID'] = [tweet.user.id for tweet in tweets]
          DataSet['userScreen'] = [tweet.user.screen_name for tweet in tweets]
          DataSet['userName'] = [tweet.user.name for tweet in tweets]
          DataSet['userCreateDt'] = [tweet.user.created_at for tweet in tweets]
          DataSet['userDesc'] = [tweet.user.description for tweet in tweets]
          DataSet['userFollowerCt'] = [tweet.user.followers_count for tweet in tweets]
          DataSet['userFriendsCt'] = [tweet.user.friends_count for tweet in tweets]
          DataSet['userLocation'] = [tweet.user.location for tweet in tweets]
          DataSet['userTimezone'] = [tweet.user.time_zone for tweet in tweets]

          return DataSet

      # Pass the tweets list to the above function to create a DataFrame
      tweet_frame = toDataFrame(results)
      tweet_frame[0:999]  # first 999 rows (the slice end is exclusive)
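
To illustrate the idea behind capping an iterator the way `.items(1000)` does, here is a small analogy using only the standard library. It is not how tweepy implements `Cursor`; `fake_search` is a hypothetical generator standing in for a paginated search, and `itertools.islice` plays the role of the limit.

```python
from itertools import islice


def fake_search(page_size=100):
    """Hypothetical stand-in for a paginated search: yields tweet-like dicts forever."""
    tweet_id = 0
    while True:
        # Each pass of the inner loop imitates one page of results.
        for _ in range(page_size):
            tweet_id += 1
            yield {"id": tweet_id, "text": "tweet %d" % tweet_id}


# Stop consuming the source after 1000 items, however many it could produce.
results = list(islice(fake_search(), 1000))
print(len(results))
```

Because the source is consumed lazily, no items beyond the cap are ever generated.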
      

      【Discussion】:
