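【Posted】: 2020-10-12 05:35:32
【Question】:
I am studying some code that accesses Twitter data. It was written by someone who demonstrates on YouTube how to access Twitter data.
Please see the code below (some parts have been truncated):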
from tweepy import API
from tweepy import Cursor
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

import twitter_credentials
import numpy as np
import pandas as pd


class TwitterClient():
    def __init__(self, twitter_user=None):
        self.auth = TwitterAuthenticator().authenticate_twitter_app()
        self.twitter_client = API(self.auth)
        self.twitter_user = twitter_user

    def get_user_timeline_tweets(self, num_tweets):
        # Collect up to num_tweets tweets from the given user's timeline.
        tweets = []
        for tweet in Cursor(self.twitter_client.user_timeline, id=self.twitter_user).items(num_tweets):
            tweets.append(tweet)
        return tweets


class TwitterAuthenticator():
    def authenticate_twitter_app(self):
        auth = xxxx  # truncated in the post
        return auth


class TwitterStreamer():
    """
    Class for streaming and processing live tweets.
    """
    def __init__(self):
        self.twitter_autenticator = TwitterAuthenticator()

    def stream_tweets(self, fetched_tweets_filename, hash_tag_list):
        # This handles Twitter authentication and the connection to the Twitter Streaming API
        listener = TwitterListener(fetched_tweets_filename)
        auth = self.twitter_autenticator.authenticate_twitter_app()
        stream = Stream(auth, listener)
        # This line filters Twitter streams to capture data by the keywords:
        stream.filter(track=hash_tag_list)


class TwitterListener(StreamListener):
    xxxxxxx  # truncated in the post


if __name__ == '__main__':
    hash_tag_list = ["donal trump", "hillary clinton", "barack obama", "bernie sanders"]
    twitter_client = TwitterClient('COVID19')
    print(twitter_client.get_user_timeline_tweets(1))
    twitter_streamer = TwitterStreamer()
    # Note: fetched_tweets_filename is not defined in the excerpt shown here.
    twitter_streamer.stream_tweets(fetched_tweets_filename, hash_tag_list)
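Looking at this code, I would like to know why two classes, TwitterClient() and TwitterStreamer(), are created. TwitterStreamer() is used with a list of hashtags, while TwitterClient() is user specific. Does this mean that TwitterStreamer() is more of a broad search, while TwitterClient() targets one specific user? Why are they split into two classes, and why is the TwitterStreamer() class used only for hashtags?
As I am new to exploring Twitter data, could someone comment on this code?
Thank you very much.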
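To make the distinction in the question concrete, here is a minimal sketch of the two access patterns the classes appear to represent: a pull-style request for a fixed number of one user's past tweets, and a push-style stream where matching tweets are handed to a callback as they arrive. It makes no tweepy or network calls. The names FakeRestClient, FakeStreamer, FAKE_TIMELINES and FAKE_LIVE_FEED are hypothetical stand-ins invented for illustration, and the substring match is a simplification, not how Twitter matches its `track` terms.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical in-memory data standing in for Twitter (assumption, not real data).
FAKE_TIMELINES: Dict[str, List[str]] = {
    "COVID19": ["tweet 1", "tweet 2", "tweet 3"],
}
FAKE_LIVE_FEED: List[str] = [
    "barack obama speaks",
    "weather today",
    "bernie sanders rally",
]


class FakeRestClient:
    """Pull model: the caller requests a finite batch of one user's past tweets."""

    def __init__(self, user: str) -> None:
        self.user = user

    def get_user_timeline_tweets(self, num_tweets: int) -> List[str]:
        # Return at most num_tweets items, then the call is finished.
        return FAKE_TIMELINES.get(self.user, [])[:num_tweets]


class FakeStreamer:
    """Push model: items arrive one by one and a callback handles each match."""

    def stream_tweets(
        self,
        feed: Iterable[str],
        keywords: List[str],
        on_data: Callable[[str], None],
    ) -> None:
        for text in feed:
            # Simplified filter: keep a tweet if any keyword is a substring.
            if any(keyword in text for keyword in keywords):
                on_data(text)


if __name__ == "__main__":
    print(FakeRestClient("COVID19").get_user_timeline_tweets(1))

    received: List[str] = []
    FakeStreamer().stream_tweets(
        FAKE_LIVE_FEED, ["barack obama", "bernie sanders"], received.append
    )
    print(received)
```

In this sketch the first class answers a question about the past ("give me N tweets from this user") and returns, while the second keeps consuming a feed and reacts to each matching item, which may be why the tutorial keeps them as separate classes.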
【Comments】:
Tags: python-3.x twitter tweepy