[Posted] 2022-08-02 19:12:30
[Question]
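My goal is to extract media data from tweets. I'm using the Twitter API v2. When I fetch fewer than 100 tweets I have no problems, but when I use Paginator I get this error:
users = {u["id"]: u for u in tweets.includes['users']}
AttributeError: 'Paginator' object has no attribute 'includes'
I haven't been able to change the code so that it still extracts the media data, and I don't know whether there is another way to get this data. Any help would be greatly appreciated!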
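The error arises because `tweepy.Paginator` is not itself a response. It is an iterable that requests one page at a time, and each item it yields is a tweepy `Response` namedtuple with `data`, `includes`, `errors` and `meta`; only those page objects carry `includes`. One workaround, sketched here, is to drain the Paginator and merge every page into a single response-like object, so code written for a single `client.search_all_tweets(...)` response keeps working. The helper `merge_pages` is hypothetical, and it assumes each page exposes `.data` and `.includes` the way tweepy's default `Response` does.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def merge_pages(pages: Iterable[Any]) -> SimpleNamespace:
    """Merge paginated responses into one object with .data and .includes.

    `pages` is assumed to yield objects shaped like tweepy's Response
    (attributes `data` and `includes`), e.g. a tweepy.Paginator.
    """
    data: List[Any] = []
    includes: Dict[str, List[Any]] = {}
    for page in pages:
        # A page with no matching tweets has data=None.
        data.extend(page.data or [])
        # Each page's includes only cover that page's tweets, so keep them all.
        for key, items in (page.includes or {}).items():
            includes.setdefault(key, []).extend(items)
    return SimpleNamespace(data=data, includes=includes)
```

For example, `tweets = merge_pages(itertools.islice(paginator, 10))` would request at most 10 pages (up to 1,000 tweets with `max_results=100`) and hold them all in memory, after which `tweets.data` and `tweets.includes['users']` behave as they do for a single response. A user who appears on several pages is simply listed more than once in `includes['users']`; a dict comprehension keyed by `id` collapses the duplicates.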
import config  # the asker's own module holding BEARER_TOKEN
import pandas as pd
import tweepy

client = tweepy.Client(bearer_token=config.BEARER_TOKEN)
query = 'climate change -is:retweet has:media'
# your start and end time for fetching tweets
start_time = '2020-01-01T00:00:00Z'
end_time = '2020-01-31T00:00:00Z'
# get tweets from the API
tweets = tweepy.Paginator(client.search_all_tweets,
                          query=query,
                          start_time=start_time,
                          end_time=end_time,
                          tweet_fields=['context_annotations', 'created_at', 'source', 'public_metrics',
                                        'lang', 'referenced_tweets', 'reply_settings', 'conversation_id',
                                        'in_reply_to_user_id', 'geo'],
                          expansions=['attachments.media_keys', 'author_id', 'geo.place_id'],
                          media_fields=['preview_image_url', 'type', 'public_metrics', 'url'],
                          place_fields=['place_type', 'geo'],
                          user_fields=['name', 'username', 'location', 'verified', 'description',
                                       'profile_image_url', 'entities'],
                          max_results=100)
# Get users, media, place list from the includes object
# (this is the line that raises AttributeError: a Paginator has no .includes)
users = {u["id"]: u for u in tweets.includes['users']}
media = {m["media_key"]: m for m in tweets.includes['media']}
# places = {p["id"]: p for p in tweets.includes['places']}
# create a list of records
tweet_info_ls = []
# iterate over each tweet and corresponding user details
for tweet in tweets.data:
    # metrics = tweet.organic_metrics
    # User Metadata
    user = users[tweet.author_id]
    # Media files
    attachments = tweet.data['attachments']
    media_keys = attachments['media_keys']
    link_image = media[media_keys[0]].preview_image_url
    url_image = media[media_keys[0]].url
    link_type = media[media_keys[0]].type
    link_public_metrics = media[media_keys[0]].public_metrics
    # Public metrics
    public_metrics = tweet.data['public_metrics']
    retweet_count = public_metrics['retweet_count']
    reply_count = public_metrics['reply_count']
    like_count = public_metrics['like_count']
    quote_count = public_metrics['quote_count']
    tweet_info = {
        'id': tweet.id,
        'author_id': tweet.author_id,
        'lang': tweet.lang,
        'geo': tweet.geo,
        # 'tweet_entities': metrics,
        'referenced_tweets': tweet.referenced_tweets,
        'reply_settings': tweet.reply_settings,
        'created_at': tweet.created_at,
        'text': tweet.text,
        'source': tweet.source,
        'retweet_count': retweet_count,
        'reply_count': reply_count,
        'like_count': like_count,
        'quote_count': quote_count,
        'name': user.name,
        'username': user.username,
        'location': user.location,
        'verified': user.verified,
        'description': user.description,
        'entities': user.entities,
        'profile_image': user.profile_image_url,
        'media_keys': link_image,
        'type': link_type,
        'link_public_metrics': link_public_metrics,
        'url_image': url_image
    }
    tweet_info_ls.append(tweet_info)
# create dataframe from the extracted records
df = pd.DataFrame(tweet_info_ls)
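A per-page variant, sketched under the assumption that iterating a `tweepy.Paginator` yields one `Response` per page: build the `users` and `media` lookups from each page's own `includes`, then process that page's `data`. Compared with the loop above, it tolerates tweets without attachments, media keys missing from `includes`, and authors missing from `includes` (those fields become `None`), and it keeps only a subset of the columns for brevity; the remaining fields can be added the same way. `page_records` and `fetch_media_records` are hypothetical names, and `itertools.islice` is used here to cap the number of pages requested.

```python
from itertools import islice
from typing import Any, Dict, List


def page_records(page: Any) -> List[Dict[str, Any]]:
    """Flatten one tweepy Response page into one dict per tweet."""
    includes = page.includes or {}
    # Lookups come from this page's includes, which cover only this page's tweets.
    users = {u.id: u for u in includes.get("users", [])}
    media = {m.media_key: m for m in includes.get("media", [])}
    records: List[Dict[str, Any]] = []
    for tweet in page.data or []:
        user = users.get(tweet.author_id)
        # tweet.attachments is None when the tweet has no attachments.
        keys = (tweet.attachments or {}).get("media_keys", [])
        # Keep only the first media item, as the original loop does.
        first = media.get(keys[0]) if keys else None
        metrics = tweet.public_metrics or {}
        records.append({
            "id": tweet.id,
            "author_id": tweet.author_id,
            "created_at": tweet.created_at,
            "text": tweet.text,
            "retweet_count": metrics.get("retweet_count"),
            "like_count": metrics.get("like_count"),
            "username": user.username if user else None,
            "media_keys": keys,
            "type": first.type if first else None,
            "preview_image_url": first.preview_image_url if first else None,
            "url_image": first.url if first else None,
        })
    return records


def fetch_media_records(client: Any, query: str, start_time: str,
                        end_time: str, pages: int = 10) -> List[Dict[str, Any]]:
    """Request up to `pages` pages of search_all_tweets and flatten them."""
    import tweepy  # imported lazily so page_records has no tweepy dependency

    paginator = tweepy.Paginator(
        client.search_all_tweets,
        query=query,
        start_time=start_time,
        end_time=end_time,
        tweet_fields=["created_at", "public_metrics"],
        expansions=["attachments.media_keys", "author_id"],
        media_fields=["preview_image_url", "type", "url"],
        user_fields=["username"],
        max_results=100,
    )
    records: List[Dict[str, Any]] = []
    # Iterating the paginator yields one Response per API request;
    # islice stops after `pages` requests.
    for page in islice(paginator, pages):
        records.extend(page_records(page))
    return records
```

Something like `client = tweepy.Client(bearer_token=config.BEARER_TOKEN, wait_on_rate_limit=True)` followed by `df = pd.DataFrame(fetch_media_records(client, query, start_time, end_time, pages=10))` would then build the DataFrame. With `wait_on_rate_limit=True`, tweepy sleeps until the rate-limit window resets instead of raising, which matters once many pages are requested.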