Tweepy: Error using Paginator to extract media data

【Question title】: Tweepy: Error using Paginator to extract media data
【Posted】: 2022-08-02 19:12:30
【Question】:

My goal is to extract media data from tweets. I am using the Twitter API v2. When I extract fewer than 100 tweets I have no problem, but when I use Paginator, I get the following error:

users = {u["id"]: u for u in tweets.includes['users']}
AttributeError: 'Paginator' object has no attribute 'includes'.

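I have not been able to change the code so that it extracts the media data, and I don't know whether there is another way to get this data. Any help would be greatly appreciated!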

import pandas as pd
import tweepy

import config  # the asker's own module that defines BEARER_TOKEN

client = tweepy.Client(bearer_token=(config.BEARER_TOKEN))

query = 'climate change -is:retweet has:media'

# your start and end time for fetching tweets
start_time = '2020-01-01T00:00:00Z'
end_time = '2020-01-31T00:00:00Z'

# get tweets from the API
tweets = tweepy.Paginator(client.search_all_tweets,
                          query=query,
                          start_time=start_time,
                          end_time=end_time,
                          tweet_fields=['context_annotations', 'created_at', 'source', 'public_metrics',
                                        'lang', 'referenced_tweets', 'reply_settings', 'conversation_id',
                                        'in_reply_to_user_id', 'geo'],
                          expansions=['attachments.media_keys', 'author_id', 'geo.place_id'],
                          media_fields=['preview_image_url', 'type', 'public_metrics', 'url'],
                          place_fields=['place_type', 'geo'],
                          user_fields=['name', 'username', 'location', 'verified', 'description',
                                       'profile_image_url', 'entities'],
                          max_results=100)

# Get users, media, place list from the includes object
users = {u["id"]: u for u in tweets.includes['users']}
media = {m["media_key"]: m for m in tweets.includes['media']}
# places = {p["id"]: p for p in tweets.includes['places']}

# create a list of records
tweet_info_ls = []
# iterate over each tweet and corresponding user details
for tweet in tweets.data:
    # metrics = tweet.organic_metrics
    # User Metadata
    user = users[tweet.author_id]
    # Media files
    attachments = tweet.data['attachments']
    media_keys = attachments['media_keys']
    link_image = media[media_keys[0]].preview_image_url
    url_image = media[media_keys[0]].url
    link_type = media[media_keys[0]].type
    link_public_metrics = media[media_keys[0]].public_metrics
    # Public metrics
    public_metrics = tweet.data['public_metrics']
    retweet_count = public_metrics['retweet_count']
    reply_count = public_metrics['reply_count']
    like_count = public_metrics['like_count']
    quote_count = public_metrics['quote_count']
    tweet_info = {
        'id': tweet.id,
        'author_id': tweet.author_id,
        'lang': tweet.lang,
        'geo': tweet.geo,
        # 'tweet_entities': metrics,
        'referenced_tweets': tweet.referenced_tweets,
        'reply_settings': tweet.reply_settings,
        'created_at': tweet.created_at,
        'text': tweet.text,
        'source': tweet.source,
        'retweet_count': retweet_count,
        'reply_count': reply_count,
        'like_count': like_count,
        'quote_count': quote_count,
        'name': user.name,
        'username': user.username,
        'location': user.location,
        'verified': user.verified,
        'description': user.description,
        'entities': user.entities,
        'profile_image': user.profile_image_url,
        'media_keys': link_image,
        'type': link_type,
        'link_public_metrics': link_public_metrics,
        'url_image': url_image
    }
    tweet_info_ls.append(tweet_info)

# create dataframe from the extracted records
df = pd.DataFrame(tweet_info_ls)

Tags: python tweepy paginator

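【Solution 1】:

A Paginator is not a single response, so it has no `includes` or `data` attribute of its own. You have to iterate over the pages it yields to get the includes (and the data) of each page.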

paginator = tweepy.Paginator(client.search_all_tweets, [...])

for page in paginator:
    print(page.data)      # The tweets in that page
    print(page.includes)  # The includes in that page
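
Applied to the question's goal, the same loop can build the user and media lookups once per page and then walk that page's tweets. The following is a minimal sketch rather than a tested fix: `collect_media_records` is a hypothetical helper name, the record keeps only a few of the question's fields, and the `SimpleNamespace` pages are stand-ins that mimic the `.data` / `.includes` shape so the example runs without API credentials. It assumes that a page may have no results (`data` is `None`), that `includes` may lack a `media` or `users` key, and that some tweets carry no attachments.

```python
from types import SimpleNamespace


def collect_media_records(pages):
    """Build one record per tweet from an iterable of response pages.

    Each page is assumed to expose `.data` (a list of tweets, or None) and
    `.includes` (a dict that may contain "users" and "media").
    """
    records = []
    for page in pages:
        includes = page.includes or {}
        # Rebuild the lookups for every page: the includes of a page are
        # assumed to describe only the tweets returned in that page.
        users = {u.id: u for u in includes.get("users", [])}
        media = {m.media_key: m for m in includes.get("media", [])}

        for tweet in page.data or []:
            attachments = tweet.attachments or {}
            media_keys = attachments.get("media_keys", [])
            first_media = media.get(media_keys[0]) if media_keys else None
            user = users.get(tweet.author_id)
            records.append({
                "id": tweet.id,
                "username": user.username if user else None,
                "type": first_media.type if first_media else None,
                "url_image": first_media.url if first_media else None,
            })
    return records


# Stand-in pages (not real API responses) used only to exercise the helper.
page_1 = SimpleNamespace(
    data=[
        SimpleNamespace(id=1, author_id=10, attachments={"media_keys": ["3_a"]}),
        SimpleNamespace(id=2, author_id=11, attachments=None),
    ],
    includes={
        "users": [
            SimpleNamespace(id=10, username="alice"),
            SimpleNamespace(id=11, username="bob"),
        ],
        "media": [
            SimpleNamespace(media_key="3_a", type="photo",
                            url="https://example.com/a.jpg"),
        ],
    },
)
page_2 = SimpleNamespace(data=None, includes={})  # a page with no results

records = collect_media_records([page_1, page_2])
print(records)
```

With a real client, the paginator from the answer above would be passed in place of the stand-in list, and the resulting `records` list could be handed to `pd.DataFrame` as in the question.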

【Discussion】: