Gmail API only returning 1 MB of data

【Question title】: Gmail API only returning 1 MB of data
【Posted】: 2023-03-30 20:10:02
【Question description】:

I have filtered all the messages I want to request into a single label in Gmail, and I successfully got messages back by using this code in their quickstart.py script:

# My Code
results = service.users().messages().list(userId='me',labelIds = '{Label_id}', maxResults='10000000').execute()
messages = results.get('messages', [])

for message in messages:
    msg = service.users().messages().get(userId='me', id=message['id'], format='metadata', metadataHeaders=['subject']).execute()
    print(msg['snippet'].encode('utf-8').strip())

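I first listed all labels and their IDs in an earlier request, then substituted the ID for {Label_id}. I then ask only for the subject metadata field. The problem is that the response returns exactly 1 MB of data. I know this because I redirected the output to a file and ran ls -latr --block-size=MB. Judging by the dates, I can also see that the label contains many more (older) messages than the ones returned. The request always stops at exactly the same message. None of the messages have attachments.

According to their API reference, I should be allowed: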

Daily Usage 1,000,000,000 quota units per day

Per User Rate Limit 250 quota units per user per second

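I don't think this is the limit I'm hitting, but maybe I'm wrong: each message has 1-3 replies, and I could see each of those being counted as 5 quota units? Not sure. I have tried playing with the maxResults parameter, but that doesn't seem to change anything.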
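As a rough sanity check on the figures quoted above, the arithmetic can be sketched as follows. The per-call costs (5 units each for messages.list and messages.get) are assumptions that are not stated in this post and should be checked against the current usage-limits table; the 10,000-message count is purely illustrative.

```python
# Figures quoted in the question.
DAILY_QUOTA_UNITS = 1_000_000_000
PER_USER_UNITS_PER_SECOND = 250

# Assumed per-call costs (not from this post; verify against the usage-limits table).
LIST_COST = 5
GET_COST = 5


def quota_units(num_messages, page_size=500):
    """Estimate quota units for listing num_messages and fetching each one."""
    pages = max(1, -(-num_messages // page_size))  # ceiling division, at least one list call
    return pages * LIST_COST + num_messages * GET_COST


def max_gets_per_second():
    """How many messages.get calls fit under the per-user rate limit."""
    return PER_USER_UNITS_PER_SECOND // GET_COST


if __name__ == "__main__":
    total = quota_units(10_000)  # illustrative mailbox size
    print(total, "units for 10,000 messages")
    print(f"{total / DAILY_QUOTA_UNITS:.6%} of the daily quota")
    print(max_gets_per_second(), "get calls per second at most")
```

Under these assumptions a label with thousands of messages uses a tiny fraction of the daily quota, which suggests looking at the request logic rather than the quota.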

Am I hitting a cap here, or is the problem in my request logic?

Edit 1

from __future__ import print_function
import pickle
import os.path
import base64
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

## If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://mail.google.com/']

def main():
    """Shows basic usage of the Gmail API.
    Lists the user's Gmail labels.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server()
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('gmail', 'v1', credentials=creds)

    messageArray = []
    pageToken = None
    while True:
        results = service.users().messages().list(userId='me',labelIds = '{Label_ID}', maxResults=500, pageToken=pageToken).execute()
        messages = results.get('messages', [])
        for message in messages:
            msg = service.users().messages().get(userId='me', id=message['id'], format='metadata', metadataHeaders=['subject']).execute()
            messageArray.append(msg)
        pageToken = results.get('nextPageToken', None)
        if not pageToken:
            print('[%s]' % ', '.join(map(str, messageArray)))
            break


if __name__ == '__main__':
    main()

Edit 2

This is the final script I used. It outputs a nicer, cleaner format that I simply redirect to a file, and it is easy to parse.

from __future__ import print_function
import pickle
import os.path
import base64
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

## If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://mail.google.com/']

def main():
    """Shows basic usage of the Gmail API.
    Lists the user's Gmail labels.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server()
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('gmail', 'v1', credentials=creds)

    pageToken = None
    while True:
        results = service.users().messages().list(userId='me',labelIds = '{Label_ID}', maxResults=500, pageToken=pageToken).execute()
        messages = results.get('messages', [])
        for message in messages:
            msg = service.users().messages().get(userId='me', id=message['id'], format='metadata', metadataHeaders=['subject']).execute()
            print(msg['snippet'].encode('utf-8').strip())
        pageToken = results.get('nextPageToken', None)
        if not pageToken:
            break


if __name__ == '__main__':
    main()

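【Question discussion】:

Tags: python gmail-api

【Solution 1】:

The maximum value of maxResults is 500. If you set it higher, you still only get 500 messages in the result. You can confirm this by checking the length of messages.

You need to implement pagination.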

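messages = []
pageToken = None
while True:
  # Each call returns at most 500 message stubs plus an optional nextPageToken
  results = service.users().messages().list(userId='me', labelIds='{Label_id}', maxResults=500, pageToken=pageToken).execute()
  # extend (not append) keeps messages a flat list; the key is the string 'messages'
  messages.extend(results.get('messages', []))
  pageToken = results.get('nextPageToken', None)
  # No token means this was the last page
  if not pageToken:
    break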
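The same loop can be wrapped in a generator so that callers never handle page tokens themselves. This is only a minimal sketch: `list_page` is a hypothetical callable standing in for the `messages().list(...).execute()` call, and `FAKE_PAGES` is made-up data shaped like list responses so the example runs without network access.

```python
def iter_messages(list_page):
    """Yield message stubs from every page of results.

    list_page is a hypothetical callable: it takes a page token (or None)
    and returns one response dict shaped like a messages.list response.
    """
    page_token = None
    while True:
        response = list_page(page_token)
        # A page with no matches may have no 'messages' key at all.
        yield from response.get('messages', [])
        page_token = response.get('nextPageToken')
        if not page_token:
            break


# Stand-in data shaped like messages.list responses (not real API output).
FAKE_PAGES = {
    None: {'messages': [{'id': 'a'}, {'id': 'b'}], 'nextPageToken': 't1'},
    't1': {'messages': [{'id': 'c'}]},
}

ids = [m['id'] for m in iter_messages(FAKE_PAGES.get)]
print(ids)
```

With the real client, `list_page` could be something like `lambda token: service.users().messages().list(userId='me', labelIds=[label_id], maxResults=500, pageToken=token).execute()`, where `label_id` is whatever label ID you looked up earlier.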

If you just want the raw, unparsed email, try using

# at top of file
from base64 import urlsafe_b64decode

msg = service.users().messages().get(userId='me', id=message['id'], format='raw').execute()
print(urlsafe_b64decode(msg['raw']))
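Once decoded, the raw bytes can be handed to the standard library's email parser to get at headers such as the subject. The following is a sketch under stated assumptions: `parse_raw` is a hypothetical helper, and `fake_msg` is a fabricated stand-in for a `format='raw'` response so the example runs offline.

```python
from base64 import urlsafe_b64decode, urlsafe_b64encode
from email import message_from_bytes


def parse_raw(raw):
    """Decode a URL-safe base64 'raw' field and parse it as an email message."""
    if isinstance(raw, str):
        raw = raw.encode('ascii')
    # Restore '=' padding in case it was stripped from the encoded value.
    raw += b'=' * (-len(raw) % 4)
    return message_from_bytes(urlsafe_b64decode(raw))


# Fabricated message used only to exercise the helper (not real API output).
sample = b"Subject: Hello\r\nFrom: a@example.com\r\n\r\nBody text\r\n"
fake_msg = {'raw': urlsafe_b64encode(sample).decode('ascii').rstrip('=')}

parsed = parse_raw(fake_msg['raw'])
print(parsed['Subject'])
print(parsed.get_payload())
```

Note that each message's `raw` value is its own base64 string using the `-` and `_` alphabet, so it is decoded one message at a time rather than as one concatenated file.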

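【Discussion】:

Hmm, this definitely pulls back more data (10 MB now), but the base64 encoding doesn't seem to decode properly/completely. I just ran cat messages.txt | base64 --decode, but it only decoded a very small snippet before saying "invalid input". I used sed to replace "-" and "_" with "+" and "/" respectively, but that didn't change anything. Is it the metadata format that is limiting me?
I forgot about the b64 decoding. It is URL-safe base64, so some characters are substituted.
No worries. This worked for me: print(base64.urlsafe_b64decode(msg['raw'].encode('utf-8').strip())). Otherwise I get a TypeError. So... I'm still only receiving the same emails as with the metadata format. April 9, 2018 is the earliest date, even though the emails in that label go back to 2015. So that tells me it is definitely not a data cap/limit. Maybe mail older than 1 year and 2 months gets archived or ignored? I'm not getting any errors. I don't understand why the request stops.
See the edit to my answer above (an upvote/accept would be appreciated if you think it is correct). You are probably only seeing the first 500 results.
Ah, okay. I'll implement it and give it a quick run.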