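Here is my take on getting all the files in a folder and its subfolders...
It lets you query by a path you specify. It differs from the other approaches in that it does not make one request per folder. Instead, it groups folders into batches and queries each batch at once.
Batch snippet: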
('some_id_1234' in parents or 'some_id_1235' in parents or 'some_id_1236' in parents or 'some_id_1237' in parents or 'some_id_1238' in parents or 'some_id_1239' in parents or 'some_id_1240' in parents) and trashed=false
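That way you can query the files of several folders in a single request. The query cannot get too large, though: with more than roughly 300 folders (that is, 300 terms like 'some_id_1234' in parents) you will start getting errors, so keep the batch size at around 250.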
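As a minimal sketch, a clause like this could be assembled from a list of folder IDs as shown below. The helper name `build_parents_query` is made up for illustration, and the parentheses are an assumption on my part, added so that `trashed=false` applies to every folder in the batch instead of relying on operator precedence.

```python
def build_parents_query(folder_ids):
    """Build one Drive search clause matching children of any given folder."""
    if not folder_ids:
        raise ValueError("folder_ids must not be empty")
    # One "'<id>' in parents" term per folder, joined with "or".
    parents_clause = " or ".join(
        f"'{folder_id}' in parents" for folder_id in folder_ids
    )
    # Parenthesise the "or" chain so the trashed filter covers all of it.
    return f"({parents_clause}) and trashed=false"


print(build_parents_query(["some_id_1234", "some_id_1235"]))
# ('some_id_1234' in parents or 'some_id_1235' in parents) and trashed=false
```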
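Say the folder you want to check contains 1,110 folders and you set the batch size to 250.
It will then make 5 separate requests to query all of the folders:
- Request 1 queries 250 folders
- Request 2 queries 250 folders
- Request 3 queries 250 folders
- Request 4 queries 250 folders
- Request 5 queries 110 folders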
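The arithmetic above can be checked with a few lines of plain Python. This is only a sketch: `split_into_batches` is a hypothetical helper and the IDs are placeholders, not real Drive IDs.

```python
def split_into_batches(folder_ids, batch_size=250):
    """Split folder IDs into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        folder_ids[start:start + batch_size]
        for start in range(0, len(folder_ids), batch_size)
    ]


folder_ids = [f"some_id_{n}" for n in range(1110)]  # placeholder IDs
batches = split_into_batches(folder_ids, batch_size=250)
print(len(batches))                       # 5
print([len(batch) for batch in batches])  # [250, 250, 250, 250, 110]
```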
Any subfolders found inside are then batched and queried recursively in the same way.
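To make the level-by-level idea concrete, here is a rough sketch that runs it against an in-memory dictionary instead of Google Drive. `FAKE_DRIVE`, `list_children`, and `walk` are made-up names for illustration, the folder contents are invented, and each call to `list_children` stands in for one batched request.

```python
# In-memory stand-in for a Drive tree: folder name -> (subfolders, files).
FAKE_DRIVE = {
    "Notes": (["School", "Important", "Old Notes", "Secrets"], []),
    "School": ([], [f"notes_{n}.txt" for n in range(1, 6)]),
    "Important": ([], [f"important_notes_{n}.txt" for n in range(1, 4)]),
    "Old Notes": ([], [f"old_{n}.txt" for n in range(1, 4)]),
    "Secrets": ([], [f"secret_{n}.txt" for n in range(1, 4)]),
}


def list_children(folder_names):
    """Simulate one batched query over several parent folders."""
    subfolders, files = [], []
    for name in folder_names:
        child_folders, child_files = FAKE_DRIVE[name]
        subfolders.extend(child_folders)
        files.extend(child_files)
    return subfolders, files


def walk(start_folder, batch_size=250):
    """Collect all file names below start_folder and count simulated requests."""
    files, requests = [], 0
    level = [start_folder]
    while level:
        next_level = []
        # Query the current level in batches of at most batch_size folders.
        for start in range(0, len(level), batch_size):
            subfolders, found = list_children(level[start:start + batch_size])
            requests += 1
            next_level.extend(subfolders)
            files.extend(found)
        level = next_level
    return files, requests


files, requests = walk("Notes")
print(len(files), requests)  # 14 2
```

With one request per folder the same tree would need 5 requests. Here the four subfolders share a single batched request, so 2 are enough.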
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
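
def parse_gdrive_path(gd_path):
    """Split a path such as 'gdrive:/Docs/Notes/' into ['Docs', 'Notes']."""
    # Drop an optional 'prefix:' part in front of the path.
    if ':' in gd_path:
        gd_path = gd_path.split(':')[1]
    # Normalise separators, then discard the empty segments left by
    # leading, trailing, or doubled slashes.
    gd_path = gd_path.replace('\\', '/')
    return [part for part in gd_path.split('/') if part]
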
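def resolve_path_to_id(folder_path):
    """Walk folder_path from the Drive root and return the ID of the last folder."""
    _id = 'root'
    for folder in parse_gdrive_path(folder_path):
        # Escape backslashes and single quotes so the quoted title stays valid.
        safe_title = folder.replace('\\', '\\\\').replace("'", "\\'")
        folder_list = gdrive.ListFile({
            'q': f"'{_id}' in parents and title='{safe_title}' and trashed=false "
                 "and mimeType='application/vnd.google-apps.folder'",
            'fields': 'items(id, title, mimeType)',
        }).GetList()
        # Fail with a clear message instead of an IndexError on a missing folder.
        if not folder_list:
            raise FileNotFoundError(f"Folder '{folder}' not found in '{folder_path}'")
        _id = folder_list[0]['id']
    return _id
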
def get_folder_files(folder_ids, batch_size=100):
    """Yield every non-trashed item directly inside any of folder_ids."""
    # Query at most batch_size folders per request.
    for start in range(0, len(folder_ids), batch_size):
        batch = folder_ids[start:start + batch_size]
        # One "'<id>' in parents" term per folder, joined with "or".
        query = " or ".join(f"'{folder_id}' in parents" for folder_id in batch)
        for f in gdrive.ListFile({
            # Parentheses make trashed=false apply to the whole "or" chain.
            'q': f"({query}) and trashed=false",
            # nextPageToken is requested as well so that paging can continue
            # past the first page of results.
            'fields': 'nextPageToken, items(id, title, mimeType, version)',
        }).GetList():
            yield f

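def get_files(folder_path=None, target_ids=None, files=None):
    """Return the titles of all files under folder_path, including subfolders."""
    # Create the list per call; a mutable default would persist across calls.
    if files is None:
        files = []
    if target_ids is None:
        target_ids = [resolve_path_to_id(folder_path)]
    subfolder_ids = []
    for f in get_folder_files(folder_ids=target_ids, batch_size=250):
        if f['mimeType'] == 'application/vnd.google-apps.folder':
            subfolder_ids.append(f['id'])
        else:
            files.append(f['title'])
    # Go one level down, querying all subfolders of this level in batches.
    if subfolder_ids:
        get_files(target_ids=subfolder_ids, files=files)
    return files
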
# Authenticate once; the functions above use this module-level client.
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
gdrive = GoogleDrive(gauth)

file_list = get_files('/Some/Folder/Path')
for f in file_list:
    print(f)

For example:
Your Google Drive contains the following:
(folder) Root
    (folder) Docs
        (subfolder) Notes
            (subfolder) School
                (file) notes_1.txt
                (file) notes_2.txt
                (file) notes_3.txt
                (file) notes_4.txt
                (file) notes_5.txt
            (subfolder) Important
                (file) important_notes_1.txt
                (file) important_notes_2.txt
                (file) important_notes_3.txt
            (subfolder) Old Notes
                (file) old_1.txt
                (file) old_2.txt
                (file) old_3.txt
            (subfolder) Secrets
                (file) secret_1.txt
                (file) secret_2.txt
                (file) secret_3.txt
    (folder) Stuff
        (file) nothing.txt
        (file) this-will-not-be-found.txt
You want to get all the files in the "Notes" folder and its subfolders.
You would do this:
file_list = get_files('/Docs/Notes')
for f in file_list:
    print(f)
Output:
>> notes_1.txt
>> notes_2.txt
>> notes_3.txt
>> notes_4.txt
>> notes_5.txt
>> important_notes_1.txt
>> important_notes_2.txt
>> important_notes_3.txt
>> old_1.txt
>> old_2.txt
>> old_3.txt
>> secret_1.txt
>> secret_2.txt
>> secret_3.txt
Hope this helps someone :)