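【Question title】: Accessing folders, subfolders and subfiles using PyDrive (Python)
【Posted】: 2016-03-10 03:48:06
【Question】:

I have the following code from the PyDrive documentation, which gives access to the top-level folders in my Google Drive. From there I would like to access all folders, subfolders and files. How would I go about doing that (I have only just started using PyDrive)?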

#!/usr/bin/python
# -*- coding: utf-8 -*-
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive


gauth = GoogleAuth()
gauth.LocalWebserverAuth() # Creates local webserver and auto handles authentication

#Make GoogleDrive instance with Authenticated GoogleAuth instance
drive = GoogleDrive(gauth)

#Google_Drive_Tree = 
# Auto-iterate through all files that matches this query
top_list = drive.ListFile({'q': "'root' in parents and trashed=false"}).GetList()
for file in top_list:
    print 'title: %s, id: %s' % (file['title'], file['id'])
    print "---------------------------------------------"

#Paginate file lists by specifying number of max results
for file_list in drive.ListFile({'q': 'trashed=true', 'maxResults': 10}):
    print 'Received %s files from Files.list()' % len(file_list) # <= 10
    for file1 in file_list:
        print 'title: %s, id: %s' % (file1['title'], file1['id'])

I looked at the page "How to list all files, folders, subfolders and subfiles of a Google drive folder", which seems to be the answer I am looking for, but the code there is no longer available.

【Comments】:

    Tags: python python-2.7 metadata google-api-python-client


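    【Solution 1】:

    You need to iterate over the file list. The code below does this recursively and collects the title and URL link of every file in a folder. It can be pointed at a specific folder by passing that folder's id, e.g. ListFolder('id'). The example below queries root.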

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    from pydrive.auth import GoogleAuth
    from pydrive.drive import GoogleDrive
    
    gauth = GoogleAuth()
    gauth.LocalWebserverAuth() # Creates local webserver and auto handles authentication
    
    #Make GoogleDrive instance with Authenticated GoogleAuth instance
    drive = GoogleDrive(gauth)
    
    def ListFolder(parent):
        """Recursively list a folder; subfolders carry their children under "list"."""
        filelist = []
        file_list = drive.ListFile({'q': "'%s' in parents and trashed=false" % parent}).GetList()
        for f in file_list:
            if f['mimeType'] == 'application/vnd.google-apps.folder':  # if folder
                filelist.append({"id": f['id'], "title": f['title'], "list": ListFolder(f['id'])})
            else:
                # "title1" holds the file's URL (alternateLink)
                filelist.append({"title": f['title'], "title1": f['alternateLink']})
        return filelist

    ListFolder('root')
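
The nested list that ListFolder returns can be walked to print an indented tree. This is only a minimal sketch, assuming the dictionary shape used above (folders carry "list", files carry "title1"). The name `format_tree` and the sample data are hypothetical, and nothing here calls the Drive API.

```python
def format_tree(entries, depth=0):
    """Return one indented line per entry of a ListFolder-style nested list."""
    lines = []
    for entry in entries:
        indent = "  " * depth
        if "list" in entry:  # folder: print it, then recurse into its children
            lines.append("%s%s/" % (indent, entry["title"]))
            lines.extend(format_tree(entry["list"], depth + 1))
        else:  # file: show the title and the link stored under "title1"
            lines.append("%s%s (%s)" % (indent, entry["title"], entry.get("title1", "")))
    return lines


# Hypothetical sample in the same shape that ListFolder returns
sample = [
    {"id": "folder-id", "title": "Docs", "list": [
        {"title": "notes.txt", "title1": "https://example.com/notes"},
    ]},
    {"title": "todo.txt", "title1": "https://example.com/todo"},
]

print("\n".join(format_tree(sample)))
```

With real data you would pass the return value of ListFolder('root') instead of `sample`.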
    

    【Comments】:

    • Some rate limiting may be needed as well, though.
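
One possible way to handle rate limits is to retry a failed call with a growing delay. The sketch below is not part of PyDrive: `call_with_backoff` and its parameters are hypothetical, and it retries on any exception, whereas real code should probably retry only on rate-limit errors.

```python
import time


def call_with_backoff(func, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on an exception wait base_delay * 2**attempt and try again."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return func()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

For example, the listing call in ListFolder could be wrapped as `call_with_backoff(lambda: drive.ListFile({'q': query}).GetList())`, where `query` is the query string built for the current folder.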
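    【Solution 2】:

    Your code is absolutely correct. With PyDrive's default settings, however, you can only access files and folders at the root level. Changing oauth_scope in the settings.yaml file fixes this.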

    client_config_backend: settings
    client_config:
      client_id: XXX
      client_secret: XXXX
    
    save_credentials: True
    save_credentials_backend: file
    save_credentials_file: credentials.json
    
    get_refresh_token: True
    
    oauth_scope:
      - https://www.googleapis.com/auth/drive
      - https://www.googleapis.com/auth/drive.metadata
    

    【Comments】:

    • After changing oauth_scope I had to delete credentials.json, create a new empty credentials.json file and authenticate again so that the app was granted the new scopes.
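
The steps in this comment can be sketched roughly as follows. The helper names `reset_saved_credentials` and `reauthenticate` are hypothetical, and the default path assumes the save_credentials_file value from the settings above.

```python
import os


def reset_saved_credentials(path="credentials.json"):
    """Replace the saved credentials file with an empty one."""
    if os.path.exists(path):
        os.remove(path)
    # Create a new, empty file so the next authentication starts from scratch
    open(path, "w").close()


def reauthenticate(path="credentials.json"):
    """Drop the cached token and run the browser flow again (requires PyDrive)."""
    # Imported here so the helper above can be used without PyDrive installed
    from pydrive.auth import GoogleAuth

    reset_saved_credentials(path)
    gauth = GoogleAuth()  # reads settings.yaml, including the new oauth_scope
    gauth.LocalWebserverAuth()
    return gauth
```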
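    【Solution 3】:

    Here is my take on getting all files in subfolders. It lets you query by a given path. It differs in that it does not issue one request per folder; instead it groups the folders into batches and queries one batch at a time.

    Batch snippet: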

    'some_id_1234' in parents or 'some_id_1235' in parents or 'some_id_1236' in parents or 'some_id_1237' in parents or 'some_id_1238' in parents or 'some_id_1239' in parents or 'some_id_1240' in parents and trashed=false
    

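    This lets you query the files of several folders at once. The query cannot be too large: once it contains much more than 300 folder clauses ('some_id_1234' in parents) you start getting errors, so keep the batch size around 250.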

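    Suppose the folder you want to inspect contains 1,110 folders and you set the batch size to 250. It will then issue 5 separate requests to query all of them.

    - Request 1 queries 250 folders

    - Request 2 queries 250 folders

    - Request 3 queries 250 folders

    - Request 4 queries 250 folders

    - Request 5 queries 110 folders

    Any subfolders found inside are then batched and queried recursively in the same way.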
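
The batching arithmetic above can be checked without touching the Drive API. This is a minimal sketch: `build_batch_queries` is a hypothetical helper, the folder ids are made up, and wrapping the or-group in parentheses is an addition in this sketch (so that trashed=false applies to every folder in the batch), not something the batch snippet above does.

```python
def build_batch_queries(folder_ids, batch_size=250):
    """Return one Drive query string per batch of at most batch_size folder ids."""
    queries = []
    for start in range(0, len(folder_ids), batch_size):
        batch = folder_ids[start:start + batch_size]
        parents = " or ".join("'%s' in parents" % folder_id for folder_id in batch)
        # Parentheses group the or-clauses before "and trashed=false" is applied
        queries.append("(%s) and trashed=false" % parents)
    return queries


# 1,110 hypothetical folder ids, as in the example above
folder_ids = ["some_id_%d" % n for n in range(1110)]
queries = build_batch_queries(folder_ids)

print(len(queries))  # 5
print([q.count(" in parents") for q in queries])  # folders per request
```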

    
    from pydrive.auth import GoogleAuth
    from pydrive.drive import GoogleDrive
    
    
    
    def parse_gdrive_path(gd_path):
        if ':' in gd_path:
            gd_path = gd_path.split(':')[1]
        gd_path = gd_path.replace('\\', '/').replace('//', '/')
        if gd_path.startswith('/'):
            gd_path = gd_path[1:]
        if gd_path.endswith('/'):
            gd_path = gd_path[:-1]
        return gd_path.split('/')
    
    
    def resolve_path_to_id(folder_path):
        _id = 'root'
        folder_path = parse_gdrive_path(folder_path)
        for idx, folder in enumerate(folder_path):
            folder_list = gdrive.ListFile({'q': f"'{_id}' in parents and title='{folder}' and trashed=false and mimeType='application/vnd.google-apps.folder'", 'fields': 'items(id, title, mimeType)'}).GetList()
            _id = folder_list[0]['id']
            title = folder_list[0]['title']
            if idx == (len(folder_path) - 1) and folder == title:
                return _id
        return _id
    
    
    def get_folder_files(folder_ids, batch_size=100):
        # Query the folders in batches of at most batch_size ids per request
        base_query = "'{target_id}' in parents"

        for start in range(0, len(folder_ids), batch_size):
            batch = folder_ids[start:start + batch_size]
            query = " or ".join(base_query.format(target_id=folder_id) for folder_id in batch)
            for f in gdrive.ListFile({'q': f"{query} and trashed=false", 'fields': 'items(id, title, mimeType, version)'}).GetList():
                yield f
    
    
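    def get_files(folder_path=None, target_ids=None, files=None):

        # Start with a fresh list on each top-level call; a mutable default
        # argument would keep accumulating results across calls
        if files is None:
            files = []

        if target_ids is None:
            target_ids = [resolve_path_to_id(folder_path)]

        file_list = get_folder_files(folder_ids=target_ids, batch_size=250)

        subfolder_ids = []

        for f in file_list:
            if f['mimeType'] == 'application/vnd.google-apps.folder':
                subfolder_ids.append(f['id'])
            else:
                files.append(f['title'])

        # Recurse into the subfolders found at this level, sharing the same result list
        if len(subfolder_ids) > 0:
            get_files(target_ids=subfolder_ids, files=files)

        return files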
    
    
    gauth = GoogleAuth()
    gauth.LocalWebserverAuth()
    
    gdrive = GoogleDrive(gauth)
    
    
    file_list = get_files('/Some/Folder/Path')
    
    for f in file_list:
        print(f)
    

    For example:

    Your Google Drive contains the following:

    (folder) Root
        (folder) Docs
            (subfolder) Notes
                (subfolder) School
                    (file) notes_1.txt
                    (file) notes_2.txt
                    (file) notes_3.txt
                    (file) notes_4.txt
                    (file) notes_5.txt
                    (subfolder) Important
                        (file) important_notes_1.txt
                        (file) important_notes_2.txt
                        (file) important_notes_3.txt
                    (subfolder) Old Notes
                        (file) old_1.txt
                        (file) old_2.txt
                        (file) old_3.txt
                        (subfolder) Secrets
                            (file) secret_1.txt
                            (file) secret_2.txt
                            (file) secret_3.txt
        (folder) Stuff
            (file) nothing.txt
            (file) this-will-not-be-found.txt
    

    You want to get all files from the "Notes" folder and its subfolders.

    You would do this:

    file_list = get_files('/Docs/Notes')
    
    for f in file_list:
        print(f)
    
    Output:
    
    >> notes_1.txt
    >> notes_2.txt
    >> notes_3.txt
    >> notes_4.txt
    >> notes_5.txt
    >> important_notes_1.txt
    >> important_notes_2.txt
    >> important_notes_3.txt
    >> old_1.txt
    >> old_2.txt
    >> old_3.txt
    >> secret_1.txt
    >> secret_2.txt
    >> secret_3.txt
    

    Hope this helps someone :)

    【Comments】:
