【Question title】: Python list directory, subdirectory, and files
【Posted】: 2011-02-23 23:54:40
【Question】:

I am trying to write a script that lists all directories, subdirectories, and files in a given directory.
I tried this:

import sys,os

root = "/home/patate/directory/"
path = os.path.join(root, "targetdirectory")

for r,d,f in os.walk(path):
    for file in f:
        print os.path.join(root,file)

Unfortunately, it doesn't work properly.
I get all the files, but not their complete paths.

For example, if the directory structure is:

/home/patate/directory/targetdirectory/123/456/789/file.txt

It prints:

/home/patate/directory/targetdirectory/file.txt

What I need is the first result. Any help would be greatly appreciated! Thanks.

【Comments】:

    Tags: python file path


    【Answer 1】:

    Use os.path.join to concatenate the directory and the file name:

    for path, subdirs, files in os.walk(root):
        for name in files:
            print(os.path.join(path, name))
    

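    Note the use of path rather than root in the concatenation. Using root would be incorrect, because path is the directory currently being visited, while root is only the starting point.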
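To see why `path` rather than `root` belongs in the join, it can help to run the loop on a small tree. The sketch below builds a throwaway directory with `tempfile`. The helper name `list_files` and the directory and file names are made up for illustration.

```python
import os
import tempfile


def list_files(root):
    """Return the full path of every file below root."""
    found = []
    for path, subdirs, files in os.walk(root):
        # path is the directory currently being visited, not the starting root
        for name in files:
            found.append(os.path.join(path, name))
    return found


# Hypothetical tree: <tmp>/123/456/file.txt
with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, "123", "456")
    os.makedirs(nested)
    open(os.path.join(nested, "file.txt"), "w").close()
    for full_path in list_files(tmp):
        print(full_path)
```

The printed path includes the intermediate `123/456` components, which is what the question asked for.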


    In Python 3.4, the pathlib module was added to make path manipulation easier. The equivalent of os.path.join would be:

    pathlib.PurePath(path, name)
    

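    The advantage of pathlib is that paths come with a variety of useful methods. If you use the concrete Path variant, you can also make actual OS calls through them, such as deleting the path or opening the file it points to, and much more.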
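As a rough illustration of the concrete `Path` variant, the sketch below lists files recursively with `Path.rglob`. The helper name `list_files` and the sample tree are assumptions made for the example, not part of the answer above.

```python
import tempfile
from pathlib import Path


def list_files(root):
    """Return every regular file below root as a Path object."""
    # rglob("*") descends recursively; is_file() filters out the directories
    return [p for p in Path(root).rglob("*") if p.is_file()]


# Hypothetical tree: <tmp>/123/456/file.txt
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp, "123", "456")
    nested.mkdir(parents=True)
    (nested / "file.txt").write_text("example")
    for p in list_files(tmp):
        # Path objects expose pieces of the path as attributes
        print(p, p.name, p.suffix)
```

Each result is a `Path`, so methods such as `p.read_text()` or `p.stat()` are available without further joins.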

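    【Comments】:

    • Of the many questions about "how to recursively get all files in Python", this is the only useful answer.
    • As a list comprehension: all_files = [os.path.join(path, name) for path, subdirs, files in os.walk(folder) for name in files]
    • In Python 3, print is a function and needs parentheses: print(os.path.join(path, name))
    【Answer 2】: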

    Just in case... to get all files in the directory and its subdirectories that match some pattern (*.py, for example):

    import os
    from fnmatch import fnmatch
    
    root = '/some/directory'
    pattern = "*.py"
    
    for path, subdirs, files in os.walk(root):
        for name in files:
            if fnmatch(name, pattern):
                print(os.path.join(path, name))
    

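    【Comments】:

    • In Python 3, print is a function and needs parentheses: print(os.path.join(path, name)). You can also use print(pathlib.PurePath(path, name))
    【Answer 3】:

    I can't comment, so I'm writing an answer here. This is the clearest one-liner I have seen: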

    import os
    [os.path.join(path, name) for path, subdirs, files in os.walk(root) for name in files]
    

    【Comments】:

      【Answer 4】:

      Here is a one-liner:

      import os
      
      [val for sublist in [[os.path.join(i[0], j) for j in i[2]] for i in os.walk('./')] for val in sublist]
      # Meta comment to ease selecting text
      

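      The outermost val for sublist in ... loop flattens the list into one dimension. The j loop collects a list of every file base name and joins it to the current path. Finally, the i loop iterates over all directories and subdirectories.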

      This example uses the hard-coded path ./ in the os.walk(...) call; you can substitute any path string you like.

      Note: os.path.expanduser and/or os.path.expandvars can be used for path strings such as ~/
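A minimal sketch of that note, assuming the path string may contain both an environment variable and a leading `~`. The helper names `expand` and `walk_expanded` are hypothetical.

```python
import os


def expand(path_string):
    """Expand environment variables first, then a leading ~, in a path string."""
    return os.path.expanduser(os.path.expandvars(path_string))


def walk_expanded(path_string):
    """List every file below the expanded form of path_string."""
    return [os.path.join(path, name)
            for path, subdirs, files in os.walk(expand(path_string))
            for name in files]


# Shows what ~/ turns into on the current machine
print(expand("~/"))
```

A call such as `walk_expanded("~/projects")` (a made-up directory) would then walk the directory below the home folder instead of a literal `~` directory.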

      Extending this example:

      It is easy to add tests on the file base name and on the directory name.

      For example, testing for *.jpg files:

      ... for j in i[2] if j.endswith('.jpg')] ...
      

      Additionally, to exclude the .git directory:

      ... for i in os.walk('./') if '.git' not in i[0].split('/')]
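The filter above still walks into .git and only discards the results afterwards. An alternative, sketched here with a hypothetical helper `list_files_skipping`, is to prune the directory list that `os.walk` yields so the excluded directories are never visited.

```python
import os


def list_files_skipping(root, skip_dirs=(".git",)):
    """List files below root without descending into directories named in skip_dirs."""
    found = []
    for path, subdirs, files in os.walk(root):
        # Slice assignment edits the very list os.walk uses to decide where to go next.
        # This only works with topdown=True, which is the default.
        subdirs[:] = [d for d in subdirs if d not in skip_dirs]
        found.extend(os.path.join(path, name) for name in files)
    return found
```

This no longer fits on one line, but it avoids reading large ignored trees at all.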
      

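      【Comments】:

      • It does work, but to exclude the .git directory you need to check that '.git' is not in the path.
      • Yes. It should be if '.git' not in i[0].split('/')]
      • I would recommend os.walk over manual directory looping; generators are great, go use them.
      【Answer 5】:

      You can take a look at this example I made. Note that it uses the os.path.walk function, which was deprecated in Python 2 and removed in Python 3. It uses a list to store all the file paths: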

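      import fnmatch
      import os

      root = "Your root directory"
      ex = ".txt"
      where_to = "Wherever you wanna write your file to"

      def fileWalker(ext, dirname, names):
          '''Callback for os.path.walk: collect the names that match the extension.'''
          pat = "*" + ext[0]
          for f in names:
              if fnmatch.fnmatch(f, pat):
                  ext[1].append(os.path.join(dirname, f))

      def writeTo(fList):
          # Write one path per line to the output file
          with open(where_to, "w") as f:
              for di_r in fList:
                  f.write(di_r + "\n")

      if __name__ == '__main__':
          li = []
          # os.path.walk exists only in Python 2; it calls fileWalker(arg, dirname, names) for each directory
          os.path.walk(root, fileWalker, [ex, li])

          writeTo(li)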
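Since os.path.walk is not available in Python 3, a rough counterpart built on os.walk might look like the sketch below. The function names are hypothetical, and unlike the callback above it matches only file names, not directory names.

```python
import fnmatch
import os


def find_by_extension(root, ext):
    """Collect paths of files below root whose name matches '*' + ext."""
    pattern = "*" + ext
    matches = []
    for dirname, subdirs, files in os.walk(root):
        # fnmatch.filter keeps only the names that match the pattern
        for name in fnmatch.filter(files, pattern):
            matches.append(os.path.join(dirname, name))
    return matches


def write_paths(paths, destination):
    """Write one path per line to destination."""
    with open(destination, "w") as out:
        for p in paths:
            out.write(p + "\n")
```

Usage would be along the lines of `write_paths(find_by_extension(root, ".txt"), where_to)`, reusing the placeholder names from the answer.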
      

      【Comments】:

        【Answer 6】:

        Since every example here just uses walk (with join), I would like to show a nice example and compare it with listdir:

        import os, time
        
        def listFiles1(root): # listdir
            allFiles = []; walk = [root]
            while walk:
                folder = walk.pop(0)+"/"; items = os.listdir(folder) # items = folders + files
                for i in items: i=folder+i; (walk if os.path.isdir(i) else allFiles).append(i)
            return allFiles
        
        def listFiles2(root): # listdir/join (takes ~1.4x as long) (and uses '\\' instead)
            allFiles = []; walk = [root]
            while walk:
                folder = walk.pop(0); items = os.listdir(folder) # items = folders + files
                for i in items: i=os.path.join(folder,i); (walk if os.path.isdir(i) else allFiles).append(i)
            return allFiles
        
        def listFiles3(root): # walk (takes ~1.5x as long)
            allFiles = []
            for folder, folders, files in os.walk(root):
                for file in files: allFiles+=[folder.replace("\\","/")+"/"+file] # folder+"\\"+file still ~1.5x
            return allFiles
        
        def listFiles4(root): # walk/join (takes ~1.6x as long) (and uses '\\' instead)
            allFiles = []
            for folder, folders, files in os.walk(root):
                for file in files: allFiles+=[os.path.join(folder,file)]
            return allFiles
        
        
        for i in range(100): files = listFiles1("src") # warm up
        
        start = time.time()
        for i in range(100): files = listFiles1("src") # listdir
        print("Time taken: %.2fs"%(time.time()-start)) # 0.28s
        
        start = time.time()
        for i in range(100): files = listFiles2("src") # listdir and join
        print("Time taken: %.2fs"%(time.time()-start)) # 0.38s
        
        start = time.time()
        for i in range(100): files = listFiles3("src") # walk
        print("Time taken: %.2fs"%(time.time()-start)) # 0.42s
        
        start = time.time()
        for i in range(100): files = listFiles4("src") # walk and join
        print("Time taken: %.2fs"%(time.time()-start)) # 0.47s
        

        As the timings in the comments show, the listdir version was the most efficient in this test (and join is slow).
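These timings depend on the machine and the Python version. Since Python 3.5, os.walk itself is implemented on top of os.scandir, so the gap may look different on newer interpreters. For comparison, a manual traversal can also be built on os.scandir; the sketch below is one possible shape, with a hypothetical function name, and makes no claim about its speed.

```python
import os


def list_files_scandir(root):
    """Iteratively list files below root using os.scandir."""
    all_files = []
    pending = [root]
    while pending:
        folder = pending.pop()
        with os.scandir(folder) as entries:
            for entry in entries:
                # follow_symlinks=False avoids loops through symlinked directories
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                else:
                    all_files.append(entry.path)
    return all_files
```

It could be dropped into the benchmark above as a fifth variant, for example `listFiles5 = list_files_scandir`, to measure it on the same tree.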

        【Comments】:

          【Answer 7】:

          A slightly simpler one-liner:

          import os
          from itertools import product, chain
          
          chain.from_iterable([[os.sep.join(w) for w in product([i[0]], i[2])] for i in os.walk(dir)])
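Note that chain.from_iterable returns a lazy iterator rather than a list, so nothing is shown until it is consumed. A small sketch of the same idea wrapped in a function (the name `iter_files` is an assumption, and os.path.join is swapped in for os.sep.join with product):

```python
import os
from itertools import chain


def iter_files(root):
    """Lazily yield the path of every file below root."""
    return chain.from_iterable(
        (os.path.join(path, name) for name in files)
        for path, subdirs, files in os.walk(root)
    )
```

To list each file, iterate over the result, for example `for p in iter_files("."): print(p)`, or materialize everything at once with `list(iter_files("."))`.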
          

          【Comments】:

          • How do I list each file?
          【Answer 8】:

          Just an addition: with this you can write the data out in CSV format:

          import os
          from fnmatch import fnmatch

          try:
              import pandas as pd
          except ImportError:
              # Install pandas on the fly, then import it so that pd is defined
              os.system("pip3 install pandas")
              import pandas as pd

          root = "/home/kiran/Downloads/MainFolder" # it may have many subfolders and files inside
          lst = []
          pattern = "*.csv"      # I want to get only csv files
          pattern = "*.*"        # Note: this overrides the line above and matches every file name that contains a dot
          for path, subdirs, files in os.walk(root):
              for name in files:
                  if fnmatch(name, pattern):
                      lst.append(os.path.join(path, name))
          df = pd.DataFrame({"filePaths": lst})
          df.to_csv("filepaths.csv")
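If pandas is not otherwise needed, the same file list can be written with the standard library's csv module. This is a sketch under that assumption; the function name `write_file_list` and its parameters are made up for the example.

```python
import csv
import os
from fnmatch import fnmatch


def write_file_list(root, csv_path, pattern="*"):
    """Write one row per matching file below root to csv_path and return the row count."""
    count = 0
    # newline="" is what the csv module documentation recommends for files given to csv.writer
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filePaths"])
        for path, subdirs, files in os.walk(root):
            for name in files:
                if fnmatch(name, pattern):
                    writer.writerow([os.path.join(path, name)])
                    count += 1
    return count
```

If csv_path lies inside root, the output file may show up in its own listing, so writing it somewhere else is the simpler choice.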
          

          【Comments】:

            【Answer 9】:

            A fairly simple solution is to run a couple of subprocess calls to export the files to CSV format:

            import subprocess
            
            # Global variables for directory being mapped
            
            location = '.' # Enter the path here.
            pattern = '*.py' # Use this if you want to only return certain filetypes
            rootDir = location.rpartition('/')[-1]
            outputFile = rootDir + '_directory_contents.csv'
            
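            # Find the requested data and export to CSV, specifying a pattern if needed.
            # The pattern is quoted so the shell passes it to find unexpanded; %T+ is the modification time (GNU find).
            find_cmd = 'find ' + location + ' -name "' + pattern + '" -fprintf ' + outputFile + '  "%Y%M,%n,%u,%g,%s,%T+,%P\n"'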
            subprocess.call(find_cmd, shell=True)
            

            The command produces comma-separated values that can be easily analyzed in Excel.

            f-rwxrwxrwx,1,cathy,cathy,2642,2021-06-01+00:22:00.2970880000,content-audit.py
            

            The resulting CSV file has no header row, but you can add one with a second command.

            # Add headers to the CSV
            headers_cmd = 'sed -i.bak 1i"Permissions,Links,Owner,Group,Size,ModifiedTime,FilePath" ' + outputFile
            subprocess.call(headers_cmd, shell=True)
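The find and sed commands above rely on GNU tools. Where those are not available, a rough pure-Python counterpart could look like the sketch below. It is not the author's method: the function name `write_inventory` and the reduced column set are assumptions for illustration, and owner and group are left out because looking them up is platform specific.

```python
import csv
import os
import stat
from datetime import datetime


def write_inventory(root, csv_path):
    """Write permissions, size, modification time and relative path of each file below root."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Permissions", "Size", "ModifiedTime", "FilePath"])
        for path, subdirs, files in os.walk(root):
            for name in files:
                full = os.path.join(path, name)
                info = os.lstat(full)  # lstat does not follow symlinks
                modified = datetime.fromtimestamp(info.st_mtime)
                writer.writerow([
                    stat.filemode(info.st_mode),           # e.g. -rw-r--r--
                    info.st_size,                          # size in bytes
                    modified.isoformat(sep=" ", timespec="seconds"),
                    os.path.relpath(full, root),
                ])
```

The header row is written up front, so no second pass over the file is needed.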
            

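            Depending on how much data you get back, you can massage it further with Pandas. Here are some things I found useful, especially if you are dealing with many levels of directories to look through.

            Add these to your imports: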

            import numpy as np
            import pandas as pd
            

            Then add this to your code:

            # To convert sizes to be more human readable
            def convert_bytes(size):
                for x in ['b', 'K', 'M', 'G', 'T']:
                    if size < 1024:
                        return "%3.1f %s" % (size, x)
                    size /= 1024

                return size

            # Create DataFrame from the csv file created above.
            df = pd.read_csv(outputFile)

            # Format columns
            # Get the filename and file extension from the filepath
            df['FileName'] = df['FilePath'].str.rsplit("/", n=1).str[-1]
            df['FileExt'] = df['FileName'].str.rsplit('.', n=1).str[1]

            # Get the full path to the files. If the path doesn't include a "/" it's the root directory
            df['FullPath'] = df["FilePath"].str.rsplit("/", n=1).str[0]
            df['FullPath'] = np.where(df['FullPath'].str.contains("/"), df['FullPath'], rootDir)

            # Split the path into columns for the parent directory and its children
            df['ParentDir'] = df['FullPath'].str.split("/", n=1).str[0]
            df['SubDirs'] = df['FullPath'].str.split("/", n=1).str[1]
            # A missing value (NaN) indicates the path is the root directory
            df['SubDirs'] = df['SubDirs'].fillna('')

            # Determine if the item is a directory or file.
            df['Type'] = np.where(df['Permissions'].str.startswith('d'), 'Dir', 'File')

            # Split the time stamp into date and time columns
            df[['ModifiedDate', 'Time']] = df.ModifiedTime.str.rsplit('+', n=1, expand=True)
            df['Time'] = df['Time'].str.split('.').str[0]

            # Show only files, output includes paths so you don't necessarily need to display the individual directories.
            df = df[df['Type'].str.contains('File')]

            # Set columns to show and their order.
            df = df[['FileName', 'ParentDir', 'SubDirs', 'FullPath', 'FileExt', 'ModifiedDate', 'Time', 'Size']]

            # Convert the file sizes from bytes to something more readable.
            df['Size'] = df['Size'].apply(convert_bytes)

            # Send the data to an Excel workbook with sheets by parent directory
            with pd.ExcelWriter("scripts_directory_contents.xlsx") as writer:
                for directory, data in df.groupby('ParentDir'):
                    data.to_excel(writer, sheet_name=directory, index=False)
            

            【Comments】:

              【Answer 10】:

              If you want to list files on SharePoint, this is how you do it. Your path will probably start after the "\teams\" part:

              import os
              root = r"\\mycompany.sharepoint.com@SSL\DavWWWRoot\teams\MyFolder\Policies and Procedures\Deal Docs\My Deals"
              # Named file_list so the built-in list type is not shadowed
              file_list = [os.path.join(path, name) for path, subdirs, files in os.walk(root) for name in files]
              print(file_list)
              

              【Comments】:

                【Answer 11】:

                Another option is to use the glob module from the standard library:

                import glob
                
                path = "/home/patate/directory/targetdirectory/**"
                
                for path in glob.glob(path, recursive=True):
                    print(path)
                

                If you need an iterator, you can use iglob instead:

                for file in glob.iglob(my_path, recursive=True):
                    # ...
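With recursive=True, a trailing `**` matches directories as well as files, and by default glob skips names that start with a dot. A minimal sketch that keeps only regular files, with `list_files_glob` as a hypothetical helper name:

```python
import glob
import os


def list_files_glob(root):
    """Return every regular, non-hidden file below root using a recursive glob."""
    # glob.escape protects special characters such as [ or * in the root itself
    pattern = os.path.join(glob.escape(root), "**")
    # "**" also yields directories, so filter the matches with isfile
    return [p for p in glob.iglob(pattern, recursive=True) if os.path.isfile(p)]
```

If hidden files matter, os.walk or pathlib may be the easier route, since neither skips them.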
                

                【Comments】:
