【Question title】: How to traverse through the files in a directory?
【Posted】: 2011-06-22 13:15:44
【Question】:

I have a directory of log files. I want to process each file in this directory using a Python script.

for file in directory:
      # do something

How do I do this?

【Question discussion】:

    Tags: python


    【Solution 1】:

    Use os.listdir() or os.walk(), depending on whether you want to do it recursively.
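
As a minimal sketch of the difference between the two calls (the helper names `files_in` and `files_under` are made up for illustration and are not part of the answer): `os.listdir()` returns the bare names of the entries in one directory, while `os.walk()` visits the directory and everything below it.

```python
import os

def files_in(directory):
    """Return paths of regular files directly inside `directory` (non-recursive)."""
    paths = []
    for name in sorted(os.listdir(directory)):
        # listdir returns bare names, so join them with the directory first
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            paths.append(path)
    return paths

def files_under(directory):
    """Return paths of all files in `directory` and its subdirectories (recursive)."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(directory):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)

if __name__ == "__main__":
    for path in files_in("."):
        print(path)
```

Both helpers sort their results only to make the output predictable; neither `os.listdir()` nor `os.walk()` promises a particular order.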

    【Discussion】:

      【Solution 2】:

      In Python 2, you can try something like this:

      import os.path
      
      def print_it(x, dir_name, files):
          print dir_name
          print files
      
      os.path.walk(your_dir, print_it, 0)
      

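      Note: the third argument to os.path.walk is whatever you want. You get it back as the first argument of the callback.

      In Python 3, os.path.walk has been removed; use os.walk instead. No callback is needed: you just pass it a directory and it yields (dirpath, dirnames, filenames) triples. So a rough equivalent of the above becomes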

      import os
      
      for dirpath, dirnames, filenames in os.walk(your_dir):
          print(dirpath)
          print(dirnames)
          print(filenames)
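
To get from those triples to actually processing each file, the bare names in `filenames` have to be joined with `dirpath`. The sketch below assumes, purely as an example, that "processing" means counting lines; `count_lines_per_file` is a hypothetical helper, not something from the answer.

```python
import os

def count_lines_per_file(your_dir):
    """Map each file path under `your_dir` to the number of lines it contains."""
    counts = {}
    for dirpath, dirnames, filenames in os.walk(your_dir):
        for name in filenames:
            # filenames holds bare names, so join with dirpath to get a usable path
            path = os.path.join(dirpath, name)
            # binary mode avoids decoding errors on files with unknown encodings
            with open(path, "rb") as handle:
                counts[path] = sum(1 for _ in handle)
    return counts
```

Replacing the line-counting step with your own logic is the only change needed to adapt it.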
      

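      【Discussion】:

      • Although this will work because of the way os handles path, you should always import os.path explicitly.
      • Fixed! Thanks for the comment.
      • @Ignacio Vazquez-Abrams: os is a module (not a package). import os works on Python 2.4-3.2, jython, pypy. Why would import os.path be needed?
      • @JF: Although os is not currently a package, the documentation refers to os.path, and should os ever change to a package with os.path as an actual module, code (and habits) that import only os but use os.path will break.
      【Solution 3】:

      You can recursively list every file in a directory like this.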

      from os import listdir
      from os.path import isfile, join, isdir
      
      def getAllFilesRecursive(root):
          files = [ join(root,f) for f in listdir(root) if isfile(join(root,f))]
          dirs = [ d for d in listdir(root) if isdir(join(root,d))]
          for d in dirs:
              files_in_d = getAllFilesRecursive(join(root,d))
              # paths from the recursive call already start with root, so add them unchanged
              files.extend(files_in_d)
          return files
      

      【Discussion】:

        【Solution 4】:
        import os

        # location of the directory you want to scan
        loc = '/home/sahil/Documents'
        # module-level dictionary used to store all results
        k1 = {}

        # scan() recursively scans all the directories under loc and returns a dictionary
        # that maps each directory path to the list of entries it contains
        def scan(element, loc):
            for name in element:
                temp = os.path.join(loc, name)
                try:
                    second_list = os.listdir(temp)
                except OSError:
                    # not a directory (or not readable): skip it
                    continue
                print(".....")
                print("Directory %s " % temp)
                print(" ")
                print(second_list)
                k1[temp] = second_list
                scan(second_list, temp)

            return k1  # return the dictionary

        # initial steps
        try:
            initial_list = os.listdir(loc)
            print(initial_list)
        except OSError:
            print("error")
            initial_list = []  # nothing to scan if loc cannot be read

        k = scan(initial_list, loc)
        print(" ...................................................................................")
        print(k)
        

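        I use this code as a directory scanner to build a playlist feature for my audio player; it recursively scans all the subdirectories present in the directory.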
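
A roughly equivalent dictionary can be built with os.walk, which handles the recursion itself. This is only a sketch: `scan_tree` is a hypothetical name, the order of entries in each list may differ from what os.listdir returns, and os.walk does not follow symbolic links to directories by default.

```python
import os

def scan_tree(loc):
    """Map every directory below `loc` to the list of entries it contains."""
    contents = {}
    for dirpath, dirnames, filenames in os.walk(loc):
        # skip the top-level directory itself, as the scanner above also leaves it out
        if dirpath != loc:
            contents[dirpath] = dirnames + filenames
    return contents
```

Because the result is returned rather than stored in a module-level variable, the function can be called for several directories without the results mixing.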

        【Discussion】:

          【Solution 5】:

          You can try glob:

          import glob
          
          for file in glob.glob('log-*-*.txt'):
              # Etc.
              pass
          

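          However, glob does not recurse into subdirectories with a pattern like this (as far as I know), so if your logs are in folders inside that directory, you would be better off looking at what Ignacio Vazquez-Abrams posted.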
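
For readers on Python 3.5 or later, glob.glob also accepts a `recursive=True` argument that gives the `**` pattern its recursive meaning. The sketch below reuses the `log-*-*.txt` pattern from the answer; the function name `find_logs` is made up for illustration.

```python
import glob
import os

def find_logs(directory):
    """Return files matching log-*-*.txt in `directory` and all of its subdirectories."""
    # '**' matches zero or more directory levels, but only when recursive=True
    pattern = os.path.join(directory, "**", "log-*-*.txt")
    return sorted(glob.glob(pattern, recursive=True))
```

Without `recursive=True`, `**` behaves like a plain `*` and matches a single directory level only.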

          【Discussion】:

            【Solution 6】:

            If you need to check for multiple file types, use

            glob.glob("*.jpg") + glob.glob("*.png")
            

            Glob does not care about the order of the files in the list. If you need the files sorted by filename, use

            sorted(glob.glob("*.jpg"))
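
The two ideas can be combined. As a sketch (the function name `list_images` and its `patterns` parameter are assumptions, not part of the answer), the helper below collects matches for several patterns in a given directory and sorts them by filename:

```python
import glob
import os

def list_images(directory, patterns=("*.jpg", "*.png")):
    """Return files in `directory` matching any of `patterns`, sorted by filename."""
    matches = []
    for pattern in patterns:
        matches.extend(glob.glob(os.path.join(directory, pattern)))
    # sort on the bare filename rather than on the full path
    return sorted(matches, key=os.path.basename)
```

Sorting after concatenating matters: sorting each glob result separately would leave all the .jpg files ahead of all the .png files.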
            

            【Discussion】:

              【Solution 7】:
              import os
              rootDir = '.'
              for dirName, subdirList, fileList in os.walk(rootDir):
                  print('Found directory: %s' % dirName)
                  for fname in fileList:
                      print('\t%s' % fname)
                  # Remove the first entry in the list of sub-directories
                  # if there are any sub-directories present
                  if len(subdirList) > 0:
                      del subdirList[0]
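
The `del subdirList[0]` step works because os.walk, in its default top-down mode, only descends into the directories still present in that list. A sketch of the same technique applied by name rather than by position follows; `walk_skipping` and the default `skip_names` are hypothetical.

```python
import os

def walk_skipping(root_dir, skip_names=(".git", "__pycache__")):
    """Yield file paths under `root_dir` without descending into directories in `skip_names`."""
    for dir_name, subdir_list, file_list in os.walk(root_dir):
        # Slice assignment edits the very list os.walk holds, so the pruning takes effect;
        # rebinding the name (subdir_list = [...]) would not.
        subdir_list[:] = [d for d in subdir_list if d not in skip_names]
        for fname in file_list:
            yield os.path.join(dir_name, fname)
```

This only works with the default `topdown=True`; in bottom-up mode the subdirectories have already been visited by the time the list is available.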
              

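              【Discussion】:

                【Solution 8】:

                Here is my version of a recursive file walker, based on Matheus Araujo's answer. It takes optional exclusion list parameters, which is very useful when dealing with tree copies where certain directories, files, or file extensions are not wanted.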

                import os
                
                def get_files_recursive(root, d_exclude_list=[], f_exclude_list=[], ext_exclude_list=[], primary_root=None):
                    """
                    Walk a path to recursively find files
                    Modified version of https://stackoverflow.com/a/24771959/2635443 that includes exclusion lists
                    :param root: path to explore
                    :param d_exclude_list: list of root relative directories paths to exclude
                    :param f_exclude_list: list of filenames without paths to exclude
                    :param ext_exclude_list: list of file extensions to exclude, ex: ['.log', '.bak']
                    :param primary_root: Only used for internal recursive exclusion lookup, don't pass an argument here
                    :return: list of files found in path
                    """

                    # Make sure we use a valid os separator for exclusion lists, this is done recursively :(
                    d_exclude_list = [os.path.normpath(d) for d in d_exclude_list]

                    files = [os.path.join(root, f) for f in os.listdir(root) if os.path.isfile(os.path.join(root, f))
                             and f not in f_exclude_list and os.path.splitext(f)[1] not in ext_exclude_list]
                    dirs = [d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))]
                    for d in dirs:
                        p_root = os.path.join(primary_root, d) if primary_root is not None else d
                        if p_root not in d_exclude_list:
                            files_in_d = get_files_recursive(os.path.join(root, d), d_exclude_list, f_exclude_list, ext_exclude_list, primary_root=p_root)
                            # paths from the recursive call already start with root, so add them unchanged
                            files.extend(files_in_d)
                    return files
                

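                【Discussion】:

                  【Solution 9】:

                  This is an update of my previous version that accepts glob-style wildcards in the exclusion lists. The function walks into every subdirectory of the given path and returns a list of all the files in those directories, as paths that start with the given root. It works like Matheus's answer and can take optional exclusion lists.

                  For example: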

                  files = get_files_recursive('/some/path')
                  files = get_files_recursive('/some/path', f_exclude_list=['.cache', '*.bak'])
                  files = get_files_recursive('C:\\Users', d_exclude_list=['AppData', 'Temp'])
                  files = get_files_recursive('/some/path', ext_exclude_list=['.log', '.db'])
                  

                  I hope this helps someone, the way the initial answers in this thread helped me :)

                  import os
                  from fnmatch import fnmatch
                  
                  def glob_path_match(path, pattern_list):
                      """
                      Checks if path is in a list of glob style wildcard paths
                      :param path: path of file / directory
                      :param pattern_list: list of wildcard patterns to check for
                      :return: Boolean
                      """
                      return any(fnmatch(path, pattern) for pattern in pattern_list)
                  
                  
                  def get_files_recursive(root, d_exclude_list=None, f_exclude_list=None, ext_exclude_list=None, primary_root=None):
                      """
                      Walk a path to recursively find files
                      Modified version of https://stackoverflow.com/a/24771959/2635443 that includes exclusion lists
                      and accepts glob style wildcards on files and directories
                      :param root: path to explore
                      :param d_exclude_list: list of root relative directories paths to exclude
                      :param f_exclude_list: list of filenames without paths to exclude
                      :param ext_exclude_list: list of file extensions to exclude, ex: ['.log', '.bak']
                      :param primary_root: Only used for internal recursive exclusion lookup, don't pass an argument here
                      :return: list of files found in path
                      """
                  
                      if d_exclude_list is not None:
                          # Make sure we use a valid os separator for exclusion lists, this is done recursively :(
                          d_exclude_list = [os.path.normpath(d) for d in d_exclude_list]
                      else:
                          d_exclude_list = []
                      if f_exclude_list is None:
                          f_exclude_list = []
                      if ext_exclude_list is None:
                          ext_exclude_list = []
                  
                      files = [os.path.join(root, f) for f in os.listdir(root) if os.path.isfile(os.path.join(root, f))
                               and not glob_path_match(f, f_exclude_list) and os.path.splitext(f)[1] not in ext_exclude_list]
                      dirs = [d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))]
                      for d in dirs:
                          p_root = os.path.join(primary_root, d) if primary_root is not None else d
                          if not glob_path_match(p_root, d_exclude_list):
                              files_in_d = get_files_recursive(os.path.join(root, d), d_exclude_list, f_exclude_list, ext_exclude_list,
                                                               primary_root=p_root)
                              # paths from the recursive call already start with root, so add them unchanged
                              files.extend(files_in_d)
                      return files
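
To make the wildcard behaviour concrete, the short sketch below copies `glob_path_match` from the answer and tries a few patterns. The sample names are made up; the point is that fnmatch compares the whole string against the pattern, so a directory pattern without a wildcard only excludes an exact match.

```python
import os
from fnmatch import fnmatch

def glob_path_match(path, pattern_list):
    """Return True if `path` matches any glob-style pattern in `pattern_list`."""
    return any(fnmatch(path, pattern) for pattern in pattern_list)

nested = os.path.join("AppData", "Local")  # a root-relative directory path

print(glob_path_match("notes.bak", [".cache", "*.bak"]))  # True: matches '*.bak'
print(glob_path_match("notes.txt", [".cache", "*.bak"]))  # False: matches neither pattern
print(glob_path_match(nested, ["AppData"]))               # False: the whole string must match
print(glob_path_match(nested, ["AppData*"]))              # True: '*' also matches the separator
```

In the function above this is harmless for nested directories, because an excluded parent such as `AppData` is never descended into in the first place.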
                  

                  【Discussion】:
