【问题标题】:How do you walk through the directories using python?你如何使用 python 遍历目录?
【发布时间】:2011-02-24 18:03:21
【问题描述】:

我有一个叫做notes的文件夹,自然会被分类成文件夹,在这些文件夹中也会有子文件夹的子分类。现在我的问题是我有一个遍历 3 级子目录的函数:

def obtainFiles(path):
      list_of_files = {}
      for element in os.listdir(path):
          # if the element is an html file then..
          if element[-5:] == ".html":
              list_of_files[element] = path + "/" + element
          else: # element is a folder therefore a category
              category = os.path.join(path, element)
              # go through the category dir
              for element_2 in os.listdir(category):
                  dir_level_2 = os.path.join(path,element + "/" + element_2)
                  if element_2[-5:] == ".html":
                      print "- found file: " + element_2
                      # add the file to the list of files
                      list_of_files[element_2] = dir_level_2
                  elif os.path.isdir(element_2):
                      subcategory = dir_level_2
                      # go through the subcategory dir
                      for element_3 in os.listdir(subcategory):
                          subcategory_path = subcategory + "/" + element_3
                        if subcategory_path[-5:] == ".html":
                            print "- found file: " + element_3
                            list_of_files[element_3] = subcategory_path
                        else:
                            for element_4 in os.listdir(subcategory_path):
                                 print "- found file:" + element_4

请注意,这仍然是一项正在进行的工作。在我眼里太丑了…… 我在这里想要实现的是遍历所有文件夹和子文件夹,并将所有文件名放在一个名为“list_of_files”的字典中,名称为“key”,完整路径为“value”。该函数目前还不能很好地工作,但想知道如何使用 os.walk 函数来做类似的事情?

谢谢

【问题讨论】:

标签: python


【解决方案1】:

根据您的简短描述,这样的事情应该可以工作:

list_of_files = {}
for (dirpath, dirnames, filenames) in os.walk(path):
    for filename in filenames:
        if filename.endswith('.html'): 
            list_of_files[filename] = os.sep.join([dirpath, filename])

【讨论】:

  • .endswith() 是查找字符串是否以字符 .html 结尾的更好解决方案
【解决方案2】:

另一种方法是使用生成器,以@ig0774 的代码为基础

import os
def walk_through_files(path, file_extension='.html'):
   for (dirpath, dirnames, filenames) in os.walk(path):
      for filename in filenames:
         if filename.endswith(file_extension): 
            yield os.path.join(dirpath, filename)

然后

for fname in walk_through_files():
    print(fname)

【讨论】:

    【解决方案3】:

    我多次遇到这个问题,但没有一个答案让我满意 - 所以创建了a script for that。在遍历目录时,Python 使用起来非常麻烦。

    它的使用方法如下:

    import file_walker
    
    
    for f in file_walker.walk("/a/path"):
         print(f.name, f.full_path) # Name is without extension
         if f.isDirectory: # Check if object is directory
             for sub_f in f.walk(): # Easily walk on new levels
                 if sub_f.isFile: # Check if object is file (= !isDirectory)
                     print(sub_f.extension) # Print file extension
                     with sub_f.open("r") as open_f: # Easily open file
                         print(open_f.read())
                    
                
    

    【讨论】:

      【解决方案4】:

      你可以这样做:

      list_of_files = dict([ (file, os.sep.join((dir, file)))
                             for (dir,dirs,files) in os.walk(path)
                             for file in files
                             if file[-5:] == '.html' ])
      

      【讨论】: