【Question title】: Best way to search through folders and delete files if they exist in a list?
【Posted】: 2024-05-22 21:10:02
【Question】:

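I created a list containing the file paths of files I want to delete. What is the most Pythonic way to search through a folder and its subfolders for these files and then delete them?

Currently I loop over the list of file paths, then walk the directory tree and compare each file in the tree with the file from the list. There has to be a better way.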

for x in features_to_delete:

    name_checker = str(x) + '.jpg'
    print 'this is name checker {}'.format(name_checker)

    for root, dir2, files in os.walk(folder):
        print 'This is the root directory at the moment:{} The following are files inside of it'.format(root)

        for b in files:
            if b.endswith('.jpg'):
                local_folder = os.path.join(folder, root)
                print 'Here is name of file {}'.format(b)
                print 'Here is name of name checker {}'.format(name_checker)

                if b == name_checker:
                    counter += 1
                    print '{} needs to be deleted..'.format(b)
                    #os.remove(os.path.join(local_folder, b))
                    print 'Removed {} \n'.format(os.path.join(local_folder, b))

                else:
                    print 'This file can stay {} \n'.format(b)
            else:
                pass

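To clarify: I loop over the entire list of features to delete, and on every iteration I also loop over every file in the directory and all of its subdirectories, comparing each file with the feature currently being processed from the delete list. This takes a long time and seems like a terrible approach.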

【Comments on the question】:

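  • Unfortunately, I'm actually on 2.7. I'm using it with some GIS functionality that only supports 2.7.
  • His link is for Python 2? I don't see the problem.
  • As of Python 3.5, glob gained recursive support, which would simplify this code. With Python 2, it will never be much different from what the OP already posted.
  • I'm confused by "features": is each one a path to a directory like "C:\home\"? How do you derive the names of the files to delete? Is it something like "C:\home*.jpg"? And since "folder" is never set in the code you showed, what is it?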
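For readers who are not tied to 2.7, the recursive glob mentioned in the comments could look roughly like the sketch below. It assumes Python 3.5+ and assumes, as in the question, that each feature maps to a file named `<feature>.jpg`; the function name `find_listed_jpgs` is made up for this illustration.

```python
import glob
import os


def find_listed_jpgs(folder, features_to_delete):
    """Return paths of <feature>.jpg files anywhere under folder.

    Hypothetical helper; needs Python 3.5+ for glob's recursive=True.
    """
    delete_set = {str(x) + '.jpg' for x in features_to_delete}
    # With recursive=True, '**' matches the folder itself and all subdirectories.
    pattern = os.path.join(glob.escape(folder), '**', '*.jpg')
    return [p for p in glob.glob(pattern, recursive=True)
            if os.path.basename(p) in delete_set]
```

Each returned path can then be passed to `os.remove`. One difference from `os.walk` worth knowing: by default glob skips files and directories whose names start with a dot.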

Tags: python python-2.7 list python-3.x


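【Answer 1】:

You should visit each directory only once. You can use sets to compare the list of file names in a given directory with your delete list. Working out which files are on the list and which are not then becomes a simple one-step operation. If you don't care about printing the file names, it is fairly compact: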

import os

# Build the set of target file names once, then walk the tree a single time.
delete_set = set(str(x) + '.jpg' for x in features_to_delete)
for root, dirs, files in os.walk(folder):
    for delete_name in delete_set.intersection(files):
        os.remove(os.path.join(root, delete_name))
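Since deletion is irreversible, it may be worth wrapping the same idea in a function with a dry-run switch. This is only a sketch: the name `delete_listed_files` and the `dry_run` parameter are assumptions added here, not part of the answer. It runs on both Python 2.7 and 3.

```python
import os


def delete_listed_files(folder, features_to_delete, dry_run=True):
    """Walk folder once and remove every <feature>.jpg on the delete list.

    Returns the full paths that matched. With dry_run=True (the default
    chosen for this sketch) nothing is actually removed.
    """
    delete_set = set(str(x) + '.jpg' for x in features_to_delete)
    matched = []
    for root, dirs, files in os.walk(folder):
        # One set intersection per directory replaces the name-by-name loop.
        for delete_name in delete_set.intersection(files):
            full_path = os.path.join(root, delete_name)
            matched.append(full_path)
            if not dry_run:
                os.remove(full_path)
    return matched
```

Calling it once with the default `dry_run=True` and inspecting the returned list before calling it again with `dry_run=False` gives a chance to catch a wrong `folder` or a malformed feature name.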

But if you want to print as you go, you have to add a few intermediate variables:

delete_set = set(str(x) + '.jpg' for x in features_to_delete)
for root, dirs, files in os.walk(folder):
    files = set(files)
    delete_these = delete_set & files
    keep_these = files - delete_set
    print 'This is the root directory at the moment:{} The following are files inside of it'.format(root)
    print 'delete these: {}'.format('\n '.join(delete_these))
    print 'keep these: {}'.format('\n '.join(keep_these))
    for delete_name in delete_these:
        os.remove(os.path.join(root, delete_name))
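To see what the two set operations produce without touching the file system, here is a tiny illustration using made-up file names:

```python
# Made-up names, purely for illustration.
delete_set = set(['a.jpg', 'b.jpg', 'z.jpg'])
files = set(['a.jpg', 'b.jpg', 'c.jpg', 'notes.txt'])

delete_these = delete_set & files   # on the delete list and present here
keep_these = files - delete_set     # present here but not on the delete list

print(sorted(delete_these))  # ['a.jpg', 'b.jpg']
print(sorted(keep_these))    # ['c.jpg', 'notes.txt']
```

Note that `z.jpg` appears in neither result: it is on the delete list but does not exist in this directory, so there is nothing to do for it.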

【Comments】:

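    【Answer 2】:

    Create a function that separates the recursive glob-like traversal from your own deletion logic. Then simply iterate over its results and delete anything that matches your blacklist.

    You can build a set to speed up the file name matching. The larger the list, the bigger the improvement; for small lists the difference is probably negligible.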

    from fnmatch import fnmatch
    import os
    from os import path
    
    def globber(rootpath, wildcard):
        for root, dirs, files in os.walk(rootpath):
            for file in files:
                if fnmatch(file, wildcard):
                    yield path.join(root, file)
    
    features_to_delete = ['blah', 'oh', 'xyz']
    
    todelete = {'%s.jpg' % x for x in features_to_delete}
    
    print(todelete)
    for f in globber('/home/prooney', "*.jpg"):
        # globber yields full paths, so compare only the file name with the set.
        if path.basename(f) in todelete:
            print('deleting file: %s' % f)
            os.remove(f)
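One detail to watch with this design: `globber` yields full paths, while `todelete` holds bare file names, so a membership test has to go through `os.path.basename`. A small illustration with a hypothetical path:

```python
import os

todelete = {'blah.jpg', 'oh.jpg', 'xyz.jpg'}
candidate = os.path.join('photos', 'trip', 'blah.jpg')  # hypothetical path

print(candidate in todelete)                    # False
print(os.path.basename(candidate) in todelete)  # True
```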
    

    【Comments】:

      【Answer 3】:

      See whether this code helps you. I included a timer to compare how long the two approaches take.

      import os
      from timeit import default_timer as timer
      
      features_to_delete = ['a','b','c']
      start = timer()
      for x in features_to_delete:
      
          name_checker = str(x) + '.jpg'
          print 'this is name checker {}'.format(name_checker)
          folder = '.'
          for root, dir2, files in os.walk(folder):
              print 'This is the root directory at the moment:{} The following are files inside of it'.format(root)
      
              for b in files:
                  if b.endswith('.jpg'):
                      local_folder = os.path.join(folder, root)
                      print 'Here is name of file {}'.format(b)
                      print 'Here is name of name checker {}'.format(name_checker)
                      counter = 0
                      if b == name_checker:
                          counter += 1
                          print '{} needs to be deleted..'.format(b)
                          os.remove(os.path.join(local_folder, b))
                          print 'Removed {} \n'.format(os.path.join(local_folder, b))
      
                      else:
                          print 'This file can stay {} \n'.format(b)
                  else:
                      pass
      
      end = timer()
      print(end - start)
      
      start = timer()
      features_to_delete = ['d','e','f']
      folder = '.'
      # Append the extension once, up front, so the walk below compares plain names.
      features_to_delete = [e + '.jpg' for e in features_to_delete]
      print 'features' + str(features_to_delete)
      for root, dirnames, filenames in os.walk(folder):
          # Set intersection yields only the names that are on the delete list.
          for filename in set(filenames).intersection(features_to_delete):
              local_folder = os.path.join(folder, root)
              os.remove(os.path.join(local_folder, filename))
              print 'Removed {} \n'.format(os.path.join(local_folder, filename))
      end = timer()
      print(end - start)
      

      Test

      $ touch foo/bar/d.jpg
      $ touch foo/bar/b.jpg
      $ python deletefiles.py 
      this is name checker a.jpg
      This is the root directory at the moment:. The following are files inside of it
      This is the root directory at the moment:./.idea The following are files inside of it
      This is the root directory at the moment:./foo The following are files inside of it
      This is the root directory at the moment:./foo/bar The following are files inside of it
      Here is name of file d.jpg
      Here is name of name checker a.jpg
      This file can stay d.jpg 
      
      Here is name of file b.jpg
      Here is name of name checker a.jpg
      This file can stay b.jpg 
      
      this is name checker b.jpg
      This is the root directory at the moment:. The following are files inside of it
      This is the root directory at the moment:./.idea The following are files inside of it
      This is the root directory at the moment:./foo The following are files inside of it
      This is the root directory at the moment:./foo/bar The following are files inside of it
      Here is name of file d.jpg
      Here is name of name checker b.jpg
      This file can stay d.jpg 
      
      Here is name of file b.jpg
      Here is name of name checker b.jpg
      b.jpg needs to be deleted..
      Removed ././foo/bar/b.jpg 
      
      this is name checker c.jpg
      This is the root directory at the moment:. The following are files inside of it
      This is the root directory at the moment:./.idea The following are files inside of it
      This is the root directory at the moment:./foo The following are files inside of it
      This is the root directory at the moment:./foo/bar The following are files inside of it
      Here is name of file d.jpg
      Here is name of name checker c.jpg
      This file can stay d.jpg 
      
      0.000916957855225
      features['d.jpg', 'e.jpg', 'f.jpg']
      Removed ././foo/bar/d.jpg 
      
      0.000241994857788
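The two timings above are not quite like-for-like: the first run prints a line per file and the two runs look for different names. A rough sketch of a more even comparison follows. It only searches and never deletes, builds its own throwaway tree, and all helper names and tree sizes are assumptions made for this illustration. It runs on Python 2.7 and 3.

```python
import os
import shutil
import tempfile
from timeit import default_timer as timer


def nested_scan(folder, names):
    """Original approach: one full os.walk per name."""
    hits = []
    for name in names:
        for root, dirs, files in os.walk(folder):
            for f in files:
                if f == name:
                    hits.append(os.path.join(root, f))
    return hits


def single_scan(folder, names):
    """Set-based approach: one os.walk in total."""
    wanted = set(names)
    hits = []
    for root, dirs, files in os.walk(folder):
        for f in wanted.intersection(files):
            hits.append(os.path.join(root, f))
    return hits


def build_tree(n_dirs=20, n_files=50):
    """Create a throwaway tree of empty .jpg files (sizes are arbitrary)."""
    base = tempfile.mkdtemp()
    for d in range(n_dirs):
        sub = os.path.join(base, 'dir%d' % d)
        os.mkdir(sub)
        for i in range(n_files):
            open(os.path.join(sub, '%d.jpg' % i), 'w').close()
    return base


if __name__ == '__main__':
    base = build_tree()
    names = ['%d.jpg' % i for i in range(0, 50, 5)]
    try:
        for scan in (nested_scan, single_scan):
            start = timer()
            hits = scan(base, names)
            print('%s: %d matches in %.6f s'
                  % (scan.__name__, len(hits), timer() - start))
    finally:
        shutil.rmtree(base)
```

Both functions return the same set of paths; only the number of directory walks differs, which is where the time goes as the delete list grows.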
      

      【Comments】: