Untar specific files in specific directories and account for any error cases? (Answers)

【Question title】: Untar specific files in specific directories and account for any error cases?
【Posted】: 2021-02-17 17:14:47
【Question】:

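I am trying to reduce the following code and possibly turn it into a function that can be reused for untarring files. Currently it does the following:

1. Traverse a directory and look in folders that belong to the current year and month (YYMM*, where * is "don't care") and contain a specific file (for example file.tar).
2. Test the folder and the desired tar file to see whether specific read/write permissions are blocked; if so, generate a .txt file as a form of logging, without allowing duplicate references to locked files.
3. For files that are not locked (locked meaning read/write denied) and that match the specific file I am looking for (for example file.tar), untar the file and leave the contents in the original folder.
4. When the untar is complete, remove the tar file and leave its contents in the folder.

Currently the only way I can think of to find out whether a file or folder is locked is through hard-coded values.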

import os, re, tarfile
from datetime import datetime
dateTimeObj = datetime.now()
curr = dateTimeObj.strftime('%y%m.')
path = r'C:/Users/UserName/Documents/TestFolder/Folder/'
Path_to_example_tarfile_parent_list = [] #Defines list for example specific folders
RXList = []

def oswalk_directory(your_path):
    for directory_path, subdirectories, files in os.walk(your_path):
        for each_folder_name in subdirectories:
             #Add path+folder_name to end of each folder path
            for each_folder_name in subdirectories:
                Path_to_example_tarfile_parent_list.append(path+each_folder_name)
                #print (each_folder_name)
                if re.search('example_Logs', each_folder_name) :#Traverse directories specific directories that have example_Logs folder
                    Path_to_example_tarfile_parent_list.append(path+each_folder_name)
oswalk_directory(path)    
#Create a list of directories to traverse in current year and month:
print(os.getcwd())                           
print (Path_to_example_tarfile_parent_list)           
for i in range(len(Path_to_example_tarfile_parent_list)):
    #If a directory/folder has write permissions
    if(os.stat(Path_to_example_tarfile_parent_list[i]).st_mode == 16895):
        print("Checking file permissions RW = ok")
        
        for  directory_path, subdirectories, files in os.walk(Path_to_example_tarfile_parent_list[i]):       
            for each_folder_name in subdirectories:
                print ("Just before checking for example_Logs")
                if re.search('example_Logs', each_folder_name) :#Traverse directories specific directories that have example_Logs folder
                    isFile = False
                    print("If is not a file check")
                    print("Print path to file")
                    print (os.path.abspath(each_folder_name))
                    print(each_folder_name)
                    RXList.append((directory_path+'/'+each_folder_name).replace("\\","/")) #Append new list of folders to traverse, replace double slashes with single
                    isFile = os.path.isfile(directory_path+'/'+each_folder_name+'/example.tar')#Check if file exists in path

                    if isFile == True:
                        print("If is a file check")
                        if(os.stat(directory_path+'/'+each_folder_name+'/example.tar').st_mode == 33206):#Permissions for tar/archive file
                            #print (tarfile.info(root+'/'+each_folder_name+'/example.tar'))
                            print("Open tar file")
                            print(directory_path)

                            print(directory_path+'/'+each_folder_name+'/')
                            t = tarfile.open(directory_path+'/'+each_folder_name+'/example.tar')
                            for filename in ['example.tar']:
                                try:
                                    f = t.extractfile(filename)
                                except KeyError:
                                    print("Did not find tar filename")
                                else:
                                    print("Found file")
                            #tarfile.extract(directory_path+'/'+each_folder_name+'/')
                            #tarfile.extractfile(directory_path+'/'+each_folder_name+'/') #extract tar file contents to folder

                            t.close()
                            print("Close tar file after extraction")
                            #os.remove(directory_path+'/'+name+'/example.tar')
                        elif(os.stat(directory_path+'/'+each_folder_name+'/example.tar').st_mode == 33060): #Else if: no write permissions, break
                            print("Break if file is not writeable")
                            break
                    else:#else, there is no example tar file
                        break
                                                  
    #If a directory has write permissions are denied                    
    if(os.stat(Path_to_example_tarfile_parent_list[i]).st_mode == 16749):
        print("If directory has write permissions denied then proceed to opening text file")
        found=False#Set found (duplicate indicator) to false prior to loop
        #Check to see if No_Write_Permission_Folder exists to store files with denied permissions 
        isFile = os.path.isfile(path+'tmp/No_Write_Permission_To_example_Folder'+curr+'txt')
        if isFile == False:
          f=open(path+'tmp/No_Write_Permission_To_example_Folder'+curr+'txt','w+')
          f.close()
        else:
            with open(path+'tmp/No_Write_Permission_To_example_Folder'+curr+'txt', 'r') as Readfile:
                for line in Readfile:#For each line in txt file

                    if re.search(Path_to_example_tarfile_parent_list[i], line): #If current folder matches current line in txt file
                        found=True #Set found (duplicate) to True, matching line found in txt file 
                        break #terminate from inner loop
                if found == False:
                    with open(path+'tmp/No_Write_Permission_To_example_Folder'+curr+'txt', 'a') as no_write_file:
                        no_write_file.seek(0,0) #Set cursor to beginning of file to allow line-by-line printing 
                        no_write_file.write(Path_to_example_tarfile_parent_list[i].replace("\\","/")+'\n')
f = open(path+'/start_script.txt', 'a')
f.close()

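【Comments on the question】:

It is unclear what you are asking. Are you asking people to reduce your code in general? Are you asking how to check r/w permissions better? Or are you just asking people to debug your code?
Reduce it in general: the hard-coding in the directory traversal, the read/write permission tests, and the error-case tests.
That is because I am not as experienced with Python as most people here.
This sounds like it might be better as a bash script. Is Python a requirement? If so, would you accept a Python answer that uses subprocess to call a bash script?
When you say you want to check whether "it has specific read/writing privileges blocked", can you rephrase that in terms of Linux permissions and Owner / Group / Public? Which of those permission groups needs read/write?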

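Tags: python list function os.walk os.path

【Solution 1】:

For your "denied" files, use a set to store the file paths instead of a file. You can write the set out to a file at the end, or after you have collected a million directories or whatever, but you probably won't need to.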

my_set={'not_allowed_twice'}
my_set.add('not_allowed_twice')
my_set.add('this_is_fine')
my_set
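As a minimal sketch of the "write it out at the end" step, the set can be dumped to a log file once the traversal is finished. The helper name `write_denied_log`, the example paths, and the log file name are illustrative assumptions, not part of the original answer.

```python
import os
import tempfile

def write_denied_log(denied_paths, log_path):
    """Write every denied path once, sorted, one per line."""
    with open(log_path, "w", encoding="utf-8") as log_file:
        for denied_path in sorted(denied_paths):
            # Normalise Windows backslashes so the log is uniform.
            log_file.write(denied_path.replace("\\", "/") + "\n")

# Hypothetical paths; the set silently drops the duplicate.
denied = set()
denied.add("C:/logs/2102A/example_Logs")
denied.add("C:/logs/2102A/example_Logs")
denied.add("C:/logs/2102B/example_Logs")

# A temporary folder stands in for the real log location.
log_path = os.path.join(tempfile.mkdtemp(), "denied.txt")
write_denied_log(denied, log_path)
```

Because the set already guarantees uniqueness, there is no need to re-read the text file to look for duplicates before each write.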

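To get a single function, use a recursive function to do the traversal instead of traversing once and then looping. It would probably be more elegant, though, to split the logic into several suitable functions and then expose this code as a package with a single-function interface such as untar_tree(). Here is your recursive walker: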

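import os

def walk_it(folder):
    """Recursively visit every file and subfolder below folder."""
    if not os.path.isdir(folder):
        return
    for name in os.listdir(folder):
        # os.listdir() returns bare names, so rebuild the full path
        full_path = os.path.join(folder, name)
        if os.path.isfile(full_path):
            # do your logic here (is the file writable, if not, add it to a set)
            print(full_path)
        elif os.path.isdir(full_path):
            # do your logic here (is the folder writable, if not, add it to another set)
            print("Keep on walking " + full_path)
            walk_it(full_path)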
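Don't worry about hard-coding file permissions; they are not going to change on your system. You can use stat.filemode(mode) to convert the permissions into a -rwxrwxrwx string, which makes it easier to see what is going on.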
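For illustration, this is what stat.filemode() makes of two of the numbers hard-coded in the question. The `describe` helper and its use of os.access() are assumptions added here as one possible way to ask "can the current user read and write this?" without comparing raw mode integers; the result of os.access() depends on the platform and the user running the script.

```python
import os
import stat
import tempfile

def describe(path):
    """Return (mode_string, readable_and_writable) for path."""
    mode = os.stat(path).st_mode
    can_read_write = os.access(path, os.R_OK | os.W_OK)
    return stat.filemode(mode), can_read_write

print(stat.filemode(33206))  # -rw-rw-rw-  (the tar file mode in the question)
print(stat.filemode(16895))  # drwxrwxrwx  (the folder mode in the question)

# The system temp folder is used only because it exists everywhere.
print(describe(tempfile.gettempdir()))
```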

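【Comments】:

So this will walk the directories with the appropriate logic (locked/unlocked, specific name). Then I would need to add the text file that records any locked files (no duplicates allowed in the txt file). It looks like a lot less code?
The first code snippet shows how to use a set instead of a file to store your de-duplicated list of files; the second is a single function that walks your directory hierarchy. Add your file permission tests and do the untar work where I left the comments, and once the function has finished, use your sets to write the de-duplicated "denied" files and folders to a log file.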
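The "untar work" mentioned in this comment might look roughly like the sketch below. The function name `untar_in_place` and the decision to treat any OSError or tarfile.TarError as "denied" are assumptions for illustration. Note that extractall() trusts the member paths stored in the archive, so this is only suitable for archives you produced yourself.

```python
import os
import tarfile

def untar_in_place(tar_path, denied):
    """Extract tar_path into its own folder, then delete the archive.

    Returns True on success. On failure the archive path is added to
    the `denied` set and the archive itself is left in place.
    """
    folder = os.path.dirname(tar_path)
    try:
        with tarfile.open(tar_path) as archive:
            archive.extractall(folder)
        # Only remove the archive after a successful extraction.
        os.remove(tar_path)
    except (OSError, tarfile.TarError):
        # PermissionError is a subclass of OSError; unreadable or
        # corrupt archives raise tarfile.TarError subclasses.
        denied.add(tar_path)
        return False
    return True
```

Catching the exception instead of testing the mode up front means the same code path covers permission problems, missing files, and broken archives.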

【Solution 2】:

Here is a solution in bash. I have annotated the script with comments taken from your requirements. I have tested this script and it works well.

#!/bin/bash

# 1. Traverse a directory and look in folders that are of the current year and
# month (YYMM*) "where * is don't care" and contain a specific file (example file.tar)
FILENAME="file.tar"
find . -type f -iname "${FILENAME}" -newermt "$(date '+%Y-%m-')1" | while IFS= read -r F
do
    DIR=$(dirname "${F}")
    # 2. Test the folder and desired tar file to see if it has specific read/writing privileges blocked, if so
    # generate a .txt file as a form of logging and do not allow duplicate references of locked files
    if stat -c %A "$DIR" | grep -q 'drw.rw.rw.' && stat -c %A "$F" | grep -q '.rw.rw.rw.'
    then
        # 3. With the files that are not locked (read/write denied) and contain the specific file I am
        # looking for (example file.tar), untar the file and leave contents in the original folder
        tar -xvf "$F" -C "$DIR"
        # 4. When untar is complete, remove tar file and leave tar file contents in folder
        rm -f "$F"
    else
        touch "$F"_lock.txt
    fi
done

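There are two things I could not work out from reading your question:

Which owner/group/public permissions do you want to filter on? I assumed -rw-rw-rw-.
Are the tar archives compressed as tar.gz, or are they plain tar files? If they can be in .gz format, we will have to change the tar -xvf "$F" -C "$DIR" line to include the -z flag.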
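If the work is done in Python instead, the plain-versus-gzip question matters less: tarfile.open() in mode "r:*" detects the compression on its own. The sketch below builds two tiny archives just to demonstrate that; the helper name `list_members` and the file names are made up for the example.

```python
import io
import os
import tarfile
import tempfile

def list_members(archive_path):
    """Return member names; mode "r:*" detects the compression itself."""
    with tarfile.open(archive_path, "r:*") as archive:
        return archive.getnames()

# Build one plain and one gzip-compressed archive to compare.
work_dir = tempfile.mkdtemp()
payload = b"log line\n"
for name, mode in (("plain.tar", "w"), ("packed.tar.gz", "w:gz")):
    with tarfile.open(os.path.join(work_dir, name), mode) as archive:
        info = tarfile.TarInfo("inner.txt")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))

print(list_members(os.path.join(work_dir, "plain.tar")))      # ['inner.txt']
print(list_members(os.path.join(work_dir, "packed.tar.gz")))  # ['inner.txt']
```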

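【Comments】:

I would rather keep it written in Python. As for permissions, it is rw for everyone. I will try to export either a .txt or a csv that keeps a log of the files that could not be untarred. The script will search each folder for a specific file and then continue through the directory of folders named in the YYMMDD format.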
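Staying in Python, the folder filter described in this comment could be sketched as follows. It assumes the dated folders sit directly under one root and that their names start with the two-digit year and month; the function name `current_month_folders` and the optional `now` parameter (there so the example is repeatable) are illustrative choices, not something from the thread.

```python
import os
import tempfile
from datetime import datetime

def current_month_folders(root, now=None):
    """Return subfolders of root whose names start with the YYMM of `now`."""
    prefix = (now or datetime.now()).strftime("%y%m")
    return sorted(
        entry.path
        for entry in os.scandir(root)
        if entry.is_dir() and entry.name.startswith(prefix)
    )

# Throwaway folder tree standing in for the real data directory.
root = tempfile.mkdtemp()
for name in ("210217", "2102_extra", "210105"):
    os.mkdir(os.path.join(root, name))

found = current_month_folders(root, now=datetime(2021, 2, 17))
print([os.path.basename(p) for p in found])  # ['210217', '2102_extra']
```

A plain startswith() test avoids the regular expression and the hard-coded trailing dot used in the question's strftime pattern.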