【Question title】:A script that searches through the text of all sub-directories' files for a string then prints to a created file
【Posted】:2023-02-26 20:12:47
【Question】:

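I'm a scripting newbie and I'm stuck on this problem.

I want the code to do several things:

  1. Ask the user to enter a string to search for.
  2. Walk the sub-directories of a given file path.
  3. Pick out files that have one of the listed extensions.
  4. Open each file and search it for the string the user entered.
  5. Print the search results to a text file.

The code seems to take a while to run, but nothing comes out.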

    import os.path
    
    # Ask the user to enter string to search
    search_str = input("Keyword or phrase:\n")
    
    # Store file names for later printing 
    file_names = []
    
    # Path to search 
    path = os.path.dirname(os.path.realpath(__file__))
    
    # Acceptable file extensions
    extensions = {".xlsx", ".txt", ".pdf", ".doc", ".docx", ".mb", ".xlsm", ".xltx", ".xltm"}
    
    # Create file to store search results
    search_files = open('search.txt', 'w')
    search_files.write(f'I searched for "{search_str}" in your files.\n\nHere is what I found:\n\n')
    
    
    # Program to search files for keyword
    def search_all_files_by_keyword(path):
    
        # Store file count number
        file_count = 0
    
        for root, dirs, files in os.walk(path):
    
            for file in files:
    
                try:
    
                    # Apply file type filter, search for acceptable ext in extension
                    ext = os.path.splitext(file)
                    if ext in extensions:
    
                        # Define file pathway
                        file_path = os.path.join(root, file)
    
                        # Open file for reading
                        with open(file, 'r') as f:
    
                            # Read file and search for keyword or phrase
                            if search_str in f.read():
    
                                # Add file path to file_names and increase file_count, then close file
                                file_names.append(file_path)
                                file_count += 1
                                f.close()
    
                            # If keyword or phrase is not found, do nothing and close file
                            else:
                                f.close()
    
                except:
                    pass
    
        # Print search results to file
        if file_count >= 1:
            search_files.write(f"{file_names}\n")
        else:
            search_files.write(f'No results found for "{search_str}".')
    
    # Run program 
    search_all_files_by_keyword(path)
    

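【Comments】:

  • Consider using pathlib.Path instead.
  • It should work for txt, so write a hello world in plain text to test it. For most other types, though, you generally need some kind of index filter to act as the file handler. For example, you can use Acrobat's iFilter for PDF; otherwise you need to parse each PDF file with pdfgrep or an equivalent. On Windows, iFilters are part of the operating system, but you still need the proprietary vendor variants from Adobe, Autodesk, Microsoft, and so on.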
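
To illustrate the point about non-text formats: a .docx file is a ZIP archive whose main body is stored in the member `word/document.xml`, so reading it with a plain `open(..., 'r')` will not find the text. Below is a rough sketch using only the standard library. The function name `docx_contains` is made up for this example, and the tag stripping is a crude approximation (XML entities are not decoded, and headers, footers, and footnotes are ignored), so treat it as a starting point rather than a proper parser.

```python
import re
import zipfile


def docx_contains(docx_path, search_str):
    """Rough check: does the main body of a .docx contain `search_str`?"""
    try:
        with zipfile.ZipFile(docx_path) as archive:
            # The main document body lives in this archive member
            xml = archive.read("word/document.xml").decode("utf-8", errors="ignore")
    except (zipfile.BadZipFile, KeyError, OSError):
        # Not a valid .docx, or unreadable: treat as no match
        return False
    # Drop XML tags so text split across formatting runs is rejoined
    text = re.sub(r"<[^>]+>", "", xml)
    return search_str in text
```

Other formats in the extension list (.pdf, .doc, .xlsx and so on) would each need their own handler along the same lines.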

Tags: python os.walk


【Solution 1】:

Try the pathlib module to search all folders and sub-folders, like this:


    import re
    from pathlib import Path

    # Ask the user to enter the string to search for
    search_str = input("Keyword or phrase:\n")

    # Store file names for later printing
    file_names = []

    # Path to search
    path = Path("path/to/directory")  # Replace with your actual directory path

    # Acceptable file extensions
    extensions = {".xlsx", ".txt", ".pdf", ".doc", ".docx", ".mb"}

    # Search for the string in a single file
    def search_in_file(file_path, search_str):
        # errors='ignore' keeps binary formats (.pdf, .xlsx, ...) from raising
        # UnicodeDecodeError; their contents still need a dedicated parser
        with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
            file_content = f.read()
        # re.escape treats the input as literal text rather than a regex pattern
        matches = re.findall(re.escape(search_str), file_content)
        return matches or None

    # Iterate through the sub-directories
    for file_path in path.glob("**/*"):
        if file_path.is_file() and file_path.suffix.lower() in extensions:
            matches = search_in_file(file_path, search_str)

            if matches:
                file_names.append(str(file_path))

                # Append each hit to the results file
                with open("results.txt", "a", encoding='utf-8') as results_file:
                    results_file.write(f"{file_path}\n")
                    results_file.write(f"{matches}\n\n")

【Comments】:
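
For reference, three things in the question's script likely explain why nothing is found. `os.path.splitext(file)` returns a `(root, ext)` tuple, so `ext in extensions` is never true. `open(file, 'r')` uses the bare file name instead of `file_path`, so files in sub-directories cannot be opened. And the bare `except: pass` hides whatever error results. Here is a minimal sketch of the `os.walk` version with those points addressed, assuming plain-text files only. The function name `find_files_containing` and its parameters are illustrative, not from the original post.

```python
import os


def find_files_containing(search_str, path, extensions):
    """Return paths under `path` whose text contains `search_str`."""
    hits = []
    for root, dirs, files in os.walk(path):
        for name in files:
            # splitext returns a (root, ext) tuple; compare only the extension
            ext = os.path.splitext(name)[1].lower()
            if ext not in extensions:
                continue
            # Join with root so files in sub-directories can be opened
            file_path = os.path.join(root, name)
            try:
                # The with-block closes the file; no explicit close() needed
                with open(file_path, "r", encoding="utf-8", errors="ignore") as f:
                    if search_str in f.read():
                        hits.append(file_path)
            except OSError as exc:
                # Report unreadable files instead of silently passing
                print(f"Skipped {file_path}: {exc}")
    return hits
```

Something like `find_files_containing("hello", ".", {".txt"})` would then return the matching paths, which can be written to the results file one per line.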