Find headings from multiple text files in one location and add to xlsx document with same headings: answers

【Question title】: Find headings from multiple text files in one location and add to xlsx document with same headings
【Posted】: 2017-06-15 10:12:53
【Question description】:

I need to create a Python program that reads multiple .txt files from a set of directories, looks for specific headings in each text file, and stores the data found under those headings in an .xlsx document.

Example .txt file:

person:         Vyacheslav Danik
address:        Ukraine, Kharkov
phone:          +380675746805
address:        Ukraine, Kharkiv
address:        Pavlova st., 319

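I need 5 headings in the Excel spreadsheet: Number, Organisation, Role, Name and Address. The Python program should put the information from each scanned file under the matching heading in the spreadsheet.

Any help would be greatly appreciated, as I am struggling a bit with this. Thanks.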

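【Question comments】:

Welcome to SO: please take the tour and read about the minimal reproducible example. Your question is too broad for this forum, and you may not get all the answers you need. You can always hire some help. I'll write your code for $5 and some beer :)

Tags: python excel python-2.7

【Solution 1】: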

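I'm still a beginner myself, but this looked easy enough. Treat it as a starting point for you to build on and customize. I only did one column (person), and I'm fairly sure everything you need is in this example. You have to install the 2 Python libraries needed to access spreadsheets by running the next 2 commands (assuming you are on some flavor of Linux; you didn't provide enough information):

pip install xlrd

pip install xlutils

Here is the example; the comments roughly explain what each line does.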

#!/usr/bin/env python

''' Required to install these libraries to access spreadsheets
pip install xlrd
pip install xlutils
'''

import os
import re

from xlutils.copy import copy
from xlrd import open_workbook

# spreadsheet.xls must already exist; xlrd opens it read-only
book_ro = open_workbook("spreadsheet.xls")

# Creates a writeable copy
book = copy(book_ro)

# Select first sheet
sheet1 = book.get_sheet(0)

# Create list to hold people, otherwise we have to figure out the next empty row in spreadsheet
peopleList = []

# Walk the current folder and its subfolders, keeping only the txt files
for root, dirs, docs in os.walk('.', topdown=True):

    for filename in docs:
        if filename.lower().endswith(".txt"):
            filepath = os.path.join(root, filename)

            # Open file read only and read it as one big string so it can be
            # searched with re.findall; the file is closed when the block ends
            with open(filepath, "r") as txt_file:
                text = txt_file.read()

            # Find all occurrences of "person:" at the start of a line and capture rest of line
            people = re.findall(r'^person:[ \t]*(.*)$', text, re.MULTILINE)

            # If file has more than 1 person, add each one individually,
            # stripping the whitespace around each name
            for person in people:
                peopleList.append(person.strip())

row = 0
column = 0

# Remove duplicates (set), sort the result, then step through the list and write to spreadsheet
for person in sorted(set(peopleList)):
    sheet1.write(row, column, person)
    row += 1

# This overwrites the original spreadsheet
book.save("spreadsheet.xls")
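
The example above only collects the person column. As a rough sketch of how the same idea might extend to several headings at once, the snippet below parses every "heading: value" line of one file into a dictionary and turns it into a single spreadsheet row. The names HEADINGS, parse_fields and build_row are hypothetical helpers that are not part of the original answer, the heading list is taken from the sample file in the question rather than from the five headings the asker wants, and joining repeated headings with "; " is just one possible choice.

```python
import re
from collections import OrderedDict

# Assumption: headings taken from the sample .txt file; adjust to your own files.
HEADINGS = ["person", "address", "phone"]

# Matches "heading:   value" at the start of a line.
FIELD_RE = re.compile(r'^([\w-]+):[ \t]*(.*)$', re.MULTILINE)


def parse_fields(text):
    """Return a mapping heading -> list of values, in order of appearance."""
    fields = OrderedDict()
    for key, value in FIELD_RE.findall(text):
        fields.setdefault(key.lower(), []).append(value.strip())
    return fields


def build_row(fields, headings=HEADINGS, sep="; "):
    """Return one row of cells; repeated headings are joined into one cell."""
    return [sep.join(fields.get(heading, [])) for heading in headings]


sample = """person:         Vyacheslav Danik
address:        Ukraine, Kharkov
phone:          +380675746805
address:        Ukraine, Kharkiv
address:        Pavlova st., 319
"""

sample_row = build_row(parse_fields(sample))
```

Each value in a row built this way could then be written with sheet1.write(row, column, value), one column per heading, in the same way the loop above writes the person column.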

【Comments】: