Python: Loop through multiple text files, extract target strings, and append them to a list

【Question title】: Python: Loop through multiple text files, extract target strings, and append them to list
【Posted】: 2017-08-03 15:53:30
【Question】:

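(Python 3.6.3)

I need to extract IP addresses and dates from multiple text files that contain long strings of text. After that, I want to append this data to Python lists. The text files also live in subdirectories, so I used os.path.join(subdir, file) to make sure the script picks those files up as well.

Here is my code: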

ip_address = []
dates = []

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        if file.endswith('.txt'):
            text_data = open(os.path.join(subdir, file))
            ip_address.append(re.findall( r'[0-9]+(?:\.[0-9]+){3}', text_data))
            dates.append(re.findall( r'[0-9]+(?:\/[0-9]+){2}', text_data))
        else:
            pass

However, I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-28-de806e6c6270> in <module>()
      6         if file.endswith('.txt'):
      7             text_data = open(os.path.join(subdir, file))
----> 8             ip_address.append(re.findall( r'[0-9]+(?:\.[0-9]+){3}', text_data))
      9             dates.append(re.findall( r'[0-9]+(?:\/[0-9]+){2}', text_data))
     10         else:

C:\Users\591159\AppData\Local\Continuum\Anaconda3\lib\re.py in findall(pattern, string, flags)
    220 
    221     Empty matches are included in the result."""
--> 222     return _compile(pattern, flags).findall(string)
    223 
    224 def finditer(pattern, string, flags=0):

TypeError: expected string or bytes-like object

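I assume the data I'm trying to extract from is not in string form, but I don't fully understand what that means. I'd appreciate any pointers in the right direction. Thanks!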

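【Comments】:

Did you mean to read() the data, e.g. text_data.read()? Currently, text_data is a file object (an iterator over lines of text), not a string.
Yes, I meant to read the data as a string. What method should I use to fix my code? Thanks in advance.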
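
To illustrate the comment above, here is a minimal sketch of the difference between passing a file object and passing its contents to re.findall. The io.StringIO object and the sample log line are assumptions used as a stand-in for a real opened file; they are not part of the original question.

```python
import io
import re

IP_PATTERN = r'[0-9]+(?:\.[0-9]+){3}'

# io.StringIO stands in for an opened text file (assumption for this demo).
text_file = io.StringIO("login from 10.0.0.1 on 03/08/2017\n")

try:
    # Passing the file object itself: re.findall needs a str or bytes.
    re.findall(IP_PATTERN, text_file)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# read() returns the file contents as a single str, which re.findall accepts.
text_data = text_file.read()
found_ips = re.findall(IP_PATTERN, text_data)

print(error_message)
print(found_ips)
```

The first call fails with the same kind of TypeError as in the traceback, while the second call succeeds because read() hands the regex engine an actual string.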

Tags: regex python-3.x text-extraction

【Solution 1】:

After taking @AChampion's suggestion, I changed my code to the following, and it works as expected:

import os
import re

ip_address = []
dates = []

# rootdir is the top-level folder to search (defined elsewhere)
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        if file.endswith('.txt'):
            # Read the whole file into one string, dropping newlines
            with open(os.path.join(subdir, file), 'r', encoding='utf8') as text_file:
                text_data = text_file.read().replace('\n', '')
            # Each append adds one list of matches per file
            ip_address.append(re.findall(r'[0-9]+(?:\.[0-9]+){3}', text_data))
            dates.append(re.findall(r'[0-9]+(?:\/[0-9]+){2}', text_data))
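
As a possible variation on the code above (a sketch, not the poster's method): since append adds one list per file, the results end up as lists of lists. If flat lists are wanted, extend can be used instead. This sketch also leaves newlines in place, on the assumption that removing them could glue the end of one line to the start of the next and produce false matches. The function name collect_matches and the compiled pattern names are hypothetical.

```python
import os
import re

# Same patterns as in the question, compiled once for reuse.
IP_PATTERN = re.compile(r'[0-9]+(?:\.[0-9]+){3}')
DATE_PATTERN = re.compile(r'[0-9]+(?:/[0-9]+){2}')


def collect_matches(rootdir):
    """Walk rootdir and return (ip_addresses, dates) as flat lists of strings."""
    ip_addresses = []
    dates = []
    for subdir, _dirs, files in os.walk(rootdir):
        for name in files:
            if not name.endswith('.txt'):
                continue  # skip anything that is not a .txt file
            path = os.path.join(subdir, name)
            with open(path, 'r', encoding='utf8') as text_file:
                text_data = text_file.read()
            # extend adds each match individually, keeping the lists flat
            ip_addresses.extend(IP_PATTERN.findall(text_data))
            dates.extend(DATE_PATTERN.findall(text_data))
    return ip_addresses, dates
```

Calling ip_addresses, dates = collect_matches(rootdir) would then return two flat lists. The order of results follows os.walk, so it should not be relied on across platforms.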

【Comments】: