Strip white spaces and new lines when reading from file

【Question title】: Strip white spaces and new lines when reading from file
【Posted】: 2017-07-17 13:59:02
【Question description】:

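I have the following code, which successfully strips the end-of-line characters when reading from the file, but does not do so for any leading and trailing white space (I want the spaces in between to be kept!).

What is the best way to achieve this? (Note, this is a specific example, so it is not a duplicate of the general question about string stripping methods.)

My code: (Try it with the test data "Mr Moose": it is not found. If you try "Mr Moose " (that is, with a space after Moose), it works.)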

#A COMMON ERROR is leaving in blank spaces and then finding you cannot work with the data in the way you want!

"""Try the following program with the input: Mr Moose
...it doesn't work..........
but if you try "Mr Moose " (that is a space after Moose..."), it will work!
So how to remove both new lines AND leading and trailing spaces when reading from a file into a list. Note, the middle spaces between words must remain?
"""

alldata=[]
col_num=0
teacher_names=[]
delimiter=":"

with open("teacherbook.txt") as f:
    for line in f.readlines():
        alldata.append(line.strip())
    print(alldata)

    print()
    print()

    for x in alldata:
        teacher_names.append(x.split(delimiter)[col_num])

    teacher = input("Enter teacher you are looking for:")
    if teacher in teacher_names:
        print("found")
    else:
        print("No")

Desired output, in terms of the generated list alldata:

['Mr Moose:Maths', 'Mr Goose:History', 'Mrs Congenelipilling:English']

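That is, remove all leading and trailing white space at the start of the line and before or after the delimiter. The spaces between words, as in Mr Moose, must be left in.

Contents of teacherbook.txt: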

Mr Moose : Maths
Mr Goose: History
Mrs Congenelipilling: English

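Thanks in advance

【Question discussion】:

Is teacher_names.append(x.split(delimiter)[col_num].strip()) what you want?
Try again with my edited code above (I had the brackets wrong before).
What are the contents of teacherbook.txt?
@Chris_Rands - perfect. Yes, that would work, but what I am after as a solution is to read it from the file into alldata without the \n and without the leading and trailing spaces, rather than dealing with it afterwards. I think that would be the better solution.
@MissComputing: since you can only tell which characters are actually leading/trailing spaces after splitting on the delimiter, I think Chris_Rands' answer is as close as you are going to get.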
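
To illustrate the point made in the comments, here is a minimal sketch of why the lookup fails. It uses the first line of the sample file as a literal string, so no file is read and nothing here is taken from the asker's actual run:

```python
# First line of teacherbook.txt, as it would be read from the file (newline included).
line = "Mr Moose : Maths\n"

# strip() only removes whitespace at the two ends of the whole line.
stripped = line.strip()
print(repr(stripped))            # 'Mr Moose : Maths'

# Splitting on ":" keeps the space that sat in front of the delimiter.
name = stripped.split(":")[0]
print(repr(name))                # 'Mr Moose '

# Stripping the field after the split removes that space.
clean_name = name.strip()
print(clean_name == "Mr Moose")  # True
```

This is why the input "Mr Moose " matches while "Mr Moose" does not.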

Tags: python file split

【Solution 1】:

You can use a regular expression:

import re

txt='''\
Mr Moose : Maths
Mr Goose: History
Mrs Congenelipilling: English'''

>>> [re.sub(r'\s*:\s*', ':', line).strip() for line in txt.splitlines()]
['Mr Moose:Maths', 'Mr Goose:History', 'Mrs Congenelipilling:English']

So your code becomes:

import re
col_num=0
teacher_names=[]
delimiter=":"

with open("teacherbook.txt") as f:
    alldata=[re.sub(r'\s*{}\s*'.format(delimiter), delimiter, line).rstrip() for line in f]
    print(alldata)

    for x in alldata: 
         teacher_names.append(x.split(delimiter)[col_num]) 
    print(teacher_names)

Prints:

['Mr Moose:Maths', 'Mr Goose:History', 'Mrs Congenelipilling:English']
['Mr Moose', 'Mr Goose', 'Mrs Congenelipilling']

The key part is the regular expression:

re.sub(r'\s*{}\s*'.format(delimiter), delimiter, line).rstrip()

          ^                          zero or more whitespace characters before the delimiter
            ^                        placeholder for the delimiter
              ^                      zero or more whitespace characters after the delimiter
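
As a hedged variation on the expression above: if the delimiter could ever be a regex metacharacter such as "|", it would need escaping before being placed in the pattern. The helper name `normalize` below is hypothetical, and calling `strip()` instead of `rstrip()` is an assumption made so that leading whitespace is removed as well:

```python
import re

def normalize(line, delimiter=":"):
    """Remove whitespace around the delimiter and at both ends of the line."""
    # re.escape guards against delimiters that are regex metacharacters, e.g. "|".
    pattern = r"\s*{}\s*".format(re.escape(delimiter))
    return re.sub(pattern, delimiter, line).strip()

print(normalize("  Mr Moose : Maths\n"))        # Mr Moose:Maths
print(normalize("Mr Goose |  History\n", "|"))  # Mr Goose|History
```

A delimiter containing a backslash would also need care in the replacement string; that case is not handled in this sketch.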


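For a pure Python solution (no regex), I would use str.partition to get the left-hand and right-hand sides of the delimiter, then strip the white space as needed: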

alldata=[]    
with open("teacherbook.txt") as f:
    for line in f:
        lh,sep,rh=line.rstrip().partition(delimiter)
        alldata.append(lh.rstrip() + sep + rh.lstrip())

Same output.
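
For readers unfamiliar with str.partition, a short sketch of what it returns, using a line from the sample data as a literal:

```python
# str.partition splits at the first occurrence of the separator only
# and always returns a 3-tuple: (before, separator, after).
parts = "Mr Moose : Maths".partition(":")
print(parts)      # ('Mr Moose ', ':', ' Maths')

# If the separator is missing, the last two items are empty strings.
missing = "no delimiter here".partition(":")
print(missing)    # ('no delimiter here', '', '')

# Stripping each side and gluing the pieces back together.
lh, sep, rh = parts
rebuilt = lh.rstrip() + sep + rh.lstrip()
print(rebuilt)    # Mr Moose:Maths
```

Because the tuple always has three items, the unpacking into lh, sep, rh never fails, even on a line without a delimiter.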

One more suggestion: your data is better suited to a dict than to a list.

You can do this:

di={}
with open("teacherbook.txt") as f:
    for line in f:
        lh,sep,rh=line.rstrip().partition(delimiter)
        di[lh.rstrip()]=rh.lstrip()

Or as a comprehension:

with open("teacherbook.txt") as f:
    di={lh.rstrip():rh.lstrip() 
          for lh,_,rh in (line.rstrip().partition(delimiter) for line in f)}

Then access it like this:

>>> di['Mr Moose']
'Maths'
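
A sketch of how the dict could replace the "found"/"No" lookup from the question. `io.StringIO` stands in for the real file, and the helper name `find_subject` is hypothetical:

```python
import io

# Stand-in for open("teacherbook.txt"), using the sample data from the question.
f = io.StringIO("Mr Moose : Maths\nMr Goose: History\nMrs Congenelipilling: English\n")

di = {}
for line in f:
    lh, sep, rh = line.rstrip().partition(":")
    di[lh.rstrip()] = rh.lstrip()

def find_subject(name):
    """Return the subject for a teacher, or None if the name is unknown."""
    # Strip the query too, so "Mr Moose " typed by a user still matches.
    return di.get(name.strip())

print(find_subject("Mr Moose "))   # Maths
print(find_subject("Mr Nobody"))   # None
```

Using dict.get avoids a KeyError when the name is absent.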

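【Discussion】:

Could you explain [re.sub(r'\s*:\s*', ':', line) ... character by character!?
Much as I like regular expressions, they are fairly expensive. Timing 10000 iterations of my pure Python solution against this pattern-matching solution shows this one to be 64 times slower (0.0547s vs 0.00085s). I have always held that if you can easily avoid a regex pattern, you should, and in this case the pure Python solution is a Python one-liner.
In this case, though, the performance difference is negligible. It depends on what you are optimising for; personally I find the regex more readable, but others may disagree.
@MissComputing: see the edit, which adds an explanation of the regex and a simple pure Python solution.
Thanks for the Python solution, and a shout-out for using str.partition(). My only minor gripe is that ''.join(...) is better than str + str + str, but thanks for the good explanation.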
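
The timing claim in this discussion can be checked locally with timeit. The sketch below is an assumption about how such a comparison might be set up, not the commenter's actual benchmark; the function names are hypothetical, and the numbers will vary by machine and Python version:

```python
import re
import timeit

line = "Mr Moose : Maths\n"

def with_regex(line):
    # Regex approach: collapse whitespace around the delimiter, then strip the ends.
    return re.sub(r"\s*:\s*", ":", line).strip()

def with_partition(line):
    # Pure Python approach: split once at the delimiter and strip each side.
    lh, sep, rh = line.strip().partition(":")
    return lh.rstrip() + sep + rh.lstrip()

# Both approaches should agree before their speed is compared.
assert with_regex(line) == with_partition(line) == "Mr Moose:Maths"

for func in (with_regex, with_partition):
    seconds = timeit.timeit(lambda: func(line), number=10000)
    print("{}: {:.5f}s".format(func.__name__, seconds))
```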

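【Solution 2】:

There is no need to use readlines(); you can simply iterate over the file object to get each line, and use strip() to remove the \n and the white space. So you can use this list comprehension: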

with open('teacherbook.txt') as f:
    alldata = [':'.join([value.strip() for value in line.split(':')]) 
               for line in f]
    print(alldata)

Output:

['Mr Moose:Maths', 'Mr Goose:History', 'Mrs Congenelipilling:English']

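【Discussion】:

Could you write this out line by line without the shortcuts (so that it is more readable for teaching purposes)? I mean, expand the for loops and so on, and comment each line?
While I agree with the remark that f.readlines() is not a good pattern, the remark that this gives you each line with the \n character already stripped is wrong. Try [line for line in f]. It is value.strip() that strips the \n, along with any leading white space (if relevant).
Thanks for pointing that out, I had never realised it. I suppose I have always used it together with strip(). What I should have said is that it splits the file into a list of lines, which is exactly what readlines() does, so it removes an unnecessary extra step rather than solving the problem.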
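
In response to the request for an expanded version, here is one possible way to write the comprehension out as plain loops. `io.StringIO` is used as a stand-in for the real file so that the sketch runs on its own:

```python
import io

# Stand-in for open('teacherbook.txt'); the sample data is from the question.
f = io.StringIO("Mr Moose : Maths\nMr Goose: History\nMrs Congenelipilling: English\n")

alldata = []
for line in f:                          # each line still ends with "\n" here
    values = line.split(":")            # e.g. ['Mr Moose ', ' Maths\n']
    cleaned = []
    for value in values:
        cleaned.append(value.strip())   # drops spaces and the "\n" at both ends
    alldata.append(":".join(cleaned))   # glue the fields back together

print(alldata)
# ['Mr Moose:Maths', 'Mr Goose:History', 'Mrs Congenelipilling:English']
```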

【Solution 3】:

Change:

teacher_names.append(x.split(delimiter)[col_num])

to:

teacher_names.append(x.split(delimiter)[col_num].strip())

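【Discussion】:

teacher_names.append(x.split(delimiter)[col_num].strip()) is what Chris_Rands suggested in his comment a while ago...

【Solution 4】:

Remove all leading and trailing white space at the start of the line and before or after the delimiter. The spaces between words, as in Mr Moose, must be left in.

You can split the string at the delimiter, strip the white space from the pieces, and then join them back together: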

for line in f.readlines():
    new_line = ':'.join([s.strip() for s in line.split(':')])
    alldata.append(new_line)

Example:

>>> lines = ['  Mr Moose :   Maths', ' Mr Goose :  History  ']
>>> lines
['  Mr Moose :   Maths', ' Mr Goose :  History  ']
>>> data = []
>>> for line in lines:
...     new_line = ':'.join([s.strip() for s in line.split(':')])
...     data.append(new_line)
...
>>> data
['Mr Moose:Maths', 'Mr Goose:History']

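【Discussion】:

While this is technically correct, there is no reason to split first and then join again if the result is going to be split again (on the same delimiter).
@HendrikMakait I am not sure what you mean. It is not without reason. You split in order to remove the leading and trailing white space around each part of the line. You then need to join the stripped parts back together with the original delimiter. It may look a little odd, but I would hardly call it unjustified.
I may be wrong (note the comment on the original question asking for a clearer specification), but as far as the code sample goes, the data has to be split again on the same delimiter to serve its purpose. So joining it again before the final split only creates more code and more work. I am comparing it against the baseline of Chris_Rands' answer/comment.
Actually, I agree with you, @Hendrik. It does look like they split the data again later. However, as you noted in your earlier comment, I am answering the last question the OP asked in the question. That is why I quoted it in my answer: to make clear what I am answering. Note also that I deliberately tried to keep my answer general, so that it can apply to future users' problems and not just this specific one. For the OP's specific case, though, Chris_Rands' approach is probably better.
@HendrikMakait I did not approve your edit, because I believe I have made clear which part of the question I am answering; that is what quoting the OP's question was for. I will leave it to the OP and future readers to decide whether my solution suits the original code. Thanks anyway.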
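
The concern raised in this discussion (splitting, joining, then splitting again) could be avoided by keeping the stripped fields as lists. This is a sketch of that alternative, not part of the original answer; the name `rows` is hypothetical:

```python
lines = ["  Mr Moose :   Maths", " Mr Goose :  History  "]

# Split once, strip each field, and keep the fields instead of re-joining them.
rows = [[field.strip() for field in line.split(":")] for line in lines]
print(rows)            # [['Mr Moose', 'Maths'], ['Mr Goose', 'History']]

# The column of teacher names can then be read directly, with no second split.
col_num = 0
teacher_names = [row[col_num] for row in rows]
print(teacher_names)   # ['Mr Moose', 'Mr Goose']
```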

【Solution 5】:

You can do this easily with a regular expression, using re.sub:

import re

re.sub(r"[\n \t]+$", "", "aaa \t asd \n ")
Out[17]: 'aaa \t asd'

The first argument is the pattern: [all the characters you want to remove], then + for one or more matches, then $ for the end of the string.
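
For comparison, a small sketch showing that this pattern gives the same result as str.rstrip with the same set of characters, at least for the sample string used above:

```python
import re

s = "aaa \t asd \n "

# Character class of removable characters, "+" for one or more, "$" for end of string.
by_regex = re.sub(r"[\n \t]+$", "", s)

# str.rstrip with the same set of characters.
by_rstrip = s.rstrip("\n \t")

print(repr(by_regex))          # 'aaa \t asd'
print(by_regex == by_rstrip)   # True
```

Note that neither form touches the whitespace in the middle of the string.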

https://docs.python.org/2/library/re.html

【Discussion】:

【Solution 6】:

With string.rstrip('something') you can remove the characters in 'something' from the right-hand end of a string, like this:

a = 'Mr Moose \n'

print(a.rstrip(' \n'))  # writes 'Mr Moose\n' instead of 'Mr Moose \n\n'
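
One detail worth spelling out: the argument to rstrip is treated as a set of characters, not as a literal suffix. A short sketch (the 'banana' string is only an illustration, not from the original answer):

```python
a = 'Mr Moose \n'

# The argument is a set of characters to remove, not a literal suffix.
with_chars = a.rstrip(' \n')
print(repr(with_chars))    # 'Mr Moose'

# With no argument, rstrip() removes all trailing whitespace.
default = a.rstrip()
print(repr(default))       # 'Mr Moose'

# Any trailing run made of the listed characters is removed.
banana = 'banana'.rstrip('an')
print(repr(banana))        # 'b'
```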

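【Discussion】:

Sorry, this is not what I wanted, but thanks for the suggestion (it will be useful to others who want to strip strings). I need to know how to strip the newline characters and the leading and trailing spaces when reading the file into a list, while leaving the spaces in between intact.
@MissComputing Your code does strip the leading and trailing spaces from each line. The problem is that you also want to strip the spaces between the teacher's name and the delimiter, which your code does not do.
Try using raw_input instead of input. As far as I can tell, the splitting of the string seems to work fine, because a = 'Mr Moose:Maths' b = 'Mr Moose' print b == a.split(':')[0] yields True....
Isn't raw_input only for Python 2 (not Python 3)?
Oh yes. Since you did not specify which Python version you use, I tried it in Python 2.7, in which case you have to use raw_input.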