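Remove a line break at a specific position in a text file: answers

【Question title】: Remove linebreak at specific position in textfile
【Posted】: 2010-03-18 08:54:47
【Question】:

I have a large text file that has line breaks at column 80 because of the console width. Many lines in the file are shorter than 80 characters and are not affected by the line breaks. In pseudocode, this is what I want:

Iterate over the lines in the file
If a line matches this regex pattern: ^(.{80})\n(.+)
- Replace this line with a new string made of match.group(1) and match.group(2), that is, just remove the line break from this line.
If a line does not match the regex, skip it!

Maybe I don't need a regex to do this?

【Question comments】:

【Solution 1】: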

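f = open("file")
for line in f:
    # 80 characters plus the trailing newline: this line was wrapped
    if len(line) == 81:
        n = next(f)  # Python 3 spelling of f.next()
        line = line.rstrip() + n
    print(line.rstrip())
f.close()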

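【Comments】:

Careful: you don't handle StopIteration when calling f.next(), so this code fails if the last line has 81 characters.
If a line is very long and was wrapped several times, only every other line break is removed.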
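
Both comments can be addressed while keeping the same "read the next line" idea. The following is only a sketch, assuming Python 3 and that a wrapped line is exactly 80 characters plus the newline; the name `unwrap` and the `width` parameter are made up for illustration and are not part of the answer:

```python
def unwrap(lines, width=80):
    """Yield lines, joining each wrapped line with its continuation(s)."""
    it = iter(lines)
    for line in it:
        joined = line
        # Keep pulling continuations while the last piece read was wrapped.
        while len(line) == width + 1 and line.endswith("\n"):
            line = next(it, None)  # None instead of StopIteration at the end
            if line is None:
                break
            joined = joined[:-1] + line  # drop the "\n" inserted by the wrap
        yield joined
```

Because it accepts any iterable of lines, an open file can be passed directly, for example `for line in unwrap(f): print(line, end="")`.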

【Solution 2】:

Here is some code that should do the trick:

def remove_linebreaks(textfile, position=81):
    """
    textfile : a file opened in 'r' mode
    position : the length of a wrapped line, counting its trailing newline

    return a string with the newline of every wrapped line removed
    """
    fixed_lines = []
    for line in textfile:
        if len(line) == position and line.endswith('\n'):
            # line[:position] would keep the whole line, newline included
            line = line[:-1]
        fixed_lines.append(line)
    return ''.join(fixed_lines)

Note that, unlike your pseudocode, this merges any number of consecutive wrapped lines.
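
One way to check that behaviour without touching the disk is to pass an in-memory file. This is a sketch assuming Python 3; the block restates the function with a slice that drops the final character, `io.StringIO` stands in for the opened file, and the sample text is made up:

```python
import io

def remove_linebreaks(textfile, position=81):
    """Return the text with the newline removed from lines of length `position`."""
    fixed_lines = []
    for line in textfile:
        if len(line) == position and line.endswith("\n"):
            line = line[:-1]  # drop the newline added by the wrap
        fixed_lines.append(line)
    return "".join(fixed_lines)

# Hypothetical sample: one 170-character line wrapped twice, then a short line.
sample = "x" * 80 + "\n" + "y" * 80 + "\n" + "z" * 10 + "\n" + "short\n"
result = remove_linebreaks(io.StringIO(sample))
print(result.count("\n"))  # prints 2: both wrap breaks are gone
```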

【Comments】:

【Solution 3】:

Consider this.

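def merge_lines(line_iter):
    buffer = ''  # pieces of a wrapped line collected so far
    for line in line_iter:
        if len(line) <= 80:
            # Shorter than a wrapped line: emit it with anything buffered
            yield buffer + line
            buffer = ''
        else:
            buffer += line[:-1]  # remove '\n'
    if buffer:
        # The input ended on a wrapped line; restore its newline
        yield buffer + '\n'

with open('myFile', 'r') as source:
    with open('copy of myFile', 'w') as destination:
        for line in merge_lines(source):
            destination.write(line)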

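I find that an explicit generator function makes it much easier to test and debug the core logic of a script, without having to create a mock file system or write a lot of elaborate setup and teardown for the tests.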
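
To illustrate that point, the generator can be exercised with a plain list of strings instead of a file. This is a sketch assuming Python 3; the copy of `merge_lines` below includes a final flush of the buffer, and the `test_merge_lines` name and its data are made up:

```python
def merge_lines(line_iter):
    buffer = ''
    for line in line_iter:
        if len(line) <= 80:
            yield buffer + line
            buffer = ''
        else:
            buffer += line[:-1]  # remove '\n'
    if buffer:
        # Assumption: flush a wrapped line left over at the end of the input.
        yield buffer + '\n'

def test_merge_lines():
    # Two consecutive wrapped pieces followed by their short tail.
    wrapped = ['a' * 80 + '\n', 'b' * 80 + '\n', 'tail\n']
    # Lines that must pass through unchanged, including an empty line.
    untouched = ['short line\n', '\n']
    merged = list(merge_lines(wrapped + untouched))
    assert merged == ['a' * 80 + 'b' * 80 + 'tail\n', 'short line\n', '\n']

test_merge_lines()
```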

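【Comments】:

Looks interesting! Will try this.
Careful: your code doesn't correctly handle 2 consecutive lines that are longer than 80 characters.
@gurney alex: Thanks. Fixed.

【Solution 4】: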

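Here is an example of how to achieve this with a regular expression. Regular expressions are not the best solution everywhere, though, and in this case I think not using one is more efficient. Anyway, here is the solution:

import re
text = re.sub(r'(?<=^.{80})\n', '', text, flags=re.MULTILINE)

You can also use your own regex by calling re.sub with a callable (re.MULTILINE is needed in both cases so that ^ matches at the start of every line):

text = re.sub(r'^(.{80})\n(.+)', lambda m: m.group(1)+m.group(2), text, flags=re.MULTILINE)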
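
The two patterns do not behave identically on a line that was wrapped more than once. Here is a small sketch, assuming Python 3 and `re.MULTILINE` (so that `^` matches at the start of every line rather than only at the start of the text); the sample text is made up:

```python
import re

# Hypothetical sample: a 170-character line wrapped twice, then a short line.
text = "a" * 80 + "\n" + "b" * 80 + "\n" + "c" * 10 + "\n" + "short\n"

# Lookbehind: removes every "\n" preceded by a line of exactly 80 characters.
lookbehind = re.sub(r'(?<=^.{80})\n', '', text, flags=re.MULTILINE)

# Groups: (.+) consumes the following line, so a second consecutive
# wrap is not rejoined in a single pass.
grouped = re.sub(r'^(.{80})\n(.+)', lambda m: m.group(1) + m.group(2),
                 text, flags=re.MULTILINE)

print(lookbehind.count("\n"))  # 2
print(grouped.count("\n"))     # 3
```

The lookbehind version only consumes the newline itself, which is why it can rejoin consecutive wrapped lines in one call.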

【Comments】: