【Question title】: Splitting a text file into sections with a special delimiter line - python
【Posted】: 2014-10-03 07:30:50
【Question】:

I have an input file like this:

This is a text block start
This is the end

And this is another
with more than one line
and another line.

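The task is to read the file section by section, where sections are separated by a special line (in this case an empty line), producing e.g. [out]: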

[['This is a text block start', 'This is the end'],
['And this is another','with more than one line', 'and another line.']]

Doing this, I get the desired output:

def per_section(it):
    """ Read a file and yield sections using empty line as delimiter """
    section = []
    for line in it:
        if line.strip('\n'):
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)
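Note that this generator yields each section as one joined string, not as the list of lines shown in the desired output above, and two consecutive blank lines produce an empty section. A minimal sketch of a variant that yields lists of lines, assuming trailing newlines should be dropped; `per_section_lists` is a hypothetical name and `io.StringIO` stands in for a real file:

```python
import io

def per_section_lists(it):
    """Hypothetical helper: yield each blank-line-separated section as a list of lines."""
    section = []
    for line in it:
        if line.strip():              # non-blank line: belongs to the current section
            section.append(line.rstrip('\n'))
        elif section:                 # blank line closes a non-empty section
            yield section
            section = []
    if section:                       # last section when the file does not end with a blank line
        yield section

sample = io.StringIO(
    "This is a text block start\n"
    "This is the end\n"
    "\n"
    "And this is another\n"
    "with more than one line\n"
    "and another line.\n"
)
print(list(per_section_lists(sample)))
# [['This is a text block start', 'This is the end'], ['And this is another', 'with more than one line', 'and another line.']]
```

The `elif section` guard is what prevents runs of blank lines from producing empty sections.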

But if the special line is a line starting with #, for example:

# Some comments, maybe the title of the following section
This is a text block start
This is the end
# Some other comments and also the title
And this is another
with more than one line
and another line.

I have to do this instead:

def per_section(it):
    """ Read a file and yield sections using lines that start with '#' as delimiters """
    section = []
    for line in it:
        if not line.startswith("#"):
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)
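Since the `#` lines in this example look like section titles, discarding them may lose information. A sketch that keeps each `#` line as the title of the section that follows it, assuming that convention holds for your files (lines before the first header get the title `None`); `per_titled_section` is a hypothetical name:

```python
import io

def per_titled_section(it):
    """Hypothetical helper: yield (title, lines) pairs; a '#' line starts a new section."""
    title, section = None, []
    for line in it:
        line = line.rstrip('\n')
        if line.startswith('#'):
            # Emit the previous section, but skip an empty, untitled leading one.
            if title is not None or section:
                yield title, section
            title, section = line.lstrip('#').strip(), []
        else:
            section.append(line)
    if title is not None or section:
        yield title, section

sample = (
    "# Some comments, maybe the title of the following section\n"
    "This is a text block start\n"
    "This is the end\n"
    "# Some other comments and also the title\n"
    "And this is another\n"
    "with more than one line\n"
    "and another line.\n"
)
for title, lines in per_titled_section(io.StringIO(sample)):
    print(title, lines)
```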

If I let per_section() take a delimiter argument, I could try this:

def per_section(it, delimiter='\n'):
    """ Read a file and yield sections using empty lines or '#' lines as delimiters """
    section = []
    for line in it:
        if line.strip('\n') and delimiter == '\n':
            section.append(line)
        elif delimiter == '#' and not line.startswith('#'):
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)

But is there a way to avoid hard-coding every possible delimiter?

【Comments】:

  • Why not just pass it in as an argument instead of hard-coding it?
  • By the way, @falsetru's per_section() has been added to github.com/alvations/lazyme =)
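Following the first comment's suggestion, one way to avoid hard-coding delimiters is to pass the delimiter as a regular expression and test each line with `re.match`. This is a sketch under that assumption (the name `per_section_re` and the regex parameter are illustrative, not from the question); the default pattern `r'\s*$'` matches blank or whitespace-only lines:

```python
import io
import re

def per_section_re(it, delimiter=r'\s*$'):
    """Hypothetical helper: lines matching the regex `delimiter` (via re.match) separate sections."""
    is_delimiter = re.compile(delimiter).match
    section = []
    for line in it:
        if is_delimiter(line):
            if section:               # skip empty sections between adjacent delimiters
                yield section
            section = []
        else:
            section.append(line.rstrip('\n'))
    if section:
        yield section

# Blank lines as delimiters (default) and '#' lines as delimiters, with no hard-coded branches.
print(list(per_section_re(io.StringIO("a\nb\n\nc\n"))))
print(list(per_section_re(io.StringIO("# x\na\n# y\nb\n"), r'#')))
```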

Tags: python file delimiter yield


【Answer 1】:

How about passing a predicate?

def per_section(it, is_delimiter=lambda x: x.isspace()):
    ret = []
    for line in it:
        if is_delimiter(line):
            if ret:
                yield ret  # OR  ''.join(ret)
                ret = []
        else:
            ret.append(line.rstrip())  # OR  ret.append(line)
    if ret:
        yield ret

Usage:

with open('/path/to/file.txt') as f:
    sections = list(per_section(f))  # default delimiter

with open('/path/to/file.txt') as f:
    sections = list(per_section(f, lambda line: line.startswith('#')))  # lines starting with '#'
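Because the delimiter is just a callable, one function covers several markers at once. A small self-contained check, repeating the generator above and using `io.StringIO` in place of a file; passing a tuple to `str.startswith` makes lines beginning with either `!` or `$` delimiters:

```python
import io

def per_section(it, is_delimiter=lambda x: x.isspace()):
    # Same generator as in the answer: collect lines until the predicate flags a delimiter.
    ret = []
    for line in it:
        if is_delimiter(line):
            if ret:
                yield ret
                ret = []
        else:
            ret.append(line.rstrip())
    if ret:
        yield ret

text = (
    "!! Some comments\n"
    "This is a text block start\n"
    "This is the end\n"
    "$$ Some other comments\n"
    "And this is another\n"
    "and another line.\n"
)
with io.StringIO(text) as f:
    sections = list(per_section(f, lambda line: line.startswith(('!', '$'))))
print(sections)
# [['This is a text block start', 'This is the end'], ['And this is another', 'and another line.']]
```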

【Comments】:

    【Answer 2】:

    Just do this:

    with open('yourfilename.txt') as f:  # open the desired file
        data = f.read()  # read the whole file into the variable data
        print(*(data.split('==========')))  # split data on "==========" and print the pieces
        # split() returns a list; unpacking it with * prints each piece as a separate argument instead
    

    Output:

    content content
    more content
    content conclusion
    
    content again
    more of it
    content conclusion
    
    content
    content
    contend done
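The same whole-file approach works for the blank-line-delimited input in the question by using `re.split` on a newline, optional whitespace, and another newline. A minimal sketch; `split_sections` is a hypothetical helper and, like the answer above, it reads the whole file into memory:

```python
import re

def split_sections(text):
    """Hypothetical helper: split text on blank (or whitespace-only) lines into lists of lines."""
    chunks = re.split(r'\n\s*\n', text)
    # Drop empty chunks (e.g. from leading/trailing blank lines) and stray blank lines at chunk edges.
    return [[line for line in chunk.splitlines() if line.strip()]
            for chunk in chunks if chunk.strip()]

sample = ("This is a text block start\nThis is the end\n\n"
          "And this is another\nwith more than one line\nand another line.\n")
print(split_sections(sample))
```

With a real file, `split_sections(f.read())` inside the `with open(...)` block gives the same two lists as the desired output in the question.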
    

    【Comments】:

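    • It is usually better to explain what the code does; that helps new developers understand how the code works.
    • You're right, so I have explained everything as comments in my code.
    • Don't do this if the file is large: stackoverflow.com/questions/25189262/…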
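For large files, a line-by-line generator avoids holding the whole file in memory. A sketch assuming the `==========` marker always sits on its own line (unlike `str.split`, which would also split in the middle of a line); `iter_chunks` is a hypothetical name:

```python
import io

def iter_chunks(f, marker='=========='):
    """Hypothetical helper: lazily yield the text between marker lines, one line read at a time."""
    chunk = []
    for line in f:
        if line.strip() == marker:    # a line consisting only of the marker ends the current chunk
            yield ''.join(chunk)
            chunk = []
        else:
            chunk.append(line)
    yield ''.join(chunk)              # text after the last marker (may be empty, as with str.split)

for piece in iter_chunks(io.StringIO("content\n==========\nmore content\n")):
    print(piece, end='')
```

With a real file, `for chunk in iter_chunks(f): print(chunk)` inside the `with open(...)` block keeps only one chunk in memory at a time.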
    【Answer 3】:

    How about something like this?

    from itertools import groupby
    
    def per_section(text, delimiters=()):
        def key(line):
            # True for blank lines and for lines starting with any of the delimiters
            return not line or line.isspace() or any(line.startswith(x) for x in delimiters)
        for k, g in groupby(text.splitlines(), key=key):
            if not k:
                yield list(g)
    
    
    if __name__ == '__main__':
        print(list(per_section('''This is a text block start
    This is the end
    
    And this is another
    with more than one line
    and another line.''')))
    
        print(list(per_section('''# Some comments, maybe the title of the following section
    This is a text block start
    This is the end
    # Some other comments and also the title
    And this is another
    with more than one line
    and another line.''', ('#',))))
    
        print(list(per_section('''!! Some comments, maybe the title of the following section
    This is a text block start
    This is the end
    $$ Some other comments and also the title
    And this is another
    with more than one line
    and another line.''', ('!', '$'))))
    

    Output:

    [['This is a text block start', 'This is the end'], ['And this is another', 'with more than one line', 'and another line.']]
    [['This is a text block start', 'This is the end'], ['And this is another', 'with more than one line', 'and another line.']]
    [['This is a text block start', 'This is the end'], ['And this is another', 'with more than one line', 'and another line.']]
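`groupby` does not need the whole text as one string: it accepts any iterable, so an open file can be grouped lazily. A sketch of that variant, assuming line endings should be stripped and delimiters are given as line prefixes; `per_section_stream` is a hypothetical name:

```python
import io
from itertools import groupby

def per_section_stream(lines, delimiters=()):
    """Hypothetical helper: group any iterable of lines (e.g. an open file) into sections."""
    prefixes = tuple(delimiters)      # str.startswith accepts a tuple of prefixes

    def is_delimiter(line):
        # Blank lines and lines starting with one of the prefixes separate sections.
        return not line.strip() or (bool(prefixes) and line.startswith(prefixes))

    stripped = (line.rstrip('\r\n') for line in lines)   # lazily drop line endings
    for delim, group in groupby(stripped, key=is_delimiter):
        if not delim:
            yield list(group)

print(list(per_section_stream(io.StringIO("!! x\na\n$$ y\nb\n"), ('!', '$'))))
```

Passing the file object directly, e.g. `per_section_stream(f, ('#',))`, materializes only the current section as a list.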
    

    【Comments】:
