【Question title】: How to put a newline after every n'th sentence?
【Posted】: 2020-07-13 16:30:21
【Question】:

I'm struggling to work out how to add a newline after every 5 sentences in a long text string.

Example input:

text = 'The puppy is cute. Summer is great. Happy Friday. Sentence4. Sentence5. Sentence6. Sentence7.'

Expected output:

The puppy is cute. Summer is great. Happy Friday. Sentence4. Sentence5.
Sentence6. Sentence7.

Can anyone help?

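【Question comments】:

  • How is a sentence defined? Do you allow input such as "Mr." or other abbreviations? Or ellipses? And so on. What have you tried?
  • A sentence starts with a capital letter and ends with a period, so the period can be used to tell sentences apart.
  • Your example contains sentences whose first letter is lowercase. Does that mean you don't want capitalization used to distinguish them?
  • Happy to use capital letters. I'll update the example above with capitals.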
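
The definition settled on in these comments (a sentence starts with a capital letter and ends with a period) could be sketched roughly as below. The helper names `split_sentences` and `break_every_n` are made up for illustration and appear nowhere in the thread; the sketch assumes plain prose and will still mis-split abbreviations such as "Mr. Smith".

```python
import re

def split_sentences(text):
    # Assumed boundary: a period, then whitespace, then a capital letter.
    # The lookbehind/lookahead keep the period and the capital in the pieces.
    return re.split(r'(?<=\.)\s+(?=[A-Z])', text.strip())

def break_every_n(text, n=5):
    sentences = split_sentences(text)
    # Join each group of n sentences with spaces, and the groups with newlines.
    lines = [' '.join(sentences[i:i + n]) for i in range(0, len(sentences), n)]
    return '\n'.join(lines)

text = 'The puppy is cute. Summer is great. Happy Friday. Sentence4. Sentence5. Sentence6. Sentence7.'
print(break_every_n(text))
```

Because the split requires a capital letter after the period, lowercase continuations such as "e.g. this" stay inside one sentence.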

Tags: python string newline


【Solution 1】:

Try this:

text = 'The puppy is cute. Summer is great. Happy friday. sentence4. sentence5. sentence6. sentence7.'
splittext = text.split(".")
for x in range(5, len(splittext), 5):
    splittext[x] = "\n"+splittext[x].lstrip()
text = ".".join(splittext)
print(text)

【Comments】:

  • Thanks for the function!

【Solution 2】:

Use a regular expression. Add \n after 5 matches of "[anything but .] followed by .".

import re
text = 'The puppy is cute. Summer is great. Happy friday. sentence4. sentence5. sentence6. sentence7.'

print(re.sub(r'((?:[^.]+\.\s*){5})',r'\1\n',text))
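
Note that the pattern above keeps the whitespace after the fifth sentence inside the captured group, so each broken line ends with a trailing space. A possible variant, offered only as a sketch, takes the sentence count as a parameter and swallows that whitespace. The function name `newline_every_n` is hypothetical, and the pattern still treats every period as a sentence end, so abbreviations and decimals will confuse it.

```python
import re

def newline_every_n(text, n=5):
    # Group 1: n runs of "non-period characters followed by a period".
    # The whitespace after the group is matched outside it, so it is dropped
    # and replaced by the newline. The lookahead (?=\S) requires more text to
    # follow, which avoids appending a newline at the very end of the string.
    pattern = re.compile(r'((?:[^.]+\.){' + str(n) + r'})\s*(?=\S)')
    return pattern.sub(r'\1\n', text)

text = 'The puppy is cute. Summer is great. Happy friday. sentence4. sentence5. sentence6. sentence7.'
print(newline_every_n(text))
```

With n=2, for example, `newline_every_n('a. b. c. d. e.', 2)` breaks after "b." and after "d." and leaves the final "e." on its own line.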

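A more advanced regex sentence matcher, which handles abbreviations and other punctuation by matching on the sentence-ending punctuation.
Reference: https://mikedombrowski.com/2017/04/regex-sentence-splitter/
Note: some edge cases still fail. For example, "T.V." followed by "Mr." needs a double space between them to be treated as separate sentences, and quotations that contain sentences will be split.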

import re
sentence_regex = r'((.*?([\.\?!][\'\"\u2018\u2019\u201c\u201d\)\]]*\s*(?<!\w\.\w.)(?<![A-Z][a-z][a-z]\.)(?<![A-Z][a-z]\.)(?<![A-Z]\.)\s+)){5})'
text = 'The puppy is cute. Watch T.V.  Mr. Summers is great. Say "my name."  My name is.  Or not... Happy friday? Sentence4. Sentence5. Sentence6. Sentence7.'
text += " " + text

print(re.sub(sentence_regex,r'\1\n',text))

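For anything more complex than this, you will probably want to look into a natural language processing toolkit.

【Comments】:

【Solution 3】:

Here is a simple function that adds a newline at the end of every 5th sentence: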

def new_line(sentence: str):
    # characters that mark the end of a sentence ('...' also ends with '.')
    end_of_sentence_markers = ('.', '!', '?')
    # after n sentences insert a newline
    n = 5

    # counts the sentences seen since the last newline
    count = 0
    # final string as list for efficiency
    final_str = []
    # split at space
    sentence_split = sentence.split(' ')

    # traverse the sentence split
    for word in sentence_split:
        # if the word ends a sentence then increase count
        # (endswith is safe for the empty strings produced by double spaces)
        if word.endswith(end_of_sentence_markers):
            count += 1
        # if count is equal to n then add newline otherwise add space
        if count == n:
            final_str.append(word + '\n')
            count = 0
        else:
            final_str.append(word + ' ')

    # return the string version of the list
    return ''.join(final_str)

Here is a revised version:

def new_line_better(sentence: str, n: int):
    # final string as list for efficiency
    final_str = []
    # split at period and remove extra spaces
    sentence_split = [part.strip() for part in sentence.split('.')]
    # drop the empty fragment that follows a trailing period
    # (an unconditional pop() would discard a final sentence with no period)
    if not sentence_split[-1]:
        sentence_split.pop()

    # counts the sentences seen since the last newline
    count = 0
    # traverse the sentences
    for part in sentence_split:
        count += 1
        if count == n:
            count = 0
            final_str.append(part + '.\n')
        else:
            final_str.append(part + '. ')

    # return the string version of the list
    return ''.join(final_str)

【Comments】:

【Solution 4】:

Another approach:

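text = 'The puppy is cute. Summer is great. Happy friday. sentence4. sentence5. sentence6. sentence7.'

out = ''
# strip each fragment and drop the empty one that split() leaves after the final period
sentences = [s.strip() for s in text.split(".") if s.strip()]
for i, e in enumerate(sentences):
    if i > 0:
        # newline before every 5th sentence, a single space before the others
        out += '\n' if i % 5 == 0 else ' '
    out += e + '.'
out

Result:

'The puppy is cute. Summer is great. Happy friday. sentence4. sentence5.\nsentence6. sentence7.'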

【Comments】:

【Solution 5】:

Using a list comprehension:

text = 'The puppy is cute. Summer is great. Happy friday. sentence4. sentence5. sentence6. sentence7.'
lines = text.split(".")
# prefix every 5th fragment with a newline; lstrip() drops the space that followed the period
result = ".".join([s if i % 5 else "\n" + s.lstrip() for i, s in enumerate(lines)]).lstrip()
print(result)

【Comments】:
