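【Question title】: Splitting text file into sentences [duplicate]
【Posted】: 2017-10-08 15:31:55
【Question】:

I am trying to split a text file into sentences, using punctuation marks as the delimiters. My code works so far, but each delimiter is printed on a line of its own. How can I keep the punctuation on the same line as its sentence?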

import re
string = ""
with open("text.txt") as file:
    for line in file:
        for l in re.split(r"(\. |\? |\! )",line):
            string += l + "\n"
print(string)

Sample output:

This is the flag of the Prooshi — ous, the Cap and Soracer
. 
This is the bullet that byng the flag of the Prooshious
. 
This is the ffrinch that fire on the Bull that bang the flag of the Prooshious
.

【Comments】:

    Tags: python-3.x


    【Solution 1】:

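    It's actually simple: you append \n (a newline) on every iteration. Because the pattern is wrapped in a capturing group, re.split returns each delimiter as its own list item, so when you split "Kek. " the loop first appends Kek\n to the string variable and then ". \n". You need to do something like this: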

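    import re

    string = ""
    with open("text.txt") as file:
        for line in file:
            # Append the pieces without "\n" so punctuation stays with its sentence
            for l in re.split(r"(\. |\? |\! )", line):
                string += l
            string += '\n'  # one newline per input line, not per piece
    print(string)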
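
If the goal is one sentence per output line with its punctuation attached, another option is to split on the whitespace that follows the punctuation instead of on the punctuation itself. The sketch below is only an illustration, not part of the original answer: the helper name `split_sentences` and the sample text are made up here, and it assumes a sentence ends at ".", "?" or "!" followed by whitespace (so abbreviations such as "e.g. " would be split as well).

```python
import re

def split_sentences(text):
    """Split after '.', '?' or '!' followed by whitespace, keeping the punctuation."""
    # The lookbehind (?<=[.?!]) only checks that the previous character is a
    # sentence-ending mark; it consumes nothing, so just the whitespace is removed.
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    # Drop empty pieces (for example when the input is empty).
    return [p for p in parts if p]

# Hypothetical sample text, used only for demonstration.
sample = "This is one. Is this two? Yes! Done."
for sentence in split_sentences(sample):
    print(sentence)
```

To apply this to a file, the same function could be called on each line (or on `file.read()`) inside the `with open("text.txt") as file:` block.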
    

    【Discussion】:
