【Question Title】: Python count words of split sentence?
【Posted】: 2020-09-11 16:48:53
【Question】:

Not sure how to remove the "\n" at the end of the output.

Basically, I have this txt file containing the following sentences:

"What does Bessie say I have done?" I asked.

"Jane, I don't like cavillers or questioners; besides, there is something truly forbidding in a child 
 taking up her elders in that manner.
 
Be seated somewhere; and until you can speak pleasantly, remain silent."

I managed to split the sentences on semicolons with this code:

import re

with open("testing.txt") as file:
    read_file = file.readlines()
for i, word in enumerate(read_file):
    low = word.lower()
    re.split(';', low)

But I'm not sure how to count the words of the split sentences, since len() doesn't give me what I want. Output of the sentences:

['"what does bessie say i have done?" i asked.\n']
['"jane, i don\'t like cavillers or questioners', ' besides, there is something truly forbidding in a 
child taking up her elders in that manner.\n']
['be seated somewhere', ' and until you can speak pleasantly, remain silent."\n']

For the third sentence, for example, I want to count 3 words in the left part and 8 in the right part.
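For instance, a minimal sketch of counting whitespace-separated words in each part, using the third line from the sample file:

```python
# Hypothetical sketch: split one sample line on ';' and count the
# whitespace-separated words in each resulting part.
line = 'be seated somewhere; and until you can speak pleasantly, remain silent."\n'
parts = line.lower().strip().split(';')
# str.split() with no argument splits on any run of whitespace and
# drops empty strings, so leading spaces and the trailing '\n' are harmless.
counts = [len(part.split()) for part in parts]
print(counts)  # [3, 8]
```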

Thanks for reading!

【Comments】:

  • Can't you just split on whitespace and take the length of the resulting list?
  • Does this answer your question? Count Words in Python
  • Check out .splitlines()
  • Regex also has things like \b and \w that might help you. You should give an example of the result you're aiming for with this kind of data.
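Following the last comment's hint, a hedged sketch of counting words with re.findall and \w+, applied to the first sample sentence:

```python
import re

# \w+ matches runs of word characters, so punctuation and the
# trailing newline never inflate the count.
sentence = '"what does bessie say i have done?" i asked.\n'
words = re.findall(r"\w+", sentence)
print(len(words))  # 9
```

One caveat: \w+ splits contractions such as "don't" into two tokens ("don", "t"), so it over-counts those compared with whitespace splitting.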

Tags: python python-3.x nlp


【Solution 1】:

The number of words is the number of spaces plus one.

For example, two spaces, three words:

life is good

Code:

import re
import string

lines = []
with open('file.txt', 'r') as f:
    lines = f.readlines()

DELIMITER = ';'
word_count = []
for i, sentence in enumerate(lines):
    # Skip empty sentences
    if not sentence.strip():
        continue
    # Remove punctuation besides our delimiter ';'
    sentence = sentence.translate(str.maketrans('', '', string.punctuation.replace(DELIMITER, '')))
    # Split by our delimiter
    splitted = re.split(DELIMITER, sentence)
    # The number of words is the number of spaces plus one
    word_count.append([1 + x.strip().count(' ') for x in splitted])

# [[9], [7, 9], [7], [3, 8]]
print(word_count)

【Comments】:

    【Solution 2】:

    Use str.rstrip('\n') to remove the \n at the end of each sentence.

    To count the words in a sentence, you can use len(sentence.split()) (splitting with no argument ignores leading and trailing whitespace).

    To turn the list of sentences into a list of counts, you can use the map function.

    Putting it all together:

    import re
    
    with open("testing.txt") as file:
        for i, line in enumerate(file.readlines()):
            # Ignore empty lines
            if line.strip(' ') != '\n':
                line = line.lower()
                # Split by semicolons
                parts = re.split(';', line)
                print("SENTENCES:", parts)
                counts = list(map(lambda part: len(part.split()), parts))
                print("COUNTS:", counts)
    

    Output:

    SENTENCES: ['"what does bessie say i have done?" i asked.']
    COUNTS: [9]
    SENTENCES: ['"jane, i don\'t like cavillers or questioners', ' besides, there is something truly forbidding in a child ']
    COUNTS: [7, 9]
    SENTENCES: [' taking up her elders in that manner.']
    COUNTS: [7]
    SENTENCES: ['be seated somewhere', ' and until you can speak pleasantly, remain silent."']
    COUNTS: [3, 8]
    

    【Comments】:

      【Solution 3】:

      You need the nltk library:

      from nltk import sent_tokenize, word_tokenize
      
      mytext = """I have a dog. 
      The dog is called Bob."""
      
      for sent in sent_tokenize(mytext): 
          print(len(word_tokenize(sent)))
      

      Output:

      5
      6
      

      Step-by-step explanation:

      for sent in sent_tokenize(mytext): 
          print('Sentence >>>',sent) 
          print('List of words >>>',word_tokenize(sent)) 
          print('Count words per sentence>>>', len(word_tokenize(sent))) 
      

      Output:

      Sentence >>> I have a dog.
      List of words >>> ['I', 'have', 'a', 'dog', '.']
      Count words per sentence>>> 5
      Sentence >>> The dog is called Bob.
      List of words >>> ['The', 'dog', 'is', 'called', 'Bob', '.']
      Count words per sentence>>> 6
      

      【Comments】:

        【Solution 4】:


        import re
        sentences = []                                                   # empty list for storing the result
        with open('testtext.txt') as fileObj:
            lines = [line.strip() for line in fileObj if line.strip()]   # build a list of lines, already stripped of '\n'
        for line in lines:
            sentences += re.split(';', line)                             # split lines on ';' and collect the parts in sentences
        for sentence in sentences:
            print(sentence + ' ' + str(len(sentence.split())))           # print each sentence with its word count
        

        【Comments】:

          【Solution 5】:

          Try this:

          import re

          with open("testing.txt") as file:
              read_file = file.readlines()
          for i, word in enumerate(read_file):
              low = word.lower()
              low = low.strip()
              low = low.replace('\n', '')
              re.split(';', low)
          

          【Comments】:

          • Why strip twice and then remove the \n? Also, the result of re.split doesn't go anywhere.
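A corrected sketch of the snippet above that addresses the comment: strip once, keep the split result, and count the words per part. The helper name sentence_word_counts is made up for illustration, and the sample list stands in for the question's testing.txt:

```python
import re

def sentence_word_counts(lines):
    """Split each non-blank line on ';' and count the words in each part."""
    result = []
    for line in lines:
        low = line.lower().strip()  # one strip handles spaces and the '\n'
        if not low:
            continue  # skip blank lines
        parts = re.split(";", low)
        result.append([len(p.split()) for p in parts])
    return result

sample = [
    '"What does Bessie say I have done?" I asked.\n',
    "\n",
    'Be seated somewhere; and until you can speak pleasantly, remain silent."\n',
]
print(sentence_word_counts(sample))  # [[9], [3, 8]]
```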