Capitalization of each sentence in a string in Python 3: answers

【Question title】: Capitalization of each sentence in a string in Python 3
【Posted】: 2014-12-06 21:06:03
【Question】:

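This should be easy, but somehow I'm not quite getting it.

My assignment is:

Write a function sentenceCapitalizer that has one parameter of type string. The function returns a copy of the string with the first character of each sentence capitalized. The function should return "Hello. My name is Joe. What is your name?" if the argument to the function is "hello. my name is Joe. what is your name?" Assume a sentence is separated by a period followed by a space.

What I have so far is: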

def sentenceCapitalizer (string1: str):
    words = string1.split(". ")
    words2=words.capitalize()
    string2=words2.join()
    return (string2)

print (sentenceCapitalizer("hello. my name is Joe. what is your name?"))

When I run it, I get this error:

Traceback (most recent call last):
  File "C:\Users\Andrew\Desktop\lab3.py", line 83, in <module>
    print (sentenceCapitalizer("hello. my name is Joe. what is your name?"))
  File "C:\Users\Andrew\Desktop\lab3.py", line 79, in sentenceCapitalizer
    words2=words.capitalize()
AttributeError: 'list' object has no attribute 'capitalize'

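What is this telling me, and how do I fix it? I tried following the instructions on a page listed as being from the Python Software Foundation, so I thought I would have this.

【Comments on the question】:

Note: Python 3.5 has not been released yet (it is still in development), so your claim to be using 3.5.5 is surprising; you probably have a different version of Python rather than a time machine.

Tags: python string capitalize

【Solution 1】:

You are trying to use a string method on the wrong object; words is a list object containing strings. Use the method on each individual element instead: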

words2 = [word.capitalize() for word in words]

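But this would apply the wrong transformation; you don't want to capitalize the whole sentence, only its first letter. str.capitalize() lowercases everything else, including the J in Joe: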

>>> 'my name is Joe'.capitalize()
'My name is joe'

Limit yourself to just the first letter, then add the rest of the string back unchanged:

words2 = [word[0].capitalize() + word[1:] for word in words]

Next, list objects have no .join() method either; that too is a string method:

string2 = '. '.join(words2)

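This joins the strings in words2 with '. ' (a full stop and a space) as the separator.

You may want to use better variable names here; your strings are sentences, not words, so your code would do better to reflect that.

Putting it together, your function becomes: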

def sentenceCapitalizer (string1: str):
    sentences = string1.split(". ")
    sentences2 = [sentence[0].capitalize() + sentence[1:] for sentence in sentences]
    string2 = '. '.join(sentences2)
    return string2

Demo:

>>> def sentenceCapitalizer (string1: str):
...     sentences = string1.split(". ")
...     sentences2 = [sentence[0].capitalize() + sentence[1:] for sentence in sentences]
...     string2 = '. '.join(sentences2)
...     return string2
... 
>>> print (sentenceCapitalizer("hello. my name is Joe. what is your name?"))
Hello. My name is Joe. What is your name?
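
One edge case worth noting: indexing with sentence[0] raises IndexError when a piece is empty, for example when the input is an empty string. A minimal sketch of a variant that tolerates this, assuming the same '. ' separator as above; the name capitalize_sentences is made up for illustration and is not part of the original answer:

```python
def capitalize_sentences(text: str) -> str:
    """Uppercase the first character of each '. '-separated sentence."""
    sentences = text.split(". ")
    # Slicing with [:1] yields '' for an empty piece instead of raising
    # IndexError, and .upper() leaves the rest of the sentence untouched.
    fixed = [s[:1].upper() + s[1:] for s in sentences]
    return ". ".join(fixed)


print(capitalize_sentences("hello. my name is Joe. what is your name?"))
# Hello. My name is Joe. What is your name?
print(repr(capitalize_sentences("")))
# ''
```

The slice-based form trades a little readability for not having to special-case empty input.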

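【Comments】:

He is asking to capitalize the first character of each sentence, not of each word. Edit: great, you've updated it.
@danijar: don't be confused by the variable names; the split is on '. '.
But .capitalize() will destroy the capitalization of words, so Joe becomes joe, and ' '.join will destroy the original full stops.
@DSM: I was trying to address the initial problem first; solving the whole homework problem goes a bit far, don't you think? :-)
I'm certainly not going to throw stones, since I usually just say "you might also want to check X" at the end, but I'm very lazy..

【Solution 2】:

This does the job. Because it extracts all sentences including their trailing whitespace, it also works if you have multiple paragraphs (with newlines between sentences).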

import re

def sentence_case(text):
    # Split into sentences. Therefore, find all text that ends
    # with punctuation followed by white space or end of string.
    sentences = re.findall(r'[^.!?]+[.!?](?:\s|\Z)', text)

    # Capitalize the first letter of each sentence
    sentences = [x[0].upper() + x[1:] for x in sentences]

    # Combine sentences
    return ''.join(sentences)

Here is a working example.
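
To see the multi-paragraph behaviour, and one limitation, here is a small self-contained sketch. It repeats the function from this answer, with the pattern written as a raw string; the sample inputs are made up for illustration. Note that trailing text without final punctuation does not match the pattern and is therefore dropped.

```python
import re


def sentence_case(text):
    # Same approach as above: grab runs of text that end in . ! or ?
    # followed by one whitespace character or the end of the string.
    sentences = re.findall(r'[^.!?]+[.!?](?:\s|\Z)', text)
    return ''.join(s[0].upper() + s[1:] for s in sentences)


# A newline between sentences is kept as part of the preceding match.
print(repr(sentence_case("hello. my name is Joe.\nwhat is your name?")))
# 'Hello. My name is Joe.\nWhat is your name?'

# Limitation: the unterminated tail is not matched, so it is lost.
print(repr(sentence_case("hello. no final punctuation")))
# 'Hello. '
```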

【Comments】:

【Solution 3】:

To allow arbitrary whitespace after the dot, or to capitalize whole words (which may make a difference for Unicode text), you could use regular expressions via the re module:

#!/usr/bin/env python3
import re

def sentenceCapitalizer(text):
    return re.sub(r"(\.\s+|^)(\w+)",
                  lambda m: m.group(1) + m.group(2).capitalize(),
                  text)

s = "hEllo. my name is Joe. what is your name?"
print(sentenceCapitalizer(s))
# -> 'Hello. My name is Joe. What is your name?'

Note: PEP 8 recommends lowercase names for functions, e.g. capitalize_sentence() rather than sentenceCapitalizer().
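
Because str.capitalize() lowercases the rest of the matched word, a first word such as NASA would come out as Nasa with the function above. A hedged variant of the same regex idea that uppercases only the first character and also treats ! and ? as sentence ends; the function name capitalize_sentences and the sample text are illustrative, not part of the original answer:

```python
import re


def capitalize_sentences(text):
    # Group 1: start of string, or . ! ? followed by whitespace.
    # Group 2: the first word character of the sentence.
    return re.sub(r"(^|[.!?]\s+)(\w)",
                  lambda m: m.group(1) + m.group(2).upper(),
                  text)


s = "NASA is hiring! apply now. what is your name?"
print(capitalize_sentences(s))
# NASA is hiring! Apply now. What is your name?
```

The trade-off is that a mixed-case first word such as hEllo is no longer normalized to Hello, which the original version does on purpose.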

To accept a wider variety of texts, you could use the nltk package:

# $ pip install nltk
from nltk.tokenize import sent_tokenize, word_tokenize 

def sent_capitalize(sentence):
    """Capitalize the first word in the *sentence*."""
    words = word_tokenize(sentence)
    if words:
        words[0] = words[0].capitalize()
    return " ".join(words[:-1]) + "".join(words[-1:]) # dot

text = "hEllo. my name is Joe. what is your name?"
# split the text into a list of sentences
sentences = sent_tokenize(text)
print(" ".join(map(sent_capitalize, sentences)))
# -> Hello. My name is Joe. What is your name?

【Comments】:

【Solution 4】:

Instead of using split, I used a while loop. Here is my code.

my_string = input('Enter a string: ')
new_string = ''
new_string += my_string[0].upper()
i = 1

while i < len(my_string)-2:
    new_string += my_string[i]
    if my_string[i] == '.' or my_string[i] == '?' or my_string[i] == '!':
        new_string += ' '
        new_string += my_string[i+2].upper()
        i = i+3
    else:
        if i == len(my_string)-3:
            new_string += my_string[len(my_string)-2:len(my_string)]
        i = i+1

print(new_string)

Here is how it works:

Enter a string: hello. my name is Joe. what is your name?
Hello. My name is Joe. What is your name?
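
The loop above assumes exactly one space after each punctuation mark: an empty input raises IndexError, and very short inputs or punctuation near the end of the string can lose characters. A sketch of a loop-based alternative that walks the string once and tracks state with two flags; the function name and the flag names are made up for illustration:

```python
def capitalize_sentences(text: str) -> str:
    result = []
    capitalize_next = True   # the next non-space character starts a sentence
    after_punct = False      # the previous character was . ! or ?
    for ch in text:
        if capitalize_next and not ch.isspace():
            result.append(ch.upper())
            capitalize_next = False
        else:
            result.append(ch)
        if ch in ".!?":
            after_punct = True
        elif ch.isspace() and after_punct:
            # Punctuation followed by whitespace ends a sentence.
            capitalize_next = True
            after_punct = False
        else:
            # Punctuation followed by a non-space (e.g. "3.14") is ignored.
            after_punct = False
    return "".join(result)


print(capitalize_sentences("hello. my name is Joe. what is your name?"))
# Hello. My name is Joe. What is your name?
```

Since no index arithmetic is involved, every input character is copied to the output exactly once, whatever the length of the string.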

【Comments】:

【Solution 5】:

Posting this only because I couldn't find this solution here.

You can use the sent_tokenize function from nltk.

import nltk
string = "hello. my name is Joe. what is your name?"
sentences = nltk.sent_tokenize(string)
print (' '.join([s.replace(s[0],s[0].capitalize(),1) for s in sentences]) )

And the output:

Hello. My name is Joe. What is your name?

【Comments】:

【Solution 6】:

import textwrap

txt = "what ever you want. this will format it nicely. it makes me happy"

# Capitalize each sentence and rejoin with '. ' so the space after every
# period is preserved. Note that str.capitalize() also lowercases the rest
# of each sentence.
txt = '. '.join(map(lambda s: s.strip().capitalize(), txt.split('. ')))

user = "Joe"
prefix = user + ":\t"
preferredWidth = 79

wrapper = textwrap.TextWrapper(initial_indent=prefix,
                               width=preferredWidth,
                               subsequent_indent=' ' * len(prefix) + " ")

print(wrapper.fill(txt))

I tried to rely on as few internet-dependent features as possible. This works for me, and I hope it is useful to someone.

【Comments】: