First pass
My first idea is below. It works for the given input.
Input:
string textIn = @"first sentence. second sentence. third sentence.";
Split on periods, trim, apply sentence case, and rejoin, all in one statement:
string textOut = string.Join(". ",
    textIn.Split(new char[] { '.' })
        .Select(x => x.Trim())
        .Where(x => x.Length > 0)
        .Select(x => x.Substring(0, 1).ToUpper() + x.Substring(1)))
    + ".";
Console.WriteLine(textOut);
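Output:
First sentence. Second sentence. Third sentence.
The string.Split() call splits the string into four components, the last of which is empty. That is why I test the length of each component and skip the empty ones. You may or may not want to do that.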
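As a variation, and assuming that dropping empty pieces is what you want, the standard StringSplitOptions.RemoveEmptyEntries option lets Split discard them itself. The class and method names below (SplitDemo, SplitSentences) are made up for illustration. The length check is kept because a piece that contains only whitespace is not "empty" until it has been trimmed.

```csharp
using System;
using System.Linq;

public static class SplitDemo
{
    // Splits on '.', lets Split drop empty pieces, then trims each piece.
    // The Where clause still filters pieces that were whitespace only.
    public static string[] SplitSentences(string text)
    {
        return text
            .Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => s.Trim())
            .Where(s => s.Length > 0)
            .ToArray();
    }
}
```

For the input above this yields three components instead of four.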
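Second pass
Then I saw the comment above about abbreviations, so I tried it with this string.
string text = @"first sentence. i was born in the U.S.A. third sentence.";
The output looks like this:
First sentence. I was born in the U. S. A. Third sentence.
Still looks reasonable, but a space has been inserted after each of the first two periods in U.S.A. That may or may not be acceptable.
This happens because string.Split() splits on every occurrence of the given single characters, regardless of context.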
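I also tried using a regular expression to split the string on a period followed by one or more whitespace characters (or by the end of the string). The same pattern could be widened to question marks and exclamation points, although the rejoin step would then need to keep each original delimiter.
string textIn = @"first sentence. i was born in the U.S.A. third sentence.";
string[] sentences = Regex.Split(textIn, @"\.(?:\s+|$)");
string textOut = string.Join(". ",
    sentences
        .Select(x => x.Trim())
        .Where(x => x.Length > 0)
        .Select(x => x.Substring(0, 1).ToUpper() + x.Substring(1)))
    + ".";
Console.WriteLine(textOut);
Output:
First sentence. I was born in the U.S.A. Third sentence.
The regex correctly ignores the first two periods in U.S.A. because they are not followed by whitespace. The final period is matched by the end-of-string alternative ($); with a plain \.\s+ pattern it would stay attached to the last sentence, and the "." appended at the end would double it.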
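If question marks and exclamation points need to survive, one possible alternative is to skip the split-and-rejoin step and capitalize in place with Regex.Replace. This is only a sketch: the names SentenceCase, SentenceStart, and Capitalize are invented here, and the pattern assumes a sentence starts either at the beginning of the text or after '.', '?' or '!' followed by whitespace.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SentenceCase
{
    // Group 1: start of text (plus optional whitespace), or a delimiter
    //          followed by whitespace.
    // Group 2: the lowercase letter that begins the sentence.
    private static readonly Regex SentenceStart =
        new Regex(@"(^\s*|[.?!]\s+)(\p{Ll})");

    public static string Capitalize(string text)
    {
        // Keep group 1 unchanged and upper-case only the letter in group 2,
        // so the original delimiters and spacing are preserved.
        return SentenceStart.Replace(
            text,
            m => m.Groups[1].Value + m.Groups[2].Value.ToUpperInvariant());
    }
}
```

Because nothing is split, the original punctuation and spacing pass through untouched. It shares the limitation of the split version: any delimiter followed by whitespace is treated as the end of a sentence.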
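Third pass
Another option is to use a finite state machine.
This example shows how to capitalize the first character of each sentence, where the end of a sentence is defined as a period, question mark, or exclamation point followed by a space.
In this example, the state is held in the integer variable "state".
Three states are defined:
0 = Searching for the first character of the next sentence.
1 = Searching for the next delimiter.
2 = Checking the first character after a delimiter to determine whether we have reached the end of the sentence.
There are two actions: (1) convert the current character to upper case, and (2) append the current character to the output.
The string is parsed one character at a time. The action performed and the state transition are determined by the current state and the current character.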
static string CapitalizeSentencesInString(string textIn) {
    string textOut = "";
    // Delimiters: dot, hook, & bang.
    char[] delimiters = new char[] { '.', '?', '!' };
    int state = 0;
    foreach (char ch in textIn) {
        switch (state) {
            // Searching for first character of the next sentence.
            case 0:
                if (ch == ' ') {
                    // Space character.
                    // Action: append character to output.
                    textOut += ch;
                    // Next state: keep searching for the first
                    // character of the next sentence (i.e., do
                    // not change state).
                } else if (delimiters.Contains(ch)) {
                    // Dot, hook, or bang.
                    // Action: append to output.
                    textOut += ch;
                    // Next state: check next character.
                    state = 2;
                } else if (char.IsUpper(ch)) {
                    // Upper case character.
                    // Action: append to output.
                    textOut += ch;
                    // Next state: search for the end of the
                    // current sentence.
                    state = 1;
                } else if (char.IsLower(ch)) {
                    // Lower case character.
                    // Action: convert to upper case and append to
                    // output.
                    textOut += char.ToUpper(ch);
                    // Next state: search for the end of the
                    // current sentence.
                    state = 1;
                } else {
                    // Default option.
                    // Action: append to output.
                    textOut += ch;
                    // Next state: search for the end of the
                    // current sentence.
                    state = 1;
                }
                break;
            // Searching for next delimiter.
            case 1:
                if (delimiters.Contains(ch)) {
                    // Dot, hook, or bang.
                    // Action: append to output.
                    textOut += ch;
                    // Next state: check next character.
                    state = 2;
                } else {
                    // Action: append to output.
                    textOut += ch;
                    // Next state: keep searching for the next
                    // delimiter (i.e., do not change state).
                }
                break;
            // Previous character was a delimiter. This character
            // determines whether we have reached the end of the sentence.
            case 2:
                if (ch == ' ') {
                    // Space. We have reached the end of the sentence.
                    // Action: append to output.
                    textOut += ch;
                    // Next state: search for the first character of
                    // the next sentence.
                    state = 0;
                } else if (delimiters.Contains(ch)) {
                    // Dot, hook, or bang.
                    // Action: append to output.
                    textOut += ch;
                    // Next state: check next character
                    // (i.e., do not change state).
                } else {
                    // Not a space. We have not reached the end of
                    // the sentence.
                    // Action: append to output.
                    textOut += ch;
                    // Next state: search for the next delimiter.
                    state = 1;
                }
                break;
        }
    }
    return textOut;
}
Input:
string textIn = @"first sentence! can i also a handle ellipses...? i was born in the U.S.A. third sentence.";
Output:
First sentence! Can i also a handle ellipses...? I was born in the U.S.A. Third sentence.
Looks good, but what about this?
string text = @"i was born in the U.K. but I live in the U.S.A.. Third sentence.";
Output:
I was born in the U.K. But I live in the U.S.A.. Third sentence.
Oh dear! The 'b' in 'but' has been capitalized!
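One way to reduce this kind of false positive is to check the token before the space against a list of known abbreviations. The sketch below is an assumption-laden illustration, not a complete fix: the class name, the Boundary pattern, and the tiny Abbreviations set are all hypothetical, and a real list would need to be far longer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AbbreviationAwareCase
{
    // Hypothetical abbreviation list, for illustration only.
    private static readonly HashSet<string> Abbreviations =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "U.K.", "U.S.A.", "e.g.", "i.e."
        };

    // Group 1: a run of non-space characters ending in '.', '?' or '!'.
    // Group 2: the whitespace that follows it.
    // Group 3: a lowercase letter that may begin a new sentence.
    private static readonly Regex Boundary =
        new Regex(@"(\S*[.?!])(\s+)(\p{Ll})");

    public static string Capitalize(string text)
    {
        string result = Boundary.Replace(text, m =>
        {
            string token = m.Groups[1].Value;
            string next = m.Groups[3].Value;
            // Only treat the delimiter as a sentence end when the token
            // before it is not a known abbreviation.
            if (!Abbreviations.Contains(token))
            {
                next = next.ToUpperInvariant();
            }
            return token + m.Groups[2].Value + next;
        });

        // The very first character has no preceding delimiter,
        // so it is handled separately.
        if (result.Length > 0 && char.IsLower(result[0]))
        {
            result = char.ToUpperInvariant(result[0]) + result.Substring(1);
        }
        return result;
    }
}
```

With this list, 'but' after "U.K." stays in lower case. The trade-off is that a sentence that really does end with a listed abbreviation, such as "...in the U.S.A. third sentence.", is no longer detected, so the ambiguity is moved rather than removed.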
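Conclusion
This is a non-trivial problem. There are several approaches you can try. Your success will depend on how strictly you can define "end of sentence" and whether you can distinguish it from other uses of periods within a sentence. You can handle the vast majority of cases fairly easily, but there is always the possibility of an unexpected edge case.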