Regex match first two occurrences of a capital letter followed by several lower case: answers

【Question title】: Regex match first two occurrences of a capital letter followed by several lower case
【Posted】: 2014-04-04 17:03:28
【Question description】:

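I've been looking through the examples here to see how to do a similar regex match, but I can't get it to work for my case.

I have a string like ThisisMystringItsTooLong and I want to get back ThiMys (the first two occurrences of a capital letter followed by two lowercase letters). However, if the string is just Thisismystring (only one capital letter), then I only want to return Thi.

I have tried ([A-Z]{1})([a-z]{2}){0,1} to get only the first occurrence of my match when there are more than 2 capital letters, but I'm not sure how to apply the second condition.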

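【Question discussion】:

Can't you just concatenate the first and second matches? Otherwise, you can't really skip characters with a regex.
@MattBurland Yes, I suppose I could go that route. I don't use regex often, so I was curious whether multiple scenarios could be handled the way I described. Good alternative. I'm not sure what you mean by skipping characters; I was thinking of something more along the lines of substrings.
@JWiley, I created a function for you.

Tags: c# regex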

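【Solution 1】:

You cannot do this with a regular expression alone, because a match is always a contiguous substring of the input. You can, of course, combine several matches into one final result.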

String.Join(String.Empty, Regex.Matches(input, "[A-Z][a-z]{2}")
                               .Cast<Match>()
                               .Take(2)
                               .Select(match => match.Value));

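【Discussion】:

Can you explain how the latter half of the second Join argument works?
It simply gets all matches of [A-Z][a-z]{2}, casts the resulting match collection for technical reasons (MatchCollection only implements IEnumerable, not IEnumerable<Match>), takes at most the first two matches, gets the matched text of each one, and finally joins them using an empty string as the separator.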
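
To make each stage of that chain easier to see, here is a sketch that unrolls the one-liner into one statement per step. The class name JoinDemo, the method name FirstTwoPrefixes, and the intermediate variable names are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class JoinDemo
{
    // Hypothetical helper: same logic as the one-liner, one step per statement.
    public static string FirstTwoPrefixes(string input)
    {
        // 1. Find every capital letter followed by exactly two lowercase letters.
        MatchCollection all = Regex.Matches(input, "[A-Z][a-z]{2}");

        // 2. Cast the items to Match so that the LINQ operators below can be used.
        IEnumerable<Match> typed = all.Cast<Match>();

        // 3. Keep at most the first two matches (fewer if fewer exist).
        IEnumerable<Match> firstTwo = typed.Take(2);

        // 4. Project each match to its matched text.
        IEnumerable<string> texts = firstTwo.Select(m => m.Value);

        // 5. Concatenate the pieces with an empty separator.
        return String.Join(String.Empty, texts);
    }

    public static void Main()
    {
        Console.WriteLine(FirstTwoPrefixes("ThisisMystringItsTooLong")); // ThiMys
        Console.WriteLine(FirstTwoPrefixes("Thisismystring"));           // Thi
    }
}
```

When the input contains no match at all, Take(2) yields an empty sequence and the result is an empty string rather than null.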

【Solution 2】:

I would simply use the regex pattern [A-Z][a-z]{2} and do the rest of the logic "manually".

public string ShortIdentifier(string longIdentifier)
{
    MatchCollection matches = Regex.Matches(longIdentifier, "[A-Z][a-z]{2}");
    if (matches.Count == 1) {
        return matches[0].Value;
    } else if (matches.Count >= 2) {
        return matches[0].Value + matches[1].Value;
    }
    return longIdentifier.Substring(0, Math.Min(longIdentifier.Length, 6));
    // Or return whatever you want when there is no match.
}

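If you want to return a capital letter followed by one or two lowercase letters, change the regex to [A-Z][a-z]{1,2}.

【Discussion】:

I'm trying to implement your solution, but I see it skipping matches in strings like HelloThisIsString1. The regex only looks for one capital letter followed by 2 lowercase letters. I want the match to return HelThi, but this method returns HelloTh.
You must have a typo. I get "HelThi". The first match is "Hel", the second is "Thi", and there is a third, unused match, "Str".
Ah, yes, I had added something to account for words with only one lowercase letter after the capital, {1,2}, and that seemed to force it to match that exact length. Is there a way to check for any number of letters but return only the first 2?
[A-Z][a-z]{1,2} should work. It matches as many lowercase letters as possible, but at most two, even if the word is longer (tested). For "HiThere" it returns "HiThe", and for "JonnyIsHere" it returns "JonIs".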
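
The difference between {2} and {1,2} can be checked with a small side-by-side sketch. The class name QuantifierDemo and the method name Shorten are hypothetical; the joining logic is simply the first-two-matches approach already used in this thread, parameterized by pattern.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: joins the first two matches of the given pattern.
    public static string Shorten(string input, string pattern)
    {
        return String.Join(String.Empty,
            Regex.Matches(input, pattern)
                 .Cast<Match>()
                 .Take(2)
                 .Select(m => m.Value));
    }

    public static void Main()
    {
        // {2} requires exactly two lowercase letters, so "Hi" is skipped.
        Console.WriteLine(Shorten("HiThere", "[A-Z][a-z]{2}"));     // The

        // {1,2} is greedy: it takes two lowercase letters when available, else one.
        Console.WriteLine(Shorten("HiThere", "[A-Z][a-z]{1,2}"));     // HiThe
        Console.WriteLine(Shorten("JonnyIsHere", "[A-Z][a-z]{1,2}")); // JonIs
    }
}
```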

【Solution 3】:

You can create a method like this:

public string GetMyCharacters(string s)
{
    // Count the actual three-character matches rather than bare capitals:
    // this covers the case of exactly two matches and guarantees that
    // indexing into the collection never goes out of range.
    var matches = Regex.Matches(s, "[A-Z][a-z]{2}");
    if (matches.Count >= 2)
    {
        return matches[0].Value + matches[1].Value;
    }
    else if (matches.Count == 1)
    {
        return matches[0].Value;
    }
    else { return null; }
}

Then call it like this:

Console.WriteLine(GetMyCharacters("ThisisMystringItsTooLong")); // ThiMys
Console.WriteLine(GetMyCharacters("Thisismystring")); // Thi
Console.WriteLine(GetMyCharacters("wijfowro"));// null

【Discussion】:

【Solution 4】:

I originally misunderstood the requirement, but here is the fixed version:

Regex.Replace(
    "ThisisMystringItsTooLong",
    "^(?:.*?([A-Z][a-z]{2}))?(?:.*?([A-Z][a-z]{2}))?.*$",
    "$1$2"
)

It matches the whole input string, from start (^) to end ($), and breaks down into:

(?:.*?([A-Z][a-z]{2}))? - optional non-capturing group, which consists of
                          a bunch of non-greedy anything followed
                          by substring sought, which is captured
(?:.*?([A-Z][a-z]{2}))? - another exactly same group; if we want to place
                          some limits on what can be between substrings
                          sought (like no spaces etc.) it goes here
                          instead of the anything
.*                      - anything else

It then builds the output string by using the Regex.Replace method to concatenate the two (possibly empty) captures. Tests:

"ThisisMystringItsTooLong" -> "ThiMys"
"Thisismystring"           -> "Thi"
"thisismystring"           -> ""
"that is His String"       -> "HisStr"
"oh Hi There!"             -> "The"
"oh Hi There Go Here"      -> "TheHer"

Unlike Danies' answer, this uses nothing but regex, though I'm not sure whether its performance is better or worse.
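
The test table above can be reproduced with a small harness along these lines. This is only a sketch: the class name ReplaceDemo and the method name Extract are hypothetical wrappers around the Regex.Replace call shown in the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    // The pattern from the answer: two optional "skip, then capture" groups,
    // followed by the rest of the string.
    private const string Pattern =
        "^(?:.*?([A-Z][a-z]{2}))?(?:.*?([A-Z][a-z]{2}))?.*$";

    // Hypothetical wrapper: replaces the whole input with the two captures.
    // A group that did not participate in the match contributes an empty string.
    public static string Extract(string input)
    {
        return Regex.Replace(input, Pattern, "$1$2");
    }

    public static void Main()
    {
        string[] samples =
        {
            "ThisisMystringItsTooLong",
            "Thisismystring",
            "thisismystring",
            "that is His String",
            "oh Hi There!",
            "oh Hi There Go Here"
        };

        foreach (string sample in samples)
        {
            Console.WriteLine("\"{0}\" -> \"{1}\"", sample, Extract(sample));
        }
    }
}
```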

【Discussion】:

【Solution 5】:

Try http://regex101.com/r/pU4aB5

([A-Z]{1}[a-z]{2})[a-z]*([A-Z]{1}[a-z]{2})?

You then need to concatenate the two capture groups to get the final result.
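
In C#, concatenating the two groups might look like the sketch below. The class name GroupsDemo and the method name Extract are hypothetical. One assumption to be aware of: because the pattern only allows lowercase letters between the two groups, the second group is captured only when the next capital directly follows the first word.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupsDemo
{
    // The pattern from the answer, unchanged.
    private const string Pattern =
        "([A-Z]{1}[a-z]{2})[a-z]*([A-Z]{1}[a-z]{2})?";

    // Hypothetical helper: returns group 1 plus group 2 of the first match.
    public static string Extract(string input)
    {
        Match m = Regex.Match(input, Pattern);
        if (!m.Success)
        {
            return String.Empty; // no capital followed by two lowercase letters
        }

        // An optional group that did not match has an empty Value.
        return m.Groups[1].Value + m.Groups[2].Value;
    }

    public static void Main()
    {
        Console.WriteLine(Extract("ThisisMystringItsTooLong")); // ThiMys
        Console.WriteLine(Extract("Thisismystring"));           // Thi
        // Characters other than lowercase letters between the words
        // prevent the second group from matching:
        Console.WriteLine(Extract("oh Hi There Go Here"));      // The
    }
}
```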

【Discussion】: