【Question title】: Get substring from string using culture-sensitive comparison
【Posted】: 2016-02-18 15:31:53
【Question】:

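Is there a way to get the matching substring from a string using a culture-sensitive equality comparison? For example, under the en-US culture, æ and ae are considered equal. "Encyclopædia".IndexOf("aed") evaluates to 8, indicating a match. However, is there a way to extract the matched substring, æd, without iterating over the source string? Note that the lengths of the sought substring and the matched substring may differ by several characters.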
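
To make the difficulty concrete, here is a minimal sketch of the obvious but flawed approach. The NaiveMatch class and its Extract method are hypothetical names, not part of .NET. The helper takes the index reported by CompareInfo.IndexOf and cuts out as many characters as the search string has. When the culture treats æ and ae as equal, the search string aed has three characters but the matched text æd has only two, so the result would run one character too far (ædi). Whether æ and ae actually compare equal depends on the runtime and its globalization backend, so no output is claimed for that case.

```csharp
using System;
using System.Globalization;

public static class NaiveMatch
{
    // Hypothetical helper: returns the text at the match position, wrongly
    // assuming the match is exactly as long as the search string.
    public static string Extract(string source, string value, CultureInfo culture)
    {
        int index = culture.CompareInfo.IndexOf(source, value, CompareOptions.None);
        if (index < 0)
            return null; // no match under this culture's rules

        // Clamp so the naive length never runs past the end of the source.
        int length = Math.Min(value.Length, source.Length - index);
        return source.Substring(index, length);
    }
}
```

A call such as NaiveMatch.Extract("Encyclopædia", "aed", new CultureInfo("en-US")) exercises the case from the question. For plain ASCII input the naive length happens to be correct, which is why the problem is easy to overlook.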

【Comments】:

Tags: c# .net string


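【Solution 1】:

I eventually solved this by first calling IndexOf to get the starting position of the match, and then iteratively trying candidate lengths until one compares equal. I optimized for the hot path in which the match has the same length as the specified substring; in that case, only a single comparison is performed.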

using System;
using System.Globalization;

public static class StringExtensions
{
    public static void Find(this string source, string substring, StringComparison comparisonType, out int matchIndex, out int matchLength)
    {
        Find(source, substring, 0, source.Length, comparisonType, out matchIndex, out matchLength);
    }

    public static void Find(this string source, string substring, int searchIndex, StringComparison comparisonType, out int matchIndex, out int matchLength)
    {
        Find(source, substring, searchIndex, source.Length - searchIndex, comparisonType, out matchIndex, out matchLength);
    }

    public static void Find(this string source, string substring, int searchIndex, int searchLength, StringComparison comparisonType, out int matchIndex, out int matchLength)
    {
        matchIndex = source.IndexOf(substring, searchIndex, searchLength, comparisonType);
        if (matchIndex == -1)
        {
            matchLength = -1;
            return;
        }

        matchLength = FindMatchLength(source, substring, searchIndex, searchLength, comparisonType, matchIndex);

        // Defensive programming, but should never happen
        if (matchLength == -1)
            matchIndex = -1;
    }

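    // Determines how many characters of source, starting at matchIndex, compare equal to substring.
    // Tries substring.Length first, then alternately shorter and longer candidate lengths.
    private static int FindMatchLength(string source, string substring, int searchIndex, int searchLength, StringComparison comparisonType, int matchIndex)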
    {
        int matchLengthMaximum = searchLength - (matchIndex - searchIndex);
        int matchLengthInitial = Math.Min(substring.Length, matchLengthMaximum);

        // Hot path: match length is same as substring length.
        if (Compare(source, matchIndex, matchLengthInitial, substring, 0, substring.Length, comparisonType) == 0)
            return matchLengthInitial;

        int matchLengthDecrementing = matchLengthInitial - 1;
        int matchLengthIncrementing = matchLengthInitial + 1;

        while (matchLengthDecrementing >= 0 || matchLengthIncrementing <= matchLengthMaximum)
        {
            if (matchLengthDecrementing >= 0)
            {
                if (Compare(source, matchIndex, matchLengthDecrementing, substring, 0, substring.Length, comparisonType) == 0)
                    return matchLengthDecrementing;

                matchLengthDecrementing--;
            }

            if (matchLengthIncrementing <= matchLengthMaximum)
            {
                if (Compare(source, matchIndex, matchLengthIncrementing, substring, 0, substring.Length, comparisonType) == 0)
                    return matchLengthIncrementing;

                matchLengthIncrementing++;
            }
        }

        // Should never happen
        return -1;
    }

    private static int Compare(string strA, int indexA, int lengthA, string strB, int indexB, int lengthB, StringComparison comparisonType)
    {
        switch (comparisonType)
        {
            case StringComparison.CurrentCulture:
                return CultureInfo.CurrentCulture.CompareInfo.Compare(strA, indexA, lengthA, strB, indexB, lengthB, CompareOptions.None);

            case StringComparison.CurrentCultureIgnoreCase:
                return CultureInfo.CurrentCulture.CompareInfo.Compare(strA, indexA, lengthA, strB, indexB, lengthB, CompareOptions.IgnoreCase);

            case StringComparison.InvariantCulture:
                return CultureInfo.InvariantCulture.CompareInfo.Compare(strA, indexA, lengthA, strB, indexB, lengthB, CompareOptions.None);

            case StringComparison.InvariantCultureIgnoreCase:
                return CultureInfo.InvariantCulture.CompareInfo.Compare(strA, indexA, lengthA, strB, indexB, lengthB, CompareOptions.IgnoreCase);

            case StringComparison.Ordinal:
                return CultureInfo.InvariantCulture.CompareInfo.Compare(strA, indexA, lengthA, strB, indexB, lengthB, CompareOptions.Ordinal);

            case StringComparison.OrdinalIgnoreCase:
                return CultureInfo.InvariantCulture.CompareInfo.Compare(strA, indexA, lengthA, strB, indexB, lengthB, CompareOptions.OrdinalIgnoreCase);

            default:
                throw new ArgumentException("The string comparison type passed in is currently not supported.", nameof(comparisonType));
        }
    }
}
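
The order in which FindMatchLength tries candidate lengths can be illustrated in isolation. The sketch below is only an illustration of that search order; the CandidateLengths class and InSearchOrder method are hypothetical names that do not appear in the answer's code.

```csharp
using System;
using System.Collections.Generic;

public static class CandidateLengths
{
    // Yields candidate match lengths in the order the answer tries them:
    // the expected length first, then alternately one shorter and one longer.
    public static IEnumerable<int> InSearchOrder(int expected, int maximum)
    {
        int initial = Math.Min(expected, maximum);
        yield return initial;

        int shorter = initial - 1;
        int longer = initial + 1;
        while (shorter >= 0 || longer <= maximum)
        {
            if (shorter >= 0)
                yield return shorter--;
            if (longer <= maximum)
                yield return longer++;
        }
    }
}
```

For the question's example, the match starts at index 8 of a 12-character string, so at most 4 characters remain and InSearchOrder(3, 4) yields 3, 2, 4, 1, 0. If æ and ae compare equal, the two-character match æd is found on the second comparison.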

Usage example:

int index, length;
source.Find(remove, StringComparison.CurrentCulture, out index, out length);
string clean = index < 0 ? source : source.Remove(index, length);
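
On .NET 5 and later, CompareInfo.IndexOf has a span-based overload with an out int matchLength parameter that reports the length of the matched text directly, so the length search may not be needed there. A minimal sketch, assuming such a runtime is available; the MatchFinder class and FindMatch method are hypothetical names:

```csharp
using System;
using System.Globalization;

public static class MatchFinder
{
    // Hypothetical wrapper: returns the matched text, or null when there is no match.
    public static string FindMatch(string source, string value, CultureInfo culture, CompareOptions options)
    {
        // The runtime reports both where the match starts and how long it is.
        int index = culture.CompareInfo.IndexOf(
            source.AsSpan(), value.AsSpan(), options, out int matchLength);

        return index < 0 ? null : source.Substring(index, matchLength);
    }
}
```

Which strings count as equal still depends on the culture, the compare options, and the globalization backend in use, so results for inputs such as æ versus ae should be checked on the target platform.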

【Discussion】:
