【Question title】: Safely truncate a string that contains color tags
【Posted】: 2021-08-16 13:33:18
【Question description】:

I have a string that contains color tags.

var myString = "My name is <color=#FF00EE>ABCDE</color> and I love <color=#FFEE00>music</color>";

The string is rendered as "My name is ABCDE*(pink)* and I love music*(yellow)*".

If the string exceeds a maximum length, I want to truncate it while still keeping the color tags:

var myTruncateString = "My name is <color=#FF00EE>ABCDE</color> and I love <color=#FFEE00>mu</color>";

The string is then rendered as "My name is ABCDE*(pink)* and I love mu*(yellow)*".

Do you have any suggestions?

var stringWithoutFormat = String.Copy(myString);
stringWithoutFormat = Regex.Replace(stringWithoutFormat, "<color.*?>|</color>", "");

var maxLength = 20;
if (stringWithoutFormat.Length > maxLength)
{
    // What should I do next?
}

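【Comments】:

  • So what exactly do you want? Do you just want to limit the number of characters? If so: int max = 300; var myTruncateString = mystring[..max];
  • @Foitn I want to truncate my string but still keep the color tags.
  • Not that easy, indeed! I would first check the string length. If it is too long, search from the end for any <color> tag. If one is found, truncate its content or remove it entirely if needed. If the string does not end with a color tag, check where the last tag ends and see whether truncating the text after it is enough, or whether its content has to be truncated as well.
  • You may have to parse the XML structure, check the total length of the values inside the parsed tags, and truncate where needed (eventually removing a whole tag or its sub-tags)... By the way, what do you want to get if the whole word "music" falls beyond the maximum length?
  • "If the string exceeds a maximum length, I want to truncate it while still keeping the color tags": does the maximum length also count the tags?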

Tags: c# regex


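【Solution 1】:

Here is a relatively simple example, without error handling, of what I think you are trying to accomplish:

  • Do not count color tags when checking against the maximum length
  • Remove characters from the end without breaking the color tags
  • If you end up with a pair of color tags with no text between them, remove those tags

Note: this code has not been thoroughly tested. Feel free to use it for whatever you want, but I would write a lot of unit tests for it. I am particularly worried about edge cases that could cause an infinite loop.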

// Requires: using System; using System.Collections.Generic; using System.Linq;
public static string Shorten(string input, int requiredLength)
{
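    var tokens = Tokenize(input).ToList();
    int current = tokens.Count - 1;

    // State from the previous iteration, used by the infinite-loop check below
    int lastCurrent = -1;
    int lastTotalLength = -1;

    // Assumption: color tags don't contribute to *visible* length,
    // so only single-character tokens are counted
    var totalLength = tokens.Count(t => t.Length == 1);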
    
    while (totalLength > requiredLength && current >= 0)
    {
        // infinite-loop detection
        if (lastCurrent == current && lastTotalLength == totalLength)
            throw new InvalidOperationException("Infinite loop detected");
        lastCurrent = current;
        lastTotalLength = totalLength;

        if (tokens[current].Length > 1)
        {
            if (current == 0)
                return "";
            
            if (tokens[current].StartsWith("</") && tokens[current - 1].StartsWith("<c"))
            {
                // Remove a <color></color> pair with no text between
                tokens.RemoveAt(current);
                tokens.RemoveAt(current - 1);
                current -= 2;
                
                // Since color tags don't contribute to length, don't adjust totalLength
                continue;
            }
            
            // Remove one character from inside the color tags
            tokens.RemoveAt(current - 1);
            current--;
            totalLength--;
        }
        else
        {
            // Remove last character from string
            tokens.RemoveAt(current);
            current--;
            totalLength--;
        }
    }

    // If we're now at the right length, but the last two tokens are <color></color>, remove them
    if (tokens.Count >= 2 && tokens.Last().StartsWith("</") && tokens[tokens.Count - 2].StartsWith("<c"))
    {
        tokens.RemoveAt(tokens.Count - 1);
        tokens.RemoveAt(tokens.Count - 1);
    }
    return string.Join("", tokens);
}

public static IEnumerable<string> Tokenize(string input)
{
    int index = 0;
    while (index < input.Length)
    {
        if (input[index] == '<')
        {
            int endIndex = index;
            while (endIndex < input.Length && input[endIndex] != '>')
                endIndex++;
            if (endIndex < input.Length)
                endIndex++;
            yield return input.Substring(index, endIndex - index);
            index = endIndex;
        }
        else
        {
            yield return input.Substring(index, 1);
            index++;
        }
    }
}

Example code:

var myString = "My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>music</color>";
for (int length = 1; length < 100; length++)
    Console.WriteLine($"{length}: {Shorten(myString, length)}");

Output:

1: M
2: My
3: My 
4: My n
5: My na
6: My nam
7: My name
8: My name 
9: My name i
10: My name is
11: My name is 
12: My name is <color=#ff00ee>A</color>
13: My name is <color=#ff00ee>AB</color>
14: My name is <color=#ff00ee>ABC</color>
15: My name is <color=#ff00ee>ABCD</color>
16: My name is <color=#ff00ee>ABCDE</color>
17: My name is <color=#ff00ee>ABCDE</color> 
18: My name is <color=#ff00ee>ABCDE</color> a
19: My name is <color=#ff00ee>ABCDE</color> an
20: My name is <color=#ff00ee>ABCDE</color> and
21: My name is <color=#ff00ee>ABCDE</color> and 
22: My name is <color=#ff00ee>ABCDE</color> and I
23: My name is <color=#ff00ee>ABCDE</color> and I 
24: My name is <color=#ff00ee>ABCDE</color> and I l
25: My name is <color=#ff00ee>ABCDE</color> and I lo
26: My name is <color=#ff00ee>ABCDE</color> and I lov
27: My name is <color=#ff00ee>ABCDE</color> and I love
28: My name is <color=#ff00ee>ABCDE</color> and I love 
29: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>m</color>
30: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>mu</color>
31: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>mus</color>
32: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>musi</color>
33: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>music</color>
34: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>music</color>
35: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>music</color>
36: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>music</color>
37: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>music</color>
38: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>music</color>
39: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>music</color>
... and so on
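
The answer recommends unit tests, and the output above suggests two properties worth checking on every result: the visible length never exceeds the requested length, and the color tags stay balanced. Below is a minimal sketch of such checks. The class and method names (`ColorTagChecks`, `VisibleLength`, `TagsBalanced`) are hypothetical and not part of the answer; the regex is the one the question uses to strip tags.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ColorTagChecks
{
    // Same pattern the question uses to strip color tags.
    private static readonly Regex ColorTag = new Regex("<color.*?>|</color>");

    // Number of characters left once all color tags are removed.
    public static int VisibleLength(string s) => ColorTag.Replace(s, "").Length;

    // True when every opening color tag has a later closing tag
    // and no closing tag appears before its opening tag.
    public static bool TagsBalanced(string s)
    {
        int depth = 0;
        foreach (Match m in ColorTag.Matches(s))
        {
            depth += m.Value.StartsWith("</", StringComparison.Ordinal) ? -1 : 1;
            if (depth < 0) return false; // closing tag with nothing open
        }
        return depth == 0;
    }
}
```

In a test, one could loop over the lengths as the example code does and assert `VisibleLength(Shorten(myString, length)) <= length` together with `TagsBalanced(Shorten(myString, length))`.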


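    【Solution 2】:

    I build two lists:

    • one containing the indexes of the actual text characters
    • one containing the start and end indexes of the tags

    Then I take the text up to the index that the first list gives for the maximum length. Finally, I check whether a tag is still open at that point and, if so, I close it.

    Note: my code does not handle nested tags. You would have to change the part that appends the closing tag.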

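        public static string Truncate(string text, int maxLength)
        {
            // Guard: realTextIndexes[maxLength - 1] below would throw for maxLength <= 0
            if (maxLength <= 0) return "";
            if (text.Length <= maxLength) return text;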
    
            var tagIndexes = new List<int>();
            var realTextIndexes = new List<int>();
            bool isInTag = false;
            for (int i = 0; i < text.Length; i++)
            {
                if (text[i] == '<')
                {
                    isInTag = true;
                    tagIndexes.Add(i);
                }
    
                if (!isInTag)
                {
                    realTextIndexes.Add(i);
                }
    
                if (text[i] == '>')
                {
                    isInTag = false;
                    tagIndexes.Add(i);
                }
            }
    
            if (realTextIndexes.Count <= maxLength) return text;
    
            string truncatedText = text.Substring(0, realTextIndexes[maxLength - 1] + 1);
    
            // Should we close a tag ?
            for (int i = 0; i < tagIndexes.Count; i++)
            {
                if (tagIndexes[i] > realTextIndexes[maxLength - 1])
                {
                    // tagIndexes alternates '<' and '>' positions, so with non-nested
                    // <open>...</close> pairs, i % 4 == 2 is the '<' of a closing tag
                    if ((i % 4) == 2)
                    {
                        truncatedText += text.Substring(tagIndexes[i], tagIndexes[i + 1] - tagIndexes[i] + 1);
                    }
    
                    break;
                }
            }
    
            return truncatedText;
        }
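
The note above says nested tags are not handled. One possible way to cover them is to scan forward and keep a stack of the tags that are currently open, then close whatever is still open at the cut. The sketch below is not the answerer's code: `NestedTagTruncator` and `TruncateNested` are hypothetical names, and it assumes well-formed input in which every tag looks like `<name=...>`, `<name>` or `</name>`.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class NestedTagTruncator
{
    // Keeps at most maxVisible non-tag characters and closes any tags
    // that are still open where the text is cut (assumes well-formed tags).
    public static string TruncateNested(string text, int maxVisible)
    {
        var result = new StringBuilder();
        var openTags = new Stack<string>(); // names of tags opened but not yet closed
        int visible = 0;
        int i = 0;

        while (i < text.Length && visible < maxVisible)
        {
            int close = text[i] == '<' ? text.IndexOf('>', i) : -1;
            if (close < 0)
            {
                // Ordinary character (or a stray '<' with no matching '>')
                result.Append(text[i]);
                visible++;
                i++;
                continue;
            }

            string tag = text.Substring(i, close - i + 1);
            if (tag.StartsWith("</", StringComparison.Ordinal))
            {
                if (openTags.Count > 0) openTags.Pop();
            }
            else
            {
                // Tag name is everything between '<' and the first '=' or '>'
                int nameEnd = tag.IndexOfAny(new[] { '=', '>' });
                openTags.Push(tag.Substring(1, nameEnd - 1));
            }
            result.Append(tag);
            i = close + 1;
        }

        // Close whatever is still open, innermost first.
        while (openTags.Count > 0)
            result.Append("</").Append(openTags.Pop()).Append('>');

        return result.ToString();
    }
}
```

Because the loop stops as soon as the visible limit is reached, an opening tag that comes right after the last kept character is never emitted, so no empty tag pair is left behind. Tags that the input itself leaves empty are copied as they are.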
    
