【Question title】: Text Parsing - My Parser Skipping commands
【Posted】: 2010-05-25 19:06:23
【Question】:

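I am trying to parse a text format. I want to mark inline code with backticks (`), the way SO does. The rule is that if you want to use a backtick inside an inline code element, you wrap the inline code in double backticks.

Like this:

``Mark inline code with backticks (`)``

For some reason, my parser seems to skip the double backticks entirely. Here is the code of the function that parses inline code: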

    private string ParseInlineCode(string input)
    {
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == '`' && input[i - 1] != '\\')
            {
                if (input[i + 1] == '`')
                {
                    string str = ReadToCharacter('`', i + 2, input);
                    while (input[i + str.Length + 2] != '`')
                    {
                        str += ReadToCharacter('`', i + str.Length + 3, input);
                    }
                    string tbr = "``" + str + "``";
                    str = str.Replace("&", "&amp;");
                    str = str.Replace("<", "&lt;");
                    str = str.Replace(">", "&gt;");
                    input = input.Replace(tbr, "<code>" + str + "</code>");
                    i += str.Length + 13;
                }
                else
                {
                    string str = ReadToCharacter('`', i + 1, input);
                    input = input.Replace("`" + str + "`", "<code>" + str + "</code>");
                    i += str.Length + 13;
                }
            }
        }
        return input;
    }

If I put single backticks around something, it is correctly wrapped in <code> tags.

【Comments】:

  • Wouldn't a RegEx be better suited for this job?

Tags: c# text-parsing


【Answer 1】:

In the while loop

while (input[i + str.Length + 2] != '`')
{
    str += ReadToCharacter('`', i + str.Length + 3, input);
}

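you are checking the wrong index: i + str.Length + 2 instead of i + str.Length + 3. Assuming ReadToCharacter returns the text up to, but not including, the next backtick, the backtick that stopped the read sits at i + str.Length + 2, so the character to test for the closing pair is the one after it. You also have to add that backtick back to the content in the loop body. It should be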

while (input[i + str.Length + 3] != '`')
{
    str += '`' + ReadToCharacter('`', i + str.Length + 3, input);
}

But there are a few more bugs in your code. If the first character of the input is a backtick, the following line causes an IndexOutOfRangeException

 if (input[i] == '`' && input[i - 1] != '\\')

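If the input contains an odd number of separate backticks and the last character of the input is a backtick, the following line causes an IndexOutOfRangeException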

if (input[i + 1] == '`')
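
Both out-of-range reads can be avoided by checking the index before touching the neighbouring character. A minimal sketch, assuming two hypothetical helpers (`IsUnescapedBacktick` and `IsDoubleBacktick` are illustrative names, not part of the original code):

```csharp
using System;

static class BacktickGuards
{
    // True when input[i] is a backtick that is not preceded by a backslash.
    // The i == 0 check avoids reading input[-1] when the input starts with a backtick.
    public static bool IsUnescapedBacktick(string input, int i)
    {
        if (input[i] != '`') return false;
        return i == 0 || input[i - 1] != '\\';
    }

    // True when a second backtick directly follows position i.
    // The length check avoids reading past the end when the last character is a backtick.
    public static bool IsDoubleBacktick(string input, int i)
    {
        return i + 1 < input.Length && input[i + 1] == '`';
    }
}
```

With these in place, the two conditions in the loop could become `IsUnescapedBacktick(input, i)` and `IsDoubleBacktick(input, i)`.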

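You should probably refactor your code into smaller methods rather than handling many cases in a single method, which is very error-prone. If you have not yet written unit tests for the code, I strongly suggest doing so. And because parsers are not easy to test, since you have to be prepared for all kinds of invalid input, you could have a look at PEX, a tool that automatically generates test cases for your code by analyzing all branch points and trying to take every possible code path.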
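
One possible shape for such a refactoring is sketched below. It is not the original code with the bug patched, but a different approach under stated assumptions: it scans once and builds the output in a StringBuilder instead of calling Replace on the input, it HTML-encodes both single and double backtick spans, and it leaves an unmatched or empty delimiter in the text as-is. The class and helper names are hypothetical, and a backslash only protects an opening backtick here, not a closing one.

```csharp
using System;
using System.Text;

static class InlineCodeSketch
{
    public static string ParseInlineCode(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        var output = new StringBuilder(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            // A backtick opens a span unless a backslash precedes it (bounds-checked).
            bool opens = input[i] == '`' && (i == 0 || input[i - 1] != '\\');
            if (!opens)
            {
                output.Append(input[i]);
                i++;
                continue;
            }

            // Decide between the single and the double delimiter (bounds-checked).
            bool isDouble = i + 1 < input.Length && input[i + 1] == '`';
            string delimiter = isDouble ? "``" : "`";
            int start = i + delimiter.Length;
            int close = input.IndexOf(delimiter, start, StringComparison.Ordinal);

            if (close <= start)
            {
                // No closing delimiter (close == -1) or empty content: keep the text literal.
                output.Append(delimiter);
                i = start;
                continue;
            }

            string content = input.Substring(start, close - start);
            output.Append("<code>").Append(HtmlEncode(content)).Append("</code>");
            i = close + delimiter.Length;
        }
        return output.ToString();
    }

    // Same three replacements as the original code, with & handled first.
    private static string HtmlEncode(string text)
    {
        return text.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;");
    }
}
```

For example, `ParseInlineCode("``mark `x` here``")` gives `<code>mark `x` here</code>`, and inputs such as "`", "\0``" and "\0```" come back unchanged instead of throwing. Checks like these are a natural starting point for the unit tests suggested above.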

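I quickly fired up PEX and ran it against the code. It found the IndexOutOfRangeException I had in mind, and some more. And of course PEX found the obvious NullReferenceException if the input is a null reference. Here are the inputs PEX found to cause exceptions.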

case1 = "`"

case2 = "\0`"

case3 = "\0``"

case4 = "\0`\0````````````\u0001``````````````\0\0\0\0\0\0\0\0\0\0\0````"

case5 = "\0`\0````````````\u0001``````````````\0\0\0\0\0\0\0\0\0\0\0```\0````````````\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0`"

case6 = "\0`\0````````````\u0001``````````````\0\0\0\0\0\0\0\0\0\0\0```\0````````````\0\0\0\0\0\0\0\0\0\0``<\0\0`````````````````````````````````````````````````````````````````````````````````````\0\0\0\0\0\0\0\0\0\0``<\0\0```````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````\0\0\0\0\0\0\0\0\0`\0```````````````"

My "fix" to the code changed the inputs that cause exceptions (and may have introduced new bugs as well). PEX caught the following in the modified code.

case7 = "\0```"

case8 = "\0`\0````````````\u0001``````````````\0\0\0\0\0\0\0\0\0\0\0```\0`\0"

case9 = "\0`\0````````````\u0001``````````````\0\0\0\0\0\0\0\0\0\0\0```\0````````````\0\0\0\0\0\0\0\0\0\0``<\0\0`````````````````````````````````````````````````````````````````````````````````````\0\0\0\0\0\0\0\0\0\0``\0`\0`\0``"

None of these three inputs caused an exception in the original code, while cases 4 and 6 no longer cause an exception in the modified code.

【Comments】:

【Answer 2】:

Here is a small snippet, tested in LINQPad, to get you started

    void Main()
    {
        string test = "here is some code `public void Method( )` but ``this is not code``";
        Regex r = new Regex( @"(`[^`]+`)" );
    
        MatchCollection matches = r.Matches( test );
    
        foreach( Match match in matches )
        {
            Console.Out.WriteLine( match.Value );
            // Guard against a match at index 0 before looking at the previous character.
            if( match.Index > 0 && test[match.Index - 1] == '`' )
                Console.Out.WriteLine( "NOT CODE" );
            else
                Console.Out.WriteLine( "CODE" );
        }
    }
    

Output:

    `public void Method( )`
    CODE
    `this is not code`
    NOT CODE
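
The snippet only reports matches and classifies the double-backtick span as not code. If the goal is the rule from the question, where double backticks delimit code that may itself contain single backticks, one option is an alternation that tries the double form first. This is a sketch under that assumption: the class name, method name and group names are made up for illustration, and escaped backticks are not handled.

```csharp
using System;
using System.Text.RegularExpressions;

static class InlineCodeRegexSketch
{
    // The double-backtick branch comes first so that ``a `b` c`` is matched as one span.
    // Singleline lets '.' match newlines inside a double-backtick span.
    private static readonly Regex InlineCode = new Regex(
        @"``(?<dbl>.+?)``|`(?<sgl>[^`]+)`",
        RegexOptions.Singleline);

    public static string ToHtml(string input)
    {
        return InlineCode.Replace(input, match =>
        {
            // Exactly one of the two named groups takes part in any given match.
            Group dbl = match.Groups["dbl"];
            string content = dbl.Success ? dbl.Value : match.Groups["sgl"].Value;
            return "<code>" + HtmlEncode(content) + "</code>";
        });
    }

    private static string HtmlEncode(string text)
    {
        return text.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;");
    }
}
```

Called as `InlineCodeRegexSketch.ToHtml(test)`, both the single-backtick span and the double-backtick span end up wrapped in `<code>` tags, and single backticks inside a double-backtick span are kept as literal characters.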
    

【Comments】:

• I think you are confusing backticks with single quotes
• Indeed, I had typed single quotes. Fixed.