【Question title】: Matching a sequence of 2 strings
【Posted】: 2017-02-16 12:43:22
【Question】:

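I have made a small application that works with an original string and an edited string. The original string is called "one" and the edited string is called "two". I want to walk through both strings, find the edits that were made, and add the edited words in uppercase to the original string. For example: original "This is original", edited "This is edited", output "This is original EDITED". It should go through the string looking for matching words, and as soon as a word differs, stop, convert that word to uppercase and add it to the original string at that position. What I have so far finds all the edited words within a string. My problem is joining the strings. Expected output: "This This THIS is a new value VALUES"

My code is as follows: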

string one = "This is a new value";
string two = "This This is a new values";
int index = 0;
var coll = two.Split(' ').Select(p => one.Contains(p) ? p : p.ToUpperInvariant());

var col2 = two.Split(' ');
var col1 = one.Split(' ');

for (int i = 0; i < col1.Length; i++)
{
    var a = two.IndexOf(col2[i].ToString(), index);
    if (col2[index].ToString() == col1[i].ToString())
    {
        Debug.WriteLine(col2[index]);
    }
    else
    {
        Debug.WriteLine(col2[index].ToUpper());
        two.Insert(index, col1[i].ToString().ToUpper());
        //Debug.WriteLine(col1[i]);

        i--;
    }
    index++;
    if (index == col2.Length)
    {
        break;
    }
}

Console.WriteLine(string.Join(" ", two));
Console.ReadKey();

【Comments】:

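  • Is this any different from the seemingly identical question you asked two days ago? stackoverflow.com/questions/42210366/…
  • @PaulF I have asked the question in a different way, and I am trying to code it better in a different way, taking into account whether the string contains 2 identical words, etc.
  • One of my original questions was what the expected output is when there are no words in common: string1="This is a new Value"; string2="abc def ghi jkl mno"; should it be "This ABC DEF GHI JKL MNO is a new value" or "This ABC is DEF a GHI new JKL value MNO" or something else?
  • @PaulF It should be the first one.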

Tags: c#


【Solution 1】:

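You are solving the Edit Distance problem. You have a sequence of items (words, in your case) and you are trying to calculate the minimum number of changes to the first sequence that turn it into the second sequence.

I suggest you follow the algorithm from the Wikipedia article on edit distance, and you will get a very good implementation. These algorithms may look scary at first, but they are actually quite simple.

Below is a complete implementation in C#. It is based on dynamic programming, and it reconstructs the steps leading from the original string to the final string. Note that my solution writes deleted words in square brackets. If you just want to skip deleted words, avoid adding them to the output in the ReconstructEdit() method.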

private static string CalculateMinimumEdit(string[] original, string[] final)
{
    int[,] costs = new int[original.Length + 1, final.Length + 1];

    // =, +, - or * for equal words, inserted, deleted or modified word
    char[,] resultOf = new char[original.Length + 1, final.Length + 1];

    // Set all costs to invalid values (mark all positions not reached)
    InitializeInvalidCosts(costs);

    // Empty sequences are equal and their edit cost is 0
    // This is setting the initial state for the following calculation
    resultOf[0, 0] = '=';
    costs[0, 0] = 0;

    for (int originalIndex = 0; originalIndex < original.Length + 1; originalIndex++)
    {
        for (int finalIndex = 0; finalIndex < final.Length + 1; finalIndex++)
        {
            SetDeleteCost(costs, resultOf, originalIndex, finalIndex);
            SetInsertCost(costs, resultOf, originalIndex, finalIndex);
            SetModifiedCost(costs, resultOf, originalIndex, finalIndex);
            SetEqualCost(costs, resultOf, originalIndex, finalIndex, original, final);
        }
    }

    return ReconstructEdit(costs, resultOf, original, final);
}

private static void InitializeInvalidCosts(int[,] costs)
{
    // Set all costs to negative values
    // That will indicate that none of the positions
    // in the costs matrix has been analyzed yet
    for (int i = 0; i < costs.GetLength(0); i++)
    {
        for (int j = 0; j < costs.GetLength(1); j++)
        {
            costs[i, j] = -1;
        }
    }
}

private static void SetInsertCost(int[,] costs, char[,] resultOf, 
                                    int originalIndex, int finalIndex)
{
    // You can always assume that the new word was inserted
    // Position in original sequence remains the same
    // Position in final sequence moves by one and that is the new word
    // Cost of this change is 1
    SetCostIfBetter(costs, resultOf, originalIndex, finalIndex + 1,
                    costs[originalIndex, finalIndex] + 1, '+');
}

private static void SetDeleteCost(int[,] costs, char[,] resultOf,
                                    int originalIndex, int finalIndex)
{
    // You can always assume that one word was deleted from original sequence
    // Position in original sequence moves by one and that is the deleted word
    // Position in final sequence remains the same
    // Cost of this change is 1
    SetCostIfBetter(costs, resultOf, originalIndex + 1, finalIndex,
                    costs[originalIndex, finalIndex] + 1, '-');
}

private static void SetModifiedCost(int[,] costs, char[,] resultOf,
                                    int originalIndex, int finalIndex)
{
    // You can always assume that one word was replaced with another
    // Both positions in original and final sequences move by one
    // That means that one word from input was consumed
    // and it was replaced by a new word from the final sequence
    // Cost of this change is 1
    SetCostIfBetter(costs, resultOf, originalIndex + 1, finalIndex + 1,
                    costs[originalIndex, finalIndex] + 1, '*');
}

private static void SetEqualCost(int[,] costs, char[,] resultOf,
                                    int originalIndex, int finalIndex,
                                    string[] original, string[] final)
{
    // If incoming words in original and final sequence are the same
    // then you can take advantage and move to the next position
    // at no cost
    // Position in original sequence moves by 1
    // Position in final sequence moves by 1
    // Cost of this change is 0
    if (originalIndex < original.Length &&
        finalIndex < final.Length &&
        original[originalIndex] == final[finalIndex])
    {
        // Attempt to set new cost only if incoming words are equal
        SetCostIfBetter(costs, resultOf, originalIndex + 1, finalIndex + 1,
                        costs[originalIndex, finalIndex], '=');
    }
}

private static void SetCostIfBetter(int[,] costs, char[,] resultOf,
                                    int originalIndex, int finalIndex,
                                    int cost, char operation)
{
    // If destination cost is not set (i.e. it is negative)
    // or destination cost is non-negative but new cost is lower than that
    // then the cost can be set to new value and 
    // new operation which has caused the change can be indicated
    if (IsBetterCost(costs, originalIndex, finalIndex, cost))
    {
        costs[originalIndex, finalIndex] = cost;
        resultOf[originalIndex, finalIndex] = operation;
    }
}

private static bool IsBetterCost(int[,] costs, int originalIndex, 
                                    int finalIndex, int cost)
{
    // New cost is better than existing cost if
    // either existing cost is negative (not set), 
    // or new cost is lower
    return
        originalIndex < costs.GetLength(0) && 
        finalIndex < costs.GetLength(1) &&
        (costs[originalIndex, finalIndex] < 0 ||
            cost < costs[originalIndex, finalIndex]);
}

private static string ReconstructEdit(int[,] costs, char[,] resultOf,
                                        string[] original, string[] final)
{
    string edit = string.Empty;

    int originalIndex = original.Length;
    int finalIndex = final.Length;

    string space = string.Empty;

    while (originalIndex > 0 || finalIndex > 0)
    {
        edit = space + edit;
        space = " ";

        char operation = resultOf[originalIndex, finalIndex];

        switch (operation)
        {
            case '=':
                originalIndex -= 1;
                finalIndex -= 1;
                edit = original[originalIndex] + edit;
                break;
            case '*':
                originalIndex -= 1;
                finalIndex -= 1;
                edit = final[finalIndex].ToUpper() + edit;
                break;
            case '+':
                finalIndex -= 1;
                edit = final[finalIndex].ToUpper() + edit;
                break;
            case '-':
                originalIndex -= 1;
                edit = "[" + original[originalIndex] + "]" + edit;
                break;
        }
    }

    return edit;
}

【Comments】:

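  • Thanks for the link. You're right, it does look scary :P
  • I am a novice programmer, could you please help me with this algorithm?
  • It is really simple. Make a matrix in which the element at (i, j) shows the cost of transforming the first i elements of the first sequence into the first j elements of the second sequence. Draw the relations between (i, j) and (i + 1, j), (i, j + 1) and (i + 1, j + 1) - that is the eye-opening idea - and then fill in the matrix iteratively. In the end, just read off the best path leading to (n, m), and that is your optimal sequence of edits. Got it? :)
  • I understand the reasoning behind it, but I'm not sure how I would code it.
  • I have added code which reconstructs the minimum-cost edit steps. I will probably publish this solution as an article on codinghelmet.com next week, so you can look there for more explanation as well.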
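
For readers who want the matrix idea from the comments in its smallest form, here is a rough sketch. It is not the answer's code: it fills the matrix in the classic "look back at three neighbours" style instead of pushing costs forward, and the names WordEditSketch, BuildCosts and ReadPath are made up for this illustration. It only reports which operation happened at each step ('=' equal, '*' modified, '+' inserted, '-' deleted). When several paths are equally cheap, the one it picks depends on the order of the checks in ReadPath.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, written only to illustrate the comment above.
public static class WordEditSketch
{
    // costs[i, j] = minimum number of edits that turn the first i words
    // of 'original' into the first j words of 'final'.
    public static int[,] BuildCosts(string[] original, string[] final)
    {
        int n = original.Length, m = final.Length;
        var costs = new int[n + 1, m + 1];

        for (int i = 0; i <= n; i++) costs[i, 0] = i; // delete i words
        for (int j = 0; j <= m; j++) costs[0, j] = j; // insert j words

        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= m; j++)
            {
                // Diagonal step is free when the two words are equal
                int diagonal = costs[i - 1, j - 1]
                               + (original[i - 1] == final[j - 1] ? 0 : 1);
                int delete = costs[i - 1, j] + 1;
                int insert = costs[i, j - 1] + 1;
                costs[i, j] = Math.Min(diagonal, Math.Min(delete, insert));
            }
        }

        return costs;
    }

    // Walks back from (n, m) to (0, 0) and returns one symbol per step,
    // in left-to-right order: '=', '*', '+' or '-'.
    public static string ReadPath(int[,] costs, string[] original, string[] final)
    {
        var ops = new List<char>();
        int i = original.Length, j = final.Length;

        while (i > 0 || j > 0)
        {
            if (i > 0 && j > 0 && original[i - 1] == final[j - 1]
                && costs[i, j] == costs[i - 1, j - 1])
            {
                ops.Add('='); i--; j--;
            }
            else if (i > 0 && j > 0 && costs[i, j] == costs[i - 1, j - 1] + 1)
            {
                ops.Add('*'); i--; j--;
            }
            else if (j > 0 && costs[i, j] == costs[i, j - 1] + 1)
            {
                ops.Add('+'); j--;
            }
            else
            {
                ops.Add('-'); i--;
            }
        }

        ops.Reverse(); // steps were collected from the end backwards
        return new string(ops.ToArray());
    }
}
```

With the strings from the question split on spaces ("This is a new value" and "This This is a new values"), BuildCosts gives a total cost of 2 in costs[5, 6], and ReadPath returns "+====*": one inserted word, four equal words, one modified word.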