【Question Title】: Longest common prefix based on elements
【Posted】: 2020-12-14 15:02:08
【Question Description】:

I have an array of string elements:

["000", "1110", "01", "001", "110", "11"]

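For each element of the array,

I want to find the index of the previous element that shares the longest common prefix with it.
If several elements match, pick the index of the nearest one.
If none is found, just pick the previous index.

Example: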

["000", "1110", "01", "001", "110", "11"]

Output:
[0,1,1,1,2,5]

a) "000" - output is 0, because there are no previous elements.
b) "1110" - output is 1; no previous element shares a prefix, so select the previous index.
c) "01" - output is 1; "000" has the longest common prefix, so its index is 1.
d) "001" - output is 1; "000" has the longest common prefix, so its index is 1.
e) "110" - output is 2; "1110" has the longest common prefix, so its index is 2.
f) "11" - output is 5; "110" is the nearest element with the longest common prefix, so its index is 5.

I cannot work out what approach I need to take for this task. Can you help me?

【Question Comments】:

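Is it correct that your example indexes from 1 rather than 0? And why does your commonPrefix return a String rather than an int[] or List<Integer>, as shown in your example?
Yes, the starting index is 1. I started from existing code on GeeksforGeeks, but that is not the right approach, so I am stuck on where to begin.
What you are trying to do in each method differs from what the method comments and signatures say. I suppose those were given to you; do you understand those comments?
commonPrefixUtil looks fine, but you are not using it correctly.
@Hawk, I have now added another piece of code to my post, but its time complexity is higher. How can I improve it?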

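Tags: java algorithm

【Solution 1】:

Naive solution based on the previous version of the question

commonPrefix should (according to its comment) return the longest prefix in the array up to index n. What does that mean? You need to compute all the prefixes and pick the longest one.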

static String commonPrefix(String arr[], int n) {
    String longestPrefix = "";
    for (int i = 0; i < n; i++) {
        final String currentPrefix = commonPrefixUtil(arr[i], arr[n]);
        if (currentPrefix.length() > longestPrefix.length()) {
            longestPrefix = currentPrefix;
        }
    }
    return longestPrefix;
}

So for arr = ["000", "1110", "01", "001", "110", "11"]; n = 3 this yields "00".
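
To check that claim, here is a small self-contained sketch. It copies commonPrefix and the commonPrefixUtil helper from this answer into a class called CommonPrefixDemo; the class name and the main method are assumptions made only for illustration.

```java
class CommonPrefixDemo {
    // Longest common prefix of two strings (same idea as the helper in this answer).
    static String commonPrefixUtil(String str1, String str2) {
        int limit = Math.min(str1.length(), str2.length());
        int length = 0;
        while (length < limit && str1.charAt(length) == str2.charAt(length)) {
            length++;
        }
        return str1.substring(0, length);
    }

    // Longest prefix that arr[n] shares with any of arr[0..n-1].
    static String commonPrefix(String[] arr, int n) {
        String longestPrefix = "";
        for (int i = 0; i < n; i++) {
            String currentPrefix = commonPrefixUtil(arr[i], arr[n]);
            if (currentPrefix.length() > longestPrefix.length()) {
                longestPrefix = currentPrefix;
            }
        }
        return longestPrefix;
    }

    public static void main(String[] args) {
        String[] arr = { "000", "1110", "01", "001", "110", "11" };
        // "001" shares "00" with "000", nothing with "1110", and "0" with "01".
        System.out.println(commonPrefix(arr, 3)); // prints 00
        // "1110" shares nothing with "000", so the result is the empty string.
        System.out.println(commonPrefix(arr, 1).isEmpty()); // prints true
    }
}
```

An empty result is the "no match" case: every string starts with the empty prefix, so the backward scan that follows matches the immediately preceding element.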

Now that we have the longest prefix, what next? We need to find the index closest to n whose element starts with that prefix...

static int closestIndex(String[] arr, String longestPrefix, int n) {
    for (int i = n - 1; i >= 0; i--) {
        if (arr[i].startsWith(longestPrefix)) {
            return i + 1; // + 1 because the solution wants starting index with 1
        }
    }
    return 0;
}

How do we combine them? Just call the two methods for each input:

public static void main(String[] args) {
    String[] words = { "000", "1110", "01", "001", "110", "11" };
    int[] output = new int[words.length];

    for (int i = 0; i < words.length; i++) {
        final String longestPrefix = commonPrefix(words, i);
        output[i] = closestIndex(words, longestPrefix, i);
    }

    System.out.println(Arrays.toString(output));
}

You have removed your commonPrefixUtil implementation from the question, so I added my own:

static String commonPrefixUtil(String str1, String str2) {
    int shorterStringLength = Math.min(str1.length(), str2.length());
    int length = 0;
    for (; length < shorterStringLength; length++) {
        if (str1.charAt(length) != str2.charAt(length)) {
            break;
        }
    }
    return str1.substring(0, length);
}

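Optimized solution

I created a new solution using dynamic programming with tabulation (if I understand the term correctly): I use a hash map that holds all prefixes seen so far, each pointing to the indexes of the words the prefix came from. The map's values are sorted trees, so it is easy to determine which word with a common prefix is closest to the current index. HashMap offers constant-time operations (on average) and TreeSet offers O(log n) operations.

Put more simply: I process the first letter of every word, then the next letter, and so on. Along the way I record the positions of all prefix substrings, and they are kept sorted automatically. The loop stops after the first pass that updates no result, which happens at the latest once the last letter of the longest word has been processed.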

public static void main(String[] args) {
    String[] words = { "000", "1110", "01", "001", "110", "11" };

    int[] result = new int[words.length];
    final int firstWordLength = words.length > 0 ? words[0].length() : 8;
    // prefix -> [indexes of prefix occurrence]
    HashMap<String, TreeSet<Integer>> prefixes = new HashMap<>(words.length * (firstWordLength + 1) * 2);
    int wordLength = 1;
    boolean isUpdatedResult;
    do { // O(k)
        isUpdatedResult = false;
        for (int wordIdx = 0; wordIdx < words.length; wordIdx++) { // O(n)
            if (words[wordIdx].length() < wordLength) {
                continue;
            }
            final String currentPrefix = words[wordIdx].substring(0, wordLength); // Java >= 7 update 6 ? O(k) : O(1)
            final TreeSet<Integer> prefixIndexes = prefixes.get(currentPrefix); // O(1)
            if (prefixIndexes != null) {
                // floor instead of lower, because we have put `wordIdx + 1` inside
                final Integer closestPrefixIndex = prefixIndexes.floor(wordIdx); // O(log n)
                if (closestPrefixIndex != null) {
                    result[wordIdx] = closestPrefixIndex;
                    isUpdatedResult = true;
                }
            }
            // take the previous index for the result if no match
            if (result[wordIdx] == 0) {
                result[wordIdx] = wordIdx;
            }
            final TreeSet<Integer> newPrefixIndexes = prefixes.computeIfAbsent(currentPrefix, p -> new TreeSet<>()); // O(1)
            // the result index must be + 1
            newPrefixIndexes.add(wordIdx + 1); // O(log n)
        }
        wordLength++;
    } while (isUpdatedResult);

    System.out.println(Arrays.toString(result));
}

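I have annotated every operation with its big-O time complexity. n is the number of words in the input array, i.e. words.length, and k is the length of the longest word. According to Jon Skeet's post, the time complexity of substring changed to linear in Java 7 update 6.

So we can calculate: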

O(k) * O(n) * (O(log n) + O(k))

I hope the code is understandable and that I calculated the complexity correctly.
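
The code comment about using floor rather than lower is easy to miss, so here is a minimal sketch of the difference. The class FloorVsLowerDemo and the helper closestPrevious are hypothetical names used only for this illustration. The set holds the 1-based indexes that the solution stores for the prefix "1" just before it handles the last word, "11", at 0-based index 5.

```java
import java.util.TreeSet;

class FloorVsLowerDemo {
    // The set stores 1-based indexes, while wordIdx is 0-based. Words before
    // wordIdx therefore have stored values <= wordIdx, which is what floor finds.
    static Integer closestPrevious(TreeSet<Integer> oneBasedIndexes, int wordIdx) {
        return oneBasedIndexes.floor(wordIdx);
    }

    public static void main(String[] args) {
        TreeSet<Integer> indexes = new TreeSet<>();
        indexes.add(2); // "1110", 0-based index 1
        indexes.add(5); // "110", 0-based index 4

        System.out.println(closestPrevious(indexes, 5)); // prints 5 (greatest element <= 5)
        System.out.println(indexes.lower(5));            // prints 2 (greatest element < 5)
        System.out.println(closestPrevious(indexes, 1)); // prints null (no element <= 1)
    }
}
```

With lower, the word directly before the current one would be skipped, and "11" would get 2 instead of the expected 5.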

【Discussion】:

Thanks, but we have almost 4 loops; can the time complexity be reduced?
I have created a more optimized solution, please take a look :)
prefixes.get(currentPrefix) is Theta(k), not O(1).
You should use prefixes.computeIfAbsent(currentPrefix, k -> new TreeSet<>()) instead of prefixes.getOrDefault(currentPrefix, new TreeSet<>())
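
A short sketch of why the last comment matters. ComputeIfAbsentDemo and its two helper methods are hypothetical names for illustration; only the Map calls come from the standard library. getOrDefault hands back the fallback set without storing it in the map, so anything added to that set is lost, whereas computeIfAbsent stores the new set under the key before returning it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

class ComputeIfAbsentDemo {
    // Adds index to a set obtained via getOrDefault; the fallback set is never put into the map.
    static void addWithGetOrDefault(Map<String, TreeSet<Integer>> prefixes, String prefix, int index) {
        prefixes.getOrDefault(prefix, new TreeSet<>()).add(index);
    }

    // Adds index to a set obtained via computeIfAbsent; a missing set is created and stored first.
    static void addWithComputeIfAbsent(Map<String, TreeSet<Integer>> prefixes, String prefix, int index) {
        prefixes.computeIfAbsent(prefix, p -> new TreeSet<>()).add(index);
    }

    public static void main(String[] args) {
        Map<String, TreeSet<Integer>> prefixes = new HashMap<>();

        addWithGetOrDefault(prefixes, "00", 1);
        System.out.println(prefixes.containsKey("00")); // prints false

        addWithComputeIfAbsent(prefixes, "00", 1);
        System.out.println(prefixes.get("00")); // prints [1]
    }
}
```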

【Solution 2】:

With a trie, the longest common prefix so far is easy to find in time linear in the total number of input characters. In Python (sorry):

import collections


class Trie:
    def __init__(self):
        self._children = collections.defaultdict(Trie)
        self._previous_index = 0

    # Find the longest prefix of word that appears in the trie,
    # return the value of _previous_index at that node.
    def previous_index(self, word):
        node = self
        for letter in word:
            child = node._children.get(letter)
            if child is None:
                break
            node = child
        return node._previous_index

    # Ensure that each of the prefixes of word exists in the trie.
    # At each node corresponding to a prefix, set its _previous_index to index.
    def insert(self, index, word):
        self._previous_index = index
        node = self
        for letter in word:
            node = node._children[letter]
            node._previous_index = index


def longest_common_prefix_indexes(words):
    trie = Trie()
    for index_minus_one, word in enumerate(words):
        print(trie.previous_index(word))
        trie.insert(index_minus_one + 1, word)


longest_common_prefix_indexes(["000", "1110", "01", "001", "110", "11"])
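
Since the question is tagged java, here is a rough Java translation of the Python above. It is a sketch, not the answer author's code: the class name PrefixTrie, the field names, and the choice to return a List<Integer> rather than print each value are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PrefixTrie {
    private final Map<Character, PrefixTrie> children = new HashMap<>();
    // 1-based index of the last inserted word whose path passes through this node (0 = none yet).
    private int lastIndex = 0;

    // Follow the longest prefix of word that exists in the trie and
    // return the index stored at the deepest node reached.
    int previousIndex(String word) {
        PrefixTrie node = this;
        for (char letter : word.toCharArray()) {
            PrefixTrie child = node.children.get(letter);
            if (child == null) {
                break;
            }
            node = child;
        }
        return node.lastIndex;
    }

    // Create a node for every prefix of word (the root is the empty prefix)
    // and stamp each of them with index.
    void insert(int index, String word) {
        lastIndex = index;
        PrefixTrie node = this;
        for (char letter : word.toCharArray()) {
            node = node.children.computeIfAbsent(letter, c -> new PrefixTrie());
            node.lastIndex = index;
        }
    }

    // Query each word against the trie built from the words before it, then insert it.
    static List<Integer> longestCommonPrefixIndexes(String[] words) {
        PrefixTrie trie = new PrefixTrie();
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            result.add(trie.previousIndex(words[i]));
            trie.insert(i + 1, words[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = { "000", "1110", "01", "001", "110", "11" };
        System.out.println(longestCommonPrefixIndexes(words)); // prints [0, 1, 1, 1, 2, 5]
    }
}
```

Because insert also stamps the root, a word with no matching first letter stops at the root and gets the index of the word inserted just before it, which is the "select the previous index" rule from the question.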

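【Discussion】:

Could you please add comments so that I can try to do this in Java?
Could you double-check your claim of linear complexity? I think the code you show is O(n*k) (insert every word, every letter). And I don't know what the complexity of get is (from a quick search I see O(key_length), but I'm really not sure).
@Hawk Linear in the number of letters, not the number of words.
@Hawk get is constant time (certainly if we assume an O(1)-size alphabet, but randomized for larger alphabets).

【Solution 3】:

I think this might work for you

Here is a function that returns the length of the longest common prefix of two strings: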

int commonPrefixLength(String s, String t) {
    if (s.length() == 0 || t.length() == 0) {
        return 0;
    }

    int commonPrefixLength = 0;
    int shorter = Math.min(s.length(), t.length());

    for (int i = 0; i < shorter; i++) {
        if (s.charAt(i) != t.charAt(i)) {
            break;
        }
        commonPrefixLength++;
    }

    return commonPrefixLength;
}

For each string in the array, you can check the previous strings and pick the one with the longest common prefix:

    //indexing from 1
    String[] strings = new String[] {"", "000", "1110", "01", "001", "110", "11"};

    for (int i = 1; i < strings.length; i++) {
        int longestCommonPrefix = 0;
        int answer = 0;

        //for every previous string
        for (int j = i - 1; j > 0; j--) {
            int commonPrefix = commonPrefixLength(strings[j], strings[i]);
            if (commonPrefix > longestCommonPrefix) {
                longestCommonPrefix = commonPrefix;
                answer = j;
            }
        }

        //no common prefix found
        if (answer == 0) {
            answer = i - 1;
        }

        System.out.println(answer);
    }

【Discussion】: