【Title】: How to implement the remove function of a trie in Python?
【Posted】: 2013-03-29 18:30:24
【Question】:

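I've read the following implementation of a trie in Python: https://stackoverflow.com/a/11016430/2225221

I tried to write a remove function for it, but I got stuck right at the start: the word you want to remove from the trie can contain shorter stored words as prefixes, and it can itself be a prefix of a longer stored word.

If you remove it with "del dict[key]", you also remove both of these kinds of words. Could anyone help me remove just the chosen word properly (let us presume it is in the trie)?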

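【Comments】:

    Tags: python python-2.7 trie


    【Answer 1】:

    Basically, to remove a word from the trie (as it is implemented in the answer you linked to), you just have to remove its _end marker, for example like this: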

    def remove_word(trie, word):
        current_dict = trie
        for letter in word:
            current_dict = current_dict.get(letter, None)
            if current_dict is None:
                # the trie doesn't contain this word.
                break
        else:
            # pop() instead of del: no KeyError if `word` is only a prefix, not a stored word
            current_dict.pop(_end, None)
    

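    Note, however, that this doesn't ensure that the trie has its minimal size. After deleting the word, there may be branches left in the trie that are no longer used by any word. This doesn't affect the correctness of the data structure; it just means that the trie may consume more memory than absolutely necessary. You could improve this by iterating backwards from the leaf node and deleting branches until you find one that has more than one child.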
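
To make the leftover-branch effect concrete, here is a small sketch. It assumes the nested-dict layout of the linked answer with `_end = '_end_'` as the end-of-word marker; `make_trie` mirrors that answer, while `in_trie` is a hypothetical helper added only for this demonstration.

```python
_end = '_end_'  # assumed end-of-word marker, as in the linked answer


def make_trie(*words):
    """Build a nested-dict trie from the given words."""
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node[_end] = _end
    return root


def remove_word(trie, word):
    """Unmark `word` as stored; nothing is pruned."""
    node = trie
    for letter in word:
        node = node.get(letter)
        if node is None:
            return  # the trie doesn't contain this word
    node.pop(_end, None)


def in_trie(trie, word):
    """Hypothetical lookup helper: True if `word` is stored in the trie."""
    node = trie
    for letter in word:
        node = node.get(letter)
        if node is None:
            return False
    return _end in node


trie = make_trie('foo', 'bar')
remove_word(trie, 'foo')
print(in_trie(trie, 'foo'))  # False: the word is gone
print(in_trie(trie, 'bar'))  # True: other words are untouched
print(trie['f'])             # {'o': {'o': {}}}: a dead branch is still there
```

The lookups behave correctly, but the empty chain under 'f' stays in memory until something prunes it.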

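    EDIT: Here's an idea of how you could implement a remove function that also culls any unnecessary branches. There's probably a more efficient way to do it, but this might get you started: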

    def remove_word2(trie, word):
        current_dict = trie
        path = [current_dict]
        for letter in word:
            current_dict = current_dict.get(letter, None)
            path.append(current_dict)
            if current_dict is None:
                # the trie doesn't contain this word.
                break
        else:
            if not path[-1].get(_end, None):
                # the trie doesn't contain this word (but a prefix of it).
                return
            deleted_branches = []
            for current_dict, letter in zip(reversed(path[:-1]), reversed(word)):
                if len(current_dict[letter]) <= 1:
                    deleted_branches.append((current_dict, letter))
                else:
                    break
            if len(deleted_branches) > 0:
                del deleted_branches[-1][0][deleted_branches[-1][1]]
            del path[-1][_end]
    

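    Essentially, it first finds the "path" to the word that is going to be deleted and then iterates through that path backwards to find the nodes that can be removed. It then removes the root of the removable part of the path (which also implicitly deletes the _end node).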

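    【Comments】:

    • Thanks, that's a really good idea! The only problem I have now is the backward iteration: you can descend into any dict to get its keys/values, but you cannot (as far as I know) get the "parent" dict. On the other hand, if you want to re-add the same word, you just "re-add" the _end marker. Thanks a lot! :)
    • Yes, it's a bit tricky without direct access to the "parent" dicts. See my edit for an idea of how to do that without altering the overall data structure.
    • Edited again to make it more efficient. Basically, it's not necessary to delete every removable dict individually; it's enough to delete the "root" of the path found (all the others are children of that one).
    • You can simplify remove_word2 a bit by saving different information during the search for the word (the first for loop). Rather than always adding the current dict to your path list, you can save the current dict and letter only if len(current_dict) > 1 (replacing any previously saved values). Then, after the loop ends, delete the saved letter from the saved dictionary.
    • @Blckknght I think that wouldn't work if the deleted word is a prefix of another word in the trie (although the second for loop doesn't have to build up a list, since only its last element is used afterwards).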
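
The single-pass idea from the last two comments could be sketched roughly as follows. This is only one possible reading of the suggestion, not code from the thread: the name `remove_word3` is made up, `make_trie` and `_end = '_end_'` are assumed to match the linked answer, and the explicit check on the final node is added to cover the prefix case raised in the last comment.

```python
_end = '_end_'  # assumed end-of-word marker, as in the linked answer


def make_trie(*words):
    """Build a nested-dict trie from the given words."""
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node[_end] = _end
    return root


def remove_word3(trie, word):
    """Remove `word` and prune its dead branch in one pass.

    Returns True if the word was stored and has been removed.
    """
    if not word:
        return trie.pop(_end, None) is not None
    node = trie
    # Where to cut if the whole path turns out to be unused by other words.
    cut_dict, cut_key = trie, word[0]
    for letter in word:
        if len(node) > 1:
            # This node is shared (other children or an _end marker),
            # so the cut can happen no higher than the edge below it.
            cut_dict, cut_key = node, letter
        node = node.get(letter)
        if node is None:
            return False  # the trie doesn't contain this word
    if _end not in node:
        return False      # only a prefix of a stored word
    if len(node) > 1:
        del node[_end]    # word is a prefix of a longer word: just unmark it
    else:
        del cut_dict[cut_key]
    return True


trie = make_trie('bar', 'baraka', 'baz')
remove_word3(trie, 'baraka')
print(trie == make_trie('bar', 'baz'))  # True
```

Only one (dict, key) pair is remembered during the descent, so no path list is needed.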
    【Answer 2】:

    I think it is best done recursively. The code is as follows:

    def remove(self, word):
        self.delete(self.tries, word, 0)
    
    def delete(self, dicts, word, i):
        if i == len(word):
            if 'end' in dicts:
                del dicts['end']
                if len(dicts) == 0:
                    return True
                else:
                    return False
            else:
                return False
        else:
            if word[i] in dicts and self.delete(dicts[word[i]], word, i + 1):
                if len(dicts[word[i]]) == 0:
                    del dicts[word[i]]
                    return True
                else:
                    return False
    
            else:
                return False
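
To try the recursive version, the two methods need a class around them. The following is a minimal sketch under the assumption that the trie is a nested dict stored in `self.tries`, with the key 'end' marking a stored word. The class name `Trie` and the `insert` and `search` methods are not part of the answer and are only illustrative; `delete` is condensed but keeps the same contract (it returns True when the dict it was given has become empty).

```python
class Trie(object):
    def __init__(self):
        self.tries = {}

    def insert(self, word):
        node = self.tries
        for ch in word:
            node = node.setdefault(ch, {})
        node['end'] = True

    def search(self, word):
        node = self.tries
        for ch in word:
            if ch not in node:
                return False
            node = node[ch]
        return 'end' in node

    def remove(self, word):
        self.delete(self.tries, word, 0)

    def delete(self, dicts, word, i):
        # Returns True when `dicts` has become empty, so the caller may prune it.
        if i == len(word):
            if 'end' not in dicts:
                return False
            del dicts['end']
            return len(dicts) == 0
        child = dicts.get(word[i])
        if child is not None and self.delete(child, word, i + 1):
            del dicts[word[i]]
            return len(dicts) == 0
        return False


t = Trie()
for w in ('car', 'cart', 'dog'):
    t.insert(w)
t.remove('cart')
print(t.search('car'), t.search('cart'))  # True False
t.remove('dog')
print('d' in t.tries)                     # False: the whole branch was pruned
```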
    

    【Comments】:

      【Answer 3】:
      def remove_a_word_util(self, word, idx, node):
          if len(word) == idx:
              node.is_end_of_word = False
              return bool(node.children)
      
          ch = word[idx]
          if ch not in node.children:
              return True
      
          flag = self.remove_a_word_util(word, idx+1, node.children[ch])
          if flag:
              return True
      
          node.children.pop(ch)
          return bool(node.children) or node.is_end_of_word
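
The helper above is given without its surrounding classes. A minimal sketch of how it might be used follows; `TrieNode`, `Trie`, `insert`, `contains` and the `remove_a_word` wrapper are assumptions made for illustration, and only the attribute names `children` and `is_end_of_word` come from the answer. The helper returns True when the node it was called on must be kept.

```python
class TrieNode(object):
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False


class Trie(object):
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end_of_word = True

    def contains(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_end_of_word

    def remove_a_word(self, word):
        self.remove_a_word_util(word, 0, self.root)

    def remove_a_word_util(self, word, idx, node):
        # Returns True if `node` is still needed after the removal.
        if len(word) == idx:
            node.is_end_of_word = False
            return bool(node.children)
        ch = word[idx]
        if ch not in node.children:
            return True  # word not present, nothing to prune
        if self.remove_a_word_util(word, idx + 1, node.children[ch]):
            return True
        node.children.pop(ch)
        return bool(node.children) or node.is_end_of_word


t = Trie()
t.insert('bar')
t.insert('baz')
t.remove_a_word('bar')
print(t.contains('bar'), t.contains('baz'))  # False True
```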
      

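      【Comments】:

        【Answer 4】:

        One way to handle structures like this is through recursion. The nice thing about recursion in this case is that it descends to the bottom of the trie and then passes the returned values back up through the branches.

        The following function does just that. It goes to the leaf and deletes the _end value, which covers the case where the input word is a prefix of another word. It then passes up a boolean (boo) which indicates that current_dict is still in an outlying branch. Once we reach a point where the current dict has more than one child, we delete the appropriate branch and set boo to False, so that each remaining level of the recursion does nothing.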

        def trie_trim(term, trie=SYNONYMS, prev=0):
            # checks that we haven't hit the end of the word
            if term:
                first, rest = term[0], term[1:]
                current_length = len(trie)
                next_length, boo = trie_trim(rest, trie=trie[first], prev=current_length)
        
                # this statement avoids trimming excessively if the input is a prefix because
                # if the word is a prefix, the first returned value will be greater than 1
                if boo and next_length > 1:
                    boo = False
        
                # this statement checks for the first occurrence of the current dict having more than one child
                # or it checks that we've hit the bottom without trimming anything
                elif boo and (current_length > 1 or not prev):
                    del trie[first]
                    boo = False
        
                return current_length, boo
        
            # when we do hit the end of the word, delete _end
            else:
                del trie[_end]
                return len(trie) + 1, True
        

        【Comments】:

          【Answer 5】:

          It's a bit long, but I hope it helps answer your question:

          class Trie:
              WORD_END = "$"
              
              def __init__(self):
                  self.trie = {}
          
              def insert(self, word):
                  cur = self.trie
                  for char in word:
                      if char not in cur:
                          cur[char] = {}
                      cur = cur[char]
                  cur[Trie.WORD_END] = word
          
              def delete(self, word):
                  def _delete(word, cur_trie, i=0):
                      if i == len(word):
                          if Trie.WORD_END not in cur_trie:
                              raise ValueError("'%s' is not registered in the trie..." %word)
                          cur_trie.pop(Trie.WORD_END)
                          if len(cur_trie) > 0:
                              return False
                          return True
                      if word[i] not in cur_trie:
                          raise ValueError("'%s' is not registered in the trie..." %word)
                      cont = _delete(word, cur_trie[word[i]], i+1)
                      if cont:
                          cur_trie.pop(word[i])
                          # keep pruning upwards only while this node is completely empty:
                          # no WORD_END marker and no other children (sibling words)
                          return len(cur_trie) == 0
                      return False
                  _delete(word, self.trie)
          
          t = Trie()
          t.insert("bar")
          t.insert("baraka")
          t.insert("barakalar")
          
          try:
              t.delete("barak") # raises ValueError as 'barak' is not a valid WORD_END although it is a valid path.
          except ValueError as err:
              print(err)
          try:
              t.delete("bareka") # raises ValueError as 'e' does not exist in the path.
          except ValueError as err:
              print(err)
          t.delete("baraka") # deletes the WORD_END of 'baraka' without deleting any letter as there is 'barakalar' afterwards.
          t.delete("barakalar") # prunes backwards (via the recursion) to the nearest node that is still in use; here that is 'bar', since 'baraka' was just deleted.
          

          【Comments】:

            【Answer 6】:

            In case you need the whole data structure:

            class TrieNode:
                def __init__(self):
                    self.children = {}
                    self.wordCounter = 0
                    self.prefixCounter = 0
            
            class Trie:
                def __init__(self):
                    self.root = TrieNode()
            
                def insert(self, word: str) -> None:
                    node = self.root
                    # prefixCounter counts the words that pass through or end at a node;
                    # the root therefore holds the total number of stored words
                    node.prefixCounter += 1
                    for char in word:
                        if char not in node.children:
                            node.children[char] = TrieNode()
                        node = node.children[char]
                        node.prefixCounter += 1

                    node.wordCounter += 1

                def _find(self, key: str):
                    # follow key from the root; return None if the path does not exist
                    node = self.root
                    for char in key:
                        node = node.children.get(char)
                        if node is None:
                            return None
                    return node

                def countWordsEqualTo(self, word: str) -> int:
                    node = self._find(word)
                    return node.wordCounter if node else 0

                def countWordsStartingWith(self, prefix: str) -> int:
                    node = self._find(prefix)
                    return node.prefixCounter if node else 0

                def erase(self, word: str) -> None:
                    # do nothing unless the word is stored, so the counters never go negative
                    if self.countWordsEqualTo(word) == 0:
                        return

                    node = self.root
                    node.prefixCounter -= 1
                    for char in word:
                        child = node.children[char]
                        child.prefixCounter -= 1
                        if child.prefixCounter == 0:
                            # no remaining word uses this branch: drop the whole subtree
                            del node.children[char]
                            return
                        node = child

                    node.wordCounter -= 1


            trie = Trie()
            trie.insert("apple")                      # Inserts "apple".
            trie.insert("apple")                      # Inserts another "apple".
            print(trie.countWordsEqualTo("apple"))    # There are two instances of "apple", so this prints 2.
            print(trie.countWordsStartingWith("app")) # "app" is a prefix of "apple", so this prints 2.
            trie.erase("apple")                       # Erases one "apple".
            print(trie.countWordsEqualTo("apple"))    # Now there is only one instance of "apple", so this prints 1.
            print(trie.countWordsStartingWith("app")) # prints 1
            trie.erase("apple")                       # Erases "apple". Now the trie is empty.
            print(trie.countWordsEqualTo("apple"))    # prints 0
            print(trie.countWordsStartingWith("app")) # prints 0
            
            

            【Comments】:
