Huffman suffix code: answers

【Question title】: Huffman suffix-code
【Posted】: 2017-06-24 11:40:31
【Question】:

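I'm trying to efficiently construct a binary suffix code (i.e. a set of codewords none of which is a suffix of any other) for a given set of characters and their probabilities.

My basic idea is to build a prefix code with an implementation of the Huffman algorithm. Reversing the codewords then gives a suffix-free code. This works, but it may not be optimal, because I have to reverse variable-length codewords (so I need a lookup table combined with bit shifts).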

Is there a way to modify the Huffman algorithm so that it creates a suffix code more efficiently?
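If codewords are stored right-aligned in an `int` together with their bit length, reversing one may not need a hand-built shift table: Java's `Integer.reverse` mirrors all 32 bits, and an unsigned shift by `32 - length` brings the reversed codeword back to the low bits. A minimal sketch, assuming codewords of at most 32 bits; the class name, method names and the `{bits, length}` table layout are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the question's code.
final class CodewordReversal {
    /**
     * Reverses the lowest {@code length} bits of {@code code}.
     * Assumes 0 <= length <= 32 and that all bits above {@code length} are zero.
     */
    static int reverseBits(int code, int length) {
        if (length == 0) {
            return 0; // Java masks shift distances to 5 bits, so ">>> 32" would not shift at all
        }
        // Integer.reverse mirrors all 32 bits, leaving the codeword in the top
        // 'length' bits; the unsigned shift moves it back to the bottom.
        return Integer.reverse(code) >>> (32 - length);
    }

    /** Converts a prefix-code table (symbol -> {bits, length}) into the reversed, suffix-code table. */
    static Map<Character, int[]> toSuffixTable(Map<Character, int[]> prefixTable) {
        Map<Character, int[]> suffixTable = new HashMap<>();
        for (Map.Entry<Character, int[]> e : prefixTable.entrySet()) {
            int bits = e.getValue()[0];
            int length = e.getValue()[1];
            suffixTable.put(e.getKey(), new int[] {reverseBits(bits, length), length});
        }
        return suffixTable;
    }
}
```

For example, `reverseBits(0b110, 3)` returns `0b011`. The conversion runs once per symbol when the table is built, so it adds nothing to the per-symbol encoding cost.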

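【Question discussion】:

Why is that a problem? The reversal only happens once anyway, doesn't it? In fact it doesn't need to be explicit: just build the codes in reverse when converting the tree into a lookup table.
@harold You're right, the reversal only happens once, and I can certainly reverse the codes while building the lookup table. I was just curious whether there is a way to do the reversal while building the tree, purely as an optimization.
It's the same tree, just interpreted differently.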

Tags: compression huffman-code
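Why "the same tree, interpreted differently" is enough: reversing every codeword of a prefix-free code yields a suffix-free code with exactly the same lengths, so the expected code length (and hence Huffman optimality) is unchanged. A small check on an illustrative four-symbol prefix code; the class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers for checking the prefix/suffix property of a code.
final class SuffixCheck {
    /** True if no codeword is a prefix of another (duplicates also fail). */
    static boolean isPrefixFree(List<String> codes) {
        for (int i = 0; i < codes.size(); i++)
            for (int j = 0; j < codes.size(); j++)
                if (i != j && codes.get(j).startsWith(codes.get(i))) return false;
        return true;
    }

    /** True if no codeword is a suffix of another (duplicates also fail). */
    static boolean isSuffixFree(List<String> codes) {
        for (int i = 0; i < codes.size(); i++)
            for (int j = 0; j < codes.size(); j++)
                if (i != j && codes.get(j).endsWith(codes.get(i))) return false;
        return true;
    }

    /** Reverses every codeword; lengths are preserved. */
    static List<String> reverseAll(List<String> codes) {
        List<String> out = new ArrayList<>();
        for (String c : codes) out.add(new StringBuilder(c).reverse().toString());
        return out;
    }
}
```

With the prefix code `0, 10, 110, 111`, `reverseAll` gives `0, 01, 011, 111`: suffix-free, but no longer prefix-free (`0` is a prefix of `01`).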

【Solution 1】:

I would implement HuffmanNode like this:

import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

class HuffmanNode implements Comparable<HuffmanNode>
{
    // data
    private String text;
    private double frequency;

    // linkage
    private HuffmanNode left;
    private HuffmanNode right;
    private HuffmanNode parent;

    public HuffmanNode(String text, double frequency)
    {
        this.text = text;
        this.frequency = frequency;
    }
    public HuffmanNode(HuffmanNode n0, HuffmanNode n1)
    {
        if(n0.frequency < n1.frequency)
        {
            left = n0;
            right = n1;
        }else if(n0.frequency > n1.frequency)
        {
            left = n1;
            right = n0;
        }else
        {
            if(n0.text.compareTo(n1.text) < 0)
            {
                left = n0;
                right = n1;
            }else
            {
                left = n1;
                right = n0;
            }
        }
        left.parent = this;
        right.parent = this;
        text = left.text + right.text;
        frequency = left.frequency + right.frequency;
    }

    public HuffmanNode getParent() {
        return parent;
    }

    public HuffmanNode getLeft() {
        return left;
    }

    public HuffmanNode getRight() {
        return right;
    }

    public String getText()
    {
        return text;
    }

    @Override
    public int compareTo(HuffmanNode o) {
        if(frequency < o.frequency)
            return -1;
        else if(frequency > o.frequency)
            return 1;
        else
            return text.compareTo(o.text);
    }

    public Collection<HuffmanNode> leaves()
    {
        if(left == null && right == null)
        {
            Set<HuffmanNode> retval = new HashSet<>();
            retval.add(this);
            return retval;
        }
        else if(left == null || right == null)
        {
            Set<HuffmanNode> retval = new HashSet<>();
            if(left != null)
                retval.addAll(left.leaves());
            if(right != null)
                retval.addAll(right.leaves());
            retval.add(this);
            return retval;
        }
        else
        {
            Set<HuffmanNode> retval = new HashSet<>();
            retval.addAll(left.leaves());
            retval.addAll(right.leaves());
            return retval;
        }
    }

    public String toString()
    {
         return "{" + text + " -> " + frequency + "}";
    }
}

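This class represents a single node in the Huffman tree.
It has a convenience method that collects all leaves of a (sub)tree.

Building the tree, and the code table from it, is then straightforward: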

// Assumes a helper frequency(String) that returns each symbol's relative
// frequency in the text; it is not shown in this answer.
private Map<String,String> buildTree(String text)
{
    List<HuffmanNode> nodes = new ArrayList<>();
    for(Map.Entry<String,Double> en : frequency(text).entrySet())
    {
        nodes.add(new HuffmanNode(en.getKey(), en.getValue()));
    }
    java.util.Collections.sort(nodes);
    while(nodes.size() > 1)
    {
        // take the two lightest nodes (the list is kept sorted)
        HuffmanNode n0 = nodes.remove(0);
        HuffmanNode n1 = nodes.remove(0);

        // build merged node
        HuffmanNode newNode = new HuffmanNode(n0, n1);

        // calculate insertion point: binarySearch returns the index of an equal
        // element if one exists, otherwise -(insertionPoint) - 1
        int pos = java.util.Collections.binarySearch(nodes, newNode);
        int insertionPoint = pos >= 0 ? pos : -pos - 1;

        // insert, keeping the list sorted
        nodes.add(insertionPoint, newNode);
    }

    // build lookup table by walking from each leaf up to the root
    Map<String, String> lookupTable = new HashMap<>();
    for(HuffmanNode leaf : nodes.get(0).leaves())
    {
        String code = "";
        HuffmanNode tmp = leaf;
        while(tmp.getParent() != null)
        {
            if(tmp.getParent().getLeft() == tmp)
                code = "0" + code;
            else
                code = "1" + code;
            tmp = tmp.getParent();
        }
        lookupTable.put(leaf.getText(), code);
    }
    return lookupTable;
}

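By changing how the codeword is built in the last loop (for example, appending each bit instead of prepending it), you change the resulting code: appending while walking from the leaf to the root produces the reversed codeword, i.e. the suffix code, from the same tree.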
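A self-contained sketch of that leaf-to-root walk, using a stripped-down, hypothetical `Node` class in place of `HuffmanNode` (only the parent/left/right links matter here). The only difference between the two codes is whether each bit is prepended or appended.

```java
// Hypothetical illustration; Node stands in for HuffmanNode.
final class LeafToRoot {
    static final class Node {
        Node parent, left, right;

        Node() {} // leaf

        Node(Node left, Node right) { // internal node
            this.left = left;
            this.right = right;
            left.parent = this;
            right.parent = this;
        }
    }

    /** Walks from the leaf up to the root; prepending gives the prefix code, appending the suffix code. */
    static String codeword(Node leaf, boolean suffixCode) {
        StringBuilder sb = new StringBuilder();
        for (Node n = leaf; n.parent != null; n = n.parent) {
            char bit = (n.parent.left == n) ? '0' : '1';
            if (suffixCode) {
                sb.append(bit);    // leaf-to-root order: reversed codeword
            } else {
                sb.insert(0, bit); // root-to-leaf order: ordinary Huffman codeword
            }
        }
        return sb.toString();
    }
}
```

For a tree with leaf `a` on the left of the root and leaves `b`, `c` below the right child, `b` gets `10` as a prefix codeword and `01` as a suffix codeword.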

【Discussion】:

【Solution 2】:

I implemented a Huffman coding tree in C++ as follows.

For this I created three classes: HuffmanTree, BinTree and BinNode.

More details are on my GitHub: https://github.com/MouChiaHung/DataStructures

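See these three files: bin_node.h, bin_tree.h and huffman_tree.h. They read the source file "source", Huffman-encode it into the file "encode", then decode "encode" and store the result in the output file "decode". The Huffman table is also written to the file "table".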
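The encode/decode round trip described here can be sketched in a few lines, independently of the linked repository: encoding concatenates table entries, and decoding a prefix code reads bits left to right until the buffered bits match a codeword. Strings of '0'/'1' stand in for real bit I/O, and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the repository's API.
final class PrefixRoundTrip {
    static String encode(String text, Map<Character, String> table) {
        StringBuilder bits = new StringBuilder();
        for (char ch : text.toCharArray()) bits.append(table.get(ch));
        return bits.toString();
    }

    /** Left-to-right decoding; unambiguous because no codeword is a prefix of another. */
    static String decode(String bits, Map<Character, String> table) {
        Map<String, Character> inverse = new HashMap<>();
        for (Map.Entry<Character, String> e : table.entrySet()) inverse.put(e.getValue(), e.getKey());
        StringBuilder out = new StringBuilder();
        StringBuilder buf = new StringBuilder();
        for (char bit : bits.toCharArray()) {
            buf.append(bit);
            Character sym = inverse.get(buf.toString());
            if (sym != null) { // buffered bits form a complete codeword
                out.append(sym);
                buf.setLength(0);
            }
        }
        if (buf.length() != 0) throw new IllegalArgumentException("trailing bits do not form a codeword");
        return out.toString();
    }
}
```

With the table `a=0, b=10, c=110, d=111`, `"abacd"` encodes to `0100110111` and decodes back to `"abacd"`.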

One of the core functions is HuffmanTree::encode(), which reads the characters from the source file.

Another core function is HuffmanTree::grow(), which grows the binary tree used for PFC (prefix-free code) encoding.

template<typename T> void amo::HuffmanTree<T>::grow(std::list<Model*>& list) { //ascendantly sorted list
Model* l;
Model* r;
Model* m;
BinNode<T>* lchild;
BinNode<T>* rchild;
BinNode<T>* vertex;
std::list<Model*>::iterator it = list.begin();
std::vector<BinNode<T>*> subs; //roots of sub-trees
typename std::vector<BinNode<T>*>::iterator it_subs = subs.begin();
int i = 0;
while (it!=list.end()) {
    lchild = NULL;
    rchild = NULL;
    vertex = NULL;
    cout << YELLOW << "while-loop:" << ++i << WHITE << endl;
    if (std::next(it,1) == list.end()) { //met the last and single leaf or sub-tree 
        if (subs.size() > 1) {
            cout << RED << "size of sub-tree is more than 1:" << subs.size() << WHITE << endl;
            this->_root = subs.back();
            subs.pop_back();
            break;
        }
        else if (subs.size() == 1){ 
            if (**it == subs.back()->data) { //met the last sub-tree 
                cout << GREEN << "going to attach the last sub-tree" << WHITE << endl;
                vertex = subs.back();
                subs.pop_back();
            } 
            else { //met the last leaf 
                cout << GREEN << "going to attach the last leaf" << WHITE << endl;
                r = *it;
                lchild = subs.back();
                subs.pop_back();
                cout << CYAN << "lchild points to the root of the last sub-tree:" << *lchild;
                rchild = new BinNode<T>(*r);
                cout << CYAN << "rchild points to a new node:" << *rchild;
                m = new Model(CHAR_VERTEX, (lchild->data.prob)+(r->prob));
                vertex = new BinNode<T>(*m);
                lchild->parent = vertex;
                rchild->parent = vertex;
                vertex->lchild = lchild;
                vertex->rchild = rchild;
            }   
            this->_root = vertex;
            cout << CYAN << "root:" << *this->_root <<  WHITE << endl;
            break;
        }
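        else { //no sub-tree was ever built, so the list holds a single raw leaf
            cout << RED << "size of sub-tree is less than 1:" << subs.size() << WHITE << endl;
            this->_root = new BinNode<T>(**it); //calling subs.back() on an empty vector would be undefined behavior
            break;
        }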
    }
    else {
        l = *it;
        it++;
        r = *it;
        m = new Model(CHAR_VERTEX, l->prob+r->prob);        

        for (it_subs=subs.begin(); it_subs!=subs.end(); it_subs++) { //reuse a sub-tree as lchild if its root corresponds to the current l model
            if (*l == (*it_subs)->data) {
                cout << CYAN << "lchild points to the root of sub-tree:" << **it_subs;
                lchild = *it_subs;
                subs.erase(it_subs); //erase invalidates it_subs, so leave the loop right away (decrementing begin() is undefined)
                break;
            }
        }
        for (it_subs=subs.begin(); it_subs!=subs.end(); it_subs++) { //reuse a sub-tree as rchild if its root corresponds to the current r model
            if (*r == (*it_subs)->data) {
                cout << CYAN << "rchild points to the root of sub-tree:" << **it_subs;
                rchild = *it_subs;
                subs.erase(it_subs); //erase invalidates it_subs, so leave the loop right away (decrementing begin() is undefined)
                break;
            }
        }
        if (lchild == NULL) { //no sub-tree corresponds to the l model, so it is a raw leaf: wrap it in a new node
            lchild = new BinNode<T>(*l);
            cout << CYAN << "lchild points to a new node:" << *lchild;
        }
        if (rchild == NULL) { //no sub-tree corresponds to the r model, so it is a raw leaf: wrap it in a new node
            rchild = new BinNode<T>(*r);
            cout << CYAN << "rchild points to a new node:" << *rchild;
        }

        vertex = new BinNode<T>(*m);
        std::cout << GREEN << "growing..." << WHITE << endl;
        std::cout << CYAN << "lchild" << *lchild << WHITE;
        std::cout << CYAN << "rchild" << *rchild << WHITE;
        std::cout << CYAN << "vertex" << *vertex << WHITE;
        lchild->parent = vertex;
        rchild->parent = vertex;
        vertex->lchild = lchild;
        vertex->rchild = rchild;
        subs.push_back(vertex);
        for (std::list<Model*>::iterator itt=it;itt!=list.end();itt++) {
            if ((*m < **itt) || (*m == **itt)) {
                list.insert(itt, m);
                break;
            }
            else if (std::next(itt,1) == list.end()) {
                list.push_back(m);
                break;
            }
        }
        it++;
    }
}

this->updateHeightAll();
cout << GREEN << "-*-*-*-*-*-*-*-* Huffman tree top -*-*-*-*-*-*-*-*" << WHITE << endl;
this->traverseLevel();
cout << GREEN << "-*-*-*-*-*-*-*-* Huffman tree bottom -*-*-*-*-*-*-*-*" << WHITE << endl;

subs.clear();
}
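grow() keeps an ascending std::list and locates the matching sub-trees by comparing Model values. The same greedy merge is often written with a priority queue that pops the two lightest trees directly. The sketch below uses that different data structure (it is not a port of grow()) and only computes each symbol's code length. Since a suffix code has the same lengths as its reversed prefix code, these lengths are all either kind of code needs. The class name is hypothetical, and the depth maps are copied on every merge for brevity, not speed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch using java.util.PriorityQueue instead of a sorted list.
final class HuffmanLengths {
    private static final class Tree {
        final double weight;
        final Map<Character, Integer> depths; // symbol -> depth inside this subtree

        Tree(double weight, Map<Character, Integer> depths) {
            this.weight = weight;
            this.depths = depths;
        }
    }

    static Map<Character, Integer> codeLengths(Map<Character, Double> weights) {
        if (weights.isEmpty()) return new HashMap<>();
        if (weights.size() == 1) {
            // convention assumed here: a lone symbol still gets a 1-bit codeword
            Map<Character, Integer> only = new HashMap<>();
            only.put(weights.keySet().iterator().next(), 1);
            return only;
        }
        PriorityQueue<Tree> pq = new PriorityQueue<>((x, y) -> Double.compare(x.weight, y.weight));
        for (Map.Entry<Character, Double> e : weights.entrySet()) {
            Map<Character, Integer> d = new HashMap<>();
            d.put(e.getKey(), 0);
            pq.add(new Tree(e.getValue(), d));
        }
        while (pq.size() > 1) {
            Tree x = pq.poll(), y = pq.poll(); // the two lightest trees
            Map<Character, Integer> merged = new HashMap<>();
            // every symbol below the new root moves one level deeper
            for (Map.Entry<Character, Integer> e : x.depths.entrySet()) merged.put(e.getKey(), e.getValue() + 1);
            for (Map.Entry<Character, Integer> e : y.depths.entrySet()) merged.put(e.getKey(), e.getValue() + 1);
            pq.add(new Tree(x.weight + y.weight, merged));
        }
        return pq.poll().depths;
    }
}
```

With illustrative weights `a=5, b=2, c=1, d=1`, the lengths come out as `a=1, b=2, c=3, d=3`.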

Finally, HuffmanTree::generate() creates the table used to encode the content.

template<typename T> void amo::HuffmanTree<T>::generate() {
std::string code = "";
std::queue<BinNode<T>*> queue;
BinNode<T>* node = this->_root;
BinNode<T>* tmp;
queue.push(node);
int i = 0;
while (true) {
    if (queue.empty()) break;
    node = queue.front();
    queue.pop();
    cout << YELLOW << "while-loop:" << ++i << ", node:" << *node << WHITE << endl;

    if (node->data.c == CHAR_VERTEX) {
        //do nothing
    } 
    else {
        if (node->isLeaf()) code = "";
        tmp = node;
        while (tmp!=NULL) {
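            //walking up from the leaf, insert(0, ...) puts the bits in root-to-leaf order (prefix code);
            //appending instead would give leaf-to-root order, i.e. the reversed (suffix) codeword
            if (tmp->isLeftChild()) code.insert(0, "0");
            else if (tmp->isRightChild()) code.insert(0, "1");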
            tmp = tmp->parent;
        }
        if (node->data.c != CHAR_VERTEX) codes[node->data.c] = code;
    }

    if (node->hasLeftChild()) queue.push(node->lchild);
    if (node->hasRightChild()) queue.push(node->rchild);
}

for (std::map<char,string>::iterator it=codes.begin();it!=codes.end();it++) {
    cout << YELLOW << "codes[" << distance(codes.begin(),it) << "]:" << " key:" << it->first << " => value:" << it->second << WHITE << endl; 
}
}
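Encoding with a suffix-code table works exactly like encoding with the prefix-code table generate() builds; what changes is decoding. Because no codeword is a suffix of another, the bitstream can be decoded greedily from its end backwards. A hedged Java sketch using '0'/'1' strings and a hypothetical class name:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of right-to-left decoding for a suffix-free code.
final class SuffixDecoder {
    /** Right-to-left decoding; unambiguous because no codeword is a suffix of another. */
    static String decode(String bits, Map<Character, String> suffixTable) {
        Map<String, Character> inverse = new HashMap<>();
        for (Map.Entry<Character, String> e : suffixTable.entrySet()) inverse.put(e.getValue(), e.getKey());
        StringBuilder out = new StringBuilder();
        StringBuilder buf = new StringBuilder(); // bits read so far, kept in their original order
        for (int i = bits.length() - 1; i >= 0; i--) {
            buf.insert(0, bits.charAt(i));
            Character sym = inverse.get(buf.toString());
            if (sym != null) { // buffered bits form a complete codeword
                out.append(sym);
                buf.setLength(0);
            }
        }
        if (buf.length() != 0) throw new IllegalArgumentException("leading bits do not form a codeword");
        return out.reverse().toString(); // symbols were recovered last-to-first
    }
}
```

With the suffix table `a=0, b=01, c=011, d=111`, the bitstream `0010011111` (the encoding of `"abacd"`) decodes back to `"abacd"`.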

Thanks; suggestions are welcome.

【Discussion】:

Of course, my focus was on building a Huffman-based code efficiently. Sorry if my code isn't helpful for your question.