Run Length Encoding (Integers): Answer

【Question Title】: Run Length Encoding (Integers)
【Posted】: 2016-11-07 20:07:47
【Question】:

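In class we are covering RLE, and our professor showed us the following code. I have tried to understand it, but I don't quite get it, so I would be very grateful if someone could explain how RLE works in this example. I do understand how the data compression is achieved; what I don't understand is how the program implements it. You will find my questions in the comments.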

// Example implementation of a simple variant of
// run-length encoding and decoding of a byte sequence

#include <iostream> 
#include <cassert>

// PRE: 0 <= value <= 255 
// POST: returns true if value is first byte of a tuple, otherwise false 

bool is_tuple_start(const unsigned int value) 
{ 
    assert(0 <= value && value <= 255);
    return value >= 128; //Why is it: value>=128 for first Byte of tuple?
}

// PRE: 1 <= runlength <= 127 //Why must runlength be in this range?
// POST: returns encoded runlength byte 

unsigned int make_tuple_start(const unsigned int run_length) 
{ 
    assert(1 <= run_length && run_length <= 127);
    return run_length + 128; //Why do I add 128?
}

// PRE: n/a 
// POST: returns true if value equals the maximal run-length 

bool is_max_runlength(const unsigned int value)  
{
    return value == 127; //same question: why is max. range 127?
}

// PRE: 128 <= value <= 255 //Why this range for value?
// POST: returns runlength of tuple 

unsigned int get_runlength(const unsigned int value) 
{ 
    assert(128 <= value && value <= 255);
    return value - 128; //Why -128?
}

// PRE: n/a 
// POST: outputs value and adds a newline 

void out_byte(const unsigned int value) 
{ 
    std::cout << value << "\n"; 
}

// PRE: 1 <= runlength <= 127 and 0 <= value <= 255 
// POST: outputs run length encoded bytes of tuple 

void output(const unsigned int run_length, const unsigned int value) 
{ 
    assert(1 <= run_length && run_length <= 127); 
    assert(0 <= value && value <= 255); //Why is value now between 0 and 255?

    if (run_length == 1 && !is_tuple_start(value)) 
        { 
            out_byte(value); 
        } 
    else 
        { 
            out_byte(make_tuple_start(run_length)); 
            out_byte(value); 
        }
}

// PRE: n/a 
// POST: returns true if 0 <= value <= 255, otherwise false 

bool is_byte(const int value) 
{ 
    return 0 <= value && value <= 255; 
}

// PRE: n/a 
// POST: outputs error if value does not indicate end of sequence 

void check_end_of_sequence(const int value) 
{ 
    if (value != -1) 
        { 
            std::cout << "error\n"; 
        } 
}

// PRE: n/a 
// POST: reads byte sequence and outputs encoded bytes 

void encode() 
{ 
    std::cout << "--- encoding: enter byte sequence, terminate with -1\n";
    int value;

    std::cin >> value;

    if (is_byte(value)) 
        { 
            int prev_value = value; //When/Where does value Change?
            unsigned int run_length = 1;

            while(true) 
                {
                    // read next byte, stop if invalid or end of sequence 

                    std::cin >> value; 
                    if (!is_byte(value)) 
                        { break; }

                    // output if value has changed or maximal runlength is reached 
                    // otherwise increase length of current run 

                    if (value != prev_value || is_max_runlength(run_length)) 
                        { 
                            output(run_length, prev_value); 
                            run_length = 1; 
                            prev_value = value; 
                        } 
                    else { ++run_length; }
                }
            output(run_length, prev_value);
        }

    // output "error" if sequence terminated incorrectly 

    check_end_of_sequence(value);
}

// PRE: n/a 
// POST: reads byte sequence and outputs decoded bytes 

void decode() 
{ 
    std::cout << "--- decoding: enter byte sequence, terminate with -1\n";
    int value; 

    while(true) {

        // read next byte, stop if invalid or end of sequence 

        std::cin >> value; //is value only a Byte? Or the whole sequence?

        if (!is_byte(value)) 
            { break; }

        // if this is a tuple output read next byte, otherwise output directly 

        if (is_tuple_start(value)) 
            {
                unsigned int run_length = get_runlength(value);

                // next must be a valid byte, otherwise this is an error 
                std::cin >> value; 

                if (!is_byte(value)) 
                    { 
                        value = 0; 
                        // trigger error in case value = -1 
                        break; 
                    }

                // output uncompressed tuple 

                for (unsigned int i = 0; i < run_length; ++i) 
                    { 
                        out_byte(value); 
                    }
            } 

        else { out_byte(value); }
    }

    // output "error" if sequence terminated incorrectly 

    check_end_of_sequence(value);
}


int main() 
{ 
    std::cout << "--- select mode: 0 = encode / 1 = decode\n"; 

    unsigned int mode; 
    std::cin >> mode;

    if (mode == 0) 
        { 
            encode(); 
        } 
    else if (mode == 1) 
        { 
            decode();
        } 
    else 
        { 
            std::cout << "--- unknown mode, must be 0 (encode) or 1 (decode)\n"; 
        }
}

I hope to get answers to my questions, and I hope the code is readable; it is basically copied and pasted from my lecture notes.

【Question Comments】:

A sample input was also given. Encode: 0 42 42 85 85 85 85 172 172 172 13 13 42 -1 and decode: 1 2 42 4 85 3 172 2 13 1 42 -1
Add that to the question instead of posting it as a comment.

Tags: c++ encoding integer run-length-encoding

【Solution 1】:

This encoding works by storing a run of repeated values as:

<length> <value>

while a non-repeated value is stored simply as:

<value>

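But when you see a number in the encoded sequence, how do you know whether it is the length part of the first format or just a single, non-repeated value? The encoding settles this with a rule: 128 is added to the length before it is stored. So any byte of 128 or more (in practice 129 to 255, since a run has at least one element) is a <length> byte that starts the first format.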

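But what if a non-repeated value is 128 or higher? The solution is to use the first format for large values too, treating such a value as a repeated value with runlength = 1.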

This should answer most of your questions, including all the range checks and the additions and subtractions of 128.

Why must runlength be in this range?

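Everything is stored as bytes from 0 to 255. If the length were greater than 127, adding 128 to it would give a number greater than 255, which does not fit in a byte. The lower bound of 1 is there because a run always contains at least one value.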

Is value only a byte? Or the whole sequence?

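It is declared as int value;, so it holds just one number. Each time cin >> value; executes, it reads the next whitespace-separated number of the sequence: either a byte or the terminating -1.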

Why is value now between 0 and 255?

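A value is always allowed to be a full byte; only the lengths are limited to 127, because we add 128 to them. As explained above, high values are always encoded as tuples with the length first.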

【Comments】:

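Thanks a lot for this enlightening answer! One more question: do you know why value > 127 means is_tuple_start? As I understand it, ordinary encoded characters can also be greater than 127, so it doesn't necessarily mean it's the start of a tuple?
That is how it tells whether a byte is a length or a data byte.
But I thought 128-255 were extended ASCII characters, which could also appear as data bytes? Or do we just assume they don't occur in the data?
You can't send them as single bytes. If you want the byte 200, you have to represent it as 129 200: length 1, byte value 200. Only values below 128 can be represented as single bytes.
Also note that traditional RLE puts a length before everything: length1 byte1 length2 byte2 .... With the approach used here, when length = 1 and the byte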