【Question title】: LZW compression doesn't seem to work correctly
【Posted】: 2014-05-01 14:51:03
【Question description】:

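I'm trying to get this code to work, but it doesn't seem to behave correctly when I encode things. I have a 60-byte text file. When I encode it, the output file is 100 bytes. When I decode that file, the result is 65 bytes. The content decodes correctly, but the file is larger than the original. I also tried encoding a jpg, and the file size did go down, but I couldn't open the file afterwards. When I tried to decode the jpg file, it didn't work and cmd appeared to freeze. Here is the code I'm trying to use.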

import java.util.*;
import java.io.*;

public class LZW {

// Dictionary 
public static short DSIZE = 256;
public static int DSIZEINT = 256;

/** Compress a string to a list of output symbols. */
public static List<Short> compress(String uncompressed) {
    // Build the dictionary.
    short dictSize = DSIZE;
    Map<String,Short> dictionary = new HashMap<String,Short>();
    for (short i = 0; i < DSIZE; i++)
        dictionary.put("" + (char)i, i);

    String w = "";
    List<Short> result = new ArrayList<Short>();
    for (char c : uncompressed.toCharArray()) {
        String wc = w + c;
        if (dictionary.containsKey(wc))
            w = wc;
        else {
            result.add(dictionary.get(w));
            // Add wc to the dictionary.
            dictionary.put(wc, dictSize++);
            w = "" + c;
        }
    }

    // Output the code for w.
    if (!w.equals(""))
        result.add(dictionary.get(w));
    return result;
}

 /** Compress a string to a list of output symbols, supporting larger filesizes. */
public static List<Integer> compressInt(String uncompressed) {
    // Build the dictionary.
    int dictSize = DSIZEINT;
    Map<String,Integer> dictionary = new HashMap<String,Integer>();
    for (int i = 0; i < DSIZEINT; i++)
        dictionary.put("" + (char)i, i);

    String w = "";
    List<Integer> result = new ArrayList<Integer>();
    for (char c : uncompressed.toCharArray()) {
        String wc = w + c;
        if (dictionary.containsKey(wc))
            w = wc;
        else {
            result.add(dictionary.get(w));
            // Add wc to the dictionary.
            dictionary.put(wc, dictSize++);
            w = "" + c;
        }
    }

    // Output the code for w.
    if (!w.equals(""))
        result.add(dictionary.get(w));
    return result;
}

/** Decompress a list of output ks to a string. */
public static String decompress(List<Short> compressed) {
    // Build the dictionary.
    short dictSize = DSIZE;
    Map<Short,String> dictionary = new HashMap<Short,String>();
    for (short i = 0; i < DSIZE; i++)
        dictionary.put(i, "" + (char)i);

    String w = "" + (char)(short)compressed.remove(0);
    String result = w;
    for (short k : compressed) {
        String entry;
        if (dictionary.containsKey(k))
            entry = dictionary.get(k);
        else if (k == dictSize)
            entry = w + w.charAt(0);
        else
            throw new IllegalArgumentException("Bad compressed k: " + k);

        result += entry;

        // Add w+entry[0] to the dictionary.
        dictionary.put(dictSize++, w + entry.charAt(0));

        w = entry;
    }
    return result;
}

/** Decompress a list of output ks to a string, supporting larger filesizes. */
public static String decompressInt(List<Integer> compressed) {
    // Build the dictionary.
    int dictSize = DSIZE;
    Map<Integer,String> dictionary = new HashMap<Integer,String>();
    for (int i = 0; i < DSIZE; i++)
        dictionary.put(i, "" + (char)i);

    String w = "" + (char)(int)compressed.remove(0);
    String result = w;
    for (int k : compressed) {
        String entry;
        if (dictionary.containsKey(k))
            entry = dictionary.get(k);
        else if (k == dictSize)
            entry = w + w.charAt(0);
        else
            throw new IllegalArgumentException("Bad compressed k: " + k);

        result += entry;

        // Add w+entry[0] to the dictionary.
        dictionary.put(dictSize++, w + entry.charAt(0));

        w = entry;
    }
    return result;
}

public static void main(String[] args) {

    String example = "";
    String s = "";
    int command = 0;

    //Check for correct argument
    if(args.length != 1) {
        System.out.println("Please enter 1 argument.\nArg1: Command ('encode', 'decode', 'encodeInt', 'decodeInt')\nAnd ensure that you are feeding in an input file and output file using '<' and '>'");
        System.exit(1);
    }
    if(args[0].equals("encode")){
        command = 1;
    }
    else if(args[0].equals("decode")){
        command = 2;
    }
    else if(args[0].equals("encodeInt")){
        command = 3;
    }
    else if(args[0].equals("decodeInt")){
        command = 4;
    }
    else {
        System.out.println("Please use either 'encode', 'decode', 'encodeInt', 'decodeInt' as the argument.");
        System.exit(1);
    }

    long start;
    long elapsedTime;

    //Compress
    if(command == 1){

        //Read input file
        s = BinaryStdIn.readString();

        //The actual compression
        start = System.nanoTime();
        List<Short> compressed = compress(s);
        elapsedTime = System.nanoTime() - start;

        //System.err.println(compressed);

        //first writes the number of ints to write
        BinaryStdOut.write(compressed.size());
        //writes compression (to file)
        Iterator<Short> compressIterator = compressed.iterator();
        while (compressIterator.hasNext()){
            BinaryStdOut.write(compressIterator.next());
        }

        System.err.println("LZW Encode time: " + elapsedTime + " ns");

    }
    //Decompress
    else if(command == 2){

        //Build Integer List with input
        List<Short> compressed = new ArrayList<Short>();
        int size = BinaryStdIn.readInt();
        while(size > 0){
            try{
                compressed.add(BinaryStdIn.readShort());
            }
            catch(RuntimeException e){
                System.err.print("*");
            }
            size--;
        }

        //System.err.println(compressed);

        //The actual decompression
        start = System.nanoTime();
        String decompressed = decompress(compressed);
        elapsedTime = System.nanoTime() - start;

        //Print out decompressed data (to file)
        System.out.println(decompressed);

        System.err.println("LZW Decode time: " + elapsedTime + " ns");

    }
    //Compress using Integer size
    else if(command == 3){

        //Read input file
        s = BinaryStdIn.readString();

        //The actual compression
        start = System.nanoTime();
        List<Integer> compressed = compressInt(s);
        elapsedTime = System.nanoTime() - start;

        //System.err.println(compressed);

        //first writes the number of ints to write
        BinaryStdOut.write(compressed.size());
        //writes compression (to file)
        Iterator<Integer> compressIterator = compressed.iterator();
        while (compressIterator.hasNext()){
            BinaryStdOut.write(compressIterator.next());
        }

        System.err.println("LZW Encode time: " + elapsedTime + " ns");

    }
    //Decompress using Integer size
    else if(command == 4){

        //Build Integer List with input
        List<Integer> compressed = new ArrayList<Integer>();
        int size = BinaryStdIn.readInt();
        while(size > 0){
            try{
                compressed.add(BinaryStdIn.readInt());
            }
            catch(RuntimeException e){
                System.err.print("*");
            }
            size--;
        }

        //System.err.println(compressed);

        //The actual decompression
        start = System.nanoTime();
        String decompressed = decompressInt(compressed);
        elapsedTime = System.nanoTime() - start;

        //Print out decompressed data (to file)
        System.out.println(decompressed);

        System.err.println("LZW Decode time: " + elapsedTime + " ns");

    }

    BinaryStdOut.close();


}
}

Any help is appreciated. Thanks.

【Discussion】:

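I won't debug your application for you, but I have written compression routines before, and here is a good way to test yours: start with very small files. 1 character, 2 characters, 3 characters, 4 characters, and so on. Try repeated letters and variations of letter sequences. Make each test a little more complex than the last. Compress each one, decompress it, and check that it matches. If it matches, move on to the next one. If not, find out what went wrong. Testing with small test files is much easier than testing with a jpeg.
The best way to find the problem is to work up incrementally. Keep making the file bigger until you hit the problem. Try different variations. Maybe a particular sequence of values triggers it. You could even write an application that creates text files of increasing size (and complexity), compresses and decompresses the data automatically, checks for a match, and tells you exactly when a problem occurs. It may happen once you reach a specific size. For example, the cutoff might be 255 or 256 bytes, in which case you probably have a boundary error or an off-by-one error.
Saying "the text file seems to encode fine" and "the file size increases" is contradictory. If the file is larger, it is because it contains characters that were not in the original file.
Not necessarily. Some things don't actually get smaller when you compress them. Try compressing a 1-byte file and the result will be larger than 1 byte, because decompression needs metadata.
I'm not sure what your environment is. I use Visual Studio and C#, where I can do a "Break All" that stops execution at an arbitrary point and shows what is executing (the current instruction and call stack for each thread, and so on). If your environment has this, you could try it. Another possibility is to add log messages to your code that write out information at various points of execution. You could look at something like log4j.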
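The incremental testing suggested in the comments above could be automated roughly as follows. This is only a sketch: the class name `RoundTripCheck` and the helper `firstFailure` are made up for illustration, the LZW routines are a compact in-memory rewrite rather than the asker's exact code, and the input is assumed to contain only characters below 256.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RoundTripCheck {

    /** Textbook LZW encoder over 8-bit symbols; returns the list of codes. */
    static List<Integer> compress(String input) {
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put("" + (char) i, i);
        List<Integer> out = new ArrayList<>();
        String w = "";
        for (char c : input.toCharArray()) {
            String wc = w + c;
            if (dict.containsKey(wc)) {
                w = wc;
            } else {
                out.add(dict.get(w));
                dict.put(wc, dict.size()); // next free code
                w = "" + c;
            }
        }
        if (!w.isEmpty()) out.add(dict.get(w));
        return out;
    }

    /** Inverse of compress; does not modify the list it is given. */
    static String decompress(List<Integer> codes) {
        if (codes.isEmpty()) return "";
        Map<Integer, String> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put(i, "" + (char) i);
        String w = dict.get(codes.get(0));
        StringBuilder out = new StringBuilder(w);
        for (int k : codes.subList(1, codes.size())) {
            String entry;
            if (dict.containsKey(k)) entry = dict.get(k);
            else if (k == dict.size()) entry = w + w.charAt(0); // code defined by this very step
            else throw new IllegalArgumentException("Bad code: " + k);
            out.append(entry);
            dict.put(dict.size(), w + entry.charAt(0));
            w = entry;
        }
        return out.toString();
    }

    /**
     * Tests inputs of length 0..maxLen built by cycling through alphabet.
     * Returns the length of the first input that fails to round-trip, or -1.
     */
    static int firstFailure(String alphabet, int maxLen) {
        StringBuilder sb = new StringBuilder();
        for (int len = 0; len <= maxLen; len++) {
            String input = sb.toString();
            if (!decompress(compress(input)).equals(input)) return len;
            sb.append(alphabet.charAt(len % alphabet.length()));
        }
        return -1;
    }

    public static void main(String[] args) {
        StringBuilder allBytes = new StringBuilder();
        for (int i = 0; i < 256; i++) allBytes.append((char) i);
        String[] alphabets = {"A", "AB", "ABC", allBytes.toString()};
        for (String alphabet : alphabets) {
            int failedAt = firstFailure(alphabet, 1000);
            System.out.println("alphabet of " + alphabet.length() + " symbols: "
                    + (failedAt < 0 ? "all lengths round-trip" : "first failure at length " + failedAt));
        }
    }
}
```

A check like this only exercises the in-memory routines. To cover the file-writing and file-reading steps as well, the same loop would have to go through whatever I/O layer the real program uses.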

Tags: algorithm encoding compression decode lzw

【Solution 1】:

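Even the best compression algorithm will sometimes produce output larger than its input. In fact, finding such an input makes a good test case. LZW compresses by finding repeated sequences, so input without any repeated sequences is bound to get larger.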
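To make the growth concrete, the sketch below counts how many codes textbook LZW emits and estimates the output size when each code is stored at a fixed width. The class and method names are hypothetical. The size formula assumes a 4-byte count followed by one 2-byte value per code, which is a guess at how the question's `BinaryStdOut` calls behave. Under that assumption, the 100-byte output reported in the question would correspond to 48 codes (4 + 2 × 48).

```java
import java.util.HashSet;
import java.util.Set;

class LzwSizeEstimate {

    /** Number of codes textbook LZW emits (dictionary seeded with the 256 single-char strings). */
    static int codeCount(String input) {
        Set<String> dict = new HashSet<>();
        for (int i = 0; i < 256; i++) dict.add("" + (char) i);
        int codes = 0;
        String w = "";
        for (char c : input.toCharArray()) {
            String wc = w + c;
            if (dict.contains(wc)) {
                w = wc;          // keep extending the current match
            } else {
                codes++;         // emit the code for w
                dict.add(wc);
                w = "" + c;
            }
        }
        return w.isEmpty() ? codes : codes + 1; // final pending match
    }

    /** Assumed layout: a 4-byte count, then one 2-byte value per code. */
    static int fixedWidthBytes(int codes) {
        return 4 + 2 * codes;
    }

    public static void main(String[] args) {
        String noRepeats = "ABCDEFGHIJ"; // no pair of characters occurs twice
        String repeats = "AAAAAAAAAA";   // ten identical characters
        for (String s : new String[] {noRepeats, repeats}) {
            int codes = codeCount(s);
            System.out.println(s.length() + " input bytes -> " + codes + " codes -> "
                    + fixedWidthBytes(codes) + " output bytes");
        }
        // prints: 10 input bytes -> 10 codes -> 24 output bytes
        //         10 input bytes -> 4 codes -> 12 output bytes
    }
}
```

Even the highly repetitive ten-character input ends up larger than the original at this fixed width, because short inputs give the dictionary too little time to build long matches.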

I once had to create a test input like this. I think it was something like "ABCD...ACBDEG...".

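Edit: Now that I've looked at the code more closely, I see that you are writing a list of Shorts to the output. That is almost certainly wrong. One of the necessary steps is to pack each output token into the minimum number of bits, and you have skipped that step entirely.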
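One way to picture the packing step is the sketch below, which stores every code in a fixed number of bits instead of a whole `short`. It is a simplified illustration, not the answerer's code. The names `CodePacker`, `pack` and `unpack` are invented here, and it uses a single fixed width. Real LZW implementations commonly start at 9 bits and widen the code as the dictionary grows, which is closer to the "minimum number of bits" the answer describes.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CodePacker {

    /** Packs each code into exactly width bits (1..24), most significant bit first. */
    static byte[] pack(List<Integer> codes, int width) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long buffer = 0; // pending bits, right-aligned
        int bits = 0;    // how many pending bits buffer holds
        for (int code : codes) {
            if (code < 0 || code >= (1 << width))
                throw new IllegalArgumentException("Code " + code + " does not fit in " + width + " bits");
            buffer = (buffer << width) | code;
            bits += width;
            while (bits >= 8) {                       // flush whole bytes
                out.write((int) (buffer >> (bits - 8)) & 0xFF);
                bits -= 8;
            }
            buffer &= (1L << bits) - 1;               // keep only the unwritten bits
        }
        if (bits > 0) out.write((int) (buffer << (8 - bits)) & 0xFF); // zero-pad the last byte
        return out.toByteArray();
    }

    /** Reads count codes of width bits each from data. */
    static List<Integer> unpack(byte[] data, int width, int count) {
        List<Integer> codes = new ArrayList<>();
        long buffer = 0;
        int bits = 0;
        int pos = 0;
        while (codes.size() < count) {
            while (bits < width) {                    // refill until one full code is available
                if (pos >= data.length) throw new IllegalArgumentException("Truncated input");
                buffer = (buffer << 8) | (data[pos++] & 0xFF);
                bits += 8;
            }
            codes.add((int) (buffer >> (bits - width)));
            bits -= width;
            buffer &= (1L << bits) - 1;
        }
        return codes;
    }

    public static void main(String[] args) {
        List<Integer> codes = Arrays.asList(65, 66, 256, 258); // LZW codes for "ABABABA"
        byte[] packed = pack(codes, 12);
        System.out.println(codes.size() + " codes as shorts: " + (2 * codes.size()) + " bytes");
        System.out.println(codes.size() + " codes at 12 bits: " + packed.length + " bytes");
        System.out.println("round trip ok: " + unpack(packed, 12, codes.size()).equals(codes));
    }
}
```

The decoder needs to know the code width and how many codes to read, so a real file format has to carry or imply that information, for example through a header or an end-of-data code.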

From your description the code has other problems too, but that's enough for now.

【Discussion】:

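Yes, that does make sense. I ran a test with lots of repetition and the output file was still bigger. Do you have a sample input I could try that would definitely give me a smaller file? Or could it be a different problem? Edit: I tried it with a short story I found online, and it cut the file size in half! If the instructor tests this code with small files, it really won't work. Is that normal for LZW?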