Unzip a *.docx file in memory without writing to disk - Java answers

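[Question title]: Unzip a *.docx file in memory without writing to disk - Java
[Posted]: 2015-06-01 11:38:35
[Question description]:

I want to unzip a *.docx file in memory without writing the output to disk. I found the following implementation, but it only lets me read the compressed files; it does not show the directory structure. Knowing where each file sits in the directory tree is important to me. Can anyone point me in the right direction?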

private static void UnzipFileInMemory() {
    // try-with-resources closes the ZipFile even if reading fails
    try (ZipFile zf = new ZipFile("d:\\a.docx")) {
        for (Enumeration<? extends ZipEntry> e = zf.entries(); e.hasMoreElements();) {
            ZipEntry entry = e.nextElement();
            System.out.println(entry);
            // Opening the stream in its own try block avoids calling close() on a null reference
            try (InputStream in = zf.getInputStream(entry)) {
                // the entry's content can be read from 'in' here
            } catch (IOException ex) {
                //Logger.getLogger(Tester.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    } catch (IOException ex) {
        //Logger.getLogger(Tester.class.getName()).log(Level.SEVERE, null, ex);
    }
}

[Question discussion]:

Tags: java zip extract unzip in-memory

[Solution 1]:

Use ZipInputStream: in this example, zEntry gives you the file's location inside the archive.

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class unzip {

    public static void main(String[] args) {

        String filePath = "D:/Tmp/Tmp.zip";
        String oPath = "D:/Tmp/";

        new unzip().unzipFile(filePath, oPath);
    }

    public void unzipFile(String filePath, String oPath) {
        // try-with-resources closes both streams even when an exception is thrown
        try (FileInputStream fis = new FileInputStream(filePath);
             ZipInputStream zipIs = new ZipInputStream(new BufferedInputStream(fis))) {
            ZipEntry zEntry;
            while ((zEntry = zipIs.getNextEntry()) != null) {
                // getName() is the entry's path inside the archive, e.g. "dir/file.txt"
                String opFilePath = oPath + zEntry.getName();
                // Opens (and thereby creates) the output file; nothing is written to it here
                try (FileOutputStream fos = new FileOutputStream(opFilePath)) {
                    System.out.println(zEntry.getName());
                } catch (IOException ex) {
                    // e.g. the entry is a directory, or its parent folder does not exist
                    ex.printStackTrace();
                }
            }
        } catch (IOException e) {
            // also covers FileNotFoundException, which is a subclass of IOException
            e.printStackTrace();
        }
    }
}
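
Since the question asks to avoid writing to disk, the same ZipInputStream loop can collect each entry's bytes in memory instead of opening a FileOutputStream. The following is only a minimal sketch of that idea, not part of the original answer; the class name InMemoryUnzip and the method readEntries are made up for illustration, and it assumes the whole archive fits comfortably in memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not from the original answer.
final class InMemoryUnzip {

    /** Reads every file entry of a zip stream into memory, keyed by its path inside the archive. */
    static Map<String, byte[]> readEntries(InputStream zipData) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>(); // keeps archive order
        try (ZipInputStream zipIs = new ZipInputStream(zipData)) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zipIs.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directory entries carry no data
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry
                while ((n = zipIs.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                entries.put(entry.getName(), out.toByteArray());
            }
        }
        return entries;
    }
}
```

Calling InMemoryUnzip.readEntries(new FileInputStream("d:\\a.docx")) would then return a map whose keys are paths such as word/document.xml, so the position of each file in the tree is preserved in the key.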

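[Discussion]:

[Solution 2]:

You can treat a zip-format file as a virtual file system (FileSystem). Java already has a protocol handler for this, for jar:file://... URIs, so you have to prefix the result of File.toURI() with "jar:".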

URI docxUri = ,,, // "jar:file:/C:/... .docx"
Map<String, String> zipProperties = new HashMap<>();
zipProperties.put("encoding", "UTF-8");
try (FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties)) {
    Path documentXmlPath = zipFS.getPath("/word/document.xml");

Now you can use Files.delete() or Files.copy() between the real disk file system and the zip.
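
Because the archive is exposed as a FileSystem, its directory tree can also be walked like any other. The sketch below is an illustration under assumptions rather than the answerer's code: the class ZipTreeLister and the method listTree are hypothetical names, and it assumes Java 8 or later, where Files.walk is available.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical helper, not from the original answer.
final class ZipTreeLister {

    /** Returns the in-archive path of every file and directory in the zip, sorted. */
    static List<String> listTree(Path zipFile) throws IOException {
        // "jar:" + file URI selects the zip file system provider
        URI zipUri = URI.create("jar:" + zipFile.toUri());
        Map<String, String> zipProperties = new HashMap<>();
        List<String> paths = new ArrayList<>();
        try (FileSystem zipFs = FileSystems.newFileSystem(zipUri, zipProperties);
             Stream<Path> walk = Files.walk(zipFs.getPath("/"))) {
            // the stream is consumed while the file system is still open
            walk.forEach(p -> paths.add(p.toString()));
        }
        Collections.sort(paths);
        return paths;
    }
}
```

For a .docx file the returned list would be expected to contain entries such as /word/document.xml; nothing is extracted to disk, since the paths live inside the zip file system.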

When working with XML:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();

Document doc = builder.parse(Files.newInputStream(documentXmlPath));
//Element root = doc.getDocumentElement();

Then you can use XPath to find the relevant locations, and write the XML back.
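
As a rough sketch of that XPath step, the helper below parses XML bytes, edits the text of matching elements, and serializes the document again. The class XPathEdit, the method replaceText, and the choice of matching elements by local name t (to sidestep registering a NamespaceContext for the w: prefix) are assumptions made for this illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not from the original answer.
final class XPathEdit {

    /** Replaces oldText with newText inside every element whose local name is "t". */
    static byte[] replaceText(byte[] xml, String oldText, String newText) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        // local-name() matches w:t regardless of the namespace prefix used
        NodeList textNodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//*[local-name()='t']", doc, XPathConstants.NODESET);
        for (int i = 0; i < textNodes.getLength(); i++) {
            String text = textNodes.item(i).getTextContent();
            textNodes.item(i).setTextContent(text.replace(oldText, newText));
        }

        // serialize the modified DOM back to bytes
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }
}
```

The resulting bytes could then be written back into the archive with Files.write(documentXmlPath, ...), in the same way as the placeholder example.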

You may not even need an XML parser; you can simply replace placeholders:

byte[] content = Files.readAllBytes(documentXmlPath);
String xml = new String(content, StandardCharsets.UTF_8);
xml = xml.replace("#DATE#", "2014-09-24");
xml = xml.replace("#NAME#", StringEscapeUtils.escapeXml("Sniper"));
...
content = xml.getBytes(StandardCharsets.UTF_8);
Files.delete(documentXmlPath);
Files.write(documentXmlPath, content);

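For quick development, rename a copy of the .docx so that it has a .zip file extension, then inspect the files inside.

[Discussion]:

[Solution 3]:

Just add a file check inside the loop: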

if (!entry.isDirectory()) // Alternatively: if(entry.getName().contains("."))
    System.out.println(entry);
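
Building on that check, the entry names themselves are enough to reconstruct the directory tree, because each name is the full path inside the archive. The following sketch groups file names by their parent directory; ZipDirectoryIndex and groupByDirectory are hypothetical names, and Java 8 or later is assumed for computeIfAbsent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;

// Hypothetical helper, not from the original answer.
final class ZipDirectoryIndex {

    /** Maps each directory path ("" for the archive root) to the file names it contains. */
    static Map<String, List<String>> groupByDirectory(List<? extends ZipEntry> entries) {
        Map<String, List<String>> tree = new TreeMap<>();
        for (ZipEntry entry : entries) {
            if (entry.isDirectory()) {
                continue; // only files are listed
            }
            String name = entry.getName();
            int slash = name.lastIndexOf('/'); // zip paths always use '/'
            String dir = slash < 0 ? "" : name.substring(0, slash);
            String file = name.substring(slash + 1);
            tree.computeIfAbsent(dir, k -> new ArrayList<>()).add(file);
        }
        return tree;
    }
}
```

With the ZipFile from the question, the entries can be passed in as Collections.list(zf.entries()).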

[Discussion]: