【发布时间】:2014-05-05 18:38:05
【问题描述】:
我能够读取 DOC 文件并获取其字数,但它是错误的。
我的代码:
public class WordCounter {
public static void main(String[] args) throws Throwable {
processDOC();
}
private static void processDOC() throws Throwable {
File file = new File("/Users/yjiang/Desktop/whatever.doc");
File file2 = new File("/Users/yjiang/Desktop/Test.docx");
File file3 = new File("/Users/yjiang/Desktop/QB Tests 4-14-2014.xls");
File file4 = new File("/Users/yjiang/Desktop/QB Tests 4-14-2014.xlsx");
try {
FileInputStream fs = new FileInputStream(file);
POIFSFileSystem poifsFileSystem = new POIFSFileSystem(fs);
DirectoryEntry directoryEntry = poifsFileSystem.getRoot();
DocumentEntry documentEntry = (DocumentEntry) directoryEntry.getEntry(SummaryInformation.DEFAULT_STREAM_NAME);
DocumentInputStream dis = new DocumentInputStream(documentEntry);
PropertySet ps = new PropertySet(dis);
SummaryInformation si = new SummaryInformation(ps);
System.out.println(si.getWordCount());
} catch (Exception e) {
e.printStackTrace();
}
try {
HWPFDocument hwpfDocument = new HWPFDocument(new FileInputStream(file));
System.out.println(hwpfDocument.getDocProperties().getCWords()); // actually 71 words using word count in MSWord, returned 57.
System.out.println(hwpfDocument.getDocProperties().getCWordsFtnEnd());
XWPFDocument xwpfDocument = new XWPFDocument(new FileInputStream(file2)); // actually 71 words using word count in MSWord, returned 57.
System.out.println(xwpfDocument.getProperties().getExtendedProperties().getUnderlyingProperties().getWords());
System.out.println();
} catch (Exception e) {
e.printStackTrace();
}
}
}
“whatever.doc”有 71 个单词,当我运行它时,它只返回 57 个。
似乎我不能使用相同的方法来读取 DOCX 文件,当我运行它时,我得到以下信息:
org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
可以举个例子吗?
【问题讨论】:
-
如果您要求 Word 更新文档统计信息,那么 POI 是否会看到正确的值?
-
如何获取更新文档统计信息?顺便说一句,使用 mac。
标签: java apache apache-poi docx doc