iText: How to retrieve the list of fonts not embedded in a PDF (answers)

【Question title】: Itext: How to retrieve list of not embedded fonts of a pdf
【Posted】: 2015-08-24 14:30:08
【Question description】:

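I want to check whether a PDF has all of its fonts embedded. I followed the code from "How to check that all used fonts are embedded in PDF with Java iText?", but I still cannot get a correct list of the fonts used.

See my sample PDF: https://www.dropbox.com/s/anvm49vh87d8yqs/000024944.pdf?dl=0. The code returns no fonts at all, yet the document properties in Acrobat list Helvetica, Verdana (embedded subset) and Verdana-Bold (embedded subset). For other PDFs I do get Verdana (embedded subset); it is only for this kind of PDF that I cannot get a font list.

Since we have to process a large number of PDFs from internal as well as external sources, we need to be able to embed fonts so that the files can be printed. Because it is nearly impossible to embed every font, we only want to embed the commonly used ones; for exotic fonts we will ignore the print request.

Can anyone help me with this? Thanks.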

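【Comments】:

Correct link to the PDF: dropbox.com/s/anvm49vh87d8yqs/000024944.pdf?dl=0
I checked your file with callas pdfToolbox (note: I am affiliated with this tool). It reports that Verdana and Verdana Bold are embedded (and subset) but Helvetica is not embedded; this matches what Adobe Acrobat reports.
A somewhat off-topic remark as well: you do realize that embedding even the standard fonts is a risky thing to do, right? There is no guarantee that your copy of a font is the same one the creator of the original PDF used, and you can end up with different widths or with encoding problems when embedding the font.
If I copy the text into a Word document I cannot find any reference to Helvetica, so I guess it is not used at all? I managed to get the fonts in a different way with iText (see my reply below). It does not return Helvetica at all either.
Helvetica is defined as /F2 and is only used for empty text strings: () Tj % show text string (lots of them). So whether it is "used" depends on your definition of "used".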
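
To illustrate the last comment, here is a minimal sketch (not taken from the thread) of one possible definition of "used": a font counts only if it shows at least one non-empty string. It works on an already decoded content stream passed in as text. The class and method names are hypothetical, and the regular expression is a deliberate simplification that ignores hex strings, TJ arrays, and escaped or nested parentheses.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UsedFontScan {
    // Matches either a font selection ("/F2 12 Tf") or a simple text-showing
    // operator ("(text) Tj"). Hex strings, TJ arrays, and escaped or nested
    // parentheses are not handled in this sketch.
    private static final Pattern OPS = Pattern.compile(
            "/(\\S+)\\s+[-+]?[0-9]*\\.?[0-9]+\\s+Tf|\\(([^()\\\\]*)\\)\\s*Tj");

    /** Returns the resource names of fonts that show at least one non-empty string. */
    public static Set<String> fontsShowingText(String contentStream) {
        Set<String> used = new LinkedHashSet<>();
        String current = null;
        Matcher m = OPS.matcher(contentStream);
        while (m.find()) {
            if (m.group(1) != null) {
                current = m.group(1);      // Tf: remember the selected font
            } else if (current != null && !m.group(2).isEmpty()) {
                used.add(current);         // Tj with actual characters
            }
        }
        return used;
    }

    public static void main(String[] args) {
        String stream = "BT /F2 10 Tf () Tj /F1 12 Tf (Hello) Tj /F2 10 Tf () Tj ET";
        System.out.println(fontsShowingText(stream)); // prints [F1]
    }
}
```

Under this definition /F2 is not reported, because every string shown with it is empty.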

Tags: pdf fonts itext

【Solution 1】:

In the end I got it working by referencing BASEFONT instead of FONT:

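// Imports assume iText 5, where these classes live in com.itextpdf.text.pdf.
import java.io.IOException;
import java.util.Set;
import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfObject;
import com.itextpdf.text.pdf.PdfReader;

/**
 * Collects the names of the fonts in a PDF that are not embedded.
 * @param reader a PdfReader opened on the PDF file
 * @param set receives the names of the non-embedded fonts
 * @throws IOException
 */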
public void listFonts(PdfReader reader,  Set<String> set) throws IOException {

    try {

        int n = reader.getXrefSize();
        PdfObject object;
        PdfDictionary font;

        for (int i = 0; i < n; i++) {
            object = reader.getPdfObject(i);
            if (object == null || !object.isDictionary()) {
                 continue;
            }

            font = (PdfDictionary)object;

            if (font.get(PdfName.BASEFONT) != null) {
                System.out.println("fontname " + font.getAsName(PdfName.BASEFONT).toString());
                processFont(font,set);

            }

        }


    } catch (Exception e) {
        System.out.println("error " + e.getMessage());
    }


}

/**
 * Finds out if the font is an embedded subset font
 * @param font name
 * @return true if the name denotes an embedded subset font
 */
private boolean isEmbeddedSubset(String name) {
    //name = String.format("%s subset (%s)", name.substring(8), name.substring(1, 7));
    return name != null && name.length() > 8 && name.charAt(7) == '+';
}

private void processFont(PdfDictionary font, Set<String> set) {

        String name = font.getAsName(PdfName.BASEFONT).toString();

        if(isEmbeddedSubset(name)) {
            return;
        }

        PdfDictionary desc = font.getAsDict(PdfName.FONTDESCRIPTOR);

        // no font descriptor
        if (desc == null) {
            System.out.println("desc null " );
            PdfArray descendant = font.getAsArray(PdfName.DESCENDANTFONTS);

            if (descendant == null) {
                System.out.println("descendant null " );
                set.add(name.substring(1));             
            }
            else {
                System.out.println("descendant not null " );
                for (int i = 0; i < descendant.size(); i++) {
                    PdfDictionary dic = descendant.getAsDict(i);
                    processFont(dic, set);                    
                  }             
            }            
        }
        /**
         * (Type 1) embedded
         */
        else if (desc.get(PdfName.FONTFILE) != null) {
            System.out.println("(Type 1) embedded ");
        }

        /**
         * (TrueType) embedded 
         */
        else if (desc.get(PdfName.FONTFILE2) != null) {
            System.out.println("(FONTFILE2) embedded ");
        }

        /**
         * " (" + font.getAsName(PdfName.SUBTYPE).toString().substring(1) + ") embedded" 
         */     
        else if (desc.get(PdfName.FONTFILE3) != null) {
            System.out.println("(FONTFILE3) ");
        }

        else {
            set.add(name.substring(1));         
        }


}

This gives me the same result as the font list under Properties in Acrobat Reader.
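
The check name.charAt(7) == '+' relies on the PDF naming convention for subset fonts: a tag of six uppercase letters, then '+', then the font name. PdfName.toString() keeps the leading '/', which is why the '+' sits at index 7. A slightly stricter variant, shown here as a hedged sketch with hypothetical names (SubsetTag, hasSubsetTag, baseName), also verifies that the six tag characters are uppercase letters:

```java
public class SubsetTag {
    /** True if name looks like "/ABCDEF+FontName": six uppercase letters, then '+'. */
    public static boolean hasSubsetTag(String name) {
        if (name == null || name.length() <= 8
                || name.charAt(0) != '/' || name.charAt(7) != '+') {
            return false;
        }
        for (int i = 1; i <= 6; i++) {
            char c = name.charAt(i);
            if (c < 'A' || c > 'Z') {
                return false;              // tag must be uppercase A-Z only
            }
        }
        return true;
    }

    /** Strips the leading '/' and, if present, the "ABCDEF+" subset tag. */
    public static String baseName(String name) {
        return hasSubsetTag(name) ? name.substring(8) : name.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(hasSubsetTag("/ABCDEF+Verdana")); // true
        System.out.println(baseName("/ABCDEF+Verdana"));     // Verdana
        System.out.println(hasSubsetTag("/Helvetica"));      // false
        System.out.println(baseName("/Helvetica"));          // Helvetica
    }
}
```

A name such as /abcdef+Foo passes the original index-7 test but is rejected here, since its tag is not uppercase.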

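【Comments】:

【Solution 2】:

By combining the code from "How to check that all used fonts are embedded in PDF with Java iText?" and http://itextpdf.com/examples/iia.php?id=288, I managed to get some results. Initially it failed because font.getAsName(PdfName.BASEFONT).toString(); did not work in my case, but I made a few small changes and got some results.

Here is my code: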

/**
 * Collects the names of the fonts in a PDF that are not embedded.
 * @param reader a PdfReader opened on the PDF file
 * @param set receives the names of the non-embedded fonts
 * @throws IOException
 */
public void listFonts(PdfReader reader,  Set<String> set) throws IOException {

    int n = reader.getXrefSize();
    PdfObject object;
    PdfDictionary font;

    for (int i = 0; i < n; i++) {
         object = reader.getPdfObject(i);
         if (object == null || !object.isDictionary()) {
             continue;
         }

         font = (PdfDictionary)object;

         if (font.get(PdfName.FONTNAME) != null) {

            System.out.println("fontname " + font.get(PdfName.FONTNAME));
            processFont(font,set);

         }
    }
}

/**
 * Finds out if the font is an embedded subset font
 * @param font name
 * @return true if the name denotes an embedded subset font
 */
private boolean isEmbeddedSubset(String name) {
    //name = String.format("%s subset (%s)", name.substring(8), name.substring(1, 7));
    return name != null && name.length() > 8 && name.charAt(7) == '+';
}

private void processFont(PdfDictionary font, Set<String> set) {

    String name = font.get(PdfName.FONTNAME).toString();

    if(isEmbeddedSubset(name)) {
        return;
    }

    PdfDictionary desc = font.getAsDict(PdfName.FONTDESCRIPTOR);

    // no font descriptor
    if (desc == null) {
        System.out.println("desc null " );
        PdfArray descendant = font.getAsArray(PdfName.DESCENDANTFONTS);

        if (descendant == null) {
            System.out.println("descendant null " );
            set.add(name.substring(1));             
        }
        else {
            System.out.println("descendant not null " );
            for (int i = 0; i < descendant.size(); i++) {
                PdfDictionary dic = descendant.getAsDict(i);
                processFont(dic, set);                    
            }             
        }            
     }
     /**
      * (Type 1) embedded
     */
     else if (desc.get(PdfName.FONTFILE) != null) {
         System.out.println("(Type 1) embedded ");
     }

     /**
      * (TrueType) embedded 
     */
     else if (desc.get(PdfName.FONTFILE2) != null) {
         System.out.println("(FONTFILE2) embedded ");
     }

     /**
     * " (" + font.getAsName(PdfName.SUBTYPE).toString().substring(1) + ") embedded" 
     */     
     else if (desc.get(PdfName.FONTFILE3) != null) {
         System.out.println("(FONTFILE3) ");
     }

     else {
         set.add(name.substring(1));         
     }
}

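So instead of String name = font.getAsName(PdfName.BASEFONT).toString(); I now use String name = font.get(PdfName.FONTNAME).toString();

This definitely gives better results, since it returns different fonts. However, I get no results for the font descriptor or the descendant fonts. Either they are simply not present in my PDF, or I never reach that branch because I changed the code. Can I assume that a font is embedded if its name carries a subset prefix, and that it is not embedded if the name has no subset prefix?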
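
On that last question: as far as the PDF format goes, the six-letter prefix is only a naming convention for subsets. What actually marks a font program as embedded is a FontFile, FontFile2 or FontFile3 entry in the font descriptor, so a fully embedded font has no prefix at all. Note also that FontName is a key of the font descriptor, whereas font dictionaries carry BaseFont. That would explain why the FONTNAME loop never finds a FONTDESCRIPTOR or DESCENDANTFONTS entry: the dictionaries it matches are the descriptors themselves. The sketch below is a hypothetical, self-contained illustration that models dictionaries as plain Map objects instead of iText classes; EmbeddedCheck and isEmbedded are made-up names.

```java
import java.util.HashMap;
import java.util.Map;

public class EmbeddedCheck {
    /** A descriptor embeds a font program if it has any of the three font-file keys. */
    public static boolean isEmbedded(Map<String, Object> fontDescriptor) {
        if (fontDescriptor == null) {
            return false;                  // no descriptor, so nothing is embedded
        }
        return fontDescriptor.containsKey("FontFile")     // Type 1 font program
            || fontDescriptor.containsKey("FontFile2")    // TrueType font program
            || fontDescriptor.containsKey("FontFile3");   // other formats, e.g. CFF
    }

    public static void main(String[] args) {
        // Fully embedded TrueType font: no subset prefix, but a FontFile2 stream.
        Map<String, Object> full = new HashMap<>();
        full.put("FontName", "/Verdana");
        full.put("FontFile2", new byte[0]);   // placeholder for the font stream

        // Descriptor without any font file: the font is only referenced by name.
        Map<String, Object> referenced = new HashMap<>();
        referenced.put("FontName", "/Helvetica");

        System.out.println(isEmbedded(full));        // true
        System.out.println(isEmbedded(referenced));  // false
        System.out.println(isEmbedded(null));        // false
    }
}
```

With iText, the same test would presumably be three desc.get(...) != null checks on the descriptor dictionary, as in the FONTFILE, FONTFILE2 and FONTFILE3 branches of the code above.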

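【Comments】:

This is not an answer, is it? You can use the edit button to change your initial post. I am not sure whether the new question at the bottom follows logically from the question you asked above, or whether it would be better posted as an entirely new question.
It is partly an answer, since I got some better results. There are still some doubts, though, mainly because I do not fully understand the use of subsets and font types.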