【Question title】: Apache FOP: issue with Cyrillic characters
【Posted】: 2016-11-03 09:33:27
【Question】:

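I generate PDF files in a Java 8 project with the Apache FOP library. English content renders without any problems, but Russian characters come out garbled. They look like this: Ð#огР̧н

This looks like an encoding problem, but how do I fix it?

Here is the class I use to generate the PDF: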

public class PdfGenerationTools implements StreamResource.StreamSource
    {
    String content;

    public PdfGenerationTools(String content) {
        this.content = content;
    }

    @Override
    public InputStream getStream()
    {
        ByteArrayInputStream foStream =
                new ByteArrayInputStream(content.getBytes(StringTools.UTF8));

        // Basic FOP configuration. You could create this object
        // just once and keep it.
        FopFactory fopFactory = FopFactory.newInstance();
        fopFactory.setStrictValidation(false); // For an example

        // Configuration for this PDF document - mainly metadata
        FOUserAgent userAgent = getFOUserAgent(fopFactory);

        // Transform to PDF
        ByteArrayOutputStream fopOut = new ByteArrayOutputStream();
        try {
            Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF,
                    userAgent, fopOut);
            TransformerFactory factory =
                    TransformerFactory.newInstance();
            Transformer transformer = factory.newTransformer();
            Source src = new
                    javax.xml.transform.stream.StreamSource(foStream);
            Result res = new SAXResult(fop.getDefaultHandler());
            transformer.transform(src, res);
            fopOut.close();
            return new ByteArrayInputStream(fopOut.toByteArray());

        } catch (Exception e) {
            e.printStackTrace();
        }

        return null;
    }

    private FOUserAgent getFOUserAgent(FopFactory factory)
    {
        FOUserAgent userAgent = factory.newFOUserAgent();

        userAgent.setProducer("Company");
        userAgent.setCreationDate(new Date());
        userAgent.setTitle("Printing jobs");
        userAgent.setTargetResolution(300); // DPI

        return userAgent;
    }

    public static String initDoc()
    {
        return "<?xml version='1.0' encoding='ISO-8859-1'?>"+
                "<fo:root xmlns:fo='http://www.w3.org/1999/XSL/Format'>"+
                "<fo:layout-master-set>"+
                "<fo:simple-page-master master-name='A4' margin='2cm'>"+
                "<fo:region-body />"+
                "</fo:simple-page-master>"+
                "</fo:layout-master-set>"+
                "<fo:page-sequence master-reference='A4'>"+
                "<fo:flow flow-name='xsl-region-body'>";
    }

    public static String closeDoc()
    {
        return "</fo:flow>"+
                "</fo:page-sequence>"+
                "</fo:root>";
    }

    public static String initTable()
    {
        return "<fo:block space-before.optimum=\"10pt\"></fo:block>" +
                "<fo:table table-layout=\"fixed\" border-width=\"1mm\" border-style=\"solid\">" +
                "<fo:table-column column-number=\"1\" column-width=\"50%\"/>" +
                "<fo:table-column column-number=\"2\" column-width=\"50%\"/>" +
                "<fo:table-body>";
    }

    public static String closeTable()
    {
        return "</fo:table-body>" +
                "</fo:table>";
    }

    public static String initTableRow()
    {
        return "<fo:table-row keep-together.within-page=\"always\">";
    }

    public static String closeTableRow()
    {
        return  "</fo:table-row>";
    }

    public static String getCell(String ... args)
    {
        final StringBuilder sb = new StringBuilder();
        sb.append("<fo:table-cell padding=\"1mm\" border-width=\"1mm\" border-style=\"double\">");

        for (String arg : args)
        {
            sb.append("<fo:block font-family=\"SansSerif\">")
                    .append(arg)
                    .append("</fo:block>");
        }

        sb.append("</fo:table-cell>");

        return sb.toString();
    }
}

When I change the encoding from 'ISO-8859-1' to 'UTF-8', my Cyrillic substrings look like this: '#####'. It seems I am missing a font here.
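The two symptoms can be separated with a small experiment that does not involve FOP at all. The sketch below encodes a document as UTF-8 bytes (as `getStream()` does) and parses it with the JDK's DOM parser, once with each XML declaration. The class name, the `<t>` element, and the sample word are illustrative assumptions. With the ISO-8859-1 declaration the parser should hand back one character per byte, which is the garbled text; with the UTF-8 declaration the text survives intact, so any remaining '#' output would point at the font rather than the encoding.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class DeclaredEncodingDemo {

    // Encodes the XML as UTF-8 bytes and returns the text content
    // the parser extracts from the root element.
    static String parseText(String xml) throws Exception {
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(utf8))
                .getDocumentElement()
                .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Sample Cyrillic word, written with escapes so the source file
        // encoding does not matter.
        String word = "\u041b\u043e\u0433\u0438\u043d";
        String body = "<t>" + word + "</t>";

        // Declaration disagrees with the actual bytes.
        String asLatin1 = parseText(
                "<?xml version='1.0' encoding='ISO-8859-1'?>" + body);
        // Declaration matches the actual bytes.
        String asUtf8 = parseText(
                "<?xml version='1.0' encoding='UTF-8'?>" + body);

        System.out.println("ISO-8859-1 declared: length " + asLatin1.length()
                + ", intact: " + asLatin1.equals(word));
        System.out.println("UTF-8 declared:      length " + asUtf8.length()
                + ", intact: " + asUtf8.equals(word));
    }
}
```

In the class above, this corresponds to keeping the declaration in `initDoc()` consistent with the charset passed to `content.getBytes(...)`.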

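【Comments】:

  • This looks like multi-byte UTF-8 being interpreted as some single-byte ISO/Windows encoding. Beyond that, run a few small tests, for example javaranch.com/journal/200409/…
  • This could be a font configuration problem (this answer of mine may come in handy) or an encoding problem. Adding a small FO snippet with Cyrillic characters would help you get an answer; otherwise nobody can try to reproduce your problem (see MCVE).
  • I added a code snippet above to show how I generate the PDF content.

Tags: java pdf pdf-generation apache-fop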


【Answer 1】:

You need a FOP configuration file that tells FOP which fonts to embed in the PDF document, for example:

<?xml version="1.0" encoding="UTF-8"?>
<fop version='1.0'>
    <renderers>
        <renderer mime='application/pdf'>
            <fonts>
                <!-- TTF fonts -->
                <font kerning='yes' embed-url='c:\windows\fonts\arial.ttf'>
                    <font-triplet name='Arial' style='normal' weight='normal' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\arialbd.ttf'>
                    <font-triplet name='Arial' style='normal' weight='bold' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\ariali.ttf'>
                    <font-triplet name='Arial' style='italic' weight='normal' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\arialbi.ttf'>
                    <font-triplet name='Arial' style='italic' weight='bold' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\times.ttf'>
                    <font-triplet name='TimesNewRoman' style='normal' weight='normal' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\timesbd.ttf'>
                    <font-triplet name='TimesNewRoman' style='normal' weight='bold' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\timesi.ttf'>
                    <font-triplet name='TimesNewRoman' style='italic' weight='normal' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\timesbi.ttf'>
                    <font-triplet name='TimesNewRoman' style='italic' weight='bold' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\cour.ttf'>
                    <font-triplet name='CourierNew' style='normal' weight='normal' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\courbd.ttf'>
                    <font-triplet name='CourierNew' style='normal' weight='bold' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\couri.ttf'>
                    <font-triplet name='CourierNew' style='italic' weight='normal' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\courbi.ttf'>
                    <font-triplet name='CourierNew' style='italic' weight='bold' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\verdana.ttf'>
                    <font-triplet name='Verdana' style='normal' weight='normal' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\verdanab.ttf'>
                    <font-triplet name='Verdana' style='normal' weight='bold' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\verdanai.ttf'>
                    <font-triplet name='Verdana' style='italic' weight='normal' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\verdanaz.ttf'>
                    <font-triplet name='Verdana' style='italic' weight='bold' />
                </font>
            </fonts>
        </renderer>
    </renderers>
</fop>

Usage:

// configure fopFactory as desired
FopFactory fopFactory = FopFactory.newInstance();
// Load the font configuration before creating the user agent
fopFactory.setUserConfig(new File("fop.xml"));
FOUserAgent foUserAgent = fopFactory.newFOUserAgent();
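Registering the fonts is only half of the fix: an FO block selects a font through its `font-family` attribute, and that name has to match a `font-triplet` name from the configuration. The question's `getCell()` hard-codes "SansSerif", which is not one of the names registered above. The following is a minimal sketch of a variant that takes the family as a parameter; the class name and the extra parameter are assumptions for illustration, not part of the original code.

```java
public class FoCellSketch {

    // Variant of the question's getCell() that takes the font family
    // explicitly, so it can match a font-triplet name such as "Arial".
    static String getCell(String fontFamily, String... lines) {
        StringBuilder sb = new StringBuilder();
        sb.append("<fo:table-cell padding=\"1mm\" border-width=\"1mm\" border-style=\"double\">");
        for (String line : lines) {
            sb.append("<fo:block font-family=\"").append(fontFamily).append("\">")
              .append(line)
              .append("</fo:block>");
        }
        sb.append("</fo:table-cell>");
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Arial" is one of the triplet names registered in the configuration;
        // the Cyrillic sample word is written with escapes.
        System.out.println(getCell("Arial", "\u041b\u043e\u0433\u0438\u043d", "Login"));
    }
}
```

Like the original method, this inserts the text as-is, so callers are still responsible for any characters that are special in XML.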

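【Comments】:

  • I finally came back to this question. The catch is that I work on Ubuntu 14, so there are no MS fonts here :(
  • You can use any font that contains Cyrillic characters. You can also install the MS fonts on Ubuntu: open Ubuntu Software Center and search for "ttf-mscorefonts-installer". That installs Microsoft's core fonts.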
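Since any font with Cyrillic coverage works, the same configuration shape can point at a font that ships with Linux instead of the Windows paths. The sketch below builds such a `fop.xml` as a string. DejaVu Sans does cover Cyrillic, but the file paths and the triplet name "DejaVuSans" are assumptions; check the actual location on your system (for example with `fc-list`) before relying on them.

```java
public class FopFontConfigSketch {

    // Renders one <font> entry in the same shape as the answer's configuration.
    static String fontEntry(String embedUrl, String family, String style, String weight) {
        return "                <font kerning='yes' embed-url='" + embedUrl + "'>\n"
             + "                    <font-triplet name='" + family + "' style='" + style
             + "' weight='" + weight + "' />\n"
             + "                </font>\n";
    }

    // Wraps the given entries in the fop/renderers/renderer/fonts skeleton.
    static String fopConfig(String... fontEntries) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
          .append("<fop version='1.0'>\n")
          .append("    <renderers>\n")
          .append("        <renderer mime='application/pdf'>\n")
          .append("            <fonts>\n");
        for (String entry : fontEntries) {
            sb.append(entry);
        }
        sb.append("            </fonts>\n")
          .append("        </renderer>\n")
          .append("    </renderers>\n")
          .append("</fop>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed install location of the DejaVu fonts; adjust as needed.
        String dir = "/usr/share/fonts/truetype/dejavu/";
        System.out.print(fopConfig(
                fontEntry(dir + "DejaVuSans.ttf", "DejaVuSans", "normal", "normal"),
                fontEntry(dir + "DejaVuSans-Bold.ttf", "DejaVuSans", "normal", "bold")));
    }
}
```

The FO blocks would then need `font-family="DejaVuSans"` (or whatever triplet name is chosen) for the embedded font to be picked up.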