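[Posted]: 2016-11-03 09:33:27
[Question]:
I am generating PDF files with the Apache FOP library in a Java 8 project. English content displays without any problem, but Russian characters come out garbled. They look like this: Ð#огР̧н.
The problem seems to be related to encoding, but how do I fix it?
This is the class I use to generate the PDF: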
public class PdfGenerationTools implements StreamResource.StreamSource
{
    String content;

    public PdfGenerationTools(String content) {
        this.content = content;
    }

    @Override
    public InputStream getStream()
    {
        ByteArrayInputStream foStream =
            new ByteArrayInputStream(content.getBytes(StringTools.UTF8));

        // Basic FOP configuration. You could create this object
        // just once and keep it.
        FopFactory fopFactory = FopFactory.newInstance();
        fopFactory.setStrictValidation(false); // For an example

        // Configuration for this PDF document - mainly metadata
        FOUserAgent userAgent = getFOUserAgent(fopFactory);

        // Transform to PDF
        ByteArrayOutputStream fopOut = new ByteArrayOutputStream();
        try {
            Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF,
                userAgent, fopOut);
            TransformerFactory factory =
                TransformerFactory.newInstance();
            Transformer transformer = factory.newTransformer();
            Source src =
                new javax.xml.transform.stream.StreamSource(foStream);
            Result res = new SAXResult(fop.getDefaultHandler());
            transformer.transform(src, res);
            fopOut.close();
            return new ByteArrayInputStream(fopOut.toByteArray());
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    private FOUserAgent getFOUserAgent(FopFactory factory)
    {
        FOUserAgent userAgent = factory.newFOUserAgent();
        userAgent.setProducer("Company");
        userAgent.setCreationDate(new Date());
        userAgent.setTitle("Printing jobs");
        userAgent.setTargetResolution(300); // DPI
        return userAgent;
    }

    public static String initDoc()
    {
        return "<?xml version='1.0' encoding='ISO-8859-1'?>"+
            "<fo:root xmlns:fo='http://www.w3.org/1999/XSL/Format'>"+
            "<fo:layout-master-set>"+
            "<fo:simple-page-master master-name='A4' margin='2cm'>"+
            "<fo:region-body />"+
            "</fo:simple-page-master>"+
            "</fo:layout-master-set>"+
            "<fo:page-sequence master-reference='A4'>"+
            "<fo:flow flow-name='xsl-region-body'>";
    }

    public static String closeDoc()
    {
        return "</fo:flow>"+
            "</fo:page-sequence>"+
            "</fo:root>";
    }

    public static String initTable()
    {
        return "<fo:block space-before.optimum=\"10pt\"></fo:block>" +
            "<fo:table table-layout=\"fixed\" border-width=\"1mm\" border-style=\"solid\">" +
            "<fo:table-column column-number=\"1\" column-width=\"50%\"/>" +
            "<fo:table-column column-number=\"2\" column-width=\"50%\"/>" +
            "<fo:table-body>";
    }

    public static String closeTable()
    {
        return "</fo:table-body>" +
            "</fo:table>";
    }

    public static String initTableRow()
    {
        return "<fo:table-row keep-together.within-page=\"always\">";
    }

    public static String closeTableRow()
    {
        return "</fo:table-row>";
    }

    public static String getCell(String ... args)
    {
        final StringBuilder sb = new StringBuilder();
        sb.append("<fo:table-cell padding=\"1mm\" border-width=\"1mm\" border-style=\"double\">");
        for (String arg : args)
        {
            sb.append("<fo:block font-family=\"SansSerif\">")
                .append(arg)
                .append("</fo:block>");
        }
        sb.append("</fo:table-cell>");
        return sb.toString();
    }
}
When I change the encoding in the XML declaration from "ISO-8859-1" to "UTF-8", my Cyrillic substrings look like this: '#####'. It seems I am missing a font here.
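The effect of the declared encoding can be checked without FOP by parsing a minimal FO document with the JDK's own XML parser. The sketch below is only an illustration: the class and method names (`FoEncodingCheck`, `minimalFo`, `parsedText`) are invented for this example, and it assumes the bytes are always produced as UTF-8, as `getStream()` above does.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class FoEncodingCheck {
    // Build a minimal FO document with a single block of text.
    // The encoding named in the XML declaration is a parameter.
    static String minimalFo(String declaredEncoding, String text) {
        return "<?xml version='1.0' encoding='" + declaredEncoding + "'?>"
            + "<fo:root xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
            + "<fo:layout-master-set>"
            + "<fo:simple-page-master master-name='A4' margin='2cm'>"
            + "<fo:region-body/>"
            + "</fo:simple-page-master>"
            + "</fo:layout-master-set>"
            + "<fo:page-sequence master-reference='A4'>"
            + "<fo:flow flow-name='xsl-region-body'>"
            + "<fo:block>" + text + "</fo:block>"
            + "</fo:flow></fo:page-sequence></fo:root>";
    }

    // Serialize the document as UTF-8 bytes and return the text
    // that an XML parser extracts from it.
    static String parsedText(String declaredEncoding, String text) {
        try {
            byte[] bytes = minimalFo(declaredEncoding, text)
                .getBytes(StandardCharsets.UTF_8);
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Five Cyrillic letters, written as escapes so the source file
        // encoding does not matter.
        String text = "\u041b\u043e\u0433\u0438\u043d";
        System.out.println("UTF-8 declared, text intact: "
            + parsedText("UTF-8", text).equals(text));
        System.out.println("ISO-8859-1 declared, text intact: "
            + parsedText("ISO-8859-1", text).equals(text));
    }
}
```

If the text survives parsing once "UTF-8" is declared, the remaining '#' characters more likely come from the font than from the encoding.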
[Discussion]:
-
This looks like multi-byte UTF-8 being interpreted as some single-byte ISO/Windows encoding. For the rest, run a few small tests, such as the ones in javaranch.com/journal/200409/…
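A small test of that kind might look like the sketch below. It uses only the JDK; the class and method names (`MojibakeDemo`, `misdecode`, `repair`) are hypothetical and chosen for this illustration.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encode the text as UTF-8, then (wrongly) decode those bytes
    // as the single-byte encoding ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo the mistake: recover the original bytes and decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Five Cyrillic letters, written as escapes so the source file
        // encoding does not matter.
        String original = "\u041b\u043e\u0433\u0438\u043d";
        String garbled = misdecode(original);
        System.out.println("original length: " + original.length());
        System.out.println("garbled length:  " + garbled.length());
        System.out.println("repair restores original: "
            + repair(garbled).equals(original));
    }
}
```

Each Cyrillic letter takes two bytes in UTF-8, so the misdecoded string has two characters per original letter, which matches the pattern of the garbled text in the question.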
-
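This could be a font configuration problem (this answer of mine may come in handy) or an encoding problem. Adding a small FO snippet containing Cyrillic characters would help you get an answer; without it, nobody can try to reproduce your problem (see MCVE).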
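If fonts turn out to be the issue, one common approach is an FOP configuration file that lets FOP discover the fonts installed on the system. The following is a minimal sketch under assumptions: `FopFontConfigSketch` and its methods are hypothetical helpers that only write such a file, and whether the detected fonts actually contain Cyrillic glyphs depends on the machine.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FopFontConfigSketch {
    // Minimal FOP configuration that asks the PDF renderer to
    // auto-detect fonts installed on the system.
    static String fontConfigXml() {
        return "<?xml version='1.0' encoding='UTF-8'?>\n"
            + "<fop version='1.0'>\n"
            + "  <renderers>\n"
            + "    <renderer mime='application/pdf'>\n"
            + "      <fonts>\n"
            + "        <auto-detect/>\n"
            + "      </fonts>\n"
            + "    </renderer>\n"
            + "  </renderers>\n"
            + "</fop>\n";
    }

    // Write the configuration to a temporary file and return its path.
    static Path writeToTempFile() {
        try {
            Path file = Files.createTempFile("fop", ".xconf");
            Files.write(file, fontConfigXml().getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Wrote " + writeToTempFile());
    }
}
```

The file would then be handed to FOP. With the FOP 1.x API used in the question that is, as far as I know, `fopFactory.setUserConfig(new File(...))`, while FOP 2.x takes the file in `FopFactory.newInstance(File)`. The `font-family` on each `fo:block` would also need to name a font that contains Cyrillic glyphs.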
-
I have added a code snippet above to show how I generate the PDF content.
Tags: java pdf pdf-generation apache-fop