Render partial content of a PDF to an image: answers

【Question title】: Render partial content of pdf to image
【Posted】: 2014-10-06 05:49:51
【Question description】:

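Is there any tool that can render a PDF document to an image containing only part of its content? For example, only the text but no images or vector graphics, or only the images and vector graphics but no text.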

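【Question discussion】:

Does it need to be Ghostscript, or are you also prepared to do a bit of Java programming?
Any suggestions are welcome.
The Apache Java library PDFBox contains code for rendering PDF pages (the current 2.0.0 development snapshot is much improved over the current 1.8.x release). At its core, this code uses the PageDrawer class. You can adapt that class fairly easily so that it draws only the things you choose.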
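The idea of adapting the page drawer can be illustrated without PDFBox itself. The following self-contained Java sketch (Java 16+) uses hypothetical stand-ins, not PDFBox API: a `Drawer` that dispatches each `PageElement` to one hook per `Kind`, and a `TextOnlyDrawer` subclass that overrides the image and vector hooks with no-ops. In PDFBox 2.x the corresponding move would be to subclass `PageDrawer`, override methods such as `drawImage`, `strokePath` and `fillPath`, and return that subclass from an overridden `PDFRenderer.createPageDrawer`; check the exact signatures against the version you use.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.List;

public class SelectiveRenderDemo {
    // Hypothetical element kinds; a real PDF renderer distinguishes glyphs, images and paths.
    enum Kind { TEXT, IMAGE, VECTOR }

    // Hypothetical page element: a kind plus an axis-aligned box in pixel coordinates.
    record PageElement(Kind kind, int x, int y, int w, int h) {}

    // Base drawer: sends each element to a per-kind hook, like a page drawer would.
    static class Drawer {
        protected final Graphics2D g;

        Drawer(Graphics2D g) { this.g = g; }

        void draw(List<PageElement> elements) {
            for (PageElement e : elements) {
                switch (e.kind()) {
                    case TEXT -> drawText(e);
                    case IMAGE -> drawImage(e);
                    case VECTOR -> drawVector(e);
                }
            }
        }

        // Each hook fills the element's box in a distinct colour as a stand-in for real drawing.
        protected void drawText(PageElement e)   { fill(e, Color.BLACK); }
        protected void drawImage(PageElement e)  { fill(e, Color.BLUE); }
        protected void drawVector(PageElement e) { fill(e, Color.RED); }

        private void fill(PageElement e, Color c) {
            g.setColor(c);
            g.fillRect(e.x(), e.y(), e.w(), e.h());
        }
    }

    // Text-only variant: the hooks for unwanted element kinds become no-ops.
    static class TextOnlyDrawer extends Drawer {
        TextOnlyDrawer(Graphics2D g) { super(g); }

        @Override protected void drawImage(PageElement e) {}
        @Override protected void drawVector(PageElement e) {}
    }

    // Render a 100x100 white page with either the full drawer or the text-only drawer.
    static BufferedImage render(List<PageElement> page, boolean textOnly) {
        BufferedImage img = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, 100, 100);
        Drawer drawer = textOnly ? new TextOnlyDrawer(g) : new Drawer(g);
        drawer.draw(page);
        g.dispose();
        return img;
    }

    public static void main(String[] args) {
        List<PageElement> page = List.of(
                new PageElement(Kind.TEXT, 10, 10, 30, 10),
                new PageElement(Kind.IMAGE, 50, 50, 20, 20),
                new PageElement(Kind.VECTOR, 10, 70, 20, 20));
        BufferedImage full = render(page, false);
        BufferedImage textOnly = render(page, true);
        // Compare the pixel at the centre of the image element in both renders.
        System.out.println(Integer.toHexString(full.getRGB(60, 60) & 0xFFFFFF));
        System.out.println(Integer.toHexString(textOnly.getRGB(60, 60) & 0xFFFFFF));
    }
}
```

Keeping the selection in overridden hooks means the traversal logic stays in one place, and each "partial" rendering is just another small subclass.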

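【Solution 1】:

The "traditional" way to do this is to preprocess the PDF file so that only the elements you want are kept, and then rasterize the remaining file.

As an example, I have implemented a PDF-to-iPad workflow in which callas pdfToolbox (note: I am affiliated with this company) is used to split each PDF file into a text-only file and an "everything but text" file. The "everything but text" file is then rasterized, and the two files are recombined.

So whichever tool you end up using, I would look at how that tool can preprocess the file to remove the unwanted elements, or how it can split out the parts you want. Then use the tool's normal rasterization function.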
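As a rough illustration of the "everything but text" preprocessing step, here is a minimal Java sketch that removes every text object (`BT` ... `ET`) from a page content stream. It assumes the content stream has already been extracted and decompressed and contains no inline images (`BI`/`ID`/`EI`), and it does not descend into form XObjects or Type 3 fonts, so text drawn from those is left untouched. `StripTextObjects`, `tokenize` and `stripText` are hypothetical names, not part of any PDF library.

```java
import java.util.ArrayList;
import java.util.List;

public class StripTextObjects {

    // Split a decompressed content stream into tokens. Literal strings "(...)", hex strings
    // "<...>", "<<", ">>", "[", "]", "{", "}", names, numbers and operators each become one
    // token; whitespace and % comments are dropped. Inline images are NOT handled (assumption).
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0, n = s.length();
        while (i < n) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (c == '%') {                              // comment: skip to end of line
                while (i < n && s.charAt(i) != '\n' && s.charAt(i) != '\r') i++;
                continue;
            }
            int start = i;
            if (c == '(') {                              // literal string with nesting and escapes
                int depth = 0;
                do {
                    char d = s.charAt(i);
                    if (d == '\\') i++;                  // skip the escaped character
                    else if (d == '(') depth++;
                    else if (d == ')') depth--;
                    i++;
                } while (i < n && depth > 0);
            } else if (s.startsWith("<<", i) || s.startsWith(">>", i)) {
                i += 2;                                  // dictionary delimiters
            } else if (c == '<') {                       // hex string
                while (i < n && s.charAt(i) != '>') i++;
                i++;
            } else if ("[]{}".indexOf(c) >= 0) {
                i++;                                     // single-character delimiters
            } else {                                     // name, number or operator
                i++;
                while (i < n && !Character.isWhitespace(s.charAt(i))
                        && "()<>[]{}/%".indexOf(s.charAt(i)) < 0) i++;
            }
            tokens.add(s.substring(start, Math.min(i, n)));
        }
        return tokens;
    }

    // Drop every BT ... ET text object and join the remaining tokens with single spaces.
    static String stripText(String content) {
        StringBuilder out = new StringBuilder();
        boolean inText = false;
        for (String t : tokenize(content)) {
            if (!inText && t.equals("BT")) { inText = true; continue; }
            if (inText) {
                if (t.equals("ET")) inText = false;
                continue;                                // everything inside the text object goes
            }
            if (out.length() > 0) out.append(' ');
            out.append(t);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String page = "q 1 0 0 RG 10 10 50 50 re S Q\n"
                + "BT /F1 12 Tf 72 700 Td (Hello \\(BT\\) ET) Tj ET\n"
                + "q 100 0 0 100 200 200 cm /Im1 Do Q";
        System.out.println(stripText(page));
    }
}
```

Because the tokenizer treats a whole literal string as one token, an `ET` inside a string such as `(Hello \(BT\) ET)` does not end the text object early, which a plain text search for `BT`/`ET` would get wrong.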

【Discussion】:

【Solution 2】:

With Debenu Quick PDF Library, you can do the extraction in two ways:

1. PDF to image with text only, no images

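DPL.LoadFromFile("my_file.pdf", "");
int imageCount = DPL.FindImages();               // number of images found in the document
int[] imageIds = new int[imageCount];
for (int i = 1; i <= imageCount; i++)            // image list indices are 1-based
{
    imageIds[i - 1] = DPL.GetImageID(i);         // collect all IDs before removing anything
}
for (int i = 0; i < imageCount; i++)
{
    DPL.ClearImage(imageIds[i]);                 // clear each image
}
DPL.RenderPageToFile(72, 1, 0, "just_text.bmp"); // render page 1 at 72 DPI as BMP (option 0), without the images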

The image-handling functions are listed here: http://www.debenu.com/docs/pdf_library_reference/ImageHandling.php

2. PDF to image with text only, by redrawing the text into a new document

DPL.LoadFromFile("my_file.pdf", "");
string csv = DPL.GetPageText(3); // option 3 returns a CSV string with the coordinates of the text

DPL.NewDocument(); // create a new blank document
// for each text block in the CSV string:
//   XPos is the horizontal position of the text, taken from the CSV string
//   YPos is the vertical position of the text, taken from the CSV string
//   your_text is the text to draw, taken from the CSV string
DPL.DrawText(XPos, YPos, your_text);
DPL.RenderPageToFile(72, 1, 0, "just_text.bmp"); // save the page as an image containing only the text
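The loop left implicit in the comments above needs a CSV parser that tolerates commas inside the quoted text field. A minimal Java sketch follows. The column indices `X_COL`, `Y_COL` and `TEXT_COL` are placeholder assumptions, not a verified description of the `GetPageText(3)` output, so check the library reference for the real column layout; `PageTextCsv` and `TextRun` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class PageTextCsv {
    // ASSUMPTION: placeholder column positions; verify against the GetPageText documentation.
    static final int X_COL = 3, Y_COL = 4, TEXT_COL = 11;

    // One positioned piece of text, ready to be passed to a DrawText-style call.
    record TextRun(double x, double y, String text) {}

    // Split one CSV line into fields. A field may be wrapped in double quotes; inside quotes,
    // commas are literal and "" stands for a single double-quote character.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quoted) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') { cur.append('"'); i++; }
                    else quoted = false;                 // closing quote
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                quoted = true;                           // opening quote
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    // Turn the whole CSV string (one text block per line) into positioned text runs.
    static List<TextRun> parse(String csv) {
        List<TextRun> runs = new ArrayList<>();
        for (String line : csv.split("\\R")) {
            if (line.isBlank()) continue;
            List<String> f = splitCsvLine(line);
            if (f.size() <= TEXT_COL) continue;          // line doesn't match the assumed layout
            try {
                runs.add(new TextRun(Double.parseDouble(f.get(X_COL).trim()),
                                     Double.parseDouble(f.get(Y_COL).trim()),
                                     f.get(TEXT_COL)));
            } catch (NumberFormatException e) {
                // header or malformed line: skip it
            }
        }
        return runs;
    }
}
```

Each resulting `TextRun` supplies the `XPos`, `YPos` and `your_text` values for one drawing call in the new document.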

【Discussion】: