【Question title】: Insert different text on different pages of a multi page pdf from command line
【Posted】: 2013-06-16 07:09:43
【Question description】:

I have a text file containing several names, for example

John Doe
Jane Doe
Mike Miller
...

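I also have a PDF file with as many pages as there are names in the text file.

How can I insert the first name on the first page, the second name on the second page, and so on? I have to do this from the command line on a Linux server.

【Question discussion】:

    Tags: linux pdf command-line ghostscript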


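    【Solution 1】:

    This program works for me, but I have not tested it thoroughly. I used a PDF file with 2 blank pages and a text file with 2 lines of text. You will need to modify the InsertText procedure to move to the correct position. You will also need to change the font selection and size (search for findfont in the program).

    Here is an example usage: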

    gs -sDEVICE=pdfwrite    \
       -sOutputFile=out.pdf \
       -sPDF_File=test.pdf  \
       -sText_File=test.txt \
        textinsert.ps
    

    Note that this is entirely Ghostscript-specific and will definitely not work with any other PostScript/PDF consumer.

    %!PS
    %% Copyright (C) Ken Sharp and Artifex Software Inc All rights reserved.
    %% Permission granted to use, copy, modify and redistribute provided this
    %% copyright remains intact. No warranty express or implied.
    %%
    %% Process a PDF file and a text file where one line of text from the text
    %% file is drawn onto a page of the PDF file.
    %% Redefine showpage so that we can draw the PDF page
    %% without emitting it
    %%
    /TextInsertDict 20 dict dup 3 1 roll def begin
    
    /InsertText {
      %% Read a line of text from the text file
      TFile 256 string readline pop
    
      %% Move to the desired location and draw the text
      100 100 moveto show
    } bind def
    
    /Textpdfshowpage{
       save /PDFSave exch store
       /PDFdictstackcount countdictstack store
       /PDFexecstackcount count 2 sub store
       (before exec) VMDEBUG
    
       % set up color space substitution (this must be inside the page save)
       pdfshowpage_setcspacesub
    
      .writepdfmarks {
    
            % Copy the boxes.
        { /CropBox /BleedBox /TrimBox /ArtBox } {
          2 copy pget {
            % .pdfshowpage_Install transforms the user space do same here with the boxes
            oforce_elems
            2 { Page pdf_cached_PDF2PS_matrix transform 4 2 roll } repeat
            normrect_elems 4 index 5 1 roll fix_empty_rect_elems 4 array astore
            mark 3 1 roll /PAGE pdfmark
          } {
            pop
          } ifelse
        } forall
    
            % Copy annotations and links.
        dup /Annots knownoget {
          0 1 2 index length 1 sub
           { 1 index exch oget
             dup type /dicttype eq {
               dup /Subtype oget annottypes exch .knownget { exec } { pop } ifelse
             } {
               pop
             } ifelse
           }
          for pop
        } if
    
      } if      % end .writepdfmarks
    
            % Display the actual page contents.
       8 dict begin
       /BXlevel 0 def
       /BMClevel 0 def
       /OFFlevels 0 dict def
       /BGDefault currentblackgeneration def
       /UCRDefault currentundercolorremoval def
            %****** DOESN'T HANDLE COLOR TRANSFER YET ******
       /TRDefault currenttransfer def
      matrix currentmatrix 2 dict
      2 index /CropBox pget {
        oforce_elems normrect_elems boxrect
        4 array astore 1 index /ClipRect 3 -1 roll put
      } if
      dictbeginpage setmatrix
      /DefaultQstate qstate store
    
      count 1 sub /pdfemptycount exch store
            % If the page uses any transparency features, show it within
            % a transparency group.
      dup pageusestransparency dup /PDFusingtransparency exch def {
        % Show the page within a PDF 1.4 device filter.
        0 .pushpdf14devicefilter {
          /DefaultQstate qstate store       % device has changed -- reset DefaultQstate
          % If the page has a Group, enclose contents in transparency group.
          % (Adobe Tech Note 5407, sec 9.2)
          dup /Group knownoget {
            1 index /CropBox pget {
              /CropBox exch
            } {
              1 index get_media_box pop /MediaBox exch
            } ifelse
            oforce_elems normrect_elems fix_empty_rect_elems 4 array astore .beginformgroup 
            showpagecontents
            .endtransparencygroup
          } {
            showpagecontents
          } ifelse
        } stopped {
          % todo: discard
          .poppdf14devicefilter
          /DefaultQstate qstate store   % device has changed -- reset DefaultQstate
          stop
        } if .poppdf14devicefilter
        /DefaultQstate qstate store % device has changed -- reset DefaultQstate
      } {
        showpagecontents
      } ifelse
      .free_page_resources
    
      InsertText
    
      % todo: mixing drawing ops outside the device filter could cause
      % problems, for example with the pnga device.
      endpage
      end           % scratch dict
      % Some PDF files don't have matching q/Q (gsave/grestore) so we need
      % to clean up any left over dicts from the dictstack
    
      PDFdictstackcount //false
      { countdictstack 2 index le { exit } if
        currentdict /n known not or
        end
      } loop {
        StreamRunAborted not {
          (   **** Warning: File has unbalanced q/Q operators \(too many q's\)\n)
          pdfformaterror
        } if
      } if
      pop
      count PDFexecstackcount sub { pop } repeat
      (after exec) VMDEBUG
      Repaired      % pass Repaired state around the restore
      PDFSave restore
      /Repaired exch def
    } bind def
    
    %% Check that both of our arguments are defined; leaves true on the
    %% stack if so, false otherwise
    %%
    {PDF_File} stopped
    {
      (No PDF_File defined\n) print
      false
    }
    {
      {Text_File} stopped
      {
        pop (No Text_File defined\n) print
        false
      }
      {
        pop pop true
      }ifelse
    }ifelse
    
    {
    
      %% First find the number of lines of text in the text file
      %%
      /TextLineCount 0 def
      Text_File (r) file dup
      {
        dup 256 string readline
        {
          pop /TextLineCount TextLineCount 1 add def
        }
        {
          pop exit
        } 
        ifelse
      } loop
      closefile
    
      %% Next find the number of pages in the PDF file
      %%
      PDF_File (r) file
      runpdfbegin pdfpagecount TextLineCount eq {
        runpdfend true
      }
      {
        runpdfend false
      } ifelse
      exch
      closefile
    
      %% If the number of pages is the same as the number of lines
      %% then we process the files, otherwise warn and exit
      %%
      {
        %% Select font and size
        %%
        /Times-Roman findfont 20 scalefont setfont
    
        %% Open the text file again
        %%
        /TFile Text_File (r) file def
    
        %% Open the PDF file and begin PDF processing
        PDF_File (r) file
        runpdfbegin
    
        %% For each page....
        %%
        1 1 TextLineCount {
          %% draw the content of this page
          %%
          pdfgetpage dup /Page exch store
          pdfshowpage_init
          pdfshowpage_setpage
          Textpdfshowpage
        } for
    
        %% Terminate PDF processing
        %%
        runpdfend
    
        %% Close the text file
        %%
        TFile closefile
      } 
      {
        (Warning, Number of pages not equal to the number of text lines, aborting!\n) print flush
      }ifelse
    }
    {
      (Incorrect usage\n) print
      (Usage: \n) print
      (gs -sDEVICE=pdfwrite -o <outputfile> -sPDF_File=<PDF file> -sText_File=<Text file> textinsert.ps\n) print
      (NB all switches are case-sensitive\n) print
    } ifelse
    end
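
A practical note on the line-counting loop in the program: PostScript's readline returns false when it reaches end-of-file before a newline, so a last name without a trailing newline is not counted and the page/line comparison fails. The following is a minimal shell sketch, not part of the original answer, that normalizes the text file before running gs. The function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not from the original answer.
# Append a newline to file $1 if it is non-empty and its last byte is not
# already a newline, so that every name sits on a terminated line.
ensure_trailing_newline() {
    # $(...) strips a trailing newline, so the result is empty exactly
    # when the last byte of the file is a newline.
    if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >> "$1"
    fi
}
```

Calling ensure_trailing_newline test.txt before the gs command should be harmless when the file is already well formed, since the function then leaves it unchanged.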
    

    【Discussion】:

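      【Solution 2】:

      You can do this by programming in PostScript with Ghostscript. You first need to find the number of pages in the PDF file or the number of lines in the text file, and you probably want to check that the two are the same.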
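
As a rough sketch of that check from the shell: this assumes a Ghostscript build that provides the runpdfbegin and pdfpagecount operators (the same ones used by the program in the other answer), and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, not from the original answers.

# Print the page count of PDF file $1 using Ghostscript's PDF operators.
# Assumes the file name contains no parentheses or backslashes, because it
# is pasted into a PostScript string. -dNOSAFER lets gs open the file on
# builds where file access is restricted by default.
pdf_page_count() {
    gs -q -dNODISPLAY -dNOSAFER \
       -c "($1) (r) file runpdfbegin pdfpagecount = quit"
}

# Print the number of newline-terminated lines in text file $1.
text_line_count() {
    wc -l < "$1" | tr -d ' '
}

# Succeed only when page count $1 equals line count $2.
counts_match() {
    if [ "$1" -eq "$2" ]; then
        return 0
    fi
    echo "page count ($1) differs from line count ($2)" >&2
    return 1
}
```

A typical call would be: counts_match "$(pdf_page_count test.pdf)" "$(text_line_count test.txt)" && echo "counts agree".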

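      Using the Ghostscript pdfwrite device, execute the page description from the PDF file, then read a line of text from the text file. Position the current point correctly over the page content, select a suitable font and size, and show the text. Then execute showpage to render the page.

      You can produce either one large PDF file containing all the pages, or one PDF file per page.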
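
The choice between the two usually comes down to the output file name: Ghostscript writes one file per page when -sOutputFile contains a %d-style page number format. The sketch below only prints the command line for each mode rather than running it. It reuses the textinsert.ps invocation from the other answer, and the function name and output file names are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical dry-run helper, not from the original answers.
# $1 = mode ("single" or "split"), $2 = PDF file, $3 = text file.
# Prints the gs command line that would be used for that mode.
textinsert_cmd() {
    case "$1" in
        single) out="out.pdf" ;;         # all pages in one PDF
        split)  out="page-%03d.pdf" ;;   # one PDF per page: page-001.pdf, ...
        *) echo "mode must be single or split" >&2; return 1 ;;
    esac
    printf 'gs -sDEVICE=pdfwrite -sOutputFile=%s -sPDF_File=%s -sText_File=%s textinsert.ps\n' \
        "$out" "$2" "$3"
}
```

For example, textinsert_cmd split test.pdf test.txt prints the per-page variant, which can be inspected and then run by hand.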

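      Note that this is not a task for someone unfamiliar with PostScript programming.

      【Discussion】:

      • Thanks, but could you give me an example script? I already have a PDF file with x pages and a text file with x lines. So I just need to put the text of line n at position (a,b) on page n of the PDF file.
      • @user2502041: You can now at least accept KenS's answer with the script, and upvote both answers :)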