【Question Title】: How to create an array and fill it from a tree node variable
【Posted】: 2016-03-19 13:45:34
【Question】:

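I am trying to transfer data out of a tree node that contains more data than I need (at least I think it does). I find the data in the tree node very hard to manipulate. I would rather have an array that holds only the data I need to work with.

I would like the array to have the following variables: 1. BookmarkNumber (integer) 2. Date (string) 3. DocumentType (string) 4. BookmarkPageNumberString (string) 5. BookmarkPageNumberInteger (integer)

I would like to fill the array defined above from the data in the variable book_mark (as shown in my code).

I have been struggling with this for two days. Any help would be much appreciated. I am fairly sure the question is not worded well, so please ask questions and I will explain further where needed.

Thanks so much

BTW, what I am trying to do is create a Windows Forms program that parses a PDF file with multiple bookmarks into a discrete PDF file for each bookmark/chapter, saving each file in the correct folder with the correct naming convention. The folder and naming convention depend on the name of the PDF and the title of the bookmark/chapter being parsed.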

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.IO;
using itextsharp.pdfa;
using iTextSharp.awt;
using iTextSharp.testutils;
using iTextSharp.text;
using iTextSharp.xmp;
using iTextSharp.xtra;

namespace WindowsFormsApplication1
{


    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }


        private void ChooseImageFileWrapper_Click(object sender, EventArgs e)
        {
            OpenFileDialog openFileDialog1 = new OpenFileDialog();
            openFileDialog1.InitialDirectory = GlobalVariables.InitialDirectory;
            openFileDialog1.Filter = "Pdf Files|*.pdf";
            openFileDialog1.RestoreDirectory = true;
            openFileDialog1.Title = "Image File Wrapper Chooser";

            if (openFileDialog1.ShowDialog() == DialogResult.OK)
            {
                try
                {
                    GlobalVariables.ImageFileWrapperPath = openFileDialog1.FileName;

                }
                catch (Exception ex)
                {
                    MessageBox.Show("Error: Could not read file from disk. Original error: " + ex.Message);
                }
            }
            ImageFileWrapperPath.Text = GlobalVariables.ImageFileWrapperPath;
        }

        private void ImageFileWrapperPath_TextChanged(object sender, EventArgs e)
        {

        }


        private void button2_Click(object sender, EventArgs e)
        {
            iTextSharp.text.pdf.PdfReader pdfReader = new iTextSharp.text.pdf.PdfReader(GlobalVariables.ImageFileWrapperPath);
            IList<Dictionary<string, object>> book_mark = iTextSharp.text.pdf.SimpleBookmark.GetBookmark(pdfReader);

            List<ImageFileWrapperBookmarks> IFWBookmarks = new List<ImageFileWrapperBookmarks>();
            foreach (Dictionary<string, object> bk in book_mark) // bk is a single instance of book_mark
            {
                ImageFileWrapperBookmarks.BookmarkNumber = ImageFileWrapperBookmarks.BookmarkNumber + 1;
                foreach (KeyValuePair<string, object> kvr in bk) // kvr is the key/value in bk
                {
                    if (kvr.Key == "Kids" || kvr.Key == "kids")
                    {
                        //create recursive program for children
                    }
                    else if (kvr.Key == "Title" || kvr.Key == "title")
                    {

                    }
                    else if (kvr.Key == "Page" || kvr.Key == "page")
                    {

                    }

                }
            }

            MessageBox.Show(GlobalVariables.ImageFileWrapperPath);
        }
    }
}

【Comments】:

    Tags: c# arrays pdf tree itextsharp


    【Solution 1】:

    Here is one way to parse the PDF and create a data structure similar to what you describe. First, the data structure:

    using System.Text.RegularExpressions; // needed for Regex in GetFileName()

    public class BookMark
    {
        static int _number;
        public BookMark() { Number = ++_number; }
        public int Number { get; private set; }
        public string Title { get; set; }
        public string PageNumberString { get; set; }
        public int PageNumberInteger { get; set; }
        public static void ResetNumber() { _number = 0; }
    
        // a bookmark's title may contain characters that are illegal in file names
        public string GetFileName()
        {
            var fileTitle = Regex.Replace(
                Regex.Replace(Title, @"\s+", "-"), 
                @"[^-\w]", ""
            );
            return string.Format("{0:D4}-{1}.pdf", Number, fileTitle);
        }
    }
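
    To see what the two nested Regex.Replace calls in GetFileName() do, here is a minimal standalone sketch of the same logic. The class name FileNameDemo, the helper ToFileName, and the sample title are made up for illustration; only the two patterns and the format string come from the class above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameDemo
{
    // Same two-step replacement as BookMark.GetFileName(), taken out of the class.
    public static string ToFileName(int number, string title)
    {
        // Step 1: collapse each run of whitespace into a single hyphen.
        var hyphenated = Regex.Replace(title, @"\s+", "-");
        // Step 2: drop everything that is not a hyphen or a word character.
        var fileTitle = Regex.Replace(hyphenated, @"[^-\w]", "");
        // D4 pads the bookmark number to four digits so files sort in order.
        return string.Format("{0:D4}-{1}.pdf", number, fileTitle);
    }

    public static void Main()
    {
        // prints "0001-Chapter-1-IntroOverview.pdf"
        Console.WriteLine(ToFileName(1, "Chapter 1: Intro/Overview"));
    }
}
```

    Because the bookmark number is part of the name, two bookmarks whose titles sanitize to the same text still end up in different files.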
    

    A method that creates a list of BookMark objects (the class above):

    List<BookMark> ParseBookMarks(IList<Dictionary<string, object>> bookmarks)
    {
        int page;
        var result = new List<BookMark>();
        foreach (var bookmark in bookmarks)
        {
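            // add top-level bookmarks; skip entries that have no "Page" key
            // (some bookmarks trigger an action instead of jumping to a page)
            object pageValue;
            if (bookmark.TryGetValue("Page", out pageValue))
            {
                var stringPage = pageValue.ToString();
                if (Int32.TryParse(stringPage.Split()[0], out page))
                {
                    result.Add(new BookMark() {
                        Title = bookmark["Title"].ToString(),
                        PageNumberString = stringPage,
                        PageNumberInteger = page
                    });
                }
            }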
    
            // recurse
            if (bookmark.ContainsKey("Kids"))
            {
                var kids = bookmark["Kids"] as IList<Dictionary<string, object>>;
                if (kids != null && kids.Count > 0)
                {
                    result.AddRange(ParseBookMarks(kids));
                }
            }
        }
        return result;
    }
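
    The recursion above can be tried without a PDF. The sketch below runs the same depth-first walk over a hand-built list of dictionaries. The shape of that list (the "Title", "Page" and "Kids" keys, and "Page" values such as "1 XYZ 0 792 0") is an assumption about what SimpleBookmark.GetBookmark() returns, and FlattenDemo and Flatten are illustrative names, not part of iTextSharp.

```csharp
using System;
using System.Collections.Generic;

public static class FlattenDemo
{
    // Depth-first walk: emit the parent first, then its children, in outline order.
    public static List<string> Flatten(IList<Dictionary<string, object>> nodes)
    {
        var result = new List<string>();
        foreach (var node in nodes)
        {
            object pageValue;
            int page;
            // The first whitespace-separated token of "Page" is the page number.
            if (node.TryGetValue("Page", out pageValue)
                && int.TryParse(pageValue.ToString().Split()[0], out page))
            {
                result.Add(node["Title"] + " @ " + page);
            }

            object kidsValue;
            if (node.TryGetValue("Kids", out kidsValue))
            {
                var kids = kidsValue as IList<Dictionary<string, object>>;
                if (kids != null) result.AddRange(Flatten(kids));
            }
        }
        return result;
    }

    public static void Main()
    {
        // Hand-built stand-in for the bookmark tree (assumed shape, see above).
        var outline = new List<Dictionary<string, object>>
        {
            new Dictionary<string, object>
            {
                { "Title", "Part I" },
                { "Page", "1 XYZ 0 792 0" },
                { "Kids", new List<Dictionary<string, object>>
                    {
                        new Dictionary<string, object>
                        {
                            { "Title", "Chapter 1" }, { "Page", "3 Fit" }
                        }
                    }
                }
            },
            new Dictionary<string, object>
            {
                { "Title", "Part II" }, { "Page", "10 FitH 700" }
            }
        };

        // prints "Part I @ 1", "Chapter 1 @ 3", "Part II @ 10" on separate lines
        foreach (var line in Flatten(outline))
        {
            Console.WriteLine(line);
        }
    }
}
```

    The result is a flat list in document order, which is what the page-range logic further down relies on.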
    

    Call the method above like this to dump the results to a text file:

    void DumpResults(string path)
    {
        using (var reader = new PdfReader(path))
        {
            // need this call to parse page numbers
            reader.ConsolidateNamedDestinations();
    
            var bookmarks = ParseBookMarks(SimpleBookmark.GetBookmark(reader));
            var sb = new StringBuilder();
            foreach (var bookmark in bookmarks)
            {
                sb.AppendLine(string.Format(
                    "{0, -4}{1, -100}{2, -25}{3}",
                    bookmark.Number, bookmark.Title,
                    bookmark.PageNumberString, bookmark.PageNumberInteger
                ));
            }
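            // outputTextFile is not defined in this snippet; it is assumed to be
            // a field or constant that holds the path of the text file to write
            File.WriteAllText(outputTextFile, sb.ToString());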
        }
    }
    

    The bigger problem is how to extract each BookMark into a separate file. If every BookMark starts a new page, this is easy:

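    1. Iterate over the return value of ParseBookMarks()
    2. Select the page range that starts at the current BookMark.PageNumberInteger and ends at the next BookMark.PageNumberInteger - 1
    3. Use that page range to create a separate file.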

    Something like this:

    void ProcessPdf(string path)
    {
        using (var reader = new PdfReader(path))
        {
            // need this call to parse page numbers
            reader.ConsolidateNamedDestinations();
    
            var bookmarks = ParseBookMarks(SimpleBookmark.GetBookmark(reader));
            for (int i = 0; i < bookmarks.Count; ++i)
            {
                int page = bookmarks[i].PageNumberInteger;
                int nextPage = i + 1 < bookmarks.Count
                    // if the next bookmark is not at the top of its page, content will be missing
                    ? bookmarks[i + 1].PageNumberInteger - 1 
    
                    /* alternative is to potentially add redundant content:
                    ? bookmarks[i + 1].PageNumberInteger
                    */
    
                    : reader.NumberOfPages;
                string range = string.Format("{0}-{1}", page, nextPage);
    
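                // DEMO: only extract the first 10 bookmarks
                if (i < 10)
                {
                    // OUTPUT_DIR is not defined in this snippet; it is assumed to be a
                    // constant or field that holds the destination directory
                    var outputPath = Path.Combine(OUTPUT_DIR, bookmarks[i].GetFileName());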
                    using (var readerCopy = new PdfReader(reader))
                    {
                        readerCopy.SelectPages(range);
                        using (FileStream stream = new FileStream(outputPath, FileMode.Create))
                        {
                            using (var document = new Document())
                            {
                                using (var copy = new PdfCopy(document, stream))
                                {
                                    document.Open();
                                    int n = readerCopy.NumberOfPages;
                                    for (int j = 0; j < n; )
                                    {
                                        copy.AddPage(copy.GetImportedPage(readerCopy, ++j));
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
    

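    The problem is that it is very unlikely that every bookmark sits at the top of a page in the PDF. To see what I mean, try commenting and uncommenting the bookmarks[i + 1].PageNumberInteger lines.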
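
    The two alternatives in ProcessPdf() can be compared without touching a PDF by isolating the range arithmetic. The sketch below is only an illustration: PageRangeDemo, BuildRanges and the overlap flag are made-up names, and the clamp for two bookmarks on the same page is an addition that ProcessPdf() does not have.

```csharp
using System;
using System.Collections.Generic;

public static class PageRangeDemo
{
    // startPages: bookmark start pages in document order (1-based).
    // overlap = false: end on the page before the next bookmark (may lose content).
    // overlap = true:  end on the page where the next bookmark starts (may duplicate content).
    public static List<string> BuildRanges(IList<int> startPages, int totalPages, bool overlap)
    {
        var ranges = new List<string>();
        for (int i = 0; i < startPages.Count; ++i)
        {
            int first = startPages[i];
            int last = totalPages; // the last bookmark runs to the end of the file
            if (i + 1 < startPages.Count)
            {
                last = overlap ? startPages[i + 1] : startPages[i + 1] - 1;
            }
            // Two bookmarks on the same page would give last < first; clamp it.
            if (last < first) last = first;
            ranges.Add(string.Format("{0}-{1}", first, last));
        }
        return ranges;
    }

    public static void Main()
    {
        var starts = new List<int> { 1, 4, 4, 9 };
        // prints "1-3, 4-4, 4-8, 9-12"
        Console.WriteLine(string.Join(", ", BuildRanges(starts, 12, false)));
        // prints "1-4, 4-4, 4-9, 9-12"
        Console.WriteLine(string.Join(", ", BuildRanges(starts, 12, true)));
    }
}
```

    Each string uses the same "first-last" format that ProcessPdf() builds for SelectPages(). With overlap = false a chapter that starts mid-page loses its opening lines; with overlap = true the previous file carries them as extra content. Neither is exact unless every bookmark starts at the top of a page.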

    【Comments】:
