【Question title】: How to read an XML file, grab all URL links in it, and save them to a TXT file in C#?
【Posted】: 2020-08-02 05:25:57
【Question】:

I have an .XML file (a log produced by my program) that contains the following text:

<?xml version="1.0" encoding="utf-8"?>
<PsnRecords>
  <PsnRecord>
    <Names></Names>
    <PsnUrl>http://gs2.ww.prod.dl.playstation.net/gs2/ppkgo/prod/CUSA05330_00/108/f_acb1a312a982305e284718898b3dade6afb395e6718d836b1d7b1e1aa1873800/f/EP0953-CUSA05330_00-BRAWLHALLAEUROPE-A0403-V0100-DP.pkg</PsnUrl>
    <LocalUrl>C:\Users\Betrisa\Desktop\Shared\EP0953-CUSA05330_00-BRAWLHALLAEUROPE-A0403-V0100-DP.pkg</LocalUrl>
    <isLixian>false</isLixian>
    <LixianUrl></LixianUrl>
  </PsnRecord>
  <PsnRecord>
    <Names></Names>
    <PsnUrl>http://gs2.ww.prod.dl.playstation.net/gs2/ppkgo/prod/CUSA05330_00/108/f_acb1a312a982305e284718898b3dade6afb395e6718d836b1d7b1e1aa1873800/f/EP0953-CUSA05330_00-BRAWLHALLAEUROPE-A0403-V0100.pkg?downloadId=0000015b&amp;du=000000000000015b00e26bd28904ee7f&amp;product=0187&amp;serverIpAddr=192.168.137.1&amp;r=00000000</PsnUrl>
    <LocalUrl></LocalUrl>
    <isLixian>false</isLixian>
    <LixianUrl></LixianUrl>
  </PsnRecord>
  <PsnRecord>
    <Names></Names>
    <PsnUrl>http://ic.97f46e00.060798.gs2.sonycoment.loris-e.llnwd.net/gs2/ppkgo/prod/CUSA05330_00/108/f_acb1a312a982305e284718898b3dade6afb395e6718d836b1d7b1e1aa1873800/f/EP0953-CUSA05330_00-BRAWLHALLAEUROPE-A0403-V0100.pkg?downloadId=0000015b&amp;du=000000000000015b00e26bd28904ee7f&amp;product=0187&amp;serverIpAddr=192.168.137.1&amp;r=00000001</PsnUrl>
    <LocalUrl></LocalUrl>
    <isLixian>false</isLixian>
    <LixianUrl></LixianUrl>
  </PsnRecord>
</PsnRecords>

I want to grab all the URL links and save them to a .TXT file. I tried two approaches, but neither worked:

Approach 1: using Split (result: each URL still wrapped in its <PsnUrl> tags)

        private void button1_Click(object sender, EventArgs e)
        {
            string paths = Application.StartupPath + @"\DataFiles\DataHistory.xml";
            string resPaths = Application.StartupPath + @"\DataFiles\Links.txt";
            StreamWriter urlsWrite = File.CreateText(resPaths);


            var text = System.IO.File.ReadAllText(paths);
            var links = text.Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Where(s => s.StartsWith("<PsnUrl>http://") || s.StartsWith("<PsnUrl>https://"));

            foreach (string s in links)
            {
            urlsWrite.WriteLine(s);     
            }
            
        }

Approach 2: using a regular expression (result: nothing at all!)

        private void button1_Click(object sender, EventArgs e)
        {
            string paths = Application.StartupPath + @"\DataFiles\DataHistory.xml";
            string resPaths = Application.StartupPath + @"\DataFiles\Links.txt";
            StreamWriter urlsWrite = File.CreateText(resPaths);


            var text = System.IO.File.ReadAllText(paths);
            var regex = new Regex(@"\b(?:http?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
            MatchCollection mactches = regex.Matches(text);
            
            foreach (string matc in links)
            {
            text = text.Replace(matc.Value, "<PsnUrl>"+matc.Value+"</PsnUrl>");
            urlsWrite.WriteLine(mats);     
            }
        }

I want a .TXT file that contains only clean URLs, for example:

https://xxxxxxxxxxxxxx
http://xxxxxxxxxxxxxx
https://xxxxxxxxxxxxxx
https://xxxxxxxxxxxxxx
https://xxxxxxxxxxxxxx
https://xxxxxxxxxxxxxx

What am I doing wrong?

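【Comments】:

  • Use a proper method for parsing XML. Have a look here to get started.
  • When you asked this question earlier today, I suggested you look into XPath. As others have suggested, treat XML as XML. It is designed to be easy for an XML parser to parse.
  • @Flydog57 I am new to this site! The moderators closed my post because of the rules! So thank you and everyone else for the help. You are right, parsing the XML is the easiest way.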

Tags: c#


【Solution 1】:

Approach 0: parse the XML properly

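// Requires: using System.Xml;
var doc = new XmlDocument();
doc.LoadXml(text);
// Type the loop variable as XmlNode: with var it is object, and
// WriteLine(object) would print the type name instead of the URL.
foreach (XmlNode n in doc.SelectNodes("//PsnUrl/text()"))
    urlsWrite.WriteLine(n.Value);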

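Your sample XML looks like it was copied from a tree view. Here is the correct content. Note that the & characters are encoded as &amp;. If your source does not do that, you can replace them first, e.g. text.Replace("&", "&amp;").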

<?xml version="1.0" encoding="UTF-8"?>
<PsnRecords>
    <PsnRecord>
        <Names/>
        <PsnUrl>http://gs2.ww.prod.dl.playstation.net/gs2/acpkgo/prod/CUSA00803_00/9/f_72955662ebee69bf3f1bbec8b1f1dfef1ed000acb6f96046b394d69fc8551fe4/f/UP0002-CUSA00803_00-CODAWDIGITALPACK.pkg?downloadId=000000ab&amp;serverIpAddr=87.248.195.254&amp;country=us&amp;downloadType=ob&amp;q=1817303785a54ecb464ab93233801c33225a5dae976d075973acb9669874c74b</PsnUrl>
        <LocalUrl></LocalUrl>
    </PsnRecord>
    <PsnRecord>
        <Names/>
        <PsnUrl>http://gs2.ww.prod.dl.playstation.net/gs2/appkgo/prod/CUSA00803_00/3/f_6ee0d43dc4ea9a53a9f3d83fe26c7afcfadca8d17795762ab81cb2ddc6086776/f/UP0002-CUSA00803_00-CODAW00000000000_0.pkg?downloadId=000000ac&amp;serverIpAddr=87.248.195.254&amp;country=us&amp;downloadType=ob&amp;q=1817303785a54ecb464ab93233801c33225a5dae976d075973acb9669874c74b</PsnUrl>
        <LocalUrl></LocalUrl>
    </PsnRecord>
</PsnRecords>

Unless the XML is malformed, avoid manipulating the strings yourself.
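
Put together, the parsing approach might look like the sketch below. The class and method names (PsnUrlExtractor, ExtractUrls, SaveUrls) are made up for illustration and are not from the question; the sketch assumes the file is well-formed XML and that only the <PsnUrl> elements are wanted. Empty <PsnUrl> elements are skipped.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, named for illustration only.
public static class PsnUrlExtractor
{
    // Returns the text of every non-empty <PsnUrl> element in the XML string.
    public static List<string> ExtractUrls(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var urls = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("//PsnUrl"))
        {
            // InnerText is already entity-decoded, so &amp; comes back as &.
            string url = node.InnerText.Trim();
            if (url.Length > 0)
                urls.Add(url);
        }
        return urls;
    }

    // Reads the XML log and writes one URL per line to the text file.
    public static void SaveUrls(string xmlPath, string txtPath)
    {
        string xml = File.ReadAllText(xmlPath);
        File.WriteAllLines(txtPath, ExtractUrls(xml));
    }
}
```

From the button handler this could be called as PsnUrlExtractor.SaveUrls(paths, resPaths), using the two path variables from the question. File.WriteAllLines opens, writes, and closes the file in one call, so no StreamWriter has to be managed by hand.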

Approach 1: you need to strip the <PsnUrl> and </PsnUrl> tags.

foreach (string s in links)
    urlsWrite.WriteLine(s.Replace("<PsnUrl>", string.Empty).Replace("</PsnUrl>", string.Empty));
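
If the string-splitting route is kept anyway, two further details may matter: a file with Windows line endings leaves a trailing '\r' on each token unless '\r' is among the split characters, and the raw text still contains &amp; where the real URL has &. The following is only a sketch under those assumptions; SplitUrlExtractor is a hypothetical name, and it relies on each <PsnUrl> element sitting on one line with no spaces inside the URL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Hypothetical helper, named for illustration only.
public static class SplitUrlExtractor
{
    public static List<string> ExtractUrls(string text)
    {
        const string open = "<PsnUrl>";
        const string close = "</PsnUrl>";

        return text
            // '\r' is included so that CRLF files do not leave it on the token.
            .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(s => s.StartsWith(open + "http://", StringComparison.OrdinalIgnoreCase)
                     || s.StartsWith(open + "https://", StringComparison.OrdinalIgnoreCase))
            // Strip the tags, then turn &amp; back into &.
            .Select(s => WebUtility.HtmlDecode(
                s.Replace(open, string.Empty).Replace(close, string.Empty)))
            .ToList();
    }
}
```

WebUtility.HtmlDecode is used here as a convenient way to decode the entities; an XML parser does the same thing without any of these assumptions.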

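Approach 2: mactches, links, mats??? Please post the actual code that compiles. Your Replace call is wrapping the URLs in tags!? That is the opposite of what you want to achieve.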

foreach (Match matc in mactches)
    urlsWrite.WriteLine(matc.Value);
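
The pattern in the question has problems of its own: `http?://` makes the "p" optional rather than the "s", so it can never match https://, and `\S+` keeps going into the closing tag because there is no whitespace before it. A possible adjusted pattern is sketched below; RegexUrlExtractor is a hypothetical name, and the pattern is an assumption about what counts as a URL in this log, not a general-purpose URL matcher. It also picks up http(s) URLs from any element, not only <PsnUrl>.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class RegexUrlExtractor
{
    // "https?" makes the "s" optional; "[^\s<]+" stops at whitespace
    // or at the "<" that starts the closing tag.
    private static readonly Regex UrlPattern =
        new Regex(@"https?://[^\s<]+", RegexOptions.Compiled | RegexOptions.IgnoreCase);

    public static List<string> ExtractUrls(string text)
    {
        var urls = new List<string>();
        foreach (Match m in UrlPattern.Matches(text))
        {
            // The raw file text still has &amp; in it, so decode each match.
            urls.Add(WebUtility.HtmlDecode(m.Value));
        }
        return urls;
    }
}
```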

【Comments】:

  • Sorry, that was my mistake! The XML file had been edited by my duplicate remover! I have edited the XML sample in the first post anyway.
  • This XML parsing does not work ``` var paths = Application.StartupPath + @"\DataFiles\DataHistory.xml"; string resPaths = Application.StartupPath + @"\DataFiles\Links.txt"; StreamWriter urlsWrite = File.CreateText(resPaths); var doc = new XmlDocument(); doc.Load(paths); XmlNodeList nodeList; nodeList = doc.SelectNodes("PsnRecords/PsnRecords/PsnUrl"); foreach (var n in nodeList) urlsWrite.WriteLine(n); ```
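
Three things in the snippet from the last comment look like likely causes, judging only from the sample XML in the question: the second XPath step is PsnRecords, while the element in the file is PsnRecord; WriteLine(n) with a var loop variable prints the node's type name rather than its text; and the StreamWriter is never closed, so buffered lines may not reach the file. A sketch with those three points changed follows; HistoryExporter and its methods are hypothetical names.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper, named for illustration only.
public static class HistoryExporter
{
    // Writes the text of each PsnUrl element to the writer, one per line,
    // and returns how many lines were written.
    public static int WriteUrls(XmlDocument doc, TextWriter writer)
    {
        int count = 0;
        // The element under the root is PsnRecord (singular).
        foreach (XmlNode n in doc.SelectNodes("/PsnRecords/PsnRecord/PsnUrl"))
        {
            // InnerText is the element's text; passing n itself would
            // print its type name.
            writer.WriteLine(n.InnerText);
            count++;
        }
        return count;
    }

    public static void Export(string xmlPath, string txtPath)
    {
        var doc = new XmlDocument();
        doc.Load(xmlPath);
        // The using block disposes the writer, which flushes it to disk.
        using (StreamWriter writer = File.CreateText(txtPath))
        {
            WriteUrls(doc, writer);
        }
    }
}
```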