【Question title】: Retrieving XML from URL not writing first couple of lines
【Posted】: 2015-03-12 06:27:28
【Question】:

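I'm currently writing a basic weather application for university, which involves retrieving weather information from the BBC Weather RSS feed.

I have it all set up to write the RSS feed to a file (output.xml), which a parser class then uses to build the tree.

However, I get the error "The markup in the document following the root element must be well-formed." at runtime.

On inspecting the downloaded XML file, I noticed that the first two nodes are missing.

Here is the downloaded XML: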

<channel>
    <atom:link href="http://open.live.bbc.co.uk/weather/feeds/en/2656397/observations.rss" rel="self" type="application/rss+xml" />
    <title>BBC Weather - Observations for  Bangor, United Kingdom</title>
    <link>http://www.bbc.co.uk/weather/2656397</link>
    <description>Latest observations for Bangor from BBC Weather, including weather, temperature and wind information</description>
    <language>en</language>
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://www.bbc.co.uk/terms/additional_rss.shtml for more details</copyright>
    <pubDate>Thu, 12 Mar 2015 05:35:08 +0000</pubDate>
    <item>
      <title>Thursday - 05:00 GMT: Thick Cloud, 10°C (50°F)</title>
      <link>http://www.bbc.co.uk/weather/2656397</link>
      <description>Temperature: 10°C (50°F), Wind Direction: South Easterly, Wind Speed: 8mph, Humidity: 90%, Pressure: 1021mb, Falling, Visibility: Very Good</description>
      <pubDate>Thu, 12 Mar 2015 05:35:08 +0000</pubDate>
      <guid isPermaLink="false">http://www.bbc.co.uk/weather/2656397-2015-03-12T05:35:08.000Z</guid>
      <georss:point>53.22647 -4.13459</georss:point>
    </item>
  </channel>
</rss>

The XML should have the following two nodes before the <channel> node:

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss" version="2.0">

Here is the code I use to retrieve the XML file:

public static void main(String[] args) throws SAXException, IOException, XPathExpressionException {
    URL url = new URL("http://open.live.bbc.co.uk/weather/feeds/en/2656397/observations.rss");
    URLConnection con = url.openConnection();
    StringBuilder builder;
    try (BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()))) {

        builder = new StringBuilder();
        String line;

        if (!in.readLine().isEmpty()) {
            line = in.readLine();
        }

        while ((line = in.readLine()) != null) {
            builder.append(line).append("\n");
        }

        String input = builder.toString();

        BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(new File("output.xml"))));
        out.write(input);
        out.flush();
    }
    try {
        WeatherParser parser = new WeatherParser();
        System.out.println(parser.parse("output.xml"));
    } catch (ParserConfigurationException ex) {
    }
}

And here is the code that parses the XML (WeatherParser.java):

public class WeatherParser {

    public WeatherParser() throws ParserConfigurationException {
        xpfactory = XPathFactory.newInstance();
        path = xpfactory.newXPath();
        dbfactory = DocumentBuilderFactory.newInstance();
        builder = dbfactory.newDocumentBuilder();
    }

    public String parse(String fileName) throws SAXException, IOException, XPathExpressionException {
        File f = new File(fileName);
        org.w3c.dom.Document doc = builder.parse(f);
        StringBuilder info = new StringBuilder();
        info.append(path.evaluate("/channel/item/title", doc));
        return info.toString();
    }

    private DocumentBuilderFactory dbfactory;
    private DocumentBuilder builder;
    private XPathFactory xpfactory;
    private XPath path;
}

Hopefully this provides enough information.

【Comments】:

    Tags: java xml parsing dom


    【Solution 1】:

    The first two lines are missing because you read them but never "save" them.
    Remove this block and it will work:

        if (!in.readLine().isEmpty()) {
            line = in.readLine();
        }
    

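    In the if condition, you read the first line (<?xml ...) but do not keep it.
    line = in.readLine(); then fetches the second line, but as soon as you enter the while loop, the line variable is overwritten and that content is lost as well.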
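
As a rough illustration of the fix, here is a minimal sketch of the read loop with the if block removed. The class name ReadAllLines and the helper readAll are made up for this example, and a StringReader stands in for the network stream so the sketch runs without the feed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper class, not part of the original code.
class ReadAllLines {

    // Reads every line from the source, including the first two.
    static String readAll(Reader source) throws IOException {
        StringBuilder builder = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            // Every readLine() call consumes a line, so the loop
            // condition is the only place that calls it.
            while ((line = in.readLine()) != null) {
                builder.append(line).append("\n");
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the feed (assumption: a shortened document).
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<rss version=\"2.0\">\n"
                + "<channel/>\n"
                + "</rss>\n";
        String copy = readAll(new StringReader(xml));
        System.out.println(copy.equals(xml)); // prints "true"
    }
}
```

With the original if block in place, the same input would come back without the XML declaration and the opening rss tag.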

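    【Discussion】:

    • Great! Thank you :)
    【Solution 2】:

    First of all, you should not manipulate the data stream the server sends you. Drop the StringBuilder. If you want to save the XML to disk, write it out verbatim: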

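    // needs java.io.FileOutputStream, java.io.InputStream, java.net.URL, java.net.URLConnection
    URL url = new URL("http://open.live.bbc.co.uk/weather/feeds/en/2656397/observations.rss");
    URLConnection con = url.openConnection();

    // try-with-resources closes both streams even if the copy fails
    try (InputStream in = con.getInputStream();
         FileOutputStream out = new FileOutputStream("output.xml")) {
        byte[] b = new byte[1024];
        int count;
        // copy raw bytes so the XML declaration and encoding stay untouched
        while ((count = in.read(b)) >= 0) {
            out.write(b, 0, count);
        }
    }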
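
The same verbatim copy can probably be written more compactly with java.nio.file.Files.copy (available since Java 7). The sketch below is only an illustration: the class name SaveStream and the helper save are invented here, and a ByteArrayInputStream plus a temporary file stand in for the feed and output.xml.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class, not part of the original answer.
class SaveStream {

    // Copies the stream to the target byte for byte, replacing any
    // existing file, and returns the number of bytes written.
    static long save(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the feed content (assumption: a shortened document).
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<rss version=\"2.0\"/>\n"
                .getBytes(StandardCharsets.UTF_8);
        Path target = Files.createTempFile("output", ".xml");
        try (InputStream in = new ByteArrayInputStream(xml)) {
            long written = save(in, target);
            System.out.println(written == xml.length); // prints "true"
        }
        Files.delete(target);
    }
}
```

For the real feed, the stream from con.getInputStream() would take the place of the ByteArrayInputStream.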
    

    In fact, you do not need to write it to disk at all. You can build the XML document directly from the input stream.

    public static Document readXml(InputStream is) throws SAXException, ParserConfigurationException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    
        dbf.setValidating(false);
        dbf.setIgnoringComments(false);
        dbf.setIgnoringElementContentWhitespace(true);
        dbf.setNamespaceAware(true);
    
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(is);
    }
    

    That lets you do this:

    public static void main (String[] args) throws java.lang.Exception
    {
        URL observationsUrl = new URL("http://open.live.bbc.co.uk/weather/feeds/en/2656397/observations.rss");
        Document observations = readXml(observationsUrl.openConnection().getInputStream());
    
        XPathFactory xpf = XPathFactory.newInstance();
        XPath xpath = xpf.newXPath();
    
        String title = xpath.evaluate("/rss/channel/title", observations);
        System.out.println(title);
    
        XPathExpression rssitemsExpr = xpath.compile("/rss/channel/item");
    
        NodeList items = (NodeList)rssitemsExpr.evaluate(observations, XPathConstants.NODESET);
        for (int i = 0; i < items.getLength(); i++) {
            System.out.println(xpath.evaluate("./title", items.item(i)));
        }
    }
    

    For me, this outputs:

    BBC Weather - Observations for  Bangor, United Kingdom
    Thursday - 06:00 GMT: Thick Cloud, 11°C (52°F)
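
One detail worth noting: the expressions above start at /rss, the document's root element, whereas the parser in the question uses /channel/item/title. An absolute XPath that skips the root element matches nothing, and evaluate then returns an empty string. A minimal sketch of that difference, with a made-up class XPathRootDemo and a shortened in-memory document instead of the live feed:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original answer.
class XPathRootDemo {

    // Parses the XML string and evaluates the expression as a string.
    static String evaluate(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // Shortened stand-in for the feed (assumption).
        String xml = "<rss version=\"2.0\"><channel><item>"
                + "<title>Thick Cloud</title>"
                + "</item></channel></rss>";

        // The root element is rss, so a path starting at /channel matches nothing.
        System.out.println("[" + evaluate(xml, "/channel/item/title") + "]");     // prints "[]"
        System.out.println("[" + evaluate(xml, "/rss/channel/item/title") + "]"); // prints "[Thick Cloud]"
    }
}
```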

    【Discussion】:

    • If an answer solved your problem, please don't forget to mark it as "accepted".
    • Sorry, I'm new to stackoverflow. Done now :)