Reading RDF/XML from a URL in Jena

[Question title]: read RDF/XML from url in Jena
[Posted]: 2017-01-15 11:02:11
[Question description]:

I am trying to read an XML file using Jena, and in general it works.

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;

    final String url = "http://www.bbc.co.uk/nature/life/Human";
    Model model = ModelFactory.createDefaultModel();
    model.read(url, "RDF/XML");

But when I try another URL, one where a paragraph contains a br tag or a link, it gives me this error.

Exception in thread "main" org.apache.jena.riot.RiotException: [line: 25, col: 6 ] {E202} Cannot have both string data "Great white sharks are at the very top of the marine food chain. Feared as man-eaters, they are only responsible for about 5-10 attacks a year, which are rarely fatal. Great whites are ultimate predators. Powerful streamlined bodies and a mouth full of terrifyingly sharp, serrated teeth, combine with super senses that can detect a single drop of blood from over a mile away. Hiding from a great white isn't an option as they can detect and home in on small electrical discharges from hearts and gills. Unlike most other sharks, live young are born that immediately swim away.
" and XML data <br> inside a property element. Maybe you want rdf:parseType='Literal'.

This is the link for the second case, where Jena throws this error: http://www.bbc.co.uk/nature/life/Great_white_shark

What should I do to make Jena ignore this markup?

[Comments]:

Tags: xml rdf jena

[Solution 1]:

The problem is the data from the BBC web site: the <br/> needs to be escaped as &lt;br/&gt; to put HTML markup into a string value. In RDF/XML, a simple string value cannot contain raw markup.

Unfortunately, the BBC web site does not handle full content negotiation: asking for Turtle or N-Triples returns the XHTML page.

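You need to download the file with a regular HTTP request that sends the header Accept: application/rdf+xml, patch the content, and parse the fixed version. One way is to read it into a Java string, use a regular expression to replace <br/> with &lt;br/&gt;, and then parse from the string.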
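
The download-and-patch steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the class name RdfXmlPatcher and its two helper methods are made up for this example, it uses java.net.http.HttpClient (Java 11 or later) rather than anything Jena-specific, and it only handles br tags, so other inline markup such as links would need a similar replacement. The final Jena step is shown as a comment so that the sketch compiles without Jena on the classpath.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Jena.
final class RdfXmlPatcher {

    // Matches <br>, <br/> and <br /> in any letter case.
    private static final Pattern BR_TAG =
            Pattern.compile("<br\\s*/?>", Pattern.CASE_INSENSITIVE);

    /** Replaces raw br tags with an escaped form so they become plain text. */
    static String escapeBrTags(String rdfXml) {
        return BR_TAG.matcher(rdfXml).replaceAll("&lt;br/&gt;");
    }

    /** Downloads the document as a string, explicitly asking for RDF/XML. */
    static String fetchRdfXml(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/rdf+xml")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // URL taken from the question; it may no longer serve this data.
        String url = "http://www.bbc.co.uk/nature/life/Great_white_shark";
        String fixed = escapeBrTags(fetchRdfXml(url));

        // With Jena on the classpath, the patched text can then be parsed
        // from the string instead of from the URL:
        //   Model model = ModelFactory.createDefaultModel();
        //   model.read(new java.io.StringReader(fixed), url, "RDF/XML");
        System.out.println(fixed.length() + " characters after patching");
    }
}
```

Passing the original URL as the base URI in model.read keeps any relative URIs in the document resolving the same way they would have when reading directly from the site.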

[Comments]: