Issue: Using Jsoup to parse a string containing < followed by a word

【Question title】: Issue: Jsoup to parse string with < followed by word
【Posted】: 2019-09-30 15:55:20
【Question description】:

I am using Jsoup to parse a string that contains a < followed by a word.

String input ="<p>testing with less than <string</p>";

String s = Jsoup.parse(input).text();

After extracting the text, the result is "testing with less than" instead of "testing with less than <string".

【Comments】:

【Solution 1】:

String input = "<p>testing with less than <string</p>";
System.out.println(input);

Output:

<p>testing with less than <string</p>

If we print the input, we get the whole string, as shown.

String s1 = Jsoup.parse(input).text();
System.out.println(s1); // text() returns the text content only

Output:

testing with less than

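If we use Jsoup's text() method, we get plain text without the HTML tags.

However, because of the "<" character, the word "string" is treated as an HTML tag, so it does not appear in the text.

The example below shows why.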

String s2 = Jsoup.parse(input).html();
System.out.println(s2); // html() returns the parsed document as markup

Output:

 <html>
 <head></head>
 <body>
 <p>testing with less than 
 <string></string> //the end tag is auto generated by the method
 </p>
 </body>
 </html>

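If we use Jsoup's html() method, we get the whole formatted HTML code.

Here we can clearly see that the "<" character inside another HTML tag, when followed by a word, is parsed as the start of a new tag (<string>).

That is why text() does not return the complete input from the first example.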
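
One possible workaround, not part of the original answer, is to escape any "<" that does not start an expected tag before handing the string to Jsoup. The sketch below uses only the Java standard library; the class name StrayLessThanEscaper, the escape method, and the KNOWN_TAGS allow-list are hypothetical names chosen for illustration, and the allow-list would need to match the tags your input can really contain.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StrayLessThanEscaper {

    // Hypothetical allow-list of tag names expected in the input; adjust as needed.
    private static final Set<String> KNOWN_TAGS =
            Set.of("p", "b", "i", "br", "div", "span", "a");

    // Matches "<", an optional "/", and an optional tag-like name.
    private static final Pattern TAG_START =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)?");

    // Replaces every "<" that does not open or close a known tag with "&lt;".
    static String escape(String html) {
        Matcher m = TAG_START.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(2);
            boolean known = name != null
                    && KNOWN_TAGS.contains(name.toLowerCase(Locale.ROOT));
            // Known tags are copied unchanged; any other "<" becomes an entity.
            String replacement = known ? m.group() : "&lt;" + m.group().substring(1);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "<p>testing with less than <string</p>";
        // The escaped string is what would then be passed to Jsoup.parse(...).
        System.out.println(escape(input));
    }
}
```

The escaped string can then be passed to Jsoup.parse(...).text(); since the "<" is now an entity rather than markup, the parser should no longer read "string" as a tag name. An allow-list is used here rather than a block-list because any unknown word after "<" is a potential tag from the parser's point of view.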

【Discussion】: