JSoup - How to parse nested texts?

【Question Title】: JSoup - How to parse nested texts?
【Posted】: 2018-03-13 17:32:27
【Question Description】:

I am using JSoup to parse a website's HTML. I want to parse this part:

<td class="lastpost">
This is a text 1<br>
<a href="post/13594">Website Page - 1</a>
</td>

I want to end up with this:

String text = "This is a text 1";
String textNo = "Website Page - 1";
String link = "post/13594";

How can I extract the individual parts like this?

【Question Discussion】:

Can you post the code you have tried so far?
Actually, I couldn't manage it. All I have is: String lastPost = thread.select("td.lastpost").text();

Tags: java parsing jsoup

【Solution 1】:

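Your code only gets all of the text inside the td element you selected, with the link text mixed in. If you want to store the pieces in separate variables, you should retrieve each part separately, as in the following code. I added extra comments so you can see how and why each piece is obtained.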

// Get the first td element that has class="lastpost"
Element lastPost = document.select("td.lastpost").first();
// Get the first a element inside the td (getElementsByTag searches all descendants)
Element linkElement = lastPost.getElementsByTag("a").first();

// This text is the first child node of the td; get that node, call toString,
// and trim the whitespace left by the line break after the opening tag
String text = lastPost.childNode(0).toString().trim();
// This is the text within the a (link) element
String textNo = linkElement.text();
// This text is the href attribute value of the a (link) element
String link = linkElement.attr("href");
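For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. A few things in it are my own assumptions and not part of the question: the class name LastPostExample and the helper parseLastPost are made up for illustration, the td is wrapped in a table because jsoup's HTML parser ignores a bare td outside table context, and it uses Element.ownText() as an alternative to childNode(0), since ownText() returns only the text that belongs directly to the td, already trimmed.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LastPostExample {

    // Hypothetical helper (not from the question): returns {text, textNo, link}
    // for the first td with class="lastpost" in the given HTML.
    static String[] parseLastPost(String html) {
        Document document = Jsoup.parse(html);

        // first() returns null when nothing matches, so guard against that
        Element lastPost = document.select("td.lastpost").first();
        if (lastPost == null) {
            throw new IllegalArgumentException("no td.lastpost element found");
        }
        Element linkElement = lastPost.getElementsByTag("a").first();

        // ownText() collects only the td's own text nodes, skipping the link text
        String text = lastPost.ownText();
        // Fall back to empty strings if the cell contains no link
        String textNo = linkElement == null ? "" : linkElement.text();
        String link = linkElement == null ? "" : linkElement.attr("href");

        return new String[] { text, textNo, link };
    }

    public static void main(String[] args) {
        // Assumption: the snippet is wrapped in <table><tr>...</tr></table>,
        // because a td on its own is dropped when parsed as a full document.
        String html = "<table><tr><td class=\"lastpost\">\n"
                + "This is a text 1<br>\n"
                + "<a href=\"post/13594\">Website Page - 1</a>\n"
                + "</td></tr></table>";

        String[] parts = parseLastPost(html);
        System.out.println("text   = " + parts[0]);
        System.out.println("textNo = " + parts[1]);
        System.out.println("link   = " + parts[2]);
    }
}
```

If the href is relative, as it is here, and you need a full URL, you could parse with a base URI (Jsoup.parse(html, baseUri)) and read linkElement.absUrl("href") instead.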

【Discussion】: