【Posted】: 2019-09-30 09:12:11
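【Question】:
I am using jsoup to parse HTML and want to extract the inner HTML of the body tag.
So far I have tried document.body().children().outerHtml(), but it returns only the HTML elements and skips floating text inside the body (text that is not wrapped in any HTML tag).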
private String getBodyTag(final Document document) {
    return document.body().children().outerHtml();
}
Input:
<!DOCTYPE html>
<html lang="de">
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel="stylesheet" type="text/css" href="assets/style.css">
</head>
<body>
<div>questions to improve formatting and clarity.</div>
<h3>Guided Mode</h3>
some sample raw/floating text
</body>
</html>
Expected:
<div>questions to improve formatting and clarity.</div>
<h3>Guided Mode</h3>
some sample raw/floating text
Actual:
<div>questions to improve formatting and clarity.</div>
<h3>Guided Mode</h3>
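The difference between the two outputs comes from children() returning only child elements, so the bare text node is never visited. A minimal sketch of one possible alternative is to call Element.html() on the body, which returns its inner HTML including text nodes. The class name BodyInnerHtml and the helper names are made up for illustration, the input is a shortened version of the one above, and jsoup's pretty-printing may change the whitespace of the result.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; only the jsoup calls themselves come from the library.
public class BodyInnerHtml {

    // Inner HTML of <body>: child elements plus text nodes not wrapped in any tag.
    public static String bodyInnerHtml(final Document document) {
        return document.body().html();
    }

    // The approach from the question: element children only, so bare text is dropped.
    public static String bodyChildElements(final Document document) {
        return document.body().children().outerHtml();
    }

    public static void main(String[] args) {
        // Shortened version of the input shown in the question.
        String html = "<!DOCTYPE html><html lang=\"de\"><head></head><body>"
                + "<div>questions to improve formatting and clarity.</div>"
                + "<h3>Guided Mode</h3>"
                + "some sample raw/floating text"
                + "</body></html>";
        Document document = Jsoup.parse(html);

        System.out.println("body().html():");
        System.out.println(bodyInnerHtml(document));
        System.out.println("body().children().outerHtml():");
        System.out.println(bodyChildElements(document));
    }
}
```

If the original whitespace matters, document.outputSettings().prettyPrint(false) can be set before calling html(); whether that is needed depends on how the result is used.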
【Discussion】: