【Posted】:2018-04-24 21:00:55
【Problem description】:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Whitelist;
import org.jsoup.select.Elements;

String html = "<video width='320' height='240' controls autoplay> <source src='movie.ogg' type='video/ogg'> <source src='movie.mp4' type='video/mp4'> <object data='movie.mp4' width='320' height='240'> <embed width='320' height='240' src='movie.swf'> </object></video><canvas id='myCanvas' width='200' height='100' style='border:1px solid #000000;'>Your browser does not support the HTML5 canvas tag.</canvas><article> <header> <h1>Internet Explorer 9</h1> <p><time pubdate datetime='2011-03-15'></time></p> </header> <p>Windows Internet Explorer 9 (abbreviated as IE9) was released to the public on March 14, 2011 at 21:00 PDT.....</p></article><footer> <p>Posted by: Hege Refsnes</p> <p>Contact information: <a href='mailto:someone@example.com'> someone@example.com</a>.</p></footer> <nav> <a href='/html/'>HTML</a> | <a href='/css/'>CSS</a> | <a href='/js/'>JavaScript</a> | <a href='/jquery/'>jQuery</a></nav> <section> <h1>WWF</h1> <p>The World Wide Fund for Nature (WWF) is....</p></section><datalist id='browsers'> <option value='Internet Explorer'> <option value='Firefox'> <option value='Chrome'> <option value='Opera'> <option value='Safari'></datalist> <audio controls> <source src='horse.ogg' type='audio/ogg'> <source src='horse.mp3' type='audio/mpeg'>Your browser does not support the audio element.</audio> <progress value='22' max='100'>teasdklfjashdfjkl</progress> ";
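// Tags to drop entirely, together with everything inside them
String toDoRemoveTAG = "style,img,script,noscript,hr,input";
// Tags allowed to survive the cleaning step
String allowTagList = "p,span,b,i,u,div,br,a";
Document doc = Jsoup.parse(html);
// Manual step: remove every element matching the drop list
Elements els = doc.select(toDoRemoveTAG);
for (Element e : els)
{
    e.remove();
}
// Whitelist step: keep only the allowed tags, plus href on <a>
Whitelist whitelist = new Whitelist();
whitelist.addTags(allowTagList.split(","));
whitelist.addAttributes("a", "href");
Cleaner cleaner = new Cleaner(whitelist);
doc = cleaner.clean(doc);
System.out.println(doc.select("body").html());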
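I use the program above to allow only whitelisted tags and remove all others (even the text of the stripped tags is removed). Is there an API or out-of-the-box (OOTB) solution that achieves the same thing, where I only pass in the whitelisted tags and the function removes everything else?
I don't want to do this part manually, as I do now: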
Elements els = doc.select(toDoRemoveTAG);
for (Element e : els)
{
    e.remove();
}
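
For reference, the two steps in the question can be folded into one small helper. The following is only a sketch under a few assumptions: `TagFilterSketch` and `keepOnly` are hypothetical names, not part of jsoup, and the code uses `Safelist`, which is the name `Whitelist` was given in jsoup 1.14.2 and later (with the older jsoup used in the question, substitute `Whitelist`). It relies on `Elements.remove()`, which detaches every matched element and so replaces the explicit loop.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Safelist;

class TagFilterSketch {

    /**
     * Hypothetical helper: drops every element matching dropSelector
     * (including its contents), then keeps only the allowed tags.
     * dropSelector must be a non-empty CSS selector such as "script,style".
     */
    static String keepOnly(String html, String dropSelector, String... allowedTags) {
        Document dirty = Jsoup.parseBodyFragment(html);

        // One call instead of the for-loop: removes all matches from the DOM
        dirty.select(dropSelector).remove();

        // Same rules as in the question: allowed tags, plus href on <a>
        Safelist safelist = new Safelist()
                .addTags(allowedTags)
                .addAttributes("a", "href");

        Document clean = new Cleaner(safelist).clean(dirty);
        return clean.body().html();
    }

    public static void main(String[] args) {
        String html = "<div><p>Hello <b>world</b></p><script>alert(1)</script>"
                + "<section><h1>Title</h1></section></div>";
        System.out.println(keepOnly(html, "script,style", "p", "b", "div", "a"));
    }
}
```

If nothing has to be dropped together with its contents, `Jsoup.clean(html, safelist)` does the parse and clean in a single call. Note that the cleaner strips disallowed tags but generally keeps the text inside them, which is why the explicit `select(...).remove()` step is still there for tags whose content should disappear as well.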
【Discussion】: