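【Posted】: 2019-10-23 09:36:06
【Problem description】:
I am using HtmlUnit 2.36.0 and trying to scrape: https://delightful.dussmann.com/menu/B%C3%BCropark%20Bredeney/B%C3%BCropark%20Bredeney For some reason, the content that the page loads dynamically through JavaScript never shows up in the fetched result. Does anyone know how to fix this?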
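// Imports for HtmlUnit 2.x (com.gargoylesoftware packages) and JUnit 4
import java.io.IOException;
import org.junit.Test;
import com.gargoylesoftware.htmlunit.AjaxController;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

@Test
public void testPDFFetch() throws IOException {
    String url = "https://delightful.dussmann.com/menu/B%C3%BCropark%20Bredeney/B%C3%BCropark%20Bredeney";
    WebClient client = new WebClient(BrowserVersion.CHROME);
    client.getOptions().setJavaScriptEnabled(true);
    client.getOptions().setThrowExceptionOnScriptError(true);
    client.getOptions().setCssEnabled(true);
    client.getOptions().setUseInsecureSSL(true);
    // Force every AJAX request to be processed synchronously
    client.setAjaxController(new AjaxController() {
        @Override
        public boolean processSynchron(HtmlPage page, WebRequest request, boolean async) {
            return true;
        }
    });
    try {
        HtmlPage page = client.getPage(url);
        // page.wait(20000);
        // Give background JavaScript up to 10 seconds to finish
        client.waitForBackgroundJavaScript(10000);
        client.waitForBackgroundJavaScriptStartingBefore(10000);
        Thread.sleep(10000);
        // Dump the DOM; the dynamic content is still missing here
        System.out.println(page.asXml());
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        client.close();
    }
}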
【Discussion】:
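A fixed sleep either waits too long or not long enough, so one thing that may be worth trying is polling for the expected content with a deadline. The following is only a minimal sketch in plain Java, not a confirmed fix for this site: `PollUntil` and `pollUntil` are hypothetical names, and the condition is a placeholder. In the test above the condition could, for example, call `client.waitForBackgroundJavaScript(...)` and then check whether `page.asXml()` contains some marker text you expect.

```java
import java.util.function.BooleanSupplier;

public class PollUntil {

    /**
     * Re-evaluates the condition until it returns true or the timeout elapses.
     * Returns true if the condition was met, false if the deadline passed first.
     * (Hypothetical helper for illustration; not part of HtmlUnit.)
     */
    public static boolean pollUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            // Pause between checks so the loop does not spin
            Thread.sleep(intervalMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder condition: becomes true on the third check
        int[] checks = {0};
        boolean found = pollUntil(() -> ++checks[0] >= 3, 2000, 10);
        System.out.println(found + " after " + checks[0] + " checks");
    }
}
```

If the condition never becomes true, the missing content is probably not a timing problem, and it may be more useful to look at script errors or at the requests the page makes.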
Tags: firebase web-scraping htmlunit