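【Posted】: 2017-10-29 23:18:54
【Question】:
I am using Selenium for Java, but I am having problems with HtmlUnitDriver. No matter which website I try or which dependencies I use, it crashes on almost any JavaScript according to the console output. When I switch to PhantomJS, everything is fine and things work just as they do with, for example, Chrome or Firefox. Also, I am not sure which dependencies I should be using for HtmlUnitDriver.
The following should give me the latest version of HtmlUnitDriver: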
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>3.5.3</version>
    <exclusions>
        <exclusion>
            <groupId>org.sourceforge.htmlunit</groupId>
            <artifactId>htmlunit</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.sourceforge.htmlunit</groupId>
            <artifactId>htmlunit-core</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.sourceforge.htmlunit</groupId>
            <artifactId>neko-htmlunit</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-htmlunit-driver</artifactId>
    <version>2.52.0</version>
</dependency>
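However, that is not what happens. HtmlUnitDriver still appears to come bundled with net.sourceforge.htmlunit:htmlunit:2.27, net.sourceforge.htmlunit:htmlunit-core-js:2.27 and net.sourceforge.htmlunit:neko-htmlunit:2.27, despite the exclusions.
This repository suggests that 2.27 is still the latest version, but it handles any kind of JavaScript on websites so poorly that it is unusable.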
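One way to confirm which HtmlUnit artifacts really end up on the runtime classpath is to ask the JVM where it loaded the relevant classes from. The following is only a minimal sketch and not part of the original setup: `ClasspathProbe` and `describe` are hypothetical names, and the two class names passed in `main` are taken from the stack trace in this post. It uses JDK APIs only. The version it reports comes from the JAR manifest (`Implementation-Version`) and may simply be missing.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical helper: reports the JAR (or directory) a class was loaded from.
final class ClasspathProbe {

    static String describe(String className) {
        try {
            // Load without initializing, so static initializers are not run.
            Class<?> type = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            URL location = (source == null) ? null : source.getLocation();
            Package pkg = type.getPackage();
            String version = (pkg == null) ? null : pkg.getImplementationVersion();
            return className + " -> "
                    + (location == null ? "(JDK/bootstrap)" : location.toString())
                    + ", version=" + (version == null ? "(not declared)" : version);
        } catch (ClassNotFoundException | LinkageError e) {
            // The class, or something it links against, is missing from the classpath.
            return className + " -> not on classpath";
        }
    }

    public static void main(String[] args) {
        // Class names as they appear in the stack trace of this post.
        System.out.println(describe("com.gargoylesoftware.htmlunit.WebClient"));
        System.out.println(describe("org.openqa.selenium.htmlunit.HtmlUnitDriver"));
    }
}
```

Run from the same classpath as the tests, this prints the location of each class, which makes it easier to tell whether the exclusions had any effect. Maven's `mvn dependency:tree` gives the same information at build time.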
This is how I start the driver:
HtmlUnitDriver unitDriver = new HtmlUnitDriver();
unitDriver.setJavascriptEnabled(true);
The exception:
Caused by: com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot find function registerElement in object [object HTMLDocument]. (https://www.example.com/some-script.js#31)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:894)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:637)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:518)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:774)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:750)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:102)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:991)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:366)
    at com.gargoylesoftware.htmlunit.html.HtmlScript$2.execute(HtmlScript.java:247)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:268)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:800)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:756)
    at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1236)
    at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1136)
    at net.sourceforge.htmlunit.cyberneko.filters.DefaultFilter.endElement(DefaultFilter.java:226)
    at net.sourceforge.htmlunit.cyberneko.filters.NamespaceBinder.endElement(NamespaceBinder.java:345)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3178)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2141)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner.scanDocument(HTMLScanner.java:945)
    at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:521)
    at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:472)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:999)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:250)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:192)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:272)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:160)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:522)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:396)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:313)
    at org.openqa.selenium.htmlunit.HtmlUnitDriver.get(HtmlUnitDriver.java:668)
    ... 3 more
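Leaving JavaScript disabled does avoid the exceptions, but the sites require JavaScript, so that is not a solution.
Is something wrong with my dependencies, or is HtmlUnitDriver really just "garbage"? PhantomJS takes about 5 seconds to start, which is quite slow if you only want to parse something once, so a more lightweight driver like HtmlUnitDriver would come in handy, if it worked...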
【Comments】:
Tags: javascript java maven selenium htmlunit-driver