【问题标题】:got error and exceptions using HtmlUnit in Java在 Java 中使用 HtmlUnit 得到错误和异常
【发布时间】:2015-01-02 17:51:52
【问题描述】:

我刚刚用 Java 编写了简单的代码(使用带有 OS X 的 Eclipse 编辑器,jdk1.8.0_25.jdk):

String url = "http://www.delfi.lt";
final WebClient webClient = new WebClient(BrowserVersion.CHROME);
final HtmlPage page = webClient.getPage(url);
webClient.closeAllWindows();
System.out.println("Done!");

并得到以下错误:

Exception in thread "main" ======= EXCEPTION START ========
EcmaError: lineNumber=[1546] column=[0] lineSource=[<no source>] name=[TypeError] sourceName=[script in http://www.delfi.lt/ from (1545, 32) to (1548, 10)] message=[TypeError: Cannot find function getElementsByClassName in object [object HTMLDocument]. (script in http://www.delfi.lt/ from (1545, 32) to (1548, 10)#1546)]
com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot find function getElementsByClassName in object [object HTMLDocument]. (script in http://www.delfi.lt/ from (1545, 32) to (1548, 10)#1546)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:705)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:620)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:591)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:566)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptIfPossible(HtmlPage.java:975)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeInlineScriptIfNeeded(HtmlScript.java:349)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:409)
    at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:274)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:288)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:741)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:701)
    at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170)
    at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072)
    at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206)
    at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3126)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2093)
    at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:920)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:965)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:247)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:193)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:268)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:156)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:468)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:342)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:407)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:392)
    at ClickingDelfi.submitComment(ClickingDelfi.java:26)
    at Main.run(Main.java:28)
    at Main.main(Main.java:13)
Caused by: net.sourceforge.htmlunit.corejs.javascript.EcmaError: TypeError: Cannot find function getElementsByClassName in object [object HTMLDocument]. (script in http://www.delfi.lt/ from (1545, 32) to (1548, 10)#1546)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3629)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3613)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError(ScriptRuntime.java:3634)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError2(ScriptRuntime.java:3650)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.notFunctionError(ScriptRuntime.java:3714)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getPropFunctionAndThisHelper(ScriptRuntime.java:2233)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getPropFunctionAndThis(ScriptRuntime.java:2215)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1333)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:411)
    at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:309)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3057)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:582)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:690)
    ... 34 more
Enclosed exception: 
net.sourceforge.htmlunit.corejs.javascript.EcmaError: TypeError: Cannot find function getElementsByClassName in object [object HTMLDocument]. (script in http://www.delfi.lt/ from (1545, 32) to (1548, 10)#1546)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3629)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3613)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError(ScriptRuntime.java:3634)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError2(ScriptRuntime.java:3650)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.notFunctionError(ScriptRuntime.java:3714)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getPropFunctionAndThisHelper(ScriptRuntime.java:2233)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getPropFunctionAndThis(ScriptRuntime.java:2215)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1333)
    at script(script in http://www.delfi.lt/ from (1545, 32) to (1548, 10):1546)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:411)
    at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:309)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3057)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:582)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:690)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:620)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:591)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:566)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptIfPossible(HtmlPage.java:975)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeInlineScriptIfNeeded(HtmlScript.java:349)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:409)
    at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:274)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:288)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:741)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:701)
    at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170)
    at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072)
    at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206)
    at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3126)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2093)
    at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:920)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:965)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:247)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:193)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:268)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:156)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:468)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:342)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:407)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:392)
    at ClickingDelfi.submitComment(ClickingDelfi.java:26)
    at Main.run(Main.java:28)
    at Main.main(Main.java:13)
======= EXCEPTION END ========

使用 url 作为https://www.google.lt 它有效。它仅适用于某些页面吗?我使用的 url 有什么替代方案可以使用吗?

【问题讨论】:

    标签: java eclipse macos htmlunit


    【解决方案1】:

    您可以使用WebClientOptions 来绕过脚本引擎抛出的异常。这可以通过以下方式完成:

        webClient.getOptions().setThrowExceptionOnScriptError(false);
    

    或者,您也可以完全禁用 Javascript:

        webClient.getOptions().setJavaScriptEnabled(false);
    

    【讨论】:

    • 然而,如果 javascript 被禁用,如何使用 HtmlUnit 评估 javascript?
    • 好吧,我不知道你的具体用例,所以如果你想评估或测试 Javascript,不要禁用它,只需将 setThrowExceptionsOnScriptError 设置为 false。根据 Javascript 错误和您的用例,错误可能可以忽略不计。您可能还想尝试切换BrowserVersion(例如,使用BrowserVersion.FIREFOX_24 我没有得到任何异常)。否则,恐怕除了修复网站本身或使用其他工具之外,您别无选择。
    【解决方案2】:

    我遇到了同样的问题,但我使用的是 HtmlUnitDriver 而不是直接使用 HtmlUnit 类。因为HtmlUnitDriver 不会公开getWebClient 类,所以我通过创建HtmlUnitDriverWrapper 添加了getWebClientOptions() 方法解决了这个问题。

    一旦我这样做了,我就可以使用以下命令禁用 javascript 异常。

    driver.getWebClientOptions().setThrowExceptionOnScriptError(false);
    

    不幸的是,这只是导致测试用例无声地失败,因此我在错误后注释掉了所有断言,并将 javascript 错误作为错误报告给开发团队。

    【讨论】:

      【解决方案3】:

      使用类似的东西

      public static void main(String[] args) throws IOException {
          final String url = "http://www.delfi.lt";
      
          try (final WebClient webClient = new WebClient()) {
              webClient.getOptions().setThrowExceptionOnScriptError(false);
      
              HtmlPage page = webClient.getPage(url);
              webClient.waitForBackgroundJavaScript(40_000);
      
              System.out.println(page.asXml());
          }
      }
      

      在对象 [object HTMLDocument] 中找不到函数 getElementsByClassName

      似乎指向一个非常旧的 HtmlUnit 版本。但也有最新的这个页面产生了许多 js 错误,但也许有足够的代码执行来获得你需要的结果。

      【讨论】:

        猜你喜欢
        • 2013-12-16
        • 1970-01-01
        • 1970-01-01
        • 1970-01-01
        • 1970-01-01
        • 2017-08-01
        • 1970-01-01
        • 2018-10-31
        • 1970-01-01
        相关资源
        最近更新 更多