【Question title】: HtmlUnit not finding form and not handling postback
【Posted】: 2019-09-20 01:38:15
【Question】:

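I'm trying to use HtmlUnit (2.35) in Java on this url https://www.pharmacy.ohio.gov/Licensing/RosterRequests.aspx to select options from the radio buttons, then click the download button and receive the file.

I'm pretty sure I'm setting the radio buttons correctly, but I'm not sure whether I'm actually pushing the button or, if I am, how to detect the start of the download, which is done via postback (I think).

I've tried waiting for the Javascript, turning the Javascript off, looping for 60 seconds while checking the contentType, and creating a listener.

I also wanted to load the form from the page, since the HtmlUnit click() action may just be triggering Javascript without doing the post, but HtmlUnit can't seem to find the form on the page, even though there is one.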

public static void main( String[] args ) throws IOException, InterruptedException {
        WebClient webClient;
        webClient = new WebClient( BrowserVersion.FIREFOX_60 );                    

        webClient.getOptions().setJavaScriptEnabled(false);
        webClient.getOptions().setUseInsecureSSL(true); 
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setSSLClientProtocols(new String[]{"TLSv1.2","TLSv1.1","TLSv1"});  
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());

        HtmlPage MainPage = (HtmlPage) webClient.getPage("https://www.pharmacy.ohio.gov/Licensing/RosterRequests.aspx");
        HtmlElement body = MainPage.getBody();
        if (dbg) System.out.println("MainPage = " + MainPage); 

        // All of the below are empty:

        System.out.println( "MainPageForm = " + MainPage.getFirstByXPath( "//*[@id=\"form1\"]"));
        System.out.println( "Form List = " + MainPage.getElementsByIdAndOrName( "form#form1"));
        System.out.println( "Form List = " + MainPage.getForms());
        System.out.println( "Form? = " + MainPage.querySelector("#form1"));
        System.out.println( "Form? = " + MainPage.getFirstByXPath( "//form[@action=\"RosterRequests.aspx\"]" ));
        System.out.println( "Form? = " + MainPage.getElementById( "#form1"));
        System.out.println( "MainPageButton = " + MainPage.getFirstByXPath( "//*[@id=\"phBody_rblLicenseType_5\"]") );

        // Code to click buttons:

        HtmlRadioButtonInput rad_status = (HtmlRadioButtonInput) MainPage.getHtmlElementById("phBody_rblLicenseStatus_1");
        rad_status.setChecked(true);
        HtmlRadioButtonInput rad_tddd = MainPage.getHtmlElementById("phBody_rblLicenseType_1");
        rad_tddd.setChecked(true);
        HtmlInput btn_download = (HtmlInput) MainPage.getHtmlElementById("phBody_btnSubmit");
        WebResponse response = btn_download.click().getWebResponse();

        // ContentType never changes

        int tries = 30;

        while ( tries > 0 ) {
            //System.out.println( response.getWebRequest().toString());
            System.out.println( response.getContentType());
            synchronized (response) { response.wait(1000); }
            tries--;
        }

        webClient.close();


    }

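Nothing is ever downloaded and the ContentType never changes. In a browser, dev tools show that the aspx page is reloaded with a different ContentType, which triggers the download dialog.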

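【Comments】:

  • Did some analysis on this. It looks like the root cause of the problem is in HtmlUnit. Please open an issue (github.com/HtmlUnit/htmlunit/issues).
  • Thanks! I'll do that. Appreciate the analysis!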

Tags: forms postback htmlunit


【Solution 1】:

This works with HtmlUnit 2.36.0 (or at least with the latest 2.35.0-SNAPSHOT).

final String url = "https://www.pharmacy.ohio.gov/Licensing/RosterRequests.aspx";

try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_60)) {
    HtmlPage page = webClient.getPage(url);

    HtmlRadioButtonInput rad_status = (HtmlRadioButtonInput)page.getHtmlElementById("phBody_rblLicenseStatus_1");
    rad_status.setChecked( true );
    HtmlRadioButtonInput rad_tddd = page.getHtmlElementById( "phBody_rblLicenseType_1");
    rad_tddd.setChecked( true );

    HtmlInput btn_download = (HtmlInput)page.getHtmlElementById( "phBody_btnSubmit" );
    WebResponse response = btn_download.click().getWebResponse();

    try (InputStream in = response.getContentAsStream();
         FileOutputStream out = new FileOutputStream("c:/tmp/test.xlsx")) {
        byte[] buffer = new byte[8 * 1024];
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, bytesRead);
        }
    }
}
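
The manual buffer loop above can probably be shortened with `java.nio.file.Files.copy`, which is part of the JDK since Java 7. The following is only a minimal sketch: `DownloadSaver` and `saveTo` are hypothetical names that do not come from HtmlUnit or from the answer, and the `main` method feeds in a fake in-memory body instead of a real `WebResponse` so it runs without network access.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class (not part of HtmlUnit).
class DownloadSaver {

    // Copies the whole stream to the target file, replacing any existing
    // file, closes the stream, and returns the number of bytes written.
    static long saveTo(InputStream in, Path target) throws IOException {
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for response.getContentAsStream(); the bytes are made up.
        byte[] fakeBody = {1, 2, 3, 4};
        Path target = Files.createTempFile("roster", ".xlsx");
        long written = saveTo(new ByteArrayInputStream(fakeBody), target);
        System.out.println(written + " bytes written to " + target);
    }
}
```

In the snippet above, the inner try block would then become something like `DownloadSaver.saveTo(response.getContentAsStream(), Paths.get("c:/tmp/test.xlsx"));`, assuming the response really carries the file and not the HTML page.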
