【Question title】:Selenium And Java: Exception in thread "main" org.openqa.selenium.NoSuchWindowException: no such window: target window already closed
【Posted】:2022-02-05 00:22:21
【Question description】:

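I'm visiting a Quebec legal website and trying to scrape the names of all its laws and their associated PDFs. To do this, I open a tab for each law and then go through all of those tabs to get the information I'm looking for. However, after going through the tabs for a while, I get the following error: "Exception in thread "main" org.openqa.selenium.NoSuchWindowException: no such window: target window already closed". I'm not sure why this happens. I believe it is because the number of tabs is so large, since the same code works fine with a smaller number of tabs. Here is my code:`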

System.setProperty("webdriver.chrome.driver", "C:\\WorkSpace\\Driver\\chromedriver.exe");
        WebDriver driver = new ChromeDriver();
        WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(5000));
        driver.manage().window().maximize();
        driver.get("http://www.legisquebec.gouv.qc.ca/en/chapters?corpus=regs&selection=all");
        wait.until(ExpectedConditions.numberOfElementsToBeMoreThan(By.cssSelector("tr.clickable a"), 100));
        Thread.sleep(50);

        List<WebElement> QuebecConsolidatedRegulations = driver.findElements(By.cssSelector("tr.clickable a"));
        String parent = driver.getWindowHandle();
        for (int i=0; i<QuebecConsolidatedRegulations.size(); i++){
            String opentabs = Keys.chord(Keys.CONTROL, Keys.ENTER);
            
            ((JavascriptExecutor) driver).executeScript("arguments[0].scrollIntoView(true);", QuebecConsolidatedRegulations.get(i));
            Thread.sleep(300);
            wait.until(ExpectedConditions.visibilityOf(QuebecConsolidatedRegulations.get(i)));
            QuebecConsolidatedRegulations.get(i).sendKeys(opentabs);
        }

        int i=0; 
            Set<String> tabs = driver.getWindowHandles();
            for (String child:tabs){
                // try{
                    if (!parent.equalsIgnoreCase(child)){
                        driver.close();
                        
                        driver.switchTo().window(child);
                        String StatuteName = driver.findElement(By.xpath("//*[@id='form']/div[2]/div[1]/h3")).getText();
                        String pagePdfUrl = driver.findElement(By.xpath("//*[@id='renditionPdf']")).getAttribute("href");
                        driver.manage().timeouts().implicitlyWait(Duration.ofSeconds(10));
                        if (i<QuebecConsolidatedRegulations.size()){
                            ConsolidatedRegulationsAndPDFs.put(StatuteName, pagePdfUrl);
                            i+=1;
                        }
                        else{
                            continue;
                        }
                    }
               // }
                // catch(NoSuchWindowException e){
                //     continue;
                // }
                
               
                    
            }
            return ConsolidatedRegulationsAndPDFs;
    }`

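【Comments】:

  • Please tell us which line the error comes from. Also, you said you tried with fewer tabs and it worked? Why not open all the laws in the same tab?
  • My apologies. Line 117 is where the error occurs. Also, I'm not sure how I would open all the laws in one tab? Opening a law results in a new tab. Do you mean opening one law and then going back to the main page to access the other laws?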
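One way to read the same-tab suggestion is: collect every link URL as a plain string first, then navigate the single existing tab to each URL in turn. The sketch below only illustrates that control flow and is not the commenter's code. `scrapeInSameTab` and `readPage` are hypothetical names, and it assumes each `tr.clickable a` element exposes an `href` that can be read with `getAttribute("href")` before any navigation happens.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class SameTabScrape {
    /**
     * Visits each URL in order and stores the (name, pdfUrl) pair returned by readPage.
     * readPage is a hypothetical hook. With Selenium it could look like:
     *   url -> {
     *       driver.get(url); // reuse the current tab instead of opening a new one
     *       return Map.entry(
     *           driver.findElement(By.xpath("//*[@id='form']/div[2]/div[1]/h3")).getText(),
     *           driver.findElement(By.xpath("//*[@id='renditionPdf']")).getAttribute("href"));
     *   }
     */
    static Map<String, String> scrapeInSameTab(
            List<String> urls,
            Function<String, Map.Entry<String, String>> readPage) {
        Map<String, String> results = new LinkedHashMap<>(); // keeps visiting order
        for (String url : urls) {
            Map.Entry<String, String> page = readPage.apply(url);
            results.put(page.getKey(), page.getValue());
        }
        return results;
    }

    public static void main(String[] args) {
        // Fake page reader so the sketch runs without a browser; the URLs are placeholders.
        List<String> urls = List.of("https://example.test/a", "https://example.test/b");
        Map<String, String> out =
                scrapeInSameTab(urls, url -> Map.entry("name of " + url, url + ".pdf"));
        out.forEach((name, pdf) -> System.out.println(name + "=" + pdf));
    }
}
```

Reading the URLs into strings up front matters because `WebElement` references from the list page generally become stale once the tab navigates away from it.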

Tags: java selenium web-scraping


【Solution 1】:

This definitely depends on the number of open tabs.

I did some research with the following script (Groovy):

static void main(a) {
    WebDriverManager.chromedriver().setup()
    WebDriver driver = new ChromeDriver(new ChromeOptions())
    driver.get('https://nbc.com')
    300.times {
        driver.executeScript("window.open('https://nbc.com')")
        println('windows size:' + driver.getWindowHandles().size())
        driver.switchTo().window(driver.getWindowHandles().first())
        driver.switchTo().window(driver.getWindowHandles().last())
    }
    driver.quit()
}

It failed with

Exception in thread "main" org.openqa.selenium.WebDriverException: chrome not reachable

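after 92 new tabs had been opened (note: I have 16 GB of RAM). The number of tabs you can open before the failure depends on the site and on how much memory it consumes.

Your page has 3000+ QuebecConsolidatedRegulations items.


I suggest not opening that many tabs.

Instead, I suggest doing the following inside the loop:

  • open a new tab,
  • collect the data,
  • close the tab,
  • switch back to the first window.

This way you will have only 1 or 2 tabs open at any time, no more.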

List<WebElement> QuebecConsolidatedRegulations = driver.findElements(By.cssSelector("tr.clickable a"));
String parent = driver.getWindowHandle();
for (int i=0; i<QuebecConsolidatedRegulations.size(); i++){
    String opentabs = Keys.chord(Keys.CONTROL, Keys.ENTER);
    ((JavascriptExecutor) driver).executeScript("arguments[0].scrollIntoView(true);", QuebecConsolidatedRegulations.get(i));
    Thread.sleep(300);
    wait.until(ExpectedConditions.visibilityOf(QuebecConsolidatedRegulations.get(i)));
    QuebecConsolidatedRegulations.get(i).sendKeys(opentabs);
    // do the action within a loop but in new window
    addPdfUrlFromTheNewPage(driver, ConsolidatedRegulationsAndPDFs);
}

Implement the method addPdfUrlFromTheNewPage:

public static void addPdfUrlFromTheNewPage(WebDriver driver, Map<String, String> resultsMap) {
    // getWindowHandles() returns a Set; copy it into a List so the newest tab can be picked by index
    List<String> tabs = new ArrayList<>(driver.getWindowHandles());
    String lastTab = tabs.get(tabs.size()-1);
    driver.switchTo().window(lastTab);
    // set the implicit wait before the lookups so that it actually applies to them
    driver.manage().timeouts().implicitlyWait(Duration.ofSeconds(10));
    String StatuteName = driver.findElement(By.xpath("//*[@id='form']/div[2]/div[1]/h3")).getText();
    String pagePdfUrl = driver.findElement(By.xpath("//*[@id='renditionPdf']")).getAttribute("href");
    resultsMap.put(StatuteName, pagePdfUrl);
    driver.close(); //close current window
    driver.switchTo().window(tabs.get(0));  //switch to initial window
}
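
The helper above picks the last entry of getWindowHandles(), which assumes the handles come back in opening order. A slightly more defensive variant is to snapshot the handles before opening the tab and then take whichever handle is new afterwards. The following is only a minimal sketch of that set-difference step, with plain strings standing in for real window handles; `findNewHandle` is a hypothetical helper name, not part of Selenium.

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

class NewTabHandle {
    /**
     * Returns a handle that is present in `after` but not in `before`,
     * i.e. the tab that was opened between the two snapshots, if any.
     */
    static Optional<String> findNewHandle(Set<String> before, Set<String> after) {
        Set<String> diff = new LinkedHashSet<>(after);
        diff.removeAll(before); // keep only handles that did not exist earlier
        return diff.stream().findFirst();
    }

    public static void main(String[] args) {
        // Stand-ins for driver.getWindowHandles() taken before and after Ctrl+Enter.
        Set<String> before = Set.of("parent");
        Set<String> after = Set.of("parent", "child-1");
        System.out.println(findNewHandle(before, after).orElse("none")); // prints child-1
    }
}
```

In the scraping loop, `before` would be taken just before sendKeys(opentabs) and `after` once the new tab exists, for example after waiting on ExpectedConditions.numberOfWindowsToBe(2). The returned handle would then be passed to driver.switchTo().window(...).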

It ran for 20+ minutes, and I think 1500+ items were scraped. Then I stopped the execution. I think it works correctly.

Output:

2018C23, r. 1 - Regulation respecting certain transitional measures for the carrying out of the Act mainly to improve the regulation of the financial sector, the protection of deposits of money and the operation of financial institutions=http://www.legisquebec.gouv.qc.ca/en/pdf/cr/2018C23,%20R.%201.pdf
A-2.02, r. 1 - Regulation respecting the application of the Act to promote access to justice through the establishment of the Service administratif de rajustement des pensions alimentaires pour enfants=http://www.legisquebec.gouv.qc.ca/en/pdf/cr/A-2.02,%20R.%201.pdf
A-2.1, r. 1 - Code of ethics of the members of the Commission d’accès à l’information=http://www.legisquebec.gouv.qc.ca/en/pdf/cr/A-2.1,%20R.%201.pdf
A-2.1, r. 2 - Regulation respecting the distribution of information and the protection of personal information=http://www.legisquebec.gouv.qc.ca/en/pdf/cr/A-2.1,%20R.%202.pdf
A-2.1, r. 3 - Regulation respecting fees for the transcription, reproduction or transmission of documents or personal information=http://www.legisquebec.gouv.qc.ca/en/pdf/cr/A-2.1,%20R.%203.pdf
A-2.1, r. 4 - Regulation respecting public bodies that must refuse to release or to confirm the existence of certain information=http://www.legisquebec.gouv.qc.ca/en/pdf/cr/A-2.1,%20R.%204.pdf
A-2.1, r. 5 - Regulation respecting the procedure for selecting persons qualified for appointment as members of the Commission d’accès à l’information=http://www.legisquebec.gouv.qc.ca/en/pdf/cr/A-2.1,%20R.%205.pdf
A-2.1, r. 6 - Rules of Proof and Procedure before the Commission d’accès à l’information=http://www.legisquebec.gouv.qc.ca/en/pdf/cr/A-2.1,%20R.%206.pdf
A-3, r. 1 - Regulation respecting financial assistance=http://www.legisquebec.gouv.qc.ca/en/pdf/cr/A-3,%20R.%201.pdf
A-3, r. 2 - Regulation respecting the impairment scale=http://www.legisquebec.gouv.qc.ca/en/pdf/cr/A-3,%20R.%202.pdf
A-3, r. 3 - Regulation respecting payment of expenses for organizing and maintaining rescue stations in mines by the Commission des normes, de l’équité, de la santé et de la sécurité du travail, and the reimbursement by interested employers of sums disbursed=http://www.legisquebec.gouv.qc.ca/en/pdf/cr/A-3,%20R.%203.pdf
A-3, r. 4 - Regulation respecting reimbursement of damaged or destroyed clothing, prosthesis or orthesis=http://www.legisquebec.gouv.qc.ca/en/pdf/cr/A-3,%20R.%204.pdf
A-3, r. 5 - Regulation respecting the transportation of the body of a worker=http://www.legisquebec.gouv.qc.ca/en/pdf/cr/A-3,%20R.%205.pdf
A-3.001, r. 1 - Regulation respecting medical aid=http://www.legisquebec.gouv.qc.ca/en/pdf/cr/A-3.001,%20R.%201.pdf

... +3000 lines

【Comments】:

  • You're the best! Thank you very much.