【Posted】: 2014-09-02 18:17:59
【Question】:
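I am trying to visit every link on a website and check its status (HTTP 200, 500, etc.). I'm having trouble with the new windows that some links open: a few links open in a new window, while others open in the same window. How can I detect a new window, switch to it, and then switch back to the main window? Here is my code so far: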
import java.util.ArrayList;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.StaleElementReferenceException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;

public class TestLink {
    // list to save visited links (keyed by link text)
    static List<String> links = new ArrayList<String>();
    WebDriver driver;

    public TestLink(WebDriver driver) {
        this.driver = driver;
    }

    public void linkTest() {
        // loop over all the a elements in the page
        try {
            for (WebElement link : driver.findElements(By.tagName("a"))) {
                // Check if link is displayed and not previously visited
                if (link.isDisplayed()
                        && !links.contains(link.getText())) {
                    // add link to list of links already visited
                    links.add(link.getText());
                    System.out.println(link.getText());
                    // click on the link. This opens a new page
                    // (in the same window for some links, in a new window for others)
                    link.click();
                    // call linkTest on the new page
                    new TestLink(driver).linkTest();
                }
            }
            driver.navigate().back();
        } catch (StaleElementReferenceException e) {
            // thrown when an element found on an earlier page is used after the page changed
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        WebDriver driver = new HtmlUnitDriver();
        driver.get("http://www.flipkart.com/");
        // start recursive linkTest
        new TestLink(driver).linkTest();
    }
}
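Selenium identifies each browser window by a string handle: `driver.getWindowHandle()` returns the current one and `driver.getWindowHandles()` returns all open ones. One way to tell whether a click opened a new window is to compare the set of handles before and after the click. Below is a minimal sketch of that comparison in plain Java, so it runs without a browser; `WindowHandles` and `newWindowHandle` are hypothetical names, and the comment at the bottom shows, under the assumption that `driver` and `link` are the ones from the loop above, how it could plug in using Selenium's `switchTo().window(...)` and `close()`. With a real browser the new window may appear slightly after `click()` returns, so you may need to poll (for example with `WebDriverWait`) before comparing.

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper: given the window handles that existed before a click and
// those that exist after it, returns the handle of a window the click opened.
final class WindowHandles {
    private WindowHandles() {}

    static Optional<String> newWindowHandle(Set<String> before, Set<String> after) {
        // Handles present after the click but not before it belong to new windows
        Set<String> opened = new LinkedHashSet<>(after);
        opened.removeAll(before);
        // Usually at most one window opens per click; take the first if several did
        return opened.stream().findFirst();
    }

    /*
     * Intended use with Selenium WebDriver (assumes driver and link from linkTest()):
     *
     *   String mainWindow = driver.getWindowHandle();
     *   Set<String> before = driver.getWindowHandles();
     *   link.click();
     *   Optional<String> popup = WindowHandles.newWindowHandle(before, driver.getWindowHandles());
     *   if (popup.isPresent()) {
     *       driver.switchTo().window(popup.get());   // work in the new window
     *       System.out.println(driver.getCurrentUrl());
     *       driver.close();                          // close it...
     *       driver.switchTo().window(mainWindow);    // ...then return to the main window
     *   } else {
     *       driver.navigate().back();                // the link opened in the same window
     *   }
     */
}
```

Comparing handle sets rather than counting windows keeps working when other windows are already open, because only handles that were not there before the click are reported.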
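Edit
The following code works when I pass a URL as a string, but I want the status code of every link on the site. How can I build the URL for each link dynamically?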
// Requires: import java.io.IOException; import com.gargoylesoftware.htmlunit.WebClient;
public static int getResponseCode(String url) {
    try {
        WebClient client = new WebClient();
        // Return 4xx/5xx status codes instead of throwing FailingHttpStatusCodeException
        client.getOptions().setThrowExceptionOnFailingStatusCode(false);
        if (url != null) {
            return client.getPage(url).getWebResponse().getStatusCode();
        }
    } catch (IOException ioe) {
        throw new RuntimeException(ioe);
    }
    // No URL was given
    return 0;
}
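In Selenium, `getAttribute("href")` on an `<a>` element usually returns the URL already resolved to an absolute form, so in many cases that value can be passed straight to `getResponseCode`. An href can still be missing, or be a `javascript:`, `mailto:` or fragment-only link, so a small filter helps. Below is a minimal sketch; `LinkUrls` and `toAbsoluteUrl` are hypothetical names, and it also resolves relative values with `java.net.URI` in case an href arrives unresolved (for example when read from raw HTML).

```java
import java.net.URI;
import java.util.Optional;

// Hypothetical helper (not part of Selenium or HtmlUnit): turns an <a href> value
// into an absolute http(s) URL suitable for getResponseCode(String).
final class LinkUrls {
    private LinkUrls() {}

    /**
     * Resolves href against the URL of the page it was found on. Returns empty for
     * missing or unparsable hrefs and for non-HTTP schemes such as javascript: or
     * mailto:. Any #fragment is dropped, since it does not change the HTTP request.
     */
    static Optional<String> toAbsoluteUrl(String pageUrl, String href) {
        if (href == null || href.trim().isEmpty()) {
            return Optional.empty();
        }
        try {
            URI base = URI.create(pageUrl);
            // Give a bare "http://host" base an explicit "/" path; java.net.URI can
            // mis-resolve relative paths against an empty base path.
            if (base.getRawPath() == null || base.getRawPath().isEmpty()) {
                base = base.resolve("/");
            }
            URI resolved = base.resolve(href.trim());
            String scheme = resolved.getScheme();
            if (scheme == null
                    || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
                return Optional.empty();
            }
            // '#' can only appear unescaped as the fragment delimiter
            String url = resolved.toString();
            int hash = url.indexOf('#');
            return Optional.of(hash >= 0 ? url.substring(0, hash) : url);
        } catch (IllegalArgumentException e) {
            // URI.create and URI.resolve reject strings that are not valid URIs
            return Optional.empty();
        }
    }
}
```

For example, inside the crawler you could loop over `driver.findElements(By.tagName("a"))`, pass `driver.getCurrentUrl()` and each element's `getAttribute("href")` to `toAbsoluteUrl`, and call `getResponseCode` on every URL it returns. Collecting these URL strings before clicking anything also sidesteps the `StaleElementReferenceException` that occurs when the loop keeps using elements from a page the driver has already left.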
【Discussion】:
Tags: java selenium web-crawler