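【Posted】:2018-06-03 09:54:17
【Question】:
I am trying to scrape the pagination links of a GitHub repository. I have already scraped them one by one, but now I would like to replace the repeated steps with a loop. Any idea how I can do that? Here is the code: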
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String ComitUrl = "http://github.com/apple/turicreate/commits/master";

// Page 1: only one pagination link is active here, so take index 0.
Document document2 = Jsoup.connect(ComitUrl).get();
Element pagination = document2.select("div.pagination a").get(0);
String Url1 = pagination.attr("href");
System.out.println("pagination-link1 = " + Url1);

// Page 2: both pagination links are active, so the next one is at index 1.
Document document3 = Jsoup.connect(Url1).get();
Element pagination2 = document3.select("div.pagination a").get(1);
String Url2 = pagination2.attr("href");
System.out.println("pagination-link2 = " + Url2);

// Page 3: a disabled "Older" span means this is the last page.
Document document4 = Jsoup.connect(Url2).get();
Element check = document4.select("span.disabled").first();
// first() returns null when nothing matches, so guard before calling text().
if (check != null && "Older".equals(check.text())) {
    System.out.println("No pagination link more");
} else {
    Element pagination3 = document4.select("div.pagination a").get(1);
    String Url3 = pagination3.attr("href");
    System.out.println("pagination-link3 = " + Url3);
}
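The three near-identical steps above could be folded into a single loop. The following is only a sketch of that idea, not a tested scraper. The class name `PaginationWalker`, the method `collectPageLinks`, and the `maxPages` safety limit are hypothetical names introduced for illustration. To keep the loop independent of the network and of the page markup, the step "find the next page link" is passed in as a function.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper: the names here are illustrative, not part of jsoup.
class PaginationWalker {

    /**
     * Follows "next page" links starting at startUrl and returns every page
     * URL visited, in order. nextPage maps a page URL to the URL of the
     * following page, or Optional.empty() when there is no further page.
     * The walk also stops after maxPages pages or when a URL repeats.
     */
    static List<String> collectPageLinks(String startUrl,
                                         Function<String, Optional<String>> nextPage,
                                         int maxPages) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Optional<String> current = Optional.of(startUrl);

        // seen.add(...) returns false for a URL already visited, which ends the loop.
        while (current.isPresent()
                && visited.size() < maxPages
                && seen.add(current.get())) {
            visited.add(current.get());
            current = nextPage.apply(current.get());
        }
        return visited;
    }
}
```

With jsoup, the `nextPage` function would presumably fetch the page with `Jsoup.connect(url).get()`, select `div.pagination a`, and return the `absUrl("href")` of the anchor whose text is "Older", or `Optional.empty()` when no such anchor exists (on the last page "Older" is rendered as a disabled span rather than a link). This assumes the markup matches the selectors used in the question. Since `get()` throws a checked `IOException`, it would need to be caught or wrapped inside the lambda.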
【Comments】:
-
Has your problem been solved? If not, I can help.
Tags: java web-scraping jsoup