【Question title】: Unable to navigate aspx pages via jsoup while scraping
【Posted】: 2015-06-15 17:15:59
【Question】:

I am scraping a URL (http://nvsos.gov/sosentitysearch/CorpSearch.aspx) with jsoup. I can scrape the first page of results, but I cannot navigate to the second page.

Here is the code snippet:

try
{
    String url = "http://nvsos.gov/sosentitysearch/CorpSearch.aspx";
    Connection.Response response = Jsoup.connect(url).method(Connection.Method.GET).execute();
    Document responseDocument = response.parse();

    Element eventValidation = responseDocument.select("input[name=__EVENTVALIDATION]").first();
    Element viewState = responseDocument.select("input[name=__VIEWSTATE]").first();

    //javascript:__doPostBack('ctl00$MainContent$objSearchGrid$dgCorpSearchResults$ctl54$ctl01','')
    response = Jsoup.connect(url)
    .data("__VIEWSTATE", viewState.attr("value"))
    .data("__EVENTVALIDATION", eventValidation.attr("value"))
    .data("ctl00$MainContent$txtSearchBox", "apple")  // <- search 
    .data("ctl00$MainContent$btnCorpSearch", "Search")
    .data("ctl00$MainContent$ddlCorpSortColumns", "m")
    .data("ctl00$MainContent$ddlCorpNumSortColumns", "m")
    .data("ctl00$MainContent$ddlOfficerSortColumns", "m")
    .data("ctl00$MainContent$ddlRASortColumns", "m")
    .data("ctl00$MainContent$ddlABNSortColumns", "m")
    .data("ctl00$MainContent$rdlSortOrder", "d")
    .data("ctl00$MainContent$objSearchGrid$dgCorpSearchResults$ctl54$ctl01", "")
    .method(Connection.Method.POST)
    .followRedirects(true)
    .execute();

    Document document = response.parse(); //search results
    System.out.println(document);

}
catch (IOException e)
{
    e.printStackTrace();
}

The line .data("ctl00$MainContent$objSearchGrid$dgCorpSearchResults$ctl54$ctl01", "") is meant to navigate to the second page, but the response is always the first page.

【Comments】:

    Tags: java c# asp.net web-scraping jsoup


    【Solution 1】:

    You are probably missing some cookies. Try the code below:

    response = Jsoup.connect(url)
    .cookies(response.cookies()) // Add cookies received when fetching the first page
    .data("__VIEWSTATE", viewState.attr("value"))
    .data("__EVENTVALIDATION", eventValidation.attr("value"))
    .data("ctl00$MainContent$txtSearchBox", "apple")  // <- search 
    .data("ctl00$MainContent$btnCorpSearch", "Search")
    .data("ctl00$MainContent$ddlCorpSortColumns", "m")
    .data("ctl00$MainContent$ddlCorpNumSortColumns", "m")
    .data("ctl00$MainContent$ddlOfficerSortColumns", "m")
    .data("ctl00$MainContent$ddlRASortColumns", "m")
    .data("ctl00$MainContent$ddlABNSortColumns", "m")
    .data("ctl00$MainContent$rdlSortOrder", "d")
    .data("ctl00$MainContent$objSearchGrid$dgCorpSearchResults$ctl54$ctl01", "")
    .method(Connection.Method.POST)
    .followRedirects(true)
    .execute();
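
If the cookies alone do not help, it may be worth checking how the pager link is submitted. In ASP.NET Web Forms, a link such as javascript:__doPostBack('target','argument') does not post the control name as a field of its own. The script copies its two arguments into the hidden fields __EVENTTARGET and __EVENTARGUMENT and then submits the form. The sketch below only builds that set of fields from a link. The class name PostBackForm and the method fieldsFor are hypothetical helpers, not part of jsoup, and the placeholder values in main are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of jsoup): turns a __doPostBack link into form fields.
public class PostBackForm {

    // Matches __doPostBack('target','argument') inside an href or onclick value.
    private static final Pattern DO_POST_BACK =
            Pattern.compile("__doPostBack\\('([^']*)'\\s*,\\s*'([^']*)'\\)");

    /**
     * Builds the postback fields for the given link.
     * viewState and eventValidation should be read from the same page that
     * contains the link (here: the search results page, not the empty form).
     */
    public static Map<String, String> fieldsFor(String href,
                                                String viewState,
                                                String eventValidation) {
        Matcher m = DO_POST_BACK.matcher(href);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a __doPostBack link: " + href);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("__EVENTTARGET", m.group(1));   // control that raised the postback
        fields.put("__EVENTARGUMENT", m.group(2)); // usually empty for pager links
        fields.put("__VIEWSTATE", viewState);
        fields.put("__EVENTVALIDATION", eventValidation);
        return fields;
    }

    public static void main(String[] args) {
        // Placeholder values; in real use they come from the parsed results page.
        String href = "javascript:__doPostBack("
                + "'ctl00$MainContent$objSearchGrid$dgCorpSearchResults$ctl54$ctl01','')";
        Map<String, String> fields = fieldsFor(href, "<view state>", "<event validation>");
        fields.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

The resulting map can be passed to jsoup through Connection.data(Map<String, String>), together with the cookies and the search box and sort fields from the question. Two assumptions to verify against the real site: the __VIEWSTATE and __EVENTVALIDATION values probably need to come from the search results response rather than from the initial GET, and the paging request should probably leave out ctl00$MainContent$btnCorpSearch, since posting the button's name is how the server recognizes a fresh search.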
    
