【Question title】: Extract text in order using jsoup
【Posted】: 2016-04-25 04:56:57
【Question】:

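I want to extract the job title text and the text in the "summary" class. Many elements share the same class names, so I want the first result's job title followed by its summary, then the next result's job title and its summary, and so on, in that order.

The following code works, but it prints all the titles first and then all the text from every "summary" element. I want the first job title and the first summary, then the second job title and the second summary, and so on. How can I modify the code to do this? Please help.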

 <div class="  row  result" id="p_64c5268586001bd2" data-jk="64c5268586001bd2" itemscope="" itemtype="http://schema.org/JobPosting" data-tn-component="organicJob">
 <h2 id="jl_64c5268586001bd2" class="jobtitle">
 <a rel="nofollow" href="/rc/clk?jk=64c5268586001bd2" target="_blank" onmousedown="return rclk(this,jobmap[0],0);" onclick="return rclk(this,jobmap[0],true,0);" itemprop="title" title="Fashion Assistant" class="turnstileLink" data-tn-element="jobTitle"><b>Fashion</b> Assistant</a>
 </h2>
 <span class="company" itemprop="hiringOrganization" itemtype="http://schema.org/Organization">
    <span itemprop="name">
    <a href="/cmp/Itv?from=SERP&amp;campaignid=serp-linkcompanyname&amp;fromjk=64c5268586001bd2&amp;jcid=3bf3e8a57da58ff5" target="_blank">
ITV Jobs</a></span>
   </span>

     <a data-tn-element="reviewStars" data-tn-variant="cmplinktst2" class="turnstileLink " href="/cmp/Itv/reviews?jcid=3bf3e8a57da58ff5" title="Itv Jobs reviews" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=cmplinktst2&amp;from=SERP&amp;jt=Fashion+Assistant&amp;fromjk=64c5268586001bd2');" target="_blank">
    <span class="ratings"><span class="rating" style="width:49.5px;"><!-- -->        </span></span><span class="slNoUnderline">28 reviews</span></a>
<span itemprop="jobLocation" itemscope="" itemtype="http://schema.org/Place">      <span class="location" itemprop="address" itemscope="" itemtype="http://schema.org/Postaladdress"><span itemprop="addressLocality">London</span></span></span>
 <table cellpadding="0" cellspacing="0" border="0">
 <tbody><tr>
 <td class="snip">
 <div>
 <span class="summary" itemprop="description">
  Do you have a passion for <b>Fashion</b>? You will be responsible for     running our <b>fashion</b> cupboard, managing a team of interns and liaising with press officers to...</span>
   </div>

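// Original attempt: prints all titles together, then all summaries together.
Document doc = Jsoup.connect("http://www.indeed.co.uk/jobs?q=fashion&l=England").timeout(5000).get();
Elements f = doc.select(".jobtitle");   // every job title on the page
Elements e = doc.select(".summary");    // every summary on the page
System.out.println("Title: " + f.text());
System.out.println("Details: " + e.text());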

【Question comments】:

    Tags: java html web-scraping jsoup


    【Solution 1】:

    Iterate over the titles, then find the summary that belongs to each one:

    for (Element title : doc.select(".jobtitle")) {
        // The title's parent is the enclosing result <div>, so this finds the summary of the same job.
        Element summary = title.parent().select(".summary").first();
    
        System.out.format("Title: %s. Summary: %s%n", title.text(), summary.text());
    }
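
    The loop above assumes every title has a matching summary under the same parent; if one is missing, first() returns null and summary.text() throws a NullPointerException. A minimal, self-contained sketch of the same pairing idea is shown here, iterating over the result containers instead and guarding against a missing summary. The class name PairedJobs, the SAMPLE_HTML string, the "(no summary)" placeholder, and the "div.result" selector are assumptions made for illustration (the selector is inferred from the class="row result" markup in the question), not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class PairedJobs {

    // Hypothetical two-result page, trimmed down from the markup in the question.
    // The second result deliberately has no summary.
    static final String SAMPLE_HTML =
        "<div class=\"row result\">"
      + "<h2 class=\"jobtitle\"><a><b>Fashion</b> Assistant</a></h2>"
      + "<table><tbody><tr><td class=\"snip\"><div>"
      + "<span class=\"summary\">Run our fashion cupboard.</span>"
      + "</div></td></tr></tbody></table>"
      + "</div>"
      + "<div class=\"row result\">"
      + "<h2 class=\"jobtitle\"><a>Stylist</a></h2>"
      + "</div>";

    // Returns one "Title: ... Summary: ..." line per result, in page order.
    static List<String> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        // Each result container holds exactly one job, so searching inside it
        // keeps the title and the summary of the same job together.
        for (Element result : doc.select("div.result")) {
            Element title = result.select(".jobtitle").first();
            Element summary = result.select(".summary").first();
            if (title == null) {
                continue; // container without a title: not a job result
            }
            String summaryText = (summary == null) ? "(no summary)" : summary.text();
            lines.add(String.format("Title: %s. Summary: %s", title.text(), summaryText));
        }
        return lines;
    }

    public static void main(String[] args) {
        extract(SAMPLE_HTML).forEach(System.out::println);
    }
}
```

    To run it against the live page, the string passed to extract could be replaced by the Document returned from Jsoup.connect(...).get() as in the question; whether "div.result" still matches depends on the site's current markup.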
    

    【Comments】:
