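【Question title】: HTML tag parsing: extracting the title and other elements
【Posted】: 2014-10-13 23:31:39
【Question description】:

I am trying to extract the title from a website, and I get a "stream closed" error when printing the title. I want the text between the title tags (the <title> and </title> searched for in the code below). I am not familiar with parsing, so please be thorough when explaining. Thanks.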

import java.lang.*;
import java.util.Scanner;
import java.net.*;
import java.io.*;

public class Allrecipes{
  public static void main(String[] args) throws Exception{

    System.out.println("Colby Mehmen");

    Scanner input = new Scanner(System.in);
    String str1 = "";
    str1 = compare();

    if (str1.contains("http://allrecipes.com")){

        URL oracle = new URL(str1);
        BufferedReader in = new BufferedReader(
            new InputStreamReader(oracle.openStream()));

        String html;
        while ((html = in.readLine()) != null)

            in.close();

        String page = html;

        int start = page.indexOf("<title>");
        int end = page.indexOf("</title>");

        String title = page.substring(start+"<title>".length(),end);

        System.out.println(title);

    }//end program

  }

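【Comments on the question】:

  • It would be nice if you formatted your code.
  • @SotiriosDelimanolis Sorry about that, I cleaned it up a bit.
  • java.io.IOException: Stream closed
  • ^ The error I am getting now
  • See my updated answer. I would appreciate it if you accepted it.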
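
The "Stream closed" error quoted in the comments can be reproduced without a network connection. The sketch below is an illustration, not the asker's program: the class name `StreamClosedDemo`, the method `readLikeTheQuestion`, and the sample HTML are all hypothetical, and a `StringReader` stands in for the URL stream. It keeps the same loop shape as the question, where the `while` has no braces and `in.close()` therefore becomes the loop body.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class StreamClosedDemo {

    // Same loop shape as in the question: no braces, so in.close() is the loop body.
    // Returns the exception message, or "no error" if the loop finishes normally.
    static String readLikeTheQuestion(String text) {
        BufferedReader in = new BufferedReader(new StringReader(text));
        try {
            String html;
            while ((html = in.readLine()) != null)
                in.close();            // runs right after the first line is read
            return "no error";
        } catch (IOException e) {
            return e.getMessage();     // the next readLine() call fails on the closed reader
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input standing in for a downloaded page.
        System.out.println(readLikeTheQuestion("<html>\n<title>Demo</title>\n</html>"));
    }
}
```

Even if the reader were not closed early, `html` would hold only one line at a time and would be `null` once the loop ends, so the page has to be accumulated inside the loop and the reader closed afterwards.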

Tags: java html parsing bufferedreader


【Solution 1】:

JSoup

Try the jsoup API; it is really easy to use.

Document doc = Jsoup.connect(YOUR_WEBSITE).get();
Elements tt = doc.select("title");
System.out.println(tt.text());

Your code

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.Scanner;

public class Allrecipes {
    public static void main(String[] args) throws Exception {

        System.out.println("Colby Mehmen");

        // http://allrecipes.com/Recipe/Cardamom-Maple-Salmon/Detail.aspx?soid=carousel_0_rotd&prop24=rotd

        String str1 = compare();

        if (str1.contains("http://allrecipes.com")) {

            URL oracle = new URL(str1);
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    oracle.openStream()));

            // Collect the whole page first and close the reader only after the loop.
            // A StringBuilder also avoids the literal "null" prefix that
            // `String html = null; html += line;` would produce.
            StringBuilder html = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line);
            }
            in.close();

            String page = html.toString();

            int start = page.indexOf("<title>");
            int end = page.indexOf("</title>");

            // indexOf returns -1 when a tag is missing, so check before calling substring.
            if (start >= 0 && end > start) {
                String title = page.substring(start + "<title>".length(), end);
                System.out.println(title);
            }

        }// end program

    }

    public static String compare() {
        Scanner input = new Scanner(System.in);

        System.out.println("Enter recipe URL: ");
        String str1 = input.next();
        String str2 = "allrecipes.com";
        String str3 = "http://";

        // Prepend the scheme when the user typed the site name without it.
        if (str1.contains(str2) && !str1.contains(str3 + str2)) {
            str1 = str3 + str1;
        }

        // Compare string contents with equals(); != only compares references,
        // so the original check could never print ERROR.
        // A bare "http://allrecipes.com" with no recipe path is reported as an error.
        if (str1.equals(str3 + str2)) {
            System.out.println("ERROR");
        }

        // Echo the normalized URL.
        System.out.println(str1);

        return str1;
    }// end compare

}
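
If adding a library is not an option, the string-based extraction can be pulled into a small helper that is easy to try without a network connection. This is only a sketch under stated assumptions: the class name `TitleExtractor`, the methods `readAll` and `extractTitle`, and the sample HTML are hypothetical, and the search assumes a lowercase `<title>` tag with no attributes. Real pages can break those assumptions, which is one reason an HTML parser is usually the safer choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class TitleExtractor {

    // Reads every line from the source into one string; the reader is closed afterwards.
    static String readAll(Reader source) throws IOException {
        StringBuilder page = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    // Returns the trimmed text between <title> and </title>,
    // or null when either tag is missing.
    static String extractTitle(String page) {
        String open = "<title>";
        int start = page.indexOf(open);
        if (start < 0) {
            return null;
        }
        int end = page.indexOf("</title>", start + open.length());
        if (end < 0) {
            return null;
        }
        return page.substring(start + open.length(), end).trim();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample page; with a real URL, pass
        // new InputStreamReader(url.openStream()) to readAll instead.
        String sample = "<html><head>\n<title>Sample recipe page</title>\n</head></html>";
        System.out.println(extractTitle(readAll(new StringReader(sample))));
    }
}
```

Returning `null` for a missing tag keeps the caller from hitting a `StringIndexOutOfBoundsException` in `substring`, which is what happens when `indexOf` returns -1 and the result is used unchecked.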

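【Comments】:

  • When I use the code above, it says: cannot find symbol, symbol: class Document
  • You need to add the jsoup library first.
  • OK, once I fixed that the title prints, but it still repeats 0.
  • I don't know your code. jsoup returns the title as it is: if the title contains zeros, it returns zeros. I really don't understand your problem.
  • Your JSoup API link is broken. You probably meant jsoup.org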