HTML tag parsing: extracting the title and other elements (answers)

[Question title]: HTML tag parsing extracting title and others
[Posted]: 2014-10-13 23:31:39
[Question description]:

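I am trying to extract the title from a website, but printing the title fails with a "Stream closed" error. I want the text between the <title> and </title> tags. I am not familiar with parsing, so please be thorough in your explanation. Thanks.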

import java.lang.*;
import java.util.Scanner;
import java.net.*;
import java.io.*;

public class Allrecipes {
    public static void main(String[] args) throws Exception {

        System.out.println("Colby Mehmen");

        Scanner input = new Scanner(System.in);
        String str1 = "";
        str1 = compare();

        if (str1.contains("http://allrecipes.com")) {

            URL oracle = new URL(str1);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(oracle.openStream()));

            String html;
            while ((html = in.readLine()) != null)

                in.close();

            String page = html;

            int start = page.indexOf("<title>");
            int end = page.indexOf("</title>");

            String title = page.substring(start + "<title>".length(), end);

            System.out.println(title);

        }//end program

    }

[Discussion]:

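It would be great if you could format your code.
@SotiriosDelimanolis Sorry about that, I cleaned it up a bit.
java.io.IOException: Stream closed
^ The error I am getting now
See my updated answer. I would appreciate it if you accepted it.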
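
The "Stream closed" exception reported above can be reproduced without any network access. The following is a minimal sketch, not the asker's program: the class name `StreamClosedDemo` and both helper methods are hypothetical, and a `StringReader` stands in for the URL stream. It shows that a `while` loop without braces takes `in.close()` as its body, so the reader is closed after the first line and the next `readLine()` call fails.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class StreamClosedDemo {

    // Mirrors the loop in the question: with no braces, in.close() IS the loop body.
    public static String readLikeQuestion(String text) {
        BufferedReader in = new BufferedReader(new StringReader(text));
        try {
            String html;
            while ((html = in.readLine()) != null)
                in.close(); // runs after the first line, so the next readLine() throws
            return html;    // only reached for empty input, and html is null here
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    // Reads every line first; try-with-resources closes the reader after the loop.
    public static String readAll(String text) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample page standing in for the downloaded HTML.
        String page = "<html>\n<head><title>Demo</title></head>\n</html>";
        System.out.println(readLikeQuestion(page)); // IOException: Stream closed
        System.out.print(readAll(page));
    }
}
```

Note that even if the loop in the question had finished without an exception, `html` would be `null` at that point (that is the loop's exit condition), so the page text has to be accumulated in a separate variable while reading.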

Tags: java html parsing bufferedreader

[Solution 1]:

JSoup

Try the jsoup API; it works really well.

Document doc = Jsoup.connect(YOUR_WEBSITE).get();
Elements tt = doc.select("title");
System.out.println(tt.text());

Your code

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.Scanner;

public class Allrecipes {
    public static void main(String[] args) throws Exception {

        System.out.println("Colby Mehmen");

        // http://allrecipes.com/Recipe/Cardamom-Maple-Salmon/Detail.aspx?soid=carousel_0_rotd&prop24=rotd

        String str1 = compare();

        if (str1.contains("http://allrecipes.com")) {

            URL oracle = new URL(str1);

            // Accumulate the whole page; the reader is closed only after the loop.
            // A StringBuilder avoids the literal "null" prefix that
            // `String html = null; html += line;` would produce.
            StringBuilder html = new StringBuilder();
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    oracle.openStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    html.append(line);
                }
            }

            String page = html.toString();

            int start = page.indexOf("<title>");
            int end = page.indexOf("</title>");

            // Call substring only when both tags were actually found.
            if (start >= 0 && end > start) {
                String title = page.substring(start + "<title>".length(), end);
                System.out.println(title);
            } else {
                System.out.println("No <title> element found");
            }

        }// end program

    }

    public static String compare() {
        Scanner input = new Scanner(System.in);

        System.out.println("Enter recipe URL: ");
        String str1 = input.next();
        String str2 = "allrecipes.com";
        String str3 = "http://";

        // Prepend the scheme when the user typed the host without it.
        if (str1.contains(str2) && !str1.contains(str3 + str2)) {
            str1 = str3 + str1;
        }

        // Strings must be compared with equals(); != compares references.
        // Assumed intent: report an error when only the bare site URL was entered.
        if (str1.equals(str3 + str2)) {
            System.out.println("ERROR");
        }

        /* cOUT */System.out.println(str1);

        return str1;
    }// end compare

}
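
The `indexOf("<title>")` search above only matches the exact lowercase tag with no attributes. As a hedged alternative for cases where adding jsoup is not an option, here is a small sketch that uses `java.util.regex` instead. The class name `TitleExtractor` and the method `extractTitle` are hypothetical, and the sample HTML is made up. A regular expression is not a full HTML parser, so this is only a slightly more tolerant version of the same idea.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleExtractor {

    // Matches <title ...> ... </title> in any letter case; the content may span lines.
    private static final Pattern TITLE = Pattern.compile(
            "<title\\b[^>]*>(.*?)</title\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the title text, or an empty Optional when no title element is found. */
    public static Optional<String> extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        if (!m.find()) {
            return Optional.empty();
        }
        // Trim and collapse whitespace so a multi-line title prints on one line.
        return Optional.of(m.group(1).trim().replaceAll("\\s+", " "));
    }

    public static void main(String[] args) {
        // Made-up sample input: uppercase tag, an attribute, and line breaks.
        String page = "<html><head><TITLE id=\"t\">\n  Cardamom Maple Salmon\n</TITLE></head></html>";
        System.out.println(extractTitle(page).orElse("(no title)"));
        System.out.println(extractTitle("<p>no head</p>").orElse("(no title)"));
    }
}
```

Returning an `Optional` makes the "no title found" case explicit, which avoids the `StringIndexOutOfBoundsException` that `substring` throws when `indexOf` returns -1.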

[Discussion]:

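When I use the code above, it says: cannot find symbol, symbol: class Document
You need to add the jsoup library first.
OK, after fixing that it prints the title, but it still repeats 0.
I don't know your code. jsoup returns the title as-is; if the title contains zeros, it returns zeros. I really don't understand your problem.
Your jsoup API link is broken. You probably meant jsoup.org.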