【Question title】: Read the HTML code of websites
【Posted】: 2016-01-26 19:48:40
【Question description】:

I'm trying to read the HTML code of a website, so I'm using the following code in one of my fragments:

public class FragmentFavorites extends Fragment {
    View view;
    TextView text;
    Homescreen home = new Homescreen();
    public View onCreateView(LayoutInflater inflater, ViewGroup container, Bundle savedInstanceState) {
        view = inflater.inflate(R.layout.favorites,container, false);
        text = (TextView) view.findViewById(R.id.textView2);
        try {
            text.setText(home.getHtml("http://pastebin.com/u7jHeNwf"));
        } catch (IOException e) {
            e.printStackTrace();
        }
        return view;
    }
}

This is the getHtml() I'm referring to:

public static String getHtml(String url) throws IOException {
    URLConnection connection = (new URL(url)).openConnection();
    connection.setConnectTimeout(5000);
    connection.setReadTimeout(5000);
    connection.connect();

    InputStream in = connection.getInputStream();
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    StringBuilder html = new StringBuilder();
    for (String line; (line = reader.readLine()) != null; ) {
        html.append(line);
    }
    in.close();

    return html.toString();
}

Unfortunately, my app stops running every time I scroll to this fragment, i.e. every time getHtml() is called. Does anyone know what I'm doing wrong?

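【Question comments】:

  • Most likely you are getting an exception for making a network call on the UI thread (NetworkOnMainThreadException).
  • Could you explain that a bit more? I'm new to coding .-.
  • Can't you use a library for the scraping? Something like this
  • You should connect to the network in an AsyncTask, not on the UI thread.

Tags: java android web-scraping screen-scraping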
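The comments' advice (do the network call on a background thread, then update the UI from the UI thread) can be sketched in plain Java, outside Android, so it runs on any JVM. This is a minimal sketch assuming Java 8 (`CompletableFuture` is only available on Android from API 24); `BackgroundFetchDemo`, `Fetcher`, and `fetchInBackground` are hypothetical names, and the fetcher stands in for getHtml(). One executor plays the worker thread (the role of AsyncTask's doInBackground) and a second single-thread executor plays the UI thread (the role of onPostExecute). On a device, the second executor would instead have to post to the main thread, for example through a Handler created with `Looper.getMainLooper()`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class BackgroundFetchDemo {

    // Hypothetical stand-in for getHtml(url): any blocking call that may throw.
    public interface Fetcher {
        String fetch(String url) throws Exception;
    }

    // Runs the blocking fetch on `worker` (like doInBackground) and delivers the
    // result, or an error message, on `ui` (like onPostExecute). Returns at once.
    public static CompletableFuture<Void> fetchInBackground(Executor worker, Executor ui,
            Fetcher fetcher, String url, Consumer<String> onResult) {
        return CompletableFuture
                .supplyAsync(() -> {
                    try {
                        return fetcher.fetch(url);
                    } catch (Exception e) {
                        if (e instanceof InterruptedException) {
                            Thread.currentThread().interrupt(); // keep the interrupt status
                        }
                        return "Error fetching html";
                    }
                }, worker)
                .thenAcceptAsync(onResult, ui);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        // Plays the role of the UI thread: only this thread touches the "view".
        ExecutorService ui = Executors.newSingleThreadExecutor();
        Fetcher slowFetch = url -> {
            Thread.sleep(100); // simulate network latency
            return "<html>fetched from " + url + "</html>";
        };
        try {
            CompletableFuture<Void> done = fetchInBackground(worker, ui, slowFetch,
                    "http://pastebin.com/u7jHeNwf",
                    html -> System.out.println("setText: " + html));
            System.out.println("onCreateView would return here without waiting");
            done.get(); // only the demo blocks; real UI code must not wait like this
        } finally {
            worker.shutdown();
            ui.shutdown();
        }
    }
}
```

The caller never waits for the download, which is exactly what keeps the UI thread responsive; the error string mirrors the one used in the answer below.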


【Solution 1】:
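    import android.content.Context;
    import android.os.AsyncTask;
    import android.os.Bundle;
    import android.support.v4.app.Fragment; // or android.app.Fragment, matching the rest of the app
    import android.view.LayoutInflater;
    import android.view.View;
    import android.view.ViewGroup;
    import android.widget.TextView;

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.lang.ref.WeakReference;
    import java.net.URL;
    import java.net.URLConnection;

    public class FragmentFavorites extends Fragment {
        View view;
        TextView text;

        @Override
        public View onCreateView(LayoutInflater inflater, ViewGroup container, Bundle savedInstanceState) {
            view = inflater.inflate(R.layout.favorites, container, false);
            text = (TextView) view.findViewById(R.id.textView2);
            // Start the download on a background thread; onCreateView returns immediately.
            FetchHtml fetchHtml = new FetchHtml(getActivity().getApplicationContext(), FragmentFavorites.this);
            fetchHtml.executeOnExecutor(AsyncTask.THREAD_POOL_EXECUTOR, "http://pastebin.com/u7jHeNwf");
            return view;
        }

        // Static nested class: holds no implicit reference to the fragment.
        public static class FetchHtml extends AsyncTask<String, Void, String> {

            Context mContext;
            // Weak reference, so a destroyed fragment can be garbage-collected while the task runs.
            WeakReference<FragmentFavorites> mClient;

            // The constructor must be named after the class (was RegisterGcmTask, which does not compile).
            public FetchHtml(Context context, FragmentFavorites client) {
                this.mContext = context;
                this.mClient = new WeakReference<>(client);
            }

            // Runs on a worker thread, so network I/O is allowed here.
            @Override
            protected String doInBackground(String... params) {
                try {
                    return getHtml(params[0]);
                } catch (IOException e) {
                    e.printStackTrace();
                    return null;
                }
            }

            // Runs on the UI thread, so it is safe to update the TextView here.
            @Override
            protected void onPostExecute(String html) {
                super.onPostExecute(html);
                FragmentFavorites client = mClient.get();
                if (client != null && client.text != null) {
                    client.text.setText(html != null ? html : "Error fetching html");
                }
            }

            private static String getHtml(String url) throws IOException {
                URLConnection connection = (new URL(url)).openConnection();
                connection.setConnectTimeout(5000);
                connection.setReadTimeout(5000);
                connection.connect();

                InputStream in = connection.getInputStream();
                try {
                    BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
                    StringBuilder html = new StringBuilder();
                    for (String line; (line = reader.readLine()) != null; ) {
                        html.append(line).append('\n'); // readLine() strips the line break
                    }
                    return html.toString();
                } finally {
                    in.close(); // close the stream even if reading fails
                }
            }
        }
    }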
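The reading loop inside getHtml() does not depend on the network, so it can be pulled into a helper that accepts any `InputStream` and checked on a plain JVM. This is a minimal sketch assuming the page is UTF-8; `ReadAllDemo` and `readAll` are hypothetical names, and a `ByteArrayInputStream` stands in for `connection.getInputStream()`. It uses try-with-resources so the reader (and the stream under it) is closed even if `readLine()` throws, and it restores the line breaks that `readLine()` removes.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReadAllDemo {

    // Reads every line from the stream and joins the lines with '\n'.
    // Closing the BufferedReader also closes the wrapped InputStream.
    public static String readAll(InputStream in) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            for (String line; (line = reader.readLine()) != null; ) {
                html.append(line).append('\n'); // put back the break readLine() removed
            }
        }
        return html.toString();
    }

    public static void main(String[] args) throws IOException {
        // Fake response body in place of connection.getInputStream(): no network needed.
        InputStream fake = new ByteArrayInputStream(
                "<html>\n<body>hi</body>\n</html>".getBytes(StandardCharsets.UTF_8));
        System.out.print(readAll(fake));
    }
}
```

With this design, every non-empty result ends in `'\n'` and `"\r\n"` line endings are normalized to `'\n'`, which is usually what a `TextView` display wants.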
