How to get a web page's content and save it into a string variable: answers

【Question title】: How can I get a web page's content and save it into a string variable?
【Posted】: 2011-05-29 10:54:49
【Question】:

How can I get the content of a web page using ASP.NET? I need to write a program that fetches a page's HTML and stores it in a string variable.

【Comments】:

Tags: c# asp.net screen-scraping

【Answer 1】:

using System.Net;

WebClient client = new WebClient();
string content = client.DownloadString(url);

Pass in the URL of the page you want to fetch. You can parse the result with HtmlAgilityPack.

【Comments】:

【Answer 2】:

You can use WebClient:

using System.Net;
    
WebClient client = new WebClient();
string downloadString = client.DownloadString("http://www.gooogle.com");

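【Comments】:

Unfortunately, DownloadString (as of .NET 3.5) is not smart enough to handle a BOM. I have included an alternative in my answer.
No upvote, because using (WebClient client = new WebClient()) {} was not used :)
This is equivalent to Steven Spielberg's answer, posted 3 minutes earlier, so no +1.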
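
The comment about the missing using block can be sketched as follows. WebClient implements IDisposable, so wrapping it in a using statement releases it as soon as the download finishes. The class name PageFetcher and the method name Fetch are hypothetical and do not come from either answer.

```csharp
using System.Net;

// Hypothetical helper; the class and method names are illustrative only.
public static class PageFetcher
{
    // Downloads the resource at the given URL and returns it as a string.
    public static string Fetch(string url)
    {
        // WebClient implements IDisposable, so dispose it when done.
        using (WebClient client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }
}
```

A call such as PageFetcher.Fetch("http://www.google.com") would then return the page's HTML, subject to the BOM caveat raised in the first comment.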

【Answer 3】:

I have had problems with WebClient.DownloadString before. If you run into them as well, you can try this:

using System.IO;
using System.Net;

WebRequest request = WebRequest.Create("http://www.google.com");
string html;
// Dispose the response, the stream, and the reader once the body is read.
using (WebResponse response = request.GetResponse())
using (Stream data = response.GetResponseStream())
using (StreamReader sr = new StreamReader(data))
{
    html = sr.ReadToEnd();
}

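【Comments】:

Could you elaborate on the problems you ran into?
@Greg, it was a performance-related issue. I never really resolved it, but WebClient.DownloadString took 5-10 seconds to pull down the HTML, whereas WebRequest/WebResponse was almost immediate. I just wanted to offer another solution in case the OP hits a similar problem or wants more control over the request/response.
@Scott - +1 for finding this. I just ran some tests. DownloadString took much longer on first use (5299 ms for DownloadString vs 200 ms for WebRequest). I tested it in a loop over 50 BBC, 50 CNN, and 50 other RSS feed URLs, using different URLs to avoid caching. After the initial load, DownloadString was 20 ms faster for BBC and 300 ms faster for CNN. For the other RSS feed, WebRequest was 3 ms faster. In general, I think I will use WebRequest for single requests and DownloadString when looping over URLs.
This worked great for me, thanks! Just to save others a search: WebRequest is in System.Net and Stream is in System.IO.
Scott, @HockeyJ - I don't know what has changed since you used WebClient, but when I tested it (with .NET 4.5.2) it was fast enough: 950 ms (still slower than a single WebRequest, which took 450 ms, but certainly not 5-10 seconds).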
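
For anyone wanting to repeat the measurements discussed in these comments, a minimal timing sketch is shown below. It is not the code the commenters used. FetchTimer and TimeMs are hypothetical names, and the helper only measures whatever delegate it is given.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper; the names are illustrative only.
public static class FetchTimer
{
    // Runs the fetch delegate once and returns the elapsed milliseconds.
    public static long TimeMs(Func<string> fetch)
    {
        Stopwatch sw = Stopwatch.StartNew();
        fetch();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

Passing a lambda that calls DownloadString, and another that runs the WebRequest code, gives two comparable numbers. The comments report that the first DownloadString call is much slower than later ones, so it is worth timing the first call separately.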

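【Answer 4】:

I recommend not using WebClient.DownloadString. This is because (at least in .NET 3.5) DownloadString is not smart enough to use/remove the BOM, should one be present. As a result, the BOM (the bytes EF BB BF) can incorrectly show up as part of the string when UTF-8 data is returned (at least when no charset is specified) - ick!

Instead, this slight variation works correctly with a BOM: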

using System.IO;
using System.Net;
using System.Text;

string ReadTextFromUrl(string url) {
    // WebClient is still convenient
    // Assume UTF8, but detect BOM - could also honor response charset I suppose
    using (var client = new WebClient())
    using (var stream = client.OpenRead(url))
    using (var textReader = new StreamReader(stream, Encoding.UTF8, true)) {
        return textReader.ReadToEnd();
    }
}
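
The BOM handling can be checked without a network connection. The sketch below feeds an in-memory UTF-8 payload through the same StreamReader constructor that ReadTextFromUrl uses. BomDemo, Decode, and WithBom are hypothetical names introduced for illustration only.

```csharp
using System.IO;
using System.Text;

// Hypothetical demo class; the names are illustrative only.
public static class BomDemo
{
    // Decodes bytes with the same StreamReader settings as ReadTextFromUrl.
    public static string Decode(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream, Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }

    // Builds a UTF-8 payload that starts with the BOM bytes EF BB BF.
    public static byte[] WithBom(string text)
    {
        byte[] bom = Encoding.UTF8.GetPreamble();
        byte[] body = Encoding.UTF8.GetBytes(text);
        byte[] all = new byte[bom.Length + body.Length];
        bom.CopyTo(all, 0);
        body.CopyTo(all, bom.Length);
        return all;
    }
}
```

Decode(WithBom(text)) returns the text without a leading U+FEFF character. Calling Encoding.UTF8.GetString directly on the same bytes keeps that character at the start of the string, which is the symptom described above.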

【Comments】:

Submit a bug report.