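The connection was closed unexpectedly after a long running time (C#): answers

【Question title】: The connection was closed unexpectedly C# after a long running time
【Posted】: 2014-02-26 17:28:21
【Question description】:

Hello, I am writing a crawler for a website. After crawling for about 3 hours, my application stopped on a WebException. Below is my C# code. client is a predefined WebClient object that is disposed each time gameDoc has been processed. gameDoc is an HtmlDocument object (from HtmlAgilityPack).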

while (retrygamedoc)
{
    try
    {
        gameDoc.LoadHtml(client.DownloadString(url)); // this line caused the exception
        retrygamedoc = false;
    }
    catch
    {
        client.Dispose();
        client = new WebClient();

        retrygamedoc = true;
        Thread.Sleep(500);
    }
}

I tried the following code from this answer (to keep the WebClient fresh):

while (retrygamedoc)
{
    try
    {
        using (WebClient client2 = new WebClient())
        {
            gameDoc.LoadHtml(client2.DownloadString(url)); // this line causes the exception
            retrygamedoc = false;
        }
    }
    catch
    {
        retrygamedoc = true;
        Thread.Sleep(500);
    }
}

But the result was still the same. Then I used a StreamReader, and the result did not change! Below is my code using StreamReader.

while (retrygamedoc)
{
    try
    {
        // using native to check the result
        HttpWebRequest webreq = (HttpWebRequest)WebRequest.Create(url);
        string responsestring = string.Empty;
        HttpWebResponse response = (HttpWebResponse)webreq.GetResponse(); // this call causes the exception
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            responsestring = reader.ReadToEnd();
        }
        gameDoc.LoadHtml(responsestring); // parse the text already read above instead of downloading the page again

        retrygamedoc = false;
    }
    catch
    {
        retrygamedoc = true;
        Thread.Sleep(500);
    }
}

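What should I do, and what should I check? I am confused, because I can crawl some pages on the same site, and then after about 1000 results the exception occurs. The exception message is just "The request was aborted: The connection was closed unexpectedly.", and the status is ConnectionClosed.

P.S. The application is a desktop forms application.

Update:

For now I skip these values and set them to null so that the crawl can continue. But if the data is really needed, I still have to update the crawl results by hand, which is tiring because the results contain thousands of records. Please help me.

Example:

It is as if you had downloaded about 1300 records from the website, and then the application stops with "The request was aborted: The connection was closed unexpectedly." while your internet connection is still up and running at a good speed.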

【Question comments】:

Tags: c# webclient system.net.webexception

【Solution 1】:

I received this error because the server was returning a 404.
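One way to check whether a failure is really an HTTP error such as 404 is to look at the WebException itself. With WebClient and HttpWebRequest, an HTTP error status normally surfaces as a WebException whose Status is ProtocolError and whose Response is an HttpWebResponse, while a dropped connection (the asker's ConnectionClosed) usually has no response attached. The sketch below is only an illustration; the class name WebExceptionInfo and the method Describe are made up for this example.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original post.
static class WebExceptionInfo
{
    // Returns a short description of why a request failed.
    public static string Describe(WebException ex)
    {
        // When the server did answer (e.g. with 404), Response holds the HTTP reply.
        var http = ex.Response as HttpWebResponse;
        if (http != null)
        {
            return "HTTP " + (int)http.StatusCode + " (" + http.StatusCode + ")";
        }

        // No HTTP reply at all: the failure happened at the connection level.
        return "No HTTP response; WebExceptionStatus = " + ex.Status;
    }
}
```

Logging Describe(ex) in the catch block, instead of using a bare catch, would show whether the site answers with an error page or simply drops the connection.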

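【Comments】:

【Solution 2】:

ConnectionClosed may indicate (and probably does) that the server you are downloading from is closing the connection. Perhaps it is noticing a large number of requests from your client and is denying you additional service.

Since you cannot control server-side shenanigans, I recommend having some sort of logic that retries the download a bit later.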
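The "retry later" logic suggested above could look roughly like the following. This is a minimal sketch under assumptions: the names RetryingDownloader and DownloadWithRetry, the attempt limit, and the doubling delay are all choices made for illustration, not something the answer specifies. The download and sleep steps are passed in as delegates so that the helper can be exercised without touching the network.

```csharp
using System;
using System.Net;
using System.Threading;

// Hypothetical helper, not part of the original post.
static class RetryingDownloader
{
    // Calls download(url) up to maxAttempts times.
    // After each WebException it waits, then doubles the wait for the next attempt.
    public static string DownloadWithRetry(
        Func<string, string> download,
        string url,
        int maxAttempts,
        TimeSpan initialDelay,
        Action<TimeSpan> sleep)
    {
        TimeSpan delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return download(url);
            }
            catch (WebException)
            {
                // Out of attempts: rethrow so the caller can record the failed URL.
                if (attempt >= maxAttempts) throw;

                sleep(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

In the crawler this might be called as DownloadWithRetry(u => { using (var c = new WebClient()) return c.DownloadString(u); }, url, 5, TimeSpan.FromSeconds(2), d => Thread.Sleep(d)), followed by gameDoc.LoadHtml on the returned string. The values 5 and 2 seconds are placeholders that would need tuning against the real site.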

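【Comments】:

At first I was thinking about that too. I hit the site with the WebClient several times and tried running it in debug mode. The result was that it could execute the next identical block of statements (but with different content in the url variable). That is what made me curious. Anyway, I will try your solution and test with a longer Thread.Sleep duration.
I tested over and over, and it seems the problem really does happen when connections are made too quickly, which leads the website to cut off my program's WebClient. I will add an interval between pages, and also whenever the same exception occurs. Thanks, marked as the answer.
Further investigation shows that antivirus software can also cause the same problem. I ran the software again earlier today and it returned the same error, even though my connection was slow and Thread.Sleep was pausing on every error. After I turned off the antivirus for a while, the code worked like magic.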
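The interval between pages mentioned in the comments above can be kept in one place with a small throttle object. The following is a sketch only: RequestThrottle and WaitForTurn are hypothetical names, and the clock and sleep functions are injectable purely so the behaviour can be checked without real waiting.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the original post.
sealed class RequestThrottle
{
    private readonly TimeSpan minInterval;
    private readonly Func<DateTime> now;
    private readonly Action<TimeSpan> sleep;
    private DateTime? lastRequest;

    public RequestThrottle(TimeSpan minInterval,
                           Func<DateTime> now = null,
                           Action<TimeSpan> sleep = null)
    {
        this.minInterval = minInterval;
        // Defaults: the real clock and a real blocking sleep.
        this.now = now ?? (() => DateTime.UtcNow);
        this.sleep = sleep ?? (d => Thread.Sleep(d));
    }

    // Blocks until at least minInterval has passed since the previous call.
    public void WaitForTurn()
    {
        if (lastRequest.HasValue)
        {
            TimeSpan elapsed = now() - lastRequest.Value;
            if (elapsed < minInterval)
            {
                sleep(minInterval - elapsed);
            }
        }
        lastRequest = now();
    }
}
```

A crawler loop would create one instance, for example new RequestThrottle(TimeSpan.FromSeconds(1)), and call WaitForTurn() immediately before each DownloadString call. The one-second interval is an assumption; the thread does not say what delay the site tolerates. In a forms application this should run on a background thread, since Thread.Sleep blocks whichever thread calls it.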