【Question title】: Can't download web page in .NET
【Posted】: 2019-03-26 17:02:09
【Question】:

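I wrote a batch job that parses HTML pages from gearbest.com to extract item data (example link). It worked until 2-3 weeks ago, when the site was updated. Since then I can't download the pages I need to parse, and I don't understand why. Before the update, I made the request with HtmlAgilityPack using the following code.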

HtmlWeb web = new HtmlWeb();
HtmlDocument doc = null;
doc = web.Load(url); // this is now the point where the exception is thrown

I also tried without the library, adding some extra data (headers) to the request:

HttpWebRequest request = (HttpWebRequest) WebRequest.Create("https://it.gearbest.com/tv-box/pp_009940949913.html");
request.Credentials = CredentialCache.DefaultCredentials;
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36";
request.ContentType = "text/html; charset=UTF-8";
request.CookieContainer = new CookieContainer();
request.Headers.Add("accept-language", "it-IT,it;q=0.9,en-US;q=0.8,en;q=0.7");
request.Headers.Add("accept-encoding", "gzip, deflate, br");
request.Headers.Add("upgrade-insecure-requests", "1");
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";

WebResponse response = request.GetResponse();  // the exception is thrown here

The exceptions are:

  • IOException: Unable to read data from the transport connection.
  • SocketException: The connection could not be established.
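
To see which layer is actually failing, it can help to print the whole exception chain rather than only the outermost message. The following is a minimal sketch, not code from the original post: `DownloadDiagnostics`, `DescribeChain`, and `TryDownload` are hypothetical names, and it assumes the failure surfaces as a `WebException` that wraps the `IOException` and `SocketException` listed above.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class DownloadDiagnostics
{
    // Walks the InnerException chain and returns one "Type: message" line per exception.
    public static string DescribeChain(Exception ex)
    {
        var sb = new StringBuilder();
        for (Exception e = ex; e != null; e = e.InnerException)
        {
            sb.AppendLine(e.GetType().Name + ": " + e.Message);
        }
        return sb.ToString();
    }

    // Tries to download the page; on failure, logs the status and the chain and returns null.
    public static string TryDownload(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        try
        {
            using (var response = request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                return reader.ReadToEnd();
            }
        }
        catch (WebException ex)
        {
            // ex.Status is a WebExceptionStatus value such as ConnectFailure or ReceiveFailure.
            Console.Error.WriteLine("WebException status: " + ex.Status);
            Console.Error.Write(DescribeChain(ex));
            return null;
        }
    }
}
```

Calling `DownloadDiagnostics.TryDownload(url)` for both the product page and the home page and comparing the logged status values may show whether the connection is refused outright or dropped after the request headers are sent.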

If I request the home page (https://it.gearbest.com), it works.

What do you think the problem is?

【Comments】:

    Tags: c# html-agility-pack webrequest


    【Solution 1】:

    For some reason, the server doesn't like the supplied user agent. If you omit setting UserAgent, everything works fine:

    HttpWebRequest request = (HttpWebRequest) WebRequest.Create("https://it.gearbest.com/tv-box/pp_009940949913.html");
    request.Credentials = CredentialCache.DefaultCredentials;
    //request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36";
    request.ContentType = "text/html; charset=UTF-8";
    

    Another solution is to set request.Connection to an arbitrary string (but not keep-alive or close, which the property rejects):

    request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36";
    request.Connection = "random value";
    

    This also works, but I can't explain why.
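
The two workarounds above can be wrapped in one small helper so that either variant is easy to try. This is only a sketch under stated assumptions: `RequestFactory` and its `omitUserAgent` parameter are hypothetical names, and whether the site accepts either variant depends on server-side behavior that may have changed since this answer was written.

```csharp
using System;
using System.Net;

public static class RequestFactory
{
    // User agent string taken from the question.
    private const string BrowserUserAgent =
        "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 " +
        "(KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36";

    // omitUserAgent = true  : first workaround, no User-Agent header is set.
    // omitUserAgent = false : second workaround, User-Agent plus an arbitrary Connection value.
    public static HttpWebRequest Create(string url, bool omitUserAgent)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";

        if (!omitUserAgent)
        {
            request.UserAgent = BrowserUserAgent;
            // The Connection property throws ArgumentException for "keep-alive" or "close",
            // so any other token has to be used here.
            request.Connection = "random value";
        }
        return request;
    }
}
```

Creating the request does not contact the server; the network call only happens on `GetResponse()`, so the two variants can be built and inspected before anything is sent.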

    【Comments】:

    • Thanks, it works with only the Accept and Connection headers.
    【Solution 2】:

    This might be worth a try...

    // "request" is the HttpWebRequest instance from the question
    request.KeepAlive = false;
    request.ProtocolVersion = HttpVersion.Version10;
    

    https://stackoverflow.com/a/16140621/1302730
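
For completeness, here is a minimal sketch of how those two settings might fit into a full download routine. `Http10Downloader`, `CreateRequest`, and `Download` are hypothetical names introduced for illustration; the sketch only shows where the settings go and does not guarantee that the site will accept the request.

```csharp
using System.IO;
using System.Net;

public static class Http10Downloader
{
    // Builds a request that disables keep-alive and speaks HTTP/1.0.
    public static HttpWebRequest CreateRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.KeepAlive = false;
        request.ProtocolVersion = HttpVersion.Version10;
        return request;
    }

    // Sends the request and returns the response body as a string.
    public static string Download(string url)
    {
        var request = CreateRequest(url);
        using (var response = request.GetResponse())
        using (var stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The returned HTML could then be passed to HtmlAgilityPack via `HtmlDocument.LoadHtml` instead of letting `HtmlWeb.Load` perform the request itself.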

    【Comments】:
