How do crawlers/bots work? Differentiating bot/crawler HTTP requests: answers

【Question title】: How do crawlers/bots work? Differentiating bot/crawler HTTP requests
【Posted】: 2015-11-16 22:01:33
【Question description】:

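I am developing a website.

I need to know whether my site is being visited by crawlers/bots from Google or any other search engine.

My application intercepts HTTP requests, and I need to tell whether a given request was made by a crawler/bot that is crawling my site.

How can I do that?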

【Comments on the question】:

You can check the User-Agent header: security.stackexchange.com/questions/17096/…

Tags: c# asp.net seo search-engine google-search

【Solution 1】:

Check the user agent string of the request and see whether it matches a known bot. An example:

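protected void Page_Load(object sender, EventArgs e)
{
    // Request.UserAgent is null when the client sends no User-Agent header,
    // so fall back to an empty string before calling Contains.
    string userAgent = Request.UserAgent ?? string.Empty;

    if (userAgent.Contains("Googlebot"))
    {
        // It's one of the Google robots.
    }
    else if (...)
    {
        ...
    }
}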

For Google, the list of user agents they use can be found here.

For other search engines, you will have to find their user agents yourself.
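
The same check can be extended to several search engines by keeping the user agent tokens in one list. The following is only a minimal sketch: the class name BotDetector, the method IsKnownBot, and the token list are illustrative assumptions, not part of the original answer, and the list is not exhaustive. Check each search engine's own documentation for the tokens it currently uses.

```csharp
using System;

public static class BotDetector
{
    // Illustrative, non-exhaustive list of user agent tokens (an assumption
    // for this sketch); extend it from each search engine's documentation.
    private static readonly string[] KnownBotTokens =
    {
        "Googlebot", "bingbot", "DuckDuckBot", "YandexBot", "Baiduspider"
    };

    // Returns true when the user agent string contains one of the tokens.
    // A null or empty user agent is treated as "not a known bot".
    public static bool IsKnownBot(string userAgent)
    {
        if (string.IsNullOrEmpty(userAgent))
        {
            return false;
        }

        foreach (string token in KnownBotTokens)
        {
            // Case-insensitive substring match on the raw header value.
            if (userAgent.IndexOf(token, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return true;
            }
        }

        return false;
    }
}
```

Inside Page_Load this could be called as BotDetector.IsKnownBot(Request.UserAgent). Keep in mind that any client can send whatever User-Agent header it likes, so this only identifies requests that claim to come from a bot.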

【Comments】: