Batch-checking WeChat domains and links with Python

It has been a while since I last posted here, so let's get straight to the point.

We all search with Google a lot. How do you pull the links out of a results page?

To extract the URLs from a Google results page, press F12, switch to the Console tab, paste the statements below, and press Enter.

var tag = document.getElementsByClassName('r');

for (var i = 0; i < tag.length; i++) {
    var a = tag[i].getElementsByTagName("a");
    // skip result blocks that contain no link
    if (a.length > 0) {
        console.log(a[0].href);
    }
}
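If the results page has been saved to disk, the same extraction can be done offline in Python. The following is only a minimal sketch using the standard library: `ResultLinkParser` and `extract_result_links` are hypothetical names, and it assumes, like the console snippet, that each result sits in an element with class `r` (Google's markup may differ). Unlike `a[0].href` in the browser, it returns the raw `href` attribute without resolving relative URLs.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag; they must not affect the depth counter.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ResultLinkParser(HTMLParser):
    """Collect the first <a href> inside every element whose class list contains 'r'."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._depth = 0       # nesting depth inside the current class="r" element
        self._found = False   # True once this element's first link has been taken

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self._depth:
            self._depth += 1
            if tag == "a" and not self._found and attrs.get("href"):
                self.links.append(attrs["href"])
                self._found = True
        elif "r" in (attrs.get("class") or "").split():
            self._depth = 1
            self._found = False

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth -= 1


def extract_result_links(html_text):
    """Return the first link of each class="r" block, in document order."""
    parser = ResultLinkParser()
    parser.feed(html_text)
    return parser.links
```

Calling `extract_result_links(open('results.html', encoding='utf-8').read())` (with `results.html` standing in for whatever file the page was saved as) gives a list that can be written out one link per line.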

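Save the extracted URLs and domains to a text file, one per line. Before checking them, remove duplicates and blank lines: the script below reads oldurl.txt and writes the cleaned list to url.txt, which is the file the checker uses.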

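import io

readPath = 'oldurl.txt'
writePath = 'url.txt'
lines_seen = set()
# 'w' instead of 'a+', so re-running does not append to an old url.txt
with io.open(readPath, 'r', encoding='utf-8') as f, \
        io.open(writePath, 'w', encoding='utf-8') as outfile:
    for line in f:
        # strip the newline first, otherwise blank lines are never detected
        line = line.strip()
        if not line:
            continue
        if line not in lines_seen:
            outfile.write(line + u'\n')
            lines_seen.add(line)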
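One detail worth keeping in mind: a line read from a file still carries its trailing newline, so a blank line is "\n" rather than an empty string, and a final line without a newline will not match an earlier identical entry unless whitespace is stripped first. Here is a small sketch of the same dedup step written as a function, so it can be tried on an in-memory list. `dedupe_lines` is a hypothetical name, not part of the script.

```python
def dedupe_lines(lines):
    """Return non-blank entries with surrounding whitespace removed,
    keeping only the first occurrence of each and preserving order."""
    seen = set()
    result = []
    for line in lines:
        entry = line.strip()
        if not entry or entry in seen:
            continue
        seen.add(entry)
        result.append(entry)
    return result


if __name__ == "__main__":
    raw = ["a.com\n", "\n", "a.com\n", "  b.com  \n", "a.com"]
    print(dedupe_lines(raw))
```

A set gives constant-time membership checks, while the separate list keeps the entries in their original order.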

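Then run the batch check. The results go into two files:

ok.txt: domains that are fine

red.txt: domains and links that have already been blocked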

#!/usr/bin/env python3
# coding: utf-8
import time
import requests

# the checking service's response contains this string when the URL is fine
strxx = '"Code":"102"'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}

with open('url.txt', 'r', encoding='utf-8') as urls:
    for y in urls:
        # drop the trailing newline before building the query URL
        y = y.strip()
        if not y:
            continue
        x = 'http://wx.rrbay.com/pro/wxUrlCheck.ashx?url=' + y
        try:
            # the 10-second timeout is an added safeguard, not in the original script
            response = requests.get(x, headers=headers, timeout=10)
            html = response.text
            # pause between requests so the service is not hammered
            time.sleep(3)
        except Exception as e:
            html = ''
            print(e)
        if strxx in html:
            print('ok:')
            print(x)
            with open('ok.txt', 'a', encoding='utf-8') as f:
                f.write(y + '\n')
        else:
            # note: a request that failed also ends up here
            print('error:')
            print(y)
            with open('red.txt', 'a', encoding='utf-8') as f:
                f.write(y + '\n')
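In the script, a request that fails (timeout, connection error) is written to red.txt just like a blocked link. If you want to tell the two apart, one option is to separate the decision from the network call. The sketch below is an assumption on my part rather than part of the script: `classify`, `check_urls` and the extra "failed" bucket are hypothetical, and only the `"Code":"102"` marker comes from the code above.

```python
OK_MARKER = '"Code":"102"'  # marker used by the script above


def classify(response_text):
    """Return 'ok' if the response contains the marker, otherwise 'red'."""
    return "ok" if OK_MARKER in response_text else "red"


def check_urls(urls, fetch):
    """Sort urls into 'ok', 'red' and 'failed' buckets.

    fetch is any callable that takes a URL and returns the response body
    as text; it may raise an exception when the request fails.
    """
    buckets = {"ok": [], "red": [], "failed": []}
    for url in urls:
        try:
            body = fetch(url)
        except Exception:
            # keep network failures apart from genuinely blocked links
            buckets["failed"].append(url)
            continue
        buckets[classify(body)].append(url)
    return buckets
```

With requests, `fetch` could be something like `lambda u: requests.get('http://wx.rrbay.com/pro/wxUrlCheck.ashx?url=' + u, headers=headers, timeout=10).text`. Because `fetch` is passed in, the sorting logic can be tried with a fake function and no network access, and the entries in "failed" can simply be retried later.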