【Posted】: 2014-10-22 20:25:35
【Question】:
This is a script I wrote to fetch Alexa rankings.
#!/usr/bin/env python
import sys
import requests
from lxml import html

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print 'usage: python %s <file-urls>' % (sys.argv[0])
        sys.exit(2)
    filename = sys.argv[1]
    # Each line of the input file holds one site name.
    urls = open(filename)
    for site in urls:
        try:
            url="http://www.alexa.com/siteinfo/"+site
            content=requests.get(url).content
            tree=html.fromstring(content)
            # The first match is the global rank, the second the country rank.
            RANK=tree.xpath('//strong[@class="metrics-data align-vmiddle"]/text()')
            print "Site:",site+"Global Rank:",RANK[0]+"\t"+"Country Rank:",RANK[1]
            # print 'Site:%s Global Rank:%2s Country Rank:%2s' % (site, RANK[0], RANK[1])
        except (KeyboardInterrupt, SystemExit):
            print "Keyboard Interruption!"
            sys.exit(0)
Result:
Site: google.com
Global Rank: 1 Country Rank: 1
Site: yahoo.com
Global Rank: 4 Country Rank: 4
Site: bing.com
Global Rank: 23 Country Rank: 14
The result is not satisfactory. Could you show me how to align the output into columns?
【Comments】:
-
I'd like to know why the site name ends up on its own line and how to fix that.
-
Because there is a \n at the end of your site variable. Try stripping it.
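A minimal sketch of what that comment suggests, combined with fixed-width fields so the ranks line up in columns. The helper names format_row and format_table and the column widths (20, 12, 14) are assumptions chosen for illustration, not part of the original script. The sample rows reuse the values from the output shown in the question, and the network request is left out so the formatting can be tried on its own.

```python
# Column layout: left-aligned site name, right-aligned ranks.
# The widths are arbitrary illustrative choices.
ROW_TEMPLATE = "{:<20}{:>12}{:>14}"


def format_row(site, global_rank, country_rank):
    """Return one aligned row; strip() removes the trailing newline from site."""
    return ROW_TEMPLATE.format(site.strip(), global_rank, country_rank)


def format_table(rows):
    """Return a header line followed by one aligned row per (site, global, country)."""
    lines = [ROW_TEMPLATE.format("Site", "Global Rank", "Country Rank")]
    for site, global_rank, country_rank in rows:
        lines.append(format_row(site, global_rank, country_rank))
    return "\n".join(lines)


if __name__ == "__main__":
    # Site names keep their "\n" here, as they would when read from a file.
    sample = [
        ("google.com\n", "1", "1"),
        ("yahoo.com\n", "4", "4"),
        ("bing.com\n", "23", "14"),
    ]
    print(format_table(sample))
```

In the original loop, the same idea would amount to calling site.strip() once before building the URL, which also keeps the newline out of the request address.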