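【Posted】: 2015-07-01 21:02:31
【Question】:
I am trying to use BeautifulSoup to download every zip file from the Google patents archive. Below is the code I have written so far, but I can't get the files to download into a directory on my desktop. Any help would be appreciated.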
from bs4 import BeautifulSoup
import urllib2
import re
import pandas as pd

url = 'http://www.google.com/googlebooks/uspto-patents-grants.html'
site = urllib2.urlopen(url)
html = site.read()
soup = BeautifulSoup(html)
soup.prettify()

path = open('/Users/username/Desktop/', "wb")

for name in soup.findAll('a', href=True):
    print name['href']
    linkpath = name['href']
    rq = urllib2.request(linkpath)
    res = urllib2.urlopen(rq)
    path.write(res.read())
The expected result is that all the zip files are downloaded into the specified directory. Instead, I get the following error:
> ---------------------------------------------------------------------------
> AttributeError                            Traceback (most recent call last)
> <ipython-input-13-874f34e07473> in <module>()
>      17     print name['href']
>      18     linkpath = name['href']
> ---> 19     rq = urllib2.request(namep)
>      20     res = urllib2.urlopen(rq)
>      21     path.write(res.read())
>
> AttributeError: 'module' object has no attribute 'request'
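The traceback points at the name itself: `urllib2` exposes a class called `Request` (capital R), and attribute lookup is case-sensitive, so `urllib2.request` does not exist. Below is a minimal sketch of the whole download loop with that corrected. It rests on several assumptions that are not in the question: it is written for Python 3, where the same class and function live in `urllib.request`; it uses the standard-library `HTMLParser` in place of BeautifulSoup so that it runs without extra packages; and the helper names `LinkCollector`, `zip_links`, `target_path` and `download_all` are hypothetical.

```python
import os
import shutil
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen  # Python 2: urllib2.Request, urllib2.urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag (stand-in for soup.findAll('a', href=True))."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def zip_links(html, base_url):
    """Return absolute URLs for all links in `html` that end in .zip."""
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(base_url, h) for h in collector.hrefs
            if h.lower().endswith(".zip")]


def target_path(url, directory):
    """Build a per-file destination from the last path segment of the URL."""
    return os.path.join(directory, os.path.basename(urlparse(url).path))


def download_all(page_url, directory):
    """Fetch the index page and save every linked zip file into `directory`."""
    os.makedirs(directory, exist_ok=True)
    with urlopen(Request(page_url)) as page:
        html = page.read().decode("utf-8", errors="replace")
    for url in zip_links(html, page_url):
        with urlopen(Request(url)) as res, open(target_path(url, directory), "wb") as out:
            shutil.copyfileobj(res, out)  # stream to disk rather than res.read()
```

With the values from the question, the call would be `download_all('http://www.google.com/googlebooks/uspto-patents-grants.html', '/Users/username/Desktop/')`. Filtering on the `.zip` suffix skips the page's navigation links, which the original loop would also have tried to download.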
【Comments】:
-
What problem are you running into? What is the expected result, and what happens instead?
-
It should download all the zip files, but I get this error at line 19 (rq = urllib2.request(namep)): AttributeError: 'module' object has no attribute 'request'
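Separately from the AttributeError, the question's code passes a directory ('/Users/username/Desktop/') to `open(..., "wb")`, which would be expected to fail on its own, because only a file path can be opened for writing. The sketch below illustrates this under stated assumptions: a temporary directory stands in for the Desktop path, and the file name `example.zip` is hypothetical.

```python
import os
import tempfile

# Assumption: a temporary directory stands in for /Users/username/Desktop/.
directory = tempfile.mkdtemp()

try:
    open(directory, "wb")  # a directory cannot be opened for writing
except (IOError, OSError) as exc:
    # IsADirectoryError on Linux/macOS, PermissionError on Windows (Python 3);
    # IOError on Python 2.
    error_name = type(exc).__name__
else:
    error_name = None

# Opening a file *inside* the directory works as expected.
file_path = os.path.join(directory, "example.zip")  # hypothetical file name
with open(file_path, "wb") as out:
    out.write(b"placeholder bytes")
```

Each downloaded archive therefore needs its own destination path built from the directory and a file name, instead of one handle shared by every download.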
Tags: python beautifulsoup web-crawler urllib2