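【Posted】: 2017-12-11 14:57:17
【Question】:
I'm very new to Python and am working on a scraping-based project: I'm supposed to extract all the content from links that contain a particular search term and put it in a CSV file. As a first step, I wrote this code to extract all the links from a website based on an entered search term. All I get as output is a blank screen, and I can't find my error.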
import urllib
import mechanize
from bs4 import BeautifulSoup
import datetime

def searchAP(searchterm):
    newlinks = []
    browser = mechanize.Browser()
    browser.set_handle_robots(False)
    browser.addheaders = [('User-agent', 'Firefox')]
    text = ""
    start = 0
    while "There were no matches for your search" not in text:
        url = "http://www.marketing-interactive.com/"+"?s="+searchterm
        text = urllib.urlopen(url).read()
        soup = BeautifulSoup(text, "lxml")
        results = soup.findAll('a')
        for r in results:
            if "rel=bookmark" in r['href'] :
                newlinks.append("http://www.marketing-interactive.com"+ str(r["href"]))
        start +=10
    return newlinks

print searchAP("digital marketing")
【Discussion】:
Tags: python python-2.7 web-scraping beautifulsoup
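Two things in the posted code stand out, though without seeing the live page this is only a guess at the cause. First, `url` never uses `start`, so every pass of the `while` loop requests the same page; unless that page contains the "no matches" text, the loop never ends and nothing is printed. Second, `rel` is an attribute of the `<a>` tag, not part of the `href` string, so the test `"rel=bookmark" in r['href']` would not match an ordinary link. Below is a minimal sketch of the attribute check on its own. To keep it runnable without third-party packages it swaps BeautifulSoup for the standard library's `HTMLParser`; `SAMPLE_HTML`, `BookmarkLinkParser`, and `extract_bookmark_links` are hypothetical names made up for illustration, and the sample markup is an assumption about what a results page might look like, not the site's actual HTML.

```python
try:
    from html.parser import HTMLParser      # Python 3
    from urllib.parse import urljoin
except ImportError:
    from HTMLParser import HTMLParser       # Python 2.7
    from urlparse import urljoin

BASE_URL = "http://www.marketing-interactive.com/"

# Hypothetical sample markup; the real page structure may differ.
SAMPLE_HTML = """
<h2><a href="/features/post-one/" rel="bookmark">Post one</a></h2>
<a href="/about/">About</a>
<h2><a href="http://www.marketing-interactive.com/post-two/" rel="bookmark">Post two</a></h2>
"""


class BookmarkLinkParser(HTMLParser):
    """Collect href values of <a> tags whose rel attribute contains "bookmark"."""

    def __init__(self, base_url):
        HTMLParser.__init__(self)
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, so split before testing.
        rel_values = (attributes.get("rel") or "").split()
        href = attributes.get("href")
        if href and "bookmark" in rel_values:
            # urljoin leaves absolute URLs alone and resolves relative ones.
            self.links.append(urljoin(self.base_url, href))


def extract_bookmark_links(html, base_url=BASE_URL):
    """Return the bookmark links found in one page of HTML."""
    parser = BookmarkLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


links = extract_bookmark_links(SAMPLE_HTML)
print(links)
```

With BeautifulSoup the same idea would be to read the tag's `rel` attribute rather than search inside `href`. For the loop, one option is to fetch a single page first and confirm that links come back before adding any pagination; how the site paginates search results is not shown in the post, so that part is left open here.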