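【Posted】: 2016-05-17 20:01:29
【Question】:
I am trying to extract the URLs on a web page that follow this pattern:
'http://www.realclearpolitics.com/epolls/????/governor/??/-.html'
My current code extracts every link on the page. How can I change it so that it only extracts the URLs matching the pattern? Thanks!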
import requests
from bs4 import BeautifulSoup


def find_governor_races(html):
    # Despite its name, html is the URL of the page to scan
    base_url = 'http://www.realclearpolitics.com/'  # not used yet
    page = requests.get(html).text
    soup = BeautifulSoup(page, 'html.parser')
    links = []
    # Collect the href of every <a> tag; nothing is filtered by pattern yet
    for a in soup.findAll('a', href=True):
        links.append(a['href'])
    return links


find_governor_races('http://www.realclearpolitics.com/epolls/2010/governor/2010_elections_governor_map.html')
【Comments】:
Tags: python-2.7 web-scraping beautifulsoup python-requests
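One possible approach, as a minimal sketch rather than a definitive answer: turn the pattern into a regular expression and keep only the hrefs that match it. The sketch assumes that '????' stands for a four-digit year, that '??' stands for a two-letter lowercase state code, and that the file name contains a hyphen and ends in .html; adjust the expression if the real URLs differ. The names filter_governor_links and GOVERNOR_RACE and the sample hrefs are invented for illustration.

```python
import re

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7

BASE_URL = 'http://www.realclearpolitics.com/'

# Assumed reading of the pattern in the question: 4-digit year,
# 2-letter state code, file name with a hyphen that ends in .html.
GOVERNOR_RACE = re.compile(
    r'^http://www\.realclearpolitics\.com/epolls/\d{4}/governor/'
    r'[a-z]{2}/[^/]*-[^/]*\.html$'
)


def filter_governor_links(hrefs, base_url=BASE_URL):
    """Return the absolute URLs among hrefs that match GOVERNOR_RACE."""
    matches = []
    for href in hrefs:
        # Resolve relative hrefs against the site root before matching
        absolute = urljoin(base_url, href)
        if GOVERNOR_RACE.match(absolute) and absolute not in matches:
            matches.append(absolute)
    return matches


# Made-up hrefs, standing in for the list collected from the page
sample = [
    '/epolls/2010/governor/ca/example_race-1.html',
    'http://www.realclearpolitics.com/epolls/2010/governor/2010_elections_governor_map.html',
    '/epolls/2010/senate/ca/example_race-2.html',
]
print(filter_governor_links(sample))
```

Inside find_governor_races, the collected links list could be passed to filter_governor_links before it is returned. Resolving each href with urljoin first means relative and absolute links are tested against the same expression.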