【Posted】: 2015-05-11 23:59:46
【Question】:
I'm new to Python, and I'm trying to teach myself with some simple web scraping to pull football statistics.
I've managed to pull data one page at a time, but I can't figure out how to add a loop to my code so that it scrapes multiple pages at once (or multiple positions/years/conferences, for that matter).
I've searched quite a bit on this site and others, but I can't seem to get it right.
Here is my code:
import csv
import requests
from BeautifulSoup import BeautifulSoup
url = 'http://www.nfl.com/stats/categorystats?seasonType=REG&d-447263-n=1&d-447263-o=2&d-447263-p=1&d-447263-s=PASSING_YARDS&tabSeq=0&season=2014&Submit=Go&experience=&archive=false&statisticCategory=PASSING&conference=null&qualified=false'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html)
table = soup.find('table', attrs={'class': 'data-table1'})
list_of_rows = []
for row in table.findAll('tr'):
    list_of_cells = []
    for cell in row.findAll('td'):
        text = cell.text.replace('&nbsp;', '')
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)
#for line in list_of_rows: print ', '.join(line)
outfile = open("./2014.csv", "wb")
writer = csv.writer(outfile)
writer.writerow(["Rk", "Player", "Team", "Pos", "Comp", "Att", "Pct", "Att/G", "Yds", "Avg", "Yds/G", "TD", "Int", "1st", "1st%", "Lng", "20+", "40+", "Sck", "Rate"])
writer.writerows(list_of_rows)
outfile.close()
Here is my attempt at adding a variable into the URL and building a loop:
import csv
import requests
from BeautifulSoup import BeautifulSoup
pagelist = ["1", "2", "3"]
x = 0
while (x < 500):
    url = "http://www.nfl.com/stats/categorystats?seasonType=REG&d-447263-n=1&d-447263-o=2&d-447263-p="+str(x)).read(),'html'+"&d-447263-s=RUSHING_ATTEMPTS_PER_GAME_AVG&tabSeq=0&season=2014&Submit=Go&experience=&archive=false&statisticCategory=RUSHING&conference=null&qualified=false"
    response = requests.get(url)
    html = response.content
    soup = BeautifulSoup(html)
    table = soup.find('table', attrs={'class': 'data-table1'})
    list_of_rows = []
    for row in table.findAll('tr'):
        list_of_cells = []
        for cell in row.findAll('td'):
            text = cell.text.replace('&nbsp;', '')
            list_of_cells.append(text)
        list_of_rows.append(list_of_cells)
    #for line in list_of_rows: print ', '.join(line)
    outfile = open("./2014.csv", "wb")
    writer = csv.writer(outfile)
    writer.writerow(["Rk", "Player", "Team", "Pos", "Att", "Att/G", "Yds", "Avg", "Yds/G", "TD", "Long", "1st", "1st%", "20+", "40+", "FUM"])
    writer.writerows(list_of_rows)
    x = x + 0
outfile.close()
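(Editor's note: the attempt above has two separate problems. The URL line has a stray `).read(),'html'` spliced into the middle of the string, which is a syntax error, and `x = x + 0` never changes `x`, so even with the URL fixed the loop would never terminate. A minimal sketch of just the corrected loop structure, building the URLs with plain string formatting and without making any requests; the shortened template here is illustrative, not the full query string:)

```python
# Corrected loop skeleton: the page number is inserted into the URL
# with string formatting, and the counter actually advances.
url_template = ("http://www.nfl.com/stats/categorystats?seasonType=REG"
                "&d-447263-p=%s&season=2014&statisticCategory=RUSHING")

urls = []
x = 1
while x <= 3:
    urls.append(url_template % x)
    x = x + 1          # was x = x + 0, which loops forever
```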
Thanks very much.
Here is my revised code, which seems to wipe out each previous page when writing to the csv file:
import csv
import requests
from BeautifulSoup import BeautifulSoup
url_template = 'http://www.nfl.com/stats/categorystats?tabSeq=0&season=2014&seasonType=REG&experience=&Submit=Go&archive=false&d-447263-p=%s&conference=null&statisticCategory=PASSING&qualified=false'
for p in ['1','2','3']:
    url = url_template % p
    response = requests.get(url)
    html = response.content
    soup = BeautifulSoup(html)
    table = soup.find('table', attrs={'class': 'data-table1'})
    list_of_rows = []
    for row in table.findAll('tr'):
        list_of_cells = []
        for cell in row.findAll('td'):
            text = cell.text.replace('&nbsp;', '')
            list_of_cells.append(text)
        list_of_rows.append(list_of_cells)
    #for line in list_of_rows: print ', '.join(line)
    outfile = open("./2014Passing.csv", "wb")
    writer = csv.writer(outfile)
    writer.writerow(["Rk", "Player", "Team", "Pos", "Comp", "Att", "Pct", "Att/G", "Yds", "Avg", "Yds/G", "TD", "Int", "1st", "1st%", "Lng", "20+", "40+", "Sck", "Rate"])
    writer.writerows(list_of_rows)
    outfile.close()
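(Editor's note: the revised code overwrites the file on every pass because `open(..., "wb")` truncates the file each time it runs inside the loop, so only the last page survives. The fix is to open the file and write the header once, before the loop, and only write rows inside it. A minimal sketch of that pattern, with a made-up `scrape_page` stub standing in for the request-and-parse step, and shown in Python 3 style (`"w"` instead of Python 2's `"wb"` for csv):)

```python
import csv

# Stand-in for the request + BeautifulSoup step: in the real script this
# would fetch url_template % page and pull rows out of the 'data-table1'
# table. The data here is fabricated purely for illustration.
fake_pages = {
    '1': [['1', 'Player A', '300'], ['2', 'Player B', '250']],
    '2': [['41', 'Player C', '120']],
}

def scrape_page(page):
    return fake_pages[page]

# Open the file ONCE, before the loop; reopening in write mode inside
# the loop truncates the file and discards every earlier page.
outfile = open("./2014Passing.csv", "w", newline="")
writer = csv.writer(outfile)
writer.writerow(["Rk", "Player", "Yds"])    # header written once
for page in ['1', '2']:
    writer.writerows(scrape_page(page))     # rows from every page accumulate
outfile.close()
```

(An alternative with the same effect is to keep the per-page `open` but use append mode, `"a"`, and guard the header so it is only written on the first page; opening once is simpler.)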
【Discussion】:
Tags: python python-2.7 web-scraping beautifulsoup python-requests