rookie2017

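Support legitimate games, starting with me!

Quite a few games have come out recently, and I happen to be learning web scraping, so on a whim I scraped the TOP200 single-player blockbuster ranking from 3DM. The process, code, and results follow.

First, we need to decide what to scrape. Looking at the entries in the ranking, each game has five pieces of information: name, release date, score, URL, and some other descriptive information.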

  

 

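Open Chrome DevTools and locate the tags that hold these five items. The release date and the other details all sit inside a single <ul> list, so the release date can be pulled out separately as its own field.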
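The release-date step can be sketched on plain strings. This is only an illustration: it assumes the text of each <li> has already been read out of the page and that the release entry carries the label 发售: used in the scraper code. The helper name extract_release_dates and the sample texts are hypothetical.

```python
def extract_release_dates(li_texts, label='发售:'):
    """Return the date part of every <li> text that contains the release label."""
    dates = []
    for text in li_texts:
        if label in text:
            # Drop the label and any spaces, keeping only the date itself
            dates.append(text.replace(label, '').replace(' ', ''))
    return dates


# Hypothetical <li> texts for one game entry
sample = ['类型:动作', '发售: 2020-07-17', '语言:中文']
print(extract_release_dates(sample))
```

Entries without the label are simply skipped, so a game that has no release entry would shift the dates of the games after it. That is worth checking against the real page.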

   

  

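Next, look at the URL of the single-player blockbuster list. The page number is tied to zq: zq_a is page a, and zq and zq_1 both show the first page. Each page holds 20 games (let's ignore the first entry, the "true 3DM blockbuster"), so crawling the first 10 pages gives the TOP210 game information.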
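The URL pattern can be turned into a list of page addresses up front. A minimal sketch, assuming the zq_<page>/ pattern described above holds for every page; page_urls is a hypothetical helper, not part of the original script.

```python
BASE_URL = 'https://www.3dmgame.com/games/zq_'


def page_urls(first_page=1, last_page=10):
    """Build the list-page URLs for an inclusive range of page numbers."""
    return [f'{BASE_URL}{page}/' for page in range(first_page, last_page + 1)]


for url in page_urls(1, 3):
    print(url)
```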

  

 

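Next comes the code. It is split into two basic groups of functions: one crawls and cleans the data, the other stores it. It is quite simple.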

import requests
import pymongo
import csv
import xlwt
from bs4 import BeautifulSoup


base_url = 'https://www.3dmgame.com/games/zq_'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/85.0.4181.9 Safari/537.36'
}
proxies = {}


# Crawl the game ranking list
def game_spider():
    games = []
    for i in range(1, 11):
        url = base_url + str(i) + '/'
        response = requests.get(url, headers=headers, proxies=proxies)
        # A Response is truthy only when the status code signals success
        if response:
            page_source = response.text
            result = BeautifulSoup(page_source, 'html.parser')
            # Game names
            game_names = result.find_all('a', class_='bt')
            game_urls = []
            # Game links: just take the href attribute of each name tag
            for game_name in game_names:
                game_url = game_name['href']
                game_urls.append(game_url)
            # Release dates ("发售:" is the release label on the page)
            game_lis = result.find_all('li')
            game_dates = []
            for game_li in game_lis:
                if "发售:" in game_li.text:
                    game_date = game_li.text.replace('发售:', '').replace(' ', '')
                    game_dates.append(game_date)
            # Other info
            game_infos = result.find_all('ul', class_='info')
            # Scores
            game_ranks = result.find_all('div', class_='scorewrap')
            for j in range(0, len(game_names)):
                # Strip newlines and spaces from the extracted text
                game_name = game_names[j].text.replace('\n', '').replace(' ', '')
                game_info = game_infos[j].text.replace('\n', '').replace(' ', '')
                game_rank = game_ranks[j].text.replace('\n', '').replace(' ', '')
                game_data = {'game_name': game_name, 'game_date': game_dates[j], 'game_info': game_info,
                             'game_rank': game_rank, 'game_url': game_urls[j]}
                # Skip the promoted first entry
                if game_data['game_name'] != '斗罗大陆3D':
                    games.append(game_data)
    return games


def save_csv(games):
    # The with block closes the file even if a write fails
    with open('games.csv', 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        # Header row: game name, release date, other info, score, URL
        writer.writerow(['游戏名', '发售日期', '其他信息', '评分', '网址'])
        for game in games:
            writer.writerow([game['game_name'], game['game_date'], game['game_info'],
                             game['game_rank'], game['game_url']])
    print('csv文件制作完成')  # "CSV file created"

def save_mongodb(games):
    client = pymongo.MongoClient('localhost', 27017)
    db = client['DB']
    game_collection = db.games
    game_collection.insert_many(games)
    print('入库完成')  # "Saved to the database"


def save_excel(games):
    wb = xlwt.Workbook()
    ws = wb.add_sheet('games', cell_overwrite_ok=True)
    alignment = xlwt.Alignment()
    alignment.horz = xlwt.Alignment.HORZ_CENTER
    alignment.vert = xlwt.Alignment.VERT_CENTER
    pattern = xlwt.Pattern()
    pattern.pattern_fore_colour = 3
    font = xlwt.Font()
    font.bold = True
    # style: centered and bold; basestyle: centered only
    style = xlwt.XFStyle()
    basestyle = xlwt.XFStyle()
    basestyle.alignment = alignment
    style.alignment = alignment
    style.pattern = pattern
    style.font = font
    # Row 0: merged title cell
    ws.write_merge(0, 0, 0, 5, '3DM单机大作排行榜TOP210', style)
    # Row 1: column headers (game name, release date and platform, other info, score, URL)
    titles = ['游戏名', '发售日期与平台', '其他信息', '评分', '网址']
    for i in range(0, 5):
        ws.write(1, i, titles[i], basestyle)
    # Data starts at row 2; enumerate over the list so that no game is skipped
    for row, game in enumerate(games, start=2):
        ws.write(row, 0, game['game_name'], style)
        ws.write(row, 1, game['game_date'], basestyle)
        ws.write(row, 2, game['game_info'], basestyle)
        ws.write(row, 3, game['game_rank'], style)
        ws.write(row, 4, game['game_url'], basestyle)
    wb.save('单机游戏排行榜TOP210.xls')
    print('Excel表制作完成')  # "Excel sheet created"


if __name__ == "__main__":
    games = game_spider()
    save_csv(games)
    save_mongodb(games)
    save_excel(games)

Run the code, and you can see that the Excel file, the CSV file, and MongoDB have all been populated.
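One quick way to confirm the CSV output is to read it back. The sketch below is an illustration under assumptions: load_games is a hypothetical helper, the header names are the ones written by save_csv, and the sample row is made up so that the snippet runs without the real file.

```python
import csv
import io


def load_games(csv_file):
    """Read rows back as dictionaries keyed by the header row."""
    return list(csv.DictReader(csv_file))


# Made-up content in the same column layout as games.csv
sample = io.StringIO(
    '游戏名,发售日期,其他信息,评分,网址\r\n'
    'Example Game,2020-01-01,info,9.0,https://example.com/\r\n'
)
rows = load_games(sample)
print(len(rows), rows[0]['游戏名'], rows[0]['评分'])
```

For the real file, the same function can be handed the result of open('games.csv', newline='', encoding='utf-8').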

 

 

 

Next, sort the ranking in Excel by score.
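The same ordering can also be produced in Python rather than in Excel. A minimal sketch, assuming each game's game_rank text is a plain number such as '9.1'; any value that does not parse is pushed to the end. The helper names score_value and sort_by_score and the sample data are hypothetical.

```python
def score_value(game):
    """Convert the scraped score text to a float; unparseable scores sort last."""
    try:
        return float(game['game_rank'])
    except (KeyError, TypeError, ValueError):
        return float('-inf')


def sort_by_score(games):
    """Return a new list ordered from the highest score to the lowest."""
    return sorted(games, key=score_value, reverse=True)


# Made-up entries in the same shape as the scraper's dictionaries
sample = [
    {'game_name': 'A', 'game_rank': '8.5'},
    {'game_name': 'B', 'game_rank': '9.2'},
    {'game_name': 'C', 'game_rank': '暂无'},
]
print([game['game_name'] for game in sort_by_score(sample)])
```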

 

 

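As you can see, the big titles that have made such a splash recently are all on the list, but none of them scores especially high, and the same goes for the two games that have not been released yet. So I am skeptical of this ranking. I wonder what everyone else thinks.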

 

  

 
