【Question Title】: Scraping table from data site with Python and Beautiful Soup
【Posted】: 2018-10-12 06:28:09
【Question Description】:

I'm a beginner in Python, and I'm stuck on scraping the whole table from https://wow-pets.com/compare/eu/silvermoon/kazzak. So I started with this:

import urllib.request
from time import sleep

from bs4 import BeautifulSoup

WAIT_PERIOD = 20  # seconds to pause after each download

def make_soup(url):
    # send a browser-like User-Agent; some sites reject urllib's default one
    thepage1 = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    thepage = urllib.request.urlopen(thepage1).read()
    sleep(WAIT_PERIOD)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

petdata = ""
soup = make_soup("https://wow-pets.com/compare/eu/draenor/silvermoon")

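After that I kept trying, but I couldn't pull the table with the pet names, prices, and so on. My main goal is to calculate the best ratio and print out the best results.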
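One quick way to check whether the downloaded HTML contains any pet rows at all is to count the header cells and body rows of the sortable table. The sketch below uses only the standard-library `html.parser`; the class name `table-sortable` is taken from the page's table, while `TableRowCounter` and `count_rows` are hypothetical helper names. It assumes the table is not nested inside another table.

```python
from html.parser import HTMLParser


class TableRowCounter(HTMLParser):
    """Count <th> cells and <tbody> rows of <table class="... table-sortable ...">.

    Hypothetical helper; assumes the target table is not nested in another table.
    """

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.in_tbody = False
        self.header_cells = 0
        self.body_rows = 0

    def handle_starttag(self, tag, attrs):
        # attribute values can be None (e.g. <td nowrap>), hence the `or ""`
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and "table-sortable" in classes:
            self.in_table = True
        elif self.in_table and tag == "th":
            self.header_cells += 1
        elif self.in_table and tag == "tbody":
            self.in_tbody = True
        elif self.in_tbody and tag == "tr":
            self.body_rows += 1

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False
        elif tag == "table":
            self.in_table = False


def count_rows(html):
    """Return (header cell count, body row count) for the sortable table."""
    counter = TableRowCounter()
    counter.feed(html)
    counter.close()
    return counter.header_cells, counter.body_rows
```

With the `soup` from the code above, `count_rows(str(soup))` would report a body-row count of 0 if the rows are only added by JavaScript after the page loads, which would explain why BeautifulSoup finds headers but no pet data.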

Any help is appreciated! :)

【Question Discussion】:

    Tags: python beautifulsoup


    【Solution 1】:

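    A plain `requests.get` call returns only the table's empty header tags, so the site appears to fill in the table rows with a script after the page loads. To work around this, use a browser automation tool such as selenium: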

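    import re
    from bs4 import BeautifulSoup as soup
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    d = webdriver.Chrome(service=Service('/path/to/chromedriver'))
    d.get('https://wow-pets.com/compare/eu/silvermoon/kazzak')
    # wait (up to 10 s) until the script has filled in at least one table row
    WebDriverWait(d, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'table.table-sortable tbody tr')))
    page = soup(d.page_source, 'html.parser').find('table', {'class': 'table-sortable'})
    d.quit()
    headers = [i.text for i in page.find('thead').find_all('th')]
    # one list of cell texts per row; newlines are stripped from the pet-name cell
    main_table = [[c.text for c in i.find_all('td')] for i in page.find('tbody').find_all('tr')]
    final_results = [dict(zip(headers, [re.sub('\n+', '', a), *b])) for a, *b in main_table]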
    

    Output (first ten results):

    [{'Pet name': 'Hippogryph Hatchling', 'Silvermoon': '499,999', 'Kazzak': '313,949', 'Diff.': '▼ 37%', 'Global price': '668,709'},
     {'Pet name': 'Spectral Tiger Cub', 'Silvermoon': '492,711', 'Kazzak': '400,000', 'Diff.': '▼ 19%', 'Global price': '876,368'},
     {'Pet name': 'Nightsaber Cub', 'Silvermoon': '304,836', 'Kazzak': '250,000', 'Diff.': '▼ 18%', 'Global price': '671,397'},
     {'Pet name': 'Everliving Spore', 'Silvermoon': '301,000', 'Kazzak': '439,993', 'Diff.': '▲ 46%', 'Global price': '691,879'},
     {'Pet name': 'Dragon Kite', 'Silvermoon': '297,234', 'Kazzak': '359,987', 'Diff.': '▲ 21%', 'Global price': '628,084'},
     {'Pet name': 'Rocket Chicken', 'Silvermoon': '284,053', 'Kazzak': '309,999', 'Diff.': '▲ 9%', 'Global price': '651,913'},
     {'Pet name': 'Tuskarr Kite', 'Silvermoon': '278,595', 'Kazzak': '299,998', 'Diff.': '▲ 8%', 'Global price': '635,809'},
     {'Pet name': 'Guardian Cub', 'Silvermoon': '267,741', 'Kazzak': '299,999', 'Diff.': '▲ 12%', 'Global price': '716,485'},
     {'Pet name': "Landro's Lichling", 'Silvermoon': '247,999', 'Kazzak': '200,000', 'Diff.': '▼ 19%', 'Global price': '565,617'},
     {'Pet name': 'Bananas', 'Silvermoon': '239,431', 'Kazzak': '278,711', 'Diff.': '▲ 16%', 'Global price': '540,228'}]
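The last comprehension of the answer packs several steps into one line. The standalone sketch below isolates it on a hand-made row copied from the "Bananas" entry of the output; the surrounding `\n` characters in the raw pet-name cell are an assumption, suggested by the `re.sub('\n+', '', a)` call.

```python
import re

# Header texts and per-row cell texts, in the shape produced by the
# find_all('th') / find_all('td') comprehensions. The newlines around
# the pet name are assumed, not taken from the live page.
headers = ['Pet name', 'Silvermoon', 'Kazzak', 'Diff.', 'Global price']
main_table = [
    ['\nBananas\n', '239,431', '278,711', '▲ 16%', '540,228'],
]

# `a, *b` splits each row into its first cell (a) and the remaining cells (b).
# Only the first cell is cleaned; the row is then rebuilt as a list and
# zipped with the headers into a dict.
final_results = [
    dict(zip(headers, [re.sub('\n+', '', a), *b]))
    for a, *b in main_table
]
print(final_results[0])
```

`zip` pairs each header with the cell in the same position, so a row with fewer cells than headers would silently drop the trailing keys rather than raise an error.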
    
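For the question's stated goal of finding the best ratio, one possible reading is "buy on one realm, sell on the other" and rank pets by sell price divided by buy price. That definition is an assumption, as are the helper names `parse_price` and `best_ratios`; the sketch only relies on the dict shape shown in the output above. For the rows shown, Kazzak price / Silvermoon price matches the `Diff.` column (e.g. 439,993 / 301,000 ≈ 1.46, listed as ▲ 46%).

```python
def parse_price(text):
    """Turn a price string such as '499,999' into an int (hypothetical helper)."""
    return int(text.replace(',', '').strip())


def best_ratios(rows, buy_realm, sell_realm, top=5):
    """Rank pets by sell_realm price / buy_realm price, highest first.

    `rows` are dicts keyed like final_results. Treating one realm as the
    buy side and the other as the sell side is an assumption.
    """
    ranked = []
    for row in rows:
        try:
            buy = parse_price(row[buy_realm])
            sell = parse_price(row[sell_realm])
        except (KeyError, ValueError):
            continue  # skip rows with a missing or non-numeric price
        if buy > 0:
            ranked.append((sell / buy, row['Pet name'], buy, sell))
    ranked.sort(key=lambda entry: entry[0], reverse=True)
    return ranked[:top]


rows = [
    {'Pet name': 'Everliving Spore', 'Silvermoon': '301,000', 'Kazzak': '439,993'},
    {'Pet name': 'Hippogryph Hatchling', 'Silvermoon': '499,999', 'Kazzak': '313,949'},
    {'Pet name': 'Dragon Kite', 'Silvermoon': '297,234', 'Kazzak': '359,987'},
]
for ratio, name, buy, sell in best_ratios(rows, 'Silvermoon', 'Kazzak', top=2):
    print(f'{name}: buy {buy:,} on Silvermoon, sell {sell:,} on Kazzak, ratio {ratio:.2f}')
```

Passing `final_results` instead of the sample `rows` would rank the full scraped table; swapping the two realm arguments ranks the opposite trade direction. Auction-house fees are not accounted for.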
