【Question title】: Scrape site based on span text
【Posted】: 2016-11-07 11:55:36
【Question】:

I am trying to scrape only the "MLB" scores from the following site: www.scoresandodds.com/pgrid_20160628.html?sort=rot

A piece of the HTML looks like this:

<div xmlns:dat="http://scoresandodds.com/dataset-main" class="section">
<div class="heading">
<span class="league">MLB</span>
<span class="date">06/28/2016</span>
</div><table cellpadding="0" summary="" cellspacing="0" border="0"><thead>
<tr><th class="first">Team</th><th>Pitcher</th><th>Open</th><th>Current</th><th>Runline</th><th>Scores</th><th>Notes</th>
</tr></thead><tbody><tr><td class="teamName">901 <a href="http://scoresandodds.com/statfeed/statfeed.php?page=MLB/MLBteam&amp;teamid=NY+METS&amp;season=">NEW YORK METS</a></td>
<td class="pitcher">(r) harvey, m</td>
<td class="line">8</td>
<td class="line">8.5o15</td>
<td class="line">+1.5(-200)</td>
<td class="score">0 Under 8.5</td>

My code starts with:

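import urllib2  # Python 2; on Python 3 the equivalent is urllib.request
from bs4 import BeautifulSoup
date = "20160628"  # YYYYMMDD, taken from the example URL above
url = "http://www.scoresandodds.com/pgrid_" + date + ".html?sort=rot"
soup = BeautifulSoup(urllib2.urlopen(url), 'html.parser')
scores = soup.find_all("span", {"class": "league"})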

print(scores) 
[<span class="league">MLB</span>, <span class="league">WNBA</span>]

That returns the league labels just fine, but it is not clear to me how to grab the data for the "MLB" scores only.

【Comments】:

    Tags: python web-scraping beautifulsoup


    【Solution 1】:

    Locate the MLB "label" and get only the first table that follows it via find_next():

    mlb_table = soup.find("span", class_="league", text="MLB").find_next("table")
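
To show what can be done with that table once it is found, here is a minimal sketch that runs against an inline sample instead of the live site. The sample HTML is a trimmed-down version of the snippet in the question; the WNBA section, its team name and score, and the helper name `league_rows` are made up for illustration. It assumes Python 3 and a reasonably recent bs4, where `string=` is the newer spelling of the `text=` argument used above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample modeled on the question's snippet.
# The WNBA section and its values are invented for illustration.
SAMPLE_HTML = """
<div class="section">
  <div class="heading"><span class="league">MLB</span><span class="date">06/28/2016</span></div>
  <table>
    <thead><tr><th class="first">Team</th><th>Scores</th></tr></thead>
    <tbody>
      <tr><td class="teamName">901 <a href="#">NEW YORK METS</a></td><td class="score">0 Under 8.5</td></tr>
    </tbody>
  </table>
</div>
<div class="section">
  <div class="heading"><span class="league">WNBA</span></div>
  <table>
    <tbody>
      <tr><td class="teamName">601 <a href="#">EXAMPLE TEAM</a></td><td class="score">70</td></tr>
    </tbody>
  </table>
</div>
"""


def league_rows(html, league):
    """Return (team, score) pairs from the first table after the league label."""
    soup = BeautifulSoup(html, "html.parser")
    label = soup.find("span", class_="league", string=league)
    if label is None:
        return []  # this league is not listed on the page
    table = label.find_next("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        team = tr.find("td", class_="teamName")
        score = tr.find("td", class_="score")
        if team is None or score is None:
            continue  # header row, or a row without team/score cells
        link = team.find("a")
        name = (link or team).get_text(strip=True)
        rows.append((name, score.get_text(strip=True)))
    return rows


mlb_rows = league_rows(SAMPLE_HTML, "MLB")
print(mlb_rows)
```

One caveat: find_next() walks forward through the whole document, so if the MLB heading had no table of its own, this would pick up the next league's table. Checking that the table sits inside the same "section" div as the label would guard against that.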
    

    【Comments】:

    • Gorgeous - thank you!