Answers to: Issues with my first Python Web Scraper

【Question title】: Issues with my first Python Web Scraper
【Posted】: 2025-12-30 14:00:11
【Question description】:

I am writing my first Python web scraper, but I can't work out the code to extract the data I want.

This is my code so far:

import bs4 as bs
import urllib.request

source = urllib.request.urlopen ('http://finviz.com/screener.ashx?v=340&s=ta_topgainers')
soup = bs.BeautifulSoup(source, "html.parser")
#Ticker = 'quote.ashx?t'

print (Ticker)

What I want to extract from the site is this piece of markup:

<a href="quote.ashx?t=ETRM&ty=c&p=d&b=1">

This is the full line, but I am only interested in the part shown above:

<a href="quote.ashx?t=ETRM&ty=c&p=d&b=1"><img src="chart.ashx?t=ETRM&ta=1&ty=c&p=d&s=l" alt="" width="700" height="340" border="0"/></a></td>

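Specifically, I want to extract the ticker symbol, in this case $ETRM. I want to extract every ticker that appears in this format on the page above.

I tried isolating quote.ashx?t, but it just returns the entire page source.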

【Question comments】:

I think you need to run the results through a filter, using a lambda or a function.
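
One way to read this suggestion, as a minimal sketch: Beautiful Soup's find_all accepts a function as an attribute filter and keeps the tags for which it returns a true value. The HTML below is a small stand-in modelled on the snippet in the question (nothing is fetched from the site), and is_quote_link is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Stand-in markup modelled on the question's snippet; not the live page.
html = """
<table><tr>
<td><a href="quote.ashx?t=ETRM&amp;ty=c&amp;p=d&amp;b=1"><img src="chart.ashx?t=ETRM&amp;ta=1"/></a></td>
<td><a class="tab-link" href="quote.ashx?t=ETRM&amp;ty=c&amp;p=d&amp;b=1">ETRM</a></td>
<td><a href="screener.ashx?v=340">Screener</a></td>
<td><a>no href here</a></td>
</tr></table>
"""


def is_quote_link(href):
    """Return True for href values that point at a quote page."""
    # The filter may be called with None for tags that have no href.
    return href is not None and href.startswith("quote.ashx?t=")


soup = BeautifulSoup(html, "html.parser")

# Keep only the <a> tags whose href passes the filter function.
quote_links = soup.find_all("a", href=is_quote_link)
hrefs = [a["href"] for a in quote_links]
print(hrefs)
```

The same filter can be written inline as a lambda, for example href=lambda h: h is not None and h.startswith("quote.ashx?t=").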

Tags: python web-scraping

【Solution 1】:

soup.select('a[href^="quote.ashx?t"]')  # select every <a> tag whose href starts with quote.ashx?t

Output:

[<a href="quote.ashx?t=ETRM&amp;ty=c&amp;p=d&amp;b=1"><img alt="" border="0" height="340" src="chart.ashx?t=ETRM&amp;ta=1&amp;ty=c&amp;p=d&amp;s=l" width="700"/></a>,
 <a class="tab-link" href="quote.ashx?t=ETRM&amp;ty=c&amp;p=d&amp;b=1">ETRM</a>,
 <a href="quote.ashx?t=SSY&amp;ty=c&amp;p=d&amp;b=1"><img alt="" border="0" height="340" src="chart.ashx?t=SSY&amp;ta=1&amp;ty=c&amp;p=d&amp;s=l" width="700"/></a>,
 <a class="tab-link" href="quote.ashx?t=SSY&amp;ty=c&amp;p=d&amp;b=1">SSY</a>,
 <a href="quote.ashx?t=PTX&amp;ty=c&amp;p=d&amp;b=1"><img alt="" border="0" height="340" src="chart.ashx?t=PTX&amp;ta=1&amp;ty=c&amp;p=d&amp;s=l" width="700"/></a>,
 <a class="tab-link" href="quote.ashx?t=PTX&amp;ty=c&amp;p=d&amp;b=1">PTX</a>,
 <a href="quote.ashx?t=ZFGN&amp;ty=c&amp;p=d&amp;b=1"><img alt="" border="0" height="340" src="chart.ashx?t=ZFGN&amp;ta=1&amp;ty=c&amp;p=d&amp;s=l" width="700"/></a>,
 <a class="tab-link" href="quote.ashx?t=ZFGN&amp;ty=c&amp;p=d&amp;b=1">ZFGN</a>,
 <a href="quote.ashx?t=JTPY&amp;ty=c&amp;p=d&amp;b=1"><img alt="" border="0" height="340" src="chart.ashx?t=JTPY&amp;ta=1&amp;ty=c&amp;p=d&amp;s=l" width="700"/></a>,
 <a class="tab-link" href="quote.ashx?t=JTPY&amp;ty=c&amp;p=d&amp;b=1">JTPY</a>,
 <a href="quote.ashx?t=ARWR&amp;ty=c&amp;p=d&amp;b=1"><img alt="" border="0" height="340" src="chart.ashx?t=ARWR&amp;ta=1&amp;ty=c&amp;p=d&amp;s=l" width="700"/></a>,
 <a class="tab-link" href="quote.ashx?t=ARWR&amp;ty=c&amp;p=d&amp;b=1">ARWR</a>,
 <a href="quote.ashx?t=PCRX&amp;ty=c&amp;p=d&amp;b=1"><img alt="" border="0" height="340" src="chart.ashx?t=PCRX&amp;ta=1&amp;ty=c&amp;p=d&amp;s=l" width="700"/></a>,
 <a class="tab-link" href="quote.ashx?t=PCRX&amp;ty=c&amp;p=d&amp;b=1">PCRX</a>,
 <a href="quote.ashx?t=ATOS&amp;ty=c&amp;p=d&amp;b=1"><img alt="" border="0" height="340" src="chart.ashx?t=ATOS&amp;ta=1&amp;ty=c&amp;p=d&amp;s=l" width="700"/></a>,
 <a class="tab-link" href="quote.ashx?t=ATOS&amp;ty=c&amp;p=d&amp;b=1">ATOS</a>,
 <a href="quote.ashx?t=QTNT&amp;ty=c&amp;p=d&amp;b=1"><img alt="" border="0" height="340" src="chart.ashx?t=QTNT&amp;ta=1&amp;ty=c&amp;p=d&amp;s=l" width="700"/></a>,
 <a class="tab-link" href="quote.ashx?t=QTNT&amp;ty=c&amp;p=d&amp;b=1">QTNT</a>,
 <a href="quote.ashx?t=GBX&amp;ty=c&amp;p=d&amp;b=1"><img alt="" border="0" height="340" src="chart.ashx?t=GBX&amp;ta=1&amp;ty=c&amp;p=d&amp;s=l" width="700"/></a>,
 <a class="tab-link" href="quote.ashx?t=GBX&amp;ty=c&amp;p=d&amp;b=1">GBX</a>]
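
The selector returns whole tags, and each ticker appears twice (once for the chart image link and once for the tab-link text link). To get from there to bare ticker symbols, one option is to read the t query parameter out of each href with the standard library. This is a sketch rather than part of the original answer: ticker_from_href is a hypothetical helper, and the hrefs list stands in for [a["href"] for a in soup.select(...)], where Beautiful Soup has already decoded &amp; to &.

```python
from urllib.parse import urlsplit, parse_qs


def ticker_from_href(href):
    """Return the value of the `t` query parameter, or None if it is absent."""
    query = parse_qs(urlsplit(href).query)
    values = query.get("t")
    return values[0] if values else None


# Stand-in for the href values of the tags returned by soup.select(...).
hrefs = [
    "quote.ashx?t=ETRM&ty=c&p=d&b=1",
    "quote.ashx?t=ETRM&ty=c&p=d&b=1",
    "quote.ashx?t=SSY&ty=c&p=d&b=1",
    "quote.ashx?t=PTX&ty=c&p=d&b=1",
]

# dict.fromkeys drops duplicates while keeping first-seen order.
tickers = list(dict.fromkeys(ticker_from_href(h) for h in hrefs))
print(tickers)  # ['ETRM', 'SSY', 'PTX']
```

Parsing the query string avoids depending on the order of the parameters, which slicing or splitting the href by position would.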

【Comments】:

【Solution 2】:

You can locate the link you want by partially matching its href value with a CSS selector:

link = soup.select_one("a[href*=ETRM]")
print(link["href"])
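
Two caveats with this approach: select_one returns None when nothing matches, so link["href"] would raise a TypeError, and a bare substring match on ETRM would also match any longer ticker that contains it. A hedged sketch that guards against both is below. find_quote_href is a hypothetical helper, the HTML is a stand-in for the real page, and the ticker is assumed to be plain letters and digits so it can be placed in the selector safely.

```python
from bs4 import BeautifulSoup

# Stand-in markup; the real page is not fetched here.
html = """
<a class="tab-link" href="quote.ashx?t=ETRM&amp;ty=c&amp;p=d&amp;b=1">ETRM</a>
<a class="tab-link" href="quote.ashx?t=SSY&amp;ty=c&amp;p=d&amp;b=1">SSY</a>
"""
soup = BeautifulSoup(html, "html.parser")


def find_quote_href(soup, ticker):
    """Return the href of the first quote link for `ticker`, or None."""
    # Match "?t=<ticker>&" so that ETRM does not also match a longer symbol.
    # Assumes `ticker` is alphanumeric; it is not escaped for CSS.
    link = soup.select_one(f'a[href*="?t={ticker}&"]')
    return link["href"] if link is not None else None


print(find_quote_href(soup, "ETRM"))
print(find_quote_href(soup, "XYZ"))
```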

【Comments】: