【Question title】: Change BeautifulSoup URL with Try/Except Inside For Loop
【Posted】: 2020-01-02 20:43:54
【Question description】:

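I'm scraping a website that frequently raises an AttributeError. When that happens, I need to retry the URL after adding a few leading zeros to the ID from the list I'm looping over.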

import requests
from bs4 import BeautifulSoup as bs

ids = ['23135106', '37833100', '57636Q104']

base_url = "https://quotes.fidelity.com/mmnet/SymLookup.phtml?reqforlookup=REQUESTFORLOOKUP&productid=mmnet&isLoggedIn=mmnet&rows=50&for=stock&by=cusip&criteria="

# Create empty list to store scraped symbols
symbols = []
for id in ids:
    url = base_url + id + "&submit=Search"
    r = requests.get(url)
    soup = bs(r.content, 'lxml')
    try:
        symbol = soup.select_one('[href*=SID_VALUE_ID]').text
        print(symbol)
    except AttributeError:
        print("N/A")
    else:
        symbols.append(symbol)

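So instead of printing N/A on the exception, I want to retry the ID with one leading zero added (for example, 23135106 becomes 023135106, which is valid), then, if that fails, retry with two leading zeros, and so on. Some IDs will fail no matter how many leading zeros are added; in that case returning N/A is fine.

How can I do this?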

【Comments】:

  • Avoid using id as a variable name.

Tags: python loops exception beautifulsoup


【Solution 1】:

Just use another for loop:

import requests
from bs4 import BeautifulSoup

IDS = ['23135106', '37833100', '57636Q104']
BASE_URL = "https://quotes.fidelity.com/mmnet/SymLookup.phtml?reqforlookup=REQUESTFORLOOKUP&productid=mmnet&isLoggedIn=mmnet&rows=50&for=stock&by=cusip&criteria={}&submit=Search"

def read_symbol(id):
    r = requests.get(BASE_URL.format(id))
    soup = BeautifulSoup(r.content, 'lxml')
    symbol = soup.select_one('[href*=SID_VALUE_ID]')
    return symbol.text if symbol is not None else None

symbols = []
for id in IDS:
    for zeros in range(11):
        symbol = read_symbol(zeros * "0" + id)
        if symbol is not None:
            print(symbol)
            symbols.append(symbol)
            break

【Discussion】:

  • Thanks, this gets me close, but if an id such as gdju753 is invalid, how would I edit it so that symbol = 'N/A'?
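
One possible way to get the 'N/A' fallback asked about in this comment is to move the retry loop into a helper that returns as soon as a lookup succeeds and falls through to 'N/A' otherwise. The sketch below is only an illustration, not the answerer's code: find_symbol, fake_lookup, and the SYMBOL_A / SYMBOL_B values are hypothetical names and made-up placeholder data, and the lookup is passed in as a parameter so the example runs without hitting the network.

```python
from typing import Callable, Optional


def find_symbol(cusip: str,
                lookup: Callable[[str], Optional[str]],
                max_zeros: int = 10) -> str:
    """Try the ID with 0..max_zeros leading zeros; return 'N/A' if none resolve."""
    for zeros in range(max_zeros + 1):
        symbol = lookup("0" * zeros + cusip)
        if symbol is not None:
            return symbol  # first successful lookup wins
    return "N/A"  # every padded variant failed


# Hypothetical stand-in for the real page lookup, so the sketch runs offline.
# The symbol values are placeholders, not real scraped results.
def fake_lookup(candidate: str) -> Optional[str]:
    known = {"023135106": "SYMBOL_A", "037833100": "SYMBOL_B"}
    return known.get(candidate)


ids = ["23135106", "37833100", "gdju753"]
symbols = [find_symbol(i, fake_lookup) for i in ids]
print(symbols)
```

In real use, the read_symbol function from the answer could presumably be passed in place of fake_lookup, since it also returns None when no match is found. An alternative that keeps the nested loops is Python's for ... else clause, whose else branch runs only when the inner loop finishes without hitting break.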