Pulling current stock price (Yahoo) with beautifulsoup: answers

[Question title]: Pulling current stock price (Yahoo) with beautifulsoup
[Posted]: 2019-03-08 03:32:11
[Question description]:

I'm having trouble getting the latest stock price with Beautiful Soup (python3):

import requests
from bs4 import BeautifulSoup

response = requests.get("https://finance.yahoo.com/quote/VTI?p=VTI")
soup = BeautifulSoup(response.content, "lxml")
price = soup.find('span', attrs={"data-reactid": "34"})
print(price)  # prints None

This returns None. Is there something I'm missing? On a different page, the following works fine:

response = requests.get("https://finance.yahoo.com/lookup?s=VTI")
soup = BeautifulSoup(response.content,"lxml")
price = soup.find('td', attrs={"data-reactid": "59"})

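Unfortunately, that search page doesn't always put the exact match first (searching for VXUS brings back vxus as the second result), so I'm hoping to find something consistent, and I think pulling from the actual quote page would work best.

What is the best way to extract the 141.28 value?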

[Question discussion]:

Tags: python python-3.x web-scraping beautifulsoup stock

[Solution 1]:

The price is there and can be selected by class (the second-fastest selector method after id):

import requests
from bs4 import BeautifulSoup as bs

res = requests.get('https://finance.yahoo.com/quote/VXUS?p=VXUS')   # or https://finance.yahoo.com/quote/VTI?p=VTI
soup = bs(res.content, 'lxml')
# Raw string so the backslashes reach the CSS parser instead of being read as Python escapes
price = soup.select_one(r'.Trsdu\(0\.3s\)').text
print(price)

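[Discussion]:

This solution has no hard-coded index and is clean and concise. I think it should be the accepted answer.
@QHarr - what does the leading period in '.Trsdu\(0\.3s\)' do?
@JackFleeting It makes it a CSS class selector.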
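The selector syntax discussed above can be checked offline. This minimal sketch parses a small hand-written HTML fragment (an assumption standing in for Yahoo's markup, which changes often) with the built-in "html.parser". The leading `.` selects by class, and the backslashes escape `(`, `.` and `)`, which would otherwise be parsed as CSS syntax. The raw-string prefix keeps Python from treating `\(` as its own escape sequence.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mimicking the class names the quote page used in 2019.
HTML = '<div><span class="Trsdu(0.3s) Fw(b) Fz(36px)">141.28</span></div>'

soup = BeautifulSoup(HTML, "html.parser")

# '.' means "has this class"; '\(' '\.' '\)' escape characters special in CSS.
span = soup.select_one(r".Trsdu\(0\.3s\)")
price = span.get_text()

# Equivalent lookup without CSS: class_ matches any single class in the list.
same = soup.find("span", class_="Trsdu(0.3s)")

print(price)  # 141.28
```

`find(..., class_=...)` avoids the escaping entirely, at the cost of not being able to combine conditions the way a CSS selector can.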

[Solution 2]:

import requests
from bs4 import BeautifulSoup 


response = requests.get("https://finance.yahoo.com/quote/VTI?p=VTI")
soup = BeautifulSoup(response.content, "lxml")

for stock in soup.find_all('span', class_='Trsdu(0.3s) Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(b)'):
    print(stock.get_text())

This returns 141.28.
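Passing a string that contains spaces to `class_`, as Solution 2 does, matches only when the element's `class` attribute is exactly that string, in the same order. A compound CSS selector requires each listed class to be present, in any order. The fragment below is hypothetical: it lists the classes in a different order from the `class_` string to show the difference.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: classes from Solution 2, but in a different order.
HTML = '<span class="Fw(b) Trsdu(0.3s) Fz(36px)">141.28</span>'
soup = BeautifulSoup(HTML, "html.parser")

# class_ with spaces is compared against the whole attribute string, so order matters.
exact = soup.find_all("span", class_="Trsdu(0.3s) Fw(b) Fz(36px)")

# A compound CSS selector only requires each class to be present.
compound = soup.select(r"span.Trsdu\(0\.3s\).Fw\(b\)")

print(len(exact), len(compound))  # 0 1
```

The exact-string form breaks if Yahoo merely reorders or adds a class, so selecting on one or two distinctive classes tends to survive small markup changes longer.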

[Discussion]:

[Solution 3]:

import requests
from bs4 import BeautifulSoup
import json

response = requests.get("https://finance.yahoo.com/quote/VTI?p=VTI")
soup = BeautifulSoup(response.content, "lxml")
scripts = soup.find_all('script')

# At the time, the quote data was embedded as JSON in the third-to-last <script> tag
a = scripts[-3].contents[0]

# Strip the JavaScript wrapper around the JSON object (fixed offsets, so fragile)
jjj = json.loads(a[111:-12])

print(jjj['context']['dispatcher']['stores']['StreamDataStore']['quoteData']['VTI']['regularMarketPrice'])

This may help: first get the script data, then convert it to JSON, and you can find the data you want.
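Rather than slicing the script text at fixed offsets (`a[111:-12]`), the JSON object can be located with a regular expression. This sketch assumes the data is assigned as `root.App.main = {...};` inside a `<script>` tag, which is an assumption about how the 2019 quote pages embedded it; the sample string is made up and only reuses the key path from the answer. The `extract_app_main` helper is hypothetical.

```python
import json
import re

# Hypothetical script text; real pages embedded a much larger object.
SCRIPT = """
(function (root) {
root.App = root.App || {};
root.App.main = {"context": {"dispatcher": {"stores": {"StreamDataStore":
  {"quoteData": {"VTI": {"regularMarketPrice": {"raw": 141.28, "fmt": "141.28"}}}}}}}};
}(this));
"""

def extract_app_main(script_text):
    """Return the object assigned to root.App.main, or None if it is absent."""
    # Non-greedy match from the first '{' up to the first '};' that ends a line.
    match = re.search(r"root\.App\.main\s*=\s*(\{.*?\});\s*$", script_text,
                      re.DOTALL | re.MULTILINE)
    if match is None:
        return None
    return json.loads(match.group(1))

data = extract_app_main(SCRIPT)
quote = data["context"]["dispatcher"]["stores"]["StreamDataStore"]["quoteData"]["VTI"]
price = quote["regularMarketPrice"]["raw"]
print(price)  # 141.28
```

On a live page, the script text could come from `soup.find("script", string=re.compile(r"root\.App\.main"))` instead of the fixed index `scripts[-3]`, so the lookup does not depend on how many scripts follow it.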

[Discussion]:

If you only want the current stock price, @Robert Carlos's approach is more useful.

[Solution 4]:

It's a workaround, but since this is just a fun project, the following gets the correct answer (though I'd prefer a proper, scalable solution):

import requests
from bs4 import BeautifulSoup

response = requests.get("https://finance.yahoo.com/lookup/etf?s=vxus")
soup = BeautifulSoup(response.content, "lxml")
price = soup.select('table td')
print(price[2].text)  # third cell of the results table
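The search-page problem from the question (the exact ticker not always coming back first) can be sidestepped by scanning the lookup table for the row whose symbol cell matches the ticker, instead of taking a fixed cell index. This sketch assumes the symbol is in the first cell and the price in the third, which is what `price[2]` relies on above; the table, its values, and the `price_for_symbol` helper are all hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical lookup table; the matching ticker is deliberately not the first row.
HTML = """
<table>
  <tr><td>EXMPL</td><td>Example Fund A</td><td>1.00</td></tr>
  <tr><td>VXUS</td><td>Example Fund B</td><td>2.00</td></tr>
</table>
"""

def price_for_symbol(html, symbol):
    """Return the third cell of the row whose first cell equals symbol, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for row in soup.select("table tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # Case-insensitive compare; assumes column 0 = symbol, column 2 = price.
        if len(cells) >= 3 and cells[0].upper() == symbol.upper():
            return cells[2]
    return None

print(price_for_symbol(HTML, "vxus"))  # 2.00
```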

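[Discussion]:

[Solution 5]:

Here is a solution that works for me. The classes in class_ may change when the site's elements are updated, so if the solution fails, I copy and paste the latest classes from the browser's element inspector.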

import requests
from bs4 import BeautifulSoup as bs

res = requests.get('https://finance.yahoo.com/quote/SQQQ/')   
soup = bs(res.content, 'lxml')
for stock in soup.find_all('span', class_='Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)'):
    print(stock.get_text())
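Because the class names change whenever the site is updated, one way to reduce the copy-and-paste maintenance is to try several selectors in order and return the first match. The selector list, the sample HTML, and the `first_match_text` helper are hypothetical placeholders; the real selectors still have to come from inspecting the current page.

```python
from bs4 import BeautifulSoup

# Hypothetical selectors, newest first; update after inspecting the live page.
PRICE_SELECTORS = [
    '[data-field="regularMarketPrice"]',
    r"span.Trsdu\(0\.3s\).Fw\(b\)",
]

def first_match_text(html, selectors):
    """Return the text of the first element matched by any selector, else None."""
    soup = BeautifulSoup(html, "html.parser")
    for selector in selectors:
        element = soup.select_one(selector)
        if element is not None:
            return element.get_text(strip=True)
    return None

sample = '<span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)">141.28</span>'
print(first_match_text(sample, PRICE_SELECTORS))  # 141.28
```

Returning None instead of raising lets the caller log which page no longer matches any selector, which is the signal to go back to the inspector.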

[Discussion]: