BeautifulSoup Python get content: answers

【Question title】: BeautifulSoup Python get content
【Posted】: 2022-11-14 02:00:31
【Question description】:

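I don't usually use BeautifulSoup in Python, so I'm struggling to find the value 8.133,00 that corresponds to Ibex 35 on this page: https://es.investing.com/indices/indices-futures. So far I can fetch the whole page, but I can't filter it down to that value: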

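from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

site = 'https://es.investing.com/indices/indices-futures'
# Send a browser-like User-Agent header with the request
hardware = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:106.0) Gecko/20100101 Firefox/106.0'}
request = Request(site, headers=hardware)
page = urlopen(request)
soup = BeautifulSoup(page, 'html.parser')
print(soup)  # prints the whole page; nothing is filtered yet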

I'd appreciate a hand getting that value. Regards.

【Comments】:

investing.com seems to be using some kind of anti-scraping mechanism.
Yes, it does. So I'm giving up on it. Thanks.
Don't give up so easily, mate...

Tags: python beautifulsoup

【Solution 1】:

Here is one way to get that information: build a dataframe holding everything in that table (IBEX 35, DAX and so on), which you can then slice as needed.

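from io import StringIO

import cloudscraper
import pandas as pd
from bs4 import BeautifulSoup as bs

# cloudscraper is used here to get past the site's Cloudflare check
scraper = cloudscraper.create_scraper(disableCloudflareV1=True)

# Show every column at full width when printing
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

url = 'https://es.investing.com/indices/indices-futures'
r = scraper.get(url)  # cloudscraper supplies its own browser-like headers
soup = bs(r.text, 'html.parser')
# These class names are generated by the site and may change over time
table = soup.select_one('table[class="datatable_table__D_jso quotes-box_table__nndS2 datatable_table--mobile-basic__W2ilt"]')
# Wrap the HTML in StringIO; recent pandas versions deprecate passing a raw string
df = pd.read_html(StringIO(str(table)))[0]
print(df)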

Result in the terminal:

   0                    1          2      3       4
0  IBEX 35derived       8.098,10   -3510  -0,43%  NaN
1  US 500derived        3.991,90   355    +0,90%  NaN
2  US Tech 100derived   11.802,20  1962   +1,69%  NaN
3  Dow Jones            33.747,86  3249   +0,10%  NaN
4  DAXderived           14.224,86  7877   +0,56%  NaN
5  Índice dólarderived  106255     -1837  -1,70%  NaN
6  Índice euroderived   11404      89     +0,79%  NaN
See https://pypi.org/project/cloudscraper/
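
The prices in the table are text in Spanish number format (point for thousands, comma for decimals), so they need converting before any arithmetic. Here is a minimal sketch using only the standard library. `parse_es_number` is a hypothetical helper, not part of pandas or BeautifulSoup:

```python
def parse_es_number(text: str) -> float:
    """Convert a Spanish-formatted number such as '8.098,10' to a float."""
    cleaned = text.strip().rstrip("%")
    # Drop thousands separators first, then turn the decimal comma into a point.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return float(cleaned)


price = parse_es_number("8.098,10")
change_pct = parse_es_number("-0,43%")
print(price, change_pct)
```

`pd.read_html` also accepts `thousands` and `decimal` arguments, which may let pandas handle this conversion while parsing the table.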
