BeautifulSoup find div is returning the entire page and not a single element

【Question title】: BeautifulSoup find div is returning the entire page and not a single element
【Posted】: 2019-08-17 23:36:55
【Question description】:

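I'm trying to extract the company description from this page: https://angel.co/company/sensor-tower, but BeautifulSoup is returning the entire text of the page.

I tried desc = soup.find('div', class_="content").get_text().strip(), which works for other pages on the site but returns all of the text on this page.

The expected output should be: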

Sensor Tower is a comprehensive mobile market intelligence platform that delivers crucial insights into the global app economy. Our flagship Store Intelligence product is an enterprise level offering that provides high-accuracy, worldwide app download and revenue estimates for Apple's App Store and Google Play.

Our best-of-class research interface, which seamlessly integrates across our Store Intelligence, Ad Intelligence, and App Intelligence products, is utilized by executives and analysts alike to drive key business decisions. Our products are counted on by the app world's largest publishers, Fortune 500 companies, and financial institutions to surface emerging market trends, benchmark performance, and grow app businesses at enterprise scale.

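【Question comments】:

It seems the content class is not in the initial page source but is generated afterwards. So you would have to use something else, such as a headless browser, instead of BeautifulSoup.
No, it is in the HTTP response from the server; it is not added dynamically.

Tags: python beautifulsoup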

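【Solution 1】:

There are two div tags on that page with the class content. One of them (line 590 in my copy) contains a lot of things, while the other (line 620 in my copy) contains only the description you are looking for. BeautifulSoup is returning the first one, because find() returns only the first match in document order.

Using find("div", class_="product_desc") will probably do a better job of selecting the element you want.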
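A minimal sketch of the behaviour described above. The markup below is a hypothetical, heavily simplified stand-in for the real page (the actual nesting on angel.co is an assumption here); it only illustrates that find() stops at the first div with class content, while a more specific class narrows the match to the description.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, NOT the real page: an outer div.content wraps
# everything, and a second div.content sits inside div.product_desc.
html = """
<div class="content">
  <div class="header">Navigation and other page text</div>
  <div class="product_desc">
    <div class="content">Sensor Tower is a mobile market intelligence platform.</div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first match, which here is the outer wrapper,
# so its text includes the navigation text as well as the description.
first = soup.find("div", class_="content")

# find_all() shows that more than one element carries the class.
all_content = soup.find_all("div", class_="content")

# Targeting the more specific class isolates the description.
desc_div = soup.find("div", class_="product_desc")
desc = desc_div.get_text(strip=True) if desc_div is not None else None

print(len(all_content))
print(desc)
```

Checking the result of find() against None before calling get_text() avoids an AttributeError on pages where the element is missing.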
