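【Posted】: 2019-08-17 23:36:55
【Question】:
I am trying to extract the company description from this page: https://angel.co/company/sensor-tower, but BeautifulSoup returns the text of the entire page.
I tried desc = soup.find('div', class_="content").get_text().strip(), which works on other pages of the site, but on this page it returns all of the text.
The expected output is: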
Sensor Tower is a comprehensive mobile market intelligence platform that delivers crucial insights into the global app economy. Our flagship Store Intelligence product is an enterprise level offering that provides high-accuracy, worldwide app download and revenue estimates for Apple's App Store and Google Play.
Our best-of-class research interface, which seamlessly integrates across our Store Intelligence, Ad Intelligence, and App Intelligence products, is utilized by executives and analysts alike to drive key business decisions. Our products are counted on by the app world's largest publishers, Fortune 500 companies, and financial institutions to surface emerging market trends, benchmark performance, and grow app businesses at enterprise scale.
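【Comments】:
- It seems the content class is not in the initial page source but is generated afterwards. If so, you would need something other than BeautifulSoup, such as a headless browser.
- No, it is in the HTTP response from the server; it is not added dynamically.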
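One possible explanation, consistent with the second comment, is that find() returns the first element in document order that carries the class, and on this page that first element may be an outer wrapper rather than the description block. The sketch below illustrates the effect on made-up markup; the HTML, the description class, and the CSS selector are assumptions for illustration and are not taken from the real angel.co page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, NOT the real angel.co page: an outer wrapper and the
# description block both carry the class "content".
html = """
<div class="content wrapper">
  <nav>Jobs Overview People</nav>
  <div class="description">
    <div class="content"><p>Sensor Tower is a platform.</p></div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Diagnostic: count how many divs carry the class at all.
matches = soup.find_all("div", class_="content")
print(len(matches))

# find() returns the first match in document order, here the outer wrapper,
# so get_text() also includes the navigation text.
first_match = soup.find("div", class_="content")
whole_text = first_match.get_text(" ", strip=True)

# Scoping the search to a more specific ancestor (assumed selector) picks
# the inner block that holds only the description.
scoped = soup.select_one("div.description div.content")
desc = scoped.get_text(" ", strip=True)

print(whole_text)
print(desc)
```

If find_all() reports more than one match on the real page, inspecting each match's parent chain should show which ancestor or attribute distinguishes the description block, and the selector can be adjusted accordingly.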
Tags: python beautifulsoup