Get text of children in a div with BeautifulSoup

【Question title】: Get text of children in a div with beautifulsoup
【Posted】: 2014-01-20 07:18:57
【Question description】:

Hi, I want the description of an app from the Google Play Store. (https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de)

import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen("https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de"))
result = soup.find_all("div", {"class":"show-more-content text-body"})

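With this code I get the whole content of this class, but I can't get just the text inside it. I tried a lot of things with next_sibling or .text, but it always throws an error (ResultSet has no attribute xxx).

I just want to get the text, like this: "Die Android App von wetter.com! Sie erhalten: ..:"

Can anybody help me?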

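【Question discussion】:

Tags: python html python-2.7 beautifulsoup urllib2

【Solution 1】:

Use the .text attribute on the elements; find_all() gives you a list of results (a ResultSet), so loop over it: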

for res in result:
    print(res.text)

.text is a property that proxies the Element.get_text() method.

Alternatively, if there is only ever one such <div>, use .find() instead of .find_all():

result = soup.find("div", {"class":"show-more-content text-body"})
print(result.text)
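
As a minimal offline sketch of both variants above: the HTML snippet here is a made-up stand-in for the Play Store page (its real markup may differ), and the "html.parser" argument and the None check are additions that the original answer does not include.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the example runs offline.
html = """
<div class="show-more-content text-body">
  <div>Die Android App von wetter.com!</div>
  <p>Sie erhalten: <b>Wetter</b></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list subclass), so read the text per element.
results = soup.find_all("div", {"class": "show-more-content text-body"})
texts = [res.get_text(" ", strip=True) for res in results]

# find() returns the first matching Tag, or None when nothing matches.
first = soup.find("div", {"class": "show-more-content text-body"})
first_text = first.get_text(" ", strip=True) if first is not None else ""
print(first_text)
```

Passing a separator and strip=True to get_text() is optional; it only collapses the whitespace between nested tags, which plain .text keeps as-is.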

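【Discussion】:

Note that, according to the documentation, that attribute doesn't exist. The get_text() function does, however.
@Mike'Pomax'Kamermans That is a documentation error. .text is a property that calls .get_text().
Hm, I don't see a bug about this at bugs.launchpad.net/beautifulsoup, and this is a pretty well-written section, so... if it is, it might still be good to talk about get_text() as the "thing", with .text as a convenient shortcut. If people want to learn more about this property, they won't find it by searching the docs for .text, whereas they will find it by searching for get_text.
@Mike'Pomax'Kamermans: Fair enough, added.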

【Solution 2】:

Use the decode_contents() method.

import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen("https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de"))
result = soup.find_all("div", {"class":"show-more-content text-body"})

for res in result:
    print(res.decode_contents().strip())

You will get the innerHTML of the div, that is, its contents with the nested markup still included.
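
To make the difference from the first solution concrete, here is a small sketch that runs both calls on the same element. The HTML string is an invented example, not the actual Play Store markup, and the parser choice is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line fragment standing in for the real description <div>.
html = '<div class="show-more-content text-body"><p>Sie erhalten: <b>Wetter</b></p></div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"class": "show-more-content text-body"})

# decode_contents() renders the children as markup (the "innerHTML").
inner_html = div.decode_contents()

# get_text() drops the tags and keeps only the text nodes.
plain_text = div.get_text()

print(inner_html)
print(plain_text)
```

So decode_contents() is the better fit when the formatting tags should survive, while get_text() matches the question's request for text only.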

【Discussion】: