【Question title】: Beautiful Soup: recursively get text from nested divs
【Posted】: 2018-05-08 04:03:17
【Question】:

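I want the data inside nested divs, but I can't get it out.

The divs are nested, and I need to format the data properly.

I have written code with the bs4 module, but I get this error:

BeautifulSoup: AttributeError: 'NavigableString' object has no attribute 'name'

Please help!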

My HTML:

<div id="new">
    <div id="newDat">
        <div class="Data">
            <div class="DataNew">
                <div class="DataNew new">
                    <div class="Data Left">
                        <div class="name"><a class="name" href="">Jack Daniels</a></div>
                        <div class="details"><span class="loc">Barcelona</span></div>
                        <div class="header"><a class="looking"> Looking for meeting new people</a></div>
                        <div class="ideas"><a class="ideas">I have new ideas</a></div>
                        <div class="profile"> <em class="profilss"></em>MS in cs<br></div>

                    </div>
                    <div class="Data Right">
                        <a class="phone"><span class="txt">+123123123123123231</span></a>
                    </div>
                </div>

            </div>
        </div>
        <div class="DataOne">
            <div class="DataNew">
                <div class="DataNew new">
                    <div class="Data Left">
                        <div class="name"><a class="name" href="">Jack Daniels</a></div>
                        <div class="details"><span class="loc">Barcelona</span></div>
                        <div class="header"><a class="looking"> Looking for meeting new people</a></div>
                        <div class="ideas"><a class="ideas">I have new ideas</a></div>
                        <div class="profile"> <em class="profilss"></em>MS in cs<br></div>

                    </div>
                    <div class="Data Right">
                        <a class="phone"><span class="txt">+123123123123123231</span></a>
                    </div>
                </div>

            </div>
        </div>
        <div class="DataTwo">
            <div class="DataNew">
                <div class="DataNew new">
                    <div class="Data Left">
                        <div class="name"><a class="name" href="">Jack Daniels</a></div>
                        <div class="details"><span class="loc">Barcelona</span></div>
                        <div class="header"><a class="looking"> Looking for meeting new people</a></div>
                        <div class="ideas"><a class="ideas">I have new ideas</a></div>
                        <div class="profile"> <em class="profilss"></em>MS in cs<br></div>

                    </div>
                    <div class="Data Right">
                        <a class="phone"><span class="txt">+123123123123123231</span></a>
                    </div>
                </div>  
            </div>
        </div>
        <div class="DataThree">
            <div class="DataNew">
                <div class="DataNew new">
                    <div class="Data Left">
                        <div class="name"><a class="name" href="">Jack Daniels</a></div>
                        <div class="details"><span class="loc">Barcelona</span></div>
                        <div class="header"><a class="looking"> Looking for meeting new people</a></div>
                        <div class="ideas"><a class="ideas">I have new ideas</a></div>
                        <div class="profile"> <em class="profilss"></em>MS in cs<br></div>

                    </div>
                    <div class="Data Right">
                        <a class="phone"><span class="txt">+123123123123123231</span></a>
                    </div>
                </div>

            </div>
        </div>
    </div>
</div>

My BeautifulSoup code:

    li = page.find('div', {'id': 'new'})
    for tag in li:
        for i in tag.find_all("div", {"class": "name"}):
            print i.getText()
            break

        for i in tag.find_all("div", {"class": "details"}):
            print i.getText()
            break

        for i in tag.find_all("div", {"class": "header"}):
            print i.getText()
            break


        for i in tag.find_all("div", {"class": "ideas"}):
            print i.getText()
            break


        for i in tag.find_all("div", {"class": "profile"}):
            print i.getText()
            break

        for i in tag.find_all("div", {"class": "phone"}):
            print i.getText()
            break
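The error most likely comes from the first loop: iterating directly over the Tag returned by `page.find('div', {'id': 'new'})` yields every child node, including the whitespace text between the divs, and those text nodes are `NavigableString` objects rather than Tags. A minimal sketch on a trimmed-down copy of the HTML, assuming bs4 with the built-in `html.parser`, shows the mixed children and two ways to keep only the elements:

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Trimmed-down version of the question's HTML (two records only).
html = """
<div id="new">
    <div class="DataOne"><div class="name">Jack Daniels</div></div>
    <div class="DataTwo"><div class="name">Jack Daniels</div></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", {"id": "new"})

# Iterating over a Tag yields every direct child, including the whitespace
# between tags. Calling Tag-only methods such as find_all() on those
# string children is what raises the AttributeError.
for child in container:
    print(type(child).__name__)

strings = [c for c in container if isinstance(c, NavigableString)]
print(len(strings), "text nodes mixed in with the divs")

# Fix 1: skip anything that is not a Tag.
tags_only = [child for child in container if isinstance(child, Tag)]

# Fix 2: ask only for the direct child <div> elements.
direct_divs = container.find_all("div", recursive=False)

for record in direct_divs:
    print(record.find("div", {"class": "name"}).get_text())
```

`find_all(..., recursive=False)` is usually the cleaner choice here, since it states the intent (direct child divs only) instead of filtering out strings afterwards.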

I want output like this:

Div one 
Name : Jack Daniels
Details : Barcelona
header : Looking for meeting new people
ideas : I have new ideas
profile: MS in cs
tel : +123123123123123231

Div two 
Name : Jack Daniels
Details : Barcelona
header : Looking for meeting new people
ideas : I have new ideas
profile: MS in cs
tel : +123123123123123231

And so on.

If there are 100 divs inside <div id="new">, I need this output for every one of them.

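【Discussion】:

  • Why do all those for loops have a break after the first iteration? You can use find directly, e.g. tag.find("div", {"class": "name"}).text
  • Thanks @t.m.adam, I already tried that, but I need the contents of the divs nested inside the div.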
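The `find()` suggestion does reach nested content, because `find()` searches all descendants by default (`recursive=True`), not only direct children. A minimal sketch on one record from the question's HTML, assuming the built-in `html.parser`:

```python
from bs4 import BeautifulSoup

# One record from the question's HTML.
html = """
<div class="DataOne">
  <div class="DataNew">
    <div class="DataNew new">
      <div class="Data Left">
        <div class="name"><a class="name" href="">Jack Daniels</a></div>
        <div class="profile"> <em class="profilss"></em>MS in cs<br></div>
      </div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
outer = soup.find("div", {"class": "DataOne"})

# find() searches all descendants, so the <div class="name"> several
# levels down is still found from the outer div.
name_div = outer.find("div", {"class": "name"})
# get_text() includes the text of nested tags such as the inner <a>.
print(name_div.get_text())

# strip=True drops the stray whitespace around "MS in cs".
profile_div = outer.find("div", {"class": "profile"})
print(profile_div.get_text(strip=True))

# With recursive=False only direct children are searched, so nothing matches.
print(outer.find("div", {"class": "name"}, recursive=False))
```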

Tags: python beautifulsoup


【Solution 1】:

You can loop over each record div and search for the fields inside it. This prints the data for each div.

from bs4 import BeautifulSoup

soup = BeautifulSoup(b, "html.parser")  # b is the HTML string

# Each record is a <div class="DataNew new">. Matching that exact class
# string skips the outer <div class="DataNew"> wrapper, which a plain
# {'class': 'DataNew'} search would also match, printing every record twice.
rows = soup.find_all("div", {"class": "DataNew new"})
for tag in rows:
    # find() returns the first matching descendant (or None), replacing
    # the "for ...: print; break" loops.
    for cls in ("name", "details", "header", "ideas", "profile", "Data Right"):
        field = tag.find("div", {"class": cls})
        if field is not None:
            print(field.get_text(strip=True))
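To get the labeled layout requested in the question for any number of records, one option is to map each output label to a CSS selector and number the records with `enumerate`. This is a sketch rather than the answerer's code: `format_records` and `FIELDS` are hypothetical names, it assumes `select()`/`select_one()` CSS selector support (provided to bs4 by the soupsieve package), it numbers records as "Div 1", "Div 2" instead of spelling out "one", "two", and it prints an empty value when a field is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical mapping from output label to the CSS selector of each field,
# based on the class names in the question's HTML.
FIELDS = [
    ("Name", "div.name"),
    ("Details", "div.details"),
    ("header", "div.header"),
    ("ideas", "div.ideas"),
    ("profile", "div.profile"),
    ("tel", "div.Right a.phone"),
]


def format_records(html):
    """Return one labeled text block per <div class="DataNew new"> record."""
    soup = BeautifulSoup(html, "html.parser")
    blocks = []
    # "div.DataNew.new" requires both classes, so the outer
    # <div class="DataNew"> wrappers are not matched a second time.
    for n, record in enumerate(soup.select("#new div.DataNew.new"), start=1):
        lines = [f"Div {n}"]
        for label, selector in FIELDS:
            el = record.select_one(selector)
            # A missing field yields an empty value instead of an exception.
            value = el.get_text(strip=True) if el is not None else ""
            lines.append(f"{label} : {value}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# Sample input: two copies of a record with no header or ideas fields.
RECORD = """
<div class="DataNew"><div class="DataNew new">
  <div class="Data Left">
    <div class="name"><a class="name" href="">Jack Daniels</a></div>
    <div class="details"><span class="loc">Barcelona</span></div>
    <div class="profile"> <em class="profilss"></em>MS in cs<br></div>
  </div>
  <div class="Data Right"><a class="phone"><span class="txt">+123123123123123231</span></a></div>
</div></div>
"""
sample = '<div id="new"><div id="newDat">' + RECORD * 2 + "</div></div>"
print(format_records(sample))
```

With the question's full HTML, `format_records(b)` should produce one block per record regardless of how many record divs sit inside `<div id="new">`.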
