【Question title】: Extract text from a BeautifulSoup lxml file
【Posted】: 2020-10-01 10:26:23
【Question】:

How can I extract the text from this lxml markup, starting at <div class="ember-view" id="ember760">? Please help. I tried the code below, but it does not capture the text.

Code I tried

# soup is a BeautifulSoup object

exp = soup.find('header', {'class': 'pv-profile-section__card-header'})
exp

lxml file

<div class="pv-recommendation-entity__highlights">
<blockquote class="pv-recommendation-entity__text relative">
<div class="ember-view" id="ember760"> <span class="lt-line-clamp__line">I know Abc from Data Analysis training sessions with abc,</span>
<span class="lt-line-clamp__line">Abc
is an enthusiastic candidature in training sessions. He is an</span>
<span class="lt-line-clamp__line">extremely capable and dedicated entry-level Data Science Analyst.</span>
<span class="lt-line-clamp__line">He is enhancing Analytics skills by his enthusiasm for learning new</span>
<span class="lt-line-clamp__line lt-line-clamp__line--last">
      things, and has learnt new tools like R, SPSS, and Pytho<span class="lt-line-clamp__ellipsis">...
            <a aria-expanded="false" class="lt-line-clamp__more" data-test-line-clamp-show-more-button="true" href="#" id="line-clamp-show-more-button" role="button">See more</a>
</span></span>
<!-- --><span class="lt-line-clamp__ellipsis lt-line-clamp__ellipsis--dummy">... <a class="lt-line-clamp__more" href="#" role="button">See more</a></span></div>
</blockquote>
</div>
</li>
</ul>
<!-- --></div>
</div></div>

Expected output

I know Abc from Data Analysis training sessions with abc,
is an enthusiastic candidature in training sessions. He is an
extremely capable and dedicated entry-level Data Science Analyst.
He is enhancing Analytics skills by his enthusiasm for learning new
      things, and has learnt new tools like R, SPSS, and Pytho

【Comments】:

    Tags: python beautifulsoup


    【Solution 1】:

    You can select <div class="ember-view" id="ember760"> with the CSS selector div#ember760 and then call the .get_text() method:

    from bs4 import BeautifulSoup
    
    
    txt = '''
    <div class="pv-recommendation-entity__highlights">
    <blockquote class="pv-recommendation-entity__text relative">
    <div class="ember-view" id="ember760"> <span class="lt-line-clamp__line">I know Abc from Data Analysis training sessions with abc,</span>
    <span class="lt-line-clamp__line">Abc
    is an enthusiastic candidature in training sessions. He is an</span>
    <span class="lt-line-clamp__line">extremely capable and dedicated entry-level Data Science Analyst.</span>
    <span class="lt-line-clamp__line">He is enhancing Analytics skills by his enthusiasm for learning new</span>
    <span class="lt-line-clamp__line lt-line-clamp__line--last">
          things, and has learnt new tools like R, SPSS, and Pytho<span class="lt-line-clamp__ellipsis">...
                <a aria-expanded="false" class="lt-line-clamp__more" data-test-line-clamp-show-more-button="true" href="#" id="line-clamp-show-more-button" role="button">See more</a>
    </span></span>
    <!-- --><span class="lt-line-clamp__ellipsis lt-line-clamp__ellipsis--dummy">... <a class="lt-line-clamp__more" href="#" role="button">See more</a></span></div>
    </blockquote>
    </div>
    </li>
    </ul>
    <!-- --></div>
    </div></div>'''
    
    soup = BeautifulSoup(txt, 'lxml')
    
    print(soup.select_one('div#ember760').get_text(strip=True, separator='\n'))
    

    Prints:

    I know Abc from Data Analysis training sessions with abc,
    Abc
    is an enthusiastic candidature in training sessions. He is an
    extremely capable and dedicated entry-level Data Science Analyst.
    He is enhancing Analytics skills by his enthusiasm for learning new
    things, and has learnt new tools like R, SPSS, and Pytho
    ...
    See more
    ...
    See more
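
    The output above still contains the "..." and "See more" strings, which the expected output in the question leaves out. A minimal sketch of one way to drop them, assuming it is acceptable to modify the parsed tree: remove the lt-line-clamp__ellipsis spans with decompose() before calling get_text(). The markup is a shortened excerpt of the question's snippet, and 'html.parser' is used here only so the sketch needs no extra parser; 'lxml' should work the same way.

```python
from bs4 import BeautifulSoup

# Shortened excerpt of the markup from the question (two of the text lines).
html = '''<div class="ember-view" id="ember760"> <span class="lt-line-clamp__line">I know Abc from Data Analysis training sessions with abc,</span>
<span class="lt-line-clamp__line lt-line-clamp__line--last">
      things, and has learnt new tools like R, SPSS, and Pytho<span class="lt-line-clamp__ellipsis">...
            <a class="lt-line-clamp__more" href="#" role="button">See more</a>
</span></span>
<!-- --><span class="lt-line-clamp__ellipsis lt-line-clamp__ellipsis--dummy">... <a class="lt-line-clamp__more" href="#" role="button">See more</a></span></div>'''

soup = BeautifulSoup(html, 'html.parser')
div = soup.select_one('div#ember760')

# Remove every "... See more" span from the tree, including the dummy one.
for node in div.select('span.lt-line-clamp__ellipsis'):
    node.decompose()

# Collect the remaining text, one stripped string per line.
text = div.get_text(strip=True, separator='\n')
print(text)
```

    Note that decompose() changes the soup in place, so this is best done on a copy if the removed elements are needed later.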
    

    【Comments】:

      【Solution 2】:
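      from bs4 import BeautifulSoup

      # html holds the markup from the question
      soup = BeautifulSoup(html, 'lxml')
      lines = soup.select('div.ember-view > span.lt-line-clamp__line')
      # take only the first direct text node of each span (string= replaces the deprecated text=)
      text = ''.join([line.find(string=True, recursive=False) for line in lines])
      print(text)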
      

      This gives the text:

      I know Abc from Data Analysis training sessions with abc,Abc
      is an enthusiastic candidature in training sessions. He is anextremely capable and dedicated entry-level Data Science Analyst.He is enhancing Analytics skills by his enthusiasm for learning new
            things, and has learnt new tools like R, SPSS, and Pytho
      

      The "See more.." text is ignored.
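
      The joined text above runs some lines together (for example "He is anextremely") because the pieces are concatenated with an empty string. A hedged variant, assuming one output line per span is wanted as in the question's expected output: strip each piece and join with a newline. The markup is a shortened excerpt of the question's snippet, and 'html.parser' is an assumption made to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Shortened excerpt of the markup from the question.
html = '''<div class="ember-view" id="ember760"> <span class="lt-line-clamp__line">I know Abc from Data Analysis training sessions with abc,</span>
<span class="lt-line-clamp__line">He is enhancing Analytics skills by his enthusiasm for learning new</span>
<span class="lt-line-clamp__line lt-line-clamp__line--last">
      things, and has learnt new tools like R, SPSS, and Pytho<span class="lt-line-clamp__ellipsis">...
            <a class="lt-line-clamp__more" href="#" role="button">See more</a>
</span></span></div>'''

soup = BeautifulSoup(html, 'html.parser')
lines = soup.select('div.ember-view > span.lt-line-clamp__line')

parts = []
for line in lines:
    # First text node that is a direct child of the span; nested spans are skipped.
    first = line.find(string=True, recursive=False)
    if first is not None:
        parts.append(first.strip())

text = '\n'.join(parts)
print(text)
```

      The None check guards against a span that has no direct text child, which would otherwise make the join fail.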

      【Comments】:
