BeautifulSoup: Pull p tags between defined h2 tags

【Question title】: BeautifulSoup: Pull p tag while between defined h2 tags
【Posted】: 2017-12-25 12:39:43
【Question description】:

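This one has me a little stumped. I'm trying to pull all the text from the "p" tags that sit under the "h2" tags named "New Fundings" and "New Funds". The number of "p" tags isn't consistent from page to page, so I was thinking of some sort of while loop, but what I've tried hasn't worked. Each "p" tag is typically a company name wrapped in "strong", followed by text that lists the funding and investors, with further "strong" tags.

Once I can get it parsing correctly, the goal is to export the company name from the "strong" tag, along with the text that follows it and the investing firms/people (from the other "strong" tags within the "p" block), for some data analysis.

Any help would be greatly appreciated. And yes, I have looked at various other help pages, but my attempts haven't been successful, which is why I'm here.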

    import requests
    from bs4 import BeautifulSoup

    page = requests.get("https://www.strictlyvc.com/2017/06/13/strictlyvc-june-12-2017/")
    soup = BeautifulSoup(page.content, 'html.parser')
    entrysoup = soup.find(class_='post-entry')

    # Trying to pull the right paragraphs, but these only select the next one.
    # I want all of the "p" tags under "New Fundings" and "New Funds"
    # (basically, up to the next heading that is not one of those two).
    print(entrysoup.find('h2', text='New Fundings').find_next_sibling('p'))
    print(entrysoup.find('h2', text='New Funds').find_next_sibling('p'))

    # This gets closer, but I don't know how to make it stop once it
    # reaches a tag outside the New Fundings / New Funds sections.
    for strong_tag in entrysoup.find_all('strong'):
        print(strong_tag.text, strong_tag.next_sibling)
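
A note on the first attempt: find_next_sibling returns only the first matching sibling, while find_next_siblings (plural) returns every later match. The sketch below uses made-up markup (an assumption, not the real page) to show the difference, and also why the plural form alone is not enough: it keeps going past the next heading.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real post body.
html = """
<div class="post-entry">
<h2>New Fundings</h2>
<p><strong>Acme</strong> raised money.</p>
<p><strong>Beta</strong> raised money.</p>
<h2>Exits</h2>
<p><strong>Gamma</strong> was acquired.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
heading = soup.find("h2", string="New Fundings")

# Singular: just the first <p> after the heading.
first_only = heading.find_next_sibling("p")

# Plural: every later <p> sibling, including those under other headings.
all_later = heading.find_next_siblings("p")

print(first_only.get_text())
print([p.get_text() for p in all_later])
```

The second list includes the paragraph under "Exits", so some explicit stop condition is still needed.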

【Comments】:

Too bad bs can't do h2 ~ p:has(~ h2)
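
That limitation appears to apply to the Beautiful Soup of the time. As far as I know, releases from 4.7 onward hand select() to the soupsieve package, which accepts sibling combinators and :has(). A minimal sketch, assuming such a version is installed and using made-up markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; not taken from the real page.
html = """
<div class="post-entry">
<h2>New Fundings</h2>
<p><strong>Acme</strong> raised money.</p>
<p><strong>Beta</strong> raised money.</p>
<h2>Exits</h2>
<p><strong>Gamma</strong> was acquired.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Every <p> that comes after some <h2> and still has an <h2> after it.
between = soup.select("h2 ~ p:has(~ h2)")
texts = [p.get_text() for p in between]
print(texts)
```

This selector only says "between any two h2 tags"; it does not single out a particular heading, so on a page with many sections it would still need filtering.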

Tags: beautifulsoup html-parsing python-3.5

【Solution 1】:

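I think this is the best I can get for now. If it's not what you're after, let me know so I can fiddle with it some more. If it is, please mark it as the answer :)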

    import requests
    import bs4

    page = requests.get("https://www.strictlyvc.com/2017/06/13/strictlyvc-june-12-2017/")
    soup = bs4.BeautifulSoup(page.content, 'html.parser')
    entrysoup = soup.find(class_='post-entry')

    # Text of the element that marks the end of the content we want.
    Stop_Point = 'Also Sponsored By . . .'

    for heading in entrysoup.find_all('h2'):

        # Start at "New Fundings" and walk forward through its siblings.
        if heading.get_text() == 'New Fundings':
            for sibling in heading.next_siblings:
                # Skip bare text nodes (e.g. whitespace between tags).
                if isinstance(sibling, bs4.element.Tag):
                    print(sibling.get_text())

                    if sibling.get_text() == Stop_Point:
                        break

                # Also walk the children of any <div> sibling.
                if sibling.name == 'div':
                    for child in sibling.children:
                        if isinstance(child, bs4.element.Tag):
                            if child.get_text() == Stop_Point:
                                break

                            print(child.get_text())
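
If a fixed stop string is too brittle for other pages, one alternative is to stop at the next h2 instead. The sketch below is only an illustration under stated assumptions: the helper name paragraphs_under and the sample markup are made up, and it assumes the p tags are direct siblings of the h2 (content nested inside a div would need the extra handling shown above).

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup; the real page may nest some paragraphs differently.
SAMPLE_HTML = """
<div class="post-entry">
<h2>New Fundings</h2>
<p><strong>Acme</strong> raised a seed round.</p>
<p><strong>Beta</strong> raised a Series A.</p>
<h2>New Funds</h2>
<p><strong>Gamma Capital</strong> closed a new fund.</p>
<h2>Exits</h2>
<p><strong>Delta</strong> was acquired.</p>
</div>
"""


def paragraphs_under(container, heading_text):
    """Return the <p> siblings after the matching <h2>, up to the next <h2>."""
    heading = container.find("h2", string=heading_text)
    if heading is None:
        return []
    paragraphs = []
    for sibling in heading.next_siblings:
        if not isinstance(sibling, Tag):
            continue  # skip whitespace and other text nodes
        if sibling.name == "h2":
            break  # the next section starts here
        if sibling.name == "p":
            paragraphs.append(sibling)
    return paragraphs


entry = BeautifulSoup(SAMPLE_HTML, "html.parser").find(class_="post-entry")
for title in ("New Fundings", "New Funds"):
    for p in paragraphs_under(entry, title):
        print(title, "|", p.get_text())
```

Because each call stops at the following h2, the two sections can be collected separately, regardless of how many paragraphs each one has.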

【Comments】:

Yes, this was very helpful! Thanks @Mohamed - I'll be able to adapt it to automate the rest of the pages/parsing.
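
For the follow-up goal stated in the question (company name plus investors from the strong tags), a rough sketch might look like the following. It assumes, as the question describes, that the first strong tag in a paragraph is the company and any later ones are investors; the function name parse_funding and the sample paragraph are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical paragraph in the shape the question describes.
html = (
    "<p><strong>Acme</strong>, a widget maker, has raised $5 million from "
    "<strong>Alpha Ventures</strong> and <strong>Beta Capital</strong>.</p>"
)


def parse_funding(p_tag):
    """Split one <p> into company, investors, and full text.

    Assumption: the first <strong> is the company, the rest are investors.
    """
    names = [s.get_text(strip=True) for s in p_tag.find_all("strong")]
    if not names:
        return None  # paragraph does not follow the expected shape
    return {
        "company": names[0],
        "investors": names[1:],
        "text": p_tag.get_text().strip(),
    }


p = BeautifulSoup(html, "html.parser").find("p")
record = parse_funding(p)
print(record)
```

Paragraphs that break the pattern (no strong tag, or a strong tag used for emphasis) would need to be checked by hand or filtered out.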