
【Question title】: beautiful soup .find can't find anything
【Posted】: 2020-01-04 01:49:33
【Question description】:

I am trying to scrape the posts in a Facebook group:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.facebook.com/groups/110354088989367/'

headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
}


def checkSubletGroup():
    page = requests.get(URL, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')
    posts = soup.find_all("div", {"class_": "text_exposed_root"})
    print(soup.prettify())
    for post in posts:
        print(post)


checkSubletGroup()

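The div with class="text_exposed_root" is clearly there: when I search the output of print(soup.prettify()) I can find it with Ctrl+F. But the find_all call above returns an empty list, and the same happens with many other class names that are clearly present.

Please help.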

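【Question comments】:

You only need class_ when it is passed as a keyword argument, not when it is a key in a dictionary.
Just found the problem: all the <div> elements are commented out in that page's source. I guess bs4 ignores those tags.

Tags: python beautifulsoup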

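【Solution 1】:

The problem is that all the <div> elements are inside a commented-out HTML block, so the parser treats them as comment text rather than as tags.

Something like this works around it by stripping the comment markers before parsing: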

soup = BeautifulSoup(page.text.replace('<!--', '').replace('-->', ''), 'html.parser')

After that you can simply do:

posts = soup.find_all('div', 'text_exposed_root')
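
As an alternative to the string replacement, the commented-out markup can be pulled out through bs4's Comment class and parsed on its own. The sketch below is only an illustration: the `html` sample is hypothetical and much simpler than a real Facebook page, which may also require a login or JavaScript to show posts at all.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample markup standing in for the real page source.
html = """
<html><body>
<code id="hidden"><!-- <div class="text_exposed_root">first post</div> --></code>
<div class="visible">not commented</div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# The commented-out div is parsed as a Comment node, not a Tag,
# so a direct search does not see it.
direct_hits = soup.find_all("div", "text_exposed_root")

# Collect every comment and parse its text as HTML in its own right.
posts = []
for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
    inner = BeautifulSoup(comment, "html.parser")
    posts.extend(inner.find_all("div", "text_exposed_root"))

print(len(direct_hits))
print([post.get_text() for post in posts])
```

Compared with replacing '<!--' and '-->' across the whole document, this only touches text that the parser has already identified as a comment, at the cost of a second parse per comment.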

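Hope this helps.

【Comments】:

【Solution 2】:

You only need class_ when you filter by class using a keyword argument, because class is a reserved word in Python and cannot be used as an argument name. If you pass the attributes as a dictionary, just use class as the key.

It should look like this: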

posts = soup.find_all("div", {"class": "text_exposed_root"})

or

posts = soup.find_all("div", class_="text_exposed_root")
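
To see the difference between the spellings, here is a minimal sketch that runs all three against a one-line document. The `html` string is a made-up sample, and the behavior of the "class_" dictionary key reflects how bs4 treats dictionary keys as literal attribute names, as far as I know.

```python
from bs4 import BeautifulSoup

# Hypothetical one-element document, not the real page.
html = '<div class="text_exposed_root">hello</div>'
soup = BeautifulSoup(html, "html.parser")

# Dictionary form: the key is the real HTML attribute name.
by_dict = soup.find_all("div", {"class": "text_exposed_root"})

# Keyword form: class_ is translated to the class attribute.
by_keyword = soup.find_all("div", class_="text_exposed_root")

# The form from the question: looks for an attribute literally
# named "class_", which the element does not have.
wrong_key = soup.find_all("div", {"class_": "text_exposed_root"})

print(len(by_dict), len(by_keyword), len(wrong_key))
```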

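【Comments】:

Neither of these options seems to work. I tried both myself, and they return an empty list even though the div is in the soup.
I tried this as well. Either form normally works, but not in this case. I think @accdias has the right idea.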