Selecting that particular div class name among but not other divs which have other classes as well available using beautifulsoup or selenium

【Question title】: Selecting that particular div class name among but not other divs which have other classes as well available using beautifulsoup or selenium
【Posted】: 2020-11-24 02:42:21
【Question description】:

This is a tricky one for me: I am stuck on the web-scraping part and cannot move forward.

https://i.stack.imgur.com/r4tN2.png

I need only the grid cell, in a loop.

I tried using

grid_cell=driver.find_element_by_css_selector('#tags-browser > div:nth-child(2) > div.mt-auto.grid.jc-space-between.fs-caption.fc-black-300 > div:nth-child(1)')

Reading the element's text now shows "2061748 questions":

grid_cell.text

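But this works for only one element.

If I want to put this in a loop, how do I get the counts for all the tags available on that page?

In this case, going by the image, I iterated a for loop over javascript and java, but find_element_by_css_selector returns one specific count, for either java or javascript, not for both.

If I select with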

tag_counts = body.find_all('div', class_='grid_cell')

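then I also get the other classes that sit below the grid cell in the attached picture.

Please suggest a solution. Any help would be appreciated.

【Question comments】:

Post the site URL and what you want to extract.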
stackoverflow.com/tags

Tags: python html css web-scraping beautifulsoup

【Solution 1】:

There are two ways to do this:

Option 1: Remove the tags you don't want to scrape, then scrape the ones you do want. For example:

tags = body.find_all('div', class_='grid_cell s-anchor') # TODO: add full class name (to remove this tag) 
for tag in tags:
    tag.extract() # Remove tag from body

tags = body.find_all('div', class_='grid_cell') # This will contain all the tags you want.
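
A self-contained sketch of Option 1 follows. The markup in SAMPLE_HTML is hypothetical, modelled loosely on the question's screenshot, and the class name s-anchors and the second count are placeholders, so the real page may differ. It also swaps in a CSS selector for the removal step, because find_all with a multi-class string only matches when the whole class attribute is exactly that string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; class names and numbers are placeholders, not the live page.
SAMPLE_HTML = """
<div id="tags-browser">
  <div class="s-card">
    <div class="mt-auto grid jc-space-between">
      <div class="grid_cell">2061748 questions</div>
      <div class="grid_cell s-anchors">524 asked today</div>
    </div>
  </div>
  <div class="s-card">
    <div class="mt-auto grid jc-space-between">
      <div class="grid_cell">1735232 questions</div>
      <div class="grid_cell s-anchors">310 asked today</div>
    </div>
  </div>
</div>
"""


def counts_after_removal(html):
    """Remove the unwanted grid cells, then collect the remaining ones."""
    body = BeautifulSoup(html, "html.parser")
    # select() matches on class membership, so extra classes on the div do not matter.
    for unwanted in body.select("div.grid_cell.s-anchors"):
        unwanted.extract()  # detach the tag from the tree
    # Only the cells that were not removed are left.
    return [div.get_text(strip=True) for div in body.find_all("div", class_="grid_cell")]


print(counts_after_removal(SAMPLE_HTML))
```

Note that this approach mutates the parsed tree, so parse the page again if the removed cells are needed later.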

Option 2: Iterate over the parent HTML tags and use find() to get the first matching tag in each. For example:

containers = body.select('div.mt-auto.grid') # Find parent tags; a CSS selector still matches when the div carries extra classes
for container in containers:
    tag = container.find('div', class_='grid_cell') # Get first tag in the container div
    print(tag.text.strip())
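
Option 2 can be tried offline with the sketch below. SAMPLE_HTML is again hypothetical markup with placeholder class names and numbers, not the live page. The parent rows are located with a CSS selector here, which is an assumption about how the real rows are best matched.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; class names and numbers are placeholders, not the live page.
SAMPLE_HTML = """
<div id="tags-browser">
  <div class="mt-auto grid jc-space-between fs-caption fc-black-300">
    <div class="grid_cell">2061748 questions</div>
    <div class="grid_cell s-anchors">524 asked today</div>
  </div>
  <div class="mt-auto grid jc-space-between fs-caption fc-black-300">
    <div class="grid_cell">1735232 questions</div>
    <div class="grid_cell s-anchors">310 asked today</div>
  </div>
</div>
"""


def first_cell_per_container(html):
    """Return the text of the first grid cell inside each parent row."""
    body = BeautifulSoup(html, "html.parser")
    texts = []
    for container in body.select("div.mt-auto.grid"):
        # find() returns only the first match, so the later cells are skipped.
        tag = container.find("div", class_="grid_cell")
        if tag is not None:  # guard against rows without a grid cell
            texts.append(tag.get_text(strip=True))
    return texts


print(first_cell_per_container(SAMPLE_HTML))
```

Unlike the removal approach, this leaves the parsed tree untouched.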

【Comments】:

Thanks for the quick reply, it looks good. I'll try it and let you know.
I tried this:

#now get the count of all the questions in each tag
containers = body.find_all('div', class_='s-card js-tag-cell grid fd-column') # Find parent tag
for container in containers:
    tag = container.find('div', class_='grid jc-space-between ai-center mb12')
    tag_count = container.find('div', class_='mt-auto grid jc-space-between fs-caption fc-black-300') # Get first tag in the container div
    print(tag.text.strip())
    print(tag_count.text.strip())

But it gave me the wrong data: drive.google.com/file/d/1eq2jg-a34cEpwTNvf915C_OYLB-MkI8o/…
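
One possible reason for the unexpected output in the last comment is that tag_count there is the whole counts row, so its text includes every cell in the row, not just the first one. The sketch below, under that assumption, pairs each tag name with only the first cell of its counts row. The class names come from the comment, while SAMPLE_HTML and its numbers are hypothetical stand-ins for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup built from the class names quoted in the comment above.
SAMPLE_HTML = """
<div class="s-card js-tag-cell grid fd-column">
  <div class="grid jc-space-between ai-center mb12"><a href="#">javascript</a></div>
  <div class="mt-auto grid jc-space-between fs-caption fc-black-300">
    <div class="grid_cell">2061748 questions</div>
    <div class="grid_cell s-anchors">524 asked today</div>
  </div>
</div>
<div class="s-card js-tag-cell grid fd-column">
  <div class="grid jc-space-between ai-center mb12"><a href="#">java</a></div>
  <div class="mt-auto grid jc-space-between fs-caption fc-black-300">
    <div class="grid_cell">1735232 questions</div>
    <div class="grid_cell s-anchors">310 asked today</div>
  </div>
</div>
"""


def tag_question_counts(html):
    """Map each tag name to the text of the first cell in its counts row."""
    body = BeautifulSoup(html, "html.parser")
    results = {}
    for card in body.select("div.s-card.js-tag-cell"):
        name_row = card.select_one("div.mb12")            # row holding the tag name
        counts_row = card.select_one("div.mt-auto.grid")  # row holding the counts
        if name_row is None or counts_row is None:
            continue  # skip cards that do not have the expected layout
        first_cell = counts_row.find("div")  # first div inside the counts row
        if first_cell is None:
            continue
        results[name_row.get_text(strip=True)] = first_cell.get_text(strip=True)
    return results


print(tag_question_counts(SAMPLE_HTML))
```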