Beautifulsoup 找不到标签答案

【问题标题】：Beautifulsoup not finding the tagBeautifulsoup 找不到标签
【发布时间】：2020-06-04 13:50:29
【问题描述】：

我正在做一个关于beautifulsoup的项目

from bs4 import BeautifulSoup as soup
from requests import get



url = "https://www.yelp.com/search?find_desc=&find_loc=New+York%2C+NY&ns=1"
clnt = get(url)
page=soup(clnt.text,"html.parser")
container = page.findAll("div",{"class":"lemon--div__373c0__1mboc container__373c0__ZB8u4 hoverable__373c0__3CcYQ margin-t3__373c0__1l90z margin-b3__373c0__q1DuY padding-t3__373c0__1gw9E padding-r3__373c0__57InZ padding-b3__373c0__342DA padding-l3__373c0__1scQ0 border--top__373c0__3gXLy border--right__373c0__1n3Iv border--bottom__373c0__3qNtD border--left__373c0__d1B7K border-color--default__373c0__3-ifU"})

container = container[1]
url2= "https://www.yelp.com"+container.a["href"]
clnt2 = get(url2)
page2 = soup(clnt2.text, 'html.parser')

info = page2.find("div",{"class":"lemon--div__373c0__1mboc island__373c0__3fs6U u-padding-t1 u-padding-r1 u-padding-b1 u-padding-l1 border--top__373c0__19Owr border--right__373c0__22AHO border--bottom__373c0__uPbXS border--left__373c0__1SjJs border-color--default__373c0__2oFDT background-color--white__373c0__GVEnp"})

contact=info.div (Example contact variable)

在这个“信息”变量中，我得到了包含所有联系方式的 div，我想从这个 div 中获取联系号码

当我打印这个“信息”变量时，它还显示了联系号码。存在于变量中，包括其他详细信息，但是当我遍历 div 以获取联系号码时，我找不到它。我还尝试获取所有子 div，甚至包括 div 本身的类，我无法得到它

给出的第一个网址是：https://www.yelp.com/search?find_desc=&find_loc=New+York%2C+NY&ns=1

第二个 url "url2" 是这个：https://www.yelp.com/biz/levain-bakery-new-york 里面有联系方式

任何解决方案???

【问题讨论】：

requests 不要运行 java 脚本，所以你不会得到任何动态内容供 beautifulsoup 解析，使用 selenium 之类的东西而不是请求

标签： python html beautifulsoup tags

【解决方案1】：

您可以借助班级名称获取联系电话。但我严重怀疑它是否适用于任何给定的页面，因为类名似乎是动态的。不过你可以试一试。

from bs4 import BeautifulSoup as soup
from requests import get

url = "https://www.yelp.com/search?find_desc=&find_loc=New+York%2C+NY&ns=1"
clnt = get(url)
page=soup(clnt.text,"html.parser")
container = page.findAll("div",{"class":"lemon--div__373c0__1mboc container__373c0__ZB8u4 hoverable__373c0__3CcYQ margin-t3__373c0__1l90z margin-b3__373c0__q1DuY padding-t3__373c0__1gw9E padding-r3__373c0__57InZ padding-b3__373c0__342DA padding-l3__373c0__1scQ0 border--top__373c0__3gXLy border--right__373c0__1n3Iv border--bottom__373c0__3qNtD border--left__373c0__d1B7K border-color--default__373c0__3-ifU"})

container = container[1]
url2= "https://www.yelp.com"+container.a["href"]
clnt2 = get(url2)
page2 = soup(clnt2.text, 'html.parser')

info = page2.find("div",{"class":"lemon--div__373c0__1mboc island__373c0__3fs6U u-padding-t1 u-padding-r1 u-padding-b1 u-padding-l1 border--top__373c0__19Owr border--right__373c0__22AHO border--bottom__373c0__uPbXS border--left__373c0__1SjJs border-color--default__373c0__2oFDT background-color--white__373c0__GVEnp"})

ContactNumber = info.find("p",{"class":"lemon--p__373c0__3Qnnj text__373c0__2pB8f text-color--normal__373c0__K_MKN text-align--left__373c0__2pnx_"})

print(ContactNumber.text)

输出：

(917) 464-3769

【讨论】：

它现在正在工作！但让我们看看！非常感谢兄弟
如果有用，请回答标记。

【解决方案2】：

以下对我有用：

from bs4 import BeautifulSoup as soup
from requests import get

url = "https://www.yelp.com/search?find_desc=&find_loc=New+York%2C+NY&ns=1"
clnt = get(url)
page=soup(clnt.text,"html.parser")
container = page.findAll("div",{"class":"lemon--div__373c0__1mboc container__373c0__ZB8u4 hoverable__373c0__3CcYQ margin-t3__373c0__1l90z margin-b3__373c0__q1DuY padding-t3__373c0__1gw9E padding-r3__373c0__57InZ padding-b3__373c0__342DA padding-l3__373c0__1scQ0 border--top__373c0__3gXLy border--right__373c0__1n3Iv border--bottom__373c0__3qNtD border--left__373c0__d1B7K border-color--default__373c0__3-ifU"})

container = container[1]
url2= "https://www.yelp.com"+container.a["href"]
clnt2 = get(url2)
page2 = soup(clnt2.text, 'html.parser')

info = page2.find("div",{"class":"lemon--div__373c0__1mboc island__373c0__3fs6U u-padding-t1 u-padding-r1 u-padding-b1 u-padding-l1 border--top__373c0__19Owr border--right__373c0__22AHO border--bottom__373c0__uPbXS border--left__373c0__1SjJs border-color--default__373c0__2oFDT background-color--white__373c0__GVEnp"})

p_tags_of_info = info.find_all("p")
print(p_tags_of_info[2].text)

我刚刚从info变量中提取了所有<p>标签，然后选择了第三个<p>标签的文本。

【讨论】：

很遗憾听到这个消息。为您打印什么？

【解决方案3】：

试试这个。我已经尝试了第二个 URL，它提供了一个联系号码。

url = 'https://www.yelp.com/biz/levain-bakery-new-york'
page = requests.get(url)
soup1 = BeautifulSoup(page.content, "lxml")

info = soup1.find_all("div", {
    "class": "lemon--div__373c0__1mboc island-section__373c0__3vKXy border--top__373c0__19Owr border-color--default__373c0__2oFDT"})

contact_number = info[1].find("p", attrs={
    'class': 'lemon--p__373c0__3Qnnj text__373c0__2pB8f text-color--normal__373c0__K_MKN text-align--left__373c0__2pnx_'}).text

print(contact_number)

如果有用，请批准答案。其他人可能会帮助它。

【讨论】：