【Question title】: Get the Text from the next_sibling - BeautifulSoup 4
【Posted】: 2015-02-14 19:07:23
【Question】:

I want to scrape the restaurants from this URL:

for rests in dining_soup.select("div.infos-restos"):
    for rest in rests.select("h3"):
        safe_print("            Rest Name: " + rest.text)
        print(rest.next_sibling.next_sibling.next_sibling.next_sibling.contents)

Output:

<div class="descriptif-resto">
<p>
<strong>Type of cuisine</strong>:International</p>
<p>
<strong>Opening hours</strong>:06:00-23:30</p>
<p>The Food Square bar and restaurant offers a varied menu in an elegant and welcoming setting. In fine weather you can also enjoy your meal next to the pool or relax on the garden terrace.</p>
</div>

print(rest.next_sibling.next_sibling.next_sibling.next_sibling.text)

The output is always empty.

So my question is: how do I scrape the Type of cuisine and the Opening hours from that div?

【Comments】:

    Tags: python python-3.x beautifulsoup


    【Solution 1】:

    The opening hours and the cuisine are in the text of the "descriptif-resto" div:

    import requests
    from bs4 import BeautifulSoup
    r = requests.get("http://www.accorhotels.com/gb/hotel-5548-mercure-niederbronn-hotel/restaurant.shtml")
    soup = BeautifulSoup(r.content, "html.parser")
    
    print(soup.find("div",attrs={"class":"descriptif-resto"}).text)
    
    Type of cuisine:Brasserie
    
    Opening hours:12:00 - 14:00 / 19:00 - 22:00
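
As a side note on the `next_sibling` chain from the question: `.next_sibling` returns the very next node, and that node is often a whitespace string rather than a tag, so counting hops by hand is fragile. The sketch below shows `find_next_sibling()`, which skips text nodes. The HTML is a hypothetical sample modeled on the markup printed in the question (the restaurant name and the `span` are made up), not the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample modeled on the markup shown in the question.
SAMPLE_HTML = """
<div class="infos-restos">
  <h3>FOOD SQUARE</h3>
  <span class="sep"></span>
  <div class="descriptif-resto">
    <p><strong>Type of cuisine</strong>:International</p>
    <p><strong>Opening hours</strong>:06:00-23:30</p>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
heading = soup.select_one("div.infos-restos h3")

# .next_sibling is the node right after </h3>: here a whitespace-only string.
first_sibling = heading.next_sibling

# find_next_sibling() ignores text nodes and returns the next matching tag.
details = heading.find_next_sibling("div", class_="descriptif-resto")

# One stripped string per <p>, with the <strong> label and its value joined.
lines = [p.get_text(strip=True) for p in details.find_all("p")]
print(lines)
```

With this approach the number of nodes between the `h3` and the details div no longer matters, as long as both share the same parent.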
    

    The name is in the first h3 tag, and the type and opening hours are in the two p tags:

    name = soup.find("div", attrs={"class":"infos-restos"}).h3.text
    det = soup.find("div",attrs={"class":"descriptif-resto"}).p   
    
    hours = det.find_next("p").text
    tpe = det.text
    print(name)
    print(hours)
    print(tpe)
    
    LA STUB DU CASINO
    
    Opening hours:12:00 - 14:00 / 19:00 - 22:00
    
    Type of cuisine:Brasserie
    

    OK, so some sections do not have both opening hours and a cuisine, so you will have to fine-tune it, but this gets all the information:

    from itertools import chain
    
    all_dets = soup.find_all("div", attrs={"class":"infos-restos"})
    # get all names from the h3 tags; chain flattens them so we can zip later
    names = chain.from_iterable(x.find_all("h3") for x in all_dets)
    # get every details div to extract cuisine and hours
    dets = chain.from_iterable(x.find_all("div", attrs={"class": "descriptif-resto"}) for x in all_dets)

    # pair each name with its details
    for name, det in zip(names, dets):
        first = det.p  # "Type of cuisine" for restaurants, "Opening hours" for bars
        # a second <p> with the hours is expected only when the first one is the cuisine
        hours = first.find_next("p") if "cuisine" in first.text else None
        if hours:
            print(name.text, first.text, hours.text)
        else:  # no cuisine line means we have a bar
            print(name.text, first.text)
        print("-----------------------------")
    
    LA STUB DU CASINO 
    Type of cuisine:Brasserie 
    Opening hours:12:00 - 14:00 / 19:00 - 22:00
    -----------------------------
    RESTAURANT DU CASINO IVORY 
    Type of cuisine:French 
    Opening hours:19:00 - 22:00
    -----------------------------
    BAR DE L'HOTEL LE DOLLY 
    Opening hours:10:00-01:00 
    -----------------------------
    BAR DES MACHINES A SOUS 
    Opening hours:10:30-03:00 
    -----------------------------
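
If the name, cuisine, and hours are needed as separate values (for example to store them in a database later), one option is to build a dict per restaurant and key each value on its `<strong>` label instead of on the position of the `<p>` tag. This is only a sketch: `parse_restaurants` and the sample HTML are hypothetical, and it assumes each `h3` is followed by a sibling `descriptif-resto` div, which may not hold for every section of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample modeled on the output above; the second entry is a bar
# with no "Type of cuisine" line.
SAMPLE_HTML = """
<div class="infos-restos">
  <h3>LA STUB DU CASINO</h3>
  <div class="descriptif-resto">
    <p><strong>Type of cuisine</strong>:Brasserie</p>
    <p><strong>Opening hours</strong>:12:00 - 14:00 / 19:00 - 22:00</p>
  </div>
</div>
<div class="infos-restos">
  <h3>BAR DE L'HOTEL LE DOLLY</h3>
  <div class="descriptif-resto">
    <p><strong>Opening hours</strong>:10:00-01:00</p>
  </div>
</div>
"""


def parse_restaurants(html):
    """Return one dict per <h3>, with None for any missing field."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for heading in soup.select("div.infos-restos h3"):
        record = {"name": heading.get_text(strip=True), "cuisine": None, "hours": None}
        details = heading.find_next_sibling("div", class_="descriptif-resto")
        paragraphs = details.find_all("p") if details else []
        for p in paragraphs:
            # The <strong> tag holds the label; paragraphs without one are skipped.
            label = p.strong.get_text(strip=True) if p.strong else ""
            # Everything after the first colon is the value.
            _, _, value = p.get_text(strip=True).partition(":")
            if label == "Type of cuisine":
                record["cuisine"] = value.strip()
            elif label == "Opening hours":
                record["hours"] = value.strip()
        records.append(record)
    return records


records = parse_restaurants(SAMPLE_HTML)
for rec in records:
    print(rec["name"], rec["cuisine"], rec["hours"])
```

Each record can then be passed to whatever database layer is in use; a missing cuisine shows up as `None` rather than shifting the hours into the wrong variable.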
    

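    【Discussion】:

    • How can I get the restaurant name, type of cuisine, and opening hours into separate variables so I can store them in a DB later?
    • One sec, so you only want those three?
    • Yes, in separate variables, so that I can store them in the database from my code later.
    • Thanks for your effort... but the restaurant RESTAURANT DU CASINO IVORY was not scraped.
    • Are you searching multiple restaurants?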