【Question Title】: Beautiful Soup find_all doesn't find everything even with Selenium
【Posted】: 2021-07-03 22:35:46
【Question Description】:

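I've gone through most of the Stack Overflow questions about Beautiful Soup scraping only half of a site's data, but none of the answers has worked for me so far. I tried changing the parser (features) to lxml or html5lib. I also tried Selenium: I now use Selenium to scroll all the way down so that everything on the page loads, then scrape the data with Beautiful Soup, but it still scrapes only 16 items when there should be more than 100. My code is attached below.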

Link to the site I want to scrape: https://www.ranker.com/list/kpop-disbanded-groups/ranker-music?ref=listed_on&pos=2

from selenium import webdriver
from selenium.webdriver.common import timeouts
from selenium.webdriver.common.keys import Keys
import os
from bs4 import BeautifulSoup
import requests
import time

url = 'https://www.ranker.com/list/kpop-disbanded-groups/ranker-music?ref=listed_on&pos=2'
driver = webdriver.Safari()
driver.get(url)

SCROLL_PAUSE_TIME = 3
# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        html_content = driver.execute_script('return document.body.innerHTML')
        soup = BeautifulSoup(html_content, 'html.parser')
        for years in soup.findAll('div', class_= 'gridItem_itemDescription__2Etxm gridItem_blather__2Mozw'):
            print(years.p.text)
        break
    last_height = new_height
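
The scroll loop above can be sketched as a small helper with a cap on the number of rounds, so it cannot spin forever if the page keeps growing. This is only an illustration: `scroll_until_stable`, `max_rounds`, and the injected callables are hypothetical names, not part of Selenium, and the helper does not by itself explain why only 16 items are found.

```python
import time


def scroll_until_stable(get_height, scroll_to_bottom, pause=3.0,
                        max_rounds=50, sleep=time.sleep):
    """Scroll until the page height stops changing or max_rounds is hit.

    get_height and scroll_to_bottom are caller-supplied callables
    (hypothetical names), so the loop does not depend on a live browser.
    Returns the number of scroll rounds performed.
    """
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll_to_bottom()
        sleep(pause)  # give lazily loaded content time to arrive
        new_height = get_height()
        if new_height == last_height:
            return rounds  # height is stable: nothing more was loaded
        last_height = new_height
    return max_rounds  # gave up: the page was still growing


# Possible use with the Selenium driver from the question:
# scroll_until_stable(
#     lambda: driver.execute_script("return document.body.scrollHeight"),
#     lambda: driver.execute_script(
#         "window.scrollTo(0, document.body.scrollHeight);"),
# )
```

Parsing the page once after the helper returns keeps the scrolling and the scraping steps separate.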

【Question Discussion】:

  • Hello, and welcome to the site. You may get more responses if you ask an actual question in your post.

Tags: python selenium beautifulsoup


【Solution 1】:
import requests
from bs4 import BeautifulSoup


def main(url):
    # Query parameters sent to the JSON endpoint; limit=200 asks for
    # up to 200 items in a single response.
    params = {
        "limit": 200,
        "offset": 0,
        "useDefaultNodeLinks": "false",
        "liCacheKey": "decacb20-5d77-4b04-a871-7b2c54e3db15",
        "include": "votes,wikiText,rankings,serviceProviders,openListItemContributors,taggedLists",
        "propertyFetchType": "ALL"
    }
    resp = requests.get(url, params=params)
    resp.raise_for_status()  # stop on an HTTP error instead of parsing an error body
    for x in resp.json()['listItems']:
        # 'blather' holds the years as markup; fall back to 'N/A' when it is missing.
        yr = x.get('blather', 'N/A')
        soup = BeautifulSoup(yr, 'lxml')
        print("Name: {:<30}, Year: {}".format(
            x['name'], soup.get_text(strip=True)))


main('https://api.ranker.com/lists/2713714/items')

Output:

Name: 2NE1                          , Year: 2009–2016
Name: Wanna One                     , Year: 2017-2019
Name: SISTAR                        , Year: 2010–2017
Name: 4Minute                       , Year: 2009–2016
Name: I.O.I                         , Year: 2016–2017
Name: Wonder Girls                  , Year: 2007–2017
Name: X1                            , Year: 2019–2020
Name: T-ara                         , Year: N/A
Name: Miss A                        , Year: 2010–2017
Name: PRISTIN                       , Year: N/A
Name: IZ*ONE                        , Year: 2018–2021
Name: KARA                          , Year: 2007-2016
Name: Triple H                      , Year: N/A
Name: Orange Caramel                , Year: N/A
Name: After School                  , Year: N/A
Name: GFriend                       , Year: 2015–2021
Name: Pristin V                     , Year: N/A
Name: Nine Muses                    , Year: 2010-2019
Name: 2AM                           , Year: 2008–2017
Name: JBJ                           , Year: 2017–2018
Name: Boyfriend                     , Year: N/A
Name: HELLOVENUS                    , Year: N/A
Name: Sistar19                      , Year: 2011–2017
Name: UNB                           , Year: N/A
Name: Rainbow                       , Year: 2009–2016
Name: The Ark                       , Year: N/A
Name: History                       , Year: 2013–2017
Name: UNI.T                         , Year: N/A
Name: MADTOWN                       , Year: 2014–2017
Name: MYTEEN                        , Year: N/A
Name: Speed                         , Year: 2012–2016
Name: SPICA                         , Year: 2012–2017
Name: Stellar                       , Year: 2011–2018
Name: FIESTAR                       , Year: 2012-2018
Name: Secret                        , Year: 2009–2018
Name: B.I.G                         , Year: 2014- 2020
Name: Melody Day                    , Year: 2012-2018
Name: RaNia                         , Year: N/A
Name: C-Clown                       , Year: 2012–2015
Name: Hi Suhyun                     , Year: N/A
Name: BESTie                        , Year: 2013-2018
Name: Playback                      , Year: N/A
Name: HIGH4                         , Year: N/A
Name: Seo Taiji and Boys            , Year: N/A
Name: 15&                           , Year: N/A
Name: Glam                          , Year: 2012–2015
Name: CHI CHI                       , Year: N/A
Name: BIGSTAR                       , Year: N/A
Name: Coed School                   , Year: 2010-2013
Name: Bambino                       , Year: N/A
Name: 1Punch                        , Year: 2015
Name: EvoL                          , Year: 2012–2015
Name: A-Jax                         , Year: N/A
Name: Rainz                         , Year: N/A
Name: Leessang                      , Year: 2002–2017
Name: Tiny-G                        , Year: 2012–2015
Name: M&D                           , Year: N/A
Name: LIPBUBBLE                     , Year: N/A
Name: Homme                         , Year: 2010–2018
Name: Drug Restaurant               , Year: N/A
Name: 2YOON                         , Year: 2013–2016
Name: HONEYST                       , Year: N/A
Name: Jewelry                       , Year: 2001–2015
Name: D.Holic                       , Year: 2014–2017
Name: Lucky J                       , Year: 2014-2016
Name: M.I.B                         , Year: 2011–2017
Name: Tahiti                        , Year: N/A
Name: The Legend                    , Year: 2014–2017
Name: Wassup                        , Year: N/A
Name: Rainbow Pixie                 , Year: 2009–2016
Name: 14U                           , Year: N/A
Name: The East Light                , Year: N/A
Name: F-ve Dolls                    , Year: 2011–2015
Name: 8Eight                        , Year: 2007–2014
Name: 2EYES                         , Year: N/A
Name: Untouchable                   , Year: N/A
Name: DMTN                          , Year: 2010–2014
Name: Baby V.O.X                    , Year: 2013-2015
Name: LC9                           , Year: 2013–2015
Name: ChoColat                      , Year: 2011–2017
Name: CoCoSoRi                      , Year: N/A
Name: MyB                           , Year: 2015–2016
Name: I.B.I                         , Year: 2016–2017
Name: A-Prince                      , Year: 2012–2015
Name: M.Pire                        , Year: 2013-2015
Name: Bob Girls                     , Year: 2014–2015
Name: Gangkiz                       , Year: 2012–2014
Name: TraxX                         , Year: N/A
Name: GI                            , Year: 2013–2016
Name: BTL                           , Year: 2014–2016
Name: SKarf                         , Year: 2012–2014
Name: T-max                         , Year: 2007–2012
Name: Sunny Days                    , Year: 2012–2016
Name: SeeYa                         , Year: 2006–2011
Name: 4L                            , Year: 2014–2016
Name: Blady                         , Year: 2011-2017
Name: Phantom                       , Year: 2011–2017
Name: Puretty                       , Year: 2012–2014
Name: Double-A                      , Year: 2011–2015
Name: The SeeYa                     , Year: 2012–2015
Name: D-Unit                        , Year: 2012–2013
Name: Unicorn                       , Year: 2015–2017
Name: N-Sonic                       , Year: 2011–2016
Name: Supreme Team                  , Year: 2009–2013
Name: GP Basic                      , Year: 2010–2015
Name: Shu-I                         , Year: 2009 -2015
Name: Big Mama                      , Year: 2003—2012
Name: N-Train                       , Year: 2011–2013
Name: NOM                           , Year: 2013–2016
Name: Ledt                          , Year: 2010–2016
Name: PARAN                         , Year: 2005-2011
Name: N.EX.T                        , Year: 1992-1997, 2003-2014
Name: Kiha & The Faces              , Year: N/A
Name: Rumble Fish                   , Year: 2003-2010
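
The answer above requests everything in one call with `limit` set to 200. For a longer list, the same `limit` and `offset` parameters could in principle be used to page through the results. The sketch below assumes the endpoint honors `offset` for paging and needs no other parameters, neither of which is confirmed here; `iter_list_items`, `fetch_json`, and `page_size` are hypothetical names, and the fetch function is injectable so the paging logic can be checked without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url, params):
    """Fetch one page of JSON over HTTP using only the standard library."""
    with urlopen(url + "?" + urlencode(params), timeout=30) as resp:
        return json.load(resp)


def iter_list_items(url, page_size=50, fetch=fetch_json):
    """Yield items page by page until a short or empty page comes back.

    Assumption: the response is a JSON object with a 'listItems' array,
    as in the answer above, and 'offset' skips that many items.
    """
    offset = 0
    while True:
        page = fetch(url, {"limit": page_size, "offset": offset})
        items = page.get("listItems", [])
        yield from items
        if len(items) < page_size:
            break  # last page reached
        offset += page_size


# Possible use (performs real HTTP requests):
# for item in iter_list_items("https://api.ranker.com/lists/2713714/items"):
#     print(item["name"])
```

If the endpoint turns out to need the extra parameters from the answer, they would have to be merged into the dictionary passed to `fetch`.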

【Discussion】:

  • Hi αԋɱҽԃ αмєяιcαη, thank you very much. I'd like to know how you got that link: api.ranker.com/lists/2713714/items? The link I provided above is a different one.