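【Question title】:Puppeteer: get number of rows
【Posted】:2021-01-27 20:31:00
【Question】:

I'm trying to get the multiple rows that match this chain of class names, but for some reason I keep getting null back. How can I get the list of rows using this chain of classes?

My code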

        const alternativeRowsCounts = await page.$$eval(
            '.ExResult-row > .ExResult-row--relatedExercises > .flexo-container > .flexo-between',
            element => element.innerText // I've also tried rows instead of elements but still got null
            // (rows) => rows.length
        );
        console.log(`Number of rows = ${alternativeRowsCounts}`)

A larger portion of the DOM. I'm trying to get the text of the h3 tags with the classes ExHeading ExResult-resultsHeading, for example: Barbell Bench Press - Medium Grip

<section class="ExDetail-section ExDetail-related">
      <h3 class="ExHeading ExHeading--h3">
        Alternative Exercises for Dumbbell Bench Press
      </h3>
        <div class="ExResult-row ExResult-row--relatedExercises flexo-container flexo-between" itemscope="" itemtype="http://schema.org/ExerciseAction">
          <div class="ExResult-cell ExResult-cell--imgs ">
                <!-- using male photos -->
                <img class="ExImg ExResult-img  ls-is-cached lazyloaded" width="70" height="70" onerror="if (window._E_) _E_(this)" alt="Barbell Bench Press - Medium Grip thumbnail image" src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-81e-bench-press-m1-square-600x600.jpg" data-src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-81e-bench-press-m1-square-600x600.jpg" itemprop="image">
                <!-- using male photos -->
                <img class="ExImg ExResult-img  ls-is-cached lazyloaded" width="70" height="70" onerror="if (window._E_) _E_(this)" alt="Barbell Bench Press - Medium Grip thumbnail image" src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-81e-bench-press-m2-square-600x600.jpg" data-src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-81e-bench-press-m2-square-600x600.jpg" itemprop="image">
          </div>
          <div class="ExResult-cell ExResult-cell--nameEtc">
            <h3 class="ExHeading ExResult-resultsHeading">
              <a href="/exercises/barbell-bench-press-medium-grip" itemprop="name">
                Barbell Bench Press - Medium Grip
              </a>
            </h3>
            <div class="ExResult-details ExResult-muscleTargeted">
              Muscle Targeted:
              <a href="/exercises/muscle/chest">
                Chest
              </a>
            </div>
            <div class="ExResult-details ExResult-equipmentType">
              Equipment Type:
              <a href="/exercises/equipment/barbell">
                Barbell
              </a>
            </div>
          </div>
          <div class="ExResult-cell ExResult-cell--rating">
            <div class="ExRating">
              <div class="ExRating-badge">
                9
              </div>
              <div class="ExRating-description ExRating-description--Average">
                Average
              </div>
            </div>
          </div>
        </div>
        <div class="ExResult-row ExResult-row--relatedExercises flexo-container flexo-between" itemscope="" itemtype="http://schema.org/ExerciseAction">
          <div class="ExResult-cell ExResult-cell--imgs ">
                <!-- using male photos -->
                <img class="ExImg ExResult-img  ls-is-cached lazyloaded" width="70" height="70" onerror="if (window._E_) _E_(this)" alt="Incline dumbbell bench press thumbnail image" src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-3n-incline-dumbbell-bench-press-m1-square-600x600.jpg" data-src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-3n-incline-dumbbell-bench-press-m1-square-600x600.jpg" itemprop="image">
                <!-- using male photos -->
                <img class="ExImg ExResult-img  ls-is-cached lazyloaded" width="70" height="70" onerror="if (window._E_) _E_(this)" alt="Incline dumbbell bench press thumbnail image" src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-3n-incline-dumbbell-bench-press-m2-square-600x600.jpg" data-src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-3n-incline-dumbbell-bench-press-m2-square-600x600.jpg" itemprop="image">
          </div>
          <div class="ExResult-cell ExResult-cell--nameEtc">
            <h3 class="ExHeading ExResult-resultsHeading">
              <a href="/exercises/incline-dumbbell-press" itemprop="name">
                Incline dumbbell bench press
              </a>
            </h3>
            <div class="ExResult-details ExResult-muscleTargeted">
              Muscle Targeted:
              <a href="/exercises/muscle/chest">
                Chest
              </a>
            </div>
            <div class="ExResult-details ExResult-equipmentType">
              Equipment Type:
              <a href="/exercises/equipment/dumbbell">
                Dumbbell
              </a>
            </div>
          </div>
          <div class="ExResult-cell ExResult-cell--rating">
            <div class="ExRating">
              <div class="ExRating-badge">
                9.1
              </div>
              <div class="ExRating-description ExRating-description--Average">
                Average
              </div>
            </div>
          </div>
        </div>
        <div class="ExResult-row ExResult-row--relatedExercises flexo-container flexo-between" itemscope="" itemtype="http://schema.org/ExerciseAction">
          <div class="ExResult-cell ExResult-cell--imgs ">
                <!-- using male photos -->
                <img class="ExImg ExResult-img  ls-is-cached lazyloaded" width="70" height="70" onerror="if (window._E_) _E_(this)" alt="Kettlebell alternating floor press thumbnail image" src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-6k-kettlebell-alternating-floor-press-m1-square-600x600.jpg" data-src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-6k-kettlebell-alternating-floor-press-m1-square-600x600.jpg" itemprop="image">
                <!-- using male photos -->
                <img class="ExImg ExResult-img  ls-is-cached lazyloaded" width="70" height="70" onerror="if (window._E_) _E_(this)" alt="Kettlebell alternating floor press thumbnail image" src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-6k-kettlebell-alternating-floor-press-m2-square-600x600.jpg" data-src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-6k-kettlebell-alternating-floor-press-m2-square-600x600.jpg" itemprop="image">
          </div>
          <div class="ExResult-cell ExResult-cell--nameEtc">
            <h3 class="ExHeading ExResult-resultsHeading">
              <a href="/exercises/alternating-floor-press" itemprop="name">
                Kettlebell alternating floor press
              </a>
            </h3>
            <div class="ExResult-details ExResult-muscleTargeted">
              Muscle Targeted:
              <a href="/exercises/muscle/chest">
                Chest
              </a>
            </div>
            <div class="ExResult-details ExResult-equipmentType">
              Equipment Type:
              <a href="/exercises/equipment/kettlebells">
                Kettlebells
              </a>
            </div>
          </div>
          <div class="ExResult-cell ExResult-cell--rating">
            <div class="ExRating">
              <div class="ExRating-badge">
                6
              </div>
              <div class="ExRating-description ExRating-description--Average">
                Average
              </div>
            </div>
          </div>
        </div>
    </section>

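Edit 2:

I can get one of them, but I need to get all of them. Each page has between one and three. How can I get the text of every HTML element that has these classes?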

const alternativeExerciseNames = await page.$$(
    'h3.ExResult-resultsHeading > a',
    (el) => el.innerText
);

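【Comments】:

    Tags: javascript html web-scraping puppeteer


    【Solution 1】:

    To count the rows, return the length property instead of innerText. The $$eval callback receives an array of matched elements, and an array has no innerText: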

    const alternativeRowsCounts = await page.$$eval(
        '.ExResult-row > .ExResult-row--relatedExercises > .flexo-container > .flexo-between',
        elements => elements.length
    );
    console.log(`Number of rows = ${alternativeRowsCounts}`);
    

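    You can refer to the example in the Puppeteer documentation.

    I'm not sure about the selector. It may be wrong as well, but I can't tell because I haven't seen more of the DOM. You can try this instead: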

    const alternativeRowsCounts = await page.$$eval(
        'div.flexo-between',
        elements => elements.length
    );
    console.log(`Number of rows = ${alternativeRowsCounts}`);
    

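    The child combinator (>) matches only direct children of an element, so a selector of the form .a > .b > .c expects three nested elements rather than one element that carries all three classes.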
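
Judging by the DOM excerpt in the question, all four classes appear to sit on the same div, so a compound selector (class names joined with no spaces and no >) is probably what is needed here. The following is a minimal sketch under that assumption. `page` is assumed to be an already-navigated Puppeteer Page, and compoundSelector and countAlternativeRows are hypothetical helper names, not part of Puppeteer:

```javascript
// Class names taken from the DOM excerpt in the question.
const ROW_CLASSES = [
  'ExResult-row',
  'ExResult-row--relatedExercises',
  'flexo-container',
  'flexo-between',
];

// Hypothetical helper: builds a compound selector such as ".a.b.c",
// which matches ONE element carrying all of the classes.
// Joining with " > " instead would require four nested elements.
function compoundSelector(classNames) {
  return classNames.map((name) => `.${name}`).join('');
}

// Callback for page.$$eval: it receives the array of matched elements.
const countRows = (rows) => rows.length;

// `page` is assumed to be an already-navigated Puppeteer Page.
async function countAlternativeRows(page) {
  return page.$$eval(compoundSelector(ROW_CLASSES), countRows);
}
```

Called as `const n = await countAlternativeRows(page);`, this should resolve to 0 when nothing matches rather than to null, because the callback still runs on an empty array.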

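    【Discussion】:

    • Thanks for taking the time to help. Both snippets still give me 0. I've posted a larger portion of the DOM in my question.
    • Isn't it inside an iframe? A shadow DOM? Did the page load correctly?
    • No, it isn't. The page is fully loaded. I was able to get one of the headings, but I can't get all of them, since each page has 1 to 3. They all share the same HTML element and class: const alternativeExerciseNames = await page.$$( 'h3.ExResult-resultsHeading > a', (el) => el.innerText );
    • @BruceMathers: The code in your Edit 2 is wrong; read the Puppeteer documentation: github.com/puppeteer/puppeteer/blob/main/docs/… If you need to get multiple innerTexts, you can use page.$$eval() with a callback that calls map() inside it. Again, it's in the documentation; just read the example and adapt it to your problem.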
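
The suggestion in the last comment can be sketched roughly as follows. page.$$() only accepts a selector and returns element handles, so a callback passed to it is never run. page.$$eval() does run its callback in the page, with the array of matched elements. This is a minimal sketch, not the commenter's own code: `page` is assumed to be an already-navigated Puppeteer Page, and extractNames and getAlternativeExerciseNames are hypothetical names:

```javascript
// Selector from the question's "Edit 2".
const NAME_SELECTOR = 'h3.ExResult-resultsHeading > a';

// Callback for page.$$eval: it receives an ARRAY of elements,
// so map over it and trim the whitespace around each name.
const extractNames = (links) => links.map((link) => link.innerText.trim());

// `page` is assumed to be an already-navigated Puppeteer Page.
async function getAlternativeExerciseNames(page) {
  // Assumption: waiting first guards against rows that render late.
  // waitForSelector rejects on timeout if no element ever appears.
  await page.waitForSelector(NAME_SELECTOR);
  return page.$$eval(NAME_SELECTOR, extractNames);
}
```

Usage would look like `const names = await getAlternativeExerciseNames(page);`, giving one string per matched heading link, however many rows the page has.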