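【Posted】: 2021-01-27 20:31:00
【Question】:
For some reason, when I try to get the multiple rows that carry this list of class names, I keep getting null back. How can I get the list of rows using this chain of classes?
My code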
const alternativeRowsCounts = await page.$$eval(
  '.ExResult-row > .ExResult-row--relatedExercises > .flexo-container > .flexo-between',
  element => element.innerText // I've also tried rows instead of elements but still got null
  // (rows) => rows.length
);
console.log(`Number of rows = ${alternativeRowsCounts}`);
A larger portion of the DOM
I am trying to get the text inside the h3 tags that have the classes ExHeading ExResult-resultsHeading. For example: Barbell Bench Press - Medium Grip
<section class="ExDetail-section ExDetail-related">
<h3 class="ExHeading ExHeading--h3">
Alternative Exercises for Dumbbell Bench Press
</h3>
<div class="ExResult-row ExResult-row--relatedExercises flexo-container flexo-between" itemscope="" itemtype="http://schema.org/ExerciseAction">
<div class="ExResult-cell ExResult-cell--imgs ">
<!-- using male photos -->
<img class="ExImg ExResult-img ls-is-cached lazyloaded" width="70" height="70" onerror="if (window._E_) _E_(this)" alt="Barbell Bench Press - Medium Grip thumbnail image" src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-81e-bench-press-m1-square-600x600.jpg" data-src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-81e-bench-press-m1-square-600x600.jpg" itemprop="image">
<!-- using male photos -->
<img class="ExImg ExResult-img ls-is-cached lazyloaded" width="70" height="70" onerror="if (window._E_) _E_(this)" alt="Barbell Bench Press - Medium Grip thumbnail image" src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-81e-bench-press-m2-square-600x600.jpg" data-src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-81e-bench-press-m2-square-600x600.jpg" itemprop="image">
</div>
<div class="ExResult-cell ExResult-cell--nameEtc">
<h3 class="ExHeading ExResult-resultsHeading">
<a href="/exercises/barbell-bench-press-medium-grip" itemprop="name">
Barbell Bench Press - Medium Grip
</a>
</h3>
<div class="ExResult-details ExResult-muscleTargeted">
Muscle Targeted:
<a href="/exercises/muscle/chest">
Chest
</a>
</div>
<div class="ExResult-details ExResult-equipmentType">
Equipment Type:
<a href="/exercises/equipment/barbell">
Barbell
</a>
</div>
</div>
<div class="ExResult-cell ExResult-cell--rating">
<div class="ExRating">
<div class="ExRating-badge">
9
</div>
<div class="ExRating-description ExRating-description--Average">
Average
</div>
</div>
</div>
</div> <div class="ExResult-row ExResult-row--relatedExercises flexo-container flexo-between" itemscope="" itemtype="http://schema.org/ExerciseAction">
<div class="ExResult-cell ExResult-cell--imgs ">
<!-- using male photos -->
<img class="ExImg ExResult-img ls-is-cached lazyloaded" width="70" height="70" onerror="if (window._E_) _E_(this)" alt="Incline dumbbell bench press thumbnail image" src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-3n-incline-dumbbell-bench-press-m1-square-600x600.jpg" data-src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-3n-incline-dumbbell-bench-press-m1-square-600x600.jpg" itemprop="image">
<!-- using male photos -->
<img class="ExImg ExResult-img ls-is-cached lazyloaded" width="70" height="70" onerror="if (window._E_) _E_(this)" alt="Incline dumbbell bench press thumbnail image" src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-3n-incline-dumbbell-bench-press-m2-square-600x600.jpg" data-src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-3n-incline-dumbbell-bench-press-m2-square-600x600.jpg" itemprop="image">
</div>
<div class="ExResult-cell ExResult-cell--nameEtc">
<h3 class="ExHeading ExResult-resultsHeading">
<a href="/exercises/incline-dumbbell-press" itemprop="name">
Incline dumbbell bench press
</a>
</h3>
<div class="ExResult-details ExResult-muscleTargeted">
Muscle Targeted:
<a href="/exercises/muscle/chest">
Chest
</a>
</div>
<div class="ExResult-details ExResult-equipmentType">
Equipment Type:
<a href="/exercises/equipment/dumbbell">
Dumbbell
</a>
</div>
</div>
<div class="ExResult-cell ExResult-cell--rating">
<div class="ExRating">
<div class="ExRating-badge">
9.1
</div>
<div class="ExRating-description ExRating-description--Average">
Average
</div>
</div>
</div>
</div> <div class="ExResult-row ExResult-row--relatedExercises flexo-container flexo-between" itemscope="" itemtype="http://schema.org/ExerciseAction">
<div class="ExResult-cell ExResult-cell--imgs ">
<!-- using male photos -->
<img class="ExImg ExResult-img ls-is-cached lazyloaded" width="70" height="70" onerror="if (window._E_) _E_(this)" alt="Kettlebell alternating floor press thumbnail image" src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-6k-kettlebell-alternating-floor-press-m1-square-600x600.jpg" data-src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-6k-kettlebell-alternating-floor-press-m1-square-600x600.jpg" itemprop="image">
<!-- using male photos -->
<img class="ExImg ExResult-img ls-is-cached lazyloaded" width="70" height="70" onerror="if (window._E_) _E_(this)" alt="Kettlebell alternating floor press thumbnail image" src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-6k-kettlebell-alternating-floor-press-m2-square-600x600.jpg" data-src="https://www.bodybuilding.com/images/2020/xdb/cropped/xdb-6k-kettlebell-alternating-floor-press-m2-square-600x600.jpg" itemprop="image">
</div>
<div class="ExResult-cell ExResult-cell--nameEtc">
<h3 class="ExHeading ExResult-resultsHeading">
<a href="/exercises/alternating-floor-press" itemprop="name">
Kettlebell alternating floor press
</a>
</h3>
<div class="ExResult-details ExResult-muscleTargeted">
Muscle Targeted:
<a href="/exercises/muscle/chest">
Chest
</a>
</div>
<div class="ExResult-details ExResult-equipmentType">
Equipment Type:
<a href="/exercises/equipment/kettlebells">
Kettlebells
</a>
</div>
</div>
<div class="ExResult-cell ExResult-cell--rating">
<div class="ExRating">
<div class="ExRating-badge">
6
</div>
<div class="ExRating-description ExRating-description--Average">
Average
</div>
</div>
</div>
</div> </section>
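One likely reason the first snippet matches nothing, judging only from the markup above: all four classes sit on the same `div`, while `>` in a CSS selector means "direct child of". Writing the classes back to back with no separator targets a single element that has all of them. In addition, `page.$$eval` passes an array of matched elements to its callback, so the count is the array's `length`. The following is a minimal sketch under those assumptions; `compoundSelector` and `countRelatedRows` are hypothetical helper names, and `page` is assumed to be a Puppeteer page that has already loaded the exercise page.

```javascript
// Hypothetical helper: join class names with no spaces or ">" between them,
// so the resulting selector requires every class on the SAME element.
function compoundSelector(classNames) {
  return classNames.map((name) => `.${name}`).join('');
}

// Classes taken from the row <div> in the markup above.
const ROW_SELECTOR = compoundSelector([
  'ExResult-row',
  'ExResult-row--relatedExercises',
  'flexo-container',
  'flexo-between',
]);

// Hypothetical helper: `page` is assumed to be an already-navigated Puppeteer Page.
async function countRelatedRows(page) {
  // $$eval gives the callback an array of elements, so read .length on the array.
  return page.$$eval(ROW_SELECTOR, (rows) => rows.length);
}
```

It could then be called as `const count = await countRelatedRows(page);` inside the existing async code. A single class such as `.ExResult-row--relatedExercises` would probably be enough on its own; the full chain is kept here only to mirror the original attempt.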
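Edit 2:
I can get one of them, but I need to get all of them. Each page has between one and three. How do I get the text of all the HTML elements that have these classes?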
const alternativeExerciseNames = await page.$$(
  'h3.ExResult-resultsHeading > a',
  (el) => el.innerText
);
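To collect every heading rather than just one, a possible approach is `page.$$eval`, which runs the callback in the browser with an array of all matching elements; mapping over that array yields one string per link. This is a sketch rather than a confirmed fix: `readTrimmedText` and `getAlternativeExerciseNames` are hypothetical names, and `page` is assumed to be an already-navigated Puppeteer page.

```javascript
// Selector from the question: the <a> directly inside each results heading.
const NAME_SELECTOR = 'h3.ExResult-resultsHeading > a';

// Runs in the browser context. textContent plus trim() drops the
// indentation whitespace that surrounds the link text in the markup.
function readTrimmedText(links) {
  return links.map((link) => link.textContent.trim());
}

// Hypothetical helper: resolves to an array with one name per matched link.
async function getAlternativeExerciseNames(page) {
  return page.$$eval(NAME_SELECTOR, readTrimmedText);
}
```

Usage would look like `const names = await getAlternativeExerciseNames(page);`, which for the markup in the question should give one entry per related exercise, however many rows the page has.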
【Discussion】:
Tags: javascript html web-scraping puppeteer