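【Posted】: 2021-09-01 10:56:54
【Question】:
I'm using cheerio and node-fetch to get all the product links on a particular URL.
I get back an array of links, but the list is incomplete because the body is missing the HTML that contains the product links.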
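const cheerio = require('cheerio');
const fetch = require('node-fetch'); // node-fetch v2; v3 is ESM-only and needs `import`

fetch('https://shop.gossmanknives.com/shop?olsPage=products')
  .then(res => res.text())
  .then(body => {
    // Parse the raw HTML exactly as the server returned it
    const $ = cheerio.load(body);
    // Collect the href of every anchor and every element tagged data-ux="Link"
    const snapshot = $("a, [data-ux='Link']")
      .map((i, x) => $(x).attr('href'))
      .toArray();
    console.log(snapshot);
  });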
This is the array that comes back:
['#', '/', '/', '/', '/shop','#', '/', '/shop', '/', '/shop', 'https://www.godaddy.com/websites/website-builder?isc=pwugc&utm_source=wsb&utm_medium=applications&utm_campaign=en-us_corp_applications_base']
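This seems strange, because there is an element like the one below that *should* be picked up, but the body returned by fetch() appears to be missing a lot of the HTML I see when I view the page. I'm not sure why. Perhaps the data is loaded dynamically and isn't on the page yet when fetch() runs?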
<a rel="" typography="LinkAlpha" data-ux="Link" data-aid="PRODUCT_NAME_RENDERED_Orion" data-page="https://shop.gossmanknives.com/shop" data-page-query="olsPage=products/orion-aebl-black" href="https://shop.gossmanknives.com/shop?olsPage=products/orion-aebl-black" class="x-el x-el-a c2-9 c2-a c2-b c2-c c2-d c2-61 c2-f c2-3 c2-43 c2-4 c2-o c2-62 c2-63 c2-5 c2-6 c2-7 c2-8 x-d-ux x-d-aid x-d-page x-d-page-query" data-tccl="ux2.SHOP.shop1.Section.Default.Link.Default.43.click,click"><div data-ux="ProductCard" class="x-el x-el-div x-el c2-1 c2-2 c2-3 c2-4 c2-5 c2-6 c2-7 c2-8 x-d-ux c2-1 c2-2 c2-3 c2-4 c2-5 c2-6 c2-7 c2-8 x-d-ux"><div data-ux="ProductAsset" name="Orion" class="x-el x-el-div c2-1 c2-2 c2-1e c2-64 c2-65 c2-33 c2-4d c2-66 c2-2y c2-2z c2-30 c2-31 c2-3 c2-4 c2-5 c2-6 c2-7 c2-8 x-d-ux"><div id="guacBg20" role="img" data-ux="Background" data-aid="PRODUCT_IMAGE_RENDERED_Orion" treatmentdata="[object Object]" class="x-el x-el-div c2-1 c2-2 c2-67 c2-68 c2-69 c2-6a c2-1g c2-6b c2-6c c2-1t c2-1i c2-6d c2-71 c2-3 c2-4 c2-5 c2-6 c2-7 c2-8 x-d-ux x-d-aid" data-guac-image="loaded"><script>new guacImage('https://img1.wsimg.com/isteam/ip/94c95d7f-6505-4bfd-9837-ff1bcff87400/ols/IMG_0005-0002.JPG/:/rs=w:{width},h:{height},cg:false,m',document.getElementById('guacBg20'),{"useTreatmentData":true,"backgroundLayers":["linear-gradient(to bottom, rgba(22, 22, 22, 0) 0%, rgba(22, 22, 22, 0) 100%)"]})</script></div></div><div data-ux="ProductName" class="x-el x-el-div c2-1 c2-2 c2-6f c2-e c2-4j c2-g c2-3z c2-3 c2-4 c2-6g c2-5 c2-6 c2-7 c2-8 x-d-ux"><p typography="BodyAlpha" data-ux="Text" class="x-el x-el-p c2-1 c2-2 c2-c c2-d c2-4u c2-x c2-y c2-3y c2-6h c2-3 c2-6i c2-12 x-d-ux">Orion</p></div><div data-ux="ProductPrices" class="x-el x-el-div c2-1 c2-2 c2-6j c2-3y c2-3 c2-4 c2-5 c2-6 c2-7 c2-8 x-d-ux"><div typography="BodyAlpha" data-ux="Price" price="[object Object]" data-aid="PRODUCT_PRICE_RENDERED_Orion" class="x-el x-el-div c2-1 c2-2 c2-c c2-d c2-4u c2-x c2-y c2-t c2-3y c2-6k 
c2-3 c2-6i c2-12 x-d-ux x-d-aid">$365.00</div></div><p typography="DetailsAlpha" data-ux="ProductLabel" data-aid="PRODUCT_SHIP_FREE_RENDERED_Orion" class="x-el x-el-p c2-1 c2-1p c2-c c2-d c2-4u c2-6f c2-y c2-3y c2-28 c2-4r c2-3 c2-12 c2-29 c2-6q c2-2a c2-2b c2-2c x-d-ux x-d-aid">Free Shipping</p></div></a>
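One way to test the "dynamic data" guess is to check whether the raw text returned by fetch() contains a string that only appears in the rendered product markup, such as the `PRODUCT_NAME_RENDERED` prefix of the `data-aid` attribute above. The following is only a minimal sketch: `inspectBody` is a hypothetical helper, and the two sample strings are made-up stand-ins for a real response body, not output from the actual site.

```javascript
// Hypothetical helper (not from the original post): reports whether a marker
// string occurs in raw HTML text and how many <script> tags the text contains.
function inspectBody(body, marker) {
  const scriptCount = (body.match(/<script\b/gi) || []).length;
  return {
    hasMarker: body.includes(marker),
    scriptCount,
  };
}

// Made-up sample: a page shell whose content would be filled in by a script.
const serverHtml = '<div id="root"></div><script src="/app.js"></script>';

// Made-up sample: markup resembling what the browser shows after rendering.
const renderedHtml =
  '<a data-ux="Link" data-aid="PRODUCT_NAME_RENDERED_Orion" ' +
  'href="https://shop.gossmanknives.com/shop?olsPage=products/orion-aebl-black">Orion</a>';

const serverReport = inspectBody(serverHtml, 'PRODUCT_NAME_RENDERED');
const renderedReport = inspectBody(renderedHtml, 'PRODUCT_NAME_RENDERED');

console.log(serverReport);
console.log(renderedReport);
```

Calling `inspectBody(body, 'PRODUCT_NAME_RENDERED')` inside the second `.then` callback would give the same report for the real response. If `hasMarker` comes back `false` there, that would suggest the product links are added by client-side JavaScript after the initial HTML is delivered, which cheerio alone cannot see because it only parses the text it is given.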
【Comments】:
Tags: node.js cheerio node-fetch