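【Posted】: 2021-11-26 00:05:27
【Problem description】:
I'm having trouble with Puppeteer: I want to extract a list of items, and it works when headless is false but not when it is true.
As a first step, I just want to get the elements, before mapping over them.
Here is my script. You should be able to copy it and reproduce the problem; it's really basic.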
const puppeteer = require("puppeteer");
const chalk = require("chalk");

const baseUrl = "https://www.interencheres.com/recherche/lots?search=";
const searchTerm = "Apple";
const searchUrl = baseUrl + searchTerm;

(async () => {
  const browser = await puppeteer.launch({
    headless: false, // works as expected; fails when set to true
    ignoreHTTPSErrors: true,
    args: [`--window-size=1920,1080`],
    defaultViewport: {
      width: 1920,
      height: 1080,
    },
  });
  const page = await browser.newPage();

  // Begin navigation
  console.log(chalk.yellow("Beginning navigation."));
  await page.goto(searchUrl);

  // Wait for the list of elements
  console.log(chalk.yellow("Wait for Network Idle..."));
  await page.waitForNetworkIdle();

  // Get the items
  const findElements = await page.evaluate(() => {
    const elements = document.querySelectorAll(".sale-item");
    console.log(elements); // logged in the browser console, not in Node
    return elements;
  });
  console.log(findElements);

  console.log(chalk.blue("Waiting..."));
  await page.waitForTimeout(10000);
  await browser.close();
  console.log(chalk.red("Closed."));
})();
Expected results:
{
  '0': { _prevClass: 'sale-item pa-1 col-sm-6 col-md-4 col-lg-3 col-12' },
  '1': { _prevClass: 'sale-item pa-1 col-sm-6 col-md-4 col-lg-3 col-12' },
  '2': { _prevClass: 'sale-item pa-1 col-sm-6 col-md-4 col-lg-3 col-12' },
  '3': { _prevClass: 'sale-item pa-1 col-sm-6 col-md-4 col-lg-3 col-12' },
  '4': { _prevClass: 'sale-item pa-1 col-sm-6 col-md-4 col-lg-3 col-12' },
  .
  .
}
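One thing worth noting about the script above: `page.evaluate` can only hand back values that survive serialization, so returning a `NodeList` yields at best the elements' own enumerable properties (which is presumably where `_prevClass` comes from). A minimal sketch of returning plain data instead might look like the following. The names `summarizeNodes` and `extractSaleItems`, the 30-second timeout, and the choice of fields are all hypothetical, and this does not by itself explain the headless difference.

```javascript
// Hypothetical page-side callback: turns DOM elements into plain objects.
// It must not reference anything from the outer scope, because Puppeteer
// serializes the function and runs it inside the browser page.
function summarizeNodes(nodes) {
  return nodes.map((node) => ({
    className: node.className,
    text: (node.textContent || "").trim(),
  }));
}

// Hypothetical helper: waits for the items, then extracts serializable data.
// `page` is assumed to be a Puppeteer Page, as in the script above.
async function extractSaleItems(page, selector = ".sale-item") {
  // Rejects with a timeout error if no matching element appears in time.
  await page.waitForSelector(selector, { timeout: 30000 });
  // $$eval passes an array of the matched elements to the callback.
  return page.$$eval(selector, summarizeNodes);
}
```

In the script, `const items = await extractSaleItems(page);` could stand in for the `page.evaluate` call. If `waitForSelector` times out only when headless is true, that would suggest the site serves different content to headless Chrome; that is a guess to verify, not a confirmed cause.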
【Discussion】:
Tags: javascript web-scraping puppeteer headless-browser