【Question title】: Getting a list of list elements using Playwright
【Posted】: 2021-05-19 16:09:35
【Question】:

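I am trying to write an application that goes to Amazon and gets the list of books on a page. I am using Playwright as the tool. I can navigate to the correct section, but I cannot get the list of books. The examples I found online seem to use page.$$(selector), but when I try that I get an empty array. From reading the docs on $$, this looks like the correct call, since all the list elements have the same class name. I am not sure what I am doing wrong. Any suggestions?

Here is my code so far: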

const { chromium } = require('playwright');

const AMAZON_KINDLE_EBOOK_STORE_URL = 'https://www.amazon.com/Best-Sellers-Kindle-Store-eBooks/zgbs/digital-text/154606011/ref=zg_bs_nav_kstore_1_kstore/';
(async () => {
    const browser = await chromium.launch();
    try {
        const amazonPage = await browser.newPage();
        await amazonPage.goto(AMAZON_KINDLE_EBOOK_STORE_URL);

        await amazonPage.waitForSelector('"Best Sellers in"');
        await amazonPage.click('"Self-Help"');
        await amazonPage.click('"Creativity"');

        const books = await amazonPage.$$('li[class="zg-item-immersion"]');
        console.log(books);
    } finally {
        await browser.close();
    }
})();

I have also tried several variants of the selector:

  • li[class="zg-item-immersion"] - this one does match when I check it in the dev console
  • 'zg-item-immersion'
  • #zg-item-immersion

【Comments】:

    Tags: javascript node.js web-scraping playwright


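    【Solution 1】:

    It seems the only problem is that Playwright is too fast: you are not waiting for the li[class="zg-item-immersion"] elements to appear before querying them.

    I debugged the script and the selector is fine. With the final click wrapped in Promise.all together with waitForNavigation(), as in the following script, it returns 50 element handles: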

    const { chromium } = require('playwright');
    
    const AMAZON_KINDLE_EBOOK_STORE_URL = 'https://www.amazon.com/Best-Sellers-Kindle-Store-eBooks/zgbs/digital-text/154606011/ref=zg_bs_nav_kstore_1_kstore/';
    (async () => {
        const browser = await chromium.launch({ headless: false});
        try {
            const amazonPage = await browser.newPage();
            await amazonPage.goto(AMAZON_KINDLE_EBOOK_STORE_URL);
    
            await amazonPage.waitForSelector('"Best Sellers in"');
            await amazonPage.click('"Self-Help"');
    
            await Promise.all([
                amazonPage.waitForNavigation(),
                amazonPage.click('"Creativity"')
            ]);
            
            const books = await amazonPage.$$('li[class="zg-item-immersion"]');
            console.log(books);
        } finally {
            await browser.close();
        }
    })();
    

    Alternatively, you can do what you already did a few lines earlier and wait for a selector:

    const { chromium } = require('playwright');
    
    const AMAZON_KINDLE_EBOOK_STORE_URL = 'https://www.amazon.com/Best-Sellers-Kindle-Store-eBooks/zgbs/digital-text/154606011/ref=zg_bs_nav_kstore_1_kstore/';
    (async () => {
        const browser = await chromium.launch({ headless: false});
        try {
            const amazonPage = await browser.newPage();
            await amazonPage.goto(AMAZON_KINDLE_EBOOK_STORE_URL);
    
            await amazonPage.waitForSelector('"Best Sellers in"');
            await amazonPage.click('"Self-Help"');
            await amazonPage.click('"Creativity"');
    
            await amazonPage.waitForSelector('li[class="zg-item-immersion"]');
            const books = await amazonPage.$$('li[class="zg-item-immersion"]');
            console.log(books);
        } finally {
            await browser.close();
        }
    })();
    

    It works this way as well.
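
Both scripts above log element handles rather than the book data itself. As a minimal sketch of one way to get readable text, the helper below waits for the selector and then maps every match to its trimmed text with $$eval. It assumes `page` is a Playwright Page; `getItemTexts` is a hypothetical helper name, not part of Playwright, and the text of each list item may need further parsing.

```javascript
// Sketch only: `page` is assumed to be a Playwright Page; `getItemTexts` is a
// hypothetical helper name, not part of Playwright.
async function getItemTexts(page, selector) {
    // Wait until an element matching the selector has appeared
    // (waitForSelector waits for it to be visible by default).
    await page.waitForSelector(selector);
    // $$eval runs the callback in the browser against every matching element
    // and returns the serializable result to Node.
    return page.$$eval(selector, (items) =>
        items.map((item) => item.textContent.trim())
    );
}
```

In the scripts above this could replace the last two statements before the finally block, for example `console.log(await getItemTexts(amazonPage, 'li[class="zg-item-immersion"]'));`.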

    【Comments】:

    • Thanks @pavelsaman, this worked a treat for me. So any time the page changes or updates, always use waitForSelector before doing anything else.
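
The rule of thumb in this comment can be wrapped in a small helper. This is a sketch under assumptions: `page` is a Playwright Page, `clickAndWaitFor` is a hypothetical name, and the 30000 ms default is only an illustrative choice. It also assumes the ready selector does not already match before the click; if it does, the wait returns immediately and proves nothing about the new content.

```javascript
// Sketch only: `page` is assumed to be a Playwright Page; `clickAndWaitFor`
// is a hypothetical helper name, not part of Playwright.
async function clickAndWaitFor(page, clickSelector, readySelector, timeout = 30000) {
    // Trigger the page change.
    await page.click(clickSelector);
    // Block until the content we care about shows up; waitForSelector
    // rejects if nothing matches within `timeout` milliseconds.
    await page.waitForSelector(readySelector, { timeout });
    // Only now collect the element handles.
    return page.$$(readySelector);
}
```

With the script from the answer, the last click and query could then become `const books = await clickAndWaitFor(amazonPage, '"Creativity"', 'li[class="zg-item-immersion"]');`.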