【Question title】: Puppeteer QuerySelector Cannot read property 'textContent' of Null
【Posted】: 2021-01-13 18:29:22
【Question】:

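I'm trying to scrape some data about vulnerabilities using Node.js and Puppeteer, and I've run into a problem where some properties come back null or empty, even though running the same querySelector call in the browser (version 87.0.4280.88 (x86_64)) works fine. Below is the snippet that produces the problem.

The selector targets the date the vulnerability was patched, where the selector path is "div.patched". The same problem also seems to occur for the software section, which has the selector "spec-title for-l".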

const puppeteer = require('puppeteer');
const url = 'https://www.zero-day.cz/database/';
const selector = '.issue.col-md-6';
(async function(){
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    const version = await page.browser().version();
    console.log(version);
    await page.goto(url);
    const articles = await page.$$eval(selector, nodes => {
        return nodes.map(node => {
            let timePatched = node.querySelectorAll('div.patched').textContent;
            {};
            return {
                timePatched
            }
        })
    });
    console.log(articles);
    await browser.close();
})();

Output:

[
  {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {},
  {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {},
  {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {},
  {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {},
  {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {},
  {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {},
  {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {},
  {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {},
  {}, {}, {}, {},
  ... 374 more items
]
Environment:

  • HeadlessChrome/88.0.4298.0
  • npm version 7.3
  • Puppeteer version 5.5

【Comments】:

    Tags: javascript puppeteer chromium


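    【Solution 1】:

    Don't overcomplicate things; keep it as simple as possible. For example, to get the patch dates you can use just the .patched selector, which gives you the same number of elements as when it is combined with .issue.col-md-6.

    Another thing is that there are three levels of indentation inside page.$$eval(), which is not very readable. Try to simplify things.

    Here is the code that gives me an array of patch dates (450 of them):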

    const puppeteer = require('puppeteer');
    
    (async () => {
        const browser = await puppeteer.launch();
        const page = await browser.newPage();
        await page.goto('https://www.zero-day.cz/database/');    
        
        const patchedTexts = await page.evaluate(() => {
            const nodes = document.querySelectorAll('.patched');
            return [...nodes].map(e => e.textContent);
        });
    
        console.log(patchedTexts);
        await browser.close();
    })();
    

    The output is:

    [
      '2021-01-12', '2020-12-15', '2020-12-14', '2020-12-07', '2020-11-11',
      '2020-11-11', '2020-11-06', '2020-11-06', '2020-11-06', '2020-11-03',
      '2020-11-03', '2020-11-10', '2020-10-20', '2020-10-20', '2020-09-01',
      ...
    ]
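
For context on the empty objects in the question: querySelectorAll returns a NodeList, and a NodeList has no textContent property, so timePatched is undefined and is most likely dropped when the result is serialized back to Node.js. The sketch below shows one possible per-issue alternative that uses querySelector with a null guard. The helper names extractIssues and fakeNode are hypothetical, and it is an assumption (not confirmed here) that .issue-title and div.patched sit inside each .issue.col-md-6 element.

```javascript
// Self-contained callback: Puppeteer serializes it and runs it in the page,
// so it must not reference variables from the Node.js scope.
function extractIssues(nodes) {
  // Return the trimmed text of the first match, or null when nothing matches.
  const text = (node, sel) => {
    const el = node.querySelector(sel);
    return el ? el.textContent.trim() : null;
  };
  return nodes.map((node) => ({
    title: text(node, '.issue-title'),      // assumed to be nested in the issue
    timePatched: text(node, 'div.patched'), // null if the issue has no patch date
  }));
}

// Intended use inside the async function from the question (not run here):
//   const articles = await page.$$eval('.issue.col-md-6', extractIssues);

// Offline check with stand-in objects instead of real DOM nodes.
const fakeNode = (map) => ({
  querySelector: (sel) => (sel in map ? { textContent: map[sel] } : null),
});
const sample = extractIssues([
  fakeNode({ '.issue-title': ' Example issue ', 'div.patched': '2021-01-12' }),
  fakeNode({ '.issue-title': 'Unpatched example' }),
]);
console.log(sample);
```

Returning null for a missing element keeps every issue in the result and avoids the "Cannot read property 'textContent' of null" error that a bare querySelector(...).textContent would raise.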
    

    【Comments】:

    • Thank you very much, it works. But how can I extend it to add the title, description, and discovery date?
    【Solution 2】:

    Assuming all you're looking for here is the patched date, this is what I put together based on your code:

    const puppeteer = require("puppeteer");
    const url = "https://www.zero-day.cz/database/";
    
    (async function () {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      const version = await page.browser().version();
      console.log(version);
      await page.goto(url);
      const patchTimes = await page.$$eval(
        ".issue.col-md-6 div.patched",
        (patches) => patches.map((patch) => patch.textContent)
      );
      console.log(patchTimes);
      await browser.close();
    })();

    Added a new snippet that gets the title, description, issue status, and patched date:

    const puppeteer = require("puppeteer");
    
    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto("https://www.zero-day.cz/database/");
    
      const patched_texts = await page.evaluate(() => {
        const nodes = document.querySelectorAll(".patched");
        return [...nodes].map((e) => e.textContent);
      });
      const issue_title = await page.evaluate(() => {
        const nodes = document.querySelectorAll(".issue-title");
        return [...nodes].map((e) => e.textContent);
      });
      const desc = await page.evaluate(() => {
        const nodes = document.querySelectorAll(".description");
        return [...nodes].map((e) => e.textContent);
      });
      const issue_status = await page.evaluate(() => {
        const nodes = document.querySelectorAll(".issue-status");
        return [...nodes].map((e) => e.textContent);
      });
      console.log(issue_title);
      console.log(desc);
      console.log(issue_status);
      console.log(patched_texts);
    
      await browser.close();
    })();
    

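    This scrapes the information you're looking for. You will now need to adapt this script to combine the results into whatever format you want.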
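
One possible way to combine the four arrays is to zip them by index into one object per issue. This is only a sketch: zipIssues is a hypothetical helper, and the titles and statuses in the sample data are made up to stand in for the arrays returned by page.evaluate().

```javascript
// Hypothetical helper: turn { key: [values...] } into [{ key: value }, ...].
function zipIssues(fields) {
  const keys = Object.keys(fields);
  if (keys.length === 0) return [];
  // Stop at the shortest array so no index is read out of range.
  const length = Math.min(...keys.map((k) => fields[k].length));
  return Array.from({ length }, (_, i) =>
    Object.fromEntries(keys.map((k) => [k, fields[k][i].trim()]))
  );
}

// Made-up sample data in the shape of the arrays from the snippet above.
const issues = zipIssues({
  title: ['Example issue A', 'Example issue B'],
  status: [' patched ', ' patched '],
  patched: ['2021-01-12', '2020-12-15'],
});
console.log(issues);
```

Zipping by index assumes every issue on the page has every field. If some issue lacks, say, a .patched element, the arrays will have different lengths and the values can end up attached to the wrong issue, so it is worth comparing the array lengths before relying on the result.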

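    【Comments】:

    • Thank you very much, it works. But how can I extend it to add the title, description, and discovery date?
    • I'm waiting for Pavel's reply, because I really like the way he writes JS code and I'd like to learn from him too. If there's no reply within two days, I'll jump in and help, I promise.
    • Hi Kyle, thanks again for your help. I'm trying to extend this to include everything and need to get the work done soon, so apologies for the rush. If you have any ideas, please let me know. Thanks again.
    • Hi Matt, I'm wrapping up work tomorrow, and I'll prioritize this script :)
    • Made some additions.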