【Title】: Puppeteer: How to evaluate an XPath with a context node?
【Posted】: 2020-11-10 02:13:00
【Question】:

Going by the doc, I tried this code:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('http://personalitycore.com/a.html');
    let p = (await page.$x('/html/body/p'))[0]
    console.log("Var[p] Class: " + p.constructor.name)
    console.log("Var[p] Tag: " + await p.evaluate(e => e.tagName, p))
    let spans = await p.$x('/*')
    for (var i = 0; i < spans.length; i++) {
        console.log("Var[spans] Tag: " + await spans[i].evaluate(e => e.tagName, spans[i]))
        console.log("Var[spans] Text: " + await spans[i].evaluate(e => e.textContent, spans[i]))
    }
    await browser.close();
})();

The HTML of http://personalitycore.com/a.html is:

<head>
</head>
<body>
<p>
text_node1
<span>span_node1</span>
text_node2
<span>span_node2</span>
</p>
</body>

Result:

/usr/local/bin/node example.js
Var[p] Class: ElementHandle
Var[p] Tag: P
Var[spans] Tag: HTML
Var[spans] Text: 

text_node1
span_node1
text_node2
span_node2

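I'm confused. According to the doc, p is an ElementHandle, so evaluating the XPath /* on it should give [TextNode, Span, TextNode, Span].

But it returned the whole page, with the tag HTML.

So, my questions:

  1. Is there a mistake in my code that keeps me from getting the expected result?
  2. How do I evaluate an XPath with a context node? In my example, I want to evaluate /* on the p tag.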

【Question discussion】:

    Tags: node.js xpath puppeteer


    【Solution 1】:

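    You only need to add the context-node symbol (a dot) to the XPath: './*'. A path that starts with '/' is absolute and is evaluated from the document root no matter which node it is called on, so '/*' selects the child element of the document, i.e. the html element. Note also that * matches element nodes only, which is why './*' returns the two span elements and no text nodes.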
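The dot rule can be sketched as a small helper. `toContextRelative` is a hypothetical name, not part of Puppeteer, and the sketch assumes a simple location path: unions such as `/a | /b` or parenthesized expressions are not handled.

```javascript
// Hypothetical helper (not a Puppeteer API): anchor a simple XPath
// location path at the context node by prefixing it with a dot.
// Assumption: `xpath` is a single location path, not a union or a
// parenthesized expression.
function toContextRelative(xpath) {
  const expr = xpath.trim();
  // A leading '/' makes the path absolute (document root);
  // a leading '.' makes it start at the context node.
  return expr.startsWith('/') ? `.${expr}` : expr;
}

console.log(toContextRelative('/*'));     // ./*
console.log(toContextRelative('//span')); // .//span
console.log(toContextRelative('./*'));    // ./*
console.log(toContextRelative('span'));   // span
```

Paths that are already relative, such as `span` or `./*`, pass through unchanged, so the helper is safe to apply more than once.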

    // Complete example: query the children of <p> with a context-relative XPath.
    // Top-level await requires an ES module (.mjs or "type": "module").
    import puppeteer from 'puppeteer';
    
    const browser = await puppeteer.launch();
    
    const html = `
      <!doctype html>
      <html>
        <head>
        </head>
        <body>
          <p>
            text_node1
            <span>span_node1</span>
            text_node2
            <span>span_node2</span>
          </p>
        </body>
      </html>`;
    
    try {
      const page = await browser.newPage();
      // Load the inline markup so the example does not depend on an external URL.
      await page.setContent(html);
    
      const [p] = await page.$x('/html/body/p');
      console.log("Var[p] Class: " + p.constructor.name);
      console.log("Var[p] Tag: " + await p.evaluate(e => e.tagName));
    
      // The leading dot anchors the path at p instead of the document root.
      const spans = await p.$x('./*');
      for (const span of spans) {
        console.log("Var[spans] Tag: " + await span.evaluate(e => e.tagName));
        console.log("Var[spans] Text: " + await span.evaluate(e => e.textContent));
      }
    } catch (err) {
      console.error(err);
    } finally {
      await browser.close();
    }
    

    Output:

    Var[p] Class: ElementHandle
    Var[p] Tag: P
    Var[spans] Tag: SPAN
    Var[spans] Text: span_node1
    Var[spans] Tag: SPAN
    Var[spans] Text: span_node2
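
The example uses `$x`, which matches the Puppeteer API at the time of the question. To my knowledge, later Puppeteer releases removed `$x` in favor of passing an `xpath/`-prefixed selector to `$$`; treat that as an assumption and check the docs for your installed version. Under that assumption, a context-relative query might look like the sketch below, where `childElementTags` is a hypothetical helper and `handle` is any ElementHandle.

```javascript
// Hypothetical helper, not part of Puppeteer.
// Assumption: `handle` is an ElementHandle whose $$() accepts the
// 'xpath/' selector prefix (check the docs for your Puppeteer version).
async function childElementTags(handle) {
  // './*' is relative to `handle`; '/*' would start at the document root.
  const children = await handle.$$('xpath/./*');
  // Read each child's tag name inside the page.
  return Promise.all(children.map((child) => child.evaluate((e) => e.tagName)));
}
```

Called as `await childElementTags(p)` with the `p` handle from the example, this should query the same child elements as `p.$x('./*')`, provided the assumption about the selector prefix holds.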
    

    【Discussion】:
