【Question title】: HTML manipulation with Node JS
【Posted】: 2017-08-25 11:56:47
【Question】:

I want to load HTML from a source (a link, a file, ...) and extract values from it. The HTML has this format:

<!doctype html>
<html>
<body>
  <main>
    <section id="serp">
      <div>
        <article>a</article>
        <article>b</article>
        <article>c</article>
        <article>d</article>
      </div>
    </section>
  </main>
</body>
</html>

First I tried cheerio. Based on the documentation, I wrote:

const cheerio = require('cheerio');
const $ = cheerio.load(myhtml);
const content = $('#serp div').children();
console.log(content); // null

Following the same approach, I also tried x-ray and jsdom, but both of them print null as well.

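【Comments】:

  • What does console.log(myhtml) output before it is loaded into Cheerio?
  • It is a string:
    a
    b
    c
    d
    @JeremyThille
  • If the HTML really is available and loaded into Cheerio, there is no reason for the selector to return null. The problem is elsewhere. Did you actually log it, or are you saying so because that is what you think would be logged?
  • @JeremyThille No, I actually logged it.
  • Hmm, that is strange, because the code is simple and there is really no reason it should not work. What if you select only $('#serp')? Is it found?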
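
One way to follow up on the comments above is to check what is actually being passed to the parser before looking at the selector. The sketch below uses only Node built-ins. `describeInput` is a hypothetical helper, not part of cheerio, and the idea that the input might be a Buffer, a Promise, or undefined is an assumption for illustration, not something the thread confirms.

```javascript
// Hypothetical helper (not a cheerio API): summarize the value about to be parsed.
function describeInput(input) {
  if (typeof input === 'string') {
    return { kind: 'string', length: input.length, hasSerp: input.includes('id="serp"') };
  }
  if (Buffer.isBuffer(input)) {
    const text = input.toString('utf8');
    return { kind: 'buffer', length: input.length, hasSerp: text.includes('id="serp"') };
  }
  if (input && typeof input.then === 'function') {
    // A pending Promise (for example, an un-awaited fetch) contains no markup yet.
    return { kind: 'promise', length: 0, hasSerp: false };
  }
  return { kind: input === null ? 'null' : typeof input, length: 0, hasSerp: false };
}

const sample = '<section id="serp"><div><article>a</article></div></section>';

console.log(describeInput(sample));                  // a real HTML string
console.log(describeInput(Promise.resolve(sample))); // a Promise passed by mistake
console.log(describeInput(undefined));               // a variable that was never filled
```

If the summary reports anything other than a non-empty string containing the expected id, the problem is in how the HTML was obtained rather than in the selector.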

Tags: javascript node.js cheerio x-ray


【Solution 1】:

I did the following:

let myhtml = `<!doctype html>
<html>
<body>
  <main>
    <section id="serp">
      <div>
        <article>a</article>
        <article>b</article>
        <article>c</article>
        <article>d</article>
      </div>
    </section>
  </main>
</body>
</html>`;

const cheerio = require('cheerio');
const $ = cheerio.load(myhtml);
const content = $('#serp div').children();
console.log(content);
console.log(`html: ${content.html()}`);

It prints the following to the console:

initialize {
  '0': 
   { type: 'tag',
     name: 'article',
     namespace: 'http://www.w3.org/1999/xhtml',
     attribs: {},
     'x-attribsNamespace': {},
     'x-attribsPrefix': {},
     children: [ [Object] ],
     parent: 
      { type: 'tag',
        name: 'div',
        namespace: 'http://www.w3.org/1999/xhtml',
        attribs: {},
        'x-attribsNamespace': {},
        'x-attribsPrefix': {},
        children: [Object],
        parent: [Object],
        prev: [Object],
        next: [Object] },
     prev: 
      { type: 'text',
        data: '\n        ',
        parent: [Object],
        prev: null,
        next: [Circular] },
     next: 
      { type: 'text',
        data: '\n        ',
        parent: [Object],
        prev: [Circular],
        next: [Object] } },
  '1': 
   { type: 'tag',
     name: 'article',
     namespace: 'http://www.w3.org/1999/xhtml',
     attribs: {},
     'x-attribsNamespace': {},
     'x-attribsPrefix': {},
     children: [ [Object] ],
     parent: 
      { type: 'tag',
        name: 'div',
        namespace: 'http://www.w3.org/1999/xhtml',
        attribs: {},
        'x-attribsNamespace': {},
        'x-attribsPrefix': {},
        children: [Object],
        parent: [Object],
        prev: [Object],
        next: [Object] },
     prev: 
      { type: 'text',
        data: '\n        ',
        parent: [Object],
        prev: [Object],
        next: [Circular] },
     next: 
      { type: 'text',
        data: '\n        ',
        parent: [Object],
        prev: [Circular],
        next: [Object] } },
  '2': 
   { type: 'tag',
     name: 'article',
     namespace: 'http://www.w3.org/1999/xhtml',
     attribs: {},
     'x-attribsNamespace': {},
     'x-attribsPrefix': {},
     children: [ [Object] ],
     parent: 
      { type: 'tag',
        name: 'div',
        namespace: 'http://www.w3.org/1999/xhtml',
        attribs: {},
        'x-attribsNamespace': {},
        'x-attribsPrefix': {},
        children: [Object],
        parent: [Object],
        prev: [Object],
        next: [Object] },
     prev: 
      { type: 'text',
        data: '\n        ',
        parent: [Object],
        prev: [Object],
        next: [Circular] },
     next: 
      { type: 'text',
        data: '\n        ',
        parent: [Object],
        prev: [Circular],
        next: [Object] } },
  '3': 
   { type: 'tag',
     name: 'article',
     namespace: 'http://www.w3.org/1999/xhtml',
     attribs: {},
     'x-attribsNamespace': {},
     'x-attribsPrefix': {},
     children: [ [Object] ],
     parent: 
      { type: 'tag',
        name: 'div',
        namespace: 'http://www.w3.org/1999/xhtml',
        attribs: {},
        'x-attribsNamespace': {},
        'x-attribsPrefix': {},
        children: [Object],
        parent: [Object],
        prev: [Object],
        next: [Object] },
     prev: 
      { type: 'text',
        data: '\n        ',
        parent: [Object],
        prev: [Object],
        next: [Circular] },
     next: 
      { type: 'text',
        data: '\n      ',
        parent: [Object],
        prev: [Circular],
        next: null } },
  options: 
   { withDomLvl1: true,
     normalizeWhitespace: false,
     xml: false,
     decodeEntities: true },
  _root: 
   initialize {
     '0': 
      { type: 'root',
        name: 'root',
        namespace: 'http://www.w3.org/1999/xhtml',
        attribs: {},
        'x-attribsNamespace': {},
        'x-attribsPrefix': {},
        children: [Object],
        parent: null,
        prev: null,
        next: null },
     options: 
      { withDomLvl1: true,
        normalizeWhitespace: false,
        xml: false,
        decodeEntities: true },
     length: 1,
     _root: [Circular] },
  length: 4,
  prevObject: 
   initialize {
     '0': 
      { type: 'tag',
        name: 'div',
        namespace: 'http://www.w3.org/1999/xhtml',
        attribs: {},
        'x-attribsNamespace': {},
        'x-attribsPrefix': {},
        children: [Object],
        parent: [Object],
        prev: [Object],
        next: [Object] },
     options: 
      { withDomLvl1: true,
        normalizeWhitespace: false,
        xml: false,
        decodeEntities: true },
     _root: initialize { '0': [Object], options: [Object], length: 1, _root: [Circular] },
     length: 1,
     prevObject: initialize { '0': [Object], options: [Object], length: 1, _root: [Circular] } } }
html: a

Process finished with exit code 0
