WordPress: Generate table of contents based on headlines from content

[Question title]: WordPress: Generate table of contents based on headlines from content
[Posted]: 2022-02-02 05:46:01
[Question]:

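I want to generate a table of contents from the headlines in my post.

I have already found a solution that grabs every headline from the content and replaces the <h2> tags with <a> tags.

The problem is that I also need to replace the <h3> tags with links and show them in a nested list of links.

My result should look like this: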

<ul>
    <li><a href="#h2-1">I was a H2 headline</a></li>
    <li>
        <a href="#h2-2">Also a H2 headline</a>
        <ul>
            <li><a href="#h3-1">H3 headline</a></li>
            <li><a href="#h3-2">Another H3 headline</a></li>
        </ul>
    </li>
</ul>

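My problem is that some headlines may have a class="" attribute while others do not. At the moment I strip every class="" with a search-and-replace. That is not the best solution, but it works for me given how little I know about regex.

The following code is my function to get every headline from the content.

First I get the content of the post and store it in $content.

From there I get all the headlines (<h2> - <h6>) and store them in $results with this line: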

preg_match_all('#<h[2-6]*[^>]*>.*?<\/h[2-6]>#',$content,$results);
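A side note on the pattern above: `[2-6]*` allows zero digits, so a tag such as `<hr>` can open a match when a headline follows on the same line, and nothing ties the closing tag to the opening level. The following is only a minimal sketch of a tighter pattern, assuming headlines are never nested inside each other. The sample `$content` and the `$headlines` array are made up for illustration.

```php
<?php
// Hypothetical sample content; in WordPress this would be the filtered post content.
$content = <<<HTML
<header>Site header</header>
<h2 class="style" id="a">First</h2>
<p>Text</p>
<h3 id="b"
    class="style">Second</h3>
HTML;

// ([2-6])  exactly one digit, captured as the level
// \b       stops "<h2" from matching a longer tag name
// \1       the closing tag must carry the same level as the opening tag
// s flag   lets "." span line breaks inside a headline
preg_match_all('#<h([2-6])\b[^>]*>(.*?)</h\1>#is', $content, $matches, PREG_SET_ORDER);

$headlines = [];
foreach ($matches as $m) {
    $headlines[] = ['level' => (int) $m[1], 'text' => trim(strip_tags($m[2]))];
}

print_r($headlines);
```

Capturing the level as a number is what later makes nesting possible, whichever way the list is built.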

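At the moment I only use the <h2> headlines, because I am not sure how to do this in a smart way. I would have to repeat the following lines for every headline level: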

$toc = str_replace('<h2','<li><a',$toc);
$toc = str_replace('</h2>','</a></li>',$toc);

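But my biggest problem is the nesting of the headlines. How can I generate HTML like the example above?

Just as important: how do I handle the different headline formats?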

<h2 class="style" id="name">
<h2 id="name" class="style">
<h2 id="name">
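One way to avoid depending on attribute order is to let an HTML parser read the attributes. This is only a sketch, assuming the `dom` extension is available. The ids and headline texts are invented, since the variants above show opening tags only.

```php
<?php
// The three headline variants from the question, given closing tags and sample text.
$html = '<h2 class="style" id="one">A</h2>'
      . '<h2 id="two" class="style">B</h2>'
      . '<h2 id="three">C</h2>';

libxml_use_internal_errors(true); // keep parser warnings out of the output

$dom = new DOMDocument();
$dom->loadHTML($html);

$ids = [];
foreach ($dom->getElementsByTagName('h2') as $h2) {
    // getAttribute() looks the attribute up by name, wherever it appears in the tag.
    $ids[] = $h2->getAttribute('id');
}

print_r($ids);
```

With this approach there is no need to strip class="" first, because the class attribute is simply never read.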

This is my current code:

$content_postid = get_the_ID();
$content_post   = get_post($content_postid);
$content        = $content_post->post_content;
$content        = apply_filters('the_content', $content);
$content        = str_replace(']]>', ']]&gt;', $content);

preg_match_all('#<h[2-6]*[^>]*>.*?<\/h[2-6]>#',$content,$results);

$toc = implode("\n",$results[0]);

// This part is messy because I don't really understand regex :-(
$toc = preg_replace('/class=".*?"/', '', $toc);
$toc = str_replace('<strong>','',$toc);
$toc = str_replace('</strong>','',$toc);
$toc = str_replace('<h2','<li><a',$toc);
$toc = str_replace('</h2>','</a></li>',$toc);
$toc = str_replace('id="','href="#',$toc);

//plug the results into appropriate HTML tags
$toc = '<div id="toc">
<ul class="list-unstyled">
'.$toc.'
</ul>
</div>';

echo $toc;

This is my current output (as you can see, only the <h2> headlines):

<ul class="list-unstyled">
    <li><a href="#h2-1">I was a H2 headline</a></li>
    <li><a href="#h2-2">Also a H2 headline</a></li>
</ul>

Edit: here is a sample of the HTML that can be inside $content:

<p>Lorem ipsum dolor sit amet...</p>
<p>consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat</p>
<img src="/path/to/image.jpg" />
<h2 class="style" id="name">
<p>Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat</p>
<p>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat</p> 
<h3 class="style" id="name">Headline 3</h3>
<p>vel illum dolore eu feugiat nulla facilisis at vero et accumsan et iusto odio dignissim qui</p>
<h3 class="style" id="name">One more Headline 3</h3>
<p>blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi</p>
<h2 id="name" class="style">Headline 2 with class</h2>
<p>Nam liber tempor cum soluta nobis eleifend option congue nihil imperdiet</p>
<h2 id="name">Another Headline 2 without class</h2>
<p>doming id quod mazim placerat facer possim assum</p>

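Edit 2:

I found a function that looks right (here), but I cannot get it to work.

I also found a function here that explicitly uses DOMDocument. I am still testing it; at the moment it outputs the entire content.

This is the code from it: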

$doc = new DOMDocument();
$doc->loadHTML($code);

// create document fragment
$frag = $doc->createDocumentFragment();
// create initial list
$frag->appendChild($doc->createElement('ol'));
$head = &$frag->firstChild;
$xpath = new DOMXPath($doc);
$last = 1;

// get all H1, H2, …, H6 elements
foreach ($xpath->query('//*[self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6]') as $headline) {
    // get level of current headline
    sscanf($headline->tagName, 'h%u', $curr);

    // move head reference if necessary
    if ($curr < $last) {
        // move upwards
        for ($i=$curr; $i<$last; $i++) {
            $head = &$head->parentNode->parentNode;
        }
    } else if ($curr > $last && $head->lastChild) {
        // move downwards and create new lists
        for ($i=$last; $i<$curr; $i++) {
            $head->lastChild->appendChild($doc->createElement('ol'));
            $head = &$head->lastChild->lastChild;
        }
    }
    $last = $curr;

    // add list item
    $li = $doc->createElement('li');
    $head->appendChild($li);
    $a = $doc->createElement('a', $headline->textContent);
    $head->lastChild->appendChild($a);

    // build ID
    $levels = array();
    $tmp = &$head;
    // walk subtree up to fragment root node of this subtree
    while (!is_null($tmp) && $tmp != $frag) {
        $levels[] = $tmp->childNodes->length;
        $tmp = &$tmp->parentNode->parentNode;
    }
    $id = 'sect'.implode('.', array_reverse($levels));
    // set destination
    $a->setAttribute('href', '#'.$id);
    // add anchor to headline
    $a = $doc->createElement('a');
    $a->setAttribute('name', $id);
    $a->setAttribute('id', $id);
    $headline->insertBefore($a, $headline->firstChild);
}

// append fragment to document
$doc->getElementsByTagName('body')->item(0)->appendChild($frag);

// echo markup
echo $doc->saveHTML();
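The last line calls `saveHTML()` without an argument, which serializes the whole document; that would explain why the entire content is shown. `DOMDocument::saveHTML()` also accepts an optional node and then serializes only that subtree. The sketch below uses a hand-built `<ol>` as a stand-in for the generated list. The variable names `$list`, `$whole` and `$onlyList` are illustrative and not part of the snippet above.

```php
<?php
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML('<p>Intro</p><h2 id="a">First</h2><p>Body</p>');

// Stand-in for the generated list: a single <ol> with one linked entry.
$list = $doc->createElement('ol');
$item = $doc->createElement('li');
$link = $doc->createElement('a', 'First');
$link->setAttribute('href', '#a');
$item->appendChild($link);
$list->appendChild($item);
$doc->getElementsByTagName('body')->item(0)->appendChild($list);

$whole    = $doc->saveHTML();      // full document: doctype, paragraphs, headline, list
$onlyList = $doc->saveHTML($list); // just the <ol> subtree

echo $onlyList;
```

In the snippet above, the equivalent change would presumably be to keep a reference to the first `<ol>` and pass it to `saveHTML()`.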

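[Comments]:

This is clearly not a job for str_replace or preg_replace, but for DOMDocument. Please provide the HTML source that corresponds to the expected result.
Yes, it can be done entirely with the DOM (both for the extraction and for building the result), but since robustness depends mainly on the extraction, you could also consider using the DOM for the extraction only.
Changed the format and extracted the id attribute from the nodes: 3v4l.org/MWJrel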

Tags: php html regex wordpress replace

[Solution 1]:

An approach that uses only the DOM to parse the HTML source and extract the relevant information. The result is then built as a string.

libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);

$xp = new DOMXPath($dom);
$nodes = $xp->query('//*[contains("h1 h2 h3 h4 h5 h6", name())]');

$currentLevel = ['level' => 0 /*, 'count' => 0*/ ];
$stack = [];
$format = '<li><a href="#%s">%s</a></li>';
$result = '';

foreach($nodes as $node) {
    $level = (int)$node->tagName[1]; // extract the digit after h
  
    while($level < $currentLevel['level']) {
        $currentLevel = array_pop($stack);
        $result .= '</ul>';
    }
    
    if ($level === $currentLevel['level']) {
        $currentLevel['count']++;
    } else {
        $stack[] = $currentLevel;
        $currentLevel = ['level' => $level, 'count' => 1];
        $result .= '<ul>';
    }

    $result .= sprintf($format, $node->getAttribute('id'), $node->nodeValue);    
}

$result .= str_repeat('</ul>', count($stack));


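To build the expected tree structure step by step, this code uses a stack (FILO) of arrays, each holding a level (the digit after h) and the number of nodes already added at that level. When the current node's level is higher than the previous node's, the array for the previous level is pushed onto the stack. When the current node's level is lower, elements are popped from the stack until the restored level is no longer higher than the current node's level. When the current and previous nodes have the same level, the stack is left unchanged and the count in the array is incremented.

After the main loop, the code counts the items remaining on the stack so that it can close the open ul tags.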
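Note that the code above closes each `<li>` immediately, so a nested `<ul>` ends up next to its parent item, whereas the expected output in the question places it inside the parent `<li>`. The following is a sketch of one possible variant that leaves the `<li>` open until the next headline is known. `build_toc()` is a hypothetical helper name, the query is limited to h2 through h6 as in the question, and the `htmlspecialchars()` escaping is an addition that the answer does not have.

```php
<?php
// build_toc() is a hypothetical helper name, not a WordPress function.
function build_toc(string $html): string
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xp    = new DOMXPath($dom);
    $nodes = $xp->query('//h2 | //h3 | //h4 | //h5 | //h6');

    $open = []; // levels of the <ul> elements that are currently open
    $out  = '';

    foreach ($nodes as $node) {
        $level = (int) $node->tagName[1];

        // Leaving deeper levels: close their last item and their list.
        while ($open && end($open) > $level) {
            array_pop($open);
            $out .= '</li></ul>';
        }

        if ($open && end($open) === $level) {
            $out .= '</li>';   // sibling: close the previous item
        } else {
            $open[] = $level;  // deeper level: the new list stays inside the open <li>
            $out   .= '<ul>';
        }

        // The <li> is left open so that a following deeper headline can nest in it.
        $out .= sprintf(
            '<li><a href="#%s">%s</a>',
            htmlspecialchars($node->getAttribute('id')),
            htmlspecialchars($node->textContent)
        );
    }

    return $out . str_repeat('</li></ul>', count($open));
}

$html = '<h2 id="h2-1">I was a H2 headline</h2>'
      . '<h2 id="h2-2">Also a H2 headline</h2>'
      . '<h3 id="h3-1">H3 headline</h3>'
      . '<h3 id="h3-2">Another H3 headline</h3>';

echo build_toc($html);
```

For the four sample headlines this produces the nested structure requested in the question, without whitespace between the tags. The per-level count from the answer is dropped here because nothing in the output depends on it.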

XPath query details:

 //*        [contains("h1 h2 h3 h4 h5 h6", name())]
|___|      |_______________________________________|
location   predicate
path

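Location path:

// anywhere in the DOM tree, starting from the current location (the document root by default)
* any element node

Predicate:

name() returns the name of the current element
contains(haystack, needle) returns true when the haystack string contains the needle string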
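To see the predicate in action, the sketch below runs the `contains()` query next to the explicit `self::` form used in the question's second snippet. The sample markup and the `$names` helper are made up for illustration. Because `contains()` is a substring test, an element whose name happened to be a fragment of the list (a hypothetical `<h>` element, for instance) would match as well.

```php
<?php
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML('<h1>A</h1><p>x</p><h3>B</h3><div><h6>C</h6></div>');
$xp = new DOMXPath($dom);

// Collect the tag names matched by an XPath expression, in document order.
$names = function (string $query) use ($xp): array {
    $found = [];
    foreach ($xp->query($query) as $node) {
        $found[] = $node->nodeName;
    }
    return $found;
};

// Substring test: the element name must occur inside the space-separated list.
$viaContains = $names('//*[contains("h1 h2 h3 h4 h5 h6", name())]');

// Explicit test: one self:: check per headline level present in the sample.
$viaSelf = $names('//*[self::h1 or self::h3 or self::h6]');

print_r($viaContains);
```

Both queries select the same three headline elements here; the `contains()` form is just shorter to write.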
