【问题标题】:Keeping line breaks when using PHP's DomDocument appendChild使用 PHP 的 DomDocument appendChild 时保持换行符
【发布时间】:2011-10-20 16:11:11
【问题描述】:

我正在尝试使用 PHP 中的 DOMDocument 来添加/解析 HTML 文档中的内容。据我所知,将 formOutput 设置为 true 并将 preserveWhiteSpace 设置为 false 应该使制表符和换行符保持有序,但它似乎不适用于新创建或附加的节点。

代码如下:

$dom = new \DOMDocument;
$dom->formatOutput = true;
$dom->preserveWhiteSpace = false;
$dom->loadHTMLFile($htmlsource);
$tables = $dom->getElementsByTagName('table');
foreach($tables as $table)
{
    $table->setAttribute('class', 'tborder');
    $div = $dom->createElement('div');
    $div->setAttribute('class', 'm2x');
    $table->parentNode->insertBefore($div, $table);
    $div->appendChild($table);
}
$dom->saveHTMLFile($html)

HTML 如下所示:

<table>
    <tr>
        <td></td>
    </tr>
</table>

这就是我想要的:

<div class="m2x">
    <table class="tborder">
        <tr>
            <td></td>
        </tr>
    </table>
</div>

这是我得到的:

<div class="m2x"><table class="tborder"><tr>
<td></td>
        </tr></table></div>

我做错了什么吗?我试过用谷歌搜索尽可能多的不同方式,但没有运气。

【问题讨论】:

    标签: php domdocument


    【解决方案1】:

    不幸的是,您可能需要编写一个函数来按您想要的方式缩进输出。我做了一个小功能,您可能会觉得有帮助。

    function indentContent($content, $tab="\t")
    {               
    
            // add marker linefeeds to aid the pretty-tokeniser (adds a linefeed between all tag-end boundaries)
            $content = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $content);
    
            // now indent the tags
            $token = strtok($content, "\n");
            $result = ''; // holds formatted version as it is built
            $pad = 0; // initial indent
            $matches = array(); // returns from preg_matches()
    
            // scan each line and adjust indent based on opening/closing tags
            while ($token !== false) 
            {
                    $token = trim($token);
                    // test for the various tag states
    
                    // 1. open and closing tags on same line - no change
                    if (preg_match('/.+<\/\w[^>]*>$/', $token, $matches)) $indent=0;
                    // 2. closing tag - outdent now
                    elseif (preg_match('/^<\/\w/', $token, $matches))
                    {
                            $pad--;
                            if($indent>0) $indent=0;
                    }
                    // 3. opening tag - don't pad this one, only subsequent tags
                    elseif (preg_match('/^<\w[^>]*[^\/]>.*$/', $token, $matches)) $indent=1;
                    // 4. no indentation needed
                    else $indent = 0;
    
                    // pad the line with the required number of leading spaces
                    $line = str_pad($token, strlen($token)+$pad, $tab, STR_PAD_LEFT);
                    $result .= $line."\n"; // add to the cumulative result, with linefeed
                    $token = strtok("\n"); // get the next token
                    $pad += $indent; // update the pad size for subsequent lines    
            }       
    
            return $result;
    }
    

    indentContent($dom-&gt;saveHTML()) 将返回:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html>
        <body>
            <div class="m2x">
                <table class="tborder">
                    <tr>
                        <td>
                        </td>
                    </tr>
                </table>
            </div>
        </body>
    </html>
    

    我以this one 开头创建了这个函数。

    【讨论】:

    • 很棒的功能!但不幸的是,它在使用 void 元素时缩进了太多空格。
    • 我试过这个html,但它不起作用:&lt;table&gt; &lt;tr&gt; &lt;td&gt;&lt;/td&gt; &lt;/tr&gt; &lt;/table&gt;
    【解决方案2】:

    我修改了 ghbarratt 写的很棒的函数,所以它不会缩进 void elements

    function indentContent($content, $tab="\t")
    {
        // add marker linefeeds to aid the pretty-tokeniser (adds a linefeed between all tag-end boundaries)
        $content = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $content);
    
        // now indent the tags
        $token = strtok($content, "\n");
        $result = ''; // holds formatted version as it is built
        $pad = 0; // initial indent
        $matches = array(); // returns from preg_matches()
    
        // scan each line and adjust indent based on opening/closing tags
        while ($token !== false) 
        {
            $token = trim($token);
            // test for the various tag states
    
            // 1. open and closing tags on same line - no change
            if (preg_match('/.+<\/\w[^>]*>$/', $token, $matches)) $indent=0;
            // 2. closing tag - outdent now
            elseif (preg_match('/^<\/\w/', $token, $matches))
            {
                $pad--;
                if($indent>0) $indent=0;
            }
            // 3. opening tag - don't pad this one, only subsequent tags (only if it isn't a void tag)
            elseif (preg_match('/^<\w[^>]*[^\/]>.*$/', $token, $matches))
            {
                $voidTag = false;
                foreach ($matches as $m)
                {
                    // Void elements according to http://www.htmlandcsswebdesign.com/articles/voidel.php
                    if (preg_match('/^<(area|base|br|col|command|embed|hr|img|input|keygen|link|meta|param|source|track|wbr)/im', $m))
                    {
                        $voidTag = true;
                        break;
                    }
                }
    
                if (!$voidTag) $indent=1;
            }
            // 4. no indentation needed
            else $indent = 0;
    
            // pad the line with the required number of leading spaces
            $line = str_pad($token, strlen($token)+$pad, $tab, STR_PAD_LEFT);
            $result .= $line."\n"; // add to the cumulative result, with linefeed
            $token = strtok("\n"); // get the next token
            $pad += $indent; // update the pad size for subsequent lines    
        }    
    
        return $result;
    }
    

    所有学分都归 ghbarratt。

    【讨论】:

    【解决方案3】:

    @Stan 和@ghbarrat 都不适合&lt;!DOCTYPE html&gt; html5 声明。它有点将缩进传递给&lt;head&gt; 元素。

    预期:

    <!DOCTYPE html>
    <html>
      <head>
        <meta charset="UTF-8">
      </head>
      <body>
        <!-- all good -->
      </body>
    </html>

    结果:

    <!DOCTYPE html>
    <html>
      <head>
        <meta charset="UTF-8">
        </head>
        <body>
          <!-- all good -->
        </body>
      </html>

    一点点测试,当我将 &lt;html&gt; 元素添加到 Void 元素列表时,显示了部分修复,但这并不能解决头部的问题,它还会使子项(即头部和身体)变平。

    编辑#1 看来&lt;meta charset="UTF-8"&gt;毕竟对不正确的缩进负责。

    编辑 #2 - 解决方案

    经过少许故障排除后,我发现&lt;meta&gt; 作为自闭合标签会影响下一个闭合标签,这可以通过添加标志来解决。该标志定义了我们是否找到了自结束标记,那么结束标记的下一个实例将有一个额外的负缩进。

    function indentContent($content, $tab="\t"){
        // add marker linefeeds to aid the pretty-tokeniser (adds a linefeed between all tag-end boundaries)
        $content = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $content);
    
        // now indent the tags
        $token = strtok($content, "\n");
        $result = ''; // holds formatted version as it is built
        $pad = 0; // initial indent
        $matches = array(); // returns from preg_matches()
    
        // scan each line and adjust indent based on opening/closing tags
        while ($token !== false && strlen($token)>0)
        {
            $token = trim($token);
            // test for the various tag states
    
            // 1. open and closing tags on same line - no change
            if (preg_match('/.+<\/\w[^>]*>$/', $token, $matches)) $indent=0;
            // 2. closing tag - outdent now
            elseif (preg_match('/^<\/\w/', $token, $matches))
            {
                $pad--;
                if($indent>0) $indent=0;
                if($nextTagNegative){
                    $pad--;$nextTagNegative=false;
                }
            }
            // 3. opening tag - don't pad this one, only subsequent tags (only if it isn't a void tag)
            elseif (preg_match('/^<\w[^>]*[^\/]>.*$/', $token, $matches))
            {
                $voidTag = false;
                foreach ($matches as $m)
                {
                    // Void elements according to http://www.htmlandcsswebdesign.com/articles/voidel.php
                    if (preg_match('/^<(area|base|br|col|command|embed|hr|img|input|keygen|link|meta|param|source|track|wbr)/im', $m))
                    {
                        $voidTag = true;
                        break;
                    }
                }
    
                if (!$voidTag) $indent=1;$nextTagNegative=true;
            }
            // 4. no indentation needed
            else $indent = 0;
    
            // pad the line with the required number of leading spaces
            $line = str_pad($token, strlen($token)+$pad, $tab, STR_PAD_LEFT);
            $result .= $line."\n"; // add to the cumulative result, with linefeed
            $token = strtok("\n"); // get the next token
            $pad += $indent; // update the pad size for subsequent lines
        }
    
        return $result;
    }

    【讨论】:

    • 我的解决方案存在 LI 标签问题。还没找到解决办法。
    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2016-12-07
    • 2011-10-19
    • 1970-01-01
    • 2013-07-12
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多