Regex to strip comments, multi-line comments, and empty lines: answers

【Question title】: Regex to strip comments and multi-line comments and empty lines
【Posted】: 2010-10-13 04:31:24
【Question】:

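I want to parse a file, and I want to use PHP and regex to strip:

blank or empty lines
single-line comments
multi-line comments

Basically, I want to remove any line containing

/* text */

or multi-line comments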

/***
some
text
*****/

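If possible, I would also like another regex that checks whether a line is empty (to remove blank lines).

Is that possible? Could somebody post a regex that does this?

Thanks a lot.

【Comments】:

Related: stackoverflow.com/questions/503871/…

Tags: php regex preg-replace

【Solution 1】: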

$text = preg_replace('!/\*.*?\*/!s', '', $text);
$text = preg_replace('/\n\s*\n/', "\n", $text);

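【Discussion】:

Thanks a lot! The first regex removed the single-line comments. However, the second regex changed nothing, and the multi-line comments were not removed. Thanks for your reply, thanks again.
Make sure you have the !s on the first regex; that was not in my original answer. That is what lets it handle multi-line comments. The second pattern removes blank lines.
The !s made it work 100%. It is much better than my regex, +1 from me.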
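To make the point about the s modifier concrete: by default `.` in PCRE does not match a newline, so without `s` the pattern can only match a comment that opens and closes on the same line. The following is a minimal sketch; the sample text and the variable names `$withoutS`, `$withS`, and `$clean` are made up for illustration.

```php
<?php
// Sample input (made up for this demo): one single-line and one multi-line comment.
$text = "a = 1; /* one line */\n/***\nsome\ntext\n*****/\nb = 2;\n";

// Without the s modifier, '.' stops at newlines, so only the one-line comment matches.
$withoutS = preg_replace('!/\*.*?\*/!', '', $text);

// With the s modifier, '.' also matches newlines, so both comments are removed.
$withS = preg_replace('!/\*.*?\*/!s', '', $text);

// Collapse the blank lines left behind, as in the second statement of the answer.
$clean = preg_replace('/\n\s*\n/', "\n", $withS);

var_dump(strpos($withoutS, 'some') !== false); // the multi-line comment survives
var_dump(strpos($withS, 'some') === false);    // the multi-line comment is gone
echo $clean;
```

The lazy quantifier `.*?` matters as well: a greedy `.*` with the s modifier would run from the first `/*` to the last `*/` in the file and delete the code in between.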

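【Solution 2】:

Keep in mind that any regex you use will fail if the file you are parsing contains a string with something that matches these conditions. For example, it would turn this:

print "/* a comment */";

into this:

print "";

That is probably not what you want. But maybe it is, I don't know. Either way, regexes technically cannot parse data in a manner that avoids this problem. I say "technically" because modern PCRE regexes have tacked on a number of hacks that make them capable of doing this and, more to the point, no longer regular expressions at all. If you want to avoid stripping these things inside quotes or in other situations, there is no substitute for a full-blown parser (though it can still be quite simple).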
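As a rough illustration of how far a PCRE pattern can be pushed before a real parser is needed, the sketch below matches string literals and block comments in one alternation and only discards the comments. The function name `strip_block_comments_outside_strings` is hypothetical and not from any answer here. It assumes simple quoted strings with backslash escapes and knows nothing about heredocs, `//` comments, or other language constructs.

```php
<?php
// Hypothetical helper (not from the answers): strips /* ... */ comments but
// leaves anything inside single- or double-quoted string literals alone.
function strip_block_comments_outside_strings(string $text): string
{
    $pattern = '~
        "(?:[^"\\\\]|\\\\.)*"      # double-quoted string with backslash escapes
      | \'(?:[^\'\\\\]|\\\\.)*\'   # single-quoted string with backslash escapes
      | /\*.*?\*/                  # block comment
    ~xs';

    $result = preg_replace_callback($pattern, function (array $m): string {
        // Comments start with "/*"; everything else matched here is a string literal.
        return strncmp($m[0], '/*', 2) === 0 ? '' : $m[0];
    }, $text);

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $text;
}

$in  = 'print "/* a comment */"; /* real */ echo 1;';
$out = strip_block_comments_outside_strings($in);
echo $out, "\n";
```

Because the string alternatives come first, a `/*` that sits inside quotes is consumed as part of the string and never gets a chance to start a comment match.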

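【Discussion】:

【Solution 3】:

It is possible, but I would not do it. You need to parse the whole PHP file to make sure you are not removing any necessary whitespace (strings, the space between keywords and identifiers (publicfunctiondoStuff()), and so on). It is better to use PHP's tokenizer extension.

【Discussion】:

I just want to rely on regex. The file is very simple: it has a few single-line comments, multi-line comments, and some PHP code (each on its own line). I just want a regex formula that does the cleanup, so I can use the output in the browser for a different purpose.
Note that a regex-only approach will miss "here documents" (heredocs). To recognize that kind of text correctly, you really do need to use the tokenizer.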
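The heredoc remark can be checked with a small experiment. The sketch below runs the regex from Solution 1 and a tokenizer loop over the same made-up sample; `$source`, `$byRegex`, and `$byTokens` are illustrative names, and the sample file content is an assumption.

```php
<?php
// Hypothetical sample: a real comment plus comment-like text inside a heredoc.
$source = <<<'PHP'
<?php
/* a real comment */
$help = <<<TXT
Use /* and */ to comment.
TXT;
echo $help;
PHP;

// Regex-only approach: also eats the "/* and */" inside the heredoc body.
$byRegex = preg_replace('!/\*.*?\*/!s', '', $source);

// Tokenizer approach: drop only tokens the PHP lexer classifies as comments.
$byTokens = '';
foreach (token_get_all($source) as $token) {
    if (is_array($token) && in_array($token[0], [T_COMMENT, T_DOC_COMMENT], true)) {
        continue;
    }
    // Array tokens carry their text at index 1; plain strings are single characters.
    $byTokens .= is_array($token) ? $token[1] : $token;
}

var_dump(strpos($byRegex, 'Use /* and */') !== false);
var_dump(strpos($byTokens, 'Use /* and */') !== false);
```

The tokenizer keeps the heredoc body intact because the lexer reports it as string content, not as a comment token.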

【Solution 4】:

This should replace everything from /* to */.

$string = preg_replace('/(\s+)\/\*([^\/]*)\*\/(\s+)/s', "\n", $string);

【Discussion】:

Thanks for your help as well. Thank you!

【Solution 5】:

$string = preg_replace('#/\*[^*]*\*+([^/][^*]*\*+)*/#', '', $string);

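【Discussion】:

【Solution 6】:

Here is my solution, for anyone not comfortable with regex. The following code removes all comments introduced by # and retrieves variable values written in the form NAME=VALUE.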

$reg = array();
$handle = @fopen("/etc/chilli/config", "r");
if ($handle) {
    while (($buffer = fgets($handle, 4096)) !== false) {
        // Drop everything from the first '#' to the end of the line.
        $start = strpos($buffer, "#");
        $res = ($start !== false) ? substr($buffer, 0, $start) : $buffer;

        // Split NAME=VALUE on the first '=' only, so values may contain '='.
        $a = explode("=", $res, 2);
        $name = trim($a[0]);

        // Skip blank and comment-only lines; a bare NAME gets an empty value.
        if ($name !== "") {
            $reg[$name] = isset($a[1]) ? trim($a[1]) : "";
        }
    }

    if (!feof($handle)) {
        echo "Error: unexpected fgets() fail\n";
    }
    fclose($handle);
}

【Discussion】:

【Solution 7】:

This is a nice function, and it works!

<?php
// T_ML_COMMENT does not exist in PHP 5 and later, so define it for
// backward compatibility. On PHP 4, T_DOC_COMMENT is the one that is missing.
if (!defined('T_ML_COMMENT')) {
   define('T_ML_COMMENT', T_COMMENT);
} else {
   define('T_DOC_COMMENT', T_ML_COMMENT);
}
function strip_comments($source) {
    $tokens = token_get_all($source);
    $ret = "";
    foreach ($tokens as $token) {
       if (is_string($token)) {
          // Simple one-character token such as ';' or '{'
          $ret.= $token;
       } else {
          list($id, $text) = $token;

          switch ($id) {
             case T_COMMENT:
             case T_ML_COMMENT: // we've defined this
             case T_DOC_COMMENT: // and this
                // Comment token: drop it
                break;

             default:
                $ret.= $text;
                break;
          }
       }
    }
    // Remove the PHP open/close tags and the surrounding whitespace
    return trim(str_replace(array('<?php','<?','?>'),array('','',''),$ret));
}
?>

Now use this function, strip_comments, on code held in a variable:

<?php
$code = '
<?php 
    /* this is comment */
   // this is also a comment
   # me too, am also comment
   echo "And I am some code...";
?>';

$code = strip_comments($code);

echo htmlspecialchars($code);
?>

In the browser, this displays:

echo "And I am some code...";

Loading from a PHP file:

<?php
$code = file_get_contents("some_code_file.php");
$code = strip_comments($code);

echo htmlspecialchars($code);
?>

Loading a PHP file, stripping the comments, and saving it back:

<?php
$file = "some_code_file.php";
$code = file_get_contents($file);
$code = strip_comments($code);

$f = fopen($file,"w");
fwrite($f,$code);
fclose($f);
?>

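Source: http://www.php.net/manual/en/tokenizer.examples.php

【Discussion】:

This works well. There is one problem, though: it does not remove the empty lines left where the comments were. If a file contains 500 lines of comments, the words are removed but the empty lines remain. Can you tell us the right way to remove those empty lines?
On the result, apply this to remove the empty lines: preg_replace('/\n\s*\n/', "\n", $code), or this to remove only the leading empty lines: preg_replace('/^\n\s*\n/', '', $code)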
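One detail worth checking when collapsing blank lines is the replacement string: the pattern consumes the line break on both sides of the blank run, so an empty replacement joins the neighbouring lines, while "\n" keeps them separate. A minimal sketch with a made-up two-line input (`$joined` and `$kept` are illustrative names):

```php
<?php
// Two lines of text separated by an empty line and a whitespace-only line.
$code = "line one\n\n   \nline two\n";

// Empty replacement: the surrounding line breaks disappear too.
$joined = preg_replace('/\n\s*\n/', '', $code);

// "\n" replacement: the blank run collapses to a single line break.
$kept = preg_replace('/\n\s*\n/', "\n", $code);

var_dump($joined); // "line oneline two\n"
var_dump($kept);   // "line one\nline two\n"
```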

【Solution 8】:

//  Removes multi-line comments and does not create
//  a blank line, also treats white spaces/tabs 
$text = preg_replace('!^[ \t]*/\*.*?\*/[ \t]*[\r\n]!s', '', $text);

//  Removes single line '//' comments, treats blank characters
$text = preg_replace('![ \t]*//.*[ \t]*[\r\n]!', '', $text);

//  Strip blank lines
$text = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $text);

【Discussion】:

The single-line comment replacement does not work when URLs are involved: https://example.com gets replaced as well.
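One possible workaround for the URL case, offered as an assumption and not as part of the answer above, is a negative lookbehind that refuses to treat `//` as a comment start when it directly follows a colon. This is only a heuristic: it still strips `//` inside other string literals, and it would miss a comment written immediately after a colon. The sample text is made up.

```php
<?php
// Made-up sample: a URL in a string, a trailing comment, and a full-line comment.
$text = "\$url = 'https://example.com'; // home page\n// full-line comment\n\$x = 1;\n";

// [ \t]*    also removes the spaces or tabs in front of the comment
// (?<!:)    rejects "//" directly preceded by ":", as in "https://"
// [^\r\n]*  consumes the rest of the line but leaves the line break in place
$stripped = preg_replace('~[ \t]*(?<!:)//[^\r\n]*~', '', $text);

// Collapse the blank line left behind by the full-line comment.
$clean = preg_replace('/\n\s*\n/', "\n", $stripped);

echo $clean;
```

The `~` delimiter is used here because the lookbehind syntax `(?<!` contains a `!`, which would clash with the `!` delimiters used in the answer.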

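【Solution 9】:

I found that this one suits me better: (\s+)\/\*([^\/]*)\*/\n* It removes multi-line comments, whether they are indented with tabs or not, along with the spacing that follows them. Below is an example of a comment that this regex matches.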

/**
 * The AdditionalCategory
 * Meta informations extracted from the WSDL
 * - minOccurs : 0
 * - nillable : true
 * @var TestStructAdditionalCategorizationExternalIntegrationCUDListDataContract
 */

【Discussion】: