Removing string inside brackets: answers

【Question title】: Removing string inside brackets
【Posted】: 2011-05-04 13:56:49
【Question description】:

Good day!

I need some help removing a string inside square brackets, including the brackets themselves.

The string looks like this:

$string = "Lorem ipsum dolor<br /> [ Context are found on www.example.com ] <br />some text here. Text here. [test] Lorem ipsum dolor.";

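I only want to remove the brackets that contain "www.example.com", together with their contents. I want to keep "[test]" in the string, along with any other brackets that do not contain "www.example.com".

Thanks!

【Discussion】:

Is the text you are processing general HTML markup? Can the markup contain SCRIPT or STYLE elements (or event handlers within start tags)? If so, you need to be very careful when removing square-bracketed structures. You may get more than you bargained for!
It has some HTML markup, but no SCRIPT or STYLE. I used cURL on a certain URL and got the content I wanted, but I want to remove the website's URL from that content.

Tags: php regex preg-replace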

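【Solution 1】:

Note: The OP has changed the question dramatically. This solution was designed to handle the question in its original (harder) form, before the "www.example.com" constraint was added. Although the solution below has been modified to handle this additional constraint, a simpler solution would now probably suffice (i.e. anubhava's answer).

Here is my tested solution: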

function strip_bracketed_special($text) {
    $re = '% # Remove bracketed text having "www.example.com" within markup.
          # Skip comments, CDATA, SCRIPT & STYLE elements, and HTML tags.
          (                      # $1: HTML stuff to be left alone.
            <!--.*?-->           # HTML comments (non-SGML compliant).
          | <!\[CDATA\[.*?\]\]>  # CDATA sections
          | <script.*?</script>  # SCRIPT elements.
          | <style.*?</style>    # STYLE elements.
          | <\w+                 # HTML element start tags.
            (?:                  # Group optional attributes.
              \s+                # Attributes separated by whitespace.
              [\w:.-]+           # Attribute name is required
              (?:                # Group for optional attribute value.
                \s*=\s*          # Name and value separated by "="
                (?:              # Group for value alternatives.
                  "[^"]*"        # Either double quoted string,
                | \'[^\']*\'     # or single quoted string,
                | [\w:.-]+       # or un-quoted string (limited chars).
                )                # End group of value alternatives.
              )?                 # Attribute values are optional.
            )*                   # Zero or more start tag attributes.
            \s*/?>               # End of start tag (optional self-close).
          | </\w+>               # HTML element end tags.
          )                      # End #1: HTML Stuff to be left alone.
        | # Or... Bracketed structures containing www.example.com
          \s*\[                  # (optional ws), Opening bracket.
          [^\]]*?                # Match up to required content.
          www\.example\.com      # Required bracketed content.
          [^\]]*                 # Match up to closing bracket.
          \]\s*                  # Closing bracket, (optional ws).
        %six';
    return preg_replace($re, '$1', $text);
}

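Note that the regex skips removal of bracketed material from within HTML comments, CDATA sections, SCRIPT and STYLE elements, and HTML tag attribute values. Given the following XHTML markup (which tests these scenarios), the function above correctly removes only the bracketed content found within HTML element content: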

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <title>Test special removal. [Remove this www.example.com]</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <style type="text/css">
        .test.before {
            content: "[Do not remove www.example.com]";
        }
    </style>
    <script type="text/javascript">
        // <![CDATA[ ["Do not remove www.example.com"] ]]>
        var ob = {};
        ob["Do not remove www.example.com"] = "stuff";
        var str = "[Do not remove www.example.com]";
    </script>
</head>
<body>
<!-- <![CDATA[ ["Do not remove www.example.com"] ]]> -->
<div title="[Do not remove www.example.com]">
<h1>Test special removal. [Remove this www.example.com]</h1>
<p>Test special removal. [Remove this www.example.com]</p>
<p onclick='var str = "[Do not remove www.example.com]"; return false;'>
    Test special removal. [Do not remove this]
    Test special removal. [Remove this www.example.com]
</p>
</div>
</body>
</html>

Here is the same markup after being run through the PHP function above:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <title>Test special removal.</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <style type="text/css">
        .test.before {
            content: "[Do not remove www.example.com]";
        }
    </style>
    <script type="text/javascript">
        // <![CDATA[ ["Do not remove www.example.com"] ]]>
        var ob = {};
        ob["Do not remove www.example.com"] = "stuff";
        var str = "[Do not remove www.example.com]";
    </script>
</head>
<body>
<!-- <![CDATA[ ["Do not remove www.example.com"] ]]> -->
<div title="[Do not remove www.example.com]">
<h1>Test special removal.</h1>
<p>Test special removal.</p>
<p onclick='var str = "[Do not remove www.example.com]"; return false;'>
    Test special removal. [Do not remove this]
    Test special removal.</p>
</div>
</body>
</html>

This solution should work quite well for just about any valid (X)HTML you can throw at it. (But please, no funky shorttags or SGML comments!)
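The core trick in the function above is "match what must be protected, capture it, and put it back". A stripped-down sketch of that idea follows. The function name strip_bracketed_simple is hypothetical, and the tag pattern is deliberately naive (it assumes no ">" inside attribute values and does not treat SCRIPT, STYLE or CDATA specially), so it is an illustration of the technique, not a substitute for the full regex.

```php
<?php
// Hypothetical, simplified variant of the approach: protected markup is
// captured in group 1 and re-emitted; bracketed text leaves group 1 empty.
function strip_bracketed_simple($text) {
    $re = '%
          (                 # $1: markup to be left alone.
            <!--.*?-->      # HTML comments.
          | <[^>]*>         # Any tag (naive: assumes no ">" in attribute values).
          )
        | \[ [^\]]* www\.example\.com [^\]]* \]   # Bracketed text to drop.
        %six';
    // When the second alternative matches, $1 is empty, so the match vanishes.
    return preg_replace($re, '$1', $text);
}

$in  = '<p title="[www.example.com]">Hi [see www.example.com] and [test]</p><!-- [www.example.com] -->';
$out = strip_bracketed_simple($in);
echo $out, "\n";
```

Only the bracket in the element content is removed; the ones inside the attribute value and the comment are consumed by the first alternative and written back unchanged.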

【Discussion】:

【Solution 2】:

$str = "Lorem ipsum dolor<br /> [ Context are found on www.example.com ] <br />some text here. Text here. [test] Lorem ipsum dolor.";
$str = preg_replace('~\[[^]]*?www\.example\.com[^]]*\]~si', "", $str);
var_dump($str);

Output:

string(83) "Lorem ipsum dolor<br />  <br />some text here. Text here. [test] Lorem ipsum dolor."

PS: It also works when the bracketed text spans multiple lines.
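A quick way to check the multi-line behaviour is sketched below, using a made-up input string. The bracketed part is matched with the negated class [^]], and a negated character class matches newline characters too, so the line break inside the brackets is not an obstacle.

```php
<?php
// Made-up sample: the bracketed text contains a line break.
$text = "Intro [ Context are found\non www.example.com ] middle [test] end";

// Same pattern as in the answer above.
$result = preg_replace('~\[[^]]*?www\.example\.com[^]]*\]~si', '', $text);

var_dump($result);
```

The bracket containing www.example.com is removed as a whole, while [test] is left in place.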

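【Discussion】:

No need to use the e option (evil), but the ungreedy option should be useful here.
@soju: There is no e option; that was a copy/paste error. I removed it right after posting here.
@Viswanathan Iyer: This code just strips everything between [ and ], including the square brackets themselves. There is no <br/> to EOL conversion anywhere.
@anubhava In the output it only shows "Lorem ipsum dolor [newline] some text here"
Thanks a lot for the quick reply. Sorry about this, but I want to keep some of the strings in brackets. I only want to remove the brackets that contain www.example.com, together with their contents. How would I do that? Thanks!

【Solution 3】:

Use a regex like /\[.*?\]/. The backslashes are required; without them the brackets form a character class, and the regex will try to match any single one of the characters ., * or ?.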
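The difference between the escaped and unescaped forms can be seen in a small sketch (the sample string is made up for illustration):

```php
<?php
$sample = 'a.b*c? [drop me] tail';

// Escaped brackets: matches a literal "[", the shortest run of characters, then "]".
$escaped = preg_replace('/\[.*?\]/', '', $sample);

// Unescaped brackets: a character class matching one ".", "*" or "?".
$unescaped = preg_replace('/[.*?]/', '', $sample);

var_dump($escaped);   // the bracketed text is gone
var_dump($unescaped); // only the punctuation is gone, "[drop me]" survives
```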

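【Discussion】:

Thanks a lot for the quick reply. Sorry about this, but I want to keep some of the strings in brackets. I only want to remove the brackets that contain www.example.com, together with their contents. How would I do that? Thanks!
@user704278: Then try /\[[^]]*www\.example\.com[^]]*\]/. It matches an opening bracket, then any number of non-closing-bracket characters, then the string "www.example.com", then any number of non-closing-bracket characters, then a closing bracket.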
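The pieces described in that comment can be assembled for an arbitrary search string, as in the sketch below. The helper name remove_brackets_containing is hypothetical; preg_quote is used so that characters such as "." in the search string are matched literally.

```php
<?php
// Hypothetical helper: removes every [...] group whose content includes $needle.
function remove_brackets_containing($text, $needle) {
    $quoted = preg_quote($needle, '/');   // e.g. "www\.example\.com"
    $pattern = '/\['      // opening bracket
             . '[^]]*'    // any number of non-closing-bracket characters
             . $quoted    // the required text
             . '[^]]*'    // any number of non-closing-bracket characters
             . '\]/';     // closing bracket
    return preg_replace($pattern, '', $text);
}

$string = "Lorem ipsum dolor<br /> [ Context are found on www.example.com ] <br />some text here. Text here. [test] Lorem ipsum dolor.";
$cleaned = remove_brackets_containing($string, 'www.example.com');
echo $cleaned, "\n";
```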

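【Solution 4】:

The simplest way I can think of is to use a regex to match everything between [ and ], and then replace it with "". The code below will replace the string you used in your example. If the actual strings you need to remove are more complex, you can change the regex to match. I recommend regexpal.com for testing your regexes.

// preg_replace requires pattern delimiters; "/" is used here.
$string = preg_replace('/\[[A-Za-z .]*\]/', "", $string);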
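To see what the character class [A-Za-z .] does and does not cover, here is a small sketch. It assumes the pattern is wrapped in "/" delimiters, which preg_replace requires, and the second input string is made up for illustration.

```php
<?php
$pattern = '/\[[A-Za-z .]*\]/';

// The string from the question: both bracket groups consist only of
// letters, spaces and dots, so both are removed, including "[test]".
$string = "Lorem ipsum dolor<br /> [ Context are found on www.example.com ] <br />some text here. Text here. [test] Lorem ipsum dolor.";
$result = preg_replace($pattern, '', $string);

// Brackets containing digits or hyphens fall outside the class and stay.
$kept = preg_replace($pattern, '', 'See [item 42] and [a-b]');

var_dump($result, $kept);
```

So this pattern selects brackets by the characters they contain, not by whether they mention www.example.com.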

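【Discussion】:

Thanks a lot for the quick reply. Sorry about this, but I want to keep some of the strings in brackets. I only want to remove the brackets that contain www.example.com, together with their contents. How would I do that? Thanks!
This could be used. Each index of the $pattern array matches a different feature you want to remove. $pattern[0] matches an opening bracket with or without an adjacent space. $pattern[1] matches a closing bracket with or without an adjacent space. $pattern[2] matches the string www.anything.com. The $replace array holds the strings you want to substitute. $pattern = array('/\[(\s)?/', '/(\s)?\]/', '/\swww.[\w]*.com\s/'); $replace = array("", "", " "); $string = preg_replace($pattern, $replace, $string);

【Solution 5】: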

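The following code removes the bracketed text; when the result is rendered in a browser, each <br /> shows up as a line break: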

$str = "Lorem ipsum dolor<br />[ Context are found on www.example.com ] <br />some text here";
$str = preg_replace( "/\[[^\]]*\]/m", "", $str);
echo $str;

Output:

Lorem ipsum dolor

some text here
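For reference, the preg_replace call above only strips the bracketed text; the <br /> tags themselves stay in the string. If actual newline characters are wanted, a separate replacement step would be needed. A minimal sketch, assuming the tags appear as <br>, <br/> or <br /> (the variable $plain is introduced here for illustration):

```php
<?php
$str = "Lorem ipsum dolor<br />[ Context are found on www.example.com ] <br />some text here";

// Step 1: remove every bracketed group (same pattern as above).
$str = preg_replace('/\[[^\]]*\]/', '', $str);

// Step 2 (not part of the answer): turn <br> variants into newline characters.
$plain = preg_replace('~<br\s*/?>~i', "\n", $str);

var_dump($str, $plain);
```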

【Discussion】: