【Question title】: Split a string that contains HTML tags
【Posted】: 2020-08-15 15:02:03
【Question】:

I have this HTML string:

this simple the<b>html string</b> text test that<b>need</b>to<b>spl</b>it it too

I want to split it and get a result array like this:

this simple 
the<b>html string</b>
text test 
that<b>need</b>to<b>spl</b>it
it too

I tried this:

    var string = 'this simple the<b>html string</b> text test that<b>need</b>to<b>spl</b>it it too';
    var regex = XRegExp('((?:[\\p{L}\\p{Mn}]+|)<\\s*.*?[^>]*>.*?<\/.*?>(?:[\\p{L}\\p{Mn}]+|))', "g");

    var result = string.split(regex);

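It didn't work. I don't want to split word by word; is there a way to do this?

【Comments】:

  • Under what condition are you trying to split it?!
  • Yes, I want to match each whole word that contains one or more tags and split the string there, as shown in the array I provided
  • This makes no sense: in two of the array items you have the word the with no tags around it. The same goes for it
  • string.split(/(?:^|\s+)([^\s<>]+(?:\s+[^\s<>]+)*)(?:\s+|$)/).filter(Boolean)
  • string.split(/((?<=\s)\w+<\w>.*?<\/\w>.*?(?=\s))/); - you could also try this.

Tags: javascript regex split word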


【Solution 1】:

Use

string.split(/\s*(?<!\S)([^\s<>]+(?:\s+[^\s<>]+)*)(?!\S)\s*/).filter(Boolean);

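The capturing group makes split keep each matched run of plain words as an element of the result array, and .filter(Boolean) removes the empty strings that split leaves at the start and end.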
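
As a small illustration of that behaviour, the sketch below compares split with and without a capturing group. The sample string 'a1b2c' and the variable names are made up for this example and do not come from the question.

```javascript
// Made-up sample input, used only to illustrate the behaviour.
const sample = 'a1b2c';

// Without a capturing group, the separators are discarded.
const withoutGroup = sample.split(/\d/);

// With a capturing group, each captured separator is kept in the result.
const withGroup = sample.split(/(\d)/);

console.log(withoutGroup); // [ 'a', 'b', 'c' ]
console.log(withGroup);    // [ 'a', '1', 'b', '2', 'c' ]
```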

Regex explanation

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    \S                       non-whitespace (all but \n, \r, \t, \f,
                             and " ")
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^\s<>]+                 any character except: whitespace (\n,
                             \r, \t, \f, and " "), '<', '>' (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
--------------------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ")
                               (1 or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
      [^\s<>]+                 any character except: whitespace (\n,
                               \r, \t, \f, and " "), '<', '>' (1 or
                               more times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
    )*                       end of grouping
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \S                       non-whitespace (all but \n, \r, \t, \f,
                             and " ")
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
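
To see the two lookarounds at work in isolation, the sketch below applies a simplified pattern to a made-up sample string. Both the pattern and the string are assumptions for illustration, not the answer's full regex. Only words with whitespace or a string boundary on both sides match; anything glued to a tag is skipped.

```javascript
// Simplified pattern (an assumption for this sketch): one run of characters
// that are not whitespace, '<' or '>', with no non-whitespace character
// directly before it (?<!\S) or directly after it (?!\S).
const boundaryWord = /(?<!\S)[^\s<>]+(?!\S)/g;

// Made-up sample: "the" touches a tag, "plain" and "word" stand alone.
const sample = 'plain the<b>x</b> word';

const standalone = sample.match(boundaryWord);
console.log(standalone); // [ 'plain', 'word' ]
```

Lookbehind assertions require an engine with ES2018 regex support.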

JavaScript:

const string = 'this simple the<b>html string</b> text test that<b>need</b>to<b>spl</b>it it too';
const regex = /\s*(?<!\S)([^\s<>]+(?:\s+[^\s<>]+)*)(?!\S)\s*/;
console.log(string.split(regex).filter(Boolean));

Output:

[
  "this simple",
  "the<b>html string</b>",
  "text test",
  "that<b>need</b>to<b>spl</b>it",
  "it too"
]
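
To make the role of .filter(Boolean) visible, here is a minimal sketch that returns both the raw split result and the filtered one. The helper name splitKeepingTaggedWords is hypothetical; the regex and the input string are the ones from the answer above.

```javascript
// Hypothetical helper name; the regex is taken unchanged from the answer.
function splitKeepingTaggedWords(input) {
  const regex = /\s*(?<!\S)([^\s<>]+(?:\s+[^\s<>]+)*)(?!\S)\s*/;
  // split() keeps the captured plain-word runs and the tagged pieces between them.
  const raw = input.split(regex);
  // filter(Boolean) drops empty strings, e.g. when a match sits at the very
  // start or end of the input.
  return { raw, cleaned: raw.filter(Boolean) };
}

const input = 'this simple the<b>html string</b> text test that<b>need</b>to<b>spl</b>it it too';
const { raw, cleaned } = splitKeepingTaggedWords(input);

console.log(raw);     // includes an empty string at the start and at the end
console.log(cleaned); // the five pieces listed in the output above
```

For this input the raw array has empty strings at both ends because the first and last matches touch the edges of the string.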

【Discussion】:

  • What if the tag contains a value or attributes, for example: "thehtml string",
  • What if the string consists only of this string: “