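The \s regular expression token is not compatible with UTF-8 multibyte strings.
I wrote this PHP function to solve the problem. It uses PCRE (Perl Compatible Regular Expressions) escape sequences that work on UTF-8 strings as a replacement for \s: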
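function remove_utf8_whitespace($string) {
    // Inner call strips line breaks (\R); outer call strips horizontal whitespace (\h).
    // The u modifier makes PCRE treat both pattern and subject as UTF-8.
    return preg_replace('/\h+/u', '', preg_replace('/\R+/u', '', $string));
}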
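If two nested calls feel heavy, the same characters can probably be removed in one pass, since \h and \v are both allowed inside a character class. This is only a sketch: the function name remove_utf8_whitespace_single_pass is made up for illustration, and returning the input unchanged when preg_replace() fails is my own assumption about what a caller would want.

```php
<?php
// Hypothetical single-pass variant; the name is illustrative, not part of the original answer.
function remove_utf8_whitespace_single_pass(string $string): string {
    // [\h\v] covers horizontal and vertical whitespace, including both halves of \r\n.
    $result = preg_replace('/[\h\v]+/u', '', $string);
    // preg_replace() returns null on failure, e.g. when the subject is not valid UTF-8.
    // Falling back to the untouched input is an assumption made for this sketch.
    return $result ?? $string;
}

var_dump(remove_utf8_whitespace_single_pass(" a \r\n\t b ") === "ab"); // bool(true)
```

The type declarations and the ?? operator need PHP 7.0 or later.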
- Example usage -
Before:
$string = " this is a test \n and another test\n\r\t ok! \n";
echo $string;
this is a test
and another test
ok!
echo strlen($string); // result: 43
After:
$string = remove_utf8_whitespace($string);
echo $string;
thisisatestandanothertestok!
echo strlen($string); // result: 28
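The example above only contains ASCII whitespace, so here is a small extra check with whitespace that is actually multibyte in UTF-8. The sample strings are my own, and the "\u{...}" escape assumes PHP 7.0 or later. It also shows one way \s can fall short: without the u modifier the subject is handled byte by byte, so an ideographic space (U+3000) is left in place.

```php
<?php
// Same function as in the answer, repeated so this snippet runs on its own.
function remove_utf8_whitespace($string) {
    return preg_replace('/\h+/u', '', preg_replace('/\R+/u', '', $string));
}

// \s without the u modifier does not recognise U+3000, so nothing is removed.
var_dump(preg_replace('/\s+/', '', "a\u{3000}b") === "a\u{3000}b"); // bool(true)

// U+3000 (ideographic space) and U+00A0 (no-break space) are multibyte in UTF-8.
$input = "日本語\u{3000}テキスト\u{00A0}と\r\n改行";
var_dump(remove_utf8_whitespace($input) === "日本語テキストと改行"); // bool(true)
```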
PCRE argument list
Source: https://www.rexegg.com/regex-quickstart.html
| Character | Legend | Example | Sample Match |
| --- | --- | --- | --- |
| \t | Tab | T\t\w{2} | T ab |
| \r | Carriage return character | see below | |
| \n | Line feed character | see below | |
| \r\n | Line separator on Windows | AB\r\nCD | AB (line break) CD |
| \N | Perl, PCRE (C, PHP, R…): one character that is not a line break | \N+ | ABC |
| \h | Perl, PCRE (C, PHP, R…), Java: one horizontal whitespace character: tab or Unicode space separator | | |
| \H | One character that is not a horizontal whitespace | | |
| \v | .NET, JavaScript, Python, Ruby: vertical tab | | |
| \v | Perl, PCRE (C, PHP, R…), Java: one vertical whitespace character: line feed, carriage return, vertical tab, form feed, paragraph or line separator | | |
| \V | Perl, PCRE (C, PHP, R…), Java: any character that is not a vertical whitespace | | |
| \R | Perl, PCRE (C, PHP, R…), Java: one line break (carriage return + line feed pair, and all the characters matched by \v) | | |
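To make the difference between \R, \v and \h in the list above a bit more concrete, here is a rough sketch that just counts matches with preg_match_all(). The sample string is my own. The point to notice is that \R treats a \r\n pair as a single line break, while \v counts the two characters separately.

```php
<?php
// Sample text: one Windows line break (\r\n), one Unix line break (\n), one tab.
$text = "a\r\nb\nc\td";

echo preg_match_all('/\R/u', $text), "\n"; // 2: "\r\n" counts once, then "\n"
echo preg_match_all('/\v/u', $text), "\n"; // 3: "\r", "\n", "\n" each count
echo preg_match_all('/\h/u', $text), "\n"; // 1: the tab
```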