Merge multiple spaces into a single space; remove trailing/leading spaces (answers)

【Question Title】: Merge Multiple spaces to single space; remove trailing/leading spaces
【Posted】: 2014-10-31 16:36:48
【Question Description】:

I want to merge multiple spaces into a single space (the whitespace may also be tabs) and remove trailing/leading spaces.

For example...

string <- "Hi        buddy        what's up    Bro"

to

"Hi buddy what's up bro"

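I have checked the solution given in Regex to replace multiple spaces with a single space. Please note: don't put \t or \n as the exact whitespace inside the toy string and then feed that as the pattern in gsub. I want this in R.

Note that I can't put multiple spaces in the toy string here. Thanks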
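Because the site collapses runs of spaces in posts, one way to share a toy string that still contains them is to build it in code. A minimal sketch; the run lengths and the tabs are arbitrary choices for illustration, not taken from the question:

```r
# Build the toy string programmatically so the repeated whitespace survives
# copy/paste: strrep(" ", 8) gives eight spaces, "\t" is a tab.
string <- paste0("  Hi", strrep(" ", 8), "buddy", "\t\t", "what's up",
                 strrep(" ", 4), "Bro ")

# print() shows tabs as \t, so the different kinds of whitespace are visible
print(string)
```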

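【Question Discussion】:

If you read my question carefully to the end, you can create a toy string with multiple spaces yourself and then answer it. As I said above, I couldn't put multiple spaces in the toy string because Stack Overflow automatically removed them from my post.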
gsub("^ *|(?<= ) | *$", "", x, perl = TRUE)
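Hi David, this works for me. But can you explain what exactly this pattern is doing? i.e. ^ *|(?
See here
@DavidArenburg The answer you gave works, but I question closing this as a duplicate under the guidelines. That question (I believe, though I could be wrong) is different (I can't find it now) because it asked about multiple spaces and leading spaces. This one asks about multiple spaces and leading/trailing spaces. Again, I may have missed something in the earlier post, but I don't believe the two questions are exact duplicates.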
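To see what the pattern from the comment above does, here is a small sketch applying it to a string with runs of spaces and tabs. The input string is made up for illustration. The pattern only names a literal space, so tabs pass through unchanged:

```r
x <- "  Hi        buddy\t\twhat's up    Bro  "

# "^ *"     : a run of spaces at the very start of the string
# "(?<= ) " : one space whose preceding character is also a space
#             (a lookbehind, which needs perl = TRUE), so every space after
#             the first one in a run is deleted
# " *$"     : a run of spaces at the very end of the string
out <- gsub("^ *|(?<= ) | *$", "", x, perl = TRUE)
out
# [1] "Hi buddy\t\twhat's up Bro"
```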

Tags: r pattern-matching

【Solution 1】:

This seems to do what you need.

string <- "  Hi buddy   what's up   Bro "
library(stringr)
str_replace(gsub("\\s+", " ", str_trim(string)), "B", "b")
# [1] "Hi buddy what's up bro"

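【Discussion】:

Thanks for the reply. Actually, I only want the gsub part, because I don't want to replace B with b. Where I'm stuck is finding the pattern for this kind of thing. Can you explain what \\s+ means?
@chandresh - \\s+ means "one or more whitespace characters" (spaces, tabs, newlines and so on)
It's worth noting that this is the only answer that changes the uppercase B in Bro to lowercase, as in the question's expected result.
@RichScriven I don't need the lowercase; how do I keep the original case?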
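The `\\s` class discussed in the comments above matches any whitespace character, not only the space, and dropping the `str_replace()` step keeps the original case. A minimal base-R sketch (`trimws()` is swapped in for stringr's `str_trim()` so no package is needed):

```r
# \\s matches spaces, tabs, newlines and carriage returns, but not letters
ws <- c(" ", "\t", "\n", "\r", "a")
grepl("\\s", ws)
# [1]  TRUE  TRUE  TRUE  TRUE FALSE

# Same idea as the answer, without the B -> b replacement,
# so "Bro" keeps its capital letter
string <- "  Hi buddy   what's up   Bro "
kept <- gsub("\\s+", " ", trimws(string))
kept
# [1] "Hi buddy what's up Bro"
```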

【Solution 2】:

Or simply use the str_squish function from stringr:

library(stringr)
string <- "  Hi buddy   what's up   Bro "
str_squish(string)
# [1] "Hi buddy what's up Bro"

【Discussion】:

【Solution 3】:

Another approach, using a single regular expression:

gsub("(?<=[\\s])\\s*|^\\s+|\\s+$", "", string, perl=TRUE)

Explanation (from):

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
    [\s]                     any character of: whitespace (\n, \r,
                             \t, \f, and " ")
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
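--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string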
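A quick check of this pattern on input that mixes spaces and tabs (the input string is made up for illustration). Because the lookbehind branch deletes everything after the first whitespace character of a run, that first character is kept as-is, so a run that starts with a tab collapses to a tab rather than a space:

```r
string <- "  Hi        buddy\t\twhat's up    Bro "
# Branch 1 drops whitespace that follows whitespace, branches 2 and 3
# drop leading and trailing whitespace
out <- gsub("(?<=[\\s])\\s*|^\\s+|\\s+$", "", string, perl = TRUE)
out
# [1] "Hi buddy\twhat's up Bro"
```

If every run must end up as a plain space, collapsing with gsub("\\s+", " ", ...) before trimming, as other answers here do, avoids this.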

【Discussion】:

Why not use a simpler regex, like Adam Erickson's?

【Solution 4】:

You don't need to load an external library for a task like this:

string <- " Hi        buddy        what's up    Bro "
string <- gsub("\\s+", " ", string)
string <- trimws(string)
string
# [1] "Hi buddy what's up Bro"

Or, in one line:

string <- trimws(gsub("\\s+", " ", string))

Much cleaner.
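Because gsub() and trimws() are vectorised, the same one-liner also handles a whole character vector, including tabs, newlines and NA values. A small sketch with made-up inputs:

```r
docs <- c("  Hi        buddy ", "tab\t\tseparated\n\nlines", NA)
# Collapse every whitespace run to one space, then trim both ends;
# NA elements stay NA
res <- trimws(gsub("\\s+", " ", docs))
res
```

Here `res` holds "Hi buddy", "tab separated lines" and NA.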

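【Discussion】:

This doesn't depend on any external library, and it isn't a nightmare regex like Tyler Rinker's. I wonder why you don't have more upvotes?
I also wonder why @heisenbug47 posted an exact duplicate of my answer half a year later.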

【Solution 5】:

qdapRegex has an rm_white function that handles this:

library(qdapRegex)
rm_white(string)

## [1] "Hi buddy what's up Bro"

【Discussion】:

【Solution 6】:

You could also try clean from qdap:

library(qdap)
library(stringr)
str_trim(clean(string))
#[1] "Hi buddy what's up Bro"

Or, as @Tyler Rinker suggested (using only qdap):

Trim(clean(string))
#[1] "Hi buddy what's up Bro"

【Discussion】:

You can do it all within qdap with Trim(clean(string)).

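【Solution 7】:

There's no need to load any extra library for this, because gsub() from base R can do the job.
No need to remember those extra libraries either. Use trimws() to remove leading and trailing whitespace, and gsub() to replace the extra whitespace, as @Adam Erickson mentioned.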

string <- " Hi        buddy        what's up    Bro "
trimws(gsub("\\s+", " ", string))

Here \\s+ matches one or more whitespace characters, and gsub replaces each such run with a single space.

To understand what any regex is doing, visit the link @Tyler Rinker mentioned.
Just copy and paste the regex you want explained, and this will do the rest.
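trimws() only ever touches the ends of a string, which is why this answer pairs it with gsub() for the inner runs. Its `which` argument picks the end(s) to trim; a short sketch with a made-up string:

```r
string <- "   Hi   buddy   "
left  <- trimws(string, which = "left")   # leading whitespace only
right <- trimws(string, which = "right")  # trailing whitespace only
both  <- trimws(string)                   # which = "both" is the default
# The three spaces between "Hi" and "buddy" survive in every case
c(left, right, both)
```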

【Discussion】:

【Solution 8】:

Another solution, using strsplit:

Split the text into words, then join the individual words back together with paste.

string <- "Hi        buddy        what's up    Bro" 
stringsplit <- sapply(strsplit(string, " "), function(x){x[!x ==""]})
paste(stringsplit ,collapse = " ")

For multiple documents:

string <- c("Hi        buddy        what's up    Bro"," an  example using       strsplit ") 
stringsplit <- lapply(strsplit(string, " "), function(x){x[!x ==""]})
sapply(stringsplit ,function(d) paste(d,collapse = " "))
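Splitting on the regex "\\s+" instead of a single " " removes the need to filter out empty strings, and it also splits on tabs. A sketch of that variant (the second input string is modified to contain a tab); trimming first avoids the leading empty string strsplit() returns when the input starts with whitespace:

```r
docs <- c("Hi        buddy        what's up    Bro",
          " an  example using\tstrsplit ")
# Trim the ends, split on any whitespace run, then glue each word vector back
words <- strsplit(trimws(docs), "\\s+")
res <- vapply(words, paste, character(1), collapse = " ")
res
```

vapply() with `character(1)` guarantees one string per document, which sapply() does not.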

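【Discussion】:

【Solution 9】:

This seems to work.
Unlike Rich Scriven's answer, it does not remove spaces at the beginning or end of the sentence; it does, however, merge multiple whitespace characters.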

library("stringr")
string <- "Hi     buddy     what's      up       Bro"
str_replace_all(string, "\\s+", " ")
# [1] "Hi buddy what's up Bro"

【Discussion】: