Is there a way to abbreviate each element of an object in R? (Answers)

【Question title】: Is there a way to abbreviate each element of an object in R?
【Posted】: 2020-03-16 18:08:39
【Question】:

I want to abbreviate every word in an object that is longer than 5 characters, replacing the removed characters with a ".".

i.e.

"this example sentence I have given here"

would become

"this examp. sente. I have given here"

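I imagine this has to be done with a loop, and probably also involves splitting the text into separate strings, but I am very new to R and am really struggling to get it to do this. Any help would be greatly appreciated!

Many thanks!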

【Comments】:

gsub("(?<=\\w{5})\\w+", ".", x, perl=TRUE)
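@user20650, very nice!
@user20650 Nice! You should add that as an answer.
Thanks dc37 / eipi10. eipi, it works for this simple example, but I am not that confident with regex or the possible edge cases, so I will leave it sitting here as a comment (or feel free to add it to your answer if you are confident).
@user20650, that's great, thank you! It seems to work very well. If you don't mind, it would be great if you could explain what is going on here. I am familiar with the gsub function, but some of the arguments here are new to me, and I would love to understand them better for future use.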
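
For readers who want the one-liner from the comments unpacked, here is a small sketch of how it appears to work. The variable names are only illustrative. With perl = TRUE, R uses the PCRE engine, which supports the lookbehind (?<=\\w{5}): it matches a position that is immediately preceded by five word characters. The \\w+ that follows then consumes the rest of the word (at least one more character), and gsub replaces that remainder with a period.

```r
x <- "this example sentence I have given here"

# (?<=\\w{5})  position preceded by exactly five word characters (not consumed)
# \\w+         one or more further word characters: the part to be replaced
abbreviated <- gsub("(?<=\\w{5})\\w+", ".", x, perl = TRUE)

print(abbreviated)
# [1] "this examp. sente. I have given here"
```

Because \\w+ needs at least one character, a word of exactly five letters such as "given" is left alone. Note that \\w matches letters, digits and underscores only, so words containing apostrophes or hyphens are treated as separate pieces; this is one of the edge cases worth checking against real data.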

Tags: r loops substring character abbreviation

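【Solution 1】:

My answer is below, but please consider using @user20650's answer instead. It is more concise and elegant (though it may be hard to follow if you are not familiar with regular expressions). Per @user20650's second comment, check that it is robust enough to handle your actual data.

Here is a tidyverse option: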

library(tidyverse)

vec = c("this example sentence I have given here",
      "and here is another long example")

vec.abbrev = vec %>% 
  map_chr(~ str_split(.x, pattern=" ", simplify=TRUE) %>% 
            gsub("(.{5}).*", "\\1.", .) %>% 
            paste(., collapse=" "))
vec.abbrev

[1] "this examp. sente. I have given. here"
[2] "and here is anoth. long examp."

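In the code above, we use map_chr to iterate over each sentence in vec. The pipe (%>%) passes the result of each function on to the next function.

The period character can be confusing because it has several meanings depending on context. "(.{5}).*" is a regular expression, in which . means "match any character". In "\\1.", \\1 refers back to the five characters captured by the parentheses and the . is a literal period. The last . in gsub("(.{5}).*", "\\1.", .) and the first . in paste(., collapse=" ") are a "pronoun" that stands for the output of the previous function, which is being passed into the current function.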
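
To make the regular expression concrete, here is a small base R sketch applied to individual words (the vector and variable names are only illustrative). It also shows one behavior worth knowing: because .* can match zero characters, the pattern matches words of exactly five characters too, which is why "given" becomes "given." in the output above. If only words longer than five characters should be abbreviated, as the question asks, one possible variant is to use .+ so that a sixth character is required.

```r
words <- c("this", "given", "example")

# (.{5}) captures the first five characters; .* matches whatever follows,
# including nothing at all, so five-letter words also get a period
greedy <- gsub("(.{5}).*", "\\1.", words)
print(greedy)
# [1] "this"   "given." "examp."

# Variant (an assumption about the intended behavior, not the answer's code):
# .+ requires at least one more character after the first five
strict <- gsub("(.{5}).+", "\\1.", words)
print(strict)
# [1] "this"   "given"  "examp."
```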

Here is the process step by step:

# Split each string into component words and return as a list
vec.abbrev = str_split(vec, pattern=" ", simplify=FALSE)

# For each sentence, remove all letters after the fifth letter in 
#  a word and replace with a period
vec.abbrev = map(vec.abbrev, ~ gsub("(.{5}).*", "\\1.", .x)) 

# For each sentence, paste the component words back together again, 
#  each separated by a space, and return the result as a vector, 
#  rather than a list
vec.abbrev = map_chr(vec.abbrev, ~paste(.x, collapse=" "))

【Comments】:

A base R version would be: sapply(strsplit(x, " "), function(x) paste0(sub("(.{5}).*", "\\1.", x), collapse = " "))

【Solution 2】:

Using a for loop, you can do:

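x <- "this example sentence I have given here"

# Split the sentence into individual words
x2 <- unlist(strsplit(x, " "))

# Collect the (possibly abbreviated) words one at a time
x3 <- NULL
for (w in x2) {
  # Keep the first 5 characters of longer words and append a period
  if (nchar(w) > 5) {
    w <- paste0(substr(w, 1, 5), ".")
  }
  x3 <- c(x3, w)
}

# Join the words back into a single string
x_final <- paste(x3, collapse = " ")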

The final output:

> x_final
[1] "this examp. sente. I have given here"

【Comments】:

I had something similar: sapply(strsplit(x, " "), function(x) { inds <- nchar(x) > 5; x[inds] <- paste0(substr(x[inds], 1, 5), "."); paste0(x, collapse = " ") })
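
Since the question mentions abbreviating each element of an object, the nchar/substr idea above can be wrapped in a small helper and applied to a whole character vector. This is only a sketch: abbreviate_sentence and its max_chars argument are hypothetical names introduced here, not part of any answer or package, and it assumes words are separated by single spaces.

```r
# Hypothetical helper: abbreviate every word longer than max_chars
abbreviate_sentence <- function(sentence, max_chars = 5) {
  # Split one sentence into words on single spaces
  words <- strsplit(sentence, " ", fixed = TRUE)[[1]]
  # Flag the words that exceed the limit
  too_long <- nchar(words) > max_chars
  # Truncate only those words and append a period
  words[too_long] <- paste0(substr(words[too_long], 1, max_chars), ".")
  # Reassemble the sentence
  paste(words, collapse = " ")
}

sentences <- c("this example sentence I have given here",
               "and here is another long example")

# vapply guarantees one character string per input element
result <- vapply(sentences, abbreviate_sentence, character(1),
                 USE.NAMES = FALSE)
print(result)
# [1] "this examp. sente. I have given here"
# [2] "and here is anoth. long examp."
```

Using vapply rather than growing a vector inside a loop keeps the output type fixed, which can make mistakes easier to spot when the input contains unexpected values.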