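Add a comma after a person's name in a data frame using R

【Question title】: add comma after person's name in dataframe using R
【Posted】: 2019-08-19 04:59:30
【Question】:

How can I add a comma after the username in my strings, so that I can then remove the words before the comma and get a uniform string that I can use for exact matching?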

a <- data.frame(text = c("hi john what are you doing",
                         "hi sunil what are you doing",
                         "hello sanjay what are you doing"),
                stringsAsFactors = FALSE)

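【Comments】:

Do you have a list or a vector of names?
The problem is that the input file is all lowercase, so it is hard to tell the names apart. Also, please suggest whether there is a way to convert the usernames to uppercase so that we can remove them later.
You need a specific pattern that the names follow; otherwise this will be impossible. // Edit: if all entries are structured like this, you can simply use the second word as the reference for the username.
Hi akrun, I don't have a list of names, because it is a big file.
Without a pattern, this will be difficult.

Tags: r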

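【Solution 1】:

If you know that the username is in the second position of the sentence, you can extract the sentences from the data frame and use: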

text <- c("hi john what are you doing",
          "hi sunil what are you doing",
          "hello sanjay what are you doing")

# collect the rewritten sentences here
text2 <- character(0)

for (sentence in text) {
  # separate the words in the sentence
  words <- strsplit(sentence, " ")[[1]]
  # convert the name (second word) to uppercase and put a comma after it
  words[2] <- paste0(toupper(words[2]), ",")
  # recreate the sentence and append it to the new vector (text2)
  text2 <- c(text2, paste(words, collapse = " "))
}

Result:

#text2
#[1] "hi JOHN, what are you doing"      "hi SUNIL, what are you doing"    
#[3] "hello SANJAY, what are you doing"

Then recreate the data frame.
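As a minimal sketch of that last step, the block below rebuilds a data frame from the `text2` values shown in the result above and then drops everything up to the comma, which is what the question ultimately wants. The names `a2` and `uniform` are hypothetical, chosen only for illustration.

```r
# Sentences copied from the result above so the block runs on its own
text2 <- c("hi JOHN, what are you doing",
           "hi SUNIL, what are you doing",
           "hello SANJAY, what are you doing")

# Rebuild the data frame with the comma-marked sentences
a2 <- data.frame(text = text2, stringsAsFactors = FALSE)

# Remove everything up to and including the first comma, plus any spaces after it
# ("uniform" is a hypothetical column name)
a2$uniform <- sub("^[^,]*,\\s*", "", a2$text)

print(a2$uniform)
```

All three rows should end up with the same remainder, which can then be compared with `==` or `%in%` for exact matching.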

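Otherwise, if the position of the username in the sentence can vary but you have a list of usernames to look for, you can check whether at least one of them is present, take its position in the sentence, replace it, add the comma, and recreate the sentence. If no username is found, print an error message.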

usernames <- c("john", "sunil", "sanjay")

text <- c("hi john what are you doing",
          "hi sunil what are you doing",
          "hello sanjay what are you doing",
          "hello ciao how are you")

# collect the rewritten sentences here
text2 <- character(0)

for (sentence in text) {

  # separate the words in the sentence
  words <- strsplit(sentence, " ")[[1]]

  # position of the first known username in the sentence (NA if there is none)
  pos <- which(words %in% usernames)[1]

  # if at least one user is found
  if (!is.na(pos)) {
    # convert the name to uppercase and put a comma after it
    words[pos] <- paste0(toupper(words[pos]), ",")
    # recreate the sentence and append it to the new vector (text2)
    text2 <- c(text2, paste(words, collapse = " "))
  # if there is NO username in the sentence
  } else {
    # print an error message with the sentence in which none was found
    err.msg <- paste("NO username found in sentence: ", sentence)
    print(err.msg)
  }
}

Result:

#[1] "NO username found in sentence:  hello ciao how are you"

text2
#[1] "hi JOHN, what are you doing"      "hi SUNIL, what are you doing"    
#[3] "hello SANJAY, what are you doing"

Hope this helps!


【Comments】:

Error in paste(sentence2, spl[[1]][i + 1]) : object 'sentence2' not found, for the first code (where the username is the second word)

【Solution 2】:

Two ideas for solving this.

First, if you can get a list containing the usernames:

usernames <- c("john", "sunil", "sanjay")
diag(sapply(usernames, function(x) gsub(x, paste0(x, ","), a$text)))
# [1] "hi john, what are you doing"      "hi sunil, what are you doing"     "hello sanjay, what are you doing"
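The `diag(sapply(...))` call above relies on the i-th username belonging to the i-th row. As a hedged alternative sketch, the usernames can be joined into a single alternation pattern so that order no longer matters and rows without a username are left untouched. This assumes the usernames contain no regex metacharacters; `pattern` and `with_comma` are illustrative names, and the sample sentences are reordered on purpose.

```r
usernames <- c("john", "sunil", "sanjay")

# Sample sentences in a different order than the usernames, plus one without a name
text <- c("hello sanjay what are you doing",
          "hi john what are you doing",
          "hello ciao how are you")

# One alternation pattern; \\b keeps "john" from matching inside a longer word
pattern <- paste0("\\b(", paste(usernames, collapse = "|"), ")\\b")

# Append a comma to the first username found in each sentence
with_comma <- sub(pattern, "\\1,", text, perl = TRUE)

print(with_comma)
```

Using `sub` rather than `gsub` changes only the first match per sentence; switch to `gsub` if a sentence may contain several usernames that should all get a comma.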

Or, if the username is always the second word:

gsub("(^\\w*\\s)(\\w*)", "\\1\\2,", a$text)
# [1] "hi john, what are you doing"      "hi sunil, what are you doing"     "hello sanjay, what are you doing"
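One caveat with the pattern above: `\\w` does not match apostrophes or hyphens, so a name such as "o'brien" would get the comma in the middle. A possible variant, sketched here with made-up sample names, matches any run of non-whitespace characters instead.

```r
# Hypothetical sample sentences with names that contain punctuation
text <- c("hi o'brien what are you doing",
          "hello jean-luc what are you doing")

# ^(\\S+\\s+\\S+) captures the first two whitespace-separated tokens;
# the comma is then written right after the second one
with_comma <- sub("^(\\S+\\s+\\S+)", "\\1,", text, perl = TRUE)

print(with_comma)
```

This still assumes the name is a single token in second position; multi-word names would need a list of names or some other marker.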

Data

a <- structure(list(text = c("hi john what are you doing", "hi sunil what are you doing", 
"hello sanjay what are you doing")), class = "data.frame", row.names = c(NA, 
-3L))

【Comments】: