【Question title】: R partial string match - exclude
【Posted】: 2016-01-12 17:10:45
【Question】:

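I have a list of emails that I basically want to clean up. If the "@" character does not appear in a particular email, I want to remove that email, so that entries such as "mywebsite.com" get dropped.

My code is as follows: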

  email_clean <- function(email, invalid = NA){
    email <- trimws(email)                                                          # Removes whitespace
    email[nchar(email) %in% c(1, 2)] <- invalid                                     # Removes emails with 1 or 2 character length
    bad_email <- c("\\@no.com", "\\@na.com", "\\@none.com", "\\@email.com",         # List of bad email domains - modify to the
                   "\\@noemail.com", "\\@test.com")                                 # specifications of the request

    pattern <- paste0("(?i)\\b", paste0(bad_email, collapse = "\\b|\\b"), "\\b")    # Case-insensitive pattern matching any bad domain
    email <- gsub(pattern, invalid, sapply(email, as.character))                    # Sets emails matching a bad domain to `invalid`
    unname(email)
  }

  ## Define vector of cleaned emails from the original csv column
  Cleaned_Email <- email_clean(my_data$Email)


  ## Binds cleaned emails to the csv data
  my_data <- cbind(my_data, Cleaned_Email)

Thanks!!

【Comments】:

  • What is your question?

Tags: r string csv


【Solution 1】:
  email_clean <- function(email, invalid = NA){
    email <- trimws(email)                                                          # Removes whitespace
    email[nchar(email) %in% c(1, 2)] <- invalid                                     # Removes emails with 1 or 2 character length
    email[!grepl("@", email)] <- invalid                                            # <-- New line: entries without "@" become `invalid`
    bad_email <- c("\\@no.com", "\\@na.com", "\\@none.com", "\\@email.com",         # List of bad email domains - modify to the
                   "\\@noemail.com", "\\@test.com")                                 # specifications of the request

    pattern <- paste0("(?i)\\b", paste0(bad_email, collapse = "\\b|\\b"), "\\b")    # Case-insensitive pattern matching any bad domain
    email <- gsub(pattern, invalid, sapply(email, as.character))                    # Sets emails matching a bad domain to `invalid`
    unname(email)
  }
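
To see the effect of the added `grepl` line in isolation, here is a minimal sketch. The sample vector is hypothetical (it is not from the question), and `fixed = TRUE` is an optional choice made here so that "@" is treated as a literal string rather than a regular expression.

```r
# Hypothetical sample input; not taken from the original question
emails <- c("alice@example.org", "mywebsite.com", " bob@example.org ", NA)

emails <- trimws(emails)                     # strip leading/trailing whitespace
has_at <- grepl("@", emails, fixed = TRUE)   # logical vector; FALSE for NA input
emails[!has_at] <- NA                        # blank out entries without "@"

print(emails)
```

Because `grepl` returns `FALSE` for missing values, entries that are already `NA` simply stay `NA`.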

【Comments】:

【Solution 2】:

Try this to exclude any rows of my_data whose Email column has no "@" sign:

    my_data <- my_data[grep('@', my_data$Email), ]
    

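【Comments】:

• I don't think grep will work, since I'm technically looking at a vector of emails, unless I'm missing something.
• You can still use grep as: Email[grep('@', Email)]. grep simply returns the vector of indices where a match occurred. You can subset a data frame or a vector by that returned vector.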
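
The point made in the comment above can be sketched as follows. The data frame is a hypothetical stand-in for `my_data`; `grepl` is shown next to `grep` only for comparison and was not part of this answer.

```r
# Hypothetical stand-in for my_data; the real data is not shown in the question
my_data <- data.frame(
  Email = c("alice@example.org", "mywebsite.com", "bob@example.org"),
  stringsAsFactors = FALSE
)

idx  <- grep("@", my_data$Email, fixed = TRUE)    # integer positions of matches
keep <- grepl("@", my_data$Email, fixed = TRUE)   # logical mask, one value per row

by_index <- my_data[idx, , drop = FALSE]          # subset the data frame by position
by_mask  <- my_data[keep, , drop = FALSE]         # subset the data frame by mask
vec_only <- my_data$Email[idx]                    # the same idea on a plain vector

print(by_index)
```

Both forms keep the same rows. The logical mask is the safer choice if the selection is ever negated, because `x[-grep(...)]` returns an empty result when nothing matches.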