[Question title]: How to write a "find and replace all BUT" function in R? [closed]
[Posted]: 2012-10-18 22:12:47
[Question description]:

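I have a data frame that looks roughly like the one below. It is an approximation made up for illustration, not an exact copy of the data frame you can download via the link below or rebuild from the dput() output pasted further down: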

March_created_at    March_email March_type  April_created_at April_email    April_type
3/11/12 7:28    jeremy@asynk.ch PushEvent   4/1/12 4:03                     PushEvent
3/11/12 7:28    jeremy@asynk.ch PushEvent   4/1/12 4:03                     PushEvent
3/11/12 7:28    jeremy@asynk.ch PushEvent   4/1/12 4:03                     PushEvent
3/11/12 7:28    jeremy@asynk.ch PushEvent   4/1/12 7:03     high            IssuesEvent
3/11/12 11:06   medium          PushEvent   4/1/12 13:57    medium          PushEvent
3/11/12 11:06   medium          PushEvent   4/1/12 13:57    medium          PushEvent
3/11/12 11:06   medium          PushEvent   4/1/12 13:57    medium          PushEvent
3/11/12 12:46                   PushEvent   
3/11/12 12:46                   PushEvent   
3/11/12 12:46                   PushEvent   

The full dataset is available here as a CSV file.

I am looking for a function that takes the following inputs:

  1. A data frame
  2. Some columns of that data frame
  3. A list of strings (e.g. a set of email addresses)
  4. A replacement string (e.g. "low")

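Now, I would like the function to go through only the specified columns of that data frame and replace every string that matches the list of strings from point 3 (as well as every empty cell) with the replacement string from point 4. However, it should do so only if the following condition holds:

The cell under consideration must have a timestamp in the same month.

For example, suppose we want to replace the empty cell in row 8 of the "March_email" column. Row 8 of the "March_created_at" column has a timestamp, so the empty cell can be replaced with the specified string (e.g. "low"). Now look at row 8 of the "April_email" column. That cell is empty too, but so is row 8 of the "April_created_at" column. In that case nothing should be done (i.e. no string is inserted).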
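
The rule described above can be sketched on toy vectors. This is only an illustration of the condition, with made-up values standing in for one month's timestamp and email columns; the variable names are assumptions and do not come from the real dataset.

```r
# Toy vectors standing in for one month's timestamp and email columns
# (hypothetical values, not taken from the real dataset).
created_at <- c("2012-03-11 07:28:04", "2012-03-11 11:06:00",
                "2012-03-11 12:46:00", "")
email      <- c("jeremy@asynk.ch", "medium", "", "")

matches     <- c("jeremy@asynk.ch")  # strings that should be replaced
replacement <- "low"

# A cell is replaced only if its row has a timestamp AND the cell is
# either empty or one of the listed strings.
to_replace <- created_at != "" & (email == "" | email %in% matches)

result <- ifelse(to_replace, replacement, email)
print(result)
```

The last element stays empty because its row has no timestamp, while "medium" is kept because it is neither empty nor in the list of strings to replace.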

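The reason I want to do this is that some cells are empty simply because there was no event, so nothing should be inserted. Other cells are empty because the data is missing, so I need to impute a value according to the function specified above.

How can I do this in R?

Appendix: here is the dput() of the head of the dataset: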

structure(list(March_created_at = c("2012-03-11 07:28:04", "2012-03-11 07:28:04", 
"2012-03-11 07:28:04", "2012-03-11 07:28:19", "2012-03-11 07:28:19", 
"2012-03-11 07:28:19"), March_actor_attributes_email = c("jeremy@asynk.ch", 
"jeremy@asynk.ch", "jeremy@asynk.ch", "jeremy@asynk.ch", "jeremy@asynk.ch", 
"jeremy@asynk.ch"), March_type = c("PushEvent", "PushEvent", 
"PushEvent", "PushEvent", "PushEvent", "PushEvent"), April_created_at = c("2012-04-01     04:03:13", 
"2012-04-01 04:03:13", "2012-04-01 04:03:13", "2012-04-01 07:03:11", 
"2012-04-01 07:03:11", "2012-04-01 07:03:11"), April_actor_attributes_email = c("", 
"", "", "high", "high", "high"), April_type = c("PushEvent", 
"PushEvent", "PushEvent", "IssuesEvent", "IssuesEvent", "IssuesEvent"
), May_created_at = c("2012-05-01 00:16:05", "2012-05-01 00:16:05", 
"2012-05-01 00:16:05", "2012-05-01 01:03:19", "2012-05-01 01:03:19", 
"2012-05-01 01:03:19"), May_actor_attributes_email = c("john.firebaugh@gmail.com", 
"john.firebaugh@gmail.com", "john.firebaugh@gmail.com", "mitch.tishmack@gmail.com", 
"mitch.tishmack@gmail.com", "mitch.tishmack@gmail.com"), May_type = c("PushEvent", 
"PushEvent", "PushEvent", "IssueCommentEvent", "IssueCommentEvent", 
"IssueCommentEvent"), June_created_at = c("2012-06-01 00:25:05", 
"2012-06-01 00:25:05", "2012-06-01 00:25:05", "2012-06-01 00:42:29", 
"2012-06-01 00:42:29", "2012-06-01 00:42:29"), June_actor_attributes_email =     c("michaelklishin@me.com", 
"michaelklishin@me.com", "michaelklishin@me.com", "", "", ""), 
    June_type = c("IssueCommentEvent", "IssueCommentEvent", "IssueCommentEvent", 
    "PushEvent", "PushEvent", "PushEvent"), July_created_at = c("2012-07-01 13:46:20", 
    "2012-07-01 13:46:20", "2012-07-02 11:53:37", "2012-07-02 11:53:37", 
    "2012-07-02 12:27:30", "2012-07-02 12:27:30"), July_actor_attributes_email = c("medium", 
    "medium", "ryoqun@gmail.com", "ryoqun@gmail.com", "ryoqun@gmail.com", 
    "ryoqun@gmail.com"), July_type = c("PushEvent", "PushEvent", 
    "CreateEvent", "CreateEvent", "PushEvent", "PushEvent"), 
    August_created_at = c("2012-08-01 00:04:09", "2012-08-01 00:04:09", 
    "2012-08-01 00:04:42", "2012-08-01 00:04:42", "2012-08-01 00:05:04", 
    "2012-08-01 00:05:04"), August_actor_attributes_email = c("jeremy@asynk.ch", 
    "jeremy@asynk.ch", "jeremy@asynk.ch", "jeremy@asynk.ch", 
    "jeremy@asynk.ch", "jeremy@asynk.ch"), August_type = c("IssueCommentEvent", 
    "IssueCommentEvent", "IssuesEvent", "IssuesEvent", "IssueCommentEvent", 
    "IssueCommentEvent"), September_created_at = c("2012-09-01 18:12:24", 
    "2012-09-01 18:12:24", "2012-09-01 23:51:18", "2012-09-01 23:51:18", 
    "2012-09-02 00:34:54", "2012-09-02 00:34:54"), September_actor_attributes_email = c("ryoqun@gmail.com", 
    "ryoqun@gmail.com", "ryoqun@gmail.com", "ryoqun@gmail.com", 
    "ryoqun@gmail.com", "ryoqun@gmail.com"), September_type = c("CommitCommentEvent", 
    "CommitCommentEvent", "CreateEvent", "CreateEvent", "PushEvent", 
    "PushEvent"), October_created_at = c("2012-10-01 07:48:38", 
    "2012-10-01 10:01:40", "2012-10-01 10:01:43", "2012-10-01 10:17:00", 
    "2012-10-01 16:08:29", "2012-10-01 18:06:46"), October_actor_attributes_email = c("medium", 
    "medium", "medium", "medium", "", "core"), October_type = c("PushEvent", 
    "IssuesEvent", "PushEvent", "PushEvent", "ForkEvent", "PullRequestEvent"
    )), .Names = c("March_created_at", "March_actor_attributes_email", 
"March_type", "April_created_at", "April_actor_attributes_email", 
"April_type", "May_created_at", "May_actor_attributes_email", 
"May_type", "June_created_at", "June_actor_attributes_email", 
"June_type", "July_created_at", "July_actor_attributes_email", 
"July_type", "August_created_at", "August_actor_attributes_email", 
"August_type", "September_created_at", "September_actor_attributes_email", 
"September_type", "October_created_at", "October_actor_attributes_email", 
"October_type"), row.names = c(NA, 6L), class = "data.frame") 

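[Question comments]:

  • This question was downvoted. Could the person explain why, so that I can improve the quality of the question?
  • It wasn't me, but I think it is because the question is very localized. It is not so much a question about R as a request for a specific implementation. If you hover over the down arrow, I would guess it falls into the last category. It is unlikely to be useful to anyone else.

Tags: r data-manipulation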


[Solution 1]:

How about something like this:

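myfun <- function(month, DF, matches, replacement) {
  # Build the column names for the requested month
  email.col <- paste0(month, '_actor_attributes_email')
  date.col  <- paste0(month, '_created_at')

  # Replace an email only if the row has a timestamp for that month and the
  # email is one of `matches` (include '' in `matches` to fill empty cells);
  # rows without a timestamp are left untouched
  DF[[email.col]] <- ifelse(DF[[date.col]] != '' & DF[[email.col]] %in% matches,
                            replacement,
                            DF[[email.col]])

  return(DF[, c(date.col, email.col)])
}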

myfun('April', dat, matches='', replacement='foo')

#          April_created_at April_actor_attributes_email
# 1 2012-04-01     04:03:13                          foo   
# 2     2012-04-01 04:03:13                          foo   
# 3     2012-04-01 04:03:13                          foo   
# 4     2012-04-01 07:03:11                          high
# 5     2012-04-01 07:03:11                          high
# 6     2012-04-01 07:03:11                          high

Then you can feed it several months:

out <- lapply(list('March', 'April', 'May'), 
              myfun, DF=dat, matches='', replacement='foo')

You can quickly turn the result back into a data.frame:

as.data.frame(unlist(out, recursive=FALSE))
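
As a possible alternative, the per-month pieces could also be column-bound with do.call(cbind, ...). The sketch below assumes `out` is an unnamed list of data frames that all have the same rows in the same order; the values are made up for illustration.

```r
# Two small per-month results, shaped like the data frames myfun() returns
# (hypothetical values for illustration).
out <- list(
  data.frame(March_created_at = c("2012-03-11 07:28:04", "2012-03-11 12:46:00"),
             March_actor_attributes_email = c("jeremy@asynk.ch", "foo"),
             stringsAsFactors = FALSE),
  data.frame(April_created_at = c("2012-04-01 04:03:13", ""),
             April_actor_attributes_email = c("foo", ""),
             stringsAsFactors = FALSE)
)

# Column-bind the pieces; this relies on every piece having the same rows
# in the same order.
combined <- do.call(cbind, out)
str(combined)
```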

There are plenty of other approaches and options, but this should give you a good start.
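
One such variation, sketched here under assumptions rather than taken from the answer, updates the email columns in place and returns the whole data frame. The function name replace_in_months() and its suffix arguments are hypothetical; the column-name pattern follows the dput() output in the question, and the email columns are assumed to be character vectors (not factors).

```r
# Sketch: apply the replacement across several months and return the full
# data frame. replace_in_months() is a hypothetical helper name.
replace_in_months <- function(DF, months, matches, replacement,
                              email.suffix = "_actor_attributes_email",
                              date.suffix  = "_created_at") {
  for (month in months) {
    email.col <- paste0(month, email.suffix)
    date.col  <- paste0(month, date.suffix)

    email <- DF[[email.col]]
    date  <- DF[[date.col]]

    # Treat NA like an empty string, since imported data may contain either.
    has.date <- !is.na(date) & date != ""
    is.empty <- is.na(email) | email == ""

    # Replace only where the month has a timestamp and the email is empty
    # or listed in `matches`.
    hit <- has.date & (is.empty | email %in% matches)
    DF[[email.col]][hit] <- replacement
  }
  DF
}

# Made-up data for illustration.
dat <- data.frame(
  March_created_at = c("2012-03-11 07:28:04", "2012-03-11 12:46:00", ""),
  March_actor_attributes_email = c("jeremy@asynk.ch", "", ""),
  April_created_at = c("2012-04-01 04:03:13", "", "2012-04-01 07:03:11"),
  April_actor_attributes_email = c(NA, "", "high"),
  stringsAsFactors = FALSE
)

fixed <- replace_in_months(dat, c("March", "April"),
                           matches = "jeremy@asynk.ch", replacement = "low")
print(fixed)
```

Unlike myfun(), this version always treats empty cells as candidates for replacement, so '' does not need to be passed in `matches`.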

[Comments]:
