【Posted】: 2015-10-12 21:42:12
【Question】:
Suppose my data set looks like this:
working_data <- dplyr::data_frame("Date" = c("2015-01-01", "2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04", "2015-01-04", "2015-01-04"),
"Time" = c("15:01", "15:01", "21:04", "13:19", "07:15", "07:15", "07:15"),
"SeizureTime" = c("0:10", "0:07", "0:11", "0:04", "0:08", "0:06", "0:07"),
"ET" = c("0:35", "0:35", "0:04", "1:10", "3:35", "3:35", "3:35"),
"ONumber" = c("(123)555-1234", "(123)555-1234", "(123)555-9999", "(000)555-9876", "(123)555-1111", "(123)555-1111", "(123)555-1111"),
"TNumber" = c("(123)555-1234", "(123)555-1234", "(123)555-9999", "(000)555-9876", "(123)555-1111", "(123)555-1111", "(123)555-1111"),
"CT" = c("a", "a", "b", "a", "b", "b", "b"))
I want to pull the potentially duplicated rows out of this data. Here is how I am doing it:
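while (nrow(working_data) != 0) {
  # Take the first remaining row as the call to compare against
  target_call <- working_data[1, ]
  working_data <- working_data[-1, ]
  # Remaining rows that match target_call on every key column
  similar_calls <- working_data %>% dplyr::filter(Date == target_call$Date,
                                                  Time == target_call$Time,
                                                  ET == target_call$ET,
                                                  ONumber == target_call$ONumber,
                                                  TNumber == target_call$TNumber)
  # ... analysis of target_call and similar_calls goes here ...
}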
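On the first pass, the loop sets target_call to the first row of working_data and similar_calls to the second row. So far so good. The problem is that once I have run my function on target_call and similar_calls, I never want to see those rows again, so I want to remove the rows that were pulled into similar_calls from working_data.
Once target_call and similar_calls are populated, I need to determine which calls (if any) are the same as target_call, and then decide which of them is the correct one to keep. Once I've chosen the correct call, I add it to a new data set called resolved_calls. If any calls are left in similar_calls, I need to repeat the selection analysis and add one of those calls to resolved_calls as well.
The best approach I can think of is to split the data into two separate data frames, but I don't know how to do that when I'm matching on multiple columns. The only option I've come up with is a very ugly ifelse statement, for example: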
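For illustration only, the "pick one call per set of look-alikes" step can also be sketched without a loop by grouping on the matching columns. This is a simplified sketch, not the actual selection analysis: it assumes that every row sharing the same Date, Time, ET, ONumber and TNumber belongs to one duplicate set, and it uses a placeholder rule (keep the first row of each set). The `calls` data below is a cut-down stand-in for the real data.

```r
library(dplyr)

# Cut-down stand-in for the question's data (assumption: three rows suffice
# to show the idea; the first two rows share all five key columns).
calls <- tibble(
  Date        = c("2015-01-01", "2015-01-01", "2015-01-02"),
  Time        = c("15:01", "15:01", "21:04"),
  SeizureTime = c("0:10", "0:07", "0:11"),
  ET          = c("0:35", "0:35", "0:04"),
  ONumber     = c("(123)555-1234", "(123)555-1234", "(123)555-9999"),
  TNumber     = c("(123)555-1234", "(123)555-1234", "(123)555-9999")
)

# Placeholder selection rule: keep the first row of every group of
# look-alike calls. Swap slice(1) for whatever rule decides the "correct" call.
resolved_calls <- calls %>%
  group_by(Date, Time, ET, ONumber, TNumber) %>%
  slice(1) %>%
  ungroup()
```

If the real rule has to compare calls inside a group before deciding (for example on SeizureTime), the `slice(1)` step is where that logic would go.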
working_data$Group <- ifelse(working_data$Date == target_call$Date & ... & working_data$TNumber == target_call$TNumber, 1, 0)
similar_calls <- working_data %>% dplyr::filter(Group == 1)
working_data <- working_data %>% dplyr::filter(Group == 0)
Is there a better way to do this?
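One possible direction, sketched under assumptions rather than offered as a definitive answer: dplyr's filtering joins can do the two-way split on several columns at once. `semi_join()` keeps the rows of `working_data` that match `target_call` on the key columns, and `anti_join()` keeps the rows that do not. The data below is a shortened stand-in for the question's data, and `key_cols` and `group_sizes` are names introduced here purely for illustration.

```r
library(dplyr)

# Shortened stand-in for the question's data (assumption: five rows,
# forming groups of 2, 1 and 2 look-alike calls).
working_data <- tibble(
  Date    = c("2015-01-01", "2015-01-01", "2015-01-02", "2015-01-04", "2015-01-04"),
  Time    = c("15:01", "15:01", "21:04", "07:15", "07:15"),
  ET      = c("0:35", "0:35", "0:04", "3:35", "3:35"),
  ONumber = c("(123)555-1234", "(123)555-1234", "(123)555-9999",
              "(123)555-1111", "(123)555-1111"),
  TNumber = c("(123)555-1234", "(123)555-1234", "(123)555-9999",
              "(123)555-1111", "(123)555-1111"),
  CT      = c("a", "a", "b", "b", "b")
)

# Columns that must all agree for two calls to count as look-alikes
key_cols <- c("Date", "Time", "ET", "ONumber", "TNumber")

group_sizes <- integer(0)  # illustration only: records the size of each group

while (nrow(working_data) > 0) {
  target_call  <- working_data[1, ]
  working_data <- working_data[-1, ]

  # Rows matching target_call on every key column ...
  similar_calls <- semi_join(working_data, target_call, by = key_cols)
  # ... and the rest, which stay in the pool for later iterations
  working_data  <- anti_join(working_data, target_call, by = key_cols)

  # The analysis of target_call and similar_calls would go here
  group_sizes <- c(group_sizes, nrow(similar_calls) + 1)
}
```

Because the matched rows are removed in the same pass, no helper Group column is needed and each call is visited only once.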
【Discussion】: