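[Posted]: 2014-07-18 18:10:06
[Question]:
I want to write a function or loop that creates three new columns and fills each one based on the corresponding original column: if the original value appears in one of three specified lists, the new column gets that list's designated label; otherwise it gets the original value unchanged.
For example, the data looks like this: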
> data
a1 a2 a3
1 C C A
2 A B_20 B_20
3 A C B_30
4 C C B_40
5 C A A
6 B_60 B_60 B_60
7 A A C
8 A C B_80
9 B_90 C B_90
I want to create three new columns (a1_t, a2_t, a3_t). If a1 is in list1
list1 <-c('B_10','B_20','B_30')
then fill a1_t with B_00_30,
or if a1 is in list2
list2 <-c('B_40','B_50','B_60')
then fill a1_t with B_40_60,
or if a1 is in list3
list3 <-c('B_70','B_80','B_90')
then fill a1_t with B_70_90.
If a1 is not in list1, list2, or list3, copy the value from a1 into a1_t.
Then repeat the same matching process for a2 and a3 to fill a2_t and a3_t.
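The rule described above can be sketched for a single vector with a named lookup vector. This is only an illustration of the stated rule, not code from the question; `recode_value` is a hypothetical helper name.

```r
# Lookup lists from the question.
list1 <- c("B_10", "B_20", "B_30")
list2 <- c("B_40", "B_50", "B_60")
list3 <- c("B_70", "B_80", "B_90")

# Named vector: names are the original values, elements are the replacement labels.
lookup <- c(
  setNames(rep("B_00_30", length(list1)), list1),
  setNames(rep("B_40_60", length(list2)), list2),
  setNames(rep("B_70_90", length(list3)), list3)
)

# recode_value() is a hypothetical helper, not part of the original post.
recode_value <- function(x) {
  x <- as.character(x)          # also accepts factor columns
  mapped <- unname(lookup[x])   # NA wherever x is in none of the lists
  ifelse(is.na(mapped), x, mapped)
}

recode_value(c("A", "B_20", "B_50", "B_90", "C"))
# [1] "A"       "B_00_30" "B_40_60" "B_70_90" "C"
```

Applying the same helper to a1, a2, and a3 in turn would give a1_t, a2_t, and a3_t.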
In the end, I would like the output to look like this:
> data
a1 a2 a3 a1_t a2_t a3_t
1 A A B_10 A A B_00_30
2 B_20 A C B_00_30 A C
3 B_30 A C B_00_30 A C
4 C C A C C A
5 A B_50 B_50 A B_40_60 B_40_60
6 C C A C C A
7 C B_70 A C B_70_90 A
8 B_80 C B_80 B_70_90 C B_70_90
9 B_90 C A B_70_90 C A
Code to create the original data:
data <- structure(list(a1 = c("A", "B_20", "B_30", "C", "A", "C", "C",
"B_80", "B_90"), a2 = c("A", "A", "A", "C", "B_50", "C", "B_70",
"C", "C"), a3 = c("B_10", "C", "C", "A", "B_50", "A", "A", "B_80",
"A")), class = "data.frame", .Names = c("a1", "a2", "a3"), row.names = c(NA,
-9L))
Code to create the desired output data:
data <- structure(list(a1 = structure(c(1L, 2L, 3L, 6L, 1L, 6L, 6L, 4L, 5L), .Label = c("A", "B_20", "B_30", "B_80", "B_90", "C"), class = "factor"),
a2 = structure(c(1L, 1L, 1L, 4L, 2L, 4L, 3L, 4L, 4L), .Label = c("A", "B_50", "B_70", "C"), class = "factor"),
a3 = structure(c(2L, 5L, 5L, 1L, 3L, 1L, 1L, 4L, 1L), .Label = c("A", "B_10", "B_50", "B_80", "C"), class = "factor"),
a1_t = structure(c(1L, 2L, 2L, 4L, 1L, 4L, 4L, 3L, 3L), .Label = c("A", "B_00_30", "B_70_90", "C"), class = "factor"),
a2_t = structure(c(1L, 1L, 1L, 4L, 2L, 4L, 3L, 4L, 4L), .Label = c("A", "B_40_60", "B_70_90", "C"), class = "factor"),
a3_t = structure(c(2L, 5L, 5L, 1L, 3L, 1L, 1L, 4L, 1L), .Label = c("A", "B_00_30", "B_40_60", "B_70_90", "C"), class = "factor")),
.Names = c("a1", "a2", "a3", "a1_t", "a2_t", "a3_t"), class = "data.frame", row.names = c(NA, -9L))
Thanks, -al
Final working code, based on the answers:
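library(dplyr)
# Lookup lists: each group of values maps to one range label
list1 <- c('B_10', 'B_20', 'B_30')
list2 <- c('B_40', 'B_50', 'B_60')
list3 <- c('B_70', 'B_80', 'B_90')
# Two-column lookup matrix: original value in column 1, replacement label in column 2
lookup <- rbind(cbind(list = list1, val = "B_00_30"),
                cbind(list2, "B_40_60"),
                cbind(list3, "B_70_90"))
# Recode every column; values not found in the lookup are kept unchanged
g <- sapply(data, function(x) {
  tmp <- lookup[, 2][match(x, lookup[, 1])]
  ifelse(is.na(tmp), x, tmp)
})
# Copy the recoded columns under the *_t names and append them to the original data
gd <- as.data.frame(g)
gd <- mutate(gd, a1_t = a1, a2_t = a2, a3_t = a3)
gd <- select(gd, a1_t, a2_t, a3_t)
h <- cbind(data, gd)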
> h
a1 a2 a3 a1_t a2_t a3_t
1 A A B_10 A A B_00_30
2 B_20 A C B_00_30 A C
3 B_30 A C B_00_30 A C
4 C C A C C A
5 A B_50 B_50 A B_40_60 B_40_60
6 C C A C C A
7 C B_70 A C B_70_90 A
8 B_80 C B_80 B_70_90 C B_70_90
9 B_90 C A B_70_90 C A
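For comparison, the same result can probably be reached without the intermediate `mutate`/`select` step by assigning the recoded columns directly under their new names. This is a base-R sketch under the assumption that the columns are character vectors; the `lookup` data frame layout and the `cols` variable are choices made for this illustration, not part of the original post.

```r
# Original data from the question, as plain character columns.
data <- data.frame(
  a1 = c("A", "B_20", "B_30", "C", "A", "C", "C", "B_80", "B_90"),
  a2 = c("A", "A", "A", "C", "B_50", "C", "B_70", "C", "C"),
  a3 = c("B_10", "C", "C", "A", "B_50", "A", "A", "B_80", "A"),
  stringsAsFactors = FALSE
)

# Lookup table: each original value paired with its range label.
lookup <- data.frame(
  value = c("B_10", "B_20", "B_30", "B_40", "B_50", "B_60", "B_70", "B_80", "B_90"),
  label = rep(c("B_00_30", "B_40_60", "B_70_90"), each = 3),
  stringsAsFactors = FALSE
)

cols <- c("a1", "a2", "a3")

# Recode each source column and store it as <name>_t in one assignment.
data[paste0(cols, "_t")] <- lapply(data[cols], function(x) {
  hit <- match(x, lookup$value)             # NA where x is not in the lookup
  ifelse(is.na(hit), x, lookup$label[hit])  # keep the original value if unmatched
})

data
```

Using `lapply` instead of `sapply` keeps each column as its own vector, so there is no detour through a character matrix.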
[Comments]:
-
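Regarding the warning message (I should have mentioned it, sorry): I am taking advantage of the fact that as.numeric turns every non-numeric element into NA. Naturally, that triggers a warning. You can avoid it by using suppressWarnings or the updated script (just in case).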
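The script this comment refers to is not included here, so the following is only a guess at the idea: strip the prefix, let `as.numeric` turn non-numeric values into NA, and bin the remaining numbers. The variable names and the use of `cut` are assumptions made for this sketch.

```r
x <- c("A", "B_20", "B_50", "B_90", "C")

# Strip the "B_" prefix; values without it (such as "A") are left unchanged.
digits <- sub("^B_", "", x)

# as.numeric() yields NA for non-numeric strings and would normally warn
# "NAs introduced by coercion"; suppressWarnings() silences that warning.
num <- suppressWarnings(as.numeric(digits))

# Bin the numbers into (0,30], (30,60], (60,90] and label each bin.
bins <- cut(num, breaks = c(0, 30, 60, 90),
            labels = c("B_00_30", "B_40_60", "B_70_90"))

# Keep the original value wherever no number could be extracted.
result <- ifelse(is.na(num), x, as.character(bins))
result
# [1] "A"       "B_00_30" "B_40_60" "B_70_90" "C"
```

Unlike the list-based lookup, this version depends on the numeric suffix, so a value outside the 0 to 90 range would come back as NA rather than unchanged.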