Use toString to combine "City" and "Age_Group" into a single "class" factor.
df$class <- factor(apply(df[c("City", "Age_Group")], 1, toString))
levels(df$class)
# [1] "City 1, 0-9" "City 1, 10-19" "City 1, 20-29" "City 1, 30-39"
# [5] "City 1, 40-49" "City 1, 50-59" "City 1, 60-69" "City 1, 70-79"
# [9] "City 1, 80-89" "City 1, 90+" "City 10, 0-9" "City 10, 10-19"
# [13] "City 10, 20-29" [...]
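To see what the row-wise toString call does, here is a minimal sketch on a made-up three-row data frame. The name toy and its values are hypothetical and not taken from the question. A vectorised paste with sep=", " should build the same labels without apply.

```r
## hypothetical toy data, for illustration only
toy <- data.frame(City = c("City 1", "City 1", "City 2"),
                  Age_Group = c("0-9", "10-19", "0-9"),
                  stringsAsFactors = FALSE)

## row-wise: the two values of each row are collapsed with ", "
lab_apply <- apply(toy[c("City", "Age_Group")], 1, toString)

## vectorised alternative that builds the same strings
lab_paste <- paste(toy$City, toy$Age_Group, sep = ", ")

lab_apply
lab_paste
```

Either vector can then be wrapped in factor() to get the "class" column.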
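To draw the random samples, split the data set by "class" into subsets, say s, and compute how many groups you get when dividing nrow(s) by 20 (individuals per group). That number, say x, may be fractional, so take its ceiling and then exploit R's recycling: use cbind to bind 1:ceiling(x) to s$class so that it is recycled up to nrow(s), a step where we can safely suppressWarnings. Then use sample to shuffle the order, keeping only column [,2]. Finally, put the data set back together with do.call(rbind, .) and, if desired, remove the rownames.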
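set.seed(1)  ## for the sake of reproducibility
## split by class, add shuffled group labels 1..ceiling(n/20), then recombine
df <- `rownames<-`(do.call(rbind, by(df, df$class, function(s)
  transform(s, SAMP=suppressWarnings(
    ## cbind recycles the labels to nrow(s); keep column 2 and shuffle it
    sample(cbind(s$class, SAMP=1:ceiling(nrow(s)/20))[,2])
  )))), NULL)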
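The group-assignment step inside the function can be looked at in isolation. This is a minimal sketch, assuming a single class with 65 rows (a made-up size). It uses rep_len instead of the cbind recycling trick from the code above, since rep_len recycles to a given length without raising a warning.

```r
set.seed(1)                        ## reproducible shuffle
n <- 65                            ## hypothetical number of rows in one class
k <- ceiling(n / 20)               ## number of groups needed: 4
labels <- rep_len(seq_len(k), n)   ## 1, 2, 3, 4, 1, 2, ... recycled to length n
samp <- sample(labels)             ## shuffle the order of the labels
table(samp)                        ## group sizes: 17, 16, 16, 16
```

Shuffling only changes which row gets which label, not the counts, so the group sizes differ by at most one and never exceed 20.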
Result:
This yields a "SAMP" column with groups of roughly equal size, about 20 members (never more than 20) per group within each "class".
df[60:70, ] ##example rows
# ID City Age_Group class SAMP
# 60 8766 City 01 0-9 City 01, 0-9 4
# 61 8775 City 01 0-9 City 01, 0-9 1
# 62 9021 City 01 0-9 City 01, 0-9 3
# 63 9041 City 01 0-9 City 01, 0-9 3
# 64 9482 City 01 0-9 City 01, 0-9 1
# 65 9622 City 01 0-9 City 01, 0-9 1
# 66 47 City 01 10-19 City 01, 10-19 4
# 67 698 City 01 10-19 City 01, 10-19 3
# 68 833 City 01 10-19 City 01, 10-19 1
# 69 1166 City 01 10-19 City 01, 10-19 1
# 70 1221 City 01 10-19 City 01, 10-19 2
Check the tables of the first ten classes and their samples:
by(df$SAMP, df$class, table)[1:10]
# $`City 01, 0-9`
#
# 1 2 3 4
# 17 16 16 16
#
# $`City 01, 10-19`
#
# 1 2 3 4
# 18 17 17 17
#
# $`City 01, 20-29`
#
# 1 2 3 4
# 18 18 17 17
#
# $`City 01, 30-39`
#
# 1 2 3 4
# 19 19 19 19
#
# $`City 01, 40-49`
#
# 1 2 3 4
# 19 19 19 18
#
# $`City 01, 50-59`
#
# 1 2 3 4 5
# 18 17 17 17 17
#
# $`City 01, 60-69`
#
# 1 2 3 4
# 16 16 16 16
#
# $`City 01, 70-79`
#
# 1 2 3 4
# 19 19 19 19
#
# $`City 01, 80-89`
#
# 1 2 3 4
# 20 19 19 19
#
# $`City 01, 90+`
#
# 1 2 3 4
# 18 17 17 17
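Rather than reading the tables one by one, the group sizes can be checked across all classes at once. The following is a sketch on simulated data, not the original data set: the data frame toy, its three classes and its 200 rows are assumptions made for illustration, and ave with rep_len is used in place of the by/cbind approach above.

```r
set.seed(1)
## hypothetical data: 200 rows spread over three made-up classes
toy <- data.frame(class = factor(sample(c("a", "b", "c"), 200, replace = TRUE)))

## per class: recycle labels 1..ceiling(n/20) to the class size and shuffle them
toy$SAMP <- ave(seq_len(nrow(toy)), toy$class, FUN = function(i) {
  n <- length(i)
  sample(rep_len(seq_len(ceiling(n / 20)), n))
})

sizes <- table(toy$class, toy$SAMP)  ## rows: classes, columns: group labels
max(sizes)                           ## largest group over all classes
```

If max(sizes) is at most 20, no group in any class is too large.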
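If you want the group labels to identify the class as well, so that they are unique across the whole data set rather than restarting at 1 within each class, just paste "class" (as numeric) and "SAMP" together.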
df <- transform(df, SAMP2=paste(as.numeric(class), SAMP, sep="."))
head(df)
# ID City Age_Group class SAMP SAMP2
# 1 193 City 01 0-9 City 01, 0-9 3 1.3
# 2 480 City 01 0-9 City 01, 0-9 1 1.1
# 3 742 City 01 0-9 City 01, 0-9 2 1.2
# 4 757 City 01 0-9 City 01, 0-9 1 1.1
# 5 811 City 01 0-9 City 01, 0-9 3 1.3
# 6 870 City 01 0-9 City 01, 0-9 3 1.3
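To illustrate why the pasted label is useful, here is a small sketch on made-up data (the data frame toy and its values are hypothetical). "SAMP" alone repeats across classes, while the pasted label distinguishes every class/group combination. Note that as.numeric on a factor returns the integer level codes, not the level text.

```r
## hypothetical toy data: two classes, two groups each
toy <- data.frame(class = factor(c("A, 0-9", "A, 0-9", "B, 0-9", "B, 0-9")),
                  SAMP = c(1, 2, 1, 2))

## prefix each group label with the integer code of its class
toy <- transform(toy, SAMP2 = paste(as.numeric(class), SAMP, sep = "."))

length(unique(toy$SAMP))    ## 2: labels repeat across classes
length(unique(toy$SAMP2))   ## 4: one label per class/group combination
as.character(toy$SAMP2)     ## "1.1" "1.2" "2.1" "2.2"
```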