See the documentation for the dummy.data.frame function in the dummies package. It allows flexible use of the model.matrix function.
library(dummies)
set.seed(20170402)
n <- 5
df <- data.frame(x = rnorm(n),
                 y = rnorm(n, 1),
                 red_herring = as.logical(round(runif(n, 0, 1))))
# Character column
df$red_herring <- dplyr::if_else(df$red_herring, 'Yes', 'No', missing = NA_character_)
# Factor column
df$married <- factor(df$red_herring, levels = c('No', 'Yes'))
By default, dummy variables are created for character and factor columns:
dummies::dummy.data.frame(df)
#             x         y red_herringNo red_herringYes marriedNo marriedYes
# 1 -2.49355296 1.6209886             0              1         0          1
# 2  0.06896791 2.6101371             1              0         1          0
# 3 -0.01188042 0.4857511             0              1         0          1
# 4  0.47565318 1.1194925             0              1         0          1
# 5  0.34437239 3.0801658             1              0         1          0
You can pass a vector of the variables to convert to the names argument:
dummies::dummy.data.frame(df, names = 'married')
#             x         y red_herring marriedNo marriedYes
# 1 -2.49355296 1.6209886         Yes         0          1
# 2  0.06896791 2.6101371          No         1          0
# 3 -0.01188042 0.4857511         Yes         0          1
# 4  0.47565318 1.1194925         Yes         0          1
# 5  0.34437239 3.0801658          No         1          0
Or you can use dummy.classes to specify which classes of variables are converted to dummy variables:
dummies::dummy.data.frame(df, dummy.classes = 'factor')
#             x         y red_herring marriedNo marriedYes
# 1 -2.49355296 1.6209886         Yes         0          1
# 2  0.06896791 2.6101371          No         1          0
# 3 -0.01188042 0.4857511         Yes         0          1
# 4  0.47565318 1.1194925         Yes         0          1
# 5  0.34437239 3.0801658          No         1          0
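If the dummies package is not available, the same idea can be sketched with model.matrix directly, since that is the function mentioned above. This is only a minimal illustration: the data frame toy and its values are made up for the example, and it is not meant to reproduce everything dummy.data.frame does.

```r
# Hypothetical toy data, not the df from the answer above
toy <- data.frame(
  x = c(0.5, -1.2, 0.3),
  married = factor(c("Yes", "No", "Yes"), levels = c("No", "Yes"))
)

# "- 1" removes the intercept, so each factor level gets its own 0/1 column
married_dummies <- model.matrix(~ married - 1, data = toy)

# Put the indicator columns back next to the untouched numeric column
result <- cbind(toy["x"], married_dummies)
print(result)
```

The indicator columns come out named marriedNo and marriedYes, matching the naming seen in the dummy.data.frame output. Note that model.matrix drops rows with missing values by default, so data with NAs would need extra handling.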