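A while ago I wrote a function (func_correlation) that takes a matrix or data frame of predictors and returns the highly correlated pairs, taking a number of parameters into account. I hope this is what you were actually asking for.
Example data,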
lst_s = list()
set.seed(1)
for (i in 1:12) {
  nam = paste0("s_", i)
  lst_s[[nam]] = runif(100)
}
s_matr = do.call(cbind, lst_s)
You can install the package with devtools::install_github('mlampros/FeatureSelection') and then run,
dat = FeatureSelection::func_correlation(s_matr, target = NULL, correlation_thresh = 0.05, use_obs = "everything", correlation_method = "pearson")
# here the *correlation_thresh* is low because I use random data, adjust it to your needs
It returns a list (out_list) and a data frame (out_df). The list shows, for each individual predictor, the correlations that are above correlation_thresh,
$out_list
$out_list[[1]]
           s_1
s_3 0.14450632
s_6 0.10891246
s_7 0.13232308
s_8 0.07818346
s_9 0.06381170
$out_list[[2]]
           s_2
s_10 0.1380704
s_11 0.1737746
.........
whereas the data frame shows all pairs of predictors that are above correlation_thresh,
$out_df
  predictor1 predictor2       prob
1        s_3        s_1 0.14450632
2        s_6        s_1 0.10891246
3        s_7        s_1 0.13232308
4        s_8        s_1 0.07818346
5        s_9        s_1 0.06381170
6       s_10        s_2 0.13807039
7       s_11        s_2 0.17377459
8        s_4        s_3 0.10395950
9        s_6        s_3 0.21541706
...............
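If you would rather not install the package, a pairs table of the same general shape can be built in base R with cor(). This is only a minimal sketch, not the package's implementation: the helper name correlated_pairs, its argument names and the correlation column name are made up for illustration, and comparing the absolute correlation against the threshold is my assumption, which may differ from what func_correlation does.

```r
# Hypothetical helper (not part of FeatureSelection): list every pair of
# columns whose absolute correlation exceeds `thresh`.
# Assumes `x` is a numeric matrix or data frame with column names.
correlated_pairs = function(x, thresh = 0.75, method = "pearson",
                            use = "everything") {
  cm = cor(x, method = method, use = use)
  # Lower triangle only, so each pair is reported once and the
  # diagonal (self-correlation) is skipped.
  idx = which(lower.tri(cm) & abs(cm) > thresh, arr.ind = TRUE)
  data.frame(predictor1 = rownames(cm)[idx[, "row"]],
             predictor2 = colnames(cm)[idx[, "col"]],
             correlation = cm[idx],
             stringsAsFactors = FALSE)
}

# Random data with the same dimensions as the example: 100 rows, 12 columns.
set.seed(1)
x = matrix(runif(1200), ncol = 12,
           dimnames = list(NULL, paste0("s_", 1:12)))

# A low threshold again, because the data are random.
pairs_df = correlated_pairs(x, thresh = 0.05)
head(pairs_df)
```

Filtering on the lower triangle is what keeps (s_3, s_1) from also appearing as (s_1, s_3). If you only care about positive correlations, drop the abs() call.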