How do I find the highly correlated variables with another specific variable? [closed]

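【Question title】: How do I find the highly correlated variables with another specific variable? [closed]
【Posted】: 2020-03-05 04:20:09
【Question】:

I have a data frame containing the number of points scored by different individuals in various events. A total column was then added to this data frame. Now, how do I work out which variables are highly correlated with the total column?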

【Comments】:

Welcome to StackOverflow! Please read up on "how to ask a good question" and "how to give a reproducible example". This will make it much easier for others to help you.
What have you tried so far? Where are you stuck?

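Tags: r statistics

【Solution 1】:

This sounds suspiciously like an attempt to get us to do your homework, but here is a first attempt (assuming, for convenience, that the total column is the first column):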

sort(sapply(df[,-1], function(x) cor(x,df$total)), decreasing=TRUE)[1:3]
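To make the one-liner above easier to try out, here is a minimal sketch on made-up data. The data frame `scores` and its column names are hypothetical, not from the question; the total column is selected by name rather than by position, so it does not need to be the first column.

```r
# Hypothetical toy data: 50 individuals, three events, plus a row-wise total.
set.seed(1)
scores <- data.frame(
  event1 = rnorm(50),
  event2 = rnorm(50),
  event3 = rnorm(50)
)
scores$total <- rowSums(scores)

# Correlate every non-total column with the total column.
others <- setdiff(names(scores), "total")
cors <- sapply(scores[others], function(x) cor(x, scores$total))

# Sort descending and look at the strongest ones.
top <- sort(cors, decreasing = TRUE)
print(head(top, 2))
```

If strongly negative correlations should also count as "highly correlated", sorting `abs(cors)` instead of `cors` is one option.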

【Discussion】:

sapply(sort(cor(hep_cleanup[c(-1,-9)],hep_cleanup$Total)[,1],decreasing = TRUE), names(df[c(1,2, 3)])). I ended up at a point where I was using sort inside sapply. Thanks anyway.
names(sort(cor(hep_cleanup[c(-1,-9)],hep_cleanup$Total)[,1],decreasing = TRUE))[1:3]. I was also able to get the answer without using sapply. Cheers!
Oh, sure. I thought you wanted to use sapply because that was your assignment.
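The variant without sapply in the comments works because `cor(x, y)` accepts a data frame for `x` and returns a one-column matrix whose row names are the columns of `x`; `[, 1]` then turns it into a named vector. A small sketch on made-up data (the data frame `toy` and its columns are assumptions for illustration, not the commenter's `hep_cleanup`):

```r
# Hypothetical data with a column called Total.
set.seed(2)
toy <- data.frame(p = rnorm(30), q = rnorm(30), r = rnorm(30))
toy$Total <- toy$p + 2 * toy$q + rnorm(30, sd = 0.1)

# Every column except Total.
preds <- toy[names(toy) != "Total"]

# One call to cor() gives a 3 x 1 matrix; [, 1] extracts a named vector.
via_matrix <- cor(preds, toy$Total)[, 1]

# The sapply version from the answer, for comparison.
via_sapply <- sapply(preds, function(x) cor(x, toy$Total))

# Names of the two most correlated columns, and a check that both routes agree.
print(names(sort(via_matrix, decreasing = TRUE))[1:2])
print(all.equal(via_matrix, via_sapply))
```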

【Solution 2】:

The real issue is that you need to reshape the output of cor. Melting it is your solution.

If you don't want a tidyverse solution, use reshape::melt.

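library(reshape) # for melt(); its output columns are named X1, X2, value
library(dplyr)   # for pipe notation, filter() and arrange()
n=1000
set.seed(15)
a=rnorm(n,0)
b=0.5*a                      +rnorm(n,0,.05)
c1=2*a   - 3*b               +rnorm(n,0,0.03)
d=-30*a    +40*b     +50*c1  +rnorm(n,0.01)
e=5*a -             2 * c1   + rnorm(n,0,.1)   # c1, not c (c is a base R function)
tot=a+b+c1+d+e               # row-wise total; sum() would collapse everything into one number
k=data.frame(a,b,c1,d,e,tot) # assumption: k is never defined in the original, this is the obvious candidate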

That is the reprex; now for the solution:

ans=k %>% cor() %>%  # correlation matrix
          reshape::melt() %>%  # convert to a 3 column format (X1,X2,value)
          filter(X1=='tot' & X2!='tot') %>% # keep only the tot rows and drop the unneeded (tot,tot) row
          arrange(-value)  # sort descending

The output looks like this. From here, you should have no trouble picking out the top 3.

|X1  |X2 |     value|
|:---|:--|---------:|
|tot |c1 | 0.9865469|
|tot |d  | 0.9827573|
|tot |a  | 0.9574199|
|tot |b  | 0.9301658|
|tot |e  | 0.7065238|
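Picking out the top 3 from a long-format result like the one above could look like the following sketch. It uses only base R so that it runs without extra packages: `as.data.frame(as.table(...))` stands in for melt and names its columns `Var1` and `Var2` rather than `X1` and `X2`. The data frame `toy` and its columns are made up for illustration.

```r
# Hypothetical data: four variables plus a row-wise total.
set.seed(42)
n <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)
x3 <- rnorm(n)
x4 <- -x1 + rnorm(n, sd = 2)
toy <- data.frame(x1, x2, x3, x4)
toy$tot <- rowSums(toy)

# Correlation matrix, reshaped to long format (Var1, Var2, value).
cm <- cor(toy)
long <- as.data.frame(as.table(cm), responseName = "value")

# Keep only the rows pairing tot with another variable.
long <- long[long$Var1 == "tot" & long$Var2 != "tot", ]

# Sort descending by correlation and take the first three rows.
long <- long[order(-long$value), ]
top3 <- head(long, 3)
print(top3)
```

With the dplyr pipeline from the answer, appending `head(3)` as a final step should give the same kind of result.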
