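【Question title】: Python Correlation Matrix - Only Want columns that have absolute value more than .5
【Posted】: 2021-06-26 18:15:23
【Question】:

I have 41 variables, most of which are not correlated at all. I only want to include the few columns that show a stronger positive or stronger negative correlation. Try as I might, and even after looking through many articles and questions, I can't seem to get it to work. Thank you.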

df.columns

Index(['ResponseId', 'Consent', 'AgeQualifier', 'Team', 'TeamOther',
       'FanStrength', 'WinImportance', 'Emotion', 'Happiness', 'Satisfaction',
       'Passion', 'ViewershipHomeGame', 'ViewershipRoadGame',
       'ViewershipTVCable', 'ViewershipStreaming', 'ViewershipRestaurantBar',
       'NameChangeViewershipHomeGame', 'NameChangeViewershipRoadGame',
       'NameChangeViewershipTVCable', 'NameChangeViewershipStreaming',
       'NameChangeViewershipRestaurantBar', 'Purchased', 'Purchased_Jersey_1',
       'Purchased_Clothing_2', 'Purchased_Memorabilia_3', 'Purchased_Office_4',
       'Purchased_Equipment_5', 'PurchaseIntentionNameChangeJersey',
       'PurchaseIntentionNameChangeClothing',
       'PurchaseIntentionNameChangeMemorabilia',
       'PurchaseIntentionNameChangeHomeOffice',
       'PurchaseIntentionNameChangeEquipment', 'Support_SeasonTickets',
       'Support_Donations', 'Support_Volunteer',
       'SupportNameChangeSeasonTickets', 'SupportNameChangeDonateMoney',
       'SupportNameChangeVolunteer', 'State', 'Gender', 'Age', 'Ethnicity',
       'EthnicityOther', 'Income', 'Drawing', 'Email'],
      dtype='object')

correlation_matrix = df.corr().round(2)

fig, ax = plt.subplots(figsize=(50,50))
sns.heatmap(data=correlation_matrix, cmap='rainbow', annot=True, ax=ax)

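Thoughts?

【Comments】:

  • Can you post an example of what you want to achieve? It is possible that two columns are highly correlated with each other, but also that one column is correlated with many other columns and so has values > 0.5 in the correlation matrix, along with many other scenarios.
  • Here is a small part of the matrix. I want the columns that are > .50. Consent AgeQualifier Team FanStrength WinImportance Emotion Happiness Satisfaction Passion ViewershipHomeGame Consent NaN NaN NaN NaN NaN NaN NaN NaN AgeQualifier NaN 1 NaN NaN NaN NaN NaN NaN NaN Team NaN NaN 1 0.02 0.02 0.02 -0.03 0.01 0 FanStrength NaN NaN 0.02 1 01 0.69 30. 0.32 WinImportance NaN NaN 0.02 0.69 1 0.44 0.44 0.34 0.37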

Tags: python correlation


【Solution 1】:

After cleaning, the matrix will be

               Consent  AgeQualifier  Team  FanStrength  WinImportance
Consent            NaN           NaN   NaN          NaN            NaN
AgeQualifier       NaN           1.0   NaN          NaN            NaN
Team               NaN           NaN  1.00         0.02           0.02
FanStrength        NaN           NaN  0.02         1.00           0.69
WinImportance      NaN           NaN  0.02         0.69           1.00

To solve this, you need to select every off-diagonal value in the matrix whose absolute value is > 0.5

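# Here df is the correlation matrix (correlation_matrix in the question), not the raw data.
# Compare absolute values so strong negative correlations are kept; df != 1 drops the diagonal.
temp = df[(df.abs() > 0.5) & (df != 1)].abs().max()
print(temp[~temp.isna()])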

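This gives the names of the columns that have at least one correlation with absolute value > 0.5 in the correlation matrix, together with each column's largest such value.

For the matrix above, this produces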

FanStrength      0.69
WinImportance    0.69
dtype: float64
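
To get from those column names to a smaller heatmap, the correlation matrix can be cut down to just the selected rows and columns. The following is a minimal sketch, not part of the original answer: the names `strong_columns`, `threshold`, and `reduced` are made up for illustration, and the sample data is the cleaned matrix shown above. It masks the diagonal by position instead of testing `!= 1`, on the assumption that an off-diagonal value could round to exactly 1.

```python
import numpy as np
import pandas as pd


def strong_columns(corr: pd.DataFrame, threshold: float = 0.5) -> list:
    """Return columns with at least one off-diagonal |r| above threshold.

    Hypothetical helper; assumes corr is a square correlation matrix.
    """
    # Mask the diagonal by position rather than by value, so an
    # off-diagonal correlation of exactly 1 is not thrown away.
    off_diag = ~np.eye(len(corr), dtype=bool)
    strong = (corr.abs() > threshold) & off_diag
    # Comparisons against NaN are False, so all-NaN columns drop out.
    keep = strong.any(axis=0)
    return keep[keep].index.tolist()


# Sample data: the cleaned matrix from the answer above.
names = ["Consent", "AgeQualifier", "Team", "FanStrength", "WinImportance"]
nan = np.nan
corr = pd.DataFrame(
    [
        [nan, nan, nan, nan, nan],
        [nan, 1.0, nan, nan, nan],
        [nan, nan, 1.00, 0.02, 0.02],
        [nan, nan, 0.02, 1.00, 0.69],
        [nan, nan, 0.02, 0.69, 1.00],
    ],
    index=names,
    columns=names,
)

cols = strong_columns(corr)
reduced = corr.loc[cols, cols]  # square sub-matrix of the strong columns
print(cols)  # ['FanStrength', 'WinImportance']
print(reduced)
```

Passing `reduced` to `sns.heatmap` in place of `correlation_matrix` should then plot only the selected columns, and a much smaller `figsize` than `(50,50)` is likely enough at that point.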

【Comments】:

  • Bless you! That's great.