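[Posted]: 2021-06-26 18:15:23
[Question]:
I have 41 variables, most of which are not correlated at all. I only want to include the columns that show a strong positive or strong negative correlation. Try as I might, and even after reading many articles and questions, I can't seem to get it to work. Thanks.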
df.columns
Index(['ResponseId', 'Consent', 'AgeQualifier', 'Team', 'TeamOther',
       'FanStrength', 'WinImportance', 'Emotion', 'Happiness', 'Satisfaction',
       'Passion', 'ViewershipHomeGame', 'ViewershipRoadGame',
       'ViewershipTVCable', 'ViewershipStreaming', 'ViewershipRestaurantBar',
       'NameChangeViewershipHomeGame', 'NameChangeViewershipRoadGame',
       'NameChangeViewershipTVCable', 'NameChangeViewershipStreaming',
       'NameChangeViewershipRestaurantBar', 'Purchased', 'Purchased_Jersey_1',
       'Purchased_Clothing_2', 'Purchased_Memorabilia_3', 'Purchased_Office_4',
       'Purchased_Equipment_5', 'PurchaseIntentionNameChangeJersey',
       'PurchaseIntentionNameChangeClothing',
       'PurchaseIntentionNameChangeMemorabilia',
       'PurchaseIntentionNameChangeHomeOffice',
       'PurchaseIntentionNameChangeEquipment', 'Support_SeasonTickets',
       'Support_Donations', 'Support_Volunteer',
       'SupportNameChangeSeasonTickets', 'SupportNameChangeDonateMoney',
       'SupportNameChangeVolunteer', 'State', 'Gender', 'Age', 'Ethnicity',
       'EthnicityOther', 'Income', 'Drawing', 'Email'],
      dtype='object')
import matplotlib.pyplot as plt
import seaborn as sns

correlation_matrix = df.corr().round(2)
fig, ax = plt.subplots(figsize=(50, 50))
sns.heatmap(data=correlation_matrix, cmap='rainbow', annot=True, ax=ax)
Any ideas?
[Comments]:
-
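Can you post an example of what you are trying to achieve? Two columns may be highly correlated with each other, but a single column may also be correlated with many other columns and so have several values > 0.5 in the correlation matrix, and there are many other scenarios.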
-
Here is a small part of the matrix. I want the columns with values > .50.
Consent AgeQualifier Team FanStrength WinImportance Emotion Happiness Satisfaction Passion ViewershipHomeGame
Consent NaN NaN NaN NaN NaN NaN NaN NaN
AgeQualifier NaN 1 NaN NaN NaN NaN NaN NaN NaN
Team NaN NaN 1 0.02 0.02 0.02 -0.03 0.01 0
FanStrength NaN NaN 0.02 1 01 0.69 30. 0.32
WinImportance NaN NaN 0.02 0.69 1 0.44 0.44 0.34 0.37
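One possible way to get only the "> .50" columns, as a minimal sketch rather than a definitive answer: compute the correlation matrix, blank out the diagonal (every column correlates 1.0 with itself), and keep the columns that have at least one remaining value whose absolute value exceeds the threshold. Using the absolute value covers both strongly positive and strongly negative pairs. The helper name `strong_corr_submatrix`, its `threshold` parameter, and the synthetic `demo` frame are assumptions made up for illustration; they are not from the original post.

```python
import numpy as np
import pandas as pd


def strong_corr_submatrix(df, threshold=0.5):
    """Correlation sub-matrix of the columns that have at least one
    off-diagonal correlation with absolute value above `threshold`.
    (Hypothetical helper, not part of pandas or seaborn.)"""
    # Restrict to numeric columns so text columns do not break corr().
    corr = df.select_dtypes(include="number").corr()
    # Replace the diagonal with NaN; self-correlation would always pass.
    off_diag = corr.mask(np.eye(len(corr), dtype=bool))
    # NaN entries (e.g. from constant columns) compare as False here.
    keep = off_diag.abs().gt(threshold).any(axis=0)
    cols = keep[keep].index.tolist()
    return corr.loc[cols, cols].round(2)


# Synthetic stand-in for the survey data (values are made up).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = pd.DataFrame({
    "A": x,
    "B": x + rng.normal(scale=0.1, size=200),   # strongly positive with A
    "C": -x + rng.normal(scale=0.1, size=200),  # strongly negative with A
    "D": rng.normal(size=200),                  # unrelated noise
})
strong = strong_corr_submatrix(demo)
print(strong)
```

The returned frame could then be passed to `sns.heatmap` in place of `correlation_matrix`, with a much smaller `figsize`. Note that a kept column can still show weak values against some of the other kept columns, which is the scenario raised in the first comment.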
Tags: python correlation