[Posted]: 2019-06-18 11:27:00
[Question]:
In my record linkage problem, the classifier produces more false negatives than false positives. Is there a way to balance these?
import recordlinkage as rl

# Initialize the classifier
logreg = rl.LogisticRegressionClassifier()
# Train the classifier on the labelled pairs and the index of known matches
logreg.fit(golden_pairs, golden_matches_index)
print("Intercept: ", logreg.intercept)
print("Coefficients: ", logreg.coefficients)
# Predict the match status for all record pairs
result_logreg = logreg.predict(test_pairs[columns_to_keep])
len(result_logreg)
# true_links = features_complete_new_index[features_complete_new_index['evaluation'] == True].index
# Index of the pairs that are labelled as true matches
true_links = test_pairs[test_pairs['evaluation'] == True].index
print("confusion matrix of Logistic Regression ",
      rl.confusion_matrix(true_links, result_logreg, len(test_pairs)),
      "False positives ", rl.false_positives(true_links, result_logreg),
      "False negatives ", rl.false_negatives(true_links, result_logreg))
The output is:
Intercept: -6.974042394356818
Coefficients: [-0.07818545 7.83113994 0.96939354 -6.97404239 1.65737031 0.694744 ]
confusion matrix of Logistic Regression [[ 5915 2576]
[ 1075 7167134]] False positives 1075 False negatives 2576
F-Score of Log Regr 0.7641625218009173
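To make the imbalance concrete, here is a minimal sketch that recomputes precision, recall and the F-score from the counts printed above. It assumes the confusion matrix is laid out as [[TP, FN], [FP, TN]], which is consistent with the false positive and false negative counts reported in the same output. The variable names are only illustrative.

```python
# Counts copied from the output above.
# Assumed layout: [[TP, FN], [FP, TN]] (matches the reported FP/FN counts).
tp, fn = 5915, 2576
fp, tn = 1075, 7167134

precision = tp / (tp + fp)  # share of predicted matches that are real matches
recall = tp / (tp + fn)     # share of real matches that were found
f_score = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.4f} recall={recall:.4f} f_score={f_score:.4f}")
```

Precision comes out near 0.85 and recall near 0.70, so the classifier misses true matches more often than it raises false alarms. That is the gap the question is asking about.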
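[Comments]:
- Unfortunately, we can't simply tell you what is wrong with your implementation. There could be countless factors: how clean your data is, whether you did any feature engineering, whether you used k-fold cross-validation, whether you tried SFS or BFE, and so on. Please try to rephrase your question more directly so that we can help you.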
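One general lever for trading false negatives against false positives, independent of the points in the comment above, is the decision threshold applied to the predicted match probability. The sketch below is plain Python and does not call the recordlinkage API. The scores and labels are made-up illustration data, and `confusion_counts` is a hypothetical helper. It assumes you can get a match probability per candidate pair from your classifier, so check your library's documentation for that.

```python
def confusion_counts(scores, labels, threshold):
    """Count (TP, FP, FN, TN) when pairs scoring >= threshold are called matches."""
    tp = fp = fn = tn = 0
    for score, is_match in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_match:
            tp += 1
        elif predicted and not is_match:
            fp += 1
        elif not predicted and is_match:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical match probabilities and true labels, for illustration only.
scores = [0.95, 0.80, 0.62, 0.45, 0.40, 0.30, 0.20, 0.10]
labels = [True, True, False, True, True, False, False, False]

# Lowering the threshold accepts more pairs as matches.
for threshold in (0.7, 0.5, 0.35):
    tp, fp, fn, tn = confusion_counts(scores, labels, threshold)
    print(f"threshold={threshold}: FP={fp} FN={fn}")
```

In this toy data, lowering the threshold removes the false negatives at the cost of one false positive. On real data the trade-off is usually less clean, so the threshold is best chosen on a held-out validation set rather than on the test pairs.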
Tags: python pandas logistic-regression