【Posted】: 2019-08-06 11:36:21
【Question】:
This is the linear classifier I use for binary classification. Here is the code snippet:
my_optimizer = tf.train.AdagradOptimizer(learning_rate=learning_rate)
my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)
# Create a linear classifier object
linear_classifier = tf.estimator.LinearClassifier(
    feature_columns=feature_columns,
    optimizer=my_optimizer
)
linear_classifier.train(input_fn=training_input_fn, steps=steps)
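The dataset is imbalanced and has only two classes, yes/no. The NO class has 36548 examples and the YES class has 4640.
How can I balance this data? I have been searching around and found material on class weights and similar techniques, but I could not find out how to create class weights or how to apply them in TensorFlow's training method.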
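For reference, one common way to build class weights from counts like these is the "balanced" heuristic, n_samples / (n_classes * n_class_samples). The sketch below is a minimal plain-Python illustration under that assumption, not a confirmed solution to this question. The helper names `balanced_class_weights` and `per_example_weights` and the "yes"/"no" label strings are made up for the example.

```python
# Class counts are taken from the question; all names below are illustrative.
class_counts = {"no": 36548, "yes": 4640}


def balanced_class_weights(counts):
    """Return n_samples / (n_classes * n_class_samples) for each class."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


def per_example_weights(labels, weights):
    """Map each example's label to the weight of its class."""
    return [weights[label] for label in labels]


class_weights = balanced_class_weights(class_counts)

# Hypothetical mini-batch of labels, just to show the mapping.
example_weights = per_example_weights(["no", "yes", "no"], class_weights)

print(class_weights)
print(example_weights)
```

With this heuristic the rarer YES class gets the larger weight, and each class contributes the same total weight. `tf.estimator.LinearClassifier` accepts a `weight_column` argument, so a per-example weight column like `example_weights` could be added to the features returned by the input function and named there. How that fits `training_input_fn` is an assumption, since that function is not shown.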
This is how I compute the loss:
training_probabilities = linear_classifier.predict(input_fn = training_predict_input_fn)
training_probabilities = np.array([item['probabilities'] for item in training_probabilities])
validation_probabilities = linear_classifier.predict(input_fn=validation_predict_input_fn)
validation_probabilities = np.array([item['probabilities'] for item in validation_probabilities])
training_log_loss = metrics.log_loss(training_targets, training_probabilities)
validation_log_loss = metrics.log_loss(validation_targets, validation_probabilities)
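If examples are weighted by class during training, the evaluation loss can be weighted the same way so that the minority class is not drowned out. The following is a rough sketch of a weighted binary log loss in plain Python. The function name `weighted_log_loss` and the toy inputs are assumptions for illustration, and `positive_probs` stands for the probability of the YES class only (assumed to be the second column of `probabilities`).

```python
import math


def weighted_log_loss(targets, positive_probs, weights, eps=1e-15):
    """Weighted mean of the binary cross-entropy over all examples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for y, p, w in zip(targets, positive_probs, weights):
        # Clip probabilities away from 0 and 1 to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weighted_sum += w * loss
        weight_total += w
    return weighted_sum / weight_total


# Toy data (not from the question): one positive example, three negatives.
targets = [1, 0, 0, 0]
positive_probs = [0.3, 0.2, 0.1, 0.4]

uniform = weighted_log_loss(targets, positive_probs, [1.0, 1.0, 1.0, 1.0])
upweighted = weighted_log_loss(targets, positive_probs, [4.0, 1.0, 1.0, 1.0])

print(uniform, upweighted)
```

`sklearn.metrics.log_loss` also takes a `sample_weight` argument, which should give an equivalent result when passed the same per-example weights.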
【Discussion】:
Tags: tensorflow machine-learning scikit-learn logistic-regression