Underfitting, Overfitting, Good_Generalization

[Question title]: Underfitting, Overfitting, Good_Generalization
[Posted]: 2019-07-03 20:25:21
[Question description]:

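As part of my assignment, I am applying linear regression and lasso regression, and this is question 7.

Based on the scores from question 6, what gamma value corresponds to a model that is underfitting (and has the worst test set accuracy)? What gamma value corresponds to a model that is overfitting (and has the worst test set accuracy)? What choice of gamma would be the best choice for a model with good generalization performance on this dataset (high accuracy on both the training and test set)?

Hint: Try plotting the scores from question 6 to visualize the relationship between gamma and accuracy. Remember to comment out the import matplotlib line before submission.

This function should return a tuple with the degree values in this order: (Underfitting, Overfitting, Good_Generalization). Please note there is only one correct solution.

I really need help; I can't think of any way to solve this last question. What code should I use to determine (Underfitting, Overfitting, Good_Generalization), and why?

Thanks,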

Dataset: http://archive.ics.uci.edu/ml/datasets/Mushroom?ref=datanews.io

Here is my code from question 6:

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import validation_curve

def answer_six():
    # The assignment asks for SVC with kernel='rbf', C=1, random_state=0.
    # C: penalty parameter of the error term.
    # random_state: seed of the pseudo-random number generator used
    # when shuffling the data for probability estimates.
    # The radial basis function (RBF) kernel is a popular kernel
    # function in kernelized learning algorithms; in particular, it
    # is commonly used in support vector machine classification.

    model = SVC(kernel='rbf', C=1, random_state=0)

    # np.logspace(start, stop, num) returns num values spaced evenly
    # on a log scale; here that is 1e-4, 1e-3, ..., 1e1.

    gamma = np.logspace(-4,1,6)

    # Compute the validation curve over gamma, scored by accuracy.
    # X_subset and y_subset are not defined in this snippet; they are
    # assumed to come from earlier in the assignment notebook.
    train_scores, test_scores = validation_curve(
        model, X_subset, y_subset,
        param_name='gamma', param_range=gamma, scoring='accuracy')

    # Average over the cross-validation folds (axis=1), giving one
    # mean training score and one mean test score per gamma value.
    sc = (train_scores.mean(axis=1), test_scores.mean(axis=1))

    return sc

answer_six()

[Comments]:

Tags: python python-3.x linear-regression lasso-regression

[Solution 1]:

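Well, get familiar with overfitting first. You should produce something like this: Article on this topic

On the left is underfitting, on the right is overfitting... Where both errors are low, you have good generalization.

All of this is a function of gamma, which plays the role of a regularizer here: for the RBF kernel, a small gamma gives a smoother, simpler decision boundary and a large gamma gives a more complex one.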
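The plot described above can be sketched roughly as follows. The scores here are made-up numbers for illustration only, not results from the Mushroom dataset; in practice the two arrays would come from the tuple returned by answer_six(). The variable names train_mean and test_mean and the output file name are assumptions of this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: the figure is saved to a file
import matplotlib.pyplot as plt
import numpy as np

# Same gamma grid as in the question: 1e-4, 1e-3, ..., 1e1.
gamma = np.logspace(-4, 1, 6)

# Hypothetical mean scores, one per gamma value (illustration only).
train_mean = np.array([0.57, 0.93, 0.99, 1.00, 1.00, 1.00])
test_mean = np.array([0.56, 0.93, 0.99, 1.00, 0.99, 0.52])

fig, ax = plt.subplots()
# gamma spans several orders of magnitude, so use a log-scaled x axis.
ax.semilogx(gamma, train_mean, marker="o", label="training accuracy")
ax.semilogx(gamma, test_mean, marker="o", label="test accuracy")
ax.set_xlabel("gamma")
ax.set_ylabel("mean accuracy")
ax.legend()
fig.savefig("gamma_scores.png")
```

With numbers shaped like these, both curves are low at the far left (underfitting), both are high in the middle (good generalization), and at the far right the training curve stays high while the test curve drops (overfitting).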

[Comments]:

[Solution 2]:

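Overfitting means your model is wrong: if the model is off, replace it and use a kernel that works, for example change linear to polynomial, or use a support vector machine... Underfitting means your dataset is wrong: add new data, ideally correlated data...

Check by the numbers: look at the score/accuracy on test and on train. If both test and train are high and there is not much difference between them, you are doing well... If test is low or train is low, you are facing overfitting/underfitting.

Hope that explains it...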
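One way to turn this numeric check into code is sketched below. The helper name pick_gammas and its three heuristics are assumptions of this sketch, not the assignment's grading rule, and the scores are made-up numbers for illustration only.

```python
def pick_gammas(gammas, train_scores, test_scores):
    """Return (underfitting, overfitting, good_generalization) gamma values.

    Heuristics (assumptions of this sketch):
      - underfitting: lowest mean training score
      - overfitting: largest gap between training and test score
      - good generalization: highest mean test score
    """
    indices = range(len(gammas))
    underfit = min(indices, key=lambda i: train_scores[i])
    overfit = max(indices, key=lambda i: train_scores[i] - test_scores[i])
    good = max(indices, key=lambda i: test_scores[i])
    return gammas[underfit], gammas[overfit], gammas[good]


# Hypothetical mean scores, one per gamma value (illustration only).
gammas = [1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0]
train = [0.57, 0.93, 0.99, 1.00, 1.00, 1.00]
test = [0.56, 0.93, 0.99, 1.00, 0.99, 0.52]

print(pick_gammas(gammas, train, test))  # (0.0001, 10.0, 0.1)
```

On real scores it is still worth looking at the plot, since ties or nearly flat curves can make these simple rules pick a different gamma than you would by eye.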

[Comments]: