How to find the best random_state value of train_test_split that gives the best score? [duplicate]

【Question title】: How to find the best random_state value of train_test_split that gives the best score? [duplicate]
【Posted】: 2021-06-29 13:16:51
【Question】:

I did a train/test split on my data and fitted a support vector machine to it.

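from sklearn import svm
from sklearn.model_selection import train_test_split

# No random_state is passed, so every run produces a different split
xtrain, xtest, ytrain, ytest = train_test_split(df, target)

svc = svm.SVC(C=30, gamma='auto')
svc.fit(xtrain, ytrain)
svc.score(xtest, ytest)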

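I am fitting an SVC model to the iris dataset, and I get a different result every time I run train_test_split (which is expected).

Is there any attribute or function of train_test_split, or any other way, that lets me find out, after running the code above, which random_state value produced the result I got?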

【Comments】:

You cannot optimize random_state; see "Is random state a parameter to tune?"
Thank you, I got my answer.

Tags: python machine-learning scikit-learn grid-search hyperparameters

【Solution 1】:

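You can run a hand-rolled grid search to find the best-scoring value of random_state.

However, you should never optimize for randomness. By doing so, you end up with the model that best fits one particular random event, and by definition such random events have no causal relationship with your target variable.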
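A more stable alternative to chasing a lucky split is to average the score over many splits. The sketch below is not part of the original answer: it assumes the iris data is loaded with `load_iris` (the question only shows `df` and `target`), and the choice of 5 folds repeated 10 times is arbitrary, picked only for illustration.

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Assumption: iris loaded from scikit-learn, standing in for df / target
X, y = load_iris(return_X_y=True)

svc = svm.SVC(C=30, gamma='auto')

# 5 folds repeated 10 times yields 50 scores; the seed here only makes
# the run reproducible, it is not something to tune
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
cv_scores = cross_val_score(svc, X, y, cv=cv)

print(f"mean accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

The mean and standard deviation describe how the model behaves across splits, which is usually more informative than the score of any single split.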

If you really want to go ahead, you have to record the score produced by the split for each random state.

import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.model_selection import train_test_split

# Array of random_state values from 0 to 200 (seeds must be non-negative integers)
random_states = np.arange(start=0, stop=201)

# Initialize a list where we'll store the score of each random_state
scores = []

# Run the search: one split, fit, and score per seed
for state in random_states:
    # train_test_split returns X_train, X_test, y_train, y_test, in that order
    xtrain, xtest, ytrain, ytest = train_test_split(df, target, random_state=state)
    svc = svm.SVC(C=30, gamma='auto')
    svc.fit(xtrain, ytrain)
    scores.append(svc.score(xtest, ytest))

Now put the two sequences into a pandas DataFrame and keep the rows with the highest score.

results = pd.DataFrame({'random_state':random_states, 'score':scores})
results[results['score'] == results['score'].max()]
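On the original question: as far as I know, train_test_split returns only the split arrays and does not report the seed it drew, so the practical way to know which seed produced a score is to pass random_state explicitly and record it yourself. The self-contained sketch below does that on iris and summarizes the spread of scores rather than only the maximum. Loading the data with `load_iris`, the range of 50 seeds, and the `sweep` name are assumptions made for illustration.

```python
import pandas as pd
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: iris loaded from scikit-learn, standing in for df / target
X, y = load_iris(return_X_y=True)

rows = []
for state in range(50):
    # The seed is chosen explicitly, so it is known for every recorded score
    xtrain, xtest, ytrain, ytest = train_test_split(X, y, random_state=state)
    svc = svm.SVC(C=30, gamma='auto').fit(xtrain, ytrain)
    rows.append({'random_state': state, 'score': svc.score(xtest, ytest)})

sweep = pd.DataFrame(rows)

# count, mean, std, min, max of the test score across seeds
print(sweep['score'].describe())
```

The gap between the minimum and maximum here comes only from which rows landed in the test set, which is why the top-scoring seed says little about the model itself.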

【Discussion】: