【Question title】: How to select features for logistic regression from scratch in Python?
【Posted】: 2018-07-02 08:59:37
【Question description】:

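I have been trying to code logistic regression from scratch, and it works, but I am currently using all the features in my breast cancer dataset. I would like to use only a subset of them (specifically, the features that scikit-learn picked when I ran its feature selection on the same data for comparison). However, I'm not sure where in my code to do this. What I currently have is: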

X_train = ['texture_mean', 'smoothness_mean', 'compactness_mean', 'symmetry_mean', 'radius_se', 'symmetry_se',
'fractal_dimension_se', 'radius_worst', 'texture_worst', 'area_worst', 'smoothness_worst', 'compactness_worst']
X_test = ['texture_mean', 'smoothness_mean', 'compactness_mean', 'symmetry_mean', 'radius_se', 'symmetry_se',
'fractal_dimension_se', 'radius_worst', 'texture_worst', 'area_worst', 'smoothness_worst', 'compactness_worst']

def Sigmoid(z):
    return 1/(1 + np.exp(-z))

def Hypothesis(theta, X):   
    return Sigmoid(X @ theta)

def Cost_Function(X,Y,theta,m):
    hi = Hypothesis(theta, X)
    _y = Y.reshape(-1, 1)
    J = 1/float(m) * np.sum(-_y * np.log(hi) - (1-_y) * np.log(1-hi))
    return J

def Cost_Function_Derivative(X,Y,theta,m,alpha):
    hi = Hypothesis(theta,X)
    _y = Y.reshape(-1, 1)
    J = alpha/float(m) * X.T @ (hi - _y)
    return J

def Gradient_Descent(X,Y,theta,m,alpha):
    new_theta = theta - Cost_Function_Derivative(X,Y,theta,m,alpha)
    return new_theta

def Accuracy(theta):
    correct = 0
    length = len(X_test)
    prediction = (Hypothesis(theta, X_test) > 0.5) 
    _y = Y_test.reshape(-1, 1)
    correct = prediction == _y
    my_accuracy = (np.sum(correct) / length)*100
    print ('LR Accuracy: ', my_accuracy, "%")

def Logistic_Regression(X,Y,alpha,theta,num_iters):
    m = len(Y)
    for x in range(num_iters):
        new_theta = Gradient_Descent(X,Y,theta,m,alpha)
        theta = new_theta
        if x % 100 == 0:
            print #('theta: ', theta)    
            print #('cost: ', Cost_Function(X,Y,theta,m))
    Accuracy(theta)
ep = .012 
initial_theta = np.random.rand(X_train.shape[1],1) * 2 * ep - ep
alpha = 0.5
iterations = 10000
Logistic_Regression(X_train,Y_train,alpha,initial_theta,iterations)

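I assumed it would work if I manually changed which features X_train and X_test contain, but I get an error at the initial_theta line: AttributeError: 'list' object has no attribute 'shape'. Any help in the right direction would be appreciated.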

【Question comments】:

    Tags: python machine-learning logistic-regression feature-selection


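    【Solution 1】:

    The problem is that X_train is a plain Python list, and lists have no shape attribute; shape is available on pandas DataFrames (and NumPy arrays).

    You can either:
    - keep the list but use len(X_train) instead, or
    - convert X_train to a pandas DataFrame and read its shape, e.g. pandas.DataFrame(X_train).shape[0]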
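To illustrate the point above: the lists in the question hold column names, not data, so they would normally be used to index a table before anything has a shape. Below is a minimal sketch, assuming the data lives in a pandas DataFrame. The tiny `df`, its values, and the `diagnosis` label column are made up for illustration, since the question does not show how the dataset is loaded.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the breast cancer table; values are illustrative only.
df = pd.DataFrame({
    "texture_mean": [10.4, 17.8, 21.3, 20.4],
    "smoothness_mean": [0.118, 0.085, 0.110, 0.143],
    "area_worst": [2019.0, 1956.0, 1709.0, 567.7],
    "radius_mean": [18.0, 20.6, 19.7, 11.4],   # a column we choose not to use
    "diagnosis": [1, 1, 1, 0],                 # hypothetical label column
})

# The list of names selects columns; it is not the training data itself.
selected = ["texture_mean", "smoothness_mean", "area_worst"]

X = df[selected].to_numpy(dtype=float)      # 2-D array: (n_samples, n_features)
Y = df["diagnosis"].to_numpy(dtype=float)   # 1-D array: (n_samples,)

# X is now an ndarray, so .shape works and theta gets one row per feature.
ep = 0.012
initial_theta = np.random.rand(X.shape[1], 1) * 2 * ep - ep

print(X.shape, initial_theta.shape)
```

The same `selected` list would then be applied to both the training and the test split so that the columns line up.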

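    【Comments】:

    • Thank you. Changing X_train to a pandas DataFrame then gives me a new error at the final Logistic_Regression line: TypeError: Cannot cast array data from dtype('float64') to dtype('
    • Not 100% sure about this, but according to stackoverflow.com/questions/34173101/… it looks like you need to change the type of your input. Is type(Y_train) float?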
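The error message above is cut off, so its exact cause cannot be confirmed here. One plausible way to end up with mismatched dtypes is wrapping the list of column names itself in a DataFrame, which yields strings rather than numbers. The sketch below shows how one might check for that and convert text-encoded numbers to floats; all names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

names = ["texture_mean", "smoothness_mean"]

# Wrapping the name list gives a table of strings, not feature values.
wrong = pd.DataFrame(names).to_numpy()
print(wrong.dtype)   # a non-numeric dtype

# Numeric data stored as text can be converted explicitly before training.
raw = pd.DataFrame({"texture_mean": ["10.4", "17.8"],
                    "smoothness_mean": ["0.118", "0.085"]})
X = raw[names].astype(float).to_numpy()
print(X.dtype)       # float64

# With a float array, the matrix product used in Hypothesis() is well defined.
theta = np.zeros((X.shape[1], 1))
print((X @ theta).shape)
```

Checking `X_train.dtype` and `Y_train.dtype` right before calling Logistic_Regression is a cheap way to see whether this applies.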