在python中编码逻辑回归时出现AttributeError答案

【问题标题】：AttributeError when coding logistic regression in python在python中编码逻辑回归时出现AttributeError
【发布时间】：2018-04-12 08:31:05
【问题描述】：

您好，我正在尝试使用 wisconsin 乳腺癌数据集来学习 Python 和机器学习，并且我正在尝试从头开始编写逻辑回归代码，但一直出现属性错误（'AttributeError：'DataFrame'对象没有属性'target'）在我的代码（Y = data.target）的第一部分，我无法在线找到解决方案。我知道它的其余部分可能也不适用于数据，我从我在网上看到的不同事物中获取它，但我目前只是尝试一次解决一个问题以了解逻辑回归的工作原理。我非常感谢任何形式的帮助。

这是我的代码：

data = pd.read_csv("C:\\Users\Hannah\Desktop\Research Project\data.csv",header=0)
    X = data.values[:, :30] 
    Y = data.target['diagnosis']

X = X[:500,:]
Y = Y[:500]

def sigmoid(z):
    s= 1/(1 + np.exp(-z))
    return s

def propagate(w, b, X, Y):

    m = X.shape[1]

    A = sigmoid(np.dot(w.T,X)+b)                                

    cost = -1/m * np.sum(Y * np.log(A) + (1-Y) * (np.log(1-A)))

    dz= (1/m)*(A - Y)
    dw = np.dot(X, dz.T)
    db = np.sum(dz)


    cost = np.squeeze(cost)
    grads = {"dw": dw,
             "db": db}

    return grads, cost


def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):

    costs = []

    for i in range(num_iterations):
        m = X.shape[1]
        grads,cost = propagate(w, b, X, Y)
        b = b - learning_rate*grads["db"]
        w = w - learning_rate*grads["dw"]
         if i % 100 == 0:
            costs.append(cost)
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))

    params = {"w": w,
              "b": b}
    return params, grads, costs



def predict(w, b, X):
    m = X.shape[1]
    Y_prediction = np.zeros((1,m))
    w = w.reshape(X.shape[0], 1)
    A = sigmoid(np.dot(w.T,X)+ b)

    for i in range(A.shape[1]):
        x_exp = np.exp(A)
        x_sum = np.sum(x_exp,axis=1,keepdims=True)
        s = np.divide(x_exp,x_sum)

    Y_prediction = 1. * (A > 0.5)

    return Y_prediction



def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    w, b = initialize_with_zeros(X_train.shape[0])

    print("learning rate:",learning_rate)

    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost = False)

    w = parameters["w"]
    b = parameters["b"]

    Y_prediction_train = predict(w,b,X_train)
    Y_prediction_test = predict(w,b,X_test)
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "w" : w, 
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}

    return d

d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)

【问题讨论】：

标签： python machine-learning logistic-regression

【解决方案1】：

据我了解，您正在使用 pandas 读取 csv 并希望将列 diagnosis 中的值分配给变量 Y。在这种情况下，您不需要target。根据错误，DataFrame 根本没有该属性。只需 data['diagnosis'] 即可返回您需要的内容。

【讨论】：

使用 pd['diagnosis'] 我得到：TypeError: 'module' object is not subscriptable 但感谢您的帮助编辑：然后我尝试了 data['diagnosis] 现在我得到：NameError: name 'train_set_x' 没有定义，所以谢谢你，这让我前进了！
由于我们没有您的数据，也没有样本，我们只能做出假设。也许Y = data['diagnosis'] 会起作用
是的，正确的。应该是 data 而不是 pd。我确定了答案。