【Question Title】: Cross-Validated Metrics for Logistic Regression
【Posted】: 2020-05-13 21:23:47
【Question】:

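To get some practice with Python, I've set myself a few machine-learning and statistics exercises. Right now I'm struggling to write cross-validation code for logistic regression.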

Below is the code that generates the synthetic dataset I'm working with:

#### Create synthetic data

import pandas as pd
from pandas import DataFrame
import numpy as np
import random
from scipy.stats import bernoulli 
from sklearn import preprocessing

customerID, sex, age, salary, happiness = [], [], [], [], []

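random.seed(45)     # seeds Python's random module (used for age and salary)
np.random.seed(45)  # bernoulli.rvs draws from NumPy's global RNG, so seed it as well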

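for i in range(0, 60):
    customerID.append(i + 1)
    age.append(random.randint(18, 65))
    salary.append(random.randint(1200, 3600))
    if i % 2 == 0:
        sex.append('M')
    else:
        sex.append('F')
    # Probability of happiness depends on salary relative to age, and on sex
    if salary[i] >= 120 * age[i] and sex[i] == 'M':
        p = 0.75
    elif salary[i] >= 120 * age[i] and sex[i] == 'F':
        p = 0.7
    elif salary[i] <= 70 * age[i] and sex[i] == 'M':
        p = 0.4
    elif salary[i] <= 70 * age[i] and sex[i] == 'F':
        p = 0.5
    else:
        p = 0.58
    # bernoulli.rvs(p) returns 0 or 1 directly (same as the earlier -1 + bernoulli.rvs(p, 1),
    # where the second positional argument is loc=1)
    happiness.append(bernoulli.rvs(p))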

### Create the DataFrame

df = pd.DataFrame(list(zip(customerID, sex, age, salary, happiness)),
                  columns=['customerID', 'sex', 'age', 'salary', 'happiness'])

# Encode string columns as integers ('sex': 'F' -> 0, 'M' -> 1, alphabetical order)
le = preprocessing.LabelEncoder()
for column_name in df.columns:
    if df[column_name].dtype == object:
        df[column_name] = le.fit_transform(df[column_name])

df.head()
# Divide the data into dependent variable and independent variables
X = pd.DataFrame(df.iloc[:,[0,1,2,3]])
y = pd.DataFrame(df.iloc[:,[4]])
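Note that `random.seed` seeds only Python's `random` module, while `bernoulli.rvs` draws from NumPy's global generator. A minimal vectorized sketch, assuming NumPy 1.17 or later for `np.random.default_rng`, routes every draw through one seeded `Generator`. `make_synthetic` is a hypothetical helper name, and the values it produces will not match the loop above number for number; only the probability rule is the same. It also selects `y` as a single column by name, which gives a 1-D Series.

```python
import numpy as np
import pandas as pd

def make_synthetic(n=60, seed=45):
    """Vectorized sketch of the loop above; one seeded Generator drives every draw."""
    rng = np.random.default_rng(seed)
    customer_id = np.arange(1, n + 1)
    age = rng.integers(18, 66, size=n)          # 18..65 inclusive, like random.randint(18, 65)
    salary = rng.integers(1200, 3601, size=n)   # 1200..3600 inclusive
    male = (customer_id - 1) % 2 == 0           # even loop index i -> 'M', as in the original
    high = salary >= 120 * age
    low = salary <= 70 * age
    # np.select picks the first matching condition, mirroring the if/elif chain
    p = np.select(
        [high & male, high & ~male, low & male, low & ~male],
        [0.75, 0.70, 0.40, 0.50],
        default=0.58,
    )
    happiness = (rng.random(n) < p).astype(int)  # one Bernoulli(p) draw per customer
    return pd.DataFrame({
        "customerID": customer_id,
        "sex": male.astype(int),  # 1 = 'M', 0 = 'F', matching LabelEncoder's alphabetical codes
        "age": age,
        "salary": salary,
        "happiness": happiness,
    })

df = make_synthetic()
X = df[["customerID", "sex", "age", "salary"]]
y = df["happiness"]  # 1-D Series of shape (60,)
print(X.shape, y.shape)  # (60, 4) (60,)
```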

Here is the code that raises "IndexError: too many indices for array":

from sklearn.linear_model import LogisticRegression
# sklearn.cross_validation is the old module (deprecated in 0.18, removed in 0.20);
# newer releases provide cross_val_predict in sklearn.model_selection
from sklearn import metrics, cross_validation

logreg = LogisticRegression()
predicted = cross_validation.cross_val_predict(logreg, X, y, cv=10)
print(metrics.accuracy_score(y, predicted))
print(metrics.classification_report(y, predicted))

How would you fix this?

【Comments】:

  • What are X.shape and y.shape? Could you add what X.head() and y.head() look like? Those seem to be causing the error somehow.
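Checking the shapes makes the difference visible. A minimal sketch on a hypothetical three-row frame (not the question's data) compares selecting the target column with a list of positions, as the question does, against selecting it with a single position:

```python
import pandas as pd

# Hypothetical three-row frame, used only to compare shapes
df = pd.DataFrame({"age": [25, 40, 33], "happiness": [0, 1, 1]})

y_frame = pd.DataFrame(df.iloc[:, [1]])  # list of positions -> one-column DataFrame
y_series = df.iloc[:, 1]                 # single position -> Series

print(y_frame.shape)                 # (3, 1)  two-dimensional
print(y_series.shape)                # (3,)    one-dimensional
print(y_frame.values.ravel().shape)  # (3,)    flattened to 1-D
```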

Tags: python pandas scikit-learn logistic-regression


【Answer 1】:

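I just realized that replacing

predicted = cross_validation.cross_val_predict(logreg, X, y, cv=10)

with

predicted = cross_validation.cross_val_predict(logreg, X, y.values.ravel(), cv=10)

works fine. y was built as a one-column DataFrame (2-D, shape (60, 1)); y.values.ravel() flattens it into the 1-D array that scikit-learn expects as the target.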
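Since `sklearn.cross_validation` was removed in scikit-learn 0.20, here is a sketch of the same fix on a current release: `cross_val_predict` comes from `sklearn.model_selection`, and `y` is selected as a Series so no `ravel()` is needed. The DataFrame is a random stand-in with the question's column names, and the `StandardScaler` pipeline step is an addition not present in the original code (it helps the default solver converge when features are on very different scales, like salary versus sex).

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the question's df: same columns, random values
rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "customerID": np.arange(1, n + 1),
    "sex": np.arange(n) % 2,
    "age": rng.integers(18, 66, size=n),
    "salary": rng.integers(1200, 3601, size=n),
    "happiness": rng.permutation(np.repeat([0, 1], n // 2)),  # balanced 0/1 labels
})

X = df[["customerID", "sex", "age", "salary"]]
y = df["happiness"]  # one column selected by name -> 1-D Series, no ravel() needed

# Scaling step is an addition to the original code, not required for the fix itself
model = make_pipeline(StandardScaler(), LogisticRegression())
predicted = cross_val_predict(model, X, y, cv=10)  # one out-of-fold prediction per row

print(metrics.accuracy_score(y, predicted))
print(metrics.classification_report(y, predicted))
```

Because `cross_val_predict` pools the out-of-fold predictions, both printed metrics are computed once over all 60 rows rather than averaged across the 10 folds.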

【Comments】:

  • Thanks, man! Nothing jogs the memory like an annoying bug.