【Question title】: Getting a value error for Linear Regression model
【Posted】: 2019-06-04 06:14:59
【Question】:

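I am working through Kaggle's Titanic competition. When I try to apply a linear regression model in my code and check its accuracy score, I get the following error in PyCharm: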

Traceback (most recent call last):
  File "C:/Users/security/Downloads/AP/Titanic-Kaggle/TItanic-Kaggle.py", line 27, in <module>
    accuracy = linReg.score(x_text, y_test)
  File "C:\Users\security\Anaconda3\envs\TItanic-Kaggle.py\lib\site-packages\sklearn\base.py", line 330, in score
    return r2_score(y, self.predict(X), sample_weight=sample_weight,
  File "C:\Users\security\Anaconda3\envs\TItanic-Kaggle.py\lib\site-packages\sklearn\linear_model\base.py", line 213, in predict
    return self._decision_function(X)
  File "C:\Users\security\Anaconda3\envs\TItanic-Kaggle.py\lib\site-packages\sklearn\linear_model\base.py", line 196, in _decision_function
    X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
  File "C:\Users\security\Anaconda3\envs\TItanic-Kaggle.py\lib\site-packages\sklearn\utils\validation.py", line 582, in check_array
    context))
ValueError: Found array with 0 sample(s) (shape=(0, 4)) while a minimum of 1 is required.

Here is my code so far:

import pandas as pd
from sklearn.linear_model import LinearRegression

train = pd.read_csv("https://raw.githubusercontent.com/oo92/Titanic-Kaggle/master/train.csv")
test = pd.read_csv("https://raw.githubusercontent.com/oo92/Titanic-Kaggle/master/test.csv")

train['Sex'].replace(['female', 'male'], [0, 1])
train['Embarked'].replace(['C', 'Q', 'S'], [1, 2, 3])

linReg = LinearRegression()

# Fill missing values in Age feature with each sex’s median value of Age
train['Age'].fillna(train.groupby('Sex')['Age'].transform("median"), inplace=True)

data = train[['Pclass', 'SibSp', 'Parch', 'Fare', 'Age']]

# Splitting the dataset that contains the missing values and no missing values as test and train respectively.
x_train = data[data['Age'].notnull()].drop(columns='Age')
y_train = data[data['Age'].notnull()]['Age']
x_text = data[data['Age'].isnull()].drop(columns='Age')
y_test = data[data['Age'].isnull()]['Age']

# Training the machine learning algorithm
linReg.fit(x_train, y_train)

# Checking the accuracy score of the model
accuracy = linReg.score(x_text, y_test)
print(accuracy*100, '%')

【Comments】:

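  • There are no NaN values in data['Age'], which makes data[data['Age'].isnull()] an empty dataset. The error is complaining that your x_text is empty :)
  • Shoot, you're right. How can I rework this code? I am checking the accuracy based on x_test and y_test.
  • Indeed, data[data['Age'].isnull()] returns an empty DataFrame.
  • What are you trying to achieve by taking the entries of data...isnull()?
  • @AndrosAdrianopolos The question here is: how do you want to split the data into training and test sets? My suggestion is to use sklearn.model_selection.train_test_split to create a baseline :)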
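
The two points made in the comments can be checked together. The following is only a sketch on synthetic data, not the Kaggle file: the column values, `test_size=0.25` and `random_state=0` are assumptions chosen for illustration. It first shows that once Age has been imputed the `isnull()` selection has zero rows (the `shape=(0, 4)` from the traceback), then holds out a random part of the rows with `train_test_split` instead.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for train.csv (hypothetical values, not the Kaggle data)
rng = np.random.default_rng(0)
n = 200
train = pd.DataFrame({
    "Sex": rng.choice(["female", "male"], size=n),
    "Pclass": rng.integers(1, 4, size=n),
    "SibSp": rng.integers(0, 3, size=n),
    "Parch": rng.integers(0, 3, size=n),
    "Fare": rng.uniform(5, 100, size=n),
})
train["Age"] = 50 - 8 * train["Pclass"] + rng.normal(0, 5, size=n)
train.loc[train.index[:20], "Age"] = np.nan  # knock out some ages

# Same imputation as in the question, written as an assignment
train["Age"] = train["Age"].fillna(train.groupby("Sex")["Age"].transform("median"))

data = train[["Pclass", "SibSp", "Parch", "Fare", "Age"]]

# After imputation nothing is null, so this selection has zero rows
empty_test = data[data["Age"].isnull()].drop(columns="Age")
print(empty_test.shape)  # (0, 4)

# Hold out a random quarter of the rows instead
X = data.drop(columns="Age")
y = data["Age"]
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

lin_reg = LinearRegression().fit(x_train, y_train)
r2 = lin_reg.score(x_test, y_test)  # R^2 on the held-out rows
print(x_test.shape, round(r2, 3))
```

Note that `LinearRegression.score` returns the coefficient of determination (R²), as the `r2_score` call in the traceback shows, so printing it as a percentage "accuracy" can be misleading.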

Tags: python pandas machine-learning linear-regression kaggle


【Solution 1】:

Try this replacement; it should work:

x_text = data[data['Age'] != None].drop(columns='Age')
y_test = data[data['Age'] != None]['Age']

Hope this helps.
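
One caveat worth checking before relying on this: in pandas, comparing a column with `!= None` is evaluated elementwise and comes out True for every row, including rows that hold NaN. The replacement therefore appears to select all of `data`, so the error goes away but the score is computed on the same rows the model was fitted on. A minimal sketch on a made-up three-value Series (the values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical ages, one of them missing
ages = pd.Series([22.0, np.nan, 38.0], name="Age")

# Elementwise comparison with None: True everywhere, even for NaN
not_none = ages != None

# The actual missing-value check
not_null = ages.notnull()

print(not_none.tolist())  # [True, True, True]
print(not_null.tolist())  # [True, False, True]
```

If the goal is to measure how well the model generalises, a held-out split such as the `sklearn.model_selection.train_test_split` suggestion from the comments is probably the safer baseline.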

【Discussion】:
