【Question title】: ValueError: could not convert string to float: sklearn
【Posted】: 2020-05-25 02:38:48
【Question】:

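I recently hit an unexpected error while working with a dataset in Python: ValueError: could not convert string to float. The dataset does contain text columns, which I converted to integers with LabelEncoder. But when I reach the training step and fit the model, I get this error, which makes no sense to me.

Code: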

import sklearn
from sklearn import model_selection
from sklearn import linear_model
from sklearn import preprocessing
import pandas as pd
import pickle
import numpy as np
data = pd.read_csv("house_train.csv")
data = data.fillna(value=0)
dataX_train = data.drop(["SalePrice"], axis = 1)
dataX_test = data.SalePrice


le = preprocessing.LabelEncoder()

dataX_train.MSZoning = le.fit_transform(list(data["MSZoning"]))
dataX_train.Street = le.fit_transform(list(data["Street"]))
dataX_train.Alley = le.fit_transform(list(data["Alley"]))
dataX_train.LotShape = le.fit_transform(list(data["LotShape"]))
dataX_train.LandContour = le.fit_transform(list(data["LandContour"]))
dataX_train.Utilities = le.fit_transform(list(data["Utilities"]))
dataX_train.LotConfig = le.fit_transform(list(data["LotConfig"]))
dataX_train.LandSlope = le.fit_transform(list(data["LandSlope"]))
dataX_train.Neighborhood = le.fit_transform(list(data["Neighborhood"]))
dataX_train.Condition1 = le.fit_transform(list(data["Condition1"]))
dataX_train.Condition2 = le.fit_transform(list(data["Condition2"]))
dataX_train.BldgType = le.fit_transform(list(data["BldgType"]))
dataX_train.HouseStyle = le.fit_transform(list(data["HouseStyle"]))
dataX_train.RoofStyle = le.fit_transform(list(data["RoofStyle"]))
dataX_train.RoofMatl = le.fit_transform(list(data["RoofMatl"]))
dataX_train.Exterior1st = le.fit_transform(list(data["Exterior1st"]))
dataX_train.Exterior2nd = le.fit_transform(list(data["Exterior2nd"]))
dataX_train.MasVnrType = le.fit_transform(list(data["MasVnrType"]))
dataX_train.ExterQual = le.fit_transform(list(data["ExterQual"]))
dataX_train.ExterCond = le.fit_transform(list(data["ExterCond"]))
dataX_train.Foundation = le.fit_transform(list(data["Foundation"]))
dataX_train.BsmtQual = le.fit_transform(list(data["BsmtQual"]))
dataX_train.BsmtExposure = le.fit_transform(list(data["BsmtExposure"]))
dataX_train.BsmtFinType1 = le.fit_transform(list(data["BsmtFinType1"]))
dataX_train.BsmtFinType2 = le.fit_transform(list(data["BsmtFinType2"]))
dataX_train.Heating = le.fit_transform(list(data["Heating"]))
dataX_train.HeatingQC = le.fit_transform(list(data["HeatingQC"]))
dataX_train.CentralAir = le.fit_transform(list(data["CentralAir"]))
dataX_train.Electrical = le.fit_transform(list(data["Electrical"]))
dataX_train.KitchenQual = le.fit_transform(list(data["KitchenQual"]))
dataX_train.Functional = le.fit_transform(list(data["Functional"]))
dataX_train.FireplaceQu = le.fit_transform(list(data["FireplaceQu"]))
dataX_train.GarageType = le.fit_transform(list(data["GarageType"]))
dataX_train.GarageFinish = le.fit_transform(list(data["GarageFinish"]))
dataX_train.GarageQual = le.fit_transform(list(data["GarageQual"]))
dataX_train.GarageCond = le.fit_transform(list(data["GarageCond"]))
dataX_train.PavedDrive = le.fit_transform(list(data["PavedDrive"]))
dataX_train.PoolQC = le.fit_transform(list(data["PoolQC"]))
dataX_train.Fence = le.fit_transform(list(data["Fence"]))
dataX_train.MiscFeature = le.fit_transform(list(data["MiscFeature"]))
dataX_train.SaleType = le.fit_transform(list(data["SaleType"]))
dataX_train.SaleCondition = le.fit_transform(list(data["SaleCondition"]))


best = 0

x_train, x_test, y_train, y_test = model_selection.train_test_split(dataX_train, dataX_test, 
test_size = 0.2)
clf = linear_model.LinearRegression()
clf.fit(x_train, y_train)
acc = clf.score(x_test, y_test)
if acc > best:
   best = acc
   with open("housingmodel.pickle", "wb") as f:
      pickle.dump(clf , f)
print(acc)

【Comments】:

  • On which line does the error occur?
  • The error occurs when I train the model with the model.fit method.
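
The comment above places the failure at fit. A minimal sketch of that behavior follows, assuming a toy DataFrame whose values and two column names are made up for illustration. The estimator tries to convert its input to floats when fitting, so a single text column left unencoded is enough to raise the error.

```python
import pandas as pd
from sklearn import linear_model

# Hypothetical toy data: one numeric column and one text column left unencoded.
X = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "MSZoning": ["RL", "RM", "RL"],
})
y = pd.Series([208500, 181500, 223500])

try:
    linear_model.LinearRegression().fit(X, y)
    error_message = None
except ValueError as exc:
    # The text column cannot be converted to float, so fit fails here.
    error_message = str(exc)

print(error_message)
```

The printed message names the first string that could not be converted, which is a useful hint for finding the offending column.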

Tags: python pandas scikit-learn


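【Answer 1】:

First check whether you have encoded every feature in dataX_train; I think you missed something there.

Try dataX_train.dtypes and look for any non-numeric (object) columns, then use to_numeric on them. Note that to_numeric only helps when a column holds numbers stored as strings; a column of text categories still needs encoding. For example: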

dataX_train['NonNumericCol'] = pd.to_numeric(dataX_train['NonNumericCol'])
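
As a sketch of the check described above, the snippet below lists the columns that are still non-numeric and label-encodes all of them in a loop instead of one hand-written line per column. The helper name encode_object_columns and the toy frame are hypothetical. If house_train.csv is the Kaggle House Prices training file (an assumption), it contains a BsmtCond column, which the question's list never encodes and which would be a likely culprit.

```python
import pandas as pd
from sklearn import preprocessing


def encode_object_columns(frame):
    """Return a copy of frame with every non-numeric column label-encoded."""
    encoded = frame.copy()
    non_numeric_cols = encoded.select_dtypes(exclude="number").columns
    for col in non_numeric_cols:
        # astype(str) makes mixed values (text plus the 0 left by fillna) uniform.
        values = encoded[col].astype(str)
        encoded[col] = preprocessing.LabelEncoder().fit_transform(values)
    return encoded


# Hypothetical toy frame standing in for dataX_train.
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "BsmtCond": ["TA", 0, "Gd"],  # mixed values, as fillna(value=0) would leave
    "Street": ["Pave", "Pave", "Grvl"],
})

remaining_before = list(toy.select_dtypes(exclude="number").columns)
encoded = encode_object_columns(toy)
remaining_after = list(encoded.select_dtypes(exclude="number").columns)

print(remaining_before)  # columns that would break fit
print(remaining_after)   # empty once everything is encoded
```

A fresh LabelEncoder per column is used here so that each fitted mapping could be kept if it is needed later. LabelEncoder assigns arbitrary integer codes, so for a linear model one-hot encoding (for example pd.get_dummies) may be a better fit; that choice is outside what the answer proposes.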
