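[Posted]: 2018-06-26 13:43:57
[Question]:
I think the problem lies with my variable 'info.venue'. It holds string values, which I encoded using LabelEncoder and OneHotEncoder. When I try to fit a decision tree, I get an error. With only the other two variables it works like a charm, but as soon as I include the one-hot encoded 'info.venue', I get the following error.
The error is "ValueError: setting an array element with a sequence".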
info.toss.decision  info.toss.winner  info.venue
field               Australia         Shere Bangla National Stadium
field               Australia         Adelaide Oval
field               Australia         Melbourne Cricket Ground
bat                 Australia         Brabourne Stadium
bat                 Australia         Melbourne Cricket Ground
bat                 Australia         Sydney Cricket Ground
bat                 Australia         Punjab Cricket Association
field               India             Kensington Oval, Bridgetown
field               India             Stadium Australia
field               India             Saurashtra Cricket Association Stadium
bat                 India             Kingsmead
bat                 India             Melbourne Cricket Ground
bat                 India             R Premadasa Stadium
The code is as follows:
Encode the data using LabelEncoder and OneHotEncoder
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder = LabelEncoder()
onehotencoder = OneHotEncoder()
df['info.toss.decision'] = labelencoder.fit_transform(df['info.toss.decision'])
df['info.toss.winner'] = labelencoder.fit_transform(df['info.toss.winner'])
df['info.outcome.winner'] = labelencoder.fit_transform(df['info.outcome.winner'])
df['info.venue'] = labelencoder.fit_transform(df['info.venue'])
df['info.venue'] = onehotencoder.fit_transform(df[['info.venue']])
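To see why the last assignment might be a problem, it may help to look at what OneHotEncoder returns. The sketch below uses a small hypothetical venue list (not the real dataset) and checks the shape of the encoded result. It has one column per distinct venue, which does not map onto a single DataFrame column.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical sample of venue names, for illustration only
venues = pd.Series(['Kingsmead', 'Adelaide Oval',
                    'Melbourne Cricket Ground', 'Kingsmead'])

# Integer codes first, mirroring the two-step encoding in the question
codes = LabelEncoder().fit_transform(venues)

# OneHotEncoder expects a 2D input and returns a 2D (sparse) matrix
encoded = OneHotEncoder().fit_transform(codes.reshape(-1, 1))

# 4 samples, 3 distinct venues
print(encoded.shape)  # (4, 3)
```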
Select specific columns from the data frame
X = df[['info.venue','info.toss.decision','info.toss.winner']]
Y = df[['info.outcome.winner']]
Split the dataset into a training set and a test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.25)
Fit the decision tree classifier to the training set
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion = 'gini', random_state = 0)
classifier.fit(X_train, y_train)
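For comparison, here is a minimal sketch of one way the venue column could be expanded into separate indicator columns before fitting. It swaps in pandas `get_dummies` instead of the LabelEncoder/OneHotEncoder pair used above, and it runs on a small hypothetical DataFrame; the rows, and in particular the 'info.outcome.winner' values, are invented for illustration and are not the real data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical rows shaped like the question's data; outcome values are made up
df = pd.DataFrame({
    'info.toss.decision': ['field', 'field', 'bat', 'bat',
                           'field', 'field', 'bat', 'bat'],
    'info.toss.winner': ['Australia'] * 4 + ['India'] * 4,
    'info.venue': ['Adelaide Oval', 'Melbourne Cricket Ground',
                   'Brabourne Stadium', 'Melbourne Cricket Ground',
                   'Kingsmead', 'Stadium Australia',
                   'Melbourne Cricket Ground', 'R Premadasa Stadium'],
    'info.outcome.winner': ['Australia', 'India', 'Australia', 'India',
                            'India', 'Australia', 'India', 'Australia'],
})

# Expand each categorical feature into one indicator column per category
X = pd.get_dummies(df[['info.venue', 'info.toss.decision', 'info.toss.winner']])
y = df['info.outcome.winner']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

classifier = DecisionTreeClassifier(criterion='gini', random_state=0)
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)
```

With this layout every cell of X holds a single value, so each venue gets its own column instead of the whole encoded matrix being placed in one column.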
The 'info.venue' column looks like this:
info.venue
Kingsmead
Melbourne Cricket Ground
Brabourne Stadium
Kensington Oval, Bridgetown
Stadium Australia
Melbourne Cricket Ground
R Premadasa Stadium
Saurashtra Cricket Association Stadium
Shere Bangla National Stadium
Adelaide Oval
Melbourne Cricket Ground
Sydney Cricket Ground
Punjab Cricket Association IS Bindra Stadium, Mohali
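[Comments]:
-
Could you please post the actual input and output of your program?
-
Please check the update.
-
Please focus on the variable 'info.venue', because I think that is where I am going wrong.
-
@Dark Could you write the code and tell me where I should make the changes?
Tags: python pandas scikit-learn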