【Question title】: TypeError from SciKit-Learn's LabelEncoder
【Posted】: 2019-06-28 07:30:29
【Question】:

Here is my code:

import pandas as pd

# Importing the dataset
dataset = pd.read_csv('insurance.csv')
X = dataset.iloc[:, :-2].values
X = pd.DataFrame(X)

#Encoding Categorical data
from sklearn.preprocessing import LabelEncoder
labelencoder_X = LabelEncoder()
X[:, 1:2] = labelencoder_X.fit_transform(X[:, 1:2])

Sample dataset:

age  sex     bmi     children  smoker  region     charges
19   female  27.9    0         yes     southwest  16884.924
18   male    33.77   1         no      southeast  1725.5523
28   male    33      3         no      southeast  4449.462
33   male    22.705  0         no      northwest  21984.47061
32   male    28.88   0         no      northwest  3866.8552
31   female  25.74   0         no      southeast  3756.6216
46   female  33.44   1         no      southeast  8240.5896
37   female  27.74   3         no      northwest  7281.5056
37   male    29.83   2         no      northeast  6406.4107
60   female  25.84   0         no      northwest  28923.13692

I get the following error when running the LabelEncoder:

File "E:\Anaconda2\lib\site-packages\pandas\core\generic.py", line 1840, in _get_item_cache
    res = cache.get(item)
TypeError: unhashable type

What could be causing this error?

【Comments】:

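  • Try using X.loc[:, 1:2] or X.iloc[:, 1:2] instead of X[:, 1:2]...
  • @MaxU, I tried that, still the same error
  • Please provide a small sample dataset in your question; it will help reproduce this error
  • @MaxU I added sample data above. I'm using Spyder and Python 2.7. Thanks for your help
  • Please read how to make good reproducible pandas examples and edit your post accordingly. We are not going to type this dataset out from a picture in order to help you ;)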
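Following up on the positional-indexing suggestions above, here is a minimal sketch of a working version of the question's code. It assumes the goal is to encode the `sex` column (position 1) and inlines four rows of the sample data instead of reading `insurance.csv`. Two things change: `X` stays the NumPy object array returned by `.values` (no `pd.DataFrame(X)` wrapping), so `X[:, ...]` is NumPy positional indexing rather than a DataFrame column lookup; and the integer index `1` replaces the slice `1:2`, so `LabelEncoder` receives a 1-D array and its 1-D result fits back into that column.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Four rows of the sample data, inlined instead of reading insurance.csv
dataset = pd.DataFrame({
    "age": [19, 18, 28, 33],
    "sex": ["female", "male", "male", "male"],
    "bmi": [27.9, 33.77, 33.0, 22.705],
    "children": [0, 1, 3, 0],
    "smoker": ["yes", "no", "no", "no"],
    "region": ["southwest", "southeast", "southeast", "northwest"],
    "charges": [16884.924, 1725.5523, 4449.462, 21984.47061],
})

# Keep X as a NumPy object array (no pd.DataFrame(X) wrapping),
# so X[rows, cols] is plain NumPy positional indexing
X = dataset.iloc[:, :-2].values

labelencoder_X = LabelEncoder()
# The integer index 1 yields a 1-D array of shape (n_samples,);
# the slice 1:2 would yield shape (n_samples, 1) instead
X[:, 1] = labelencoder_X.fit_transform(X[:, 1])

print(X[:, 1])
print(labelencoder_X.classes_)
```

If `X` has to stay a DataFrame, the positional equivalent should be `X.iloc[:, 1] = labelencoder_X.fit_transform(X.iloc[:, 1])`.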

Tags: python pandas scikit-learn


【Answer 1】:

Here is a small demo:

In [36]: from sklearn.preprocessing import LabelEncoder

In [37]: le = LabelEncoder()

In [38]: X = df.apply(lambda c: c if np.issubdtype(df.dtypes.loc[c.name], np.number) 
                                  else le.fit_transform(c))

In [39]: X
Out[39]:
   age  sex     bmi  children  smoker  region      charges
0   19    0  27.900         0       1       3  16884.92400
1   18    1  33.770         1       0       2   1725.55230
2   28    1  33.000         3       0       2   4449.46200
3   33    1  22.705         0       0       1  21984.47061
4   32    1  28.880         0       0       1   3866.85520
5   31    0  25.740         0       0       2   3756.62160
6   46    0  33.440         1       0       2   8240.58960
7   37    0  27.740         3       0       1   7281.50560
8   37    1  29.830         2       0       0   6406.41070
9   60    0  25.840         0       0       1  28923.13692

Source DF:

In [35]: df
Out[35]:
   age     sex     bmi  children smoker     region      charges
0   19  female  27.900         0    yes  southwest  16884.92400
1   18    male  33.770         1     no  southeast   1725.55230
2   28    male  33.000         3     no  southeast   4449.46200
3   33    male  22.705         0     no  northwest  21984.47061
4   32    male  28.880         0     no  northwest   3866.85520
5   31  female  25.740         0     no  southeast   3756.62160
6   46  female  33.440         1     no  southeast   8240.58960
7   37  female  27.740         3     no  northwest   7281.50560
8   37    male  29.830         2     no  northeast   6406.41070
9   60  female  25.840         0     no  northwest  28923.13692
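For running this outside IPython, a self-contained sketch of the same `DataFrame.apply` idea on three rows of the sample data. One substitution is an assumption of this sketch, not part of the answer: `pd.api.types.is_numeric_dtype` replaces `np.issubdtype(df.dtypes.loc[c.name], np.number)`, since it also accepts pandas extension dtypes. The encoded values are wrapped in a Series so each column keeps the original row index.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Three rows of the sample data shown above
df = pd.DataFrame({
    "age": [19, 18, 28],
    "sex": ["female", "male", "male"],
    "bmi": [27.9, 33.77, 33.0],
    "smoker": ["yes", "no", "no"],
    "region": ["southwest", "southeast", "southeast"],
})

le = LabelEncoder()

def encode_if_categorical(col):
    # Numeric columns pass through unchanged
    if pd.api.types.is_numeric_dtype(col):
        return col
    # Non-numeric columns are label-encoded (classes sorted alphabetically)
    return pd.Series(le.fit_transform(col), index=col.index)

# DataFrame.apply calls the function once per column (axis=0 by default)
X = df.apply(encode_if_categorical)
print(X)
```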

【Comments】:

【Answer 2】:

Your problem is that you are trying to label-encode a slice.

Steps to reproduce the error:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"score":[0,1],"gender":["male","female"]})
enc = LabelEncoder()
enc.fit_transform(df[:,1:2])
...
TypeError: unhashable type: 'slice'
    

Instead, access your column properly, so that LabelEncoder receives an array-like of shape (n_samples,): a NumPy array, a list, or a pandas Series (see the docs).

Proof:

enc.fit_transform(df["gender"])
array([1, 0])
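To make the shape requirement concrete, a small sketch reusing the toy `df` from the reproduction: it feeds the three 1-D input types listed above to `LabelEncoder` (all give the same codes, because classes are sorted, so `female` is 0 and `male` is 1) and contrasts the 1-D `df["gender"]` with the 2-D `df[["gender"]]` selected with double brackets.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"score": [0, 1], "gender": ["male", "female"]})
enc = LabelEncoder()

# Three 1-D inputs of shape (n_samples,): a Series, a list, a NumPy array
from_series = enc.fit_transform(df["gender"])
from_list = enc.fit_transform(["male", "female"])
from_array = enc.fit_transform(np.array(["male", "female"]))
print(from_series, from_list, from_array)

# Single brackets select a 1-D Series; double brackets select a 2-D DataFrame
print(df["gender"].shape, df[["gender"]].shape)
```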
    

Finally, if you want to modify your df in place, you can use the following lines:

for col in df.select_dtypes(include="object").columns:
    df[col] = enc.fit_transform(df[col])
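The loop above reuses a single `enc`, so once it finishes, `enc.classes_` only describes the last column encoded. A sketch of a variant that keeps one fitted encoder per column, so each column can later be decoded with `inverse_transform`. The `encoders` dict is a hypothetical name, and selecting columns with `pd.api.types.is_numeric_dtype` instead of `select_dtypes(include="object")` is an assumption of this sketch.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    "score": [0, 1, 2],
    "gender": ["male", "female", "male"],
    "smoker": ["no", "yes", "no"],
})

# Hypothetical dict: one fitted LabelEncoder per non-numeric column
encoders = {}
for col in [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

print(df)
# Map the integer codes in one column back to the original labels
print(encoders["gender"].inverse_transform(df["gender"]))
```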
    

【Comments】:

  • Thanks @Sergey, converting the dataframe to a numpy array worked for me. Many thanks