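[Posted]: 2019-03-02 19:58:17
[Question]:
I am trying to clean my dataset using Python, with the Imputer from scikit-learn. My dataset is a CSV file containing a large number of "NULL" values. After importing the data with from_csv from the pandas library and converting the data frame to a matrix, my data looks like this: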
[1 '2013-04-04 08:32:15' 12 187 nan nan 219 10404 4 4.0 1 2.2 0.0149 5.03
26 170.74 0 23246 1 0 4 0 1 1 nan nan 1 nan nan nan nan nan nan 0.0 0.0
nan nan nan nan 0.0 1.0 nan nan nan nan nan nan nan 0.0 0.0 nan 0 nan 0]
But now, when I try to use the imputer, it gives me the following error:
Traceback (most recent call last):
  File "myRandomForesy.py", line 27, in <module>
    temp[i] = imp.transform(temp[i])
  File "/Users/Sherlock/anaconda/lib/python2.7/site-packages/sklearn/preprocessing/imputation.py", line 331, in transform
    self.axis)
  File "/Users/Sherlock/anaconda/lib/python2.7/site-packages/sklearn/preprocessing/imputation.py", line 252, in _dense_fit
    mask = _get_mask(X, missing_values)
  File "/Users/Sherlock/anaconda/lib/python2.7/site-packages/sklearn/preprocessing/imputation.py", line 30, in _get_mask
    if value_to_mask == "NaN" or np.isnan(value_to_mask):
TypeError: Not implemented for this type
Here is a snippet of my code:
imp = Imputer(missing_values="nan",strategy='mean',axis=1)
while i<len(temp):
    imp=imp.fit(temp[i])
    temp[i] = imp.transform(temp[i])
    test_temp[i] = imp.transform(test_temp[i])
    i+=1
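For reference, the failing line in the traceback can be reproduced in isolation. The sketch below mimics the check that the traceback shows inside `_get_mask`; the helper name `is_missing_marker` is hypothetical and used only for illustration, and the exact wording of the TypeError depends on the NumPy version installed.

```python
import numpy as np


def is_missing_marker(value_to_mask):
    """Mimic the check from the traceback (hypothetical helper name)."""
    return value_to_mask == "NaN" or np.isnan(value_to_mask)


# "NaN" matches the string comparison, so np.isnan is never called.
print(is_missing_marker("NaN"))

# np.nan fails the string comparison but is a float, so np.isnan accepts it.
print(bool(is_missing_marker(np.nan)))

# "nan" (lowercase) fails the string comparison and is then passed to
# np.isnan, which is not defined for strings and raises TypeError.
error_type = None
try:
    is_missing_marker("nan")
except TypeError as exc:
    error_type = type(exc).__name__
    print("TypeError:", exc)
```

The comparison against "NaN" is case-sensitive, so the lowercase string "nan" passed as missing_values falls through to np.isnan and triggers the error.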
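[Comments]:
-
Replace nan with NaN.
-
I am trying to understand your data. Is your example a single row? Are you trying to iterate over each element? Imputer operates on column or row vectors of numeric type.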
-
That worked. Thanks, Barmaley.exe.
-
@Barmaley.exe - would you like to turn your comment into an answer? - meta.stackoverflow.com/questions/251597/…
Tags: python machine-learning scikit-learn nan
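The fix from the comments is to pass missing_values="NaN" (or np.nan) instead of "nan". On recent scikit-learn releases the Imputer class is no longer available; a minimal sketch using its replacement, sklearn.impute.SimpleImputer, might look like the following. The arrays here are made-up stand-ins for the real data (an assumption, not the asker's dataset), and SimpleImputer imputes column-wise only, so it is not a drop-in equivalent of axis=1.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up numeric data standing in for the real dataset (assumption).
# Non-numeric columns such as the timestamp would have to be dropped
# or encoded first, because the "mean" strategy needs numeric input.
train = np.array([
    [1.0, np.nan, 219.0],
    [3.0, 4.0, np.nan],
    [5.0, 8.0, 221.0],
])
test = np.array([[np.nan, 6.0, np.nan]])

# missing_values is the float np.nan, not the string "nan".
imp = SimpleImputer(missing_values=np.nan, strategy="mean")

# Learn the column means on the training data, then reuse them for the
# test data so both sets are filled with the same values.
train_filled = imp.fit_transform(train)
test_filled = imp.transform(test)

print(train_filled)
print(test_filled)
```

Fitting once on the whole training matrix, rather than refitting inside a loop over rows, keeps the fill values for the test set consistent with those learned from the training set.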