Isolation Forest - TypeError: invalid type promotion

【Question title】: Isolation Forest - TypeError: invalid type promotion
【Posted】: 2021-04-11 11:45:07
【Question】:

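I am trying to apply Isolation Forest to data converted from an event log, but I get "TypeError: invalid type promotion". Is it because of the datetime columns? I don't understand what I'm doing wrong!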

Part of my table (after processing):

+--------------+----------------------+--------------+--------------------+--------------------+-------------------+-----------------+
| org:resource | lifecycle:transition | concept:name |   time:timestamp   |   case:REG_DATE    | case:concept:name | case:AMOUNT_REQ |
+--------------+----------------------+--------------+--------------------+--------------------+-------------------+-----------------+
|           52 |                    0 |            9 | 2011 10-01 38:44.5 | 2011 10-01 38:44.5 |                 0 |           20000 |
|           52 |                    0 |            6 | 2011 10-01 38:44.9 | 2011 10-01 38:44.5 |                 2 |           20000 |
|           52 |                    0 |            7 | 2011 10-01 39:37.9 | 2011 10-01 38:44.5 |                 0 |           20000 |
|           52 |                    1 |           19 | 2011 10-01 39:38.9 | 2011 10-01 38:44.5 |                 1 |           20000 |
|           68 |                    2 |           19 | 2011 10-01 36:46.4 | 2011 10-01 38:44.5 |                 3 |           20000 |
+--------------+----------------------+--------------+--------------------+--------------------+-------------------+-----------------+

Printing the DataFrame info gives:

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 262200 entries, 0 to 262199
Data columns (total 7 columns):
 #   Column                Non-Null Count   Dtype         
---  ------                --------------   -----         
 0   org:resource          262200 non-null  int64         
 1   lifecycle:transition  262200 non-null  int64         
 2   concept:name          262200 non-null  int64         
 3   time:timestamp        262200 non-null  datetime64[ns]
 4   case:REG_DATE         262200 non-null  datetime64[ns]
 5   case:concept:name     262200 non-null  int64         
 6   case:AMOUNT_REQ       262200 non-null  int32         
dtypes: datetime64[ns](2), int32(1), int64(4)
memory usage: 13.0 MB

My code is:

from sklearn.ensemble import IsolationForest

contamination = 0.05

model = IsolationForest(contamination=contamination, n_estimators=10000)
model.fit(df)

df["iforest"] = pd.Series(model.predict(df))
df["iforest"] = df["iforest"].map({1: 0, -1: 1})
df["score"] = model.decision_function(df)
df.sort_values("score")

But I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-5edb86351ac8> in <module>
      4 
      5 model = IsolationForest(contamination=contamination, n_estimators=10000)
----> 6 model.fit(df)
      7 
      8 df["iforest"] = pd.Series(model.predict(df))

~\.conda\envs\process_mining\lib\site-packages\sklearn\ensemble\_iforest.py in fit(self, X, y, sample_weight)
    261                 )
    262 
--> 263         X = check_array(X, accept_sparse=['csc'])
    264         if issparse(X):
    265             # Pre-sort indices to avoid that each individual tree of the

~\.conda\envs\process_mining\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74 

~\.conda\envs\process_mining\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    531 
    532         if all(isinstance(dtype, np.dtype) for dtype in dtypes_orig):
--> 533             dtype_orig = np.result_type(*dtypes_orig)
    534 
    535     if dtype_numeric:

<__array_function__ internals> in result_type(*args, **kwargs)

TypeError: invalid type promotion

【Question comments】:

Tags: python python-3.x machine-learning unsupervised-learning

【Solution 1】:

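I found the solution with the help of this answer: Python - linear regression TypeError: invalid type promotion

The traceback shows the failure at np.result_type(*dtypes_orig): NumPy cannot find a common dtype for the int64/int32 columns and the datetime64[ns] columns. Converting the timestamps to numbers makes the frame purely numeric, and fit() then works. I converted them to ordinals as follows: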

import datetime as dt

df['time:timestamp'] = df['time:timestamp'].map(dt.datetime.toordinal)
df['case:REG_DATE'] = df['case:REG_DATE'].map(dt.datetime.toordinal)
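
One caveat: toordinal() returns only the day number, so events on the same date become identical values and the time of day is lost. If the time of day matters for anomaly detection, a possible alternative is to convert each datetime column to seconds since the Unix epoch. The following is a minimal sketch under assumptions: the toy DataFrame and its values are hypothetical stand-ins for the real event log, and n_estimators is reduced to keep the example fast.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical toy frame mirroring two columns from the question;
# the values are made up for illustration only.
df = pd.DataFrame({
    "case:AMOUNT_REQ": [20000, 20000, 20000, 15000, 500000],
    "time:timestamp": pd.to_datetime([
        "2011-10-01 00:38:44.5",
        "2011-10-01 00:38:44.9",
        "2011-10-01 00:39:37.9",
        "2011-10-01 00:39:38.9",
        "2011-10-01 11:36:46.4",
    ]),
})

# Reproduce the cause: integer and datetime64 dtypes have no common dtype,
# which is the call that fails inside sklearn's check_array.
try:
    np.result_type(*df.dtypes)
    promotion_failed = False
except TypeError:
    promotion_failed = True

# Convert every datetime column to float seconds since the Unix epoch,
# which keeps the time of day (unlike toordinal).
epoch = pd.Timestamp("1970-01-01")
X = df.copy()
for col in X.select_dtypes(include="datetime64").columns:
    X[col] = (X[col] - epoch) / pd.Timedelta(seconds=1)

# With only numeric columns left, fitting no longer raises.
model = IsolationForest(contamination=0.05, n_estimators=100, random_state=0)
model.fit(X)
labels = model.predict(X)  # 1 = inlier, -1 = outlier
```

Whether ordinals or epoch seconds are the better feature depends on the data; derived features such as the elapsed time between time:timestamp and case:REG_DATE may be more informative than raw timestamps.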

【Comments】: