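It looks like your PMMLPipeline is indented incorrectly, and you most likely don't need DataFrameMapper at all, because (according to the help page) it is:
"DataFrameMapper, a class for mapping pandas data frame columns to different sklearn transformations"
You are not applying different transformations to different columns, so it isn't needed here.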
Set up an example dataset, for instance:
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn.decomposition import PCA
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split  # needed for the split below

features = 'ABCDEFGHIJKLMNO'
# 50 rows of uniform noise, one column per letter
X = pd.DataFrame(np.random.uniform(0, 1, (50, 15)),
                 columns=list(features))
# Random binary labels, independent of the features
y = np.random.binomial(1, 0.5, 50)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
With the indentation corrected, the code runs fine:
for i in range(len(features)):
    # Build a fresh pipeline inside the loop so each fit starts from scratch
    pipeline = PMMLPipeline([
        ('pca', PCA(n_components=3)),
        ('classifier', DecisionTreeClassifier())
    ])
    # Drop one feature at a time from both the training and the test set
    pipeline.fit(X_train.drop([features[i]], axis=1), y_train)
    result = pipeline.predict(X_test.drop([features[i]], axis=1))
    actual = y_test
    print("Dropped feature: {}, Accuracy: {}".format(features[i],
          accuracy_score(actual, result)))
Sample output (no random seed is set, so the exact accuracies vary from run to run):
Dropped feature: A, Accuracy: 0.9333333333333333
Dropped feature: B, Accuracy: 0.6
Dropped feature: C, Accuracy: 0.7333333333333333
Dropped feature: D, Accuracy: 0.6
Dropped feature: E, Accuracy: 0.6666666666666666
Dropped feature: F, Accuracy: 0.6666666666666666
Dropped feature: G, Accuracy: 0.6
Dropped feature: H, Accuracy: 0.8
Dropped feature: I, Accuracy: 0.6666666666666666
Dropped feature: J, Accuracy: 0.6666666666666666
Dropped feature: K, Accuracy: 0.7333333333333333
Dropped feature: L, Accuracy: 0.8
Dropped feature: M, Accuracy: 0.6
Dropped feature: N, Accuracy: 0.8
Dropped feature: O, Accuracy: 0.6666666666666666
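
If you want the numbers to be repeatable, the same drop-one-feature loop can be wrapped in a function and seeded. The following is only a sketch under a few assumptions: it uses scikit-learn's plain Pipeline as a stand-in for PMMLPipeline (which subclasses it) so that it runs without sklearn2pmml installed, and the helper name leave_one_feature_out and its seed parameter are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline  # stand-in for PMMLPipeline (assumption)
from sklearn.tree import DecisionTreeClassifier


def leave_one_feature_out(X, y, seed=0):
    """Hypothetical helper: return {dropped feature: test accuracy}."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    scores = {}
    for feature in X.columns:
        # A new pipeline per iteration, created inside the loop body
        pipeline = Pipeline([
            ('pca', PCA(n_components=3)),
            ('classifier', DecisionTreeClassifier(random_state=seed)),
        ])
        pipeline.fit(X_train.drop(columns=[feature]), y_train)
        predicted = pipeline.predict(X_test.drop(columns=[feature]))
        scores[feature] = accuracy_score(y_test, predicted)
    return scores


# Seeded toy data with the same shape as the example above
rng = np.random.default_rng(0)
features = list('ABCDEFGHIJKLMNO')
X = pd.DataFrame(rng.uniform(0, 1, (50, 15)), columns=features)
y = rng.binomial(1, 0.5, 50)

scores = leave_one_feature_out(X, y)
for feature, accuracy in scores.items():
    print("Dropped feature: {}, Accuracy: {}".format(feature, accuracy))
```

Keep in mind that the labels in this toy dataset are random and unrelated to the features, so the accuracies hover around chance and the differences between dropped features are noise, not a signal of feature importance.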