【Question title】: Code to run recursive feature elimination (RFE) over multiple iterations, deleting the lowest-ranked feature each time
【Posted】: 2019-02-09 20:21:01
【Question description】:

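OK, so I want to run recursive feature elimination (RFE) on my dataset with X features, remove the lowest-ranked feature in each iteration, and then re-run RFE until I am left with only 5 features. However, I don't know how to code this.

The first RFE run works fine, but I don't want to sit and re-run RFE by hand, removing one feature at a time; that would take far too long. Can someone help me code this?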

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Sample data. make_friedman1 yields a continuous (regression) target, which
# RandomForestClassifier rejects, so a classification dataset is used instead.
X, y = make_classification(n_samples=2000, n_features=85, random_state=42)
# Wrap in a DataFrame so the features have names (needed for X_test.columns below)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])

# split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# max_features='auto' meant 'sqrt' for classifiers and was removed in newer scikit-learn
estimator = RandomForestClassifier(n_estimators=500, min_samples_leaf=5,
                                   min_samples_split=8, max_features='sqrt',
                                   max_depth=90, bootstrap=True)
selector = RFE(estimator, n_features_to_select=83, step=1)
selector = selector.fit(X_train, y_train)

# get rankings and evaluate on the held-out test set
# (the selector is fitted on the training data only, never on the test data)
ranking = selector.ranking_
predictions = selector.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print("Test Accuracy: %.2f%%" % (accuracy * 100.0))

# index rankings by feature name
header = X_test.columns
frame = pd.DataFrame(ranking, index=header)
frame = frame.rename(columns={frame.columns[0]: 'rankings'})
frame = frame.sort_values(by='rankings', ascending=True)

# save table
from pandas.plotting import table
ax = plt.subplot(111, frame_on=False)  # no visible frame
ax.xaxis.set_visible(False)  # hide the x axis
ax.yaxis.set_visible(False)  # hide the y axis

table(ax, frame)  # frame is the rankings DataFrame built above

【Comments on the question】:

    Tags: python machine-learning classification rfe


    【Solution 1】:
    features = [1, 28, 9, 17, 0, 16, 9]
    
    def recursive(features, max_features=5, max_iterations=100):
        # Stop once few enough features remain or the iteration budget runs out.
        feature_length = len(features)
        short_enough = (feature_length <= max_features)
        tried_too_many_times = (max_iterations < 1)
        if short_enough or tried_too_many_times:
            # The final state is handed back to the caller through the exception.
            raise GeneratorExit(
                {
                    'features': features,
                    'iterations': max_iterations,
                }
            )
        # Sort descending so the lowest value sits at the end, then drop it.
        _features = sorted(features, reverse=True)
        lowest = _features.pop()
        print('Removing', lowest)
        max_iterations -= 1
        yield from recursive(_features, max_features=max_features, max_iterations=max_iterations)
    
    
    s = recursive(features)
    
    try:
        list(s)
    except GeneratorExit as e:
        context = e.args[0]
    
    # Removing 0
    # Removing 1
    
    context
    # {'features': [28, 17, 16, 9, 9], 'iterations': 98}
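
The toy example above works on a plain list of numbers. As a rough sketch of how the same "drop the lowest, repeat" idea might carry over to a model, the loop below refits a random forest on the remaining columns and removes the one with the smallest `feature_importances_` until `n_keep` are left. The function name `eliminate_lowest_ranked`, its parameters, and the small synthetic dataset are all assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def eliminate_lowest_ranked(X, y, n_keep=5, make_estimator=None):
    """Refit the model repeatedly, dropping the least important feature each round."""
    if make_estimator is None:
        # Hypothetical default: a small forest so the sketch runs quickly.
        make_estimator = lambda: RandomForestClassifier(n_estimators=20, random_state=0)
    remaining = list(range(X.shape[1]))  # column indices still in play
    removed = []                         # column indices in the order they were dropped
    while len(remaining) > n_keep:
        model = make_estimator().fit(X[:, remaining], y)
        # Position (within `remaining`) of the feature with the lowest importance.
        worst = int(np.argmin(model.feature_importances_))
        removed.append(remaining.pop(worst))
    return remaining, removed


# Small synthetic dataset, for demonstration only.
X_demo, y_demo = make_classification(n_samples=200, n_features=12,
                                     n_informative=4, random_state=0)
kept, dropped = eliminate_lowest_ranked(X_demo, y_demo, n_keep=5)
print("kept:", kept)
print("dropped (first to last):", dropped)
```

A plain `while` loop is used here instead of a recursive generator because the stopping condition is simple and the final state can be returned directly.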
    

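    【Comments】:

    • OK, first of all, thank you. I don't know yet whether this works; I'll have to sit down tomorrow, try to get it running, and understand it. Second, how do I learn to come up with this stuff myself? I feel these stupid online Python courses aren't really teaching me this kind of thing...
    • Could you explain and expand on your answer? I don't understand how to implement this in my code.
    • You'll learn to come up with better stuff than this once you start solving real-world problems. Online Python courses are stupid for not teaching you this kind of thing. When you work on real problems for which you often can't find a solution online, you come up with your own, and then you use it as the basis for other problems. For example, I had to write code for a system of hundreds of equations entered by users as text. To do that, I had to walk through the equations recursively and update the system. The code I shared was inspired by that. It's ugly, but it works.
    【Solution 2】: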

    Solved this myself:

    # Count down from 234 features to 1, fitting a fresh RFE for each target size.
    for i in range(234, 0, -1):
        estimator = RandomForestClassifier(n_estimators=500, min_samples_leaf=5,
                                           min_samples_split=8, max_features='sqrt',
                                           max_depth=90, bootstrap=True)
        selector = RFE(estimator, n_features_to_select=i, step=1)
        selector = selector.fit(X_train, y_train)
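
One thing that may be worth noting: with `step=1`, `RFE` already removes one feature per internal iteration, so a single fit with `n_features_to_select=5` should perform the whole elimination without an outer loop. The sketch below assumes a small synthetic dataset and a small forest (both chosen only to keep it quick) and reads the elimination order back out of `ranking_`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Small synthetic dataset, for demonstration only.
X_demo, y_demo = make_classification(n_samples=200, n_features=12,
                                     n_informative=4, random_state=0)

estimator = RandomForestClassifier(n_estimators=20, random_state=0)
selector = RFE(estimator, n_features_to_select=5, step=1).fit(X_demo, y_demo)

# ranking_ == 1 marks the 5 surviving features. With step=1, every eliminated
# feature gets its own rank: 2 for the last one dropped, up to 8 for the first.
elimination_order = sorted(
    (i for i in range(X_demo.shape[1]) if selector.ranking_[i] > 1),
    key=lambda i: selector.ranking_[i],
    reverse=True,
)
print("kept:", list(selector.get_support(indices=True)))
print("dropped (first to last):", elimination_order)
```

Compared with refitting a new `RFE` for every target size, this trains far fewer models, at the cost of not exposing a fitted selector for each intermediate feature count.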
    

    【Comments】:
