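【Posted】: 2019-06-11 12:13:22
【Question】:
I am currently working on a use case with RandomForestRegressor. To get separate training and test data for each value of one column, say Home, the dataframe is split into a dictionary. The modelling is almost done, but I am stuck on getting the feature importances for each key in the dictionary (number of keys = 21). Please see the code below: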
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

hp = pd.get_dummies(hp)
# Split the frame into one sub-frame per value of "Home" (scalar keys)
hp = {i: g for i, g in hp.set_index("Home").groupby(level=0)}
feature = {}; feature_train = {}; feature_test = {}
target = {}; target_train = {}; target_test = {}; target_pred = {}
importances = {}
for k, v in hp.items():
    target[k] = np.array(v["HP"])
    feature[k] = v.drop(["HP", "Corr"], axis=1)
# All groups share the same columns; this assumes 1 is one of the Home keys
feature_list = list(feature[1].columns)
for k in feature:
    feature[k] = np.array(feature[k])
# Loop over the populated dict: feature_train is still empty at this point,
# so iterating over it would never run the split
for k in feature:
    feature_train[k], feature_test[k], target_train[k], target_test[k] = train_test_split(
        feature[k], target[k], test_size=0.25, random_state=42)
What I tried, with the help of "Random Forest Feature Importance Chart using Python":
for name, importance in zip(feature_list, list(rf.feature_importances_)):
    print(name, "=", importance)
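But this prints the importances for only one dictionary entry (I don't know which one). What I want is to print them for every key, stored in the dictionary "importances". Thanks in advance!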
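The model-fitting step is not shown in the question, so the following is only a minimal sketch of one way to keep importances per key. It assumes that a single `rf` variable was reassigned on every pass through a loop, which would explain why only one set of importances (the last model's) gets printed. The synthetic data, the column names `size` and `age`, and the `models` dictionary are made up for illustration; `Home`, `HP`, and `importances` follow the question.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (hypothetical columns "size", "age")
rng = np.random.default_rng(42)
n = 120
hp = pd.DataFrame({
    "Home": np.repeat([1, 2, 3], n // 3),
    "size": rng.normal(size=n),
    "age": rng.normal(size=n),
})
hp["HP"] = 3.0 * hp["size"] + rng.normal(scale=0.1, size=n)

models = {}       # one fitted forest per Home value
importances = {}  # one Series of importances per Home value

for home, group in hp.groupby("Home"):
    X = group.drop(columns=["Home", "HP"])
    y = group["HP"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42)

    # A fresh model per key, stored so it is not overwritten next iteration
    rf = RandomForestRegressor(n_estimators=50, random_state=42)
    rf.fit(X_train, y_train)
    models[home] = rf

    # Pair each importance with its column name
    importances[home] = pd.Series(rf.feature_importances_, index=X.columns)

# Print the importances for every key, largest first
for home, imp in importances.items():
    print("Home", home)
    for name, value in imp.sort_values(ascending=False).items():
        print("  ", name, "=", round(value, 3))
```

Storing each result as a `pd.Series` indexed by column name keeps the name-to-importance pairing explicit, and `pd.DataFrame(importances)` would then give one column per key if a side-by-side table is more convenient.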
【Comments】:
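- Why "get separate training and test data for each value of one column, say Home"? Why not use sklearn's train_test_split directly? It is hard to tell what is happening in your code.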
Tags: python-3.x dictionary dataframe machine-learning random-forest