【Posted】: 2021-10-15 01:34:49
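【Question】:
I have a table containing the features that were used to build a model predicting whether a user will buy a new insurance policy. The same table also holds the model's predicted probability of belonging to class 1 (will buy) versus class 0 (will not buy). I don't know what kind of algorithm was used to build the model; I only have its predicted probabilities.
Question: how can I identify which features influence these predictions? Do I need to build a correlation matrix or run some kind of statistical test?
Example of the table: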
+---------+-----+-----------+---------+--------+-----------+--------+---------+-------------+------------+
| user_id | age | car_price | car_age | income | education | gender | crashes | probability | true_label |
+---------+-----+-----------+---------+--------+-----------+--------+---------+-------------+------------+
| 1 | 29 | 15600 | 3 | 20000 | 3 | 1 | 1 | 0.23 | 0 |
+---------+-----+-----------+---------+--------+-----------+--------+---------+-------------+------------+
| 2 | 41 | 43000 | 1 | 65000 | 2 | 0 | 1 | 0.1 | 0 |
+---------+-----+-----------+---------+--------+-----------+--------+---------+-------------+------------+
| 3 | 39 | 23500 | 5 | 43000 | 3 | 1 | 0 | 0.46 | 1 |
+---------+-----+-----------+---------+--------+-----------+--------+---------+-------------+------------+
| 4 | 19 | 12200 | 3 | 13000 | 1 | 1 | 0 | 0.34 | 1 |
+---------+-----+-----------+---------+--------+-----------+--------+---------+-------------+------------+
| 5 | 68 | 21900 | 2 | 31300 | 3 | 0 | 1 | 0.85 | 1 |
+---------+-----+-----------+---------+--------+-----------+--------+---------+-------------+------------+
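To make the correlation-matrix idea concrete, here is a minimal sketch that correlates each feature column with the `probability` column, using only the five sample rows above. The helper names `pearson` and `correlations_with_probability` are hypothetical, made up for this illustration. Pearson correlation only captures linear, one-feature-at-a-time relationships, and five rows are far too few to conclude anything, so treat this as a first look rather than an explanation of the unknown model.

```python
import math

# The five sample rows from the table above (features plus the model's probability).
FEATURES = ["age", "car_price", "car_age", "income", "education", "gender", "crashes"]
ROWS = [
    # age, car_price, car_age, income, education, gender, crashes, probability
    (29, 15600, 3, 20000, 3, 1, 1, 0.23),
    (41, 43000, 1, 65000, 2, 0, 1, 0.10),
    (39, 23500, 5, 43000, 3, 1, 0, 0.46),
    (19, 12200, 3, 13000, 1, 1, 0, 0.34),
    (68, 21900, 2, 31300, 3, 0, 1, 0.85),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Assumes neither column is constant; a constant column would divide by zero.
    return cov / math.sqrt(var_x * var_y)


def correlations_with_probability(rows, feature_names):
    """Correlate each feature column with the last column (the probability)."""
    probability = [row[-1] for row in rows]
    result = {}
    for i, name in enumerate(feature_names):
        column = [row[i] for row in rows]
        result[name] = pearson(column, probability)
    return result


corr = correlations_with_probability(ROWS, FEATURES)

# Rank features by the absolute strength of their correlation with the prediction.
for name, value in sorted(corr.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:>10}: {value:+.3f}")
```

On a real table loaded into pandas, `DataFrame.corr()` produces the full correlation matrix in one call, and passing `method="spearman"` is one option if the relationships may be monotonic but not linear.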
【Discussion】:
Tags: python machine-learning dataset data-science feature-selection