【Question title】:How to deal with categorical variables in a decision tree when passing categorical values and calling the predict function
【Posted】:2021-08-20 18:04:23
【Question】:
def predict_career(Problem_Solving_Skill,Analytical_Ability,Maths_Score_12th,Logical_Reasoning,Social_Responsibility,Maths_score_12th,NATA_Score,Interest_in_Biology_subject,Thinking_Reasoning_Skills,Negotiation_Skill,Number_Of_Design_Language,Expansive_Thinking,Assurity_About_Work,Nift_Exam_Score,Typography_Skill,Score_12th,Critical_Thinking_Skill,NEET_Score,Experimental_skill,Communication_Skill,PCM_percentage_12th,Coding_skills,Score_12th_Sci,Diagnostic_skill,Understanding_scientific_literature,score_12th_Science,able_to_do_Mental_calculation,interpret_prescriptions_accurately,Ready_to_take_care_of_animals,Do_you_like_animals,selfless_concern_for_the_wellbeing_of_others,Good_Verbal_Communication,Team_Player,Preference_Technical_Management,Continuous_Learning,Patience_person,memory_skills,Budget_for_Graduation,Interest_research_field,do_work_in_Team,Self_learning_capability):
            x=np.zeros(len(X.columns))
            x[0]=Problem_Solving_Skill
            x[1]=Analytical_Ability
            x[2]=Maths_Score_12th
            x[3]=Logical_Reasoning
            x[4]=Social_Responsibility
            x[5]=Maths_score_12th
            x[6]=NATA_Score
            x[7]=Interest_in_Biology_subject
            x[8]=Thinking_Reasoning_Skills
            x[9]=Negotiation_Skill
            x[10]=Number_Of_Design_Language 
            x[11]=Expansive_Thinking
            x[12]=Assurity_About_Work
            x[13]=Nift_Exam_Score
            x[14]=Typography_Skill
            x[15]=Score_12th
            x[16]=Critical_Thinking_Skill
            x[17]=NEET_Score
            x[18]=Experimental_skill
            x[19]=Communication_Skill
            x[20]=PCM_percentage_12th
            x[21]=Coding_skills
            x[22]=Score_12th_Sci
            x[23]=Diagnostic_skill
            x[24]=Understanding_scientific_literature
            x[25]=score_12th_Science
            x[26]=able_to_do_Mental_calculation
            x[27]=interpret_prescriptions_accurately
            x[28]=Ready_to_take_care_of_animals
            x[29]=Do_you_like_animals
            x[30]=selfless_concern_for_the_wellbeing_of_others
            x[31]=Good_Verbal_Communication
            x[32]=Team_Player
            x[33]=Preference_Technical_Management
            x[34]=Continuous_Learning
            x[35]=Patience_person
            x[36]=memory_skills
            x[37]=Budget_for_Graduation
            x[38]=Interest_research_field
            x[39]=do_work_in_Team
            x[40]=Self_learning_capability
            return dt.predict([x])[0]

predict_career(8,7,83,8,6,81,89,9,9,8,4,8,8,60,5,71,1,555,9,7,78,9,80,6,6,80,'Yes','Yes','Yes','Yes','Yes','Yes','Yes','Technical','No','No','No','Upto 1 Lakh','Yes','Yes','Yes')

    error:
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-23-9d14ae182888> in <module>
    ----> 1 predict_career(8,7,83,8,6,81,89,9,9,8,4,8,8,60,5,71,1,555,9,7,78,9,80,6,6,80,'Yes','Yes','Yes','Yes','Yes','Yes','Yes','Technical','No','No','No','Upto 1 Lakh','Yes','Yes','Yes')
    
    <ipython-input-22-c44fec6e90b8> in predict_career(Problem_Solving_Skill, Analytical_Ability, Maths_Score_12th, Logical_Reasoning, Social_Responsibility, Maths_score_12th, NATA_Score, Interest_in_Biology_subject, Thinking_Reasoning_Skills, Negotiation_Skill, Number_Of_Design_Language, Expansive_Thinking, Assurity_About_Work, Nift_Exam_Score, Typography_Skill, Score_12th, Critical_Thinking_Skill, NEET_Score, Experimental_skill, Communication_Skill, PCM_percentage_12th, Coding_skills, Score_12th_Sci, Diagnostic_skill, Understanding_scientific_literature, score_12th_Science, able_to_do_Mental_calculation, interpret_prescriptions_accurately, Ready_to_take_care_of_animals, Do_you_like_animals, selfless_concern_for_the_wellbeing_of_others, Good_Verbal_Communication, Team_Player, Preference_Technical_Management, Continuous_Learning, Patience_person, memory_skills, Budget_for_Graduation, Interest_research_field, do_work_in_Team, Self_learning_capability)
         46     x[24]=Understanding_scientific_literature
         47     x[25]=score_12th_Science
    ---> 48     x[26]=able_to_do_Mental_calculation
         49     x[27]=interpret_prescriptions_accurately
         50     x[28]=Ready_to_take_care_of_animals
    
    ValueError: could not convert string to float: 'Yes'

【Comments】:

    Tags: python decision-tree


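    【Answer 1】:

    Binary categorical variables can be mapped to 0 and 1. For categorical variables with more than two categories, you can use one-hot encoding.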
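A minimal sketch of both ideas in plain Python. The helper names `encode_binary` and `one_hot` are hypothetical, and the budget categories other than 'Upto 1 Lakh' are made up for illustration, since the question only shows that one value.

```python
# Hypothetical helpers; not part of the question's code.
YES_NO = {"No": 0, "Yes": 1}


def encode_binary(value, mapping=YES_NO):
    """Map a two-valued string answer to 0 or 1."""
    return mapping[value]


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


# Assumed category list: only 'Upto 1 Lakh' appears in the question,
# the other two levels are invented for this example.
BUDGET_LEVELS = ["Upto 1 Lakh", "1 to 5 Lakh", "Above 5 Lakh"]

# Two Yes/No answers followed by the one-hot encoded budget.
row = [encode_binary("Yes"), encode_binary("No")] + one_hot("Upto 1 Lakh", BUDGET_LEVELS)
print(row)  # [1, 0, 1, 0, 0]
```

Note that one-hot encoding changes the number of columns, so the model has to be trained on data encoded the same way before a row like this is passed to predict.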

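    【Comments】:

      【Answer 2】:

      You need to encode the string values as numbers.

      If you are using scikit-learn, you need to create an instance of the decision tree class and train the model before making predictions.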
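As a rough sketch of that order of operations (encode, create the classifier, fit, then predict), assuming a toy two-column dataset that is invented here and is not the asker's data:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training data, made up for illustration. Each row is
# [Coding_skills, Team_Player], with Team_Player already encoded Yes=1 / No=0.
X_train = [[9, 1], [8, 1], [2, 0], [3, 0]]
y_train = ["Engineering", "Engineering", "Design", "Design"]

# Create the decision tree instance and train it before predicting.
dt = DecisionTreeClassifier(random_state=0)
dt.fit(X_train, y_train)

yes_no = {"No": 0, "Yes": 1}


def predict_career(coding_skills, team_player):
    """Encode the string answer the same way as the training data, then predict."""
    x = [coding_skills, yes_no[team_player]]
    return dt.predict([x])[0]


print(predict_career(9, "Yes"))
```

The key point is that the mapping applied inside predict_career must match the encoding used for the training columns.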

      You may want to do some reading on machine learning and statistics first.

      【Comments】:

        【Answer 3】:

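        You can use scikit-learn's preprocessing classes to handle categorical data. LabelEncoder, for example, maps string categories to integers (scikit-learn documents it as intended for target labels; OrdinalEncoder is the equivalent for feature columns).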

        from sklearn import preprocessing
        le = preprocessing.LabelEncoder()
        le.fit(['Yes','Yes','Yes','Yes','Yes','Yes','Yes','Technical','No','No','No','Upto 1 Lakh','Yes','Yes','Yes'])
        le.transform(['Yes','Yes','Yes','Yes','Yes','Yes','Yes','Technical','No','No','No','Upto 1 Lakh','Yes','Yes','Yes']) 
        

        This automatically encodes them as numbers for your DT algorithm.

        To go from the integers back to the strings, simply call inverse_transform as follows:

        list(le.inverse_transform([3,3,3,3,3,3,3,1,0,0,0,2,3,3,3]))
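To see which integer each string gets, here is a small round-trip sketch on a shortened list of the same values. LabelEncoder numbers the classes from 0 in sorted order, so a code such as 4 would not exist for these four categories.

```python
from sklearn.preprocessing import LabelEncoder

# Shortened version of the list used above.
values = ["Yes", "Technical", "No", "Upto 1 Lakh", "Yes"]

le = LabelEncoder()
codes = le.fit_transform(values)  # fit and transform in one step

print(list(le.classes_))  # ['No', 'Technical', 'Upto 1 Lakh', 'Yes']
print(codes.tolist())     # [3, 1, 0, 2, 3]

# Decoding the integer codes gives back the original strings.
decoded = list(le.inverse_transform(codes))
print(decoded)
```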
        

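        You can also use a one-hot encoder (OneHotEncoder) to convert categorical variables into numeric values for other machine learning classifiers.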
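One possible way to wire a one-hot encoder into the model is a ColumnTransformer inside a pipeline, so the same encoding is applied at fit and predict time. This is only a sketch: the data is invented, and the two column names are borrowed from the question purely for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy data, made up for illustration.
X = pd.DataFrame({
    "Coding_skills": [9, 8, 2, 3],
    "Preference_Technical_Management": ["Technical", "Technical", "Management", "Management"],
})
y = ["Engineering", "Engineering", "Business", "Business"]

# One-hot encode the string column; pass numeric columns through unchanged.
encode = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["Preference_Technical_Management"])],
    remainder="passthrough",
)

model = make_pipeline(encode, DecisionTreeClassifier(random_state=0))
model.fit(X, y)

# New rows can be passed with their raw string values.
new_row = pd.DataFrame({"Coding_skills": [7], "Preference_Technical_Management": ["Technical"]})
prediction = model.predict(new_row)[0]
print(prediction)
```

With this setup the predict call accepts 'Technical' directly, because the pipeline encodes it before the tree sees it.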

        【Comments】:
