【Question title】: How to determine the most contributing features for a Linear SVM?
【Posted】: 2020-09-23 08:38:06
【Question】:

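I am trying to plot the most important features of a linear SVM in sklearn. I found some lines of code online, but when I apply them to my code they return an error. What should I do?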

My code:

import matplotlib.pyplot as plt
import pandas as pd


class SVMSentiment(Base):
    """Predict sentiment scores using a linear Support Vector Machine (SVM).
    Uses a sklearn pipeline.
    """
    def __init__(self, model_file: str=None) -> None:
        super().__init__()
        # pip install scikit-learn
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.svm import LinearSVC
        from sklearn.pipeline import Pipeline

        self.pipeline = Pipeline(
            [
                # ('vect', CountVectorizer()),
                # ('tfidf', TfidfTransformer()),
                ('tfidf', TfidfVectorizer()),
                ('clf', LinearSVC(loss='hinge')),
            ]
        )

    def predict(self, train_file: str, test_file: str, lower_case: bool) -> pd.DataFrame:
        "Train model using sklearn pipeline"
        train_df = self.read_data(train_file, lower_case)

        learner = self.pipeline.fit(train_df['text'], train_df['truth'])
        # Predict on the test data
        test_df = self.read_data(test_file, lower_case)
        test_df['pred'] = learner.predict(test_df['text'])

        return test_df

    def f_importances(coef, names):
        imp = coef
        imp, names = zip(*sorted(zip(imp, names)))
        plt.barh(range(len(names)), imp, align='center')
        plt.yticks(range(len(names)), names)
        plt.show()

    features_names = [train_df['text'], train_df['truth']]
    learner = self.pipeline.fit(train_df['text'], train_df['truth'])
    print(f_importances(self.pipeline.coef_, features_names))

Error message:

NameError: name 'train_df' is not defined

But I have already defined train_df, so I don't understand how to fix it.

【Discussion】:

    Tags: python python-3.x machine-learning scikit-learn


    【Answer 1】:

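    You only defined the variable train_df inside the method predict() of your class, which means it exists only there, and only while predict() is running. Since you don't return it anywhere and it is not a global variable, you can't access it outside that method. Your last three lines sit at class-body level, outside predict(), so they run once when the class is defined and cannot see train_df.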
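A minimal, self-contained illustration of this scope rule, using a hypothetical `Model` class and a plain dict standing in for the DataFrame: the lookup at class-body level fails with the same `NameError`, while calling the method and using its return value works.

```python
class Model:
    def predict(self):
        # Hypothetical stand-in for the DataFrame returned by read_data().
        train_df = {"text": ["good movie"], "truth": ["pos"]}
        return train_df  # hand the local variable back to the caller

    # Code at class-body level runs once, when the class is defined.
    # predict() has never been called and its locals are not visible here,
    # so this lookup fails just like the question's last three lines.
    try:
        features_names = train_df["text"]
    except NameError as err:
        lookup_error = str(err)


print(Model.lookup_error)  # name 'train_df' is not defined

# Fix: call the method at module level and use the value it returns.
train_df = Model().predict()
features_names = train_df["text"]
print(features_names)
```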

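    To access the variable train_df, you need to either define it outside the class, or return it from predict() and use the returned value before reaching this line: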

    ...
    features_names = [train_df['text'], train_df['truth']]
    ...
    
