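【Question title】: How to use SVM regression in Iris dataset with pandas
【Posted】: 2016-07-16 13:32:20
【Question】:

I am learning machine learning in Python with the scikit-learn package. I have used R for this before and found its data frame structure very easy to work with. scikit-learn uses NumPy arrays, which I find somewhat harder. In Python we have pandas, which is similar to R's data frames. This code is taken from a website.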

R

library(e1071)
library(MASS)
data(iris)

mysvm <- svm(Species ~ ., iris)
mysvm.pred <- predict(mysvm, iris)
table(mysvm.pred,iris$Species)
# mysvm.pred   setosa versicolor virginica
#   setosa     50      0          0
#   versicolor  0     48          2
#   virginica   0      2         48

Python

from sklearn import svm, datasets
from sklearn.metrics import confusion_matrix
iris = datasets.load_iris()

mysvm = svm.SVC().fit(iris.data, iris.target)
mysvm_pred = mysvm.predict(iris.data)
print(confusion_matrix(mysvm_pred, iris.target))
# [[50  0  0]
#  [ 0 48  2]
#  [ 0  0 50]]

How can I use the Python code above with a pandas data frame, and use SVM regression?

Edited

This is what I did:

from sklearn import svm, datasets
from sklearn.metrics import confusion_matrix
import pandas as pd
iris = datasets.load_iris()
X=pd.DataFrame(iris.data,columns=iris.feature_names)
y=pd.DataFrame(iris.target)
X.head()
y.head()
mysvm = svm.SVC().fit(X,y )
mysvm_pred = mysvm.predict(X)
print(confusion_matrix(mysvm_pred, y))

But it gives this error:

>>> mysvm = svm.SVC().fit(X,y )
/usr/local/lib/python2.7/dist-packages/sklearn/svm/base.py:514: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y_ = column_or_1d(y, warn=True)
>>> mysvm_pred = mysvm.predict(X)
>>> print confusion_matrix(mysvm_pred, y)
/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py:2645: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`.
  VisibleDeprecationWarning)
[[50  0  0]
 [ 0 48  0]
 [ 0  2 50]]

【Comments】:

    Tags: python numpy pandas scikit-learn


    【Solution 1】:

    You can use cross-validation like this:

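    from sklearn import svm, datasets, metrics
    # sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
    from sklearn.model_selection import cross_val_score, cross_val_predict
    import pandas as pd
    
    iris = datasets.load_iris()
    clf = svm.SVC()
    # 10-fold cross-validation: per-fold scores and out-of-fold predictions
    cv_scores = cross_val_score(clf, iris.data, iris.target, cv=10)
    cv_preds = cross_val_predict(clf, iris.data, iris.target, cv=10)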
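
The answer stops after computing the cross-validation results. As a minimal sketch of how they might be summarized, assuming a scikit-learn version that provides `sklearn.model_selection` (the names `mean_accuracy` and `pooled_accuracy` are illustrative and not from the original answer):

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, cross_val_score

iris = datasets.load_iris()
clf = svm.SVC()

# One accuracy value per fold (10 folds).
cv_scores = cross_val_score(clf, iris.data, iris.target, cv=10)
# One out-of-fold prediction per sample.
cv_preds = cross_val_predict(clf, iris.data, iris.target, cv=10)

# Average of the per-fold accuracies.
mean_accuracy = cv_scores.mean()
# Accuracy computed once over all out-of-fold predictions.
pooled_accuracy = accuracy_score(iris.target, cv_preds)

print("mean fold accuracy: %.3f" % mean_accuracy)
print("pooled out-of-fold accuracy: %.3f" % pooled_accuracy)
```

Unlike the confusion matrices in the question, which are computed on the training data, these numbers come from samples the model did not see while fitting.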
    

    I am not sure what you want to do with pandas, but if you want to load the dataset into a pandas data frame, you can do it like this:

    clf.fit(iris.data,iris.target)
    preds = clf.predict(iris.data)
    
    df = pd.DataFrame(iris.data)
    df['target'] = iris.target
    df['preds'] = preds
    
    print(df)
    print(metrics.confusion_matrix(df['target'], df['preds']))
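
The `DataConversionWarning` in the question appears because `y` was built as a one-column `DataFrame`, which has shape `(n_samples, 1)`, while `fit` expects shape `(n_samples,)`. The following is a minimal sketch of one way to stay in pandas and avoid it, by using a `pd.Series` for the target. The `pd.crosstab` call is an addition that is not part of the original answer; it is included only as a rough analogue of R's `table()`.

```python
import pandas as pd
from sklearn import datasets, svm
from sklearn.metrics import confusion_matrix

iris = datasets.load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
# A Series is one-dimensional, shape (n_samples,), which is what fit() expects.
y = pd.Series(iris.target, name="target")

clf = svm.SVC().fit(X, y)
preds = pd.Series(clf.predict(X), index=X.index, name="preds")

# confusion_matrix(y_true, y_pred): rows are true classes, columns are predictions.
cm = confusion_matrix(y, preds)
print(cm)

# Labelled cross-tabulation of true class against predicted class.
print(pd.crosstab(y, preds))
```

If the target already lives in a one-column `DataFrame`, passing `y_df.values.ravel()` (as the warning message suggests) or `y_df.iloc[:, 0]` to `fit` should have the same effect; `y_df` is a hypothetical name here.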
    

    Compute the accuracy:

    accuracy = metrics.accuracy_score(iris.target, preds)
    print(accuracy)
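
The question title asks about SVM regression, whereas `svm.SVC` is a classifier. That suits `Species`, which is categorical. If a numeric target is wanted instead, `sklearn.svm.SVR` is the regression counterpart. The sketch below is only an illustration under an assumption not made anywhere in the thread: petal width is treated as the value to predict from the other three measurements.

```python
import pandas as pd
from sklearn import datasets
from sklearn.metrics import r2_score
from sklearn.svm import SVR

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Assumption for illustration only: petal width is the regression target.
target_col = "petal width (cm)"
X = df.drop(columns=[target_col])
y = df[target_col]

reg = SVR().fit(X, y)
df["predicted"] = reg.predict(X)

# R^2 on the training data; cross-validation gives a fairer estimate.
r2 = r2_score(y, df["predicted"])
print(r2)
```

Because `X` and `y` stay as a `DataFrame` and a `Series`, the predictions can be written straight back into `df` next to the original columns.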
    

    【Comments】:

      【Solution 2】:

      Regarding the error:

      >>> print confusion_matrix(mysvm_pred, y)
      /usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py:2645: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`.
        VisibleDeprecationWarning)
      

      You can refer to this link:

      Numpy/scipy deprecation warning for "rank"
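
The warning text itself names the replacement: `ndim` for the number of array dimensions, and `numpy.linalg.matrix_rank` for the linear-algebra rank. The traceback points into NumPy's own `fromnumeric.py`, so the call is probably made by library code rather than by the script in the question, and the confusion matrix is still printed. A small sketch of the two quantities the message distinguishes, using a made-up array:

```python
import numpy as np

# A 3x2 array in which every row is identical.
y = np.ones((3, 2))

# Number of array dimensions, which is what the deprecated `rank` reported.
print(np.ndim(y))  # 2
print(y.ndim)      # 2

# Linear-algebra rank of the matrix, a different quantity.
print(np.linalg.matrix_rank(y))  # 1
```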

      【Comments】:
