【Question title】: Python SVM: execution error
【Posted】: 2018-08-22 18:52:04
【Question】:

I am using Python 3.6 on Windows and learning SVM prediction in Python. I have the code below. However, after running and checking it carefully, I still get the following error:

  File "C:\Users\Lawrence\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 614, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))

ValueError: bad input shape ()

The original Python code is as follows:

import numpy as np
from sklearn import preprocessing
from sklearn.svm import SVR

input_file = r"C:\Users\Lawrence\Desktop\traffic_data.txt"

# Reading the data
X = []
count = 0
with open(input_file, 'r') as f:
    for line in f.readlines():
        data = line[:-1].split(',')
        X.append(data)

X = np.array(X)

# Convert string data to numerical data
label_encoder = [] 
X_encoded = np.empty(X.shape)
for i,item in enumerate(X[0]):
    if item.isdigit():
        X_encoded[:, i] = X[:, i]
    else:
        label_encoder.append(preprocessing.LabelEncoder())
        X_encoded[:, i] = label_encoder[-1].fit_transform(X[:, i])

X = X_encoded[:, :-1].astype(int)
y = X_encoded[:, -1].astype(int)

# Build SVR
params = {'kernel': 'rbf', 'C': 10.0, 'epsilon': 0.2} 
regressor = SVR(**params)
regressor.fit(X, y)

# Cross validation
import sklearn.metrics as sm

y_pred = regressor.predict(X)
print ("Mean absolute error =", round(sm.mean_absolute_error(y, y_pred), 2))

# Testing encoding on single data instance
input_data = ['Tuesday', '13:35', 'San Francisco', 'yes']
input_data_encoded = [-1] * len(input_data)
count = 0
for i,item in enumerate(input_data):
    if item.isdigit():
        input_data_encoded[i] = int(input_data[i])
    else:
        input_data_encoded[i] = int(label_encoder[count].transform(input_data[i]))
        count = count + 1 

input_data_encoded = np.array(input_data_encoded)

# Predict and print output for a particular datapoint
print ("Predicted traffic:", int(regressor.predict(input_data_encoded)[0]))

The input file (traffic_data.txt) contains the following data:

Tuesday,00:00,San Francisco,no,3
Tuesday,00:05,San Francisco,no,8
Tuesday,00:10,San Francisco,no,10
Tuesday,00:15,San Francisco,no,6
Tuesday,00:20,San Francisco,no,1
Tuesday,00:25,San Francisco,no,4
Tuesday,00:30,San Francisco,no,9
Tuesday,00:35,San Francisco,no,4
Tuesday,00:40,San Francisco,no,6
Tuesday,00:45,San Francisco,no,13
Tuesday,00:50,San Francisco,no,5
Tuesday,00:55,San Francisco,no,5
Tuesday,01:00,San Francisco,no,4
Tuesday,01:05,San Francisco,no,7
Tuesday,01:10,San Francisco,no,5
Tuesday,01:15,San Francisco,no,4
Tuesday,01:20,San Francisco,no,5
Tuesday,01:25,San Francisco,no,1
Tuesday,01:30,San Francisco,no,8
Tuesday,01:35,San Francisco,no,2
Tuesday,01:40,San Francisco,no,3
Tuesday,01:45,San Francisco,no,0
Tuesday,01:50,San Francisco,no,2
Tuesday,01:55,San Francisco,no,1
Tuesday,02:00,San Francisco,no,1
Tuesday,02:05,San Francisco,no,0
Tuesday,02:10,San Francisco,no,2
Tuesday,02:15,San Francisco,no,1
Tuesday,02:20,San Francisco,no,2
Tuesday,02:25,San Francisco,no,4
Tuesday,02:30,San Francisco,no,0
Tuesday,02:35,San Francisco,no,0
Tuesday,02:40,San Francisco,no,0
Tuesday,02:45,San Francisco,no,3
Tuesday,02:50,San Francisco,no,1
Tuesday,02:55,San Francisco,no,0
Tuesday,03:00,San Francisco,no,3
Tuesday,03:05,San Francisco,no,0
Tuesday,03:10,San Francisco,no,3
Tuesday,03:15,San Francisco,no,0
Tuesday,03:20,San Francisco,no,0
Tuesday,03:25,San Francisco,no,2
Tuesday,03:30,San Francisco,no,1
Tuesday,03:35,San Francisco,no,1
Tuesday,03:40,San Francisco,no,1
Tuesday,03:45,San Francisco,no,1
Tuesday,03:50,San Francisco,no,0
Tuesday,03:55,San Francisco,no,3
Tuesday,04:00,San Francisco,no,1
Tuesday,04:05,San Francisco,no,2
Tuesday,04:10,San Francisco,no,1
Tuesday,04:15,San Francisco,no,1
Tuesday,04:20,San Francisco,no,2
Tuesday,04:25,San Francisco,no,1
Tuesday,04:30,San Francisco,no,2
Tuesday,04:35,San Francisco,no,2
Tuesday,04:40,San Francisco,no,5
Tuesday,04:45,San Francisco,no,2
Tuesday,04:50,San Francisco,no,5
Tuesday,04:55,San Francisco,no,4
Tuesday,05:00,San Francisco,no,6
Tuesday,05:05,San Francisco,no,5
Tuesday,05:10,San Francisco,no,5
Tuesday,05:15,San Francisco,no,7
Tuesday,05:20,San Francisco,no,4
Tuesday,05:25,San Francisco,no,5
Tuesday,05:30,San Francisco,no,12
Tuesday,05:35,San Francisco,no,12
Tuesday,05:40,San Francisco,no,11
Tuesday,05:45,San Francisco,no,12
Tuesday,05:50,San Francisco,no,11
Tuesday,05:55,San Francisco,no,13
Tuesday,06:00,San Francisco,no,19
Tuesday,06:05,San Francisco,no,16
Tuesday,06:10,San Francisco,no,19
Tuesday,06:15,San Francisco,no,15
Tuesday,06:20,San Francisco,no,8
Tuesday,06:25,San Francisco,no,14
Tuesday,06:30,San Francisco,no,30
Tuesday,06:35,San Francisco,no,35
Tuesday,06:40,San Francisco,no,20
Tuesday,06:45,San Francisco,no,27
Tuesday,06:50,San Francisco,no,33
Tuesday,06:55,San Francisco,no,24
Tuesday,07:00,San Francisco,no,39
Tuesday,07:05,San Francisco,no,42
Tuesday,07:10,San Francisco,no,36
Tuesday,07:15,San Francisco,no,50
Tuesday,07:20,San Francisco,no,42
Tuesday,07:25,San Francisco,no,38
Tuesday,07:30,San Francisco,no,38
Tuesday,07:35,San Francisco,no,40
Tuesday,07:40,San Francisco,no,49
Tuesday,07:45,San Francisco,no,39
Tuesday,07:50,San Francisco,no,43
Tuesday,07:55,San Francisco,no,44
Tuesday,08:00,San Francisco,no,40
Tuesday,08:05,San Francisco,no,22
Tuesday,08:10,San Francisco,no,25
Tuesday,08:15,San Francisco,no,42
Tuesday,08:20,San Francisco,no,37
Tuesday,08:25,San Francisco,no,36
Tuesday,08:30,San Francisco,no,34
Tuesday,08:35,San Francisco,no,41
Tuesday,08:40,San Francisco,no,37
Tuesday,08:45,San Francisco,no,36
Tuesday,08:50,San Francisco,no,40
Tuesday,08:55,San Francisco,no,37
Tuesday,09:00,San Francisco,no,41
Tuesday,09:05,San Francisco,no,38
Tuesday,09:10,San Francisco,no,36
Tuesday,09:15,San Francisco,no,44
Tuesday,09:20,San Francisco,no,33
Tuesday,09:25,San Francisco,no,30
Tuesday,09:30,San Francisco,no,41
Tuesday,09:35,San Francisco,no,36
Tuesday,09:40,San Francisco,no,35
Tuesday,09:45,San Francisco,no,36
Tuesday,09:50,San Francisco,no,35
Tuesday,09:55,San Francisco,no,42
Tuesday,10:00,San Francisco,no,31
Tuesday,10:05,San Francisco,no,25
Tuesday,10:10,San Francisco,no,28
Tuesday,10:15,San Francisco,no,27
Tuesday,10:20,San Francisco,no,23
Tuesday,10:25,San Francisco,no,25

I hope someone can help me solve this problem.

【Question comments】:

    Tags: python-3.x machine-learning classification svm


    【Solution 1】:

    The problem is caused by the following:

    When you call fit_transform on the label_encoder, you pass X[:, i] as input, which has shape (126,).

    On the other hand, you then call:

    label_encoder[count].transform(input_data[i])
    

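    Here the input, input_data[i], is a single string rather than a 1-D array. It is a scalar of shape (), which is exactly what the error message "bad input shape ()" reports.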
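One possible way to fix the single-instance step is sketched below. It wraps each scalar in a one-element list before calling transform(), and reshapes the encoded sample to 2-D before calling predict(), which expects shape (n_samples, n_features). The four training rows are copied from the question's data only to keep the example small, and the names `features`, `target` and `sample` are illustrative. Note that LabelEncoder.transform() raises a ValueError for labels it did not see during fitting, so with only the 126 rows shown, values such as '13:35' or 'yes' would still fail; the sketch therefore uses values that are present in the data.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVR

# Tiny stand-in for traffic_data.txt (rows copied from the question's data).
rows = [
    ['Tuesday', '00:00', 'San Francisco', 'no', '3'],
    ['Tuesday', '00:05', 'San Francisco', 'no', '8'],
    ['Tuesday', '00:10', 'San Francisco', 'no', '10'],
    ['Tuesday', '00:15', 'San Francisco', 'no', '6'],
]
X = np.array(rows)

# Fit one LabelEncoder per non-numeric column, as in the question.
label_encoder = []
X_encoded = np.empty(X.shape)
for i, item in enumerate(X[0]):
    if item.isdigit():
        X_encoded[:, i] = X[:, i].astype(float)
    else:
        label_encoder.append(LabelEncoder())
        X_encoded[:, i] = label_encoder[-1].fit_transform(X[:, i])

features = X_encoded[:, :-1].astype(int)
target = X_encoded[:, -1].astype(int)

regressor = SVR(kernel='rbf', C=10.0, epsilon=0.2)
regressor.fit(features, target)

# Every value here was seen during fitting; unseen labels would raise ValueError.
input_data = ['Tuesday', '00:10', 'San Francisco', 'no']
input_data_encoded = []
count = 0
for item in input_data:
    if item.isdigit():
        input_data_encoded.append(int(item))
    else:
        # transform() expects a 1-D array-like, so wrap the scalar in a list
        # and take the single element of the result.
        encoded = label_encoder[count].transform([item])[0]
        input_data_encoded.append(int(encoded))
        count += 1

# predict() expects a 2-D array of shape (n_samples, n_features).
sample = np.array(input_data_encoded).reshape(1, -1)
predicted = regressor.predict(sample)
print("Predicted traffic:", int(predicted[0]))
```

The two changes relative to the question's code are `transform([item])[0]` instead of `transform(item)`, and `reshape(1, -1)` before `predict()`.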


    Edit 1:

    import numpy as np
    from sklearn import preprocessing
    from sklearn.svm import SVR
    from sklearn.model_selection import KFold
    
    input_file = r"traffic_data.txt"
    
    # Reading the data
    X = []
    count = 0
    with open(input_file, 'r') as f:
        for line in f.readlines():
            data = line[:-1].split(',')
            X.append(data)
    X = np.array(X)
    
    # Convert string data to numerical data
    label_encoder = [] 
    X_encoded = np.empty(X.shape)
    for i,item in enumerate(X[0]):
        if item.isdigit():
            X_encoded[:, i] = X[:, i]
        else:
            label_encoder.append(preprocessing.LabelEncoder())
            X_encoded[:, i] = label_encoder[-1].fit_transform(X[:, i])
    
    X = X_encoded[:, :-1].astype(int)
    y = X_encoded[:, -1].astype(int)
    
    # Build SVR
    params = {'kernel': 'rbf', 'C': 10.0, 'epsilon': 0.2} 
    regressor = SVR(**params)
    
    # OPTION 1: Cross validation with 2 folds AND LOOP
    kf = KFold(n_splits = 2)
    
    # In this loop, the model is fitted using ONLY the training samples and then predicts using ONLY the test samples.
    
    # The predicted values are stored in the predicted_values 
    # The actual (true) values are stored in the true_values
    predicted_values = []
    true_values=[]
    for train_index, test_index in kf.split(X):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        regressor.fit(X_train, y_train)
        y_pred = regressor.predict(X_test)
        predicted_values.append(y_pred)
        true_values.append(y_test)
    
    # Now you can use predicted_values and true_values to calculate error metrics such as MSE or MAE.
    
    
    # OPTION 2: use cross_val_predict function directly
    from sklearn.model_selection import cross_val_predict
    
    # The cross validated predicted values are stored in the y_pred 
    y_pred = cross_val_predict(regressor, X, y, cv = kf)
    
    
    # OPTION 3: use the train_test_split function
    from sklearn.model_selection import train_test_split
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
    
    regressor.fit(X_train, y_train)
    y_pred = regressor.predict(X_test)
    

    【Comments】:

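    • Hi Sera, I'm new to Python; could you show the code? Given the 126 data samples, I'm not sure where I should make the change. Should I change "X_encoded = np.empty(X.shape)" to "X_encoded = np.empty(1,shape)"? It doesn't work.
    • Why not transform all the data first and then use K-Fold cross-validation? That way you transform all the data, and then use the folds to split it into training and test data.
    • Good advice for advanced Python users! But I'm very new and don't know how to start, or even how to change the code you provided. As the quote goes, "Talk is cheap. Show me the code."
    • Hi Sera, my original code tried to predict a result, but your code shows the test size and training size??
    • No. y_pred contains the cross-validated predictions. Try reading the code line by line. The test and training set sizes are only there to show you that I split the data into training and test sets.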