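【Posted】: 2018-04-23 09:03:18
【Question】:
I generate a pandas DataFrame from read_sql_query. It has three columns: 'result', 'speed', 'weight'.
I want to use scikit-learn's LinearRegression to fit result = f(speed, weight).
I can't find the right syntax to pass this DataFrame, or column slices of it, to LinearRegression.fit(y, X).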
print df['result'].shape
print df[['speed', 'weight']].shape
(8L,)
(8, 2)
But I can't pass these to fit:
lm.fit(df['result'], df[['speed', 'weight']])
It throws a DeprecationWarning and a ValueError:
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19.
ValueError: Found arrays with inconsistent numbers of samples: [1 8]
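What is an effective, clean way to take DataFrames holding the target and the features and pass them to the fit operation?
This is how I generate the example: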
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(7), freq='D')
np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(days))
data2 = np.random.randint(1, high=100, size=len(days))
data3 = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({'test': days, 'result': data,'speed': data2,'weight': data3})
df = df.set_index('test')
print(df)
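For reference, scikit-learn's documented signature is fit(X, y): the 2-D feature matrix comes first and the 1-D target second, which is the reverse of the call attempted above. A minimal sketch of the call in that order, assuming a frame with the same three column names; the values are synthetic, and the names X, y and lm are only illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the frame in the question (8 rows, same column names).
rng = np.random.RandomState(1111)
df = pd.DataFrame({
    'result': rng.randint(1, 100, size=8),
    'speed': rng.randint(1, 100, size=8),
    'weight': rng.randint(1, 100, size=8),
})

X = df[['speed', 'weight']]   # 2-D features, shape (8, 2)
y = df['result']              # 1-D target, shape (8,)

lm = LinearRegression()
lm.fit(X, y)                  # features first, target second

print(lm.coef_.shape)         # one coefficient per feature column
print(lm.intercept_)          # fitted intercept
```

Selecting the features with a list of column names (double brackets) keeps them two-dimensional, so no reshape is needed before fitting.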
【Comments】:
- df['result'].values; sometimes you need df.iloc[:, :-1]
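The comment is terse. One reading is that .values returns the underlying NumPy array and that iloc selects columns by position. A small sketch under that reading, assuming the column order result, speed, weight from the question and made-up values. Note that with this order df.iloc[:, :-1] would pick result and speed, so df.iloc[:, 1:] is used for the features here:

```python
import pandas as pd

# Made-up values; the column order mirrors the question's frame.
df = pd.DataFrame(
    {'result': [10, 20, 30, 40],
     'speed': [1, 2, 3, 4],
     'weight': [5, 6, 7, 8]},
    columns=['result', 'speed', 'weight'],
)

y = df['result'].values          # 1-D NumPy array, shape (4,)
X = df.iloc[:, 1:].values        # every column after the first, shape (4, 2)
all_but_last = df.iloc[:, :-1]   # every column except the last

print(y.shape, X.shape)
print(list(all_but_last.columns))
```

Positional slicing depends on column order, so selecting by name is usually the safer choice when the frame comes from a query whose column order may change.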
Tags: python pandas scikit-learn