clf.fit(X, Y) Scikit learn 790 scikit learn in fit 236. ValueError: Number of labels=44 does not match number of samples=45: Answers

【Question title】: clf.fit(X, Y) Scikit learn 790 scikit learn in fit 236. ValueError: Number of labels=44 does not match number of samples=45
【Posted】: 2018-05-05 17:34:04
【Question description】:

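I can't tell whether this comes from my code or from a bug in the framework. I'm working on a personal project, for my own use, to get better at Python. It is my first project with more than 100 lines of code, so I'm bound to make mistakes, but I keep getting this error. When I compare my code against the reference, in case I made a big syntax error, I really can't see any difference. The traceback points both into the library and at my code, so I'm trying to figure out whether there is a fix. The full script is over 100 lines, so I'll do my best to post a simplified version. I'd appreciate any help understanding what I'm doing wrong.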

from sklearn import tree

import pandas as pd

#to read the csv file
df = pd.read_csv('aapl.csv', parse_dates=True, index_col=0)

#sets up the Decision tree
clf = tree.DecisionTreeClassifier()

#input data for training ... there is a lot of data so this is 
#the smaller version to get to the point
X = [[7, 1, 17], [7, 3, 17], [7, 5, 17], [7, 7, 17], [7, 10, 17],
    [7, 11, 17], [7, 13, 17], [7, 15, 17], [7, 17, 17], [7, 19, 17]]

#Output data... This is only a fraction, but it is simplified like X
Y = ['144.88,  145.30,  143.10,  143.50,  14277848',
     '144.88,  145.30,  143.10,  143.50,  14277848',
     '143.69,  144.79,  142.72,  144.09,  21569557',
     '142.90, 144.75,  142.90,  144.18,  19201712',
     '144.11,  145.95,  143.37,  145.06,  21090636',
     '144.73,  145.85,  144.38,  145.53,  19781836',
     '145.50,  148.49,  145.44,  147.77,  25199373',
     '147.97,  149.33,  147.33,  149.04,  20132061',
     '148.82,  150.90,  148.57,  149.56,  23793456',
     '150.48, 151.42,  149.95,  151.02,  20922969']

#fitting the data. This is where it said there was an error, but it
#is still consistent with the variables above
clf = clf.fit(X, Y)

#tells it to predict
test = clf.predict([[9, 12, 17]])

#prints the prediction
print(test)

This is the error it gives me when I try to run it:

Traceback (most recent call last):
  File "/Users/kodecreer/Documents/PersonalDataProj.py", line 117, in <module>
    clf = clf.fit(X, Y)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/tree/tree.py", line 790, in fit
    X_idx_sorted=X_idx_sorted)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/tree/tree.py", line 236, in fit
    "number of samples=%d" % (len(y), n_samples))
ValueError: Number of labels=44 does not match number of samples=45

I tried uninstalling scikit, reinstalling it, and refreshing the Python compiler. I also searched Stack Overflow, but couldn't find...

Answer: the number of inputs did not match the number of outputs, which is why it raised the error. Thanks to 江川智宏 for the answer.

【Question discussion】:

Tags: python pandas scikit-learn

【Solution 1】:

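The error seems to be caused by "Number of labels=44 does not match number of samples=45". This means your X and Y have different lengths: X contains 45 samples, but Y contains only 44 labels. Could you check the lengths of X and Y?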
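One way to run that check is sketched below. It is only an illustration: `check_same_length` is a hypothetical helper, not part of scikit-learn, and the toy `X` and `Y` are not the asker's real data. The idea is to compare `len(X)` and `len(Y)` before calling `clf.fit(X, Y)` so the mismatch shows up in your own code rather than deep inside the library.

```python
def check_same_length(X, Y):
    """Hypothetical helper: raise early if X and Y differ in length."""
    if len(X) != len(Y):
        raise ValueError(
            "X has %d samples but Y has %d labels" % (len(X), len(Y))
        )


# Toy data, not the asker's real data: three samples but only two labels.
X = [[7, 1, 17], [7, 3, 17], [7, 5, 17]]
Y = ["144.88", "143.69"]

# The two numbers printed here must be equal before calling clf.fit(X, Y).
print(len(X), len(Y))

try:
    check_same_length(X, Y)
    message = "lengths match"
except ValueError as exc:
    message = str(exc)

print(message)
```

If the two lengths differ, the usual cause is a row that was dropped or added on one side only, so comparing the first and last entries of X and Y is often enough to find the missing one.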

【Discussion】:

Yes, that worked. Sorry for wasting your time. I only started a week ago, so I'm a bit of a noob at this.
No worries. Everyone starts out as a noob.