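【Posted】: 2020-04-17 20:11:09
【Problem description】:
I am trying to run a principal component analysis on data from a CSV file, but when I run my code I get this error: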
C:\Users\Lenovo\Desktop>python pca.py
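ValueError: could not convert string to float: Annee;NET;INT;SUB;LMT;DCT;IMM;EXP;VRD
This is my CSV file.
I tried removing any whitespace. Any ideas? Here is my Python script; I don't know what I am missing.
Note: I am running this code under Python 2.7.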
from sklearn.externals import joblib
import numpy as np
import glob
import os
import time
import numpy
my_matrix = numpy.loadtxt(open("pca.csv","rb"),delimiter= ",",skiprows=0)
def pca(dataMat, r, autoset_r=False, autoset_rate=0.9):
    """
    purpose: principal components analysis
    """
    print("Start to do PCA...")
    t1 = time.time()
    meanVal = np.mean(dataMat, axis=0)
    meanRemoved = dataMat - meanVal
    # normData = meanRemoved / np.std(dataMat)
    covMat = np.cov(meanRemoved, rowvar=0)
    eigVals, eigVects = np.linalg.eig(np.mat(covMat))
    eigValIndex = np.argsort(-eigVals)
    if autoset_r:
        r = autoset_eigNum(eigVals, autoset_rate)
        print("autoset: take top {} of {} features".format(r, meanRemoved.shape[1]))
    r_eigValIndex = eigValIndex[:r]
    r_eigVect = eigVects[:, r_eigValIndex]
    lowDDataMat = meanRemoved * r_eigVect
    reconMat = (lowDDataMat * r_eigVect.T) + meanVal
    t2 = time.time()
    print("PCA takes %f seconds" %(t2-t1))
    joblib.dump(r_eigVect, './pca_args_save/r_eigVect.eig')
    joblib.dump(meanVal, './pca_args_save/meanVal.mean')
    return lowDDataMat, reconMat

def autoset_eigNum(eigValues, rate=0.99):
    eigValues_sorted = sorted(eigValues, reverse=True)
    eigVals_total = eigValues.sum()
    for i in range(1, len(eigValues_sorted)+1):
        eigVals_sum = sum(eigValues_sorted[:i])
        if eigVals_sum / eigVals_total >= rate:
            break
    return i
【Discussion】:
-
If your df is long, could you run an isdigit check on every entry? From there you can find the problem entry and troubleshoot further. Quick reference:
isdigit
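One way to act on this suggestion is sketched below. Note that str.isdigit returns False for values such as "1.5" or "-3", so this sketch tries float() on each cell instead. The helper name find_non_numeric and the sample rows are hypothetical, and the semicolon separator is an assumption based on the error message.

```python
import csv
import io

def find_non_numeric(lines, delimiter=";"):
    """Return (row, column, value) for every cell float() cannot parse."""
    bad = []
    for row_idx, row in enumerate(csv.reader(lines, delimiter=delimiter)):
        for col_idx, cell in enumerate(row):
            try:
                float(cell)
            except ValueError:
                bad.append((row_idx, col_idx, cell))
    return bad

# Hypothetical sample: a header row plus one row containing a decimal comma.
sample = io.StringIO(
    u"Annee;NET;INT\n"
    u"1872;18.0;0.5\n"
    u"1880;14,1;0.8\n"
)
problems = find_non_numeric(sample)
for row_idx, col_idx, cell in problems:
    print("row %d, column %d: %r" % (row_idx, col_idx, cell))
```

For the real data, an open file object for pca.csv can be passed in place of sample. Every cell of the header row would be reported, which is consistent with the header being what numpy.loadtxt failed on.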
Tags: pandas python-2.7 scikit-learn anaconda pca