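[Posted]: 2017-08-23 08:59:28
[Question]:
I am getting a "Mean of empty slice" runtime warning.
When I print my variables (numpy arrays), several of them contain nan values. The runtime warning points to line 58 as the problem. What can I change to make this work?
Sometimes the program runs without any problem. Most of the time it does not.
This is a K-Means algorithm written from scratch that clusters the iris data set. It first prompts the user for the number of centroids (clusters) they want. It then randomly generates that many centroids, with values drawn from the range of the numbers loaded from the text file.
I have a break in the else branch to prevent an infinite loop.
Is it because of the way I subtract the centroids from the data points in the file?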
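On the infinite loop mentioned above: a small sketch of why a convergence check based on np.array_equal cannot succeed once a centroid contains nan. The array values are made up for illustration, and the snippet is written to run under Python 3 (the question's code is Python 2).

```python
import numpy as np

# Hypothetical centroid array in which one row has become nan.
with_nan = np.array([[np.nan, np.nan], [5.8, 3.1]])
same_values = with_nan.copy()

# nan never compares equal to anything, including itself.
print(np.nan == np.nan)                              # False

# So two arrays holding identical values still compare unequal
# as soon as a nan is present.
print(np.array_equal(with_nan, same_values))         # False

# Rows without nan compare equal as expected.
print(np.array_equal(with_nan[1], same_values[1]))   # True
```

With a nan centroid the "converged" branch is therefore never taken, which would explain why the loop needs a break to stop.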
This is what I get at runtime:
How Many Centrouds? 3
Dimensionality of Data: (150, 4)
Starting Centroiuds:
[[ 1.4 7.9 0.2 3.4]
[ 7.8 0.2 4.3 1.4]
[ 5.7 6.9 3. 6.6]]
t0 :
[[[-3.7 4.4 -1.2 3.2]
[ 2.7 -3.3 2.9 1.2]
[ 0.6 3.4 1.6 6.4]]
[[-3.5 4.9 -1.2 3.2]
[ 2.9 -2.8 2.9 1.2]
[ 0.8 3.9 1.6 6.4]]
[[-3.3 4.7 -1.1 3.2]
[ 3.1 -3. 3. 1.2]
[ 1. 3.7 1.7 6.4]]
...,
[[-5.1 4.9 -5. 1.4]
[ 1.3 -2.8 -0.9 -0.6]
[-0.8 3.9 -2.2 4.6]]
[[-4.8 4.5 -5.2 1.1]
[ 1.6 -3.2 -1.1 -0.9]
[-0.5 3.5 -2.4 4.3]]
[[-4.5 4.9 -4.9 1.6]
[ 1.9 -2.8 -0.8 -0.4]
[-0.2 3.9 -2.1 4.8]]]
Warning (from warnings module):
File "C:\Python27\lib\site-packages\numpy\core\_methods.py", line 59
warnings.warn("Mean of empty slice.", RuntimeWarning)
RuntimeWarning: Mean of empty slice.
Warning (from warnings module):
File "C:\Python27\lib\site-packages\numpy\core\_methods.py", line 68
ret, rcount, out=ret, casting='unsafe', subok=False)
RuntimeWarning: invalid value encountered in true_divide
---------------
Starting Centroids:
[[ 1.4 7.9 0.2 3.4]
[ 7.8 0.2 4.3 1.4]
[ 5.7 6.9 3. 6.6]]
Starting NewMeans:
[[ nan nan nan nan]
[ 5.84333333 3.054 3.75866667 1.19866667]
[ nan nan nan nan]]
Starting Centroids Now:
[[ nan nan nan nan]
[ 5.84333333 3.054 3.75866667 1.19866667]
[ nan nan nan nan]]
NewMeans now:
[[ nan nan nan nan]
[ 5.84333333 3.054 3.75866667 1.19866667]
[ nan nan nan nan]]
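The output above appears consistent with every data point being assigned to the second centroid, which leaves the other two clusters empty. A minimal sketch of that situation, using made-up 2-D points rather than the iris data and the same broadcasting as the question's t0/t1/t2 step (Python 3 syntax):

```python
import warnings
import numpy as np

# Four points close to centroid 0; centroid 1 is far away (made-up values).
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [1.1, 1.0]])
centroids = np.array([[1.0, 1.0], [9.0, 9.0]])

# Distance from every point to every centroid, then nearest-centroid index.
dists = np.linalg.norm(centroids[None, :, :] - points[:, None, :], axis=-1)
labels = np.argmin(dists, axis=-1)

# No point is nearest to centroid 1, so this selection has zero rows.
members_of_1 = points[labels == 1]
print(members_of_1.shape)     # (0, 2)

# Taking the mean of zero rows emits the warning and returns nan.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    empty_mean = np.mean(members_of_1, axis=0)

print(empty_mean)             # [nan nan]
print([str(w.message) for w in caught])
```

The subtraction itself is not at fault here; the nan comes from averaging a cluster that received no points.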
Python code:
import numpy as np
from pprint import pprint
import random
import sys
import warnings
arglist = sys.argv
#UNCOMMENT BELOW IN FINAL PROGRAM
'''
NoOfCentroids = int(arglist[2])
dataPointsFromFile = np.array(np.loadtxt(sys.argv[1], delimiter = ','))
'''
dataPointsFromFile = np.array(np.loadtxt('iris.txt', delimiter = ','))
NoOfCentroids = input('How Many Centrouds? ')
dataRange = ([])
#UNCOMMENT BELOW IN FINAL PROGRAM
'''
with open(arglist[1]) as f:
    print 'Points in data set: ',sum(1 for _ in f)
'''
dataRange.append(round(np.amin(dataPointsFromFile),1))
dataRange.append(round(np.amax(dataPointsFromFile),1))
dataRange = np.asarray(dataRange)
dataPoints = np.array(dataPointsFromFile)
print 'Dimensionality of Data: ', dataPoints.shape
randomCentroids = []
data = ([])
templist = []
i = 0
while i<NoOfCentroids:
    for j in range(len(dataPointsFromFile[1,:])):
        cat = round(random.uniform(np.amin(dataPointsFromFile),np.amax(dataPointsFromFile)),1)
        templist.append(cat)
    randomCentroids.append(templist)
    templist = []
    i = i+1
centroids = np.asarray(randomCentroids)
def kMeans(array1, array2):
    ConvergenceCounter = 1
    keepGoing = True
    StartingCentroids = np.copy(centroids)
    print 'Starting Centroiuds:\n {}'.format(StartingCentroids)
    while keepGoing:
        #--------------Find The new means---------#
        t0 = StartingCentroids[None, :, :] - dataPoints[:, None, :]
        print 't0 :\n {}'.format(t0)
        t1 = np.linalg.norm(t0, axis=-1)
        t2 = np.argmin(t1, axis=-1)
        #------Push the new means to a new array for comparison---------#
        CentroidMeans = []
        for x in range(len(StartingCentroids)):
            CentroidMeans.append(np.mean(dataPoints[t2 == [x]], axis=0))
        #--------Convert to a numpy array--------#
        NewMeans = np.asarray(CentroidMeans)
        #------Compare the New Means with the Starting Means------#
        if np.array_equal(NewMeans,StartingCentroids):
            print ('Convergence has been reached after {} moves'.format(ConvergenceCounter))
            print ('Starting Centroids:\n{}'.format(centroids))
            print ('Final Means:\n{}'.format(NewMeans))
            print ('Final Cluster assignments: {}'.format(t2))
            for x in xrange(len(StartingCentroids)):
                print ('Cluster {}:\n'.format(x)), dataPoints[t2 == [x]]
            for x in xrange(len(StartingCentroids)):
                print ('Size of Cluster {}:'.format(x)), len(dataPoints[t2 == [x]])
            keepGoing = False
        else:
            print 15*'-'
            ConvergenceCounter = ConvergenceCounter +1
            print 'Starting Centroids:\n'
            print StartingCentroids
            print '\n'
            print 'Starting NewMeans:\n'
            print NewMeans
            StartingCentroids =np.copy(NewMeans)
            print 'Starting Centroids Now:\n'
            print StartingCentroids
            print '\n'
            print 'NewMeans now:'
            print NewMeans
            break
kMeans(centroids, dataPoints)
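One possible way to avoid the nan rows, sketched under assumptions rather than taken from the code above: skip the mean for any cluster that received no points and keep its previous centroid, and start from actual data points instead of uniform random values. The names assign_labels, update_centroids and k_means, the max_iter and seed parameters, and the demo data are all hypothetical; the sketch uses Python 3 syntax.

```python
import numpy as np

def assign_labels(points, centroids):
    """Index of the nearest centroid for every point (same broadcasting as t0/t1/t2)."""
    diffs = centroids[None, :, :] - points[:, None, :]
    return np.argmin(np.linalg.norm(diffs, axis=-1), axis=-1)

def update_centroids(points, labels, old_centroids):
    """Mean of each cluster; an empty cluster keeps its previous centroid."""
    new_centroids = old_centroids.astype(float).copy()
    for k in range(len(old_centroids)):
        members = points[labels == k]
        if len(members) > 0:  # never call mean on an empty selection
            new_centroids[k] = members.mean(axis=0)
    return new_centroids

def k_means(points, k, max_iter=100, seed=0):
    rng = np.random.RandomState(seed)
    # Assumption: initialise from k distinct data points, so every
    # centroid starts with at least one point at distance zero.
    start = rng.choice(len(points), size=k, replace=False)
    centroids = points[start].astype(float)
    for _ in range(max_iter):
        labels = assign_labels(points, centroids)
        new_centroids = update_centroids(points, labels, centroids)
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    # Recompute labels so they match the returned centroids.
    return centroids, assign_labels(points, centroids)

# Made-up demo data: two small, well-separated groups.
demo = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                 [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
final_centroids, final_labels = k_means(demo, k=2)
print(final_centroids)
print(final_labels)
```

Keeping the old centroid is only one choice for an empty cluster; re-seeding it from a random data point is another common option. The max_iter cap replaces the break as the guard against looping forever.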
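[Discussion]:
- In data = ([]), the () does nothing. ([],) would create a 1-element tuple, but I don't think that is what you want. data = [] should be enough, just as you do with templist. Do you even use data later in the code?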
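A quick check of the point about the parentheses, as a small sketch; the variable names a, b and c are placeholders:

```python
a = ([])    # parentheses only group the expression; this is a plain list
b = ([],)   # the trailing comma is what makes a 1-element tuple
c = []

print(type(a).__name__)   # list
print(type(b).__name__)   # tuple
print(a == c)             # True
```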