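【Posted】: 2020-10-14 03:22:49
【Question】:
I need to do this for my assignment:
We focus on the following subset of variables: regime, oil, logGDPcp, and illit. Remove observations that have missing values in any of these variables. Using the scale() function, scale these variables so that each variable has a mean of zero and a standard deviation of one. Fit the k-means clustering algorithm with two clusters. How many observations are assigned to each cluster? Using the original unstandardized data, compute the means of these variables in each cluster.
This is what I did: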
resources <- read.csv("https://raw.githubusercontent.com/umbertomig/intro-prob-stat-FGV/master/datasets/resources.csv")
#subset
resources.subset <- subset(resources, select = c("cty_name", "year", "regime", "oil", "logGDPcp", "illit"))
#removing missing values
resources1 <- na.omit(resources.subset)
#scaling
scaled.resources <- scale(resources1)
#mean of zero
colMeans(scaled.resources)
#standard deviation of 1
apply(scaled.resources, 2, sd)
#fitting into two clusters
cluster2 <- kmeans(scaled.resources, centers = 2)
#how many observations are assigned to each cluster?
nrow(scaled.resources)
table(cluster2$cluster)
#means of the variables
cluster2$centers
g1 <- resources1[cluster2$cluster == 1, ]
colMeans(g1)
g2 <- resources1[cluster2$cluster == 2, ]
colMeans(g2)
But I get this error: "Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric"
How can I fix this?
【Discussion】:
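The error most likely comes from scale() itself: scale() calls colMeans(x, na.rm = TRUE) internally, and the data frame being scaled still contains the character column cty_name. The later colMeans(g1) and colMeans(g2) calls would fail for the same reason. One possible fix is to restrict scaling and the per-cluster means to the four numeric variables. Below is a minimal sketch of that idea. To keep it runnable without a download, it uses a small made-up data frame in place of resources.csv (the values are purely illustrative), and the names vars and cluster.means are my own, not from the question.

```r
# Toy stand-in for resources.csv (values are made up for illustration only)
resources <- data.frame(
  cty_name = c("A", "B", "C", "D", "E", "F"),
  year     = c(2000, 2000, 2000, 2000, 2000, 2000),
  regime   = c(-7, -6, 8, 9, NA, 10),
  oil      = c(30, 25, 0.5, 0, 12, 1),
  logGDPcp = c(7.1, 7.4, 9.2, 9.8, 8.0, 10.1),
  illit    = c(45, 40, 5, 3, 20, 2)
)

# The four variables the assignment asks about
vars <- c("regime", "oil", "logGDPcp", "illit")

# Drop rows with a missing value in any of the four variables
resources1 <- resources[complete.cases(resources[, vars]), ]

# Scale only the numeric columns; scale() fails on character columns such as cty_name
resources.scaled <- scale(resources1[, vars])

# k-means starts from random centers, so fix the seed for reproducibility
set.seed(1)
cluster2 <- kmeans(resources.scaled, centers = 2, nstart = 25)

# Number of observations per cluster
table(cluster2$cluster)

# Means of the original, unstandardized variables within each cluster
cluster.means <- aggregate(resources1[, vars],
                           by = list(cluster = cluster2$cluster),
                           FUN = mean)
cluster.means
```

With the real data, the same steps should apply after replacing the toy data frame with the read.csv() call from the question. Note that cluster2$centers reports centers on the standardized scale, which is why the means are recomputed from resources1 here.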
Tags: r scale mean standard-deviation