【发布时间】:2018-11-12 19:50:43
【问题描述】:
我正在尝试对具有变量目标、助攻和得分的预测变量进行 NHL 统计数据的回归。但是,我们的输出与我们想要的输出不同。我们得到的不是我们指定的预测变量(目标、助攻和得分),而是我们拦截实例的每个实例。见下文:
urlname <- "https://www.hockey-reference.com/leagues/NHL_2018_skaters.html"
scraped_data <- read_html(urlname)
table.nhl <- html_nodes(scraped_data, "table")
scraped.nhl.data <- as.data.frame(html_table(table.nhl, header = TRUE))
colnames(scraped.nhl.data) = scraped.nhl.data[1, ] # the first row will be the header
scraped.nhl.data = scraped.nhl.data[-1, ] # removing the first row.
for (i in 1:nrow(scraped.nhl.data)){
if (scraped.nhl.data[i,1] == "Rk"){
scraped.nhl.data <- scraped.nhl.data[-i,]
}
}
pittsburgh <- scraped.nhl.data[scraped.nhl.data$Tm == "PIT", ]
pittsburgmodel <- pittsburgh[, c( "G", "A", "PTS")]
pittsburgmodel <- pittsburgmodel[complete.cases(pittsburgmodel), ]
View(pittsburgmodel)
names(pittsburgmodel) <- c(" goals", "assists", "points")
attach(pittsburgmodel)
fit = lm(games played ~., data = pittsburgmodel)
summary(fit)
输出
Coefficients: (18 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.719e-15 2.835e-15 -1.312e+00 0.247
assists1 2.000e+00 6.945e-15 2.880e+14 <2e-16 ***
assists10 4.000e+00 6.945e-15 5.759e+14 <2e-16 ***
assists12 1.800e+01 6.945e-15 2.592e+15 <2e-16 ***
assists13 5.000e+00 6.945e-15 7.199e+14 <2e-16 ***
assists2 4.000e+00 6.945e-15 5.759e+14 <2e-16 ***
assists20 2.900e+01 6.945e-15 4.175e+15 <2e-16 ***
assists21 1.100e+01 6.945e-15 1.584e+15 <2e-16 ***
assists22 7.000e+00 6.945e-15 1.008e+15 <2e-16 ***
assists23 4.000e+00 6.945e-15 5.759e+14 <2e-16 ***
assists25 1.300e+01 6.945e-15 1.872e+15 <2e-16 ***
assists26 2.200e+01 6.945e-15 3.168e+15 <2e-16 ***
assists3 2.000e+00 5.305e-15 3.770e+14 <2e-16 ***
assists4 4.000e+00 6.945e-15 5.759e+14 <2e-16 ***
assists42 9.000e+00 6.945e-15 1.296e+15 <2e-16 ***
assists5 3.000e+00 6.945e-15 4.319e+14 <2e-16 ***
assists56 4.200e+01 6.945e-15 6.047e+15 <2e-16 ***
assists58 3.400e+01 6.945e-15 4.895e+15 <2e-16 ***
assists6 2.000e+00 6.945e-15 2.880e+14 <2e-16 ***
assists60 2.900e+01 6.945e-15 4.175e+15 <2e-16 ***
assists8 4.000e+00 6.945e-15 5.759e+14 <2e-16 ***
points1 1.000e+00 6.945e-15 1.440e+14 <2e-16 ***
points10 2.000e+00 8.967e-15 2.231e+14 <2e-16 ***
points12 NA NA NA NA
points13 -1.000e+00 8.967e-15 -1.115e+14 <2e-16 ***
points14 NA NA NA NA
points18 NA NA NA NA
points27 NA NA NA NA
points29 NA NA NA NA
points3 NA NA NA NA
points30 NA NA NA NA
points31 -1.000e+00 8.967e-15 -1.115e+14 <2e-16 ***
points32 NA NA NA NA
points38 NA NA NA NA
points4 -2.000e+00 8.967e-15 -2.231e+14 <2e-16 ***
points48 NA NA NA NA
points49 NA NA NA NA
points5 NA NA NA NA
points51 NA NA NA NA
points6 NA NA NA NA
points8 NA NA NA NA
points89 NA NA NA NA
points92 NA NA NA NA
points98 NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.34e-15 on 5 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 3.72e+30 on 25 and 5 DF, p-value: < 2.2e-16
期望的输出
Estimate Std. Error t value Pr(>|t|)
(Intercept) value value value value
Goals value value value value
Assists value value value value
【问题讨论】:
-
使用
str(pittsburgmodel)查看每列的数据类型。看起来像数字的值实际上并未编码为数值。
标签: r