【发布时间】:2018-09-15 17:36:34
【问题描述】:
我正在使用 R Markdown 绘制曲面
p1 <- plot_ly() %>% add_surface(z=z,x=wRange,y=yRange) %>% layout(showlegend=FALSE,scene=list(xaxis=list(title="wMult"),yaxis=list(title="yMult"),zaxis=list(title="MAE")))
p1
稍后我想通过这样做向这个表面添加一个点:
p2 <- p1 %>% add_markers(z=MAE1,x=wMult1,y=yMult1) %>% layout(showlegend=FALSE)
p2
不久之后,我尝试通过在 p2 上添加另一个标记来绘制 p3。
p3 <- p2 %>% add_markers(z=MAE2,x=wMult2,y=yMult2) %>% layout(showlegend=FALSE)
p3
不幸的是,只有 p1 在 HTML 中呈现为交互式图表。 p2 和 p3 显示为一个空白区域,大致相当于图表的大小,但在查看器和浏览器中都没有内部空间。如果我使用网络检查器,我可以看到它正在尝试渲染一个绘图对象,但它看起来是空的。
如果我直接在 RStudio 中运行相同的代码,我可以查看添加了额外标记的图,但是当我编织 markdown 时它们不会呈现。
这是怎么回事?
数据集可在此处获得:https://archive.ics.uci.edu/ml/datasets/auto+mpg
这是目前为止的完整降价代码:
---
title: "Gradient Descent Demo"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
setwd("[your working directory]")
data = read.csv("mpg.csv")
require(plotly)
```
## Our Dataset
Let's take a look at some sample data. It shows attributes of several old cars and their fuel economy measured in miles per gallon.
```{r c1}
head(data)
```
We'll try to predict a given car's mpg using only its weight and year.
```{r c2}
data <- data[,c("mpg","weight","year")]
pairs(data)
```
## Create a Hypothesis
Our hypothesis will be that we can get an approximation of the mpg by multipling the car's weight by some number "wMult" and adding that to the car's year multiplied by some other number "yMult". Let's just pick some numbers to start.
```{r c3, include=FALSE}
mod1 <- lm(mpg~weight+year,data=data)
bias1 <- mod1$coefficients[1]
```
```{r}
wMult1 <- -.02
yMult1 <- -2
```
We can turn this into a prediction.
(Ignore the bias - I cheated and did some behind-the-scenes pre-work.)
```{r c4}
data$mpgPred1 <- wMult1*data$weight + yMult1*data$year + bias1
head(data)
```
Ok so we have predictions. They're clearly pretty bad since they're negative, and most cars don't get negative miles per gallon. But can we measure how bad they are?
## Evaluate the Hypothesis
We need some measure of how good (or bad) our prediction is. We'll use the Mean Absolute Error ("MAE"). As the name suggests, this is calculated finding the average of the absolute difference between each predicted value and actual value.
```{r c5}
MAE1 <- mean(abs(data$mpgPred1-data$mpg))
MAE1
```
Ok so on average we're only off by about 250 mpg. Surely we can do better.
## Adjust the Hypothesis
What to use for our next hypothesis? Well we assign new wMult and yMult values and see how we do.
```{R c6}
wMult2 <- wMult1 + .03
yMult2 <- wMult2 - 1.2
data$mpgPred2 <- wMult2*data$weight + yMult2*data$year + bias1
head(data)
```
Our predictions look better (At least they're positive!), but they're still pretty far off. Let's see how much better or worse they are.
## Evaluate the Hypothesis - Round 2
```{R c7}
MAE2 <- mean(abs(data$mpgPred2-data$mpg))
MAE1
MAE2
```
Now we're only off by 50 on average. Still pretty terrible, but better than before.
## Adjust the Hypothesis - There has to be a better way.
Ok so instead of just continuing to make random guesses, let's develop a way to intelligently update our hypothesis.
Thankfully, since we're only using two variables for our analysis, we can pretty easily visualize the effect of every reasonable combination of wMult and yMult.
```{R c8, include=FALSE}
plotdata <- data.frame(wCoef=double(),yCoef=double(),MAE=double())
wRange <- seq(mod1$coefficients[2]-300*summary(mod1)$coefficients["weight","Std. Error"],mod1$coefficients[2]+300*summary(mod1)$coefficients["weight","Std. Error"],length.out=201)
yRange <- seq(mod1$coefficients[3]-300*summary(mod1)$coefficients["year","Std. Error"],mod1$coefficients[3]+300*summary(mod1)$coefficients["year","Std. Error"],length.out=201)
for(i in wRange)
{for(j in yRange)
{
preds <- (i*data$weight) + (j*data$year) + bias1
resid <- preds-data$mpg
MAE = mean(abs(resid))
newRec <- data.frame(wCoef=i,yCoef=j,MAE=MAE)
plotdata <- rbind(plotdata,newRec)
}
}
z <- matrix(plotdata$MAE,nrow=201,ncol=201)
```
```{R c9}
p1 <- plot_ly() %>% add_surface(z=z,x=wRange,y=yRange) %>% layout(showlegend=FALSE,scene=list(xaxis=list(title="wMult"),yaxis=list(title="yMult"),zaxis=list(title="MAE")))
p1
```
Great - we can visibly explore this graph and see what some good weights might be. The best one is the one that minimizes the MAE. That's the center spot at the middle of the valley, where the crease seems to dip slightly.
Let's add our first hypothesis to this chart to see where it falls.
```{R c10,warning=F}
p2 <- p1 %>% add_markers(z=MAE1,x=wMult1,y=yMult1) %>% layout(showlegend=FALSE)
p2
```
And let's add our second one
```{R c11}
p3 <- p2 %>% add_markers(z=MAE2,x=wMult2,y=yMult2) %>% layout(showlegend=FALSE)
p3
```
Ok so it turns out our second guess actually overshot. This means that if we kept updating our hypothesis in the same manner, we'd actually get worse with each new step.
## Letting the machine do it
As I mentioned before, this approach works because we only have 2 variables we're working with. But if we had more, we'd be dealing with spaces greater than 3 dimensions. This gets hard to visualize.
Thankfully there's a way for the machine to navigate those higher dimensional spaces. We'll continue to use this two dimensional approach for now to help illustrate the approach.
【问题讨论】:
-
您能否分享一个包含 rmarkdown 代码的可重现示例?
-
数据集在此处可用。 archive.ics.uci.edu/ml/datasets/auto+mpg我将使用完整的降价代码更新我的帖子。
标签: r markdown r-markdown plotly r-plotly