【问题标题】:Function to calculate a gradient using a for loop in R在R中使用for循环计算梯度的函数
【发布时间】:2020-07-16 23:10:16
【问题描述】:

我正在尝试编写一个函数来计算 R 中的梯度。该函数必须专门使用 for 循环来执行此操作。我编写这个函数是为了说明在尝试计算梯度时,for 循环的效率如何低于矢量化编程。

该函数接受一个设计矩阵 X 和一个系数向量 beta 以计算成本函数的梯度(设计矩阵是一个协变量矩阵,第一列是协变量。)

成本函数是 MSE,

。我正在使用解析解计算梯度,采用损失函数的偏导数。这给了我们偏导数,

对于第一个系数 $\beta_{0}$,然后类似地对于所有其他系数 $\beta_{j}$,

我设法使用矢量化编程作为下面的代码行来计算答案(实现上述)。

-2*t(X)%*%(y-X%*%beta)

我尝试使用 for 循环,但不起作用,但看起来像这样,

  # Initialize the matrix to hold gradient
  gradient = matrix(0, nrow = nrow(beta))
  
  for(i in 1:nrow(beta)){
    
    if(i == 1){
      gradient[i] = -2*sum(y - X%*%beta) # first value
    }else if(i>1){
      gradient[i] = -2*sum( t(X)%*%(y - X%*%beta) ) * apply(X[,-1],2,sum)[i-1]  
    } }
  

下面是生成要使用的数据的代码以及我在 R 中尝试的两个实现。代码生成数据,如果复制到 R 中,可用于测试修复 for 循环。

# Values
# Random data Generated
n = 4  
p = 3

beta_0  = 2
beta = matrix(c(beta_0,3,1,4), nrow= (p+1) ) # coefficients
X = runif(n=(n*p), min=-5,max= 5) # covariates
X  = matrix(X, nrow = n, ncol = p)
X = cbind(1, X) # make a design matrix
y = apply(X[,-1],1,mean)  # Response (Some function) # 

# Print all to show all initial values 
print(list("Initial Values:"="","n:"=n," p:" = p, "beta_0:" = beta_0," beta:"=beta,
           " X:"=X," y:" = y))



#             Function 1
# Find the gradient (using a for loop)
# The partial derivative of the loss function
df_ols = function(beta,X,y){
  
  begin.time <- proc.time() 
  # Initialize the matrix to hold gradient
  gradient = matrix(0, nrow = nrow(beta))
  
  for(i in 1:nrow(beta)){
    
    if(i == 1){
      gradient[i] = -2*sum(y - X%*%beta) # first value
    }else if(i>1){
      gradient[i] = -2*sum( t(X)%*%(y - X%*%beta) ) * apply(X[,-1],2,sum)[i-1]  
    } }
  
  end.time <- proc.time()
  time <- (end.time-begin.time)[3]
  
  print(list("gradient 1"=gradient,"time"=time))
}

df_ols(beta,X,y)

# Function 2
# Find the gradient Approach 2 using vectorized programming
gradient_3 <- function(X, beta){
  
  begin.time <- proc.time()
  
  # Finding the gradient
  grad_3 <- -2*t(X)%*%(y-X%*%beta)
  grad_3 <- matrix(grad_3, ncol = 1,nrow = ncol(X)) # Turn into a column matrix
  end.time <- proc.time()
  
  time <- (end.time-begin.time)[3]
  
  print(list("gradient 3"= grad_3 ,"time"=time))
  
}

gradient_3(X, beta) # Assumed Correct

如果我不是太罗嗦,我深表歉意。任何帮助将不胜感激。

【问题讨论】:

    标签: r for-loop math optimization statistics


    【解决方案1】:

    我设法让 for 循环工作,下面你会看到答案。

    # Find the gradient (using a for loop) explicit, element-wise formulations 
    # The partial derivative of the loss function
    df_ols = function(beta,X,y){
    
      begin.time <- proc.time() 
      # Initialize the matrix to hold gradient
      gradient = matrix(0, nrow = nrow(beta))
    
      for(i in 1:nrow(beta)){
    
        if(i == 1){
          gradient[i] = -2*sum(y - X%*%beta) # first value
        }else if(i>1){
          gradient[i]   =  -2*t(X[,i])%*%(y-X%*%beta)
          #gradient[i] = -2*sum( t(X)%*%(y - X%*%beta) ) * apply(X[,-1],2,sum)[i-1]  
        } }
    
      end.time <- proc.time()
      time <- (end.time-begin.time)[3]
    
      print(list("gradient 1"=gradient,"time"=time))
    }
    
    df_ols(beta,X,y)
    

    ,同时运行向量化和元素形式,我们发现向量化过程要快得多。我们在下面实现了完整的流程。

    # Random data Generated
    n = 100000*12  
    p = 3
    
    beta_0  = 2
    beta = matrix(c(beta_0,3,1,4), nrow= (p+1) ) # coefficients
    X = runif(n=(n*p), min=-5,max= 5) # covariates
    X  = matrix(X, nrow = n, ncol = p)
    X = cbind(1, X) # make a design matrix
    y = apply(X[,-1],1,mean)  # Response (Some function) 
    
    # Print parameters. To show all initial values 
    print(list("Initial Values:" = '',"n:"=n," p:" = p, "beta_0:" = beta_0," beta:"=beta,
               " X:"=round(X,digits=2)," y:" = round(y,digits=2)))
    
    
    # Function 3
    # Find the gradient Approach 3, using vectorized programming
    gradient_3 <- function(X, beta,y){
    
      begin.time <- proc.time()
    
      # Find the partial derivative of the rest (j'th)
      grad_3 <- -2*t(X)%*%(y-X%*%beta)
      grad_3 <- matrix(grad_3, ncol = 1,nrow = ncol(X)) # Turn into a column matrix
      end.time <- proc.time()
    
      time <- (end.time-begin.time)[3]
    
      print(list("gradient 3"= grad_3 ,"time"=time))
    }
    

    显示结果

    > df_ols(beta,X,y) # Elementwise
    $`gradient 1`
             [,1]
    [1,]  4804829
    [2,] 53311879
    [3,] 13471077
    [4,] 73259191
    
    $time
    elapsed 
       3.59 
    > gradient_3(X, beta,y) # Vectorized Programming solution
    $`gradient 3`
             [,1]
    [1,]  4804829
    [2,] 53311879
    [3,] 13471077
    [4,] 73259191
    
    $time
    elapsed 
       0.89 
    

    【讨论】:

      猜你喜欢
      • 2021-12-19
      • 2020-07-07
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 2021-04-30
      • 1970-01-01
      • 2022-01-05
      • 2012-11-07
      相关资源
      最近更新 更多