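[Posted]: 2017-03-30 04:55:17
[Question]:
I would love to hear suggestions for optimizing code that computes the cosine similarity of a vector x (of length l) with n other vectors (stored in any structure, such as a matrix m with n rows and l columns).
The value of n is typically much larger than the value of l.
I currently use this custom Rcpp function to compute the similarity of the vector x to each row of the matrix m: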
library(Rcpp)
cppFunction('NumericVector cosine_x_to_m(NumericVector x, NumericMatrix m) {
  int nrows = m.nrow();
  NumericVector out(nrows);
  for (int i = 0; i < nrows; i++) {
    NumericVector y = m(i, _);
    out[i] = sum(x * y) / sqrt(sum(pow(x, 2.0)) * sum(pow(y, 2.0)));
  }
  return out;
}')
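For reference, the value computed for each row y of m is sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2))). Here is a minimal base R sketch of the same calculation, which may be useful as a correctness check for any faster version. The name cosine_x_to_m_ref is hypothetical and is not part of the code I am timing.

```r
# Reference implementation in base R (hypothetical helper, for checking only).
# Computes the cosine similarity of x to every row of m.
cosine_x_to_m_ref <- function(x, m) {
  stopifnot(length(x) == ncol(m))
  dots    <- as.vector(m %*% x)   # dot product of each row of m with x
  norms_m <- sqrt(rowSums(m^2))   # Euclidean norm of each row of m
  norm_x  <- sqrt(sum(x^2))       # Euclidean norm of x, computed once
  dots / (norms_m * norm_x)
}

# Small worked example
x <- c(1, 0)
m <- rbind(c(1, 0),    # same direction as x
           c(0, 1),    # orthogonal to x
           c(-2, 0))   # opposite direction to x
cosine_x_to_m_ref(x, m)
# returns 1, 0, -1
```

Assuming both functions are defined, all.equal(cosine_x_to_m(x, m), cosine_x_to_m_ref(x, m)) should return TRUE up to floating-point tolerance.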
I timed the function over a grid of n and l values.
The reproducible code, which also plots the results, is below.
# Function to simulate data
sim_data <- function(l, n) {
  # Feature vector to be used for computing similarity
  x <- runif(l)
  # Matrix of feature vectors (1 per row) to compare against x
  m <- matrix(runif(n * l), nrow = n)
  list(x = x, m = m)
}
# Rcpp function to compute similarity of x to each row of m
library(Rcpp)
cppFunction('NumericVector cosine_x_to_m(NumericVector x, NumericMatrix m) {
  int nrows = m.nrow();
  NumericVector out(nrows);
  for (int i = 0; i < nrows; i++) {
    NumericVector y = m(i, _);
    out[i] = sum(x * y) / sqrt(sum(pow(x, 2.0)) * sum(pow(y, 2.0)));
  }
  return out;
}')
# Timer function
library(microbenchmark)
timer <- function(l, n) {
  dat <- sim_data(l, n)
  microbenchmark(cosine_x_to_m(dat$x, dat$m))
}
# Results for grid of l and n
library(tidyverse)
# crossing() (tidyr) replaces the deprecated purrr::cross_d()
results <- crossing(l = seq(200, 1000, by = 200), n = seq(500, 4000, by = 500)) %>%
  mutate(timings = map2(l, n, timer))
# Plot results
results_plot <- results %>%
  unnest(timings) %>%
  mutate(time = time / 1e9) %>% # microbenchmark reports nanoseconds; convert to seconds
  group_by(l, n) %>%
  summarise(mean = mean(time), ci = 1.96 * sd(time) / sqrt(n()))
pd <- position_dodge(width = 20)
results_plot %>%
  ggplot(aes(n, mean, group = l)) +
  geom_line(aes(color = factor(l)), position = pd, size = 2) +
  geom_errorbar(aes(ymin = mean - ci, ymax = mean + ci), position = pd, width = 100) +
  geom_point(position = pd, size = 2) +
  scale_color_brewer(palette = "Blues") +
  theme_minimal() +
  labs(x = "n", y = "Seconds", color = "l") +
  ggtitle("Algorithm Runtime",
          subtitle = "Error bars represent 95% confidence intervals")
[Comments]:
- This is better suited to CodeReview than to SO.
- Thanks for the suggestion. I have opened this question on CodeReview: codereview.stackexchange.com/questions/159396/…
Tags: c++ r performance vector rcpp