dqrng 与 Rcpp 用于从正态分布和二项分布中绘图答案

【问题标题】：dqrng with Rcpp for drawing from a normal and a binomial distributiondqrng 与 Rcpp 用于从正态分布和二项分布中绘图
【发布时间】：2019-02-16 14:54:27
【问题描述】：

我正在尝试学习如何从 Rcpp OpenMP 循环中的正态分布和二项分布中提取随机数。

我使用R::rnorm 和R::rbinom 编写了以下代码，我认为这是don't do that。

#include <RcppArmadillo.h>
#include <omp.h>
#include <dqrng.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::depends(dqrng)]]
// [[Rcpp::plugins(openmp)]]
using namespace Rcpp;

// [[Rcpp::export]]
arma::mat  my_matrix3(const arma::mat& A, 
                      const arma::mat& B) {

  dqrng::dqRNGkind("Xoroshiro128+");
  dqrng::dqset_seed(42);



  const int nObservations = A.n_cols;
  const int nDraws = B.n_rows;
  const int nRows = nObservations * nDraws;

  // Show initialization information
  Rcpp::Rcout << "nObservations: " << nObservations << std::endl 
              << "nDraws: " << nDraws << std::endl 
              << "nRows: " << nRows << std::endl;

  arma::mat out(nRows, 5);

  // Show trace of matrix construction
  Rcpp::Rcout << "out - rows: " << out.n_rows << std::endl 
              << "out - columns: " << out.n_cols << std::endl;

  int i,n,iter,row;
  double dot_product, random_number, p;
  omp_set_num_threads(2);
#pragma omp parallel for private(i, n, iter, row)
  for(i = 0; i < nDraws; ++i){
    for(n = 0; n < nObservations; ++n) {
      row = i * nObservations + n;
      dot_product = arma::as_scalar(A.col(n).t() * B.row(i).t());
      // Show trace statement of index being accessed
      out(row, 0) = i + 1;
      out(row, 1) = n + 1;
      out(row, 2) = R::rnorm(2.0, 1.0) ;//dqrng::dqrnorm(1, 2.0, 1.0)[1]; 
      out(row, 3) = 1 / (1 + std::exp(-out(row, 3) - std::exp(dot_product)));
      out(row, 4) = R::rbinom(1,out(row, 3));
    }
  }

  return out;
}

/*** R
set.seed(9782)
A <- matrix(rnorm(10), nrow = 5)
B <- matrix(rnorm(10), ncol = 5)
test <- my_matrix3(A = A, B = B)
test

mean(test[,5])
*/

正如@ralf-stubner 所建议的，我正在尝试用dqrng 替换该代码。如果我正确理解了文档，我可以将R::rnorm(2.0, 1.0) 替换为dqrng::dqrnorm(1, 2.0, 1.0)[1]。那正确吗？替换R::rbinom(1,out(row, 3))怎么样？我在the documentation 中找不到任何对二项式绘图的参考

我能够编写以下从二项分布中并行提取的代码：

#include <RcppArmadillo.h>
// [[Rcpp::depends(dqrng, BH, RcppArmadillo)]]
#include <pcg_random.hpp>
#include <boost/random/binomial_distribution.hpp>
#include <xoshiro.h>
#include <dqrng_distribution.h>
// [[Rcpp::plugins(openmp)]]
#include <omp.h>

// [[Rcpp::plugins(cpp11)]]

// [[Rcpp::export]]
arma::mat parallel_random_matrix(int n, int m, int ncores, double p=0.5) {
  //dqrng::uniform_distribution dist(0.0, 1.0);
  boost::random::binomial_distribution<int> dist(1,p);
  dqrng::xoshiro256plus rng(42);
  arma::mat out(n,m);
  // ok to use rng here



#pragma omp parallel num_threads(ncores)
{
  dqrng::xoshiro256plus lrng(rng);      // make thread local copy of rng 
  lrng.jump(omp_get_thread_num() + 1);  // advance rng by 1 ... ncores jumps 
  auto gen = std::bind(dist, std::ref(lrng));

#pragma omp for
  for (int i = 0; i < m; ++i) {
    double lres(0);
    for (int j = 0; j < n; ++j) {
      out(j,i) = gen(); /// CAN I MAKE P BE A FUNCTION OF i and j? how???
    }
  }
}
// ok to use rng here
return out;
}

/*** R
parallel_random_matrix(5, 5, 4, 0.75)
*/

> parallel_random_matrix(5, 5, 4, 0.75)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    1    1
[2,]    0    1    1    1    1
[3,]    0    1    0    1    0
[4,]    1    1    1    1    1
[5,]    1    1    1    1    1

有没有办法调用out(j,i) = gen(); 并使概率成为 i 和 j 的函数？？

我写的代码有什么问题吗？

【问题讨论】：

请参阅parallel RNG usage 小插图了解详情。目前，R 和 C++ 接口不适合并行使用。您必须直接使用提供的生成器。对于二项分布，您可以使用 boost 中的一个（通过 BH 包）。
感谢@RalfStubner 的帮助。我想我能够弄清楚如何做最难的部分。我现在的问题是如何使gen 中的p 成为i 和j 的函数。例如，我尝试了设置 gen(p=0.0002) 之类的方法，但它并没有像我预期的那样工作。

标签： rcpp

【解决方案1】：

一个简单的解决方案是在循环中创建一个新的分发对象：

// [[Rcpp::depends(dqrng, BH, RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <boost/random/binomial_distribution.hpp>
#include <xoshiro.h>
#include <dqrng_distribution.h>
// [[Rcpp::plugins(openmp)]]
#include <omp.h>

// [[Rcpp::plugins(cpp11)]]

// [[Rcpp::export]]
arma::mat parallel_random_matrix(int n, int m, int ncores, double p=0.5) {
  dqrng::xoshiro256plus rng(42);
  arma::mat out(n,m);
  // ok to use rng here

#pragma omp parallel num_threads(ncores)
{
  dqrng::xoshiro256plus lrng(rng);      // make thread local copy of rng 
  lrng.jump(omp_get_thread_num() + 1);  // advance rng by 1 ... ncores jumps 

#pragma omp for
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      // p can be a function of i and j
      boost::random::binomial_distribution<int> dist(1,p);
      auto gen = std::bind(dist, std::ref(lrng));
      out(j,i) = gen();
    }
  }
}
  // ok to use rng here
  return out;
}

/*** R
parallel_random_matrix(5, 5, 4, 0.75)
*/

这样你可以让p 依赖于i 和j。保留一个全局 dist 对象并在循环中重新配置它可能更有效，请参阅 here 了解类似问题。

【讨论】：

最后一个问题。在我最初的问题中，我提到我实际上想从二项式和法线中提取。我需要有多个lrng 吗？每个gen 函数一个？例如，如果我有 1 个二项式和 2 个法线（使用不同的方法），我需要 3 个lrng？
@Ignacio 不，您可以将bind 单个lrng 分配给多个发行版。顺便说一句，如果您对如何改进文档有任何建议，请在GitHub 上提出问题。
感谢@ralf-stubner。一旦我开始工作，我将发布一个额外示例的想法:-)