Within C++ functions, how are Rcpp objects passed to other functions (by reference or by copy)?

【Question title】: Within C++ functions, how are Rcpp objects passed to other functions (by reference or by copy)?
【Posted】: 2014-06-09 02:23:13
【Question】:

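I have just finished writing a new version of the ABCoptim package using Rcpp. With a speed-up of roughly 30x I am very happy with the new version's performance (compared with the old one), but I still wonder whether there is room to improve performance further without modifying the code too much.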

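Within the main function of ABCoptim (written in C++) I pass around an Rcpp::List object containing the "bee positions" (a NumericMatrix) and some NumericVectors holding information that matters to the algorithm itself. My question is what happens when I pass an Rcpp::List object to other functions, e.g.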

#include <Rcpp.h>

using namespace Rcpp;

List ABCinit([some input]){[some code here]};
void ABCfun2(List x){[some code here]};
void ABCfun3(List x){[some code here]};

List ABCmain([some input])
{
  List x = ABCinit([some input]);
  while ([some statement])
  {
    ABCfun2(x);
    ABCfun3(x);
  }
  ...

  return List::create(x["results"]);
}

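What does Rcpp do inside the while loop? Is the x object passed to ABCfun2 and ABCfun3 by reference or by deep copy? I have seen the usage "const List& x", which suggests that Rcpp objects can be passed via pointers (references), but I need this list to be mutable (not constant). Is there any way to improve this? I am worried that copying the x List on every iteration slows my code down.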

PS: I am still new to C++, and I am using Rcpp to learn C++.

【Comments】:

Tags: r performance pointers rcpp

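【Answer 1】:

There is no deep copy in Rcpp unless you ask for one with clone. When you pass by value, you create a new List object, but it uses the same underlying R object.

So the difference between passing by value and passing by reference is small.

However, when you pass by value you pay the price of protecting the underlying object one more time. That can incur a noticeable extra cost, because for protection Rcpp relies on R_PreserveObject, which is recursive and not very efficient.

My guideline is to pass by reference whenever possible, so that you do not pay the extra protection price. If you know that ABCfun2 will not change the object, I would advise passing by reference to const: ABCfun2( const List& ). If you are going to change the List, then I would recommend ABCfun2( List& ).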
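The sharing described above can be illustrated without R at all. The following is only a plain C++ model, not Rcpp code: `Handle`, `set_first_by_value`, `set_first_by_ref` and `demo_sharing` are hypothetical names, and a `std::shared_ptr` stands in for the underlying R object. It assumes nothing about how Rcpp is implemented; it only mirrors the observable behaviour that copying the handle does not copy the data, while an explicit `clone()` does.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an Rcpp vector: a thin handle that points to
// shared storage (the analogue of the underlying R object / SEXP).
struct Handle {
    std::shared_ptr<std::vector<double>> data;

    explicit Handle(std::vector<double> v)
        : data(std::make_shared<std::vector<double>>(std::move(v))) {}

    // Deep copy on request, analogous in spirit to Rcpp::clone().
    Handle clone() const { return Handle(*data); }
};

// By value: the handle is copied, the storage it points to is not.
void set_first_by_value(Handle h, double v) { (*h.data)[0] = v; }

// By reference: no new handle is created at all.
void set_first_by_ref(Handle& h, double v) { (*h.data)[0] = v; }

// Records the first element of the original after each kind of call.
std::vector<double> demo_sharing() {
    Handle x(std::vector<double>{1.0, 2.0});
    std::vector<double> seen;

    set_first_by_value(x, 10.0);          // writes through to shared storage
    seen.push_back((*x.data)[0]);

    set_first_by_ref(x, 20.0);            // same storage, same handle
    seen.push_back((*x.data)[0]);

    set_first_by_value(x.clone(), 30.0);  // deep copy: original untouched
    seen.push_back((*x.data)[0]);
    return seen;
}
```

In this model `demo_sharing()` yields 10, 20, 20: both passing styles modify the shared data, and only the cloned handle leaves the original alone. The reference-count bump that `std::shared_ptr` performs on every copy is a loose analogue of the extra protection step mentioned above, not a measurement of it.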

Consider this code:

#include <Rcpp.h>
using namespace Rcpp  ;

#define DBG(MSG,X) Rprintf("%20s SEXP=<%p>. List=%p\n", MSG, (SEXP)X, &X ) ;

void fun_copy( List x, const char* idx ){
    x[idx] = "foo" ;
    DBG( "in fun_copy: ", x) ;

}
void fun_ref( List& x, const char* idx ){
    x[idx] = "bar" ;
    DBG( "in fun_ref: ", x) ;
}


// [[Rcpp::export]]
void test_copy(){

    // create a list of 2 components
    List data = List::create( _["a"] = 1, _["b"] = 2 ) ;
    DBG( "initial: ", data) ;

    fun_copy( data, "a") ;
    DBG( "\nafter fun_copy (1): ", data) ;

    // try to add a new component "d" to the list, passed by value
    fun_copy( data, "d") ;
    DBG( "\nafter fun_copy (2): ", data) ;


}

// [[Rcpp::export]]
void test_ref(){

    // create a list of 2 components
    List data = List::create( _["a"] = 1, _["b"] = 2 ) ;
    DBG( "initial: ", data) ;

    fun_ref( data, "a") ;
    DBG( "\nafter fun_ref (1): ", data) ;

    // add a new component "d" to the list, passed by reference
    fun_ref( data, "d") ;
    DBG( "\nafter fun_ref (2): ", data) ;


}

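All I do is pass a list to a function, update it, and print some information about the pointer to the underlying R object and the pointer to the List object itself (this).

Here is what happens when I call test_copy and test_ref: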

> test_copy()
           initial:  SEXP=<0x7ff97c26c278>. List=0x7fff5b909fd0
       in fun_copy:  SEXP=<0x7ff97c26c278>. List=0x7fff5b909f30

after fun_copy (1):  SEXP=<0x7ff97c26c278>. List=0x7fff5b909fd0
$a
[1] "foo"

$b
[1] 2

       in fun_copy:  SEXP=<0x7ff97b2b3ed8>. List=0x7fff5b909f20

after fun_copy (2):  SEXP=<0x7ff97c26c278>. List=0x7fff5b909fd0
$a
[1] "foo"

$b
[1] 2

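We start with an existing list associated with an R object. (The addresses in this walk-through come from a separate run, so they differ from the listing above.)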

           initial:  SEXP=<0x7fda4926d278>. List=0x7fff5bb5efd0

We pass it by value to fun_copy, so we get a new List, but one that uses the same underlying R object:

       in fun_copy:  SEXP=<0x7fda4926d278>. List=0x7fff5bb5ef30

We exit fun_copy. The underlying R object is still the same, and we are back to our original List:

after fun_copy (1):  SEXP=<0x7fda4926d278>. List=0x7fff5bb5efd0

Now we call fun_copy again, but this time we update a component that is not in the list: x["d"]="foo".

       in fun_copy:  SEXP=<0x7fda48989120>. List=0x7fff5bb5ef20

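The List has no choice but to create a new underlying R object for itself, but that object only backs the local List. So when we leave fun_copy, we are back to our original List with its original underlying SEXP.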

after fun_copy (2):  SEXP=<0x7fda4926d278>. List=0x7fff5bb5efd0

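The key point is that the first time, "a" was already in the list, so we updated the data in place. Because the local object in fun_copy and the outer object in test_copy share the same underlying R object, the modification made inside fun_copy was propagated.

The second time, fun_copy grows its local List object, associating it with a brand new SEXP that does not propagate to the outer function.

Now consider what happens when you pass by reference: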

> test_ref()
           initial:  SEXP=<0x7ff97c0e0f80>. List=0x7fff5b909fd0
        in fun_ref:  SEXP=<0x7ff97c0e0f80>. List=0x7fff5b909fd0

  after fun_ref(1):  SEXP=<0x7ff97c0e0f80>. List=0x7fff5b909fd0
$a
[1] "bar"

$b
[1] 2

        in fun_ref:  SEXP=<0x7ff97b5254c8>. List=0x7fff5b909fd0

  after fun_ref(2):  SEXP=<0x7ff97b5254c8>. List=0x7fff5b909fd0
$a
[1] "bar"

$b
[1] 2

$d
[1] "bar"

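There is only one List object, 0x7fff5b909fd0. When we have to get a new SEXP in the second call, it is correctly propagated to the outer level.

To me, the behaviour you get when passing by reference is much easier to reason about.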
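The two walk-throughs above can be condensed into a small plain C++ model. This is a sketch under stated assumptions, not Rcpp and not how Rcpp::List is implemented: `ToyList`, `set_by_value`, `set_by_ref`, `after_by_value` and `after_by_ref` are hypothetical names, and a `std::shared_ptr` to a map plays the role of the underlying SEXP. Updating an existing key writes into the shared store, while adding a new key rebinds the handle to a fresh store, mimicking a list that has to be reallocated in order to grow.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical model of a list handle; `store` stands in for the SEXP.
struct ToyList {
    using Store = std::map<std::string, std::string>;
    std::shared_ptr<Store> store = std::make_shared<Store>();

    void set(const std::string& key, const std::string& value) {
        if (store->count(key) == 0) {
            // Growing: rebind THIS handle to a fresh copy of the store.
            // Other handles keep pointing at the old one.
            store = std::make_shared<Store>(*store);
        }
        (*store)[key] = value;
    }
    std::size_t size() const { return store->size(); }
};

void set_by_value(ToyList x, const std::string& key) { x.set(key, "foo"); }
void set_by_ref(ToyList& x, const std::string& key) { x.set(key, "bar"); }

// Mirrors test_copy: returns the caller's list after both calls.
ToyList after_by_value() {
    ToyList data;
    data.set("a", "1");
    data.set("b", "2");
    set_by_value(data, "a");  // existing key: change is visible to caller
    set_by_value(data, "d");  // new key: only the local copy grows
    return data;
}

// Mirrors test_ref: the caller's handle itself is rebound when it grows.
ToyList after_by_ref() {
    ToyList data;
    data.set("a", "1");
    data.set("b", "2");
    set_by_ref(data, "a");
    set_by_ref(data, "d");
    return data;
}
```

In this model the list returned by `after_by_value()` has two components with "a" set to "foo" and no "d", while the one returned by `after_by_ref()` has three components with both "a" and "d" set to "bar", which matches the pattern in the printed output above.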

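【Comments】:

It is very clear to me now! Thank you Romain and Dirk! Both of your suggestions were very helpful. A big Rcpp fan here, trying to show it to the Spanish-speaking world: rstudio-pubs-static.s3.amazonaws.com/… =) Thanks!
Nice post. Could you add some measurements over a (sufficiently large) number of calls to see what the actual cost is?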

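【Answer 2】:

Briefly:

void ABCfun(List x) passes by value, but List is an Rcpp object that wraps a SEXP, which is a pointer, so the cost here is less than a C++ programmer would suspect, and it is in fact lightweight. (But as Romain rightly points out, the extra layer of protection does have a cost.)
void ABCfun(const List x) promises not to change x, but again, because it is a pointer...
void ABCfun(const List & x) looks most normal to a C++ programmer and has been supported in Rcpp since last year.

In effect, in the Rcpp context all three are about the same. But you should think along the lines of best C++ practice and prefer the third form, because one day you may use a std::list<....> instead, in which case the const reference is clearly preferable (Scott Meyers has an entire item about this in Effective C++, or maybe in the companion More Effective C++).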
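To make the std::list remark concrete, here is a minimal standard C++ sketch that counts element copies. `Counted`, `length_by_value`, `length_by_const_ref`, `copies_by_value` and `copies_by_const_ref` are hypothetical names introduced for this illustration. Unlike an Rcpp handle, a std::list passed by value copies every element it holds.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical element type that counts how often it is copy-constructed.
struct Counted {
    static int copies;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted& operator=(const Counted&) = default;
};
int Counted::copies = 0;

// By value: the whole list, element by element, is copied into `xs`.
std::size_t length_by_value(std::list<Counted> xs) { return xs.size(); }

// By const reference: the callee reads the caller's list directly.
std::size_t length_by_const_ref(const std::list<Counted>& xs) {
    return xs.size();
}

// Number of element copies triggered by one by-value call.
int copies_by_value() {
    std::list<Counted> xs(3);   // three default-constructed elements
    Counted::copies = 0;        // ignore anything done during setup
    length_by_value(xs);
    return Counted::copies;
}

// Number of element copies triggered by one const-reference call.
int copies_by_const_ref() {
    std::list<Counted> xs(3);
    Counted::copies = 0;
    length_by_const_ref(xs);
    return Counted::copies;
}
```

With a three-element list, `copies_by_value()` reports 3 copies and `copies_by_const_ref()` reports 0, which is the cost difference the const-reference habit avoids for ordinary C++ containers.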

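But the most important lesson is that you should not simply believe what people tell you on the internet, but rather measure and profile whenever possible.

【Comments】:

【Answer 3】:

I am new to Rcpp, so I thought I would answer @Dirk's request for a measurement of the cost of the two passing styles (copy and reference)...

There is surprisingly little difference between the two approaches.

I get the following: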

microbenchmark(test_copy(), test_ref(), times = 1e6)
Unit: microseconds
        expr   min    lq     mean median    uq        max neval cld
  test_copy() 5.102 5.566 7.518406  6.030 6.494 106615.653 1e+06   a
   test_ref() 4.639 5.566 7.262655  6.029 6.494   5794.319 1e+06   a

I used a pared-down version of @Romain's code, with the DBG calls removed.

#include <Rcpp.h>
using namespace Rcpp;

void fun_copy( List x, const char* idx){
    x[idx] = "foo";
}

void fun_ref( List& x, const char* idx){
    x[idx] = "bar";
}

// [[Rcpp::export]]
List test_copy(){

    // create a list of 2 components
    List data = List::create( _["a"] = 1, _["b"] = 2);

    // alter the 1st component of the list, passed by value
    fun_copy( data, "a");

    // try to add a 3rd component; passed by value, so only the local copy grows
    fun_copy( data, "d");
    return(data);

}

// [[Rcpp::export]]
List test_ref(){

    // create a list of 2 components
    List data = List::create( _["a"] = 1, _["b"] = 2);

    // alter the 1st component of the list, passed by reference
    fun_ref( data, "a");

    // add a 3rd component to the list
    fun_ref( data, "d");
    return(data);

}

/*** R

# benchmark copy v. ref functions
require(microbenchmark)
microbenchmark(test_copy(), test_ref(), times = 1e6)

*/

【Comments】: