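【Posted】: 2017-07-05 15:00:36
【Question】:
I have successfully implemented a function that copies an arbitrary number of values, starting from an arbitrary point in a ring buffer, into a contiguous array, but I would like to make it more efficient. Here is a minimal example of my code: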
#include <cstdlib>   // malloc, free
#include <cstring>   // memcpy
#include <iostream>
#include <chrono>
#include <thread>
using namespace std;

/*Foo: a function*/
void Foo(int * print_array, int print_amount){
    /*Simulate overhead*/
    this_thread::sleep_for(chrono::microseconds(1000));
    int sum = 0;
    for (int i = 0; i < print_amount; i++){
        sum += print_array[i]; //Linear operation
        // cout << print_array[i] << " "; //Uncomment to check for correct functionality
    }
}

/*Example function*/
int main(){
    /*Initialize ring buffer*/
    int ring_buffer_elements = 32; //A largeish size
    int ring_buffer_size = ring_buffer_elements * sizeof(int);
    int * ring_buffer = (int *) malloc(ring_buffer_size);
    for (int i = 0; i < ring_buffer_elements; i++)
        ring_buffer[i] = i; //Fill buffer with ordered numbers
    /*Initialize array*/
    int array_elements = 16; //A smaller largeish size
    int array_size = array_elements * sizeof(int);
    int * array = (int *) malloc(array_size);
    /*Set reference pointers*/
    int * start_pointer = ring_buffer;
    int * end_pointer = ring_buffer + ring_buffer_elements;
    /*Set moving copy pointer*/
    int * copy_pointer = start_pointer;
    /*Set "random" amount to be copied at each iteration*/
    int copy_amount = 11;
    /*Set loop amount to check functionality or run time*/
    int loop_amount = 1000; //Set lower if checking functionality

    /***WORKING METHOD***/
    /*Start timer*/
    auto start_time = chrono::high_resolution_clock::now();
    /*"Continuous" loop*/
    for (int i = 0; i < loop_amount; i++){
        /*Copy loop*/
        for (int j = 0; j < copy_amount; j++){
            array[j] = *copy_pointer; //Copy value from ring buffer
            copy_pointer++; //Move pointer
            if (copy_pointer >= end_pointer)
                copy_pointer = start_pointer; //Reset pointer if reached end of ring buffer
        }
        Foo(array, copy_amount); //Call a function
    }
    /*Check run time*/
    chrono::duration<double> run_time_ticks = chrono::high_resolution_clock::now() - start_time;
    double run_time = run_time_ticks.count();
    /*Print result*/
    cout << endl << run_time << endl;

    /***NAIVE METHOD***/
    /*Reset moving pointer*/
    copy_pointer = start_pointer;
    /*Start timer*/
    start_time = chrono::high_resolution_clock::now();
    /*"Continuous" loop*/
    for (int i = 0; i < loop_amount; i++){
        /*Compute how many elements must be copied after reaching end of ring buffer*/
        int copy_remainder = copy_pointer + copy_amount - end_pointer; //Ugly pointer arithmetic?
        /*Check if we need to loop back or not*/
        if (copy_remainder <= 0){
            Foo(copy_pointer, copy_amount); //Call function
            copy_pointer += copy_amount; //Move pointer
        } else {
            Foo(copy_pointer, copy_amount-copy_remainder); //Call function with part of values from copy pointer
            Foo(start_pointer, copy_remainder); //Call function with remainder of values from start of ring buffer
            copy_pointer = start_pointer + copy_remainder; //Move pointer
        }
    }
    /*Check run time*/
    run_time_ticks = chrono::high_resolution_clock::now() - start_time;
    run_time = run_time_ticks.count();
    /*Print result*/
    cout << endl << run_time << endl;

    /***memcpy METHOD***/
    /*Reset moving pointer*/
    copy_pointer = start_pointer;
    /*Initialize size reference*/
    int int_size = (int) sizeof(int);
    /*Start timer*/
    start_time = chrono::high_resolution_clock::now();
    /*"Continuous" loop*/
    for (int i = 0; i < loop_amount; i++){
        /*Compute how many elements must be copied after reaching end of ring buffer*/
        int copy_remainder = copy_pointer + copy_amount - end_pointer; //Ugly pointer arithmetic?
        /*Check if we need to loop back or not*/
        if (copy_remainder <= 0){
            memcpy(array, copy_pointer, copy_amount*int_size); //Use memcpy
            copy_pointer += copy_amount; //Move pointer
        } else {
            memcpy(array, copy_pointer, (copy_amount-copy_remainder)*int_size); //Use memcpy with part of values from copy pointer
            memcpy(array+(copy_amount-copy_remainder), start_pointer, copy_remainder*int_size); //Use memcpy with remainder of values from start of ring buffer
            copy_pointer = start_pointer + copy_remainder; //Move pointer
        }
        /*Call a function*/
        Foo(array, copy_amount);
    }
    /*Check run time*/
    run_time_ticks = chrono::high_resolution_clock::now() - start_time;
    run_time = run_time_ticks.count();
    /*Print result*/
    cout << endl << run_time << endl;

    /*Release the buffers*/
    free(array);
    free(ring_buffer);
}
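The ring buffer holds a continuously updated stream of audio data, so any added latency must be kept to a minimum, which is why I am trying to improve this.
I think copying the values as in WORKING METHOD is redundant, and it should be possible to pass the original ring buffer data directly. My naive way of doing that was to write from the original data, and to write a second time whenever the data wraps around (see NAIVE METHOD).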
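To make the idea of passing the ring buffer data directly more concrete, here is a minimal sketch that hands a callback one or two contiguous segments instead of copying. The name `for_each_segment`, its index-based parameters, and the callback shape are assumptions for illustration and do not appear in the code above. It also assumes the requested count never exceeds the buffer capacity, so at most one wrap occurs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original post): visits the `count`
// elements that start at index `start` of a ring buffer holding `capacity`
// elements, as one or two contiguous segments, without copying anything.
// Assumes start < capacity and count <= capacity (at most one wrap).
template <typename T, typename Visitor>
std::size_t for_each_segment(T* ring, std::size_t capacity,
                             std::size_t start, std::size_t count,
                             Visitor visit) {
    assert(start < capacity && count <= capacity);
    const std::size_t until_end = capacity - start;  // elements before the wrap
    if (count <= until_end) {
        visit(ring + start, count);                  // no wrap: a single call
    } else {
        visit(ring + start, until_end);              // tail of the buffer
        visit(ring, count - until_end);              // wrapped-around head
    }
    return (start + count) % capacity;               // index of the next read
}
```

A call such as `next = for_each_segment(ring_buffer, 32, next, 11, callback)` would invoke the callback twice only on the iterations where the window crosses the end of the buffer. Working with indices also avoids computing a pointer beyond the end of the allocation.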
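Indeed, in this minimal example, this improvement is orders of magnitude faster. However, in my actual application, Foo is replaced by a function that writes to a hardware buffer and carries a large overhead. The end result is slower than the WORKING METHOD code, which means I should never call it (or Foo, in this case) more than once per write of audio data. (EDIT: a simulated overhead was added to Foo to describe this problem accurately.)
So my question is: is there a faster way to copy data from a ring buffer into a single contiguous array?
(Also, the ring buffer never needs to wrap around more than once per write: copy_amount is always smaller than ring_buffer_elements.)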
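Because at most one wrap can happen per write, the copy into a contiguous array always reduces to at most two bulk copies. The sketch below shows that shape with `std::copy_n`. The function name `copy_from_ring` and its index-based interface are assumptions for illustration, not part of the code above. For trivially copyable types such as `int`, standard library implementations commonly turn `std::copy_n` into a block move, so this is expected to perform much like the memcpy method rather than to beat it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original post): copies `count` elements,
// starting at index `start` of a ring buffer with `capacity` elements, into
// the contiguous array `out`, using at most two bulk copies.
// Assumes start < capacity and count <= capacity (at most one wrap).
template <typename T>
std::size_t copy_from_ring(const T* ring, std::size_t capacity,
                           std::size_t start, std::size_t count, T* out) {
    assert(start < capacity && count <= capacity);
    // Elements available before the end of the buffer, capped at `count`.
    const std::size_t first = std::min(count, capacity - start);
    std::copy_n(ring + start, first, out);          // part up to the end
    std::copy_n(ring, count - first, out + first);  // wrapped part (may be empty)
    return (start + count) % capacity;              // index of the next read
}
```

With the values from the example above, each loop iteration would become `next = copy_from_ring(ring_buffer, 32, next, 11, array)` followed by a single call to `Foo(array, 11)`.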
Thanks!
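EDIT: Replaced the original code snippet with a minimal example, as suggested by Passer By.
EDIT 2: Added the simulated overhead and memcpy, as suggested by duong_dajgja. In the example, the memcpy method and the working method perform essentially the same (the latter has a slight edge). In my application, memcpy is 3-4% faster than the working method when using the smallest possible buffer. So it is faster, but sadly far from significant.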
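【Comments】:
- You need a minimal reproducible example. The extra material about snd_pcm_writei has little to do with your question.
- Why not just use memcopy?
- Thanks for the comment, added a working example @PasserBy
- Thanks for the suggestion @duong_dajgja! I added a test and a remark about memcpy
- Not sure about your situation, but if the reformatting really takes time, how about parallelizing the work (e.g. multithreading)?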
Tags: c++ arrays pointers buffer circular-buffer