How to record the microphone until there is no sound? Answers

【Question title】: How to record the microphone until there is no sound?
【Posted】: 2014-03-14 09:35:05
【Question description】:

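I created 2 functions: one that records from the microphone, and one that plays back the recording.

Currently it records the microphone for 3 seconds: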

#include <iostream>
#include <Windows.h>
#include <vector>
using namespace std;

#pragma comment(lib, "winmm.lib")

 short int waveIn[44100 * 3];

void PlayRecord();

void StartRecord()
{
const int NUMPTS = 44100 * 3;   // 3 seconds
int sampleRate = 44100;  
// 'short int' is a 16-bit type; I request 16-bit samples below.
// For 8-bit capture, you'd use 'unsigned char' or 'BYTE' 8-bit types.

 HWAVEIN      hWaveIn;
 MMRESULT result;

 WAVEFORMATEX pFormat;
 pFormat.wFormatTag=WAVE_FORMAT_PCM;     // simple, uncompressed format
 pFormat.nChannels=1;                    //  1=mono, 2=stereo
 pFormat.nSamplesPerSec=sampleRate;      // 44100
 pFormat.nAvgBytesPerSec=sampleRate*2;   // = nSamplesPerSec * nChannels * wBitsPerSample/8
 pFormat.nBlockAlign=2;                  // = nChannels * wBitsPerSample/8
 pFormat.wBitsPerSample=16;              //  16 for high quality, 8 for telephone-grade
 pFormat.cbSize=0;

 // Specify recording parameters

 result = waveInOpen(&hWaveIn, WAVE_MAPPER,&pFormat,
        0L, 0L, WAVE_FORMAT_DIRECT);

  WAVEHDR      WaveInHdr;
 // Set up and prepare header for input
  WaveInHdr.lpData = (LPSTR)waveIn;
  WaveInHdr.dwBufferLength = NUMPTS*2;
  WaveInHdr.dwBytesRecorded=0;
  WaveInHdr.dwUser = 0L;
  WaveInHdr.dwFlags = 0L;
  WaveInHdr.dwLoops = 0L;
  waveInPrepareHeader(hWaveIn, &WaveInHdr, sizeof(WAVEHDR));

 // Insert a wave input buffer
  result = waveInAddBuffer(hWaveIn, &WaveInHdr, sizeof(WAVEHDR));


 // Commence sampling input
  result = waveInStart(hWaveIn);


 cout << "recording..." << endl;

  Sleep(3 * 1000);
 // Wait until finished recording

 waveInClose(hWaveIn);

 PlayRecord();
}

void PlayRecord()
{
const int NUMPTS = 44100 * 3;   // 3 seconds
int sampleRate = 44100;  
// 'short int' is a 16-bit type; I request 16-bit samples below.
// For 8-bit capture, you'd use 'unsigned char' or 'BYTE' 8-bit types.

HWAVEIN  hWaveIn;

WAVEFORMATEX pFormat;
pFormat.wFormatTag=WAVE_FORMAT_PCM;     // simple, uncompressed format
pFormat.nChannels=1;                    //  1=mono, 2=stereo
pFormat.nSamplesPerSec=sampleRate;      // 44100
pFormat.nAvgBytesPerSec=sampleRate*2;   // = nSamplesPerSec * nChannels * wBitsPerSample/8
pFormat.nBlockAlign=2;                  // = nChannels * wBitsPerSample/8
pFormat.wBitsPerSample=16;              //  16 for high quality, 8 for telephone-grade
pFormat.cbSize=0;

// Specify recording parameters

waveInOpen(&hWaveIn, WAVE_MAPPER,&pFormat, 0L, 0L, WAVE_FORMAT_DIRECT);

WAVEHDR      WaveInHdr;
// Set up and prepare header for input
WaveInHdr.lpData = (LPSTR)waveIn;
WaveInHdr.dwBufferLength = NUMPTS*2;
WaveInHdr.dwBytesRecorded=0;
WaveInHdr.dwUser = 0L;
WaveInHdr.dwFlags = 0L;
WaveInHdr.dwLoops = 0L;
waveInPrepareHeader(hWaveIn, &WaveInHdr, sizeof(WAVEHDR));

HWAVEOUT hWaveOut;
cout << "playing..." << endl;
waveOutOpen(&hWaveOut, WAVE_MAPPER, &pFormat, 0, 0, WAVE_FORMAT_DIRECT);
waveOutWrite(hWaveOut, &WaveInHdr, sizeof(WaveInHdr)); // Playing the data
Sleep(3 * 1000); //Sleep for as long as there was recorded

waveInClose(hWaveIn);
waveOutClose(hWaveOut);
}

int main()
{
 StartRecord();
    return 0;
}

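How can I change my StartRecord function (and, I guess, my PlayRecord function as well) so that it keeps recording until there is no input from the microphone?

(So far both functions work fine: they record the microphone for 3 seconds and then play the recording back.)

Thanks!

Edit: by "no sound" I mean that the volume is very low or something like that (meaning the person is probably not speaking).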

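【Comments】:

There is no such thing as "no sound". Rather, the sound level drops below some threshold for a period of time.
That's what I meant. So how do I check the volume?
Possible duplicate of "Trying to write functions that record and play sound through WinAPI", by the same poster.
@KenWhite It is not, because in that thread I was trying to split one function into two, and here I'm trying to change it...
"Start recording, I say a word into the microphone, and when there is silence for a second or two it stops recording, or something?" is not the same as "make it record until there is no microphone input?"

Tags: c++ winapi voip voice-recording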

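【Solution 1】:

I suggest you do it via DirectShow. You should create instances of the microphone, SampleGrabber, audio encoder, and file writer. Your graph should look like this:

Microphone -> SampleGrabber -> Audio Encoder -> File Writer

Every sample passes through the SampleGrabber, so you can read all the raw samples and check whether recording should continue. This is the best way to both record and inspect the content.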

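【Discussion】:

Writing to a file, then opening the file to get the audio and play it on the computer, is much slower, especially if I want to use it for VOIP...
It's all your choice. You can render it through the default DirectSound device, render it to a Null Renderer, write it to a file, and so on. It's entirely up to you. I suggested writing to a file because your code does so.
My code doesn't write to a file; the play function takes the buffer and plays it, which makes it faster...
Sorry Amit, I misread. Well, you can still render it: Microphone -> SampleGrabber -> Default DirectSound Device. You can pause/start/stop the graph as needed.
Thanks! But is there an option in DirectShow to record and play through a buffer, as with the WinAPI wave functions?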

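【Solution 2】:

Because sound is a wave, it oscillates between high and low pressure. The waveform is usually recorded as positive and negative numbers, with zero being neutral pressure. Taking the absolute value of the signal and keeping a moving average of it should be sufficient.

The average should be taken over a period long enough to account for an appropriate amount of silence. A very cheap way to maintain an estimate of the running average is something like this: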

const double threshold = 50;    // Whatever threshold you need
const int max_samples = 10000;  // The representative running average size

double average = 0;             // The running average
int sample_count = 0;           // When we are building the average

while( sample_count < max_samples || average > threshold ) {
    // New sample arrives, stored in 'sample'

    // Adjust the running absolute average
    if( sample_count < max_samples ) sample_count++;
    average *= double(sample_count-1) / sample_count;
    average += std::abs(double(sample)) / sample_count;  // cast avoids integer division for integer samples
}

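The larger max_samples is, the more slowly average responds to the signal. After the sound stops, it trails off slowly. However, it is also slow to rise again. This is fine for reasonably continuous sound.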
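As a minimal, self-contained sketch of the update rule above, the running average can be wrapped in a small class. The class name RunningAbsAverage and its members are hypothetical (they are not part of the answer or of any library), and the sketch assumes 16-bit signed PCM samples, as in the question's format.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: running average of |sample| using the update rule above.
// While fewer than `window` samples have been seen it is an exact mean;
// after that, each new sample carries a weight of 1/window.
class RunningAbsAverage {
public:
    explicit RunningAbsAverage(int window) : window_(window) {
        assert(window > 0);
    }

    // Feed one 16-bit PCM sample and return the updated average.
    double push(std::int16_t sample) {
        if (count_ < window_) ++count_;
        average_ *= double(count_ - 1) / count_;
        // Convert to double before dividing so the division is not integral.
        average_ += std::abs(double(sample)) / count_;
        return average_;
    }

    double value() const { return average_; }

    // True once a full window of samples has been seen.
    bool warmedUp() const { return count_ >= window_; }

private:
    int window_;
    int count_ = 0;
    double average_ = 0.0;
};
```

With a window of 4, feeding 100, -100, 100, -100 gives an average of 100, and one further sample of 0 pulls it down to 75, which illustrates how a larger window makes the average react more slowly.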

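For something like speech, which may have short or long pauses, you might want to use an impulse-based approach. You can just define the number of "silent" samples you expect, and reset the count whenever you receive an impulse that exceeds the threshold. Using the running average above with a much shorter window size gives you a simple way to detect an impulse. Then you just need to count...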

const int max_samples = 100;             // Smaller window size for impulse
const int max_silence_samples = 10000;   // Maximum samples below threshold
int silence = 0;                         // Number of samples below threshold

while( silence < max_silence_samples ) {
    // Compute running average as before

    //...

    // Check for silence.  If there's a signal, reset the counter.
    if( average > threshold ) silence = 0;
    else ++silence;
}

Adjusting threshold and max_samples controls the sensitivity to pops and clicks, while max_silence_samples lets you control how much silence is allowed before recording stops.
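The two snippets above can be combined into one chunk-oriented sketch. This is only an illustration under stated assumptions: the class name SilenceDetector and its method names are hypothetical, the samples are assumed to be 16-bit signed mono PCM, and the threshold and window values are placeholders that would need tuning for a real microphone.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: keeps a short running average of |sample| and counts
// consecutive samples for which that average is at or below the threshold.
class SilenceDetector {
public:
    SilenceDetector(double threshold, int window, int maxSilenceSamples)
        : threshold_(threshold), window_(window), maxSilence_(maxSilenceSamples) {
        assert(window > 0 && maxSilenceSamples > 0);
    }

    // Feed a chunk of 16-bit PCM samples.
    // Returns true once the silence counter has reached maxSilenceSamples.
    bool feed(const std::int16_t* samples, std::size_t n) {
        for (std::size_t i = 0; i < n && !done(); ++i) {
            // Running average, as in the first snippet.
            if (count_ < window_) ++count_;
            average_ *= double(count_ - 1) / count_;
            average_ += std::abs(double(samples[i])) / count_;

            // Silence counter, as in the second snippet.
            if (average_ > threshold_) silence_ = 0;  // signal present: reset
            else ++silence_;
        }
        return done();
    }

    bool done() const { return silence_ >= maxSilence_; }

private:
    double threshold_;
    int window_;
    int maxSilence_;
    int count_ = 0;
    int silence_ = 0;
    double average_ = 0.0;
};
```

One possible way to wire this into the question's code, offered as an assumption rather than a tested recipe, is to record into several short buffers instead of one 3-second buffer and pass each completed WAVEHDR to feed() using its lpData pointer and dwBytesRecorded / 2 as the sample count, stopping once feed() returns true. At 44100 Hz, a max_silence_samples of 44100 would correspond to about one second of silence.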

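There are undoubtedly more technical ways to achieve your goal, but it is always good to try the simple approach first. See how you go with this.

【Discussion】: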