How to calculate the timeline of an audio file after extracting MFCC features

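【Question title】: How to calculate the timeline of an audio file after extracting MFCC features
【Posted】: 2020-10-11 03:28:03
【Question description】:

After extracting MFCC features with python_speech_features, how do I calculate the timeline of the audio file?

The idea is to get a timeline for the MFCC frames, i.e. the time in seconds that each MFCC feature vector corresponds to.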

import librosa
import numpy as np  # needed for np.arange below
import python_speech_features

audio_file = r'sample.wav'

# load the file as mono, resampled to 16 kHz
samples, sample_rate = librosa.core.load(audio_file, sr=16000, mono=True)

# timestamp (in seconds) of every audio sample in sample.wav
timeline = np.arange(0, len(samples)) / sample_rate

print(timeline)

mfcc_feat = python_speech_features.mfcc(samples, sample_rate)

【Question discussion】:

Tags: python audio audio-processing librosa mfcc

【Solution 1】:

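python_speech_features.mfcc(...) accepts several additional parameters. One of them is winstep, which specifies the time between successive feature frames, i.e. between MFCC feature vectors. The default value is 0.01 s = 10 ms. In other contexts, e.g. librosa, this is called hop_length and is specified in samples rather than seconds.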
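The relation between the two conventions is just a multiplication by the sample rate. A minimal sketch, where the helper names winstep_to_hop_length and hop_length_to_winstep are hypothetical and not part of either library:

```python
def winstep_to_hop_length(winstep: float, sample_rate: int) -> int:
    """Convert a frame step in seconds to a hop length in samples."""
    return int(round(winstep * sample_rate))


def hop_length_to_winstep(hop_length: int, sample_rate: int) -> float:
    """Convert a hop length in samples to a frame step in seconds."""
    return hop_length / sample_rate


# At the 16 kHz used in the question, a 10 ms step is 160 samples.
hop = winstep_to_hop_length(0.01, 16000)
print(hop)
```

So a 10 ms winstep at 16 kHz should correspond to a hop of 160 samples in a samples-based API.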

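To find your timeline, you need the number of feature frames and the feature rate. With winstep=0.01 you get 100 feature frames per second, i.e. a feature (or frame) rate of 100 Hz. The number of frames you have is len(mfcc_feat).

So you end up with: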

import librosa
import python_speech_features
import numpy as np

audio_file = r'sample.wav'

samples, sample_rate = librosa.core.load(audio_file, sr=16000, mono=True)

timeline = np.arange(0, len(samples))/sample_rate # timestamp (in seconds) of every audio sample in sample.wav

print(timeline)

winstep = 0.01  # happens to be the default value
mfcc_feat = python_speech_features.mfcc(samples, sample_rate, winstep=winstep)

frame_rate = 1./winstep

timeline_mfcc = np.arange(0, len(mfcc_feat))/frame_rate
print(timeline_mfcc)
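As a rough sanity check, the number of frames can be estimated from the signal length before running the extraction. The sketch below assumes a framing rule of 1 + ceil((n_samples - frame_len) / frame_step), with a window length winlen of 0.025 s (the python_speech_features default); both the rule and the helper name estimate_num_frames are assumptions made for illustration, so compare the result against len(mfcc_feat) rather than relying on it.

```python
import math


def estimate_num_frames(n_samples: int, sample_rate: int,
                        winlen: float = 0.025, winstep: float = 0.01) -> int:
    """Estimate the frame count for a signal (assumed framing rule, see text)."""
    frame_len = int(round(winlen * sample_rate))    # window length in samples
    frame_step = int(round(winstep * sample_rate))  # step between frames in samples
    if n_samples <= frame_len:
        return 1
    return 1 + math.ceil((n_samples - frame_len) / frame_step)


# One second of audio at 16 kHz: roughly 100 frames at a 100 Hz frame rate.
n_frames = estimate_num_frames(16000, 16000)
print(n_frames, "frames, last frame starts at", (n_frames - 1) * 0.01, "s")
```

If the estimate and len(mfcc_feat) differ by more than a frame or two, the winstep passed to mfcc(...) and the one used for the timeline are probably not the same.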

Since each "frame" represents a duration of 0.01 s, you may want to shift the timestamps to the center of the frame, i.e. by 0.005 s.

【Discussion】:
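The shift can be sketched in plain Python as follows. The helper frame_times is hypothetical, and num_frames is a stand-in for len(mfcc_feat). Note that the analysis window itself is winlen long (0.025 s by default in python_speech_features), so if you would rather label each frame by the center of its window, an offset of winlen / 2 may be the better choice; that variant is an assumption on my part, not something stated in the answer.

```python
def frame_times(num_frames: int, winstep: float = 0.01, offset: float = 0.0) -> list:
    """Return one timestamp (in seconds) per feature frame."""
    return [i * winstep + offset for i in range(num_frames)]


winstep = 0.01
num_frames = 5  # stand-in for len(mfcc_feat)

starts = frame_times(num_frames, winstep)                        # start of each step
centers = frame_times(num_frames, winstep, offset=winstep / 2)   # shifted by 0.005 s

print(starts)
print(centers)
```

With NumPy the same result is np.arange(num_frames) * winstep + winstep / 2.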