Using writehtk for feature extraction (Speaker Identification): Answers

【Question title】: Using writehtk for feature extraction (Speaker Identification)
【Posted】: 2017-04-03 09:23:07
【Question】:

I am working on speaker identification and found this post on stackoverflow.com very useful.

The code runs fine, but I have one doubt about it.

The code given in the answer:

fRate = 0.010 * fs; 
....
writehtk(featureFilename, mfc', 100000, 9);

The function writehtk from Voicebox:

function writehtk(file,d,fp,tc)
%WRITEHTK write data in HTK format []=(FILE,D,FP,TC)
%
% Inputs:
%    FILE = name of file to write (no default extension)
%       D = data to write: one row per frame
%      FP = frame period in seconds
%      TC = type code = the sum of a data type and (optionally) one or more of the listed modifiers

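The writehtk function expects the frame period in seconds, but the value passed in the code (100000) is clearly in some other unit.

Can anyone explain how this value was obtained?

【Question comments】:

Tags: matlab signal-processing

【Answer 1】:

The post you linked to is a bit confusing.

The use of fRate as an argument to melcepst indicates that the author intended fRate to represent the 10 ms interval between frames, converted to a number of samples (rather than a frame rate). This is also consistent with the author passing 100000 as the FP argument, if that argument were in units of 100 ns: 10 ms is 100000 units of 100 ns. That is a mistaken attempt at performing a conversion that the writehtk function already does internally, since FP is documented as being in seconds.

Personally, I would rename the variable fRate to fInterval to avoid confusion between a rate (usually in Hz) and a time interval (usually given either in seconds or, when a sampling rate is specified, in number of samples):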

fInterval = 0.010 * fs; % in samples
...
mfc = melcepst(s, fs, '0dD', nCeps, nChan, fSize, fInterval, fL, fH);

The frame period in seconds is then simply 0.01 or, in terms of the previously defined variables, fInterval/fs:

writehtk(featureFilename, mfc', fInterval/fs, 9);
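The unit arithmetic behind the two calls can be checked with a few lines of plain MATLAB. This is only a sketch: the sampling rate of 16000 Hz is an assumed example value, and the factor of 1e7 reflects the assumption that writehtk converts seconds to the 100 ns units used in the HTK header, as described above.

```matlab
% Sketch of the unit arithmetic; fs = 16000 Hz is an assumed example value.
fs = 16000;                    % sampling rate in Hz (assumption)
fInterval = 0.010 * fs;        % frame interval in samples
fp = fInterval / fs;           % frame period in seconds, as FP expects

% Assumed internal conversion: seconds to 100 ns units for the HTK header.
htkUnits = round(fp * 1e7);    % the 10 ms period expressed in 100 ns units

% Passing 100000 as FP means "100000 seconds"; after the same scaling the
% value no longer fits in a signed 32-bit header field.
wrongUnits = 100000 * 1e7;
fitsInInt32 = wrongUnits <= double(intmax('int32'));

fprintf('fInterval = %g samples, fp = %g s, header value = %d\n', ...
        fInterval, fp, htkUnits);
fprintf('100000 s scaled to 100 ns units fits in int32: %d\n', fitsInInt32);
```

In other words, 100000 is what the 10 ms frame period looks like once it has already been converted to 100 ns units, which is why it should not be passed where seconds are expected.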

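【Comments】:

Thanks SleuthEye! If the author's code had been used as-is to write the htk file (with the incorrect conversion attempt), would the htk file still be generated correctly?
Mostly. The frame period (FP) parameter is only written in the header near the beginning of the htk file and has no effect on the saved coefficients.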
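To see where the frame period sits in a file, the header can be read back directly. The sketch below does not call writehtk. It writes a minimal 12-byte header by hand, following the standard HTK layout (big-endian: 32-bit frame count, 32-bit sample period in 100 ns units, 16-bit bytes per frame, 16-bit parameter kind), then reads the period back. The temporary file name and the frame and coefficient counts are hypothetical values chosen for illustration.

```matlab
% Sketch: write a minimal HTK-style header by hand, then read the period back.
fname = tempname();                 % hypothetical scratch file
nFrames = 3;                        % illustrative number of frames
nCoef = 4;                          % illustrative coefficients per frame
periodUnits = 100000;               % 10 ms expressed in 100 ns units

fid = fopen(fname, 'w', 'ieee-be'); % HTK files are big-endian by default
fwrite(fid, nFrames, 'int32');      % number of frames
fwrite(fid, periodUnits, 'int32');  % sample period in 100 ns units
fwrite(fid, 4 * nCoef, 'int16');    % bytes per frame (4-byte floats)
fwrite(fid, 9, 'int16');            % type code 9, as in the question
fwrite(fid, zeros(nCoef, nFrames), 'float32');  % placeholder coefficients
fclose(fid);

fid = fopen(fname, 'r', 'ieee-be');
hdr = fread(fid, 2, 'int32');       % [frame count; sample period]
fclose(fid);
delete(fname);

fpSeconds = hdr(2) * 1e-7;          % convert 100 ns units back to seconds
fprintf('frames = %d, frame period = %g s\n', hdr(1), fpSeconds);
```

Because the period occupies only that one header field, a wrong FP value leaves the coefficient data untouched. It matters only to tools that rely on the header for timing.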