[Title]: SFSpeechRecognizer that recognizes a few command words instead of a whole phrase?
[Posted]: 2016-11-04 08:47:25
[Description]:

I set up SFSpeechRecognizer following Apple's sample app: https://developer.apple.com/library/content/samplecode/SpeakToMe/Introduction/Intro.html

I'd like to know whether the recognizer can be made to recognize individual words independently of the words it recognized before them.

For example, right now the recognizer tries to form a sentence: after "scroll" is said, it looks for the best transcription of words that make sense together, so when "stop" is said it may change it to something like "down", which fits better in the context of the previous word.

That's not what I want, though, because I want my app to listen for individual words as commands that trigger functions.

Is there any way to use the framework so that it listens continuously and captures only the individual words that are spoken?

[Comments]:

Tags: ios swift speech-to-text sfspeechrecognizer


[Answer 1]:

Yes. By setting recognitionRequest.shouldReportPartialResults = YES, the result callback is invoked multiple times, so you can scan the incoming words in the partial results.

You can then process each result as it arrives, scanning for your keywords/key phrases before the final result is produced (i.e. ignore result.isFinal). When you find the keyword/key phrase you are looking for, cancel the recognition.
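As a rough sketch of that keyword scan inside the result handler (the `commands` list and the `handleCommand:` method are hypothetical names, not part of the plugin below):

```objectivec
// Inside the recognitionTaskWithRequest:resultHandler: block, on each
// partial result. Assumes a hypothetical command list and dispatch method.
NSArray<NSString *> *commands = @[@"scroll", @"stop", @"up", @"down"];
NSString *transcript = result.bestTranscription.formattedString.lowercaseString;
for (NSString *command in commands) {
        // Match the command as a whole word in the partial transcript.
        NSString *pattern = [NSString stringWithFormat:@"\\b%@\\b", command];
        NSRange range = [transcript rangeOfString:pattern
                                          options:NSRegularExpressionSearch];
        if (range.location != NSNotFound) {
                [self handleCommand:command];   // hypothetical dispatch method
                [self.recognitionTask cancel];  // stop before the recognizer
                                                // "corrects" the word in context
                break;
        }
}
```

Cancelling as soon as a command word appears is what prevents the recognizer from later revising the word to fit a sentence; you then restart recognition to listen for the next command.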

I have successfully used this approach to implement voice commands in Speaking Email, as a modified Cordova plugin (source here).

Example:

- (void) recordAndRecognizeWithLang:(NSString *) lang
{
        NSLocale *locale = [[NSLocale alloc] initWithLocaleIdentifier:lang];
        self.sfSpeechRecognizer = [[SFSpeechRecognizer alloc] initWithLocale:locale];
        if (!self.sfSpeechRecognizer) {
                [self sendErrorWithMessage:@"The language is not supported" andCode:7];
        } else {

                // Cancel the previous task if it's running.
                if ( self.recognitionTask ) {
                        [self.recognitionTask cancel];
                        self.recognitionTask = nil;
                }

                [self initAudioSession];

                self.recognitionRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
                self.recognitionRequest.shouldReportPartialResults = [[self.command argumentAtIndex:1] boolValue];

                self.recognitionTask = [self.sfSpeechRecognizer recognitionTaskWithRequest:self.recognitionRequest resultHandler:^(SFSpeechRecognitionResult *result, NSError *error) {

                        if (error) {
                                NSLog(@"recognition error: %@", error);
                                [self stopAndRelease];
                                [self sendErrorWithMessage:error.localizedFailureReason andCode:error.code];
                        }

                        if (result) {
                                NSMutableArray * alternatives = [[NSMutableArray alloc] init];
                                int maxAlternatives = [[self.command argumentAtIndex:2] intValue];
                                for ( SFTranscription *transcription in result.transcriptions ) {
                                        if (alternatives.count < maxAlternatives) {
                                                float confMed = 0;
                                                for ( SFTranscriptionSegment *transcriptionSegment in transcription.segments ) {
                                                        NSLog(@"transcriptionSegment.confidence %f", transcriptionSegment.confidence);
                                                        confMed +=transcriptionSegment.confidence;
                                                }
                                                NSMutableDictionary * resultDict = [[NSMutableDictionary alloc]init];
                                                [resultDict setValue:transcription.formattedString forKey:@"transcript"];
                                                [resultDict setValue:[NSNumber numberWithBool:result.isFinal] forKey:@"final"];
                                                [resultDict setValue:[NSNumber numberWithFloat:confMed/transcription.segments.count]forKey:@"confidence"];
                                                [alternatives addObject:resultDict];
                                        }
                                }
                                [self sendResults:@[alternatives]];
                                if ( result.isFinal ) {
                                        [self stopAndRelease];
                                }
                        }
                }];

                AVAudioFormat *recordingFormat = [self.audioEngine.inputNode outputFormatForBus:0];

                [self.audioEngine.inputNode installTapOnBus:0 bufferSize:1024 format:recordingFormat block:^(AVAudioPCMBuffer * _Nonnull buffer, AVAudioTime * _Nonnull when) {
                        [self.recognitionRequest appendAudioPCMBuffer:buffer];
                }];

                [self.audioEngine prepare];
                [self.audioEngine startAndReturnError:nil];
        }
}

[Comments]:

  • How can I get a timestamp for each word, so it can be used as subtitles for a video? Please advise.