[Question title]: Continuous speech recognition with SFSpeechRecognizer (iOS 10 beta)
[Posted]: 2016-10-15 18:22:00
[Question]:

I am trying to do continuous speech recognition on the iOS 10 beta using AVCapture. I have set up captureOutput(...) so that CMSampleBuffers keep coming in, and I feed those buffers directly into the SFSpeechAudioBufferRecognitionRequest I set up earlier:

// ... do some setup
  SFSpeechRecognizer.requestAuthorization { authStatus in
    if authStatus == SFSpeechRecognizerAuthorizationStatus.authorized {
      self.m_recognizer = SFSpeechRecognizer()
      self.m_recognRequest = SFSpeechAudioBufferRecognitionRequest()
      self.m_recognRequest?.shouldReportPartialResults = false
      self.m_isRecording = true
    } else {
      print("not authorized")
    }
  }
// ... do further setup


func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {

if(!m_AV_initialized) {
  print("captureOutput(...): not initialized !")
  return
}
if(!m_isRecording) {
  return
}

let formatDesc = CMSampleBufferGetFormatDescription(sampleBuffer)
let mediaType = CMFormatDescriptionGetMediaType(formatDesc!)
if (mediaType == kCMMediaType_Audio) {
  // process audio here
  m_recognRequest?.appendAudioSampleBuffer(sampleBuffer)
}
return
}

The whole thing works for only a few seconds; after that, captureOutput is no longer called. If I comment out the appendAudioSampleBuffer(sampleBuffer) line, captureOutput keeps being called for as long as the app runs (as expected). Apparently, feeding the sample buffers into the speech recognition engine somehow blocks further execution. My guess is that the available buffers get consumed after a while and the process stalls because it can't obtain any more buffers???

I should mention that everything recorded within the first 2 seconds is recognized correctly. I just don't know exactly how the SFSpeech API works, since Apple put no text into the beta docs. BTW: how is SFSpeechAudioBufferRecognitionRequest.endAudio() meant to be used?
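
(For reference, my current rough understanding of endAudio(), written as a minimal sketch around my existing variables with a hypothetical stopRecording() helper; this may well be wrong:)

func stopRecording() {
  m_isRecording = false        // captureOutput(...) stops appending buffers
  m_recognRequest?.endAudio()  // signal that no more audio is coming, so the
                               // recognizer can deliver its final result
}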

Does anyone know anything about this?

Thanks, Chris

[Question comments]:

Tags: ios swift beta ios10


[Solution 1]:

I converted the SpeakToMe sample Swift code from the speech recognition WWDC developer talk to Objective-C, and it works for me. For the Swift version see https://developer.apple.com/videos/play/wwdc2016/509/, and for Objective-C see below.

- (void) viewDidAppear:(BOOL)animated {

_recognizer = [[SFSpeechRecognizer alloc] initWithLocale:[NSLocale localeWithLocaleIdentifier:@"en-US"]];
[_recognizer setDelegate:self];
[SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus authStatus) {
    switch (authStatus) {
        case SFSpeechRecognizerAuthorizationStatusAuthorized:
            //User gave access to speech recognition
            NSLog(@"Authorized");
            break;

        case SFSpeechRecognizerAuthorizationStatusDenied:
            //User denied access to speech recognition
            NSLog(@"SFSpeechRecognizerAuthorizationStatusDenied");
            break;

        case SFSpeechRecognizerAuthorizationStatusRestricted:
            //Speech recognition restricted on this device
            NSLog(@"SFSpeechRecognizerAuthorizationStatusRestricted");
            break;

        case SFSpeechRecognizerAuthorizationStatusNotDetermined:
            //Speech recognition not yet authorized

            break;

        default:
            NSLog(@"Default");
            break;
    }
}];

audioEngine = [[AVAudioEngine alloc] init];
_speechSynthesizer  = [[AVSpeechSynthesizer alloc] init];         
[_speechSynthesizer setDelegate:self];
}


-(void)startRecording
{
[self clearLogs:nil];

NSError * outError;

AVAudioSession *audioSession = [AVAudioSession sharedInstance];
[audioSession setCategory:AVAudioSessionCategoryRecord error:&outError];
[audioSession setMode:AVAudioSessionModeMeasurement error:&outError];
[audioSession setActive:true withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation  error:&outError];

request2 = [[SFSpeechAudioBufferRecognitionRequest alloc] init];

inputNode = [audioEngine inputNode];

if (request2 == nil) {
    NSLog(@"Unable to created a SFSpeechAudioBufferRecognitionRequest object");
}

if (inputNode == nil) {

    NSLog(@"Unable to created a inputNode object");
}

request2.shouldReportPartialResults = true;

_currentTask = [_recognizer recognitionTaskWithRequest:request2
                delegate:self];

[inputNode installTapOnBus:0 bufferSize:4096 format:[inputNode outputFormatForBus:0] block:^(AVAudioPCMBuffer *buffer, AVAudioTime *when){
    NSLog(@"Block tap!");

    [request2 appendAudioPCMBuffer:buffer];

}];

    [audioEngine prepare];
    [audioEngine startAndReturnError:&outError];
    NSLog(@"Error %@", outError);
}

- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition:(SFSpeechRecognitionResult *)result {

NSLog(@"speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition");
NSString * translatedString = [[[result bestTranscription] formattedString] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

[self log:translatedString];

if ([result isFinal]) {
    [audioEngine stop];
    [inputNode removeTapOnBus:0];
    _currentTask = nil;
    request2 = nil;
}
}

[Comments]:

  • The question is tagged "swift". Why are you posting Objective-C code translated from Swift on a Swift question?!
  • Because whoever looks at this question may have exactly the same problem in Objective-C, and a completely separate question would be redundant.
  • Great answer, although you left out the part where you speak the text: // Ask for all recognition results, including non-final hypotheses - (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didHypothesizeTranscription:(SFTranscription *)transcription { NSString *translatedString = [transcription formattedString]; NSLog(@"%@", translatedString); [self.speechSynthesizer speakUtterance:[AVSpeechUtterance speechUtteranceWithString:translatedString]]; }
  • This generates the error: AVAudioEngineGraph required condition is false: NULL != tap
[Solution 2]:

I have managed to use SFSpeechRecognizer continuously. The key point is to use an AVCaptureSession to capture the audio and feed it to the SpeechRecognizer. Sorry, my Swift is poor, so this is an ObjC-only version.

Here is my sample code (some UI code is omitted; the important parts are marked):

@interface ViewController ()<AVCaptureAudioDataOutputSampleBufferDelegate,SFSpeechRecognitionTaskDelegate>
@property (nonatomic, strong) AVCaptureSession *capture;
@property (nonatomic, strong) SFSpeechAudioBufferRecognitionRequest *speechRequest;
@end

@implementation ViewController
- (void)startRecognizer
{
    [SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {
        if (status == SFSpeechRecognizerAuthorizationStatusAuthorized){
            NSLocale *local =[[NSLocale alloc] initWithLocaleIdentifier:@"fr_FR"];
            SFSpeechRecognizer *sf =[[SFSpeechRecognizer alloc] initWithLocale:local];
            self.speechRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
            [sf recognitionTaskWithRequest:self.speechRequest delegate:self];
            // should call startCapture method in main queue or it may crash
            dispatch_async(dispatch_get_main_queue(), ^{
                [self startCapture];
            });
        }
    }];
}

- (void)endRecognizer
{
    // END capture and END voice Reco
    // or Apple will terminate this task after 30000ms.
    [self endCapture];
    [self.speechRequest endAudio];
}

- (void)startCapture
{
    NSError *error;
    self.capture = [[AVCaptureSession alloc] init];
    AVCaptureDevice *audioDev = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeAudio];
    if (audioDev == nil){
        NSLog(@"Couldn't create audio capture device");
        return ;
    }

    // create mic device
    AVCaptureDeviceInput *audioIn = [AVCaptureDeviceInput deviceInputWithDevice:audioDev error:&error];
    if (error != nil){
        NSLog(@"Couldn't create audio input");
        return ;
    }

    // add mic device in capture object
    if ([self.capture canAddInput:audioIn] == NO){
        NSLog(@"Couldn't add audio input");
        return ;
    }
    [self.capture addInput:audioIn];
    // export audio data
    AVCaptureAudioDataOutput *audioOutput = [[AVCaptureAudioDataOutput alloc] init];
    [audioOutput setSampleBufferDelegate:self queue:dispatch_get_main_queue()];
    if ([self.capture canAddOutput:audioOutput] == NO){
        NSLog(@"Couldn't add audio output");
        return ;
    }
    [self.capture addOutput:audioOutput];
    [audioOutput connectionWithMediaType:AVMediaTypeAudio];
    [self.capture startRunning];
}

-(void)endCapture
{
    if (self.capture != nil && [self.capture isRunning]){
        [self.capture stopRunning];
    }
}

- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    [self.speechRequest appendAudioSampleBuffer:sampleBuffer];
}
// some Recognition Delegate
@end

[Comments]:

  • The delegate method is not called in my case.. here is the code: - (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition:(SFSpeechRecognitionResult *)result { NSLog(@"speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition"); NSString *translatedString = [[[result bestTranscription] formattedString] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]; NSLog(@"Said: %@", translatedString); }
  • Does it keep working beyond the speech framework's 1-minute limit? Otherwise you will need to restart the recognizer once that happens to get "continuous" recognizer behaviour.
  • requestAuthorization is not called
  • Any idea whether continuous speech recognition could get an app rejected by Apple?
  • With continuous recognition, can you get past the 1-minute limit?
[Solution 3]:

Here is a Swift (3.0) implementation of @cube's answer:

import UIKit
import Speech
import AVFoundation


class ViewController: UIViewController  {
  @IBOutlet weak var console: UITextView!

  var capture: AVCaptureSession?
  var speechRequest: SFSpeechAudioBufferRecognitionRequest?
  override func viewDidLoad() {
    super.viewDidLoad()
  }
  override func viewDidAppear(_ animated: Bool) {
    super.viewDidAppear(animated)
    startRecognizer()
  }

  func startRecognizer() {
    SFSpeechRecognizer.requestAuthorization { (status) in
      switch status {
      case .authorized:
        let locale = NSLocale(localeIdentifier: "fr_FR")
        let sf = SFSpeechRecognizer(locale: locale as Locale)
        self.speechRequest = SFSpeechAudioBufferRecognitionRequest()
        sf?.recognitionTask(with: self.speechRequest!, delegate: self)
        DispatchQueue.main.async {
          // start capture on the main queue, as in the Objective-C answer
          self.startCapture()
        }
      case .denied:
        fallthrough
      case .notDetermined:
        fallthrough
      case.restricted:
        print("User Autorization Issue.")
      }
    }

  }

  func endRecognizer() {
    endCapture()
    speechRequest?.endAudio()
  }

  func startCapture() {

    capture = AVCaptureSession()

    guard let audioDev = AVCaptureDevice.defaultDevice(withMediaType: AVMediaTypeAudio) else {
      print("Could not get capture device.")
      return
    }

    guard let audioIn = try? AVCaptureDeviceInput(device: audioDev) else {
      print("Could not create input device.")
      return
    }

    guard true == capture?.canAddInput(audioIn) else {
      print("Couls not add input device")
      return
    }

    capture?.addInput(audioIn)

    let audioOut = AVCaptureAudioDataOutput()
    audioOut.setSampleBufferDelegate(self, queue: DispatchQueue.main)

    guard true == capture?.canAddOutput(audioOut) else {
      print("Could not add audio output")
      return
    }

    capture?.addOutput(audioOut)
    audioOut.connection(withMediaType: AVMediaTypeAudio)
    capture?.startRunning()


  }

  func endCapture() {

    if true == capture?.isRunning {
      capture?.stopRunning()
    }
  }
}

extension ViewController: AVCaptureAudioDataOutputSampleBufferDelegate {
  func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
    speechRequest?.appendAudioSampleBuffer(sampleBuffer)
  }

}

extension ViewController: SFSpeechRecognitionTaskDelegate {

  func speechRecognitionTask(_ task: SFSpeechRecognitionTask, didFinishRecognition recognitionResult: SFSpeechRecognitionResult) {
    console.text = console.text + "\n" + recognitionResult.bestTranscription.formattedString
  }
}

Don't forget to add a value for NSSpeechRecognitionUsageDescription to the Info.plist file, otherwise the app will crash.

[Comments]:

  • You also need the microphone usage description in Info.plist
  • You should call startCapture() inside DispatchQueue.main.async { }
  • @Carpsen90, I'm not sure. It should be easy enough to try.
[Solution 4]:

It turns out that Apple's new native speech recognition does not automatically detect end-of-speech silences (a bug?), which happens to be useful in your case, because speech recognition stays active for close to one minute (the maximum period allowed by Apple's service). So basically, if you need continuous ASR, you have to restart speech recognition whenever this delegate method of yours fires:

func speechRecognitionTask(task: SFSpeechRecognitionTask, didFinishSuccessfully successfully: Bool) // whether successfully == true or not
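
A minimal sketch of that restart, just to illustrate the idea: startNativeRecording() and stopNativeRecording() are this answer's own methods (stopNativeRecording() is referenced but not shown below), and the shouldKeepListening flag is an assumed placeholder, not part of the original code:

func speechRecognitionTask(task: SFSpeechRecognitionTask, didFinishSuccessfully successfully: Bool) {
    // The task has ended (Apple cuts it off after roughly one minute at the latest):
    // tear the current session down and start a fresh one if we still want to listen.
    stopNativeRecording()
    if shouldKeepListening {              // assumed flag
        _ = try? startNativeRecording()
    }
}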

Here is the recording / speech-recognition Swift code I use; it works fine. Ignore the part where I compute the average power of the microphone volume if you don't need it; I use it to animate a waveform. Don't forget to set the SFSpeechRecognitionTaskDelegate and implement its delegate methods; if you need the extra code, let me know.

func startNativeRecording() throws {
        LEVEL_LOWPASS_TRIG=0.01
        //Setup Audio Session
        node = audioEngine.inputNode!
        let recordingFormat = node!.outputFormatForBus(0)
        node!.installTapOnBus(0, bufferSize: 1024, format: recordingFormat){(buffer, _) in
            self.nativeASRRequest.appendAudioPCMBuffer(buffer)

 //Code to animate a waveform with the microphone volume, ignore if you don't need it:
            var inNumberFrames:UInt32 = buffer.frameLength;
            var samples:Float32 = buffer.floatChannelData[0][0]; //https://github.com/apple/swift-evolution/blob/master/proposals/0107-unsaferawpointer.md
            var avgValue:Float32 = 0;
            vDSP_maxmgv(buffer.floatChannelData[0], 1, &avgValue, vDSP_Length(inNumberFrames)); //Accelerate Framework
            //vDSP_maxmgv returns peak values
            //vDSP_meamgv returns mean magnitude of a vector

            let avg3:Float32=((avgValue == 0) ? (0-100) : 20.0)
            var averagePower=(self.LEVEL_LOWPASS_TRIG*avg3*log10f(avgValue)) + ((1-self.LEVEL_LOWPASS_TRIG)*self.averagePowerForChannel0) ;
            print("AVG. POWER: "+averagePower.description)
            dispatch_async(dispatch_get_main_queue(), { () -> Void in
                //print("VU: "+vu.description)
                var fAvgPwr=CGFloat(averagePower)
                print("AvgPwr: "+fAvgPwr.description)

                var waveformFriendlyValue=0.5+fAvgPwr //-0.5 is AvgPwrValue when user is silent
                if(waveformFriendlyValue<0){waveformFriendlyValue=0} //round values <0 to 0
                self.waveview.hidden=false
                self.waveview.updateWithLevel(waveformFriendlyValue)
            })
        }
        audioEngine.prepare()
        try audioEngine.start()
        isNativeASRBusy=true
        nativeASRTask = nativeSpeechRecognizer?.recognitionTaskWithRequest(nativeASRRequest, delegate: self)
        nativeSpeechRecognizer?.delegate=self
  // I use this timer to track no-speech timeouts; ignore if not needed:
        self.endOfSpeechTimeoutTimer = NSTimer.scheduledTimerWithTimeInterval(utteranceTimeoutSeconds, target: self, selector:  #selector(ViewController.stopNativeRecording), userInfo: nil, repeats: false)
    }

[Comments]:

  • It's just a parameter between -1 and 1; I used a value of -0.2 to scale the microphone-volume graph to fit my app's UI. If you don't need to draw the microphone volume you can give it a value of zero, or simply take that part of the code out. @MarkusRautopuro
  • avgValue is actually the peak value, not the average; consider renaming it.
  • What is LEVEL_LOWPASS_TRIG here?
  • @aBikis I'm not sure; I saw it in a formula online, and setting LEVEL_LOWPASS_TRIG to 0.01 worked for me.
  • If you can get a hello-world running on the Apple Watch, you can just copy-paste this code and call the method. Yes, Swift and watchOS have changed since then, so you may have to fix 2 or 3 deprecated lines. @lya
[Solution 5]:

If you enable on-device-only recognition, speech recognition is not automatically stopped after 1 minute.

.requiresOnDeviceRecognition = true

More about requiresOnDeviceRecognition:

https://developer.apple.com/documentation/speech/sfspeechrecognitionrequest/3152603-requiresondevicerecognition
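
For reference, a minimal sketch of setting that flag (an assumption on my part: requiresOnDeviceRecognition and the supportsOnDeviceRecognition check require iOS 13 or later, and on-device support depends on device and locale):

import Speech

let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
let request = SFSpeechAudioBufferRecognitionRequest()

if #available(iOS 13, *), recognizer?.supportsOnDeviceRecognition == true {
    // Recognition stays on the device, so the ~1 minute server-side limit does not apply.
    request.requiresOnDeviceRecognition = true
}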

[Comments]:

[Solution 6]:

This works perfectly in my app. You can send queries to saifurrahman3126@gmail.com. Apple does not allow users to transcribe continuously for more than one minute. Check here: https://developer.apple.com/documentation/speech/sfspeechrecognizer

"Plan for a one-minute limit on audio duration. Speech recognition places a relatively high burden on battery life and network usage. To minimize this burden, the framework stops speech recognition tasks that last longer than one minute. This limit is similar to the one for keyboard-related dictation." This is what Apple says in its documentation.

Currently I make the request for 40 seconds; if you speak and then pause before the 40 seconds are up, I reconnect and the recording starts again.

    @objc  func startRecording() {
        
        self.fullsTring = ""
        audioEngine.reset()
        
        if recognitionTask != nil {
            recognitionTask?.cancel()
            recognitionTask = nil
        }
        
        let audioSession = AVAudioSession.sharedInstance()
        do {
            try audioSession.setCategory(.record)
            try audioSession.setMode(.measurement)
            try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
            try audioSession.setPreferredSampleRate(44100.0)
            
            if audioSession.isInputGainSettable {
                let error : NSErrorPointer = nil
                
                let success = try? audioSession.setInputGain(1.0)
                
                guard success != nil else {
                    print ("audio error")
                    return
                }
                if (success != nil) {
                    print("\(String(describing: error))")
                }
            }
            else {
                print("Cannot set input gain")
            }
        } catch {
            print("audioSession properties weren't set because of an error.")
        }
        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        
        let inputNode = audioEngine.inputNode
        guard let recognitionRequest = recognitionRequest else {
            fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
        }
        
        recognitionRequest.shouldReportPartialResults = true
        self.timer4 = Timer.scheduledTimer(timeInterval: TimeInterval(40), target: self, selector: #selector(againStartRec), userInfo: nil, repeats: false)
        
        recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest, resultHandler: { (result, error ) in
            
            var isFinal = false  //8
            
            if result != nil {
                self.timer.invalidate()
                self.timer = Timer.scheduledTimer(timeInterval: TimeInterval(2.0), target: self, selector: #selector(self.didFinishTalk), userInfo: nil, repeats: false)
                
                let bestString = result?.bestTranscription.formattedString
                self.fullsTring = bestString!
                
                self.inputContainerView.inputTextField.text = result?.bestTranscription.formattedString
                
                isFinal = result!.isFinal
                
            }
            if error == nil{
                
            }
            if  isFinal {
                
                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)
                
                self.recognitionRequest = nil
                self.recognitionTask = nil
                isFinal = false
                
            }
            if error != nil{
                URLCache.shared.removeAllCachedResponses()
                
                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)
                
                guard let task = self.recognitionTask else {
                    return
                }
                task.cancel()
                task.finish()
            }
        })
        audioEngine.reset()
        inputNode.removeTap(onBus: 0)
        
        let recordingFormat = AVAudioFormat(standardFormatWithSampleRate: 44100, channels: 1)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
            self.recognitionRequest?.append(buffer)
        }
        
        audioEngine.prepare()
        
        do {
            try audioEngine.start()
        } catch {
            print("audioEngine couldn't start because of an error.")
        }
        
        self.hasrecorded = true
    }
    
    @objc func againStartRec(){
        
        self.inputContainerView.uploadImageView.setBackgroundImage( #imageLiteral(resourceName: "microphone") , for: .normal)
        self.inputContainerView.uploadImageView.alpha = 1.0
        self.timer4.invalidate()
        timer.invalidate()
        self.timer.invalidate()
        
        if ((self.audioEngine.isRunning)){
            
            self.audioEngine.stop()
            self.recognitionRequest?.endAudio()
            self.recognitionTask?.finish()
        }
        self.timer2 = Timer.scheduledTimer(timeInterval: 2, target: self, selector: #selector(startRecording), userInfo: nil, repeats: false)
    }
    
    @objc func didFinishTalk(){
        
        if self.fullsTring != ""{
            
            self.timer4.invalidate()
            self.timer.invalidate()
            self.timer2.invalidate()
            
            if ((self.audioEngine.isRunning)){
                self.audioEngine.stop()
                guard let task = self.recognitionTask else {
                    return
                }
                task.cancel()
                task.finish()
            }
        }
    }
    

[Comments]:
