【Question Title】: 'Media Format' for '.caf' file in Amazon Transcribe
【Posted】: 2026-01-15 09:40:01
【Question】:

I have a React Native (Expo) app that captures audio using the expo-av library.

It then uploads the audio file to Amazon S3 and transcribes it with Amazon Transcribe.

On Android I save the audio as an '.m4a' file and call the Amazon Transcribe API like this:

import boto3

transcribe_client = boto3.client('transcribe')  # job_name and file_uri are defined elsewhere in the app

transcribe_client.start_transcription_job(TranscriptionJobName = job_name,
                                          Media={'MediaFileUri' : file_uri},
                                          MediaFormat='mp4',
                                          LanguageCode='en-US')

What should 'MediaFormat' be for uploads from iOS devices, which are typically '.caf' files?

Amazon Transcribe only allows these media formats:

 MP3, MP4, WAV, FLAC, AMR, OGG, and WebM

【Question Comments】:

    Tags: ios react-native expo aws-transcribe expo-av


    【Solution 1】:

    Possible solutions:

    1. Create an API that does the conversion for you.
      For example, you can easily build one with the FFMPEG Python library (see the sketch at the end of this answer).

    2. Use an existing API.
      With the cloudconvert API you can convert files easily, but only if you pay for it.

    3. Use a different library to record audio on iOS.
      There is a module called react-native-record-audio-ios, made entirely for iOS, that can record audio as .caf, .m4a, or .wav.

    4. Use the LAME API to convert it.
      As said here, you can convert the .caf file to a .mp3 file by creating a native module that runs the following:

    // Requires <stdio.h> and <lame/lame.h>; the input is read as raw interleaved 16-bit PCM.
    FILE *pcm = fopen("file.caf", "rb");
    FILE *mp3 = fopen("file.mp3", "wb");
    const int PCM_SIZE = 8192;
    const int MP3_SIZE = 8192;
    
    short int pcm_buffer[PCM_SIZE*2];
    unsigned char mp3_buffer[MP3_SIZE];
    int read, write;
    
    lame_t lame = lame_init();
    lame_set_in_samplerate(lame, 44100);
    lame_set_VBR(lame, vbr_default);
    lame_init_params(lame);
    
    do {
      // read one block of interleaved samples; encode it, or flush the encoder at EOF
      read = fread(pcm_buffer, 2*sizeof(short int), PCM_SIZE, pcm);
      if (read == 0)
        write = lame_encode_flush(lame, mp3_buffer, MP3_SIZE);
      else
        write = lame_encode_buffer_interleaved(lame, pcm_buffer, read, mp3_buffer, MP3_SIZE);
      fwrite(mp3_buffer, write, 1, mp3);
    } while (read != 0);
    
    lame_close(lame);
    fclose(mp3);
    fclose(pcm);
    
    5. Create a native module that runs this Objective-C code:
    -(void) convertToWav
    {
    // set up an AVAssetReader to read the recorded .caf file
    
    NSString *cafFilePath=[[NSBundle mainBundle]pathForResource:@"test" ofType:@"caf"];
    
    NSURL *assetURL = [NSURL fileURLWithPath:cafFilePath];
    AVURLAsset *songAsset = [AVURLAsset URLAssetWithURL:assetURL options:nil];
    
    NSError *assetError = nil;
    AVAssetReader *assetReader = [AVAssetReader assetReaderWithAsset:songAsset
                                                               error:&assetError];
    if (assetError) {
        NSLog (@"error: %@", assetError);
        return;
    }
    
    AVAssetReaderOutput *assetReaderOutput = [AVAssetReaderAudioMixOutput
                                              assetReaderAudioMixOutputWithAudioTracks:songAsset.tracks
                                              audioSettings: nil];
    if (! [assetReader canAddOutput: assetReaderOutput]) {
        NSLog (@"can't add reader output... die!");
        return;
    }
    [assetReader addOutput: assetReaderOutput];
    
    NSString *title = @"MyRec";
    NSArray *docDirs = NSSearchPathForDirectoriesInDomains (NSDocumentDirectory, NSUserDomainMask, YES);
    NSString *docDir = [docDirs objectAtIndex: 0];
    NSString *wavFilePath = [[docDir stringByAppendingPathComponent :title]
                             stringByAppendingPathExtension:@"wav"];
    if ([[NSFileManager defaultManager] fileExistsAtPath:wavFilePath])
    {
        [[NSFileManager defaultManager] removeItemAtPath:wavFilePath error:nil];
    }
    NSURL *exportURL = [NSURL fileURLWithPath:wavFilePath];
    AVAssetWriter *assetWriter = [AVAssetWriter assetWriterWithURL:exportURL
                                                          fileType:AVFileTypeWAVE
                                                             error:&assetError];
    if (assetError)
    {
        NSLog (@"error: %@", assetError);
        return;
    }
    
    // output format for the writer: 16-bit interleaved stereo linear PCM at 44.1 kHz
    AudioChannelLayout channelLayout;
    memset(&channelLayout, 0, sizeof(AudioChannelLayout));
    channelLayout.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
    NSDictionary *outputSettings = [NSDictionary dictionaryWithObjectsAndKeys:
                                    [NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey,
                                    [NSNumber numberWithFloat:44100.0], AVSampleRateKey,
                                    [NSNumber numberWithInt:2], AVNumberOfChannelsKey,
                                    [NSData dataWithBytes:&channelLayout length:sizeof(AudioChannelLayout)], AVChannelLayoutKey,
                                    [NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,
                                    [NSNumber numberWithBool:NO], AVLinearPCMIsNonInterleaved,
                                    [NSNumber numberWithBool:NO],AVLinearPCMIsFloatKey,
                                    [NSNumber numberWithBool:NO], AVLinearPCMIsBigEndianKey,
                                    nil];
    AVAssetWriterInput *assetWriterInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio
                                                                              outputSettings:outputSettings];
    if ([assetWriter canAddInput:assetWriterInput])
    {
        [assetWriter addInput:assetWriterInput];
    }
    else
    {
        NSLog (@"can't add asset writer input... die!");
        return;
    }
    
    assetWriterInput.expectsMediaDataInRealTime = NO;
    
    [assetWriter startWriting];
    [assetReader startReading];
    
    AVAssetTrack *soundTrack = [songAsset.tracks objectAtIndex:0];
    CMTime startTime = CMTimeMake (0, soundTrack.naturalTimeScale);
    [assetWriter startSessionAtSourceTime: startTime];
    
    __block UInt64 convertedByteCount = 0;
    dispatch_queue_t mediaInputQueue = dispatch_queue_create("mediaInputQueue", NULL);
    
    [assetWriterInput requestMediaDataWhenReadyOnQueue:mediaInputQueue
                                            usingBlock: ^
     {
    
         while (assetWriterInput.readyForMoreMediaData)
         {
             CMSampleBufferRef nextBuffer = [assetReaderOutput copyNextSampleBuffer];
             if (nextBuffer)
             {
                 // append buffer
                 [assetWriterInput appendSampleBuffer: nextBuffer];
                 convertedByteCount += CMSampleBufferGetTotalSampleSize (nextBuffer);
                 CMTime progressTime = CMSampleBufferGetPresentationTimeStamp(nextBuffer);
    
                 CMTime sampleDuration = CMSampleBufferGetDuration(nextBuffer);
                 if (CMTIME_IS_NUMERIC(sampleDuration))
                     progressTime= CMTimeAdd(progressTime, sampleDuration);
                 float dProgress= CMTimeGetSeconds(progressTime) / CMTimeGetSeconds(songAsset.duration);
                 NSLog(@"%f",dProgress);
             }
             else
             {
                 // no more samples: finish the writer so the WAV file is finalized on disk
                 [assetWriterInput markAsFinished];
                 [assetWriter finishWritingWithCompletionHandler:^{
                     NSLog(@"done, %llu bytes converted", (unsigned long long)convertedByteCount);
                 }];
                 [assetReader cancelReading];
                 break;
             }
         }
     }];
    }
    

    However, as said here:

    Because an iPhone should not really be used for processor-intensive work such as audio conversion.

    So I recommend the third solution to you, since it is simpler and does not look like an intensive task for the iPhone's processor.
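
    If you would rather keep the Expo recording code unchanged and do the conversion server-side instead (option 1), a minimal Python sketch of the idea could look like the following. This is an illustration, not a definitive implementation: it assumes the ffmpeg binary is installed on the server, and the file paths, bucket, key, and job name are hypothetical placeholders.

    # Sketch only: ffmpeg must be installed; paths, bucket, key and job_name are placeholders.
    import subprocess
    import boto3

    def convert_and_transcribe(caf_path, wav_path, bucket, key, job_name):
        # Convert the uploaded .caf recording to 16-bit PCM WAV, a format Transcribe accepts.
        subprocess.run(
            ["ffmpeg", "-y", "-i", caf_path, "-ar", "44100", "-ac", "2", wav_path],
            check=True,
        )

        # Upload the converted file to S3.
        s3 = boto3.client("s3")
        s3.upload_file(wav_path, bucket, key)

        # Start the transcription job, now with MediaFormat='wav'.
        transcribe = boto3.client("transcribe")
        transcribe.start_transcription_job(
            TranscriptionJobName=job_name,
            Media={"MediaFileUri": f"s3://{bucket}/{key}"},
            MediaFormat="wav",
            LanguageCode="en-US",
        )

    Doing the conversion on the server also sidesteps the concern quoted above about running processor-intensive work on the phone.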

    【Comments】:

    • I try to avoid native modules because I want the convenience of Expo.
    • I could also try setting the 'RecordingOptions' in Expo to m4a for Android and an AWS-supported format for iOS.