OpenAI's speech recognition model Whisper is amazing, so I tried generating subtitles for YouTube

Introduction

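Thank you to everyone who read my previous article, and to everyone who liked and bookmarked it!
Before I knew it, it had reached 3rd place in Qiita's trending list ♪ (as of 2022/09/24)
Riding that momentum, this time I want to try "automatically generating subtitles for videos", something I have wanted to do ever since I first found Whisper.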

[References]
Official site: https://openai.com/blog/whisper
Paper: https://cdn.openai.com/papers/whisper.pdf
GitHub：https://github.com/openai/whisper

Previous article

What I want to do

I want to automatically generate subtitles for the videos I upload to YouTube!

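Making videos for YouTube involves tasks such as creating on-screen captions and subtitles.
I sometimes transcribe speech with software such as Premiere Pro or Vrew, but for Japanese the accuracy can be low.
I thought Whisper's transcription could help in exactly this situation.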

These are the references I consulted to make this work.

Whisper can transcribe not only mp3 and wav files but also mp4 files directly, so this time I will try creating subtitles straight from mp4 data!
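Whisper can read mp4 because `whisper.load_audio` does not parse the container itself: it calls the `ffmpeg` command-line tool to decode the audio track to 16 kHz mono 16-bit PCM and then scales the samples to floats. Below is a rough stdlib-only sketch of that step, assuming `ffmpeg` is installed and on the PATH. The flags are modeled on Whisper's `audio.py` and may differ between versions; the function names are hypothetical, not part of Whisper's API.

```python
import subprocess
import sys
from array import array

SAMPLE_RATE = 16000  # Whisper models expect 16 kHz mono audio


def build_ffmpeg_cmd(path: str, sr: int = SAMPLE_RATE) -> list[str]:
    """Build an ffmpeg command that decodes any supported container
    (mp4, mp3, wav, ...) to raw 16-bit little-endian mono PCM on stdout."""
    return [
        "ffmpeg", "-nostdin", "-threads", "0",
        "-i", path,
        "-f", "s16le", "-ac", "1", "-acodec", "pcm_s16le",
        "-ar", str(sr),
        "-",  # write the decoded samples to stdout
    ]


def pcm16_to_float(raw: bytes) -> list[float]:
    """Convert raw s16le bytes to floats in [-1.0, 1.0)."""
    samples = array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":  # array uses native order; s16le is little-endian
        samples.byteswap()
    return [s / 32768.0 for s in samples]


def load_audio_sketch(path: str, sr: int = SAMPLE_RATE) -> list[float]:
    """Decode `path` with ffmpeg; raises CalledProcessError if decoding fails."""
    raw = subprocess.run(build_ffmpeg_cmd(path, sr), capture_output=True, check=True).stdout
    return pcm16_to_float(raw)
```

For the sample video, `len(load_audio_sketch("/content/001.mp4")) / 16000` should come out at roughly 152 seconds. Whisper itself does the same conversion with NumPy rather than `array`.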

Implementation

Preparation

This time I uploaded a 2 min 32 s video file named 001.mp4 to the /content folder!

Also, as in the previous article, set up the Whisper base model. Set the Colab runtime type to GPU.

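# Install Whisper (needed once per Colab session)
!pip install git+https://github.com/openai/whisper.git

import whisper

# Load the "base" model; load_model puts it on the GPU when one is available
model = whisper.load_model("base")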

Next, create a folder named download to hold the text file containing the transcribed subtitles.

import os

# Add directory into content folder
checkDownLoadFolder = os.path.exists("download")
if not checkDownLoadFolder:
  os.mkdir("download")

Speech recognition

The logic is simple: it transcribes all of the audio data prepared for this run.

fileName = "001.mp4"

# Load the whole audio track (decoded via ffmpeg, resampled to 16 kHz mono)
audio = whisper.load_audio(f"/content/{fileName}")

outputTextsArr = []
while audio.size > 0:
  # Take the next 30 seconds; pad_or_trim zero-pads the last, shorter chunk
  trimmedAudio = whisper.pad_or_trim(audio)
  # Drop the samples just taken (N_SAMPLES = 30 s * 16,000 Hz = 480,000)
  audio = audio[whisper.audio.N_SAMPLES:]

  # make log-Mel spectrogram and move to the same device as the model
  mel = whisper.log_mel_spectrogram(trimmedAudio).to(model.device)

  # detect the spoken language
  _, probs = model.detect_language(mel)
  # print(f"Detected language: {max(probs, key=probs.get)}")

  # decode this 30-second chunk
  options = whisper.DecodingOptions()
  result = whisper.decode(model, mel, options)

  # collect the recognized text
  outputTextsArr.append(result.text)

outputTexts = ' '.join(outputTextsArr)
print(outputTexts)

# Write into a text file (UTF-8 so Japanese text is saved correctly)
with open(f"download/{fileName}.txt", "w", encoding="utf-8") as f:
  f.write(f"▼ Transcription of {fileName}\n")
  f.write(outputTexts)

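With this, the entire file is transcribed, 30 seconds at a time.
The result is also exported to the download folder under a name based on the specified file (fileName = "001.mp4"), i.e. download/001.mp4.txt.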
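The loop works because Whisper's input window is fixed at 30 seconds: at 16 kHz that is 480,000 samples per chunk, so the 2 min 32 s (152 s) sample video is processed as ceil(152 / 30) = 6 chunks, the last one zero-padded. A small pure-Python sketch of that arithmetic, using plain lists instead of NumPy arrays (the helper names are hypothetical):

```python
import math

SAMPLE_RATE = 16000                       # samples per second expected by Whisper
CHUNK_SECONDS = 30                        # Whisper's fixed input window
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples per chunk


def split_into_chunks(samples, n=N_SAMPLES, pad_value=0.0):
    """Split samples into consecutive n-sample chunks, padding the last one,
    the same way the pad_or_trim loop walks through the audio."""
    chunks = []
    for start in range(0, len(samples), n):
        chunk = list(samples[start:start + n])
        chunk.extend([pad_value] * (n - len(chunk)))  # pad only the final chunk
        chunks.append(chunk)
    return chunks


def chunk_count(duration_seconds: float) -> int:
    """Number of 30-second windows needed to cover the audio."""
    return math.ceil(duration_seconds / CHUNK_SECONDS)
```

One consequence of fixed cut points is that a word spoken across a 30-second boundary can be split between two chunks and misrecognized.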

from google.colab import files

!zip -r download.zip download
files.download("download.zip")

Finally, download the folder as a zip file and you are done. You now have a txt file containing the transcription of the video's audio.
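The `!zip` line is an IPython shell escape, so it only works inside a notebook. If you prefer plain Python, the standard library's `shutil.make_archive` can build the same archive. Here is a minimal sketch; `zip_folder`, `archive_names` and `self_check` are hypothetical helpers, and `self_check` only builds a throwaway folder to show the resulting entry names.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_folder(folder: str, archive_base: str) -> str:
    """Recursively zip `folder` into `<archive_base>.zip` and return its path.
    Entries keep the folder name as a prefix (e.g. "download/001.mp4.txt"),
    like `zip -r download.zip download`."""
    src = Path(folder).resolve()
    return shutil.make_archive(archive_base, "zip", root_dir=src.parent, base_dir=src.name)


def archive_names(archive: str) -> list[str]:
    """List the entries stored in a zip archive."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


def self_check() -> list[str]:
    """Create a temporary download/ folder with one file, zip it, list entries."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / "download"
        folder.mkdir()
        (folder / "001.mp4.txt").write_text("▼ Transcription of 001.mp4\n", encoding="utf-8")
        archive = zip_folder(str(folder), str(Path(tmp) / "download"))
        return archive_names(archive)
```

In Colab the result can be passed straight to the download helper, e.g. `files.download(zip_folder("download", "download"))`.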

Results

As sample data this time, I used part of an archived stream from a certain YouTuber who loves the werewolf game.

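All that is left is to enter this text as subtitles in YouTube Studio, and you are done! Names, jargon, and fast speech specific to the YouTuber can make accurate transcription difficult, but being able to transcribe the full text directly from a video file, when previously I could only transcribe 30 seconds at a time, is groundbreaking, isn't it!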
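The text file carries no timing, so each line still has to be placed by hand in YouTube Studio. One alternative, not the approach used in this article, is `model.transcribe()`: it handles the 30-second windowing itself and returns a dict whose `"segments"` entries carry `start` and `end` times in seconds plus `text`. Those can be written out as a SubRip (.srt) file, a caption format YouTube Studio accepts for upload. A minimal sketch of the conversion, with hypothetical helper names and fake segments standing in for real Whisper output:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Turn Whisper-style segments ({"start", "end", "text"}) into SRT text:
    a running index, a time range, the caption, then a blank line."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


# Usage in the notebook (assumes `model` from the setup step):
# result = model.transcribe("/content/001.mp4")
# with open("download/001.srt", "w", encoding="utf-8") as f:
#     f.write(segments_to_srt(result["segments"]))
```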

See you in the next article!

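Original content notice: this article is published on 爱码网 (likecs.com) with the author's authorization; reproduction without permission is prohibited.

Source: https://www.likecs.com/show-308626948.html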