Recognizing multiple keywords using PocketSphinx

【Question title】: Recognizing multiple keywords using PocketSphinx
【Posted】: 2014-09-09 15:11:29
【Question】:

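I have installed the PocketSphinx demo and it works fine under Ubuntu and Eclipse, but despite trying I cannot work out how to add recognition of multiple words.

All I want is for the code to recognize single words, which I can then switch() on in the code, e.g. "up", "down", "left", "right". I don't want to recognize sentences, just single words.

Any help with this would be appreciated. I have seen other users with similar problems, but so far nobody seems to know the answer.

One thing that baffles me is why we need the "wakeup" constant at all: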

private static final String KWS_SEARCH = "wakeup";
private static final String KEYPHRASE = "oh mighty computer";
.
.
.
recognizer.addKeyphraseSearch(KWS_SEARCH, KEYPHRASE);

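What does wakeup have to do with anything?

I have made some progress (?): using addGrammarSearch I can use a .gram file to list my words, e.g. up, down, left, right, forwards, backwards, and this seems to work well as long as I only say those particular words. However, any other word causes the system to match what was said to the "closest" word in the list. Ideally, I don't want any recognition to occur if the spoken word is not in the .gram file...

【Comments】:

I read this question but could not find my answer, and I have searched a lot too. Could anyone who can help please take a look at stackoverflow.com/q/37629636/3671748
I have read this, but my problem is how to define a new KEYWORD (e.g. "my phone") as well. Could you please check my question? stackoverflow.com/q/37629636/3671748
Could you help me?: stackoverflow.com/questions/39506271/…

Tags: android speech-recognition cmusphinx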

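【Solution 1】:

Thanks to Nikolay's tip (see his answer), I developed the following code, which works well and does not recognize words unless they are in the list. You can copy and paste it straight over the main class in the PocketSphinxDemo code: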

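// Imports as used by the pocketsphinx-android demo project
import static android.widget.Toast.makeText;
import static edu.cmu.pocketsphinx.SpeechRecognizerSetup.defaultSetup;

import java.io.File;
import java.io.IOException;

import android.app.Activity;
import android.os.Bundle;
import android.widget.TextView;
import android.widget.Toast;

import edu.cmu.pocketsphinx.Assets;
import edu.cmu.pocketsphinx.Hypothesis;
import edu.cmu.pocketsphinx.RecognitionListener;
import edu.cmu.pocketsphinx.SpeechRecognizer;

public class PocketSphinxActivity extends Activity implements RecognitionListener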
{
private static final String DIGITS_SEARCH = "digits";
private SpeechRecognizer recognizer;

@Override
public void onCreate(Bundle state)
{
    super.onCreate(state);

    setContentView(R.layout.main);

    ((TextView) findViewById(R.id.caption_text)).setText("Preparing the recognizer");

    try
    {
        Assets assets = new Assets(PocketSphinxActivity.this);
        File assetDir = assets.syncAssets();
        setupRecognizer(assetDir);
    }
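    catch (IOException e)
    {
        // Without the assets the recognizer is never created, so report the
        // failure and return instead of calling reset() on a null recognizer
        ((TextView) findViewById(R.id.caption_text)).setText("Failed to init recognizer " + e);
        return;
    }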

    ((TextView) findViewById(R.id.caption_text)).setText("Say up, down, left, right, forwards, backwards");

    reset();
}

@Override
public void onPartialResult(Hypothesis hypothesis)
{
}

@Override
public void onResult(Hypothesis hypothesis)
{
    ((TextView) findViewById(R.id.result_text)).setText("");

    if (hypothesis != null)
    {
        String text = hypothesis.getHypstr();
        makeText(getApplicationContext(), text, Toast.LENGTH_SHORT).show();
    }
}

@Override
public void onBeginningOfSpeech()
{
}

@Override
public void onEndOfSpeech()
{
    reset();
}

private void setupRecognizer(File assetsDir)
{
    File modelsDir = new File(assetsDir, "models");

    recognizer = defaultSetup().setAcousticModel(new File(modelsDir, "hmm/en-us-semi"))
                               .setDictionary(new File(modelsDir, "dict/cmu07a.dic"))
                               .setRawLogDir(assetsDir).setKeywordThreshold(1e-20f)
                               .getRecognizer();

    recognizer.addListener(this);

    File digitsGrammar = new File(modelsDir, "grammar/digits.gram");
    recognizer.addKeywordSearch(DIGITS_SEARCH, digitsGrammar);
}

private void reset()
{
    recognizer.stop();
    recognizer.startListening(DIGITS_SEARCH);
}
}
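
The question asked for a way to switch() on the recognized word, while onResult in the code above only shows the text in a Toast. A minimal sketch of that dispatch step follows. `CommandDispatcher`, `Command` and `toCommand` are hypothetical names, not part of PocketSphinx, and the trimming and lower-casing assume the hypothesis string may carry surrounding whitespace or different casing.

```java
import java.util.Locale;

/** Hypothetical helper (not part of PocketSphinx): maps a recognized keyword to a command. */
final class CommandDispatcher {

    /** One command per keyword in the answer's keyword list, plus a fallback. */
    enum Command { UP, DOWN, LEFT, RIGHT, FORWARDS, BACKWARDS, UNKNOWN }

    /**
     * Converts a hypothesis string to a Command.
     * Assumption: the text may have surrounding whitespace or mixed case,
     * so it is normalized before the switch.
     */
    static Command toCommand(String hypothesisText) {
        if (hypothesisText == null) {
            return Command.UNKNOWN;
        }
        switch (hypothesisText.trim().toLowerCase(Locale.ROOT)) {
            case "up":        return Command.UP;
            case "down":      return Command.DOWN;
            case "left":      return Command.LEFT;
            case "right":     return Command.RIGHT;
            case "forwards":  return Command.FORWARDS;
            case "backwards": return Command.BACKWARDS;
            default:          return Command.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(toCommand("left"));
        System.out.println(toCommand(" Forwards "));
        System.out.println(toCommand("hello"));
    }
}
```

Inside onResult, one could then call `toCommand(hypothesis.getHypstr())` and switch on the returned enum; anything unexpected falls through to UNKNOWN rather than triggering an action.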

Your digits.gram file should look something like this:

up /1e-1/
down /1e-1/
left /1e-1/
right /1e-1/
forwards /1e-1/
backwards /1e-1/

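You should experiment with the thresholds between the slashes to tune performance. 1e-1 is scientific notation for 0.1, and I believe the maximum is 1.0.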
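
Since a malformed keyword file can fail when it is loaded, it may help to check each line before handing the file to the recognizer. The sketch below is not PocketSphinx's own parser. It assumes the line format shown above (a phrase, whitespace, then a threshold between slashes), and `KeywordLine` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical validator for one line of a keyword list such as "up /1e-1/". */
final class KeywordLine {

    // Assumed format: a phrase without slashes, whitespace, then /threshold/.
    private static final Pattern LINE =
            Pattern.compile("^([^/]+?)\\s+/([^/]+)/\\s*$");

    final String phrase;
    final double threshold;

    private KeywordLine(String phrase, double threshold) {
        this.phrase = phrase;
        this.threshold = threshold;
    }

    /** Parses one line, throwing IllegalArgumentException if it does not fit the format. */
    static KeywordLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException(
                    "Expected \"phrase /threshold/\" but got: " + line);
        }
        // Double.parseDouble accepts scientific notation, so "1e-1" becomes 0.1.
        // A non-numeric threshold raises NumberFormatException, a subclass of
        // IllegalArgumentException.
        double threshold = Double.parseDouble(m.group(2).trim());
        return new KeywordLine(m.group(1).trim(), threshold);
    }

    public static void main(String[] args) {
        KeywordLine k = parse("forwards /1e-1/");
        System.out.println(k.phrase + " -> " + k.threshold); // prints "forwards -> 0.1"
    }
}
```

A line that packs several keywords together, such as `up /1/ down /1/`, is rejected by this check, which matches the one-phrase-per-line layout of the file.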

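It's 5:30 pm now, so I can stop work for the day. Result!

【Comments】:

Thanks mate!! These lines made the difference; I had not spotted addKeywordSearch (singular, not plural): File digitsGrammar = new File(modelsDir, "grammar/digits.gram"); recognizer.addKeywordSearch(DIGITS_SEARCH, digitsGrammar); } private void reset() { recognizer.stop(); recognizer.startListening(DIGITS_SEARCH); } }
@pbs: Thanks for sharing your solution, it helped me a lot! I have one question. Does your modified digits.gram contain anything else, or only the keywords with their // thresholds? I am getting an exception when trying to open and parse the digits.gram file.
You could try up /1/ down /1/ left /1/ right /1/, with a line break after each /1/.
Now it runs, but I still have the problem that if I say something completely different that is not in my grammar file, it still tries to find the closest match, so I get a match no matter what I say, which is not very user friendly. This is what my digits.gram file looks like: #JSGF V1.0; grammar digits; public = /1/ start | /1/ stop | /1/ frame;
I found my mistake... I was using addGrammarSearch rather than addKeywordSearch. I have now changed my grammar file to what you have in your post above, and it runs... but unfortunately I still get false positives: something always matches, even when I say something completely different.

【Solution 2】:

You can use addKeywordSearch, which takes a file of keyphrases: one phrase per line, each followed by its threshold between slashes, for example: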

up /1.0/
down /1.0/
left /1.0/
right /1.0/
forwards /1e-1/

The thresholds must be tuned to avoid false alarms.
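
Because tuning means editing these thresholds repeatedly, it can be convenient to generate the file contents from a per-phrase table. This is only a sketch under that assumption. `KeywordListBuilder` is a hypothetical helper, and thresholds are kept as strings so that notation such as 1e-1 is written out unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: renders a keyword list, one "phrase /threshold/" per line. */
final class KeywordListBuilder {

    /**
     * Builds the file contents from phrase -> threshold pairs.
     * Thresholds are strings so the notation chosen by the caller is preserved.
     */
    static String build(Map<String, String> thresholdsByPhrase) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : thresholdsByPhrase.entrySet()) {
            sb.append(entry.getKey())
              .append(" /")
              .append(entry.getValue())
              .append("/\n"); // each phrase ends with its own line break
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the phrases in insertion order.
        Map<String, String> thresholds = new LinkedHashMap<>();
        thresholds.put("up", "1.0");
        thresholds.put("down", "1.0");
        thresholds.put("forwards", "1e-1");
        System.out.print(build(thresholds));
    }
}
```

The resulting string can be written to the file that is passed to addKeywordSearch, so each tuning round only changes the table.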

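【Comments】:

Could you share the entire contents of your .gram file? I feel like something else is missing. I am new to grammar files.
There is nothing to add: this file is a keyword spotting file and you should not put anything else in it. It is not a grammar file either; grammars are different. To learn about keyword spotting, see the CMUSphinx page cmusphinx.sourceforge.net/wiki/tutoriallm
Suppose I use a file like this with pocketsphinx_continuous; I would supply the file path with -kws. Can I then use cmudict-en-us.dict and the bundled 16-bit PTM en-us ARPA model? Would accuracy improve if I created a new dictionary for just these 5 words?
en-us-ptm is an acoustic model, not an ARPA model, and it is 16 kHz, not 16-bit. Creating a new dictionary will not improve accuracy, though it may save you some memory (about 3 MB).
The threshold depends on the word; for the best detection you need word-specific thresholds. Since the word "forwards" has two syllables, it most likely needs a different threshold. You can use 0.1 if you like.

【Solution 3】:

I am updating Antinous's amendment to the PocketSphinx demo so that it runs in Android Studio. This is what I have so far: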

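//Note: change MainActivity to PocketSphinxActivity for demo use...

// Imports as used by the pocketsphinx-android demo project.
// ActivityCompat and ContextCompat live in androidx.core; older projects
// use android.support.v4.app.ActivityCompat and android.support.v4.content.ContextCompat.
import static android.widget.Toast.makeText;
import static edu.cmu.pocketsphinx.SpeechRecognizerSetup.defaultSetup;

import java.io.File;
import java.io.IOException;

import android.Manifest;
import android.app.Activity;
import android.content.pm.PackageManager;
import android.os.AsyncTask;
import android.os.Bundle;
import android.widget.TextView;
import android.widget.Toast;

import androidx.core.app.ActivityCompat;
import androidx.core.content.ContextCompat;

import edu.cmu.pocketsphinx.Assets;
import edu.cmu.pocketsphinx.Hypothesis;
import edu.cmu.pocketsphinx.RecognitionListener;
import edu.cmu.pocketsphinx.SpeechRecognizer;

public class MainActivity extends Activity implements RecognitionListener {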
private static final String DIGITS_SEARCH = "digits";
private SpeechRecognizer recognizer;

/* Used to handle permission request */
private static final int PERMISSIONS_REQUEST_RECORD_AUDIO = 1;

@Override
public void onCreate(Bundle state) {
    super.onCreate(state);

    setContentView(R.layout.main);
    ((TextView) findViewById(R.id.caption_text))
            .setText("Preparing the recognizer");

    // Check if user has given permission to record audio.
    // NOTE: if permission is missing we request it and return, and nothing in
    // this class restarts the setup once it is granted; override
    // onRequestPermissionsResult to do that.
    int permissionCheck = ContextCompat.checkSelfPermission(getApplicationContext(), Manifest.permission.RECORD_AUDIO);
    if (permissionCheck != PackageManager.PERMISSION_GRANTED) {
        ActivityCompat.requestPermissions(this, new String[]{Manifest.permission.RECORD_AUDIO}, PERMISSIONS_REQUEST_RECORD_AUDIO);
        return;
    }

    new AsyncTask<Void, Void, Exception>() {
        @Override
        protected Exception doInBackground(Void... params) {
            try {
                Assets assets = new Assets(MainActivity.this);
                File assetDir = assets.syncAssets();
                setupRecognizer(assetDir);
            } catch (IOException e) {
                return e;
            }
            return null;
        }
        @Override
        protected void onPostExecute(Exception result) {
            if (result != null) {
                ((TextView) findViewById(R.id.caption_text))
                        .setText("Failed to init recognizer " + result);
            } else {
                reset();
            }
        }
    }.execute();
    ((TextView) findViewById(R.id.caption_text)).setText("Say one, two, three, four, five, six...");
}

/**
 * In partial result we get quick updates about current hypothesis. In
 * keyword spotting mode we can react here, in other modes we need to wait
 * for final result in onResult.
 */

@Override
public void onPartialResult(Hypothesis hypothesis) {
    if (hypothesis == null || recognizer == null) {
        return;
    }
    String text = hypothesis.getHypstr();
    // NOTE: this compares the hypothesis with the search name ("digits"), so it
    // only fires if "digits" is itself a keyword in digits.gram. Compare against
    // your own keywords here if that is not what you want.
    if (text.equals(DIGITS_SEARCH)) {
        recognizer.cancel();
        performAction();
        recognizer.startListening(DIGITS_SEARCH);
    }
}

@Override
public void onResult(Hypothesis hypothesis) {
    ((TextView) findViewById(R.id.result_text)).setText("");
    if (hypothesis != null) {
        String text = hypothesis.getHypstr();
        makeText(getApplicationContext(), "Hypothesis: " + text, Toast.LENGTH_SHORT).show();
    } else {
        makeText(getApplicationContext(), "hypothesis = null", Toast.LENGTH_SHORT).show();
    }
}

@Override
public void onDestroy() {
    super.onDestroy();
    // recognizer is still null if permission was denied or setup failed
    if (recognizer != null) {
        recognizer.cancel();
        recognizer.shutdown();
    }
}
@Override
public void onBeginningOfSpeech() {
}
@Override
public void onEndOfSpeech() {
   reset();
}
@Override
public void onTimeout() {
}
private void setupRecognizer(File assetsDir) throws IOException {
    // The recognizer can be configured to perform multiple searches
    // of different kind and switch between them
    recognizer = defaultSetup()
            .setAcousticModel(new File(assetsDir, "en-us-ptm"))
            .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
            // .setRawLogDir(assetsDir).setKeywordThreshold(1e-20f)
            .getRecognizer();
    recognizer.addListener(this);

    File digitsGrammar = new File(assetsDir, "digits.gram");
    recognizer.addKeywordSearch(DIGITS_SEARCH, digitsGrammar);
}
private void reset(){
    recognizer.stop();
    recognizer.startListening(DIGITS_SEARCH);
}
@Override
public void onError(Exception error) {
    ((TextView) findViewById(R.id.caption_text)).setText(error.getMessage());
}

public void performAction() {
    // do here whatever you want
    makeText(getApplicationContext(), "performAction done... ", Toast.LENGTH_SHORT).show();
}
}

Please note: this is a work in progress. Check back later. Suggestions would be appreciated.

【Comments】: