How to Use Kinect Speech Recognition (Mini Siri Voice Assistant)

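Today we will use Kinect's speech recognition feature
to build a miniature Siri (an intelligent voice assistant).
It recognizes English sentences spoken by the user and can talk back to the user.
(Microsoft has not yet released a Chinese speech recognition pack.)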
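Note:
Siri is the artificial-intelligence assistant software built into the iPhone 4S. It uses natural language processing, so users can interact with the phone in natural conversation to search for information, check the weather, set calendar entries, set alarms, and so on. (Definition from: Wikipedia - Siri)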
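So our learning goals this time are:
1. Use Kinect speech recognition to recognize sentences spoken by the user.
2. Use text-to-speech (TTS) so the computer can talk back to you.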
First, we need to look at the Kinect for Windows SDK Release Notes,
specifically the Audio & Speech section:

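KT's summary of the key points:
1. Kinect SDK V1 adds the latest speech components and improves recognition accuracy.
2. After the speech components are initialized, you need to wait 4 seconds.
(This is why the code below forces a 4-second wait before recognition starts.)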
// The SDK Release Notes say speech initialization needs 4 seconds before it is ready
this.readyTimer = new DispatcherTimer();
this.readyTimer.Tick += this.ReadyTimerTick;
this.readyTimer.Interval = new TimeSpan(0, 0, 4); // wait 4 seconds
this.readyTimer.Start();
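As for how to use Kinect speech recognition,
open the Kinect for Windows SDK V1 programming guide
and search for "Speech C# How To" to find Microsoft's complete official tutorial and definitions.

For diligent readers, KT recommends reading it carefully.
It lists the six steps for setting up Kinect speech recognition: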
1. Add a reference to the speech recognition assembly
2. Initialize the audio source
3. Initialize speech recognition
4. Create a speech recognition engine
5. Listen to user speech
6. Respond to user speech
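Once you have a handle on these six steps, a miniature Siri can be put together quickly.
For the finer details, please refer to the documentation or to the comments in the sample code.
Next, here is the screen of the sample program KT designed for this tutorial: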

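- Add references to the speech assemblies
("System.Speech.dll" => text-to-speech; the full listing also imports Microsoft.Speech.Recognition, so the Microsoft.Speech assembly is needed for recognition)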
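- Create the speech recognition engine (grammar phrases)
1. "I love you!"
2. "What's your name?"
3. "How are you?"
These three sentences are what the recognizer is meant to understand, and you can add more as you like. Note that each Choices object in the grammar below is an independent word slot, so the grammar also accepts mixed combinations such as "I your name"; those end up in the default branch of the response code.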
//===============================================
// Build the grammar phrases
GrammarBuilder gBuilder = new GrammarBuilder();
gBuilder.Culture = ri.Culture;
gBuilder.Append(new Choices("I", "What's", "How"));
gBuilder.Append(new Choices("love", "your", "are"));
gBuilder.Append(new Choices("you", "name", "you"));
//===============================================
var g = new Grammar(gBuilder);
sre.LoadGrammar(g); // load the grammar
Of course, if you want to build a full Siri, you would need to build a database of phrases and import it.
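To see what the three appended Choices objects actually accept, here is a small sketch that enumerates every phrase formed by picking one word per slot. It uses plain C# only and does not touch the speech API; the class name GrammarCombinations and the method Expand are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GrammarCombinations
{
    // Word slots copied from the three Choices objects in the grammar snippet.
    static readonly string[][] Slots =
    {
        new[] { "I", "What's", "How" },
        new[] { "love", "your", "are" },
        new[] { "you", "name", "you" },
    };

    // Returns every distinct phrase formed by picking one word from each slot.
    public static List<string> Expand()
    {
        IEnumerable<string> phrases = new[] { "" };
        foreach (var slot in Slots)
        {
            // "you" appears twice in the last slot, so drop duplicates first.
            var words = slot.Distinct().ToArray();
            phrases = phrases
                .SelectMany(p => words.Select(w => (p + " " + w).Trim()))
                .ToList();
        }
        return phrases.ToList();
    }

    public static void Main()
    {
        var all = Expand();
        Console.WriteLine("Distinct phrases: " + all.Count);
        Console.WriteLine("Contains \"I love you\": " + all.Contains("I love you"));
        Console.WriteLine("Contains \"I your name\": " + all.Contains("I your name"));
    }
}
```

If only the three full sentences should match, one option is to pass each complete sentence as a single alternative, for example new Choices("I love you", "What's your name", "How are you"), and append that one Choices object to the GrammarBuilder.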
- Respond to the user
switch (e.Result.Text.ToUpperInvariant())
{
    case "I LOVE YOU":
        Siri_Text = "I love you too";
        break;
    case "WHAT'S YOUR NAME":
        Siri_Text = "I am Mini Siri";
        break;
    case "HOW ARE YOU":
        Siri_Text = "I am so good";
        break;
    default:
        Siri_Text = "I don't know what you mean ?";
        break;
}
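As the phrase list grows, a lookup table may be easier to maintain than a long switch. The following is a minimal sketch of that idea, not the sample's actual code: the class MiniSiriReplies and the method ReplyFor are hypothetical names, and the entries are the three replies from this tutorial.

```csharp
using System;
using System.Collections.Generic;

public static class MiniSiriReplies
{
    // Case-insensitive keys, so no ToUpperInvariant() call is needed at lookup time.
    static readonly Dictionary<string, string> Replies =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "I love you", "I love you too" },
            { "What's your name", "I am Mini Siri" },
            { "How are you", "I am so good" },
        };

    // Returns the reply for a recognized phrase, or the fallback reply if it is unknown.
    public static string ReplyFor(string recognizedText)
    {
        string key = (recognizedText ?? string.Empty).Trim();
        string reply;
        return Replies.TryGetValue(key, out reply)
            ? reply
            : "I don't know what you mean ?";
    }

    public static void Main()
    {
        Console.WriteLine(ReplyFor("I LOVE YOU"));
        Console.WriteLine(ReplyFor("what's your name"));
        Console.WriteLine(ReplyFor("I your name"));
    }
}
```

With this shape, adding a new sentence means adding one dictionary entry (and the matching phrase in the grammar), and the table could later be filled from a file or database.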
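- Convert text to speech
The SpeechSynthesizer class lives in System.Speech.dll, so remember to add that reference.
// Requires a reference to "System.Speech"
using System.Speech.Synthesis;

private SpeechSynthesizer synthesizer; // text-to-speech (declared as a class field)

synthesizer = new SpeechSynthesizer(); // create a new speech synthesizer
// Set the synthesized voice's volume and speaking rate
synthesizer.Volume = 100; // volume (0 to 100)
synthesizer.Rate = -2; // speaking rate (-10 to 10)
Siri_Text = "I love HKT";
synthesizer.Speak(Siri_Text); // the computer's speakers say "I love HKT"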
Complete C# code:
using System;
using System.Windows;
using System.Windows.Media;
using System.Windows.Media.Imaging;
using Microsoft.Kinect;
using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;
using System.IO;
using System.Threading;
using System.Linq;
using System.Windows.Threading;
using System.Speech.Synthesis;
using System.Windows.Media.Animation;
using System.Windows.Controls;
namespace KinectMiniSiri_Demo
{
    public partial class MainWindow : Window
    {
        //=== Field declarations ===
        KinectSensor sensor = KinectSensor.KinectSensors[0];
        private SpeechRecognitionEngine speechRecognizer;
        private DispatcherTimer readyTimer;
        private SpeechSynthesizer synthesizer; // text-to-speech
        private Storyboard my_sb;
        private String Siri_Text = null;

        public MainWindow()
        {
            InitializeComponent();
            this.Loaded += new RoutedEventHandler(MainWindow_Loaded); // window loaded event
            this.Unloaded += new RoutedEventHandler(MainWindow_Unloaded); // window unloaded event
        }
        // Window unloaded event
        void MainWindow_Unloaded(object sender, RoutedEventArgs e)
        {
            if (this.speechRecognizer != null && sensor != null)
            {
                sensor.AudioSource.Stop();
                sensor.Stop();
                this.speechRecognizer.RecognizeAsyncCancel();
                this.speechRecognizer.RecognizeAsyncStop();
            }
            if (this.readyTimer != null)
            {
                this.readyTimer.Stop();
                this.readyTimer = null;
            }
        }
        // Window loaded event
        void MainWindow_Loaded(object sender, RoutedEventArgs e)
        {
            sensor.Start(); // start the Kinect
            synthesizer = new SpeechSynthesizer(); // create a new speech synthesizer
            Siri_Speech(); // set the synthesizer's volume and rate
            this.speechRecognizer = this.CreateSpeechRecognizer(); // initialize speech recognition and build the grammar
            if (this.speechRecognizer != null && sensor != null)
            {
                // The SDK Release Notes say speech initialization needs 4 seconds before it is ready
                this.readyTimer = new DispatcherTimer();
                this.readyTimer.Tick += this.ReadyTimerTick;
                this.readyTimer.Interval = new TimeSpan(0, 0, 4); // wait 4 seconds
                this.readyTimer.Start();
                this.ReportSpeechStatus("Initializing audio stream... (please wait)");
                this.UpdateInstructionsText(string.Empty);
            }
        }
        // Create the speech recognizer and build the grammar
        private SpeechRecognitionEngine CreateSpeechRecognizer()
        {
            RecognizerInfo ri = GetKinectRecognizer(); // get the Kinect speech recognizer
            if (ri == null)
            {
                MessageBox.Show(
                    @"There was a problem initializing speech recognition",
                    "Failed to load speech recognition",
                    MessageBoxButton.OK,
                    MessageBoxImage.Error);
                this.Close();
                return null;
            }
            SpeechRecognitionEngine sre; // the speech recognition engine
            try
            {
                sre = new SpeechRecognitionEngine(ri.Id);
            }
            catch
            {
                MessageBox.Show(
                    @"There was a problem initializing speech recognition",
                    "Failed to load speech recognition",
                    MessageBoxButton.OK,
                    MessageBoxImage.Error);
                this.Close();
                return null;
            }
            //========================================================
            // Build the grammar phrases
            GrammarBuilder gBuilder = new GrammarBuilder();
            gBuilder.Culture = ri.Culture;
            gBuilder.Append(new Choices("I", "What's", "How"));
            gBuilder.Append(new Choices("love", "your", "are"));
            gBuilder.Append(new Choices("you", "name", "you"));
            //===============================================
            // Create the actual Grammar instance, and then load it into the speech recognizer.
            var g = new Grammar(gBuilder);
            sre.LoadGrammar(g); // load the grammar
            sre.SpeechRecognized += this.SreSpeechRecognized; // speech recognized event
            sre.SpeechHypothesized += this.SreSpeechHypothesized; // speech hypothesized event
            sre.SpeechRecognitionRejected += this.SreSpeechRecognitionRejected; // speech rejected event
            return sre;
        }
        // Find the installed Kinect speech recognizer for en-US
        private static RecognizerInfo GetKinectRecognizer()
        {
            Func<RecognizerInfo, bool> matchingFunc = r =>
            {
                string value;
                r.AdditionalInfo.TryGetValue("Kinect", out value);
                return "True".Equals(value, StringComparison.InvariantCultureIgnoreCase)
                    && "en-US".Equals(r.Culture.Name, StringComparison.InvariantCultureIgnoreCase);
            };
            return SpeechRecognitionEngine.InstalledRecognizers().Where(matchingFunc).FirstOrDefault();
        }
        //=== Speech rejected event ===
        private void SreSpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
        {
            this.RejectSpeech(e.Result);
        }

        private void RejectSpeech(RecognitionResult result)
        {
            string status = "Rejected: " + (result == null ? string.Empty : result.Text + " Confidence: " + result.Confidence);
            this.ReportSpeechStatus(status);
            Animation_Start();
        }

        // Speech hypothesized event
        private void SreSpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
        {
            this.ReportSpeechStatus("Hypothesized: " + e.Result.Text + " Confidence: " + e.Result.Confidence);
            Animation_Start();
        }
        // Speech recognized event
        private void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            if (e.Result.Confidence < 0.6) // confidence below 0.6 is treated as a misrecognition
            {
                this.RejectSpeech(e.Result);
                return;
            }
            switch (e.Result.Text.ToUpperInvariant())
            {
                case "I LOVE YOU":
                    Siri_Text = "I love you too";
                    break;
                case "WHAT'S YOUR NAME":
                    Siri_Text = "I am Mini Siri";
                    break;
                case "HOW ARE YOU":
                    Siri_Text = "I am so good";
                    break;
                default:
                    Siri_Text = "I don't know what you mean ?";
                    break;
            }
            // Play the Siri icon animation
            Animation_Start();
            string status = "You: " + e.Result.Text + "\n Siri: " + Siri_Text + "\n===============";
            listBox.Items.Add(status);
            synthesizer.Speak(Siri_Text);
        }
        // Text-to-speech settings
        void Siri_Speech()
        {
            synthesizer.Volume = 100; // volume (0 to 100)
            synthesizer.Rate = -2; // speaking rate (-10 to 10)
        }

        // Show the current speech status
        private void ReportSpeechStatus(string status)
        {
            Dispatcher.BeginInvoke(new Action(() => { tbSpeechStatus.Text = status; }), DispatcherPriority.Normal);
        }

        private void UpdateInstructionsText(string instructions)
        {
            Dispatcher.BeginInvoke(new Action(() => { tbTips.Text = instructions; }), DispatcherPriority.Normal);
        }

        // Play the Siri icon animation
        private void Animation_Start()
        {
            Dispatcher.BeginInvoke(new Action(() =>
            {
                my_sb = (Storyboard)this.FindResource("SiriStoryboard");
                my_sb.Begin(this);
            }), DispatcherPriority.Normal);
        }
        private void ReadyTimerTick(object sender, EventArgs e)
        {
            this.Start(); // start reading the user's speech
            this.ReportSpeechStatus("Speech recognizer is ready");
            this.UpdateInstructionsText("Tip: only English speech is supported for now" + "\n1. I Love you" + "\n2. What's your name" + "\n3. How are you");
            this.readyTimer.Stop();
            this.readyTimer = null;
        }
        // Initialize the audio source and start recognition
        private void Start()
        {
            var audioSource = sensor.AudioSource;
            audioSource.EchoCancellationMode = EchoCancellationMode.None; // No AEC for this sample
            audioSource.AutomaticGainControlEnabled = false; // Important to turn this off for speech recognition
            var kinectStream = audioSource.Start(); // open the Kinect audio stream
            Stream s = kinectStream;
            this.speechRecognizer.SetInputToAudioStream(
                s, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
            this.speechRecognizer.RecognizeAsync(RecognizeMode.Multiple);
        }
        // Automatically scroll the listBox to the bottom
        private void m_cStatusList_ScrollChanged(object sender, ScrollChangedEventArgs e)
        {
            if (e.ExtentHeightChange > 0.0)
                ((ScrollViewer)e.OriginalSource).ScrollToEnd();
        }
    }
}
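The numbers passed to SpeechAudioFormatInfo in the Start method (16000, 16, 1, 32000, 2) are not independent: for uncompressed PCM, the last two follow from the first three. The sketch below shows that arithmetic in plain C#; the class PcmFormatMath and its methods are names made up for this illustration and are not part of the Kinect or Speech SDK.

```csharp
using System;

public static class PcmFormatMath
{
    // Bytes per sample frame: one sample for each channel.
    public static int BlockAlign(int bitsPerSample, int channels)
    {
        return channels * bitsPerSample / 8;
    }

    // Bytes of audio data produced per second of uncompressed PCM.
    public static int AverageBytesPerSecond(int samplesPerSecond, int bitsPerSample, int channels)
    {
        return samplesPerSecond * BlockAlign(bitsPerSample, channels);
    }

    public static void Main()
    {
        // Values used in the listing: 16 kHz, 16-bit, mono.
        Console.WriteLine("Block align: " + BlockAlign(16, 1));
        Console.WriteLine("Bytes per second: " + AverageBytesPerSecond(16000, 16, 1));
    }
}
```

For the listing's format this gives a block align of 2 bytes and 32000 bytes per second, which match the fifth and sixth constructor arguments.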
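Sample code download:
Related articles:
HKT Online Classroom - Kinect tutorial index
Speech-related sample programs bundled with Microsoft's official Kinect SDK V1 (three in total):
1. Microsoft_Sample_KinectAudioDemo (GUI: sound source localization and speech recognition)

2. Microsoft_Sample_RecordAudio (console: sound source localization and speech recognition)

3. Microsoft_Sample_Speech (speech recognition)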