How to Use Kinect Speech Recognition (Voice Assistant: Mini Siri)
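Today we will use Kinect's speech recognition feature
to build a mini version of Siri (an intelligent voice assistant).
It recognizes English sentences spoken by the user and can hold a short conversation with them.
(Unfortunately, Microsoft has not released a Chinese speech recognition package yet.)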
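Note:
Siri is the AI assistant software built into the iPhone 4S. It uses natural language processing, so users can interact with the phone through natural conversation to search for information, check the weather, add calendar entries, set alarms, and more. (Definition from: Wikipedia - Siri)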
So our learning goals this time are:
1. Use Kinect to recognize sentences spoken by the user.
2. Use text-to-speech (TTS) so the computer can talk back to you.
First, let's look at the Audio & Speech section
of the Kinect for Windows SDK Release Notes:
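Here is KT's translation of the key points:
1. Kinect SDK V1 adds the latest speech components and improves recognition accuracy.
2. After the speech components are initialized, you need to wait 4 seconds before they are ready.
(This is why the code deliberately waits 4 seconds before it starts listening.)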
```csharp
// The "SDK Release Notes" say speech needs 4 seconds after initialization before it is ready
this.readyTimer = new DispatcherTimer();
this.readyTimer.Tick += this.ReadyTimerTick;
this.readyTimer.Interval = new TimeSpan(0, 0, 4); // wait 4 seconds
this.readyTimer.Start();
```
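The DispatcherTimer above works as a one-shot delay. It fires once after 4 seconds, and the tick handler stops it. Outside WPF, the same idea can be sketched with `Task.Delay`. This is a minimal sketch, not the article's code. The `SpeechWarmup` class, the `StartAfterDelayAsync` helper, and the `startListening` callback are hypothetical stand-ins for the tutorial's `ReadyTimerTick` and `Start()`.

```csharp
using System;
using System.Threading.Tasks;

public static class SpeechWarmup
{
    // Hypothetical helper: waits out the warm-up period, then invokes the
    // callback once (the job ReadyTimerTick does in the tutorial).
    public static async Task StartAfterDelayAsync(TimeSpan warmUp, Action startListening)
    {
        if (warmUp < TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(warmUp));
        if (startListening == null)
            throw new ArgumentNullException(nameof(startListening));

        await Task.Delay(warmUp); // e.g. TimeSpan.FromSeconds(4), per the release notes
        startListening();         // in the tutorial this would be this.Start()
    }
}
```

In the WPF window you could call `await SpeechWarmup.StartAfterDelayAsync(TimeSpan.FromSeconds(4), Start);` from an `async void` Loaded handler. Because `await` resumes on the UI thread's synchronization context, the callback can still update UI elements.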
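For a full explanation of Kinect speech recognition,
open the Kinect for Windows SDK V1 programming guide
and search for "Speech C# How To". That page has Microsoft's complete official tutorial and definitions.
For the diligent students, KT recommends reading it carefully.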
That page lists the six main steps for setting up Kinect speech recognition:
1. Add a reference to the speech recognition assembly
2. Initialize the audio source
3. Initialize speech recognition
4. Create a speech recognition engine
5. Listen to user speech
6. Respond to user speech
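Steps 3 and 4 depend on choosing the right recognizer. The complete program's `GetKinectRecognizer` keeps only the installed recognizers whose `AdditionalInfo` has `Kinect = True` and whose culture is `en-US`. The Microsoft.Speech types need the Kinect runtime, so this sketch replaces `RecognizerInfo` with a hypothetical `RecognizerStub` record. It shows the same filter in plain C#.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Microsoft.Speech.Recognition.RecognizerInfo:
// only the members the filter reads (AdditionalInfo and the culture name).
public record RecognizerStub(string Id, string CultureName, IReadOnlyDictionary<string, string> AdditionalInfo);

public static class RecognizerPicker
{
    // Same predicate as GetKinectRecognizer: AdditionalInfo["Kinect"] equals "True"
    // (case-insensitive) and the culture is "en-US"; returns null if none match.
    public static RecognizerStub? PickKinectRecognizer(IEnumerable<RecognizerStub> installed)
    {
        Func<RecognizerStub, bool> matchingFunc = r =>
        {
            r.AdditionalInfo.TryGetValue("Kinect", out string? value);
            return "True".Equals(value, StringComparison.InvariantCultureIgnoreCase)
                && "en-US".Equals(r.CultureName, StringComparison.InvariantCultureIgnoreCase);
        };
        return installed.Where(matchingFunc).FirstOrDefault();
    }
}
```

Returning null when nothing matches mirrors the tutorial, which shows an error dialog and closes the window in that case.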
Once you master these six steps, you can build a mini Siri in no time.
For the finer details, please see the documentation or the comments in the sample code.
Next, here is the screen of the sample program KT designed for this article:
- Add references to the speech assemblies
"Microsoft.Speech.dll" => speech recognition
"System.Speech.dll" => text-to-speech
- Create the speech recognition engine (grammar phrases)
1. "I Love you !"
2. "What's your name ?"
3. "How are you ?"
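The recognizer should accept only these three sentences, and you can add more of your own. Appending three word-level Choices (I/What's/How, then love/your/are, then you/name/you) would accept any combination of one word per slot, such as "How love name". For that reason, each sentence is added as one complete phrase:
```csharp
//===============================================
// Build the grammar: each Choices entry is one complete sentence
GrammarBuilder gBuilder = new GrammarBuilder();
gBuilder.Culture = ri.Culture;
gBuilder.Append(new Choices("I love you", "What's your name", "How are you"));
//===============================================
var g = new Grammar(gBuilder);
sre.LoadGrammar(g); // load the grammar into the recognizer
```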
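To see why word-level Choices are too permissive, this small C# sketch lists every word sequence accepted by appending one `Choices` per slot (the `I/What's/How` + `love/your/are` + `you/name/you` version) and compares it with a single phrase-level `Choices`. It models only which word sequences a `GrammarBuilder` of that shape allows. It does not call the speech API, and `GrammarExpansion` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GrammarExpansion
{
    // Every word sequence accepted when one Choices is appended per slot:
    // the Cartesian product of the slots, with duplicates removed.
    public static List<string> Expand(params string[][] slots)
    {
        IEnumerable<string> sentences = new[] { "" };
        foreach (var slot in slots)
        {
            // Extend each partial sentence with every word in this slot
            sentences = sentences
                .SelectMany(prefix => slot.Select(word => prefix.Length == 0 ? word : prefix + " " + word))
                .ToList();
        }
        return sentences.Distinct().ToList();
    }
}
```

With the three word-level slots there are 27 sequences, 18 of them distinct because "you" appears twice in the last slot. They include nonsense such as "How love name". With a single phrase-level `Choices`, exactly the three intended sentences remain.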
Of course, a full-fledged Siri would need a database of sentences that you build and then import into the grammar.
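- Respond to the user
```csharp
// Map the recognized sentence (upper-cased) to Siri's reply
switch (e.Result.Text.ToUpperInvariant())
{
    case "I LOVE YOU":
        Siri_Text = "I love you too";
        break;
    case "WHAT'S YOUR NAME":
        Siri_Text = "I am Mini Siri";
        break;
    case "HOW ARE YOU":
        Siri_Text = "I am so good";
        break;
    default:
        Siri_Text = "I don't know what you mean ?";
        break;
}
```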
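The `switch` maps each normalized sentence to a reply. The same mapping can also live in a dictionary, which is easier to extend as the phrase list grows. This sketch also includes the confidence check that the complete program runs before its `switch`, where results under 0.6 are rejected. `MiniSiriReplies` and `Reply` are hypothetical names and are not part of the Kinect SDK.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class MiniSiriReplies
{
    // Keys are upper-case, matching the ToUpperInvariant() normalization
    private static readonly Dictionary<string, string> Replies = new Dictionary<string, string>
    {
        ["I LOVE YOU"] = "I love you too",
        ["WHAT'S YOUR NAME"] = "I am Mini Siri",
        ["HOW ARE YOU"] = "I am so good",
    };

    private const double MinConfidence = 0.6; // same threshold as SreSpeechRecognized

    // Returns null when the result should be rejected (confidence too low),
    // otherwise the reply, or the fallback text for an unknown sentence.
    public static string? Reply(string recognizedText, double confidence)
    {
        if (confidence < MinConfidence)
            return null;

        return Replies.TryGetValue(recognizedText.ToUpperInvariant(), out var reply)
            ? reply
            : "I don't know what you mean ?";
    }
}
```

Adding a new sentence then takes one dictionary entry plus the matching phrase in the grammar.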
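- Convert text to speech
The SpeechSynthesizer class is in System.Speech.dll, so remember to add that reference.
```csharp
// Requires a reference to "System.Speech"
using System.Speech.Synthesis;

private SpeechSynthesizer synthesizer; // text-to-speech

synthesizer = new SpeechSynthesizer(); // create a new speech synthesizer
// Set the volume and speaking rate
synthesizer.Volume = 100; // volume (0 ~ 100)
synthesizer.Rate = -2;    // speaking rate (-10 ~ 10)
Siri_Text = "I love HKT";
synthesizer.Speak(Siri_Text); // the PC speakers will say "I love HKT"
```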
Demo video:
Complete C# code:
```csharp
using System;
using System.Windows;
using System.Windows.Media;
using System.Windows.Media.Imaging;
using Microsoft.Kinect;
using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;
using System.IO;
using System.Threading;
using System.Linq;
using System.Windows.Threading;
using System.Speech.Synthesis;
using System.Windows.Media.Animation;
using System.Windows.Controls;

namespace KinectMiniSiri_Demo
{
    public partial class MainWindow : Window
    {
        //=== Field declarations ===
        KinectSensor sensor = KinectSensor.KinectSensors[0];
        private SpeechRecognitionEngine speechRecognizer;
        private DispatcherTimer readyTimer;
        private SpeechSynthesizer synthesizer; // text-to-speech
        private Storyboard my_sb;
        private String Siri_Text = null;

        public MainWindow()
        {
            InitializeComponent();
            this.Loaded += new RoutedEventHandler(MainWindow_Loaded);     // window opened
            this.Unloaded += new RoutedEventHandler(MainWindow_Unloaded); // window closed
        }

        // Window closed
        void MainWindow_Unloaded(object sender, RoutedEventArgs e)
        {
            if (this.speechRecognizer != null && sensor != null)
            {
                sensor.AudioSource.Stop();
                sensor.Stop();
                this.speechRecognizer.RecognizeAsyncCancel();
                this.speechRecognizer.RecognizeAsyncStop();
            }

            if (this.readyTimer != null)
            {
                this.readyTimer.Stop();
                this.readyTimer = null;
            }
        }

        // Window opened
        void MainWindow_Loaded(object sender, RoutedEventArgs e)
        {
            sensor.Start(); // start the Kinect
            synthesizer = new SpeechSynthesizer(); // create a new speech synthesizer
            Siri_Speech(); // set the synthesizer volume and rate
            this.speechRecognizer = this.CreateSpeechRecognizer(); // initialize recognition and build the grammar

            if (this.speechRecognizer != null && sensor != null)
            {
                // The "SDK Release Notes" say speech needs 4 seconds after initialization to be ready
                this.readyTimer = new DispatcherTimer();
                this.readyTimer.Tick += this.ReadyTimerTick;
                this.readyTimer.Interval = new TimeSpan(0, 0, 4); // wait 4 seconds
                this.readyTimer.Start();

                this.ReportSpeechStatus("Initializing the audio stream... (please wait)");
                this.UpdateInstructionsText(string.Empty);
            }
        }

        // Create the speech recognizer and build the grammar
        private SpeechRecognitionEngine CreateSpeechRecognizer()
        {
            RecognizerInfo ri = GetKinectRecognizer(); // get the Kinect speech recognizer
            if (ri == null)
            {
                MessageBox.Show(
                    @"There was a problem initializing speech recognition",
                    "Unable to load speech recognition",
                    MessageBoxButton.OK,
                    MessageBoxImage.Error);
                this.Close();
                return null;
            }

            SpeechRecognitionEngine sre; // speech recognition engine
            try
            {
                sre = new SpeechRecognitionEngine(ri.Id);
            }
            catch
            {
                MessageBox.Show(
                    @"There was a problem initializing speech recognition",
                    "Unable to load speech recognition",
                    MessageBoxButton.OK,
                    MessageBoxImage.Error);
                this.Close();
                return null;
            }

            //========================================================
            // Build the grammar: one Choices entry per complete sentence,
            // so only these three sentences are accepted
            GrammarBuilder gBuilder = new GrammarBuilder();
            gBuilder.Culture = ri.Culture;
            gBuilder.Append(new Choices("I love you", "What's your name", "How are you"));
            //===============================================
            // Create the actual Grammar instance, and then load it into the speech recognizer.
            var g = new Grammar(gBuilder);
            sre.LoadGrammar(g); // load the grammar

            sre.SpeechRecognized += this.SreSpeechRecognized;                   // speech accepted
            sre.SpeechHypothesized += this.SreSpeechHypothesized;               // speech hypothesized
            sre.SpeechRecognitionRejected += this.SreSpeechRecognitionRejected; // speech rejected

            return sre;
        }

        // Find the Kinect en-US recognizer among the installed recognizers
        private static RecognizerInfo GetKinectRecognizer()
        {
            Func<RecognizerInfo, bool> matchingFunc = r =>
            {
                string value;
                r.AdditionalInfo.TryGetValue("Kinect", out value);
                return "True".Equals(value, StringComparison.InvariantCultureIgnoreCase)
                    && "en-US".Equals(r.Culture.Name, StringComparison.InvariantCultureIgnoreCase);
            };
            return SpeechRecognitionEngine.InstalledRecognizers().Where(matchingFunc).FirstOrDefault();
        }

        //=== Speech rejected ===
        private void SreSpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
        {
            this.RejectSpeech(e.Result);
        }

        private void RejectSpeech(RecognitionResult result)
        {
            string status = "Rejected: " + (result == null ? string.Empty : result.Text + " Confidence: " + result.Confidence);
            this.ReportSpeechStatus(status);
            Animation_Start();
        }

        // Speech hypothesized
        private void SreSpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
        {
            this.ReportSpeechStatus("Hypothesized: " + e.Result.Text + " Confidence: " + e.Result.Confidence);
            Animation_Start();
        }

        // Speech accepted
        private void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            if (e.Result.Confidence < 0.6) // confidence below 0.6: treat as a misrecognition
            {
                this.RejectSpeech(e.Result);
                return;
            }

            switch (e.Result.Text.ToUpperInvariant())
            {
                case "I LOVE YOU":
                    Siri_Text = "I love you too";
                    break;
                case "WHAT'S YOUR NAME":
                    Siri_Text = "I am Mini Siri";
                    break;
                case "HOW ARE YOU":
                    Siri_Text = "I am so good";
                    break;
                default:
                    Siri_Text = "I don't know what you mean ?";
                    break;
            }

            // Siri icon animation
            Animation_Start();
            string status = "You: " + e.Result.Text + "\n Siri: " + Siri_Text + "\n===============";
            listBox.Items.Add(status);
            synthesizer.Speak(Siri_Text);
        }

        // Text-to-speech settings
        void Siri_Speech()
        {
            synthesizer.Volume = 100; // volume (0 ~ 100)
            synthesizer.Rate = -2;    // speaking rate (-10 ~ 10)
        }

        // Show the current speech status
        private void ReportSpeechStatus(string status)
        {
            Dispatcher.BeginInvoke(new Action(() => { tbSpeechStatus.Text = status; }), DispatcherPriority.Normal);
        }

        private void UpdateInstructionsText(string instructions)
        {
            Dispatcher.BeginInvoke(new Action(() => { tbTips.Text = instructions; }), DispatcherPriority.Normal);
        }

        // Play the Siri icon animation
        private void Animation_Start()
        {
            Dispatcher.BeginInvoke(new Action(() =>
            {
                my_sb = (Storyboard)this.FindResource("SiriStoryboard");
                my_sb.Begin(this);
            }), DispatcherPriority.Normal);
        }

        private void ReadyTimerTick(object sender, EventArgs e)
        {
            this.Start(); // start listening to the user
            this.ReportSpeechStatus("Speech recognizer is ready");
            this.UpdateInstructionsText("Tip: only English speech is supported for now" + "\n1. I Love you" + "\n2. What's your name" + "\n3. How are you");
            this.readyTimer.Stop();
            this.readyTimer = null;
        }

        // Initialize the audio source
        private void Start()
        {
            var audioSource = sensor.AudioSource;
            audioSource.EchoCancellationMode = EchoCancellationMode.None; // No AEC for this sample
            audioSource.AutomaticGainControlEnabled = false; // Important to turn this off for speech recognition
            var kinectStream = audioSource.Start(); // start the Kinect audio stream
            Stream s = kinectStream;
            this.speechRecognizer.SetInputToAudioStream(
                s, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
            this.speechRecognizer.RecognizeAsync(RecognizeMode.Multiple);
        }

        // Auto-scroll the listBox to the bottom
        private void m_cStatusList_ScrollChanged(object sender, ScrollChangedEventArgs e)
        {
            if (e.ExtentHeightChange > 0.0)
                ((ScrollViewer)e.OriginalSource).ScrollToEnd();
        }
    }
}
```
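The `SpeechAudioFormatInfo` arguments in `Start()` describe the Kinect audio stream as 16-bit PCM, mono, 16,000 samples per second. The constructor's last two numbers, average bytes per second and block alignment, follow directly from those three values. This small check shows the arithmetic. `PcmFormat` is a hypothetical helper and is not part of the speech API.

```csharp
public static class PcmFormat
{
    // Bytes in one sample frame across all channels: bits / 8 * channels
    public static int BlockAlign(int bitsPerSample, int channels) =>
        bitsPerSample / 8 * channels;

    // Data rate of an uncompressed PCM stream: frames per second * bytes per frame
    public static int AverageBytesPerSecond(int samplesPerSecond, int bitsPerSample, int channels) =>
        samplesPerSecond * BlockAlign(bitsPerSample, channels);
}
```

For 16-bit mono audio the block alignment is 2 bytes, and 16,000 samples per second gives 32,000 bytes per second. These match the `32000, 2` arguments in `new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null)`.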
Sample code download:
Related articles:
HKT Online Teaching Classroom - Kinect tutorial index
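Speech-related samples included with the official Microsoft Kinect SDK V1 (three in total):
1. Microsoft_Sample_KinectAudioDemo (GUI: sound source localization and speech recognition)
2. Microsoft_Sample_RecordAudio (text mode: sound source localization and speech recognition)
3. Microsoft_Sample_Speech (speech recognition)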