How to Use Kinect Speech Recognition (Mini Siri Voice Assistant)


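Today we will use the Kinect speech recognition feature to build a mini version of Siri (an intelligent voice assistant).

It recognizes English sentences spoken by the user and can talk back to the user.
(At the time of writing, Microsoft has not yet released a Chinese speech recognition pack.)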

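Note:
Siri is the AI assistant software built into the iPhone 4S. It uses natural language processing, so users can interact with the phone through natural conversation to search for information, check the weather, set calendar entries, set alarms, and so on. (Definition from: Wikipedia - Siri)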


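So the learning goals this time are:
1. Use Kinect speech recognition to recognize sentences spoken by the user.
2. Use text-to-speech (TTS) so the computer can talk back to you.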


First, take a look at the Audio & Speech section of the Kinect for Windows SDK Release Notes.

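KT's summary of the key points:
1. Kinect SDK V1 includes the latest speech components and improves recognition accuracy.
2. After the speech component is initialized, you must wait 4 seconds before it is ready.
(This is why the code below forces a 4-second wait before recognition starts.)
// The SDK Release Notes state that speech initialization needs 4 seconds to become ready
this.readyTimer = new DispatcherTimer();
this.readyTimer.Tick += this.ReadyTimerTick;
this.readyTimer.Interval = new TimeSpan(0, 0, 4); // wait 4 seconds
this.readyTimer.Start();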
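The timer above is used as a one-shot delay. As a rough alternative sketch, the same wait can be written with `Task.Delay`, assuming the project targets .NET 4.5 or later (the original sample may not). `WarmUp` and `RunAfterDelayAsync` are hypothetical names introduced here for illustration, not part of the Kinect SDK.

```csharp
using System;
using System.Threading.Tasks;

public static class WarmUp
{
    // Waits for the given delay, then invokes the callback once.
    // Hypothetical helper: stands in for the DispatcherTimer + Tick handler pair.
    public static async Task RunAfterDelayAsync(TimeSpan delay, Action onReady)
    {
        await Task.Delay(delay);
        onReady();
    }
}
```

In a WPF event handler marked `async`, this could be called as `await WarmUp.RunAfterDelayAsync(TimeSpan.FromSeconds(4), this.Start);`. When awaited from the UI thread, the callback should normally run back on the UI thread as well, so no explicit timer cleanup is needed.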


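To learn how to use Kinect speech recognition, search for "Speech C# How To" in the Kinect for Windows SDK V1 Programming Guide. It contains Microsoft's complete official tutorial and definitions.

KT recommends that diligent readers study it closely.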




The guide lists six steps for setting up Kinect speech recognition:

1. Add a reference to the speech recognition assembly
2. Initialize the audio source
3. Initialize speech recognition
4. Create a speech recognition engine
5. Listen to user speech
6. Respond to user speech

Once you have these six steps down, a mini Siri can be put together quickly.

For the finer details, please refer to the documentation or the comments in the sample code.




Next, here is the window of the sample program KT designed for this tutorial:
(Screenshot: sample program window)




  • Add references to the speech assemblies
(Add references to "Microsoft.Speech.dll" for speech recognition and
"System.Speech.dll" for text-to-speech.)



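  • Create the speech recognition engine (grammar)
In this sample, KT targets only three sentences:

1. "I love you!"
2. "What's your name?"
3. "How are you?"

The grammar is built from three word slots, so these are the sentences the sample is designed around. Other combinations of the same words also match the grammar and fall through to the default reply. You can add more words as you like.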

//===============================================
// Build the grammar
GrammarBuilder gBuilder = new GrammarBuilder();
gBuilder.Culture = ri.Culture;

gBuilder.Append(new Choices("I", "What's", "How"));   // first word
gBuilder.Append(new Choices("love", "your", "are"));  // second word
gBuilder.Append(new Choices("you", "name", "you"));   // third word
//===============================================

var g = new Grammar(gBuilder);
sre.LoadGrammar(g); // load the grammar into the recognizer
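
Because each `Append` adds an independent word slot, the grammar matches more than the three target sentences. The following sketch, which uses only plain .NET and no speech APIs, expands the same three slots to list every distinct word sequence. `SlotGrammar` and `Expand` are hypothetical names used for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SlotGrammar
{
    // The same three word slots passed to GrammarBuilder.Append above.
    public static readonly string[][] Slots =
    {
        new[] { "I", "What's", "How" },
        new[] { "love", "your", "are" },
        new[] { "you", "name", "you" },
    };

    // Expands the slots into every distinct sentence: one word per slot, in order.
    public static List<string> Expand(string[][] slots)
    {
        List<string> sentences = new List<string> { "" };
        foreach (var slot in slots)
        {
            var words = slot.Distinct().ToArray(); // "you" is listed twice in the last slot
            sentences = sentences
                .SelectMany(prefix => words, (prefix, word) => (prefix + " " + word).Trim())
                .ToList();
        }
        return sentences;
    }
}
```

`SlotGrammar.Expand(SlotGrammar.Slots)` yields 18 sentences (3 x 3 x 2), including the three intended ones and mixes such as "How love name". If only the three full sentences should match, one option is a single slot of whole phrases, for example `new Choices("I love you", "What's your name", "How are you")`.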

Of course, to build a complete Siri you would create a sentence database and import it.



  • Respond to the user
For example, on hearing "I love you", Siri answers "I love you too".

case "I LOVE YOU":
 Siri_Text = "I love you too";                                       
 break;
case "WHAT'S YOUR NAME":
 Siri_Text = "I am Mini Siri";                                      
 break;
case "HOW ARE YOU":
 Siri_Text = "I am so good";
 break;
default:
 Siri_Text = "I don't know what you mean ?";                              
 break;
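
As the number of sentences grows, a `switch` becomes hard to maintain. One possible direction toward the "sentence database" idea is a lookup table loaded from text. The sketch below assumes a hypothetical `phrase|reply` line format. `ReplyTable`, `Parse`, and `ReplyFor` are illustrative names, not part of the sample.

```csharp
using System;
using System.Collections.Generic;

public static class ReplyTable
{
    // Same fallback text as the default case in the switch above.
    public const string Fallback = "I don't know what you mean ?";

    // Parses "phrase|reply" lines into a case-insensitive lookup table.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var table = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var line in lines)
        {
            var parts = line.Split(new[] { '|' }, 2);
            if (parts.Length != 2) continue; // skip malformed lines
            table[parts[0].Trim()] = parts[1].Trim();
        }
        return table;
    }

    // Returns the reply for a recognized phrase, or the fallback if it is unknown.
    public static string ReplyFor(Dictionary<string, string> table, string recognized)
    {
        string reply;
        return table.TryGetValue(recognized.Trim(), out reply) ? reply : Fallback;
    }
}
```

With a table parsed from lines such as `I love you|I love you too`, the handler could set `Siri_Text = ReplyTable.ReplyFor(table, e.Result.Text);`. The case-insensitive comparer removes the need for `ToUpperInvariant()`.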




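  • Text-to-speech
KT uses SpeechSynthesizer to convert text to speech.

This class lives in System.Speech.dll, so remember to add that reference.
// Requires a reference to "System.Speech"

using System.Speech.Synthesis;

private SpeechSynthesizer synthesizer; // text-to-speech

synthesizer = new SpeechSynthesizer(); // create a new speech synthesizer

// Set the synthesized voice's volume and speaking rate
synthesizer.Volume = 100; // volume (0 to 100)
synthesizer.Rate = -2; // speaking rate (-10 to 10)

Siri_Text = "I love HKT";
synthesizer.Speak(Siri_Text); // the computer's speakers say "I love HKT"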






Full XAML code:






Full C# code:
using System;
using System.Windows;
using System.Windows.Media;
using System.Windows.Media.Imaging;
using Microsoft.Kinect;
using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;
using System.IO;
using System.Threading;
using System.Linq;
using System.Windows.Threading;
using System.Speech.Synthesis;
using System.Windows.Media.Animation;
using System.Windows.Controls;


namespace KinectMiniSiri_Demo
{
    public partial class MainWindow : Window
    {
        //=== Field declarations ===
        KinectSensor sensor = KinectSensor.KinectSensors.FirstOrDefault(); // null when no Kinect is attached
        private SpeechRecognitionEngine speechRecognizer;
        private DispatcherTimer readyTimer;
        private SpeechSynthesizer synthesizer; // text-to-speech
        private Storyboard my_sb;
        private String Siri_Text = null;
        
        public MainWindow()
        {
            InitializeComponent();
           
            this.Loaded += new RoutedEventHandler(MainWindow_Loaded); // window loaded event
            this.Unloaded += new RoutedEventHandler(MainWindow_Unloaded); // window unloaded (closing) event
        }

        // Window unloaded (closing) event
        void MainWindow_Unloaded(object sender, RoutedEventArgs e)
        {            
            if (this.speechRecognizer != null && sensor != null)
            {
                sensor.AudioSource.Stop();
                sensor.Stop();
                this.speechRecognizer.RecognizeAsyncCancel();
                this.speechRecognizer.RecognizeAsyncStop();
            }

            if (this.readyTimer != null)
            {
                this.readyTimer.Stop();
                this.readyTimer = null;
            }
        }

        // Window loaded event
        void MainWindow_Loaded(object sender, RoutedEventArgs e)
        {
            if (sensor == null)
            {
                this.ReportSpeechStatus("No Kinect sensor detected");
                return;
            }

            sensor.Start(); // start the Kinect
            synthesizer = new SpeechSynthesizer(); // create the text-to-speech synthesizer
            Siri_Speech(); // set the synthesizer's volume and rate

            this.speechRecognizer = this.CreateSpeechRecognizer(); // create the recognizer and load the grammar

            if (this.speechRecognizer != null && sensor != null)
            {
                // The SDK Release Notes state that speech initialization needs 4 seconds to become ready
                this.readyTimer = new DispatcherTimer();
                this.readyTimer.Tick += this.ReadyTimerTick;
                this.readyTimer.Interval = new TimeSpan(0, 0, 4); // wait 4 seconds
                this.readyTimer.Start();

                this.ReportSpeechStatus("Initializing audio stream... (please wait)");
                this.UpdateInstructionsText(string.Empty);
            }

        }

        // Create the speech recognizer and build the grammar
        private SpeechRecognitionEngine CreateSpeechRecognizer()
        {
            RecognizerInfo ri = GetKinectRecognizer(); // get the Kinect speech recognizer
            if (ri == null)
            {
                MessageBox.Show(
                    @"There was a problem initializing speech recognition",
                    "Failed to load speech recognition",
                    MessageBoxButton.OK,
                    MessageBoxImage.Error);
                this.Close();
                return null;
            }

            SpeechRecognitionEngine sre; // the speech recognition engine
            try
            {
                sre = new SpeechRecognitionEngine(ri.Id);
            }
            catch
            {
                MessageBox.Show(
                    @"There was a problem initializing speech recognition",
                    "Failed to load speech recognition",
                    MessageBoxButton.OK,
                    MessageBoxImage.Error);
                this.Close();
                return null;
            }

            //===============================================
            // Build the grammar: three word slots, one Choices per slot
            GrammarBuilder gBuilder = new GrammarBuilder();
            gBuilder.Culture = ri.Culture;

            gBuilder.Append(new Choices("I", "What's", "How"));
            gBuilder.Append(new Choices("love", "your", "are"));
            gBuilder.Append(new Choices("you", "name", "you"));
            //===============================================

            // Create the actual Grammar instance, and then load it into the speech recognizer.
            var g = new Grammar(gBuilder);

            sre.LoadGrammar(g); // load the grammar
            sre.SpeechRecognized += this.SreSpeechRecognized; // speech recognized event
            sre.SpeechHypothesized += this.SreSpeechHypothesized; // speech hypothesized event
            sre.SpeechRecognitionRejected += this.SreSpeechRecognitionRejected; // speech rejected event

            return sre;
        }

        // Find the installed Kinect speech recognizer for en-US
        private static RecognizerInfo GetKinectRecognizer()
        {
            Func<RecognizerInfo, bool> matchingFunc = r =>
            {
                string value;
                r.AdditionalInfo.TryGetValue("Kinect", out value);
                return "True".Equals(value, StringComparison.InvariantCultureIgnoreCase) && "en-US".Equals(r.Culture.Name, StringComparison.InvariantCultureIgnoreCase);
            };
            return SpeechRecognitionEngine.InstalledRecognizers().Where(matchingFunc).FirstOrDefault();
        }

        //=== Speech rejected event ===
        private void SreSpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
        {
            this.RejectSpeech(e.Result);
        }
        private void RejectSpeech(RecognitionResult result)
        {
            string status = "Rejected: " + (result == null ? string.Empty : result.Text + " Confidence: " + result.Confidence);
            this.ReportSpeechStatus(status);
            Animation_Start();
        }

        // Speech hypothesized event
        private void SreSpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
        {
            this.ReportSpeechStatus("Hypothesized: " + e.Result.Text + " Confidence: " + e.Result.Confidence);
            Animation_Start();
        }

        // Speech recognized event
        private void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            if (e.Result.Confidence < 0.6) // below 0.6 confidence, treat as a misrecognition
            {
                this.RejectSpeech(e.Result);                
                return;
            }

            switch (e.Result.Text.ToUpperInvariant())
            {               

                case "I LOVE YOU":
                    Siri_Text = "I love you too";                                       
                    break;
                case "WHAT'S YOUR NAME":
                    Siri_Text = "I am Mini Siri";                                      
                    break;
                case "HOW ARE YOU":
                    Siri_Text = "I am so good";
                    break;
                default:
                    Siri_Text = "I don't know what you mean ?";                                 
                    break;
            }

            // Play the Siri icon animation
            Animation_Start();

            string status = "You: " + e.Result.Text + "\n Siri: " + Siri_Text + "\n===============";
            listBox.Items.Add(status);
            synthesizer.Speak(Siri_Text);

        }

        // Text-to-speech settings
        void Siri_Speech()
        {
            synthesizer.Volume = 100; // volume (0 to 100)
            synthesizer.Rate = -2; // speaking rate (-10 to 10)
        }

        // Show the current speech status
        private void ReportSpeechStatus(string status)
        {
            Dispatcher.BeginInvoke(new Action(() => { tbSpeechStatus.Text = status; }), DispatcherPriority.Normal);
        }
        private void UpdateInstructionsText(string instructions)
        {
            Dispatcher.BeginInvoke(new Action(() => { tbTips.Text = instructions; }), DispatcherPriority.Normal);
        }

        // Play the Siri icon animation
        private void Animation_Start()
        {           
            Dispatcher.BeginInvoke(new Action(() =>
            { 
                my_sb = (Storyboard)this.FindResource("SiriStoryboard"); 
                my_sb.Begin(this); 
            }), DispatcherPriority.Normal);
        }

        private void ReadyTimerTick(object sender, EventArgs e)
        {
            this.Start(); // start reading the user's speech
            this.ReportSpeechStatus("Speech recognizer is ready");
            this.UpdateInstructionsText("Tip: only English speech is supported for now" + "\n1. I Love you" + "\n2. What's your name" + "\n3. How are you");
            this.readyTimer.Stop();
            this.readyTimer = null;
        }

        // Initialize the audio source
        private void Start()
        {
            var audioSource = sensor.AudioSource;
            audioSource.EchoCancellationMode = EchoCancellationMode.None; // No AEC for this sample
            audioSource.AutomaticGainControlEnabled = false; // Important to turn this off for speech recognition
            var kinectStream = audioSource.Start(); // open the Kinect audio stream

            Stream s = kinectStream;
            
                this.speechRecognizer.SetInputToAudioStream(
                            s, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
            
            this.speechRecognizer.RecognizeAsync(RecognizeMode.Multiple);          
            
        }

        // Auto-scroll the listBox to the bottom
        private void m_cStatusList_ScrollChanged(object sender, ScrollChangedEventArgs e)
        {
            if (e.ExtentHeightChange > 0.0)
                ((ScrollViewer)e.OriginalSource).ScrollToEnd();
        }
         
    
    }

}









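Related references:
HKT Online Classroom - Kinect Tutorial Index

The official Microsoft Kinect SDK V1 ships with three audio-related samples:

1. Microsoft_Sample_KinectAudioDemo (graphical: sound source localization and speech recognition)

2. Microsoft_Sample_RecordAudio (console: sound source localization and recognition)

3. Microsoft_Sample_Speech (speech recognition)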
