VoiceBridge
VoiceBridge is an open-source speech recognition toolkit written in C++. It is released under the AI-TOOLKIT Open Source License, which is based on Apache 2.0, very permissive, and allows commercial use. It is optimized for 64-bit MS Windows and can be easily modified to compile on other operating systems.
Download VoiceBridge Source Code
You can download the full source code from the GitHub software repository: VoiceBridge GitHub.
You will also need the example projects: VoiceBridgeProjects GitHub.
VoiceBridge Getting Started Guide
VoiceBridge is the MS Windows counterpart of Kaldi (speech recognition software for Unix-like operating systems), with the following differences and extensions:
- VoiceBridge is pure C++ code without any scripts, whereas Kaldi depends heavily on several scripting languages (Bash, Perl, and Python).
- The aim of VoiceBridge is to make writing high-quality, professional, and fast speech recognition software very easy. VoiceBridge does not include all of the models available in Kaldi, only a selection of models that provide very good accuracy and are fast. Kaldi is a research system and will always offer more models. VoiceBridge may add new models in the future if they provide a significant improvement in accuracy and/or speed.
The following speech recognition models are currently available in VoiceBridge:
- GMM Mono-phone,
- GMM Tri-phone,
- MFCC,
- MFCC + delta + delta-delta,
- MFCC + delta + delta-delta + Pitch,
- SAT,
- LDA+MLLT,
- LDA+MLLT+SAT.
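To make the "MFCC + delta + delta-delta" feature type more concrete, here is a minimal C++ sketch of how delta and delta-delta coefficients are commonly computed: the HTK-style regression d[t] = sum over n=1..N of n * (c[t+n] - c[t-n]) / (2 * sum of n^2), over N = 2 neighboring frames with edge frames replicated, applied once for the deltas and a second time for the delta-deltas. This is an illustration under those assumptions, not VoiceBridge's or Kaldi's code (Kaldi's own implementation may treat the edge frames differently); the names `Matrix`, `ComputeDeltas`, and `AddDeltas` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Feature matrix: one row per frame, one column per coefficient (e.g. 13 MFCCs).
using Matrix = std::vector<std::vector<float>>;

// Regression-based deltas over +/- `window` frames:
//   d[t] = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2)
// Frames beyond the start/end are replaced by the first/last frame.
// All rows are assumed to have the same dimension.
Matrix ComputeDeltas(const Matrix& feats, int window = 2) {
    assert(window > 0);
    const int num_frames = static_cast<int>(feats.size());
    Matrix deltas(feats.size());
    if (num_frames == 0) return deltas;
    const std::size_t dim = feats[0].size();

    float denom = 0.0f;
    for (int n = 1; n <= window; ++n) denom += 2.0f * n * n;

    for (int t = 0; t < num_frames; ++t) {
        deltas[t].assign(dim, 0.0f);
        for (int n = 1; n <= window; ++n) {
            const int next = std::min(t + n, num_frames - 1);  // clamp at the end
            const int prev = std::max(t - n, 0);               // clamp at the start
            for (std::size_t d = 0; d < dim; ++d)
                deltas[t][d] += n * (feats[next][d] - feats[prev][d]);
        }
        for (std::size_t d = 0; d < dim; ++d) deltas[t][d] /= denom;
    }
    return deltas;
}

// Appends deltas and delta-deltas to every frame, so each row grows to 3 * dim.
Matrix AddDeltas(const Matrix& feats, int window = 2) {
    const Matrix d1 = ComputeDeltas(feats, window);
    const Matrix d2 = ComputeDeltas(d1, window);  // delta of the deltas
    Matrix out(feats.size());
    for (std::size_t t = 0; t < feats.size(); ++t) {
        out[t] = feats[t];
        out[t].insert(out[t].end(), d1[t].begin(), d1[t].end());
        out[t].insert(out[t].end(), d2[t].begin(), d2[t].end());
    }
    return out;
}
```

For a 13-dimensional MFCC input, `AddDeltas` returns 39 values per frame.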
- VoiceBridge includes the following extra modules not included in Kaldi:
- Automatic language model generation.
- Automatic pronunciation lexicon generation.
- Semi-automatic speaker group separation.
Thanks to these modules, VoiceBridge only requires a limited set of inputs:
- WAV files.
- A text transcription file for each WAV file.
- Reference language dictionary (available in the VoiceBridge distribution).
Your speech recognition job has just become much easier!
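As a quick sanity check of the inputs listed above, the following C++17 sketch scans an input folder and reports every WAV file whose transcription file is missing or empty. It assumes a hypothetical convention of one `.txt` transcription per WAV file with the same base name (`utt1.wav` → `utt1.txt`); the layout the VoiceBridge example projects actually expect may differ, so compare with the example `input` folders before relying on it. `HasTranscription` and `FindUntranscribedWavs` are illustrative names, not part of the VoiceBridge API.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// True if the file exists and contains at least one whitespace-separated word.
bool HasTranscription(const fs::path& transcript) {
    std::ifstream in(transcript);
    std::string word;
    return static_cast<bool>(in >> word);
}

// Returns every .wav file in input_dir whose transcription is missing or empty.
// Hypothetical naming convention: utt1.wav is transcribed in utt1.txt (same folder).
std::vector<fs::path> FindUntranscribedWavs(const fs::path& input_dir,
                                            const std::string& transcript_ext = ".txt") {
    std::vector<fs::path> problems;
    for (const auto& entry : fs::directory_iterator(input_dir)) {
        // Only regular files with a lower-case ".wav" extension are checked here.
        if (!entry.is_regular_file() || entry.path().extension().string() != ".wav") continue;
        fs::path transcript = entry.path();
        transcript.replace_extension(transcript_ext);
        if (!HasTranscription(transcript)) problems.push_back(entry.path());
    }
    return problems;
}
```

An empty result means every WAV file in the folder has a non-empty transcription under the assumed naming scheme.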
- VoiceBridge is hardware accelerated in two ways:
- Automatic parallel processing by automatic CPU/core detection and work distribution. More processors/processor cores mean faster processing!
- VoiceBridge uses the Intel Math Kernel Library (MKL), which further accelerates processing by making use of special processor instruction sets.
Note: VoiceBridge currently does not support grid computing or CUDA, and there are no plans to add them in the near future.
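The automatic CPU/core detection and work distribution described above can be illustrated with a small C++ sketch that detects the number of hardware threads with `std::thread::hardware_concurrency()` and splits an index range into one contiguous chunk per thread. This is an assumption-based illustration using `std::thread` only; VoiceBridge's own scheduling code is not shown here (its redistributables include an OpenMP DLL, which suggests OpenMP is used), and `ParallelFor` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs work(begin, end) on contiguous chunks of [0, num_items), using one
// std::thread per detected hardware thread (capped at num_items).
void ParallelFor(std::size_t num_items,
                 const std::function<void(std::size_t, std::size_t)>& work) {
    assert(work);  // a callable must be supplied
    // hardware_concurrency() returns 0 if the core count cannot be determined.
    const std::size_t cores = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t num_threads = std::min(cores, std::max<std::size_t>(num_items, 1));
    const std::size_t chunk = (num_items + num_threads - 1) / num_threads;  // ceiling division

    std::vector<std::thread> threads;
    for (std::size_t begin = 0; begin < num_items; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, num_items);
        threads.emplace_back(work, begin, end);  // each thread gets its own range
    }
    for (auto& thread : threads) thread.join();  // wait for all chunks to finish
}
```

Each thread writes only to its own index range, so no locking is needed when the work items are independent (for example, one utterance per index).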
- The VoiceBridge C++ code is organized as a single DLL. This is a major difference from Kaldi, which consists of hundreds of executables and script files, and it makes distributing software built on VoiceBridge very easy. VoiceBridge aims to be a fast, high-accuracy, easy-to-use, production-ready professional system.
- VoiceBridge includes two complete examples that demonstrate how to use the library. Both examples are also available in Kaldi, which makes learning VoiceBridge much easier for Kaldi users.
- One of the examples is the Yes-No example. This is a very simple speech recognition example in which we train a model to recognize people saying ‘yes’ or ‘no’. The WER (word error rate) of this example in VoiceBridge is 2% (98% accuracy) and the training + testing takes about 8 seconds (with 4 processor cores).
- The second example is the so-called LibriSpeech example, a real-world speech recognition application that trains and tests on several hours of English speech. The WER of this example in VoiceBridge is 5.92% (about 94% accuracy), and training + testing takes about 25 minutes (with 4 processor cores).
Both examples are ready-to-use code templates for your speech recognition projects! More examples may be added later.
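The WER figures quoted for the examples are word-level edit distances: WER = (substitutions + deletions + insertions) / number of reference words, and the accuracy quoted alongside is 100% minus the WER. A minimal C++ sketch of this standard metric follows; it is not VoiceBridge's scoring code, and `SplitWords` and `WordErrorRate` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Splits a transcription into whitespace-separated words.
std::vector<std::string> SplitWords(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string word; in >> word;) words.push_back(word);
    return words;
}

// WER = (substitutions + deletions + insertions) / (number of reference words),
// using the word-level Levenshtein (edit) distance between reference and hypothesis.
double WordErrorRate(const std::string& reference, const std::string& hypothesis) {
    const std::vector<std::string> ref = SplitWords(reference);
    const std::vector<std::string> hyp = SplitWords(hypothesis);
    assert(!ref.empty() && "WER is undefined for an empty reference");

    // prev holds row i-1 and cur holds row i of the edit-distance table.
    std::vector<std::size_t> prev(hyp.size() + 1), cur(hyp.size() + 1);
    for (std::size_t j = 0; j <= hyp.size(); ++j) prev[j] = j;  // j insertions
    for (std::size_t i = 1; i <= ref.size(); ++i) {
        cur[0] = i;  // i deletions
        for (std::size_t j = 1; j <= hyp.size(); ++j) {
            const std::size_t substitution = prev[j - 1] + (ref[i - 1] == hyp[j - 1] ? 0 : 1);
            cur[j] = std::min({substitution, prev[j] + 1, cur[j - 1] + 1});
        }
        std::swap(prev, cur);
    }
    return static_cast<double>(prev[hyp.size()]) / static_cast<double>(ref.size());
}
```

Because insertions are counted, the WER can exceed 100% for a very poor hypothesis, so "100% minus WER" is only a rough accuracy figure.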
- Everything is included in the VoiceBridge distribution except the Intel MKL library, which can be downloaded for free from this website: Intel MKL: https://software.intel.com/en-us/mkl.
- Compilation: VoiceBridge can be compiled with the included MS Visual Studio 2017 projects (the Community edition of MS VS 2017 is free of charge). VoiceBridge only supports 64-bit compilation because 64-bit systems are faster and can use more memory. Please follow these steps for the compilation:
- a. Download and install the Intel MKL library. Note the location of the library. For example:
C:\IntelSWTools\compilers_and_libraries_2018\windows\mkl
- b. Adjust the MKL library location in the ‘SettingsVoiceBridge.props’ file in the root directory of VoiceBridge. Do not modify anything else: VoiceBridge is set up with relative paths, so no other settings need to be adjusted.
- c. Compile the openfst project located in ‘VoiceBridge\openfst-win-1.6’. It is best to compile both the Debug and Release versions. Important: Whole program optimization must be OFF for the library!
- d. Compile the Kaldi project located in ‘VoiceBridge\kaldi-master’. It is best to compile both the Debug and Release versions. Important: Whole program optimization must be OFF for the library!
- e. Compile the VoiceBridge DLL located in ‘VoiceBridge\VoiceBridge\VoiceBridge’. Important: Whole program optimization must be OFF! This option could give a 2-3% speed improvement, but the DLL would then have to be split into pieces because VS cannot handle the optimization of that much code. Note: there is a shortcut to each of the VS2017 projects mentioned above in the root directory of the distribution.
- f. In the TestDll example you can select which example to run: ‘TestYesNo();’, ‘TestLibriSpeech();’, or both one after the other. Important: make sure that the path to the example projects is correct in both example cpp files (YesNo.cpp, LibriSpeech.cpp). For the Yes-No example, for instance, the path is set with the following command:
fs::path project(exepath.branch_path() / "../../../../../VoiceBridgeProjects/YesNo");
Do this after downloading the example projects from the GitHub repository ‘VoiceBridgeProjects’ (the data is ~600 MB). If you put the example projects into a directory called VoiceBridgeProjects at the same level as the VoiceBridge directory (e.g. C:\VoiceBridge and C:\VoiceBridgeProjects), you do not need to change anything. In this case the input directory of the Yes-No project is ‘C:\VoiceBridgeProjects\YesNo\input’.
- g. Compile the test project located in ‘VoiceBridge\VoiceBridge\TestDll’.
- h. Run the example.
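The path logic from step f can also be written with C++17 `std::filesystem` instead of Boost (`branch_path()` is the older, deprecated Boost.Filesystem name for `parent_path()`). The sketch below resolves the project folder relative to the folder that contains the test executable and checks for the `input` sub-directory. The helper names are hypothetical, and the number of `..` components depends on where your build places the TestDll executable.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Equivalent of the Boost-based line in YesNo.cpp:
//   exepath.branch_path() / "../../../../../VoiceBridgeProjects/YesNo"
// branch_path() is the deprecated Boost.Filesystem name for parent_path().
fs::path ResolveProjectDir(const fs::path& exe_path, const fs::path& relative_project) {
    // lexically_normal() collapses "dir/.." pairs textually; it does not touch the disk.
    return (exe_path.parent_path() / relative_project).lexically_normal();
}

// The examples read their data from the project's "input" sub-directory.
bool HasInputDir(const fs::path& project_dir) {
    return fs::is_directory(project_dir / "input");
}
```

Because `lexically_normal()` works purely on the text of the path, it does not resolve symbolic links; calling `HasInputDir` afterwards confirms that the resolved folder really exists.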
- Redistribution: The directory ‘VoiceBridge\Redistributables’ contains all the DLLs that must be redistributed with any software built with VoiceBridge. Most of them belong to the Intel MKL library, and one provides OpenMP support. You may also need to distribute additional DLLs required by your compiler (MS VS2017), for example the C++ runtime. Note: the MKL DLLs are from the w_mkl_2018.1.156 distribution; you may need to replace them if you later download a more recent version!
- Documentation: For technical speech recognition matters, please refer to the e-books in PDF format included in the VoiceBridge distribution and to the Kaldi documentation here: Kaldi Documentation: http://kaldi-asr.org/doc/about.html. For all other questions concerning the VoiceBridge library and its options, please consult this website and the extensively documented source code. You may use VoiceBridge on GitHub for discussions or for reporting issues and requests.
Attribution
VoiceBridge would not have been possible without the work of the following people and companies:
- Daniel Povey – Dan is the main developer of Kaldi (http://kaldi-asr.org/doc/about.html) and an exceptional researcher and person. Dan was a great help during the making of VoiceBridge.
- Many people who contributed to Kaldi. Please consult the Kaldi website for a full list of names.
- Josef Robert Novak – Josef developed Phonetisaurus, on which the automatic VoiceBridge pronunciation generator is based.
- Massachusetts Institute of Technology (MIT) – Several people at MIT contributed to the MITLM project, on which the VoiceBridge automatic language model generator is based.
- Microsoft Corporation – many of the Kaldi modules (also written by Dan while working at Microsoft) are included in VoiceBridge.
- Johns Hopkins University – several people contributed to the Kaldi project.
- Google Inc.
- Arash Partow – Arash has developed the indispensable String Toolkit Library (http://www.partow.net/programming/strtk/index.html).
- Boost developers (www.boost.org)
- There are most probably many more people and companies who contributed to the projects used in VoiceBridge and who are not mentioned above; their names can be found in the source code. If you feel that you are a major contributor and I have forgotten to mention your name, please let me know and I will add it.
Software License
The use of VoiceBridge is subject to the AI-TOOLKIT Open Source Software License: