Welcome

Bavieca is an open-source speech recognition toolkit intended for speech research and as a platform for the development of speech-enabled solutions by non-speech experts. It supports common acoustic modeling and adaptation techniques based on continuous density hidden Markov models (CD-HMMs), including discriminative training. Bavieca includes two efficient decoders, based on dynamic and static expansion of the search space respectively, that can operate in batch and live recognition modes. The toolkit offers a simple and modular design with an emphasis on efficiency, scalability and reusability. Bavieca exhibits competitive results on standard benchmarks and has been successfully used in a number of research projects addressing both read and conversational children's speech as well as conversational adult speech.

Bavieca was developed entirely by Daniel Bolaños at Boulder Language Technologies and is currently used in several research projects and products. Bavieca is at the Beta stage and its development is still a work in progress. An article introducing the Bavieca speech recognition toolkit was published at the IEEE Spoken Language Technology Workshop 2012:

D. Bolaños, “The Bavieca Open-Source Speech Recognition Toolkit”. In Proceedings of IEEE Workshop on Spoken Language Technology (SLT), December 2-5, 2012, Miami, FL.

Design

The list below summarizes some of the main design principles of Bavieca:

  • Written entirely in C++ making extensive use of the Standard Template Library (STL)
  • Small code base, ≈ 30,000 lines of code and ≈ 100 C++ classes
  • Reduced set of command line tools (about 25) that serve specific purposes such as accumulating sufficient statistics or estimating model parameters according to some estimation criterion
  • Extensive reuse of code across tools
  • Application Programming Interface (API) that enables development of stand-alone applications that exploit Bavieca's speech recognition capabilities
  • Linear Algebra support through BLAS and LAPACK

Features

The list below summarizes the main features of the Bavieca toolkit.

  • Large vocabulary continuous speech recognition
    • Dynamic search decoder with support for cross-word triphone and pentaphone HMMs
    • Weighted Finite State Acceptor (WFSA) based speech decoder and efficient WFSA network builder (cross-word triphones)
    • Efficient computation of emission probabilities thanks to the use of the nearest neighbor approximation, partial distance elimination and support for Single Instruction Multiple Data (SIMD) parallel computation (x86 architecture only)
    • Lattice generation (both decoders)
    • Hypothesis files in NIST formats (SCLITE can be used for scoring hypotheses)
  • Acoustic modeling
    • Acoustic models based on continuous density Hidden Markov Models (CD-HMMs) with emission probabilities modeled using mixtures of Gaussian distributions (GMMs)
    • HMM topology fixed to three states left to right
    • Variable number of Gaussian components per HMM-state
    • No explicit modeling of transition probabilities
    • Diagonal and full covariance modeling
    • Cross-word context dependency modeling using triphones, pentaphones, heptaphones, etc.
    • Maximum Likelihood Estimation criterion
    • Discriminative training using boosted Maximum Mutual Information (bMMI) criterion with I-smoothing and cancellation of statistics
    • Parallel accumulation of sufficient statistics for both Maximum Likelihood and Discriminative Training criteria
    • Linear algebra support through template classes (Matrix, Vector, etc) wrapping third party libraries (BLAS and LAPACK)
  • Language modeling
    • Support for n-gram language models in ARPA and binary formats
    • Support for any n-gram order (zerogram, unigram, bigram, trigram, fourgram, etc)
    • Language models are internally represented as Finite State Machines
  • Speaker adaptation
    • Model space Maximum Likelihood Linear Regression (MLLR) using regression trees to automatically determine the number of transforms to be used and how adaptation data is shared among transforms
    • Feature space Maximum Likelihood Linear Regression (fMLLR)
    • Vocal Tract Length Normalization (VTLN)
  • Feature extraction
    • Mel Frequency Cepstral Coefficients (MFCC) features
    • Cepstral Mean Normalization (CMN) and Cepstral Mean Variance Normalization (CMVN) at either the utterance or the session level
    • Feature decorrelation and dimensionality reduction using Heteroscedastic Linear Discriminant Analysis (HLDA)
    • Support for spliced features and third order derivatives
  • Lattice processing and n-best list generation
    • Lattice rescoring using different criteria: maximum likelihood or posterior probabilities
    • N-best generation (from lattices) using different criteria: maximum likelihood or posterior probabilities
    • Lattice word error rate (WER) computation (oracle)
    • Lattice alignment and HMM-state marking
    • Attach LM-scores to lattice edges according to a given language model
    • Lattice-based posterior probability computation
    • Confidence annotation
    • Lattice path-insertion (discriminative training)
    • Lattices are processed in binary format but text format is available for readability purposes
  • Speech activity detection
    • HMM-based speech activity detection
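
As an illustration of the emission-probability item in the feature list above, the following is a minimal C++ sketch of how the nearest neighbor approximation and partial distance elimination can be combined for diagonal-covariance GMMs. It is not taken from Bavieca's source code: the struct and function names, and the choice to precompute 0.5/variance and a per-component log constant, are assumptions made for this example. The nearest neighbor approximation is interpreted here as scoring a feature vector with its best Gaussian component only, and partial distance elimination as abandoning a component as soon as its partial score can no longer beat the best one found so far.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// One diagonal-covariance Gaussian component (hypothetical layout, not Bavieca's).
struct DiagGaussian {
    std::vector<float> mean;        // per-dimension mean
    std::vector<float> invVarHalf;  // 0.5 / variance, per dimension (always > 0)
    float logConst;                 // log(weight) - 0.5 * sum_d log(2 * pi * var_d)
};

// Approximates the GMM log-likelihood of x by the score of its best component:
//   max_k [ logConst_k - sum_d invVarHalf_kd * (x_d - mean_kd)^2 ]
float nearestNeighborLogLikelihood(const std::vector<DiagGaussian>& gmm,
                                   const std::vector<float>& x) {
    assert(!gmm.empty());
    float best = -std::numeric_limits<float>::infinity();
    for (const DiagGaussian& g : gmm) {
        assert(g.mean.size() == x.size() && g.invVarHalf.size() == x.size());
        float score = g.logConst;
        bool pruned = false;
        for (std::size_t d = 0; d < x.size(); ++d) {
            const float diff = x[d] - g.mean[d];
            score -= g.invVarHalf[d] * diff * diff;  // the score can only decrease
            if (score <= best) {
                // Partial distance elimination: this component cannot win,
                // so the remaining dimensions are skipped.
                pruned = true;
                break;
            }
        }
        if (!pruned && score > best) {
            best = score;
        }
    }
    return best;
}
```

Because each dimension subtracts a non-negative term, the early exit never changes the returned value; it only saves work. A SIMD implementation would typically process several dimensions per step and check the pruning condition less often.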

License

The Bavieca speech recognition toolkit is an open-source project distributed under the permissive Apache 2.0 license, and is freely available on SourceForge.

Benchmarks

The recognition accuracy and real-time performance of Bavieca have been measured on different tasks.

WSJ Nov'92 Evaluation (microphone read speech)

The table below summarizes the accuracy of Bavieca on the WSJ Nov'92 task compared to known speech recognition systems. Additional details about this evaluation, as well as data regarding the real-time performance of Bavieca on this task, can be found in the research article "The Bavieca Open-Source Speech Recognition Toolkit" presented at IEEE SLT 2012.

                      5k                 20k
system           bigram  trigram    bigram  trigram    gender dep.
HTK                5.1     3.2       11.1     9.5      yes
Limsi              4.8     3.1       11.0     9.1      yes
Kaldi               -       -        11.8      -       no
Bavieca            4.7     3.1       10.6     8.7      no
Bavieca+bMMI*       -      2.8         -      8.2      no

Table. WER (%) on the WSJ Nov'92 for Bavieca and other speech recognition toolkits.

* This system configuration uses discriminatively trained acoustic models, so it is not directly comparable to the rest of the systems in the table, which use acoustic models trained under Maximum Likelihood.
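
For readers unfamiliar with the metric reported in the table, word error rate (WER) is the minimum number of word substitutions, deletions and insertions needed to turn the reference transcript into the hypothesis, divided by the number of reference words. The C++ sketch below illustrates that definition only. It is not how the figures above were produced (hypotheses in NIST formats are normally scored with SCLITE), and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein distance computed over words instead of characters.
std::size_t wordEditDistance(const std::vector<std::string>& ref,
                             const std::vector<std::string>& hyp) {
    std::vector<std::size_t> prev(hyp.size() + 1), curr(hyp.size() + 1);
    for (std::size_t j = 0; j <= hyp.size(); ++j) {
        prev[j] = j;  // empty reference: every hypothesis word is an insertion
    }
    for (std::size_t i = 1; i <= ref.size(); ++i) {
        curr[0] = i;  // empty hypothesis: every reference word is a deletion
        for (std::size_t j = 1; j <= hyp.size(); ++j) {
            const std::size_t sub =
                prev[j - 1] + (ref[i - 1] == hyp[j - 1] ? 0 : 1);
            const std::size_t del = prev[j] + 1;
            const std::size_t ins = curr[j - 1] + 1;
            curr[j] = std::min({sub, del, ins});
        }
        prev.swap(curr);
    }
    return prev[hyp.size()];
}

// WER as a percentage of the number of reference words.
double wordErrorRate(const std::vector<std::string>& ref,
                     const std::vector<std::string>& hyp) {
    assert(!ref.empty());
    return 100.0 * static_cast<double>(wordEditDistance(ref, hyp)) /
           static_cast<double>(ref.size());
}
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" gives one substitution and one deletion over six reference words, a WER of about 33.3%.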

Publications

Bavieca's main publication is:

D. Bolaños, “The Bavieca Open-Source Speech Recognition Toolkit”. In Proceedings of IEEE Workshop on Spoken Language Technology (SLT), December 2-5, 2012, Miami, FL.

Additionally, Bavieca has been used in several research projects, mainly in the education and speech recognition fields. Below is a list of journal articles describing research that made use of the Bavieca speech recognition toolkit.

Download

Bavieca's source code can be freely downloaded from SourceForge. The following command downloads Bavieca's source code from the git repository at SourceForge:

git clone git://git.code.sf.net/p/bavieca/code bavieca-code

Install

Prerequisites

Bavieca uses the third-party BLAS and LAPACK libraries to perform linear algebra, so these libraries must be downloaded and compiled before installing Bavieca. They can be obtained from netlib. Once the libraries are installed, some variables defined in "/src/Makefile.defines" need to be modified to point to the installed libraries. The following variables need to be set:

Installation

Currently there are no binary distributions of Bavieca, so it must be compiled for the target architecture to produce the actual binaries. Bavieca's source code has been compiled on x86 architectures under Linux (32-bit and 64-bit) and Windows (64-bit). However, at the moment only some of its functionality (mainly the functionality exposed in Bavieca's API) has been tested on Windows. For this reason, the current version of the software in the repository does not support compilation under Windows. Future releases will incorporate full support for the Windows platform and possibly for other platforms.

Once Bavieca's dependencies have been installed, it is possible to compile the toolkit by navigating to the directory "/src" and typing "make". This command will produce an executable file for each command line tool in Bavieca and the Bavieca API library.
