MATLAB functions that interface with the HTK Speech Recognition Toolkit (http://htk.eng.cam.ac.uk/) for training HMMs, GMMs and simple speech recognizers.
ronw/matlab_htk
This package contains a set of functions for interfacing with HTK from Matlab. Right now it's mostly limited to training GMMs and HMMs. It converts your Matlab data into a format that HTK understands and calls the HTK command-line programs. The path to the HTK binaries is hardcoded in get_htk_path.m.

The portions written by me are distributed under the terms of the GNU General Public License. See the file COPYING for details.

Unfortunately this has not been very thoroughly tested, but it has been working for me. If anyone wants to write some unit tests using http://mlunit.sf.net or something similar I won't say no.

* Functions

They are all reasonably commented...

- The important ones:
  - train_hmm_htk - use HTK to train an HMM
  - train_gmm_htk - use HTK to train a GMM
  - train_htk_recognizer - train a simple HTK recognizer
  - eval_htk_recognizer - recognize signal using HTK recognizer

- Utility functions:
  - read_htk_hmm - read in an HTK HMM definition (text format only)
  - write_htk_hmm - write an HMM structure as an HTK HMM definition
  - htkread/htkwrite - read/write a Matlab matrix in HTK feature file format (written by Mark Hasegawa-Johnson)
  - compose_hmms - does FSM composition to form a big HMM from many small HMMs (i.e. get an HMM to recognize a sentence from a bunch of phone HMMs)
  - kmeans - uses k-means to learn clusters from data. Used to initialize GMMs and HMMs.
  - logsum - takes the sum of a matrix of log likelihoods
  - get_htk_path - centralized location to set the path to the HTK binaries

* Data Structures

The functions in this toolbox pass around the following structures.

Note: all probabilities are stored as log probabilities.

** GMM

- gmm.nmix - number of components in the mixture
- gmm.priors - array of prior log probabilities over each mixture component
- gmm.means - matrix of means (column x is the mean of component x)
- gmm.covars - matrix of covariances (column x is the diagonal of the covariance matrix of component x)

** HMM with GMM observations

- hmm.name -
- hmm.nstates - number of states in the HMM
- hmm.emission_type - 'GMM'
- hmm.start_prob - array of log probs P(first observation is state x)
- hmm.end_prob - array of log probs P(last observation is state x)
- hmm.transmat - matrix of transition log probs (transmat(x,y) = log(P(transition from state x to state y)))
- hmm.labels - optional cell array of labels for each state in the HMM (for use in composing HMMs)
- hmm.gmms - array of GMM structures

** HMM with Gaussian observations

- hmm.nstates - number of states in the HMM
- hmm.emission_type - 'gaussian'
- hmm.start_prob - array of log probs P(first observation is state x)
- hmm.end_prob - array of log probs P(last observation is state x)
- hmm.transmat - matrix of transition log probs (transmat(x,y) = log(P(transition from state x to state y)))
- hmm.labels - optional cell array of labels for each state in the HMM (for use in composing HMMs)
- hmm.means - matrix of means (column x is the mean of state x)
- hmm.covars - matrix of covariances (column x is the diagonal of the covariance matrix of state x)

Note that each row of exp(hmm.transmat) does not necessarily sum to 1, because for each state x there is some probability (exp(hmm.exit_prob(x))) that the next transition will be to a non-emitting exit state (i.e. the current observation is the last observation in the sequence). The correct invariant is:

  sum(exp(hmm.transmat), 2) + exp(hmm.exit_prob) == ones(hmm.nstates, 1)

2007-11-06 Ron Weiss
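The row-sum invariant above can be checked numerically. The following is only a sketch: the three-state HMM and all of its values are made up for illustration, and it uses the field name exit_prob from the invariant (the structure listing describes end_prob instead, so treat the field name and its column orientation as assumptions). It does not call any function from this toolbox.

```matlab
% Hypothetical 3-state left-to-right HMM with Gaussian emissions over
% 2-D features. All values are illustrative, not taken from the toolbox.
hmm.nstates = 3;
hmm.emission_type = 'gaussian';
hmm.start_prob = log([1 0 0]);        % always start in state 1 (log(0) = -Inf)
hmm.transmat = log([0.6 0.4 0.0;
                    0.0 0.7 0.3;
                    0.0 0.0 0.8]);    % transmat(x,y) = log P(x -> y)
% Assumption: exit_prob is a column vector, named as in the invariant.
hmm.exit_prob = log([0; 0; 0.2]);     % only state 3 can end the sequence
hmm.means = [0 1 2; 0 1 2];           % column x is the mean of state x
hmm.covars = ones(2, 3);              % column x is the diagonal covariance of state x

% Outgoing transition probability of each state plus its exit probability.
total = sum(exp(hmm.transmat), 2) + exp(hmm.exit_prob);

% Compare with a tolerance instead of ==, because exp(log(p)) only
% approximates p in floating point.
invariant_ok = all(abs(total - ones(hmm.nstates, 1)) < 1e-10);
```

Comparing against a small tolerance is a design choice of this sketch; an exact == test on floating-point sums can fail even for a well-formed model.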
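Because all probabilities are stored as logs, summing them (for example, over the components of a GMM) has to be done in the log domain. The sketch below builds a hypothetical two-component GMM structure using the fields listed above and computes the log likelihood of one observation with the usual log-sum-exp trick. It is not the toolbox's logsum function, whose signature is not shown here; the values and variable names are assumptions for illustration.

```matlab
% Hypothetical GMM with 2 components over 1-D data (values are made up).
gmm.nmix = 2;
gmm.priors = log([0.25 0.75]);   % log prior of each component
gmm.means = [0 3];               % column k is the mean of component k
gmm.covars = [1 1];              % column k is the diagonal covariance of component k

x = 1.0;                         % a single observation

% Log of (prior * diagonal Gaussian density) for each component.
logp = zeros(1, gmm.nmix);
for k = 1:gmm.nmix
  d = x - gmm.means(:, k);
  v = gmm.covars(:, k);
  logp(k) = gmm.priors(k) - 0.5 * sum(log(2 * pi * v) + d.^2 ./ v);
end

% Log-sum-exp: subtract the maximum before exponentiating so that very
% negative log likelihoods do not underflow to zero.
m = max(logp);
loglik = m + log(sum(exp(logp - m)));
```

The same loop works for multidimensional features, since the sum runs over the rows of the mean and covariance columns.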