%0 Journal Article
%J Statistics & Probability Letters
%D 2006
%T Almost Sure Convergence of Titterington's Recursive Estimator for Finite Mixture Models
%A Y. Zhao
%A Shaojun Wang
%X Titterington proposed a recursive parameter estimation algorithm for finite mixture models. However, owing to the well-known problem of singularities and of the multiple maxima, minima and saddle points that can occur on likelihood surfaces, a convergence analysis has seldom been carried out. In this paper, under mild conditions, we show the global convergence of Titterington's recursive estimator and its MAP variant for mixture models in the full regular exponential family.
%P 2001-2006
%G eng
%0 Journal Article
%D 2005
%T Combining Statistical Language Models via the Latent Maximum Entropy Principle
%A F. Peng
%A Y. Zhao
%A Shaojun Wang
%A D. Schuurmans
%X We present a unified probabilistic framework for statistical language modeling which can simultaneously incorporate various aspects of natural language, such as local word interaction, syntactic structure and semantic document information. Our approach is based on a statistical inference principle we have recently proposed, the latent maximum entropy principle, which allows relationships over hidden features to be effectively captured in a unified model. Our work extends previous research on maximum entropy methods for language modeling, which allow only observed features to be modeled. The ability to conveniently incorporate hidden variables lets us extend the expressiveness of language models while alleviating the need to pre-process the data to obtain explicitly observed features. We describe efficient algorithms for marginalization, inference and normalization in our extended models. We then use these techniques to combine two standard forms of language models: local lexical models (Markov N-gram models) and global document-level semantic models (probabilistic latent semantic analysis). Our experimental results on the Wall Street Journal corpus show that we obtain an 18.5% reduction in perplexity compared to the baseline tri-gram model with Good-Turing smoothing.
%G eng
%0 Journal Article
%D 2004
%T Learning Mixture Models with the Regularized Latent Maximum Entropy Principle
%A D. Schuurmans
%A F. Peng
%A Y. Zhao
%A Shaojun Wang
%X We present a new approach to estimating mixture models based on a new inference principle we have proposed: the latent maximum entropy principle (LME). LME is different both from Jaynes' maximum entropy principle and from standard maximum likelihood estimation. We demonstrate the LME principle by deriving new algorithms for mixture model estimation, and show how robust new variants of the EM algorithm can be developed. Our experiments show that estimation based on LME generally yields better results than maximum likelihood estimation, particularly when inferring latent variable models from small amounts of data.
%G eng
%0 Conference Paper
%D 2003
%T Boltzmann Machine Learning with the Latent Maximum Entropy Principle
%A Y. Zhao
%A Shaojun Wang
%A D. Schuurmans
%A F. Peng
%G eng
%0 Conference Paper
%D 2003
%T Learning Mixture Models with the Latent Maximum Entropy Principle
%A Y. Zhao
%A Shaojun Wang
%A F. Peng
%A D. Schuurmans
%G eng
%0 Conference Paper
%D 2003
%T Semantic N-gram Language Modeling with the Latent Maximum Entropy Principle
%A D. Schuurmans
%A F. Peng
%A Y. Zhao
%A Shaojun Wang
%G eng
%0 Conference Paper
%D 2002
%T The Latent Maximum Entropy Principle
%A Shaojun Wang
%A Y. Zhao
%A D. Schuurmans
%A R. Rosenfeld
%G eng
%0 Conference Paper
%D 2001
%T Almost Sure Convergence of Titterington's Recursive Estimator for Finite Mixture Models
%A Y. Zhao
%A Shaojun Wang
%X Titterington proposed a recursive parameter estimation algorithm for finite mixture models. However, owing to the well-known problem of singularities and of the multiple maxima, minima and saddle points that can occur on likelihood surfaces, a convergence analysis has seldom been carried out. In this paper, under mild conditions, we show the global convergence of Titterington's recursive estimator and its MAP variant for mixture models in the full regular exponential family.
%G eng
%0 Conference Paper
%D 2001
%T Latent Maximum Entropy Principle for Statistical Language Modeling
%A Shaojun Wang
%A Y. Zhao
%A R. Rosenfeld
%G eng
%0 Journal Article
%D 2001
%T On-Line Bayesian Tree-Structured Transformation of HMMs with Optimal Model Selection for Speaker Adaptation
%A Y. Zhao
%A Shaojun Wang
%X This paper presents a new recursive Bayesian learning approach for transformation parameter estimation in speaker adaptation. Our goal is to incrementally transform or adapt a set of hidden Markov model (HMM) parameters for a new speaker and to gain a large performance improvement from a small amount of adaptation data. By constructing a clustering tree of HMM Gaussian mixture components, the linear regression (LR) or affine transformation parameters for the HMM Gaussian mixture components are dynamically searched. An online Bayesian learning technique is proposed for recursive maximum a posteriori (MAP) estimation of the LR and affine transformation parameters. This technique has the advantage of accommodating flexible forms of both transformation functions and a priori probability density functions (pdfs). To balance model complexity against goodness of fit to the adaptation data, a dynamic programming algorithm is developed for model selection using a Bayesian variant of the 'minimum description length' (MDL) principle. Speaker adaptation experiments with a 26-letter English alphabet vocabulary were conducted, and the results confirmed the effectiveness of the online learning framework.
%G eng
%0 Conference Paper
%D 2001
%T Recursive Estimation of Time-Varying Environments for Robust Speech Recognition
%A K. Yen
%A Shaojun Wang
%A Y. Zhao
%G eng
%0 Conference Paper
%D 2000
%T On-Line Bayesian Speaker Adaptation By Using Tree-Structured Transformation and Robust Priors
%A Shaojun Wang
%A Y. Zhao
%G eng
%0 Conference Paper
%D 2000
%T Optimal On-Line Bayesian Model Selection for Speaker Adaptation
%A Shaojun Wang
%A Y. Zhao
%G eng
%0 Conference Paper
%D 1999
%T On-Line Bayesian Tree-Structured Transformation of Hidden Markov Models for Speaker Adaptation
%A Shaojun Wang
%A Y. Zhao
%G eng
%0 Conference Paper
%D 1999
%T A Unified Framework for Recursive Maximum Likelihood Estimation of Hidden Markov Models
%A Shaojun Wang
%A Y. Zhao
%G eng
%0 Conference Paper
%D 1998
%T On Convergence of Maximum Likelihood Estimation of Binary HMMs by EM Algorithm
%A Shaojun Wang
%A M. Li
%A Y. Zhao
%G eng