TRAJECTORY CLUSTERING FOR AUTOMATIC SPEECH RECOGNITION (MonPmOR6)
Author(s) :
Yan Han (Radboud University Nijmegen, Netherlands)
Johan De Veth (Radboud University Nijmegen, Netherlands)
Louis Boves (Radboud University Nijmegen, Netherlands)
Abstract : In this paper, we present Trajectory Clustering (TC), an approach for automatically clustering the multi-dimensional dynamic trajectories that correspond to speech data. TC uses the Expectation Maximization (EM) algorithm to fit a mixture of multiple linear regression models. Because the initial values of the model parameters are critical to clustering performance, we developed a successive splitting algorithm that incrementally increases the number of clusters. We use the trajectory clusters found to define multi-path HMM topologies. Based on the hypothesis that pronunciation variation in speech is more systematic at a unit level longer than a phone, we used modelling units defined in terms of Head-Body-Tail (HBT) models for connected digit recognition in Dutch. It appears that multi-path HMM topologies based on TC clusters outperform multi-path HMM topologies based on prior knowledge about speaker gender and speaking rate.
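
The clustering idea summarised in the abstract (EM over a mixture of regression models, with the number of clusters grown by successive splitting) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one-dimensional trajectories, a first-order regression on normalised time, and a simple perturbation rule for splitting, none of which are specified in the abstract. All function names (`fit_line`, `log_lik`, `em`, `cluster`, `make_demo`) and the parameter `delta` are hypothetical.

```python
import math
import random

def fit_line(trajs, weights):
    """Weighted least squares for y = a + b*t, with t = index / (length - 1).
    Every point of trajectory i carries weights[i]. Assumes length >= 2."""
    sw = st = stt = sy = sty = 0.0
    for ys, w in zip(trajs, weights):
        n = len(ys)
        for j, y in enumerate(ys):
            t = j / (n - 1)
            sw += w
            st += w * t
            stt += w * t * t
            sy += w * y
            sty += w * t * y
    b = (sw * sty - st * sy) / (sw * stt - st * st)
    a = (sy - b * st) / sw
    ss = sum(w * (y - a - b * j / (len(ys) - 1)) ** 2
             for ys, w in zip(trajs, weights) for j, y in enumerate(ys))
    return a, b, max(ss / sw, 1e-6)  # floor the variance to avoid collapse

def log_lik(ys, comp):
    """Gaussian log-likelihood of a whole trajectory under one regression component."""
    a, b, var = comp
    n = len(ys)
    return sum(-0.5 * math.log(2 * math.pi * var)
               - (y - a - b * j / (n - 1)) ** 2 / (2 * var)
               for j, y in enumerate(ys))

def em(trajs, comps, priors, n_iter=50):
    """EM for a mixture of regressions; responsibilities are per trajectory."""
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each trajectory.
        resp = []
        for ys in trajs:
            logs = [math.log(p) + log_lik(ys, c) for p, c in zip(priors, comps)]
            m = max(logs)
            ws = [math.exp(v - m) for v in logs]
            total = sum(ws)
            resp.append([w / total for w in ws])
        # M-step: weighted regression fit and mixing weight per cluster.
        for k in range(len(comps)):
            wk = [max(r[k], 1e-12) for r in resp]
            comps[k] = fit_line(trajs, wk)
            priors[k] = sum(wk) / len(trajs)
    return comps, priors, resp

def cluster(trajs, n_clusters, delta=0.5):
    """Start from one cluster, then repeatedly split the cluster with the largest
    residual variance by perturbing its coefficients (an assumed splitting rule)."""
    comps = [fit_line(trajs, [1.0] * len(trajs))]
    priors = [1.0]
    resp = [[1.0] for _ in trajs]
    while len(comps) < n_clusters:
        k = max(range(len(comps)), key=lambda i: comps[i][2])
        a, b, var = comps[k]
        d = delta * math.sqrt(var)
        comps[k] = (a + d, b + d, var)
        comps.append((a - d, b - d, var))
        priors[k] /= 2
        priors.append(priors[k])
        comps, priors, resp = em(trajs, comps, priors)
    labels = [max(range(len(r)), key=lambda k: r[k]) for r in resp]
    return comps, priors, labels

def make_demo(seed=0):
    """Synthetic data for illustration only: five rising and five falling trajectories."""
    rng = random.Random(seed)
    rising = [[j / 9 + rng.gauss(0, 0.05) for j in range(10)] for _ in range(5)]
    falling = [[1 - j / 9 + rng.gauss(0, 0.05) for j in range(10)] for _ in range(5)]
    return rising + falling
```

Calling `cluster(make_demo(), 2)` returns the fitted `(intercept, slope, variance)` triples, the mixing weights, and a hard cluster label per trajectory. Because each split seeds EM from an already fitted model, the result depends less on arbitrary initial values, which is the motivation the abstract gives for successive splitting.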