Dimitri Kanevsky - IEEE Xplore Author Profile

Showing 1-25 of 34 results


Several methods have been proposed recently for modeling posterior representations derived from local classifiers [1, 2]. In recent work, Sainath et al. have proposed the use of a tied-mixture-based posterior modeling approach [3] to enhance exemplar-based posterior representations for phone recognition tasks. In this work, we conduct a detailed evaluation to determine the effectiveness of this te...
Optimization techniques have been used for many years in the formulation and solution of computational problems arising in speech and language processing. Such techniques are found in the Baum-Welch, extended Baum-Welch (EBW), Rprop, and GIS algorithms, for example. Additionally, the use of regularization terms has been seen in other applications of sparse optimization. This paper outlines a range...
Solving real-world classification and recognition problems requires a principled way of modeling the physical phenomena generating the observed data and the uncertainty in it. The uncertainty originates from the fact that many data generation aspects are influenced by nondirectly measurable variables or are too complex to model and hence are treated as random fluctuations. For example, in speech p...
In this paper we present a novel compressed sensing (CS) algorithm for the recovery of a compressible, possibly time-varying, signal from a sequence of noisy observations. The newly derived scheme is based on the acclaimed unscented Kalman filter (UKF), and is essentially self-reliant in the sense that no peripheral optimization or CS algorithm is required for identifying the underlying signal suppo...
Over the past few decades, a variety of specialized approaches have been proposed to solve large problems in speech recognition. Conventional optimization techniques have not been widely applied, because the problems do not readily admit an objective for evaluating a given set of parameters and because of the large number of parameters. This situation is changing, due to recent developments in alg...
In this paper, we propose a novel exemplar-based technique for classification problems where for every new test sample the classification model is re-estimated from a subset of relevant samples of the training data. We formulate the exemplar-based classification paradigm as a sparse representation (SR) problem, and explore the use of convex hull constraints to enforce both regularization and sparsi...
We introduce the Line Search A-Function (LSAF) technique that generalizes the Extended-Baum Welch technique in order to provide an effective optimization technique for a broader set of functions. We show how LSAF can be applied to functions of various probability density and distribution functions by demonstrating that these probability functions have an A-function. We also show that sparse repres...
Constrained discriminative linear transform (CDLT) optimized with Extended Baum-Welch (EBW) has been presented in the literature as a discriminative speaker adaptation method that outperforms the conventional maximum likelihood algorithm. Defining the controlling parameter of EBW to achieve the best performance of speaker adaptation, however, still remains an open question. This paper presents an ...
Exemplar-based techniques, such as k-nearest neighbors (kNNs) and Sparse Representations (SRs), can be used to model a test sample from a few training points in a dictionary set. In past work, we have shown that using a SR approach for phonetic classification allows for a higher accuracy than other classification techniques. These phones are the basic units of speech to be recognized. Motivated by...
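The abstract above contrasts several exemplar-based methods; the simplest of them, kNN, can be sketched in a few lines. This is a minimal illustration, not the paper's system: the feature vectors and phone labels below are toy values, and the distance metric (Euclidean) is an assumption.

```python
# Minimal k-nearest-neighbor classifier: each test point is labeled
# directly from a few nearby training exemplars, as in the exemplar-based
# methods described above. Data here is illustrative toy data.
from collections import Counter
import math

def knn_classify(train_x, train_y, query, k=3):
    # rank training exemplars by Euclidean distance to the query
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    # majority vote among the k closest exemplars
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# toy 2-D "acoustic features" with hypothetical phone labels
train_x = [(0.0, 0.0), (0.1, 0.2), (0.9, 1.0), (1.0, 0.8), (0.2, 0.1)]
train_y = ["aa", "aa", "iy", "iy", "aa"]
print(knn_classify(train_x, train_y, (0.15, 0.1), k=3))  # -> aa
```

A sparse-representation classifier replaces the hard k-neighbor cutoff with a sparse linear reconstruction of the test sample from the whole dictionary, which is what lets the number of supporting exemplars adapt per sample.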
The use of exemplar-based methods, such as support vector machines (SVMs), k-nearest neighbors (kNNs) and sparse representations (SRs), in speech recognition has thus far been limited. Exemplar-based techniques utilize information about individual training examples and are computationally expensive, making it particularly difficult to investigate these methods on large-vocabulary continuous speech...
Compressed sensing is an emerging field dealing with the reconstruction of a sparse or, more precisely, a compressed representation of a signal from a relatively small number of observations, typically less than the signal dimension. In our previous work we have shown how the Kalman filter can be naturally applied for obtaining an approximate Bayesian solution for the compressed sensing problem...
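The recovery problem these compressed-sensing abstracts refer to has a standard convex formulation. The notation below is the conventional one, not taken from the papers: $\Phi$ is the $m \times n$ sensing matrix with $m < n$, $y$ the observation vector, $n$ additive noise, and $\epsilon$ a noise-level bound.

```latex
% Observations: y = \Phi x + n, with fewer observations than unknowns (m < n).
% Recovery seeks the sparsest consistent signal via its l1 surrogate:
\hat{x} = \arg\min_{x \in \mathbb{R}^n} \|x\|_1
\quad \text{subject to} \quad \|y - \Phi x\|_2 \le \epsilon
```

The Kalman-filter approaches in these papers replace the generic convex solver with a recursive Bayesian estimator for the same constrained problem.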
In this paper, we introduce a novel Bayesian compressive sensing (CS) technique for phonetic classification. CS is often used to characterize a signal from a few support training examples, similar to k-nearest neighbor (kNN) and Support Vector Machines (SVMs). However, unlike SVMs and kNNs, CS allows the number of supports to be adapted to the specific signal being characterized. On the TIMIT phon...
Compressive sensing (CS) is a popular technique used to reconstruct a signal from few training examples, a problem which arises in many machine learning applications. In this paper, we introduce a technique to guarantee that our data obeys certain isometric properties. In addition, we introduce a Bayesian approach to compressive sensing, which we call ABCS, allowing us to obtain complete statistic...
We present two simple methods for recovering sparse signals from a series of noisy observations. The theory of compressed sensing (CS) requires solving a convex constrained minimization problem. We propose solving this optimization problem by two algorithms that rely on a Kalman filter (KF) endowed with a pseudo-measurement (PM) equation. Compared to a recently-introduced KF-CS method, which invol...
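The convex minimization mentioned above can be solved by many schemes; the Kalman-filter-with-pseudo-measurement algorithm itself is not reproduced here. As a stand-in, the sketch below uses iterative soft-thresholding (ISTA), a simple and well-known l1 solver, on a toy underdetermined system whose minimum-l1 solution is 1-sparse. Matrix, signal, and parameter values are all illustrative.

```python
# ISTA for min 0.5*||y - A x||_2^2 + lam*||x||_1 -- a simple alternative
# solver for the sparse-recovery problem discussed in the abstract above.
def soft(v, t):
    # soft-thresholding: the proximal operator of t*|.|
    return (v - t) if v > t else (v + t) if v < -t else 0.0

def ista(A, y, lam=0.1, step=0.1, iters=2000):
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # residual r = A x - y and gradient g = A^T r of the smooth term
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # gradient step, then shrink toward zero (promotes sparsity)
        x = [soft(x[j] - step * g[j], step * lam) for j in range(n)]
    return x

# 2 noisy-free observations, 3 unknowns; true signal is [0, 0, 2]
A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [2.0, 2.0]
x_hat = ista(A, y)
print(x_hat)  # close to [0, 0, 2], with slight l1 shrinkage on x[2]
```

The step size must stay below the reciprocal of the largest eigenvalue of A^T A (here 3) for the iteration to converge; 0.1 satisfies that.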
Recently proposed l1-regularized maximum-likelihood optimization methods for learning sparse Markov networks result in convex problems that can be solved optimally and efficiently. However, the accuracy of such methods can be very sensitive to the choice of regularization parameter, and optimal selection of this parameter remains an open problem. Herein, we propose a maximum a posteriori probabi...
The extended Baum-Welch (EBW) transformations are one of a variety of techniques to estimate parameters of Gaussian mixture models. In this paper, we provide a theoretical framework for general parameter estimation and show the relationship between these different techniques. We introduce a general family of model parameter updates that generalizes a Baum-Welch (BW) recursive process to an arbitrar...
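Several of these abstracts build on the EBW re-estimation step. As a concrete illustration, the sketch below applies the commonly cited EBW mean update for a single Gaussian, mu' = (theta_num - theta_den + D*mu) / (gamma_num - gamma_den + D); the sufficient statistics and smoothing constant D are illustrative values, not taken from any of the papers.

```python
# Sketch of the discriminative EBW mean update for one Gaussian component.
# Numerator statistics come from the correct transcript, denominator
# statistics from competing hypotheses; D large enough keeps the update stable.
def ebw_mean_update(sum_num, count_num, sum_den, count_den, mu_old, D):
    return (sum_num - sum_den + D * mu_old) / (count_num - count_den + D)

# toy first-order statistics (sums of frames) and occupancies (counts)
mu_new = ebw_mean_update(
    sum_num=10.0, count_num=5.0,   # numerator (reference) statistics
    sum_den=3.0,  count_den=2.0,   # denominator (competitor) statistics
    mu_old=1.5, D=4.0,
)
print(round(mu_new, 4))  # (10 - 3 + 6) / (5 - 2 + 4) = 13/7 ~= 1.8571
```

Choosing D is exactly the open controlling-parameter question raised in the CDLT abstract above: too small and the update can diverge, too large and training crawls.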
In many pattern recognition tasks, given some input data and a family of models, the "best" model is defined as the one which maximizes the likelihood of the data given the model. Extended Baum-Welch (EBW) transformations are most commonly used as a discriminative technique for estimating parameters of Gaussian mixtures. In this paper, we use the EBW transformations to derive a novel gradient ste...
Government agencies, corporations, and police departments are plagued by information overload. The inability to fully analyze fragments of data scattered across organizations reduces productivity, and more and more of these fragments are being gathered every day thanks to tools like the Internet and digital audio/video recorders. However, since much of this information is stored in computer sy...
We present a modified form of the maximum mutual information (MMI) objective function which gives improved results for discriminative training. The modification consists of boosting the likelihoods of paths in the denominator lattice that have a higher phone error relative to the correct transcript, by using the same phone accuracy function that is used in Minimum Phone Error (MPE) training. We co...
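The boosted objective described above is commonly written as follows; the notation is the conventional one for this method, assumed here rather than copied from the paper: $x_r$ are the observation sequences, $s_r$ the reference transcript, $\kappa$ the acoustic scale, $b$ the boosting factor, and $A(s, s_r)$ the phone accuracy of hypothesis $s$ against the reference.

```latex
% Boosted MMI: competitors with more phone errors (lower A) get their
% denominator likelihoods scaled up by e^{-b A(s, s_r)}, sharpening the margin.
\mathcal{F}_{\mathrm{bMMI}}(\lambda)
  = \sum_r \log
    \frac{p_\lambda(x_r \mid s_r)^{\kappa}\, P(s_r)}
         {\sum_s p_\lambda(x_r \mid s)^{\kappa}\, P(s)\, e^{-b\, A(s, s_r)}}
```

Setting $b = 0$ recovers plain MMI, so the boosting term is a strict generalization.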
In many pattern recognition tasks, given some input data and a model, a probabilistic likelihood score is often computed to measure how well the model describes the data. Extended Baum-Welch (EBW) transformations are most commonly used as a discriminative technique for estimating parameters of Gaussian mixtures, though recently they have been used to derive a gradient steepness measurement to eval...
Audio segmentation has applications in a variety of contexts, such as audio information retrieval, automatic sound analysis, and as a pre-processing step in speech recognition. Extended Baum-Welch (EBW) transformations are most commonly used as a discriminative technique for estimating parameters of Gaussian mixtures. In this paper, we derive an unsupervised audio segmentation approach using these...
We provide a unified architecture, called SPACe, for secure, privacy-aware, and contextual multimedia systems in organizations. Many of the key architectural components that contribute to a unified platform already exist, including the classic data mining, security, and privacy-preserving components in conventional intelligent systems. After presenting an overview of our unified architectu...
Accessibility in the workplace and in academic settings has increased dramatically for users with disabilities, driven by greater awareness, legislative mandate, and technological improvements. Gaps, however, remain. For persons who are deaf and hard of hearing in particular, full participation requires complete access to audio materials, both for live settings and for prerecorded audio and visual...
The discriminative technique for estimating the parameters of Gaussian mixtures that is based on the extended Baum transformations (EB) has had significant impact on the speech recognition community. There appear to be no published proofs that definitively show that these transformations increase the value of an objective function with iteration (i.e., so-called "growth transformations"). The proo...
We describe extensions and improvements to IBM's system for automatic transcription of broadcast news. The speech recognizer uses a total of 160 hours of acoustic training data, 80 hours more than for the system described in Chen et al. (1998). In addition to improvements obtained in 1997 we made a number of changes and algorithmic enhancements. Among these were changing the acoustic vocabulary, r...