Andrew Senior - IEEE Xplore Author Profile


Author details

Andrew Senior

Also published under: A. Senior, A. W. Senior

Publications

51

Citations

11,879

Publications by Year

1993-2022

Co-Authors:

Triantafyllos Afouras
N. Ahuja
F. Almenarez
Michiel Bacchiani
S. Basu

Show All Co-Authors (105)

Affiliation

Google DeepMind, London, United Kingdom

Publication Topics

Deep Neural Network,
Speech Recognition,
Word Error Rate,
Acoustic Model,
Input Sequence,
Neural Network,
Speech Recognition Systems,
Acoustic Data,
Attention Mechanism,
Audio Input,
Audio Stream,
Automatic Speech Recognition System

Biography

Andrew Senior received the Ph.D. degree from Cambridge University, Cambridge, U.K., for his thesis “Recurrent Neural Networks for Offline Cursive Handwriting Recognition.” He is currently a Research Scientist in deep learning at Google DeepMind, London. He has worked on research into deep and recurrent neural networks for acoustic modeling in Google's speech recognition system. Before joining Google, he worked at IBM Research in the areas of handwriting, audio-visual speech, face, and fingerprint recognition, as well as video privacy protection and visual tracking. He has taught at Columbia University, written more than 100 papers, and holds 49 patents. (Based on a document published on 20 February 2017.)

Author's Published Works

Showing 1-25 of 51 results

Conferences (40)

Journals (6)

Magazines (5)

Results

Deep Audio-Visual Speech Recognition

Triantafyllos Afouras;Joon Son Chung;Andrew Senior;Oriol Vinyals;Andrew Zisserman

IEEE Transactions on Pattern Analysis and Machine Intelligence

Year: 2022 | Volume: 44, Issue: 12 | Journal Article

Cited by: Papers (305)

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio. Unlike previous works that have focussed on recognising a limited number of words or phrases, we tackle lip reading as an open-world problem – unconstrained natural language sentences, and in the wild videos. Our key contributions are: (1) we compare two models for lip reading, on...

Lip Reading Sentences in the Wild

Joon Son Chung;Andrew Senior;Oriol Vinyals;Andrew Zisserman

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Year: 2017 | Conference Paper

Cited by: Papers (439) | Patents (3)

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio. Unlike previous works that have focussed on recognising a limited number of words or phrases, we tackle lip reading as an open-world problem – unconstrained natural language sentences, and in the wild videos. Our key contributions are: (1) a Watch, Listen, Attend and Spell (WLAS) ...

Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition

Tara N. Sainath;Ron J. Weiss;Kevin W. Wilson;Bo Li;Arun Narayanan;Ehsan Variani;Michiel Bacchiani;Izhak Shafran;Andrew Senior;Kean Chin;Ananya Misra;Chanwoo Kim

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2017 | Volume: 25, Issue: 5 | Journal Article

Cited by: Papers (163)

Multichannel automatic speech recognition (ASR) systems commonly separate speech enhancement, including localization, beamforming, and postfiltering, from acoustic modeling. In this paper, we perform multichannel enhancement jointly with acoustic modeling in a deep neural network framework. Inspired by beamforming, which leverages differences in the fine time structure of the signal at different m...

Flat start training of CD-CTC-SMBR LSTM RNN acoustic models

Kanishka Rao;Andrew Senior;Haşim Sak

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Year: 2016 | Conference Paper

Cited by: Papers (7)

We present a recipe for training acoustic models with context dependent (CD) phones from scratch using recurrent neural networks (RNNs). First, we use the connectionist temporal classification (CTC) technique to train a model with context independent (CI) phones directly from the written-domain word transcripts by aligning with all possible phonetic verbalizations. Then, we devise a mechanism to g...

Acoustic modelling with CD-CTC-SMBR LSTM RNNS

Andrew Senior;Haşim Sak;Félix de Chaumont Quitry;Tara Sainath;Kanishka Rao

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

Year: 2015 | Conference Paper

Cited by: Papers (13) | Patents (3)

This paper describes a series of experiments to extend the application of Context-Dependent (CD) long short-term memory (LSTM) recurrent neural networks (RNNs) trained with Connectionist Temporal Classification (CTC) and sMBR loss. Our experiments, on a noisy, reverberant voice search task, include training with alternative pronunciations and the application to child speech recognition; combinatio...

Speaker location and microphone spacing invariant acoustic modeling from raw multichannel waveforms

Tara N. Sainath;Ron J. Weiss;Kevin W. Wilson;Arun Narayanan;Michiel Bacchiani;Andrew Senior

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

Year: 2015 | Conference Paper

Cited by: Papers (31) | Patents (11)

Multichannel ASR systems commonly use separate modules to perform speech enhancement and acoustic modeling. In this paper, we present an algorithm to do multichannel enhancement jointly with the acoustic model, using a raw waveform convolutional LSTM deep neural network (CLDNN). We will show that our proposed method offers ~5% relative improvement in WER over a log-mel CLDNN trained on multiple ch...

Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks

Tara N. Sainath;Oriol Vinyals;Andrew Senior;Haşim Sak

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Year: 2015 | Conference Paper

Cited by: Papers (920) | Patents (44)

Both Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) have shown improvements over Deep Neural Networks (DNNs) across a wide variety of speech recognition tasks. CNNs, LSTMs and DNNs are complementary in their modeling capabilities, as CNNs are good at reducing frequency variations, LSTMs are good at temporal modeling, and DNNs are appropriate for mapping features to a more s...

Learning acoustic frame labeling for speech recognition with recurrent neural networks

Haşim Sak;Andrew Senior;Kanishka Rao;Ozan İrsoy;Alex Graves;Françoise Beaufays;Johan Schalkwyk

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Year: 2015 | Conference Paper

Cited by: Papers (104) | Patents (19)

We explore alternative acoustic modeling techniques for large vocabulary speech recognition using Long Short-Term Memory recurrent neural networks. For an acoustic frame labeling task, we compare the conventional approach of cross-entropy (CE) training using fixed forced-alignments of frames and labels, with the Connectionist Temporal Classification (CTC) method proposed for labeling unsegmented s...

Context dependent phone models for LSTM RNN acoustic modelling

Andrew Senior;Haşim Sak;Izhak Shafran

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Year: 2015 | Conference Paper

Cited by: Papers (25) | Patents (9)

Long Short Term Memory Recurrent Neural Networks (LSTM RNNs), combined with hidden Markov models (HMMs), have recently been shown to outperform other acoustic models such as Gaussian mixture models (GMMs) and deep neural networks (DNNs) for large scale speech recognition. We argue that using multi-state HMMs with LSTM RNN acoustic models is an unnecessary vestige of GMM-HMM and DNN-HMM modelling si...

Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends

Zhen-Hua Ling;Shi-Yin Kang;Heiga Zen;Andrew Senior;Mike Schuster;Xiao-Jun Qian;Helen M. Meng;Li Deng

IEEE Signal Processing Magazine

Year: 2015 | Volume: 32, Issue: 3 | Magazine Article

Cited by: Papers (166) | Patents (1)

Hidden Markov models (HMMs) and Gaussian mixture models (GMMs) are the two most common types of acoustic models used in statistical parametric approaches for generating low-level speech waveforms from high-level symbolic inputs via intermediate acoustic feature sequences. However, these models have their limitations in representing complex, nonlinear relationships between the speech generation inp...

A Real-Time End-to-End Multilingual Speech Recognition Architecture

Javier Gonzalez-Dominguez;David Eustis;Ignacio Lopez-Moreno;Andrew Senior;Françoise Beaufays;Pedro J. Moreno

IEEE Journal of Selected Topics in Signal Processing

Year: 2015 | Volume: 9, Issue: 4 | Journal Article

Cited by: Papers (31)

Automatic speech recognition (ASR) systems are used daily by millions of people worldwide to dictate messages, control devices, initiate searches or to facilitate data input in small devices. The user experience in these scenarios depends on the quality of the speech transcriptions and on the responsiveness of the system. For multilingual users, a further obstacle to natural interaction is the mon...

Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis

Heiga Zen;Andrew Senior

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Year: 2014 | Conference Paper

Cited by: Papers (101)

Statistical parametric speech synthesis (SPSS) using deep neural networks (DNNs) has shown its potential to produce naturally-sounding synthesized speech. However, there are limitations in the current implementation of DNN-based acoustic modeling for speech synthesis, such as the unimodal nature of its objective function and its lack of ability to predict variances. To address these limitations, t...

Fine context, low-rank, softplus deep neural networks for mobile speech recognition

Andrew Senior;Xin Lei

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Year: 2014 | Conference Paper

Cited by: Papers (11) | Patents (1)

We investigate the use of large state inventories and the softplus nonlinearity for on-device neural network based mobile speech recognition. Large state inventories are achieved by less aggressive context-dependent state tying, and made possible by using a bottleneck layer to contain the number of parameters. We investigate alternative approaches to the bottleneck layer, demonstrate the superiori...

GMM-free DNN acoustic model training

Andrew Senior;Georg Heigold;Michiel Bacchiani;Hank Liao

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Year: 2014 | Conference Paper

Cited by: Papers (15) | Patents (6)

While deep neural networks (DNNs) have become the dominant acoustic model (AM) for speech recognition systems, they are still dependent on Gaussian mixture models (GMMs) for alignments both for supervised training and for context dependent (CD) tree building. Here we explore bootstrapping DNN AM training without GMM AMs and show that CD trees can be built with DNN alignments which are better match...

Improving DNN speaker independence with I-vector inputs

Andrew Senior;Ignacio Lopez-Moreno

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Year: 2014 | Conference Paper

Cited by: Papers (98) | Patents (17)

We propose providing additional utterance-level features as inputs to a deep neural network (DNN) to facilitate speaker, channel and background normalization. Modifications of the basic algorithm are developed which result in significant reductions in word error rates (WERs). The algorithms are shown to combine well with speaker adaptation by backpropagation, resulting in a 9% relative WER reducti...

Asynchronous stochastic optimization for sequence training of deep neural networks

Georg Heigold;Erik McDermott;Vincent Vanhoucke;Andrew Senior;Michiel Bacchiani

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Year: 2014 | Conference Paper

Cited by: Papers (37) | Patents (13)

This paper explores asynchronous stochastic optimization for sequence training of deep neural networks. Sequence training requires more computation than frame-level training using pre-computed frame data. This leads to several complications for stochastic optimization, arising from significant asynchrony in model updates under massive parallelization, and limited data shuffling due to utterance-ch...

Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription

Hank Liao;Erik McDermott;Andrew Senior

2013 IEEE Workshop on Automatic Speech Recognition and Understanding

Year: 2013 | Conference Paper

Cited by: Papers (94) | Patents (9)

YouTube is a highly visited video sharing website where over one billion people watch six billion hours of video every month. Improving accessibility to these videos for the hearing impaired and for search and indexing purposes is an excellent application of automatic speech recognition. However, YouTube videos are extremely challenging for automatic speech recognition systems. Standard adapted Ga...

Multilingual acoustic models using distributed deep neural networks

G. Heigold;V. Vanhoucke;A. Senior;P. Nguyen;M. Ranzato;M. Devin;J. Dean

2013 IEEE International Conference on Acoustics, Speech and Signal Processing

Year: 2013 | Conference Paper

Cited by: Papers (137) | Patents (20)

Today's speech recognition technology is mature enough to be useful for many practical applications. In this context, it is of paramount importance to train accurate acoustic models for many languages within given resource constraints such as data, processing power, and time. Multilingual training has the potential to solve the data issue and close the performance gap between resource-rich and res...

An empirical study of learning rates in deep neural networks for speech recognition

Andrew Senior;Georg Heigold;Marc'Aurelio Ranzato;Ke Yang

2013 IEEE International Conference on Acoustics, Speech and Signal Processing

Year: 2013 | Conference Paper

Cited by: Papers (93) | Patents (2)

Recent deep neural network systems for large vocabulary speech recognition are trained with minibatch stochastic gradient descent but use a variety of learning rate scheduling schemes. We investigate several of these schemes, particularly AdaGrad. Based on our analysis of its limitations, we propose a new variant `AdaDec' that decouples long-term learning-rate scheduling from per-parameter learnin...

On rectified linear units for speech processing

M.D. Zeiler;M. Ranzato;R. Monga;M. Mao;K. Yang;Q.V. Le;P. Nguyen;A. Senior;V. Vanhoucke;J. Dean;G.E. Hinton

2013 IEEE International Conference on Acoustics, Speech and Signal Processing

Year: 2013 | Conference Paper

Cited by: Papers (277) | Patents (31)

Deep neural networks have recently become the gold standard for acoustic modeling in speech recognition systems. The key computational unit of a deep network is a linear projection followed by a point-wise non-linearity, which is typically a logistic function. In this work, we show that we can improve generalization and make training of deep networks faster and simpler by substituting the logistic...

Statistical parametric speech synthesis using deep neural networks

Heiga Zen;Andrew Senior;Mike Schuster

2013 IEEE International Conference on Acoustics, Speech and Signal Processing

Year: 2013 | Conference Paper

Cited by: Papers (428) | Patents (15)

Conventional approaches to statistical parametric speech synthesis typically use decision tree-clustered context-dependent hidden Markov models (HMMs) to represent probability densities of speech parameters given texts. Speech parameters are generated from the probability densities to maximize their output probabilities, then a speech waveform is reconstructed from the generated parameters. This a...

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

Geoffrey Hinton;Li Deng;Dong Yu;George E. Dahl;Abdel-rahman Mohamed;Navdeep Jaitly;Andrew Senior;Vincent Vanhoucke;Patrick Nguyen;Tara N. Sainath;Brian Kingsbury

IEEE Signal Processing Magazine

Year: 2012 | Volume: 29, Issue: 6 | Magazine Article

Cited by: Papers (7022) | Patents (159)

Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of...

Learning improved linear transforms for speech recognition

Andrew Senior;Youngmin Cho;Jason Weston

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Year: 2012 | Conference Paper

Cited by: Papers (1)

This paper explores a novel large margin approach to learning a linear transform for dimensionality reduction in speech recognition. The method assumes a trained Gaussian mixture model for each class to be discriminated and trains a dimensionality-reducing linear transform with respect to the fixed model, optimizing a hinge loss on the difference between the distance to the nearest in- and out-of-...

Translation-Inspired OCR

Dmitriy Genzel;Ashok C. Popat;Nemanja Spasojevic;Michael Jahr;Andrew Senior;Eugene Ie;Frank Yung-Fong Tang

2011 International Conference on Document Analysis and Recognition

Year: 2011 | Conference Paper

Cited by: Papers (3) | Patents (1)

Optical character recognition is carried out using techniques borrowed from statistical machine translation. In particular, the use of multiple simple feature functions in linear combination, along with minimum-error-rate training, integrated decoding, and N-gram language modeling is found to be remarkably effective, across several scripts and languages. Results are presented using both synthetic ...

Privacy enablement in a surveillance system

2008 15th IEEE International Conference on Image Processing

Year: 2008 | Conference Paper

Cited by: Papers (7) | Patents (1)

This paper presents mechanisms for privacy protection in a distributed, multicamera surveillance system. The design choices and alternatives for providing privacy protection while delivering meaningful surveillance data for security and retail environments are described, followed by performance metrics to evaluate the effectiveness of privacy protection measures and experiments to evaluate these i...
