A Deep Neural Network Approach to Speech Enhancement
Chin-Hui Lee, Georgia Institute of Technology
In contrast to conventional minimum mean square error (MMSE)-based noise reduction techniques, we formulate speech enhancement as finding a mapping function between noisy and clean speech signals. To handle a wide range of additive noises in real-world situations, a large training set encompassing many possible combinations of speech and noise types is first designed. Next, a deep neural network (DNN) architecture is employed as a nonlinear regression function to ensure a powerful modeling capability. Several techniques have also been adopted to improve the DNN-based speech enhancement system, including global variance equalization to alleviate the over-smoothing problem of the regression model, and dropout and noise-aware training strategies to further improve the generalization capability of DNNs to unseen noise conditions. Experimental results demonstrate that the proposed framework achieves significant improvements in both objective and subjective measures over the MMSE-based techniques. The proposed DNN approach also suppresses highly non-stationary noise well, which is generally difficult to handle. Furthermore, the resulting DNN model, trained with artificially synthesized data, is also effective on noisy speech recorded in real-world scenarios, without generating the annoying musical artifacts commonly observed in conventional enhancement methods.
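The abstract names global variance equalization and noise-aware training without giving formulas. As a rough illustration only, the Python sketch below shows one plausible form of each, operating on frames of spectral features stored as plain lists. The function names, the pooled (dimension-independent) variance, the rescaling around the mean, and the default of 6 leading frames for the noise estimate are all assumptions made for this example, not details confirmed by the talk.

```python
from math import sqrt
from statistics import fmean, pvariance


def global_variance(frames):
    """Variance pooled over every frame and every feature dimension."""
    values = [v for frame in frames for v in frame]
    return pvariance(values)


def equalize_global_variance(estimated, reference_gv):
    """Rescale estimated features around their mean so that their pooled
    variance matches reference_gv (assumed to come from clean speech).

    Hypothetical helper: the scale factor is sqrt(reference / estimated).
    Raises ZeroDivisionError if the estimate has zero variance.
    """
    values = [v for frame in estimated for v in frame]
    mean = fmean(values)
    beta = sqrt(reference_gv / pvariance(values))
    return [[mean + beta * (v - mean) for v in frame] for frame in estimated]


def noise_aware_input(noisy_frames, n_noise_frames=6):
    """Append a crude noise estimate to every input frame.

    Assumption: the first n_noise_frames frames contain noise only, so their
    per-dimension mean serves as the noise estimate.
    """
    head = noisy_frames[:n_noise_frames]
    noise = [fmean(column) for column in zip(*head)]
    return [list(frame) + noise for frame in noisy_frames]
```

In this sketch, an over-smoothed regression output has a smaller pooled variance than clean speech, so the scale factor exceeds 1 and stretches the estimate back toward the reference variance. The noise-aware input simply widens each frame, giving the network an explicit hint about the noise condition.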
[ Bio ]
Chin-Hui Lee is a professor at the School of Electrical and Computer Engineering, Georgia Institute of Technology. Before joining academia in 2001, he had 20 years of industrial experience, ending at Bell Laboratories, Murray Hill, New Jersey, as a Distinguished Member of Technical Staff and Director of the Dialogue Systems Research Department. Dr. Lee is a Fellow of the IEEE and a Fellow of ISCA. He has published over 400 papers and 30 patents, and his original contributions are highly cited, with an h-index of 66. He has received numerous awards, including the Bell Labs President’s Gold Award in 1998. He won the IEEE Signal Processing Society’s 2006 Technical Achievement Award for “Exceptional Contributions to the Field of Automatic Speech Recognition”. In 2012 he was invited by ICASSP to give a plenary talk on the future of speech recognition. In the same year he was awarded the ISCA Medal for Scientific Achievement for “pioneering and seminal contributions to the principles and practice of automatic speech and speaker recognition”.
Date: 2014-May-26 Time: 15:30:00 Room: QA1.2 (IST Alameda)