Who spoke when
University of Ljubljana
The thesis addresses the problem of structuring audio data by speaker, i.e., finding the regions of an audio stream that belong to a single speaker and grouping all regions from the same speaker together. Organizing audio data in this way is known as speaker diarization, a task first introduced in the "Who spoke when" evaluations of the NIST Rich Transcription project. Speaker diarization is composed of several tasks. This thesis addresses three of them: speech/non-speech segmentation, speaker- and background-change detection, and speaker clustering.
The main objectives of our research were to develop new representations of audio data better suited to each task, and to improve the accuracy and robustness of standard approaches under various acoustic and environmental conditions. The work on improving existing methods and developing new procedures for speaker-diarization tasks is motivated by the design of a system for speaker-based audio indexing of broadcast news shows.
Date: 2006-Nov-23
Time: 15:00
Room: INESC-ID, 4th floor meeting room
INESC-ID ESR Talks – February 2023
If you are a master's or PhD student or a postdoctoral fellow, come and present your work in an informal and friendly environment, and savour some tasty snacks!
Individual talks will be 10-15 minutes plus time for feedback. Enroll on your selected date by emailing pedro.ferreira[at]inesc-id.pt.
Happening on the second Wednesday of every month (4pm-5pm):
- 15 February (Alves Redol, Room 9)
- 15 March (Alves Redol, Room 9)
- 12 April (Alves Redol, Room 9)
- 10 May (Alves Redol, Room 9)
- 14 June (Alves Redol, Room 9)
- 12 July (Alves Redol, Room 9)
We hope to see you there!