Speech Translation Advanced Research to and from Portuguese (PT-STAR)

Type: National Project

Duration: from 2009 May 01 to 2012 Jul 31

Financed by: FCT Carnegie Mellon

Prime Contractor: R - INESC-ID Lisboa (Other) - Lisboa, Portugal

Each year, European institutions spend more than a billion euros translating documents and interpreting speeches. In addition, about half of Europeans speak only their own language. These two facts alone are a strong motivation for fostering Speech-to-Speech Machine Translation (S2SMT) technologies, which aim to enable natural language communication between people who do not share a language. S2SMT can be seen as a cascade of three major components: Automatic Speech Recognition, Machine Translation, and Text-to-Speech Synthesis. One of the main problems of this multidisciplinary area, however, is the still weak integration between the three components.

The main goal of PT-STAR (Speech Translation Advanced Research to and from Portuguese) is to improve speech translation systems for Portuguese by strengthening this integration. The project addresses several problems, such as spontaneous speech translation, for which the performance of the automatic speech recognizer seriously degrades, and voice conversion, which allows the synthesized speech to retain the characteristics of the original voice. It also addresses several major problems in statistical machine translation, for instance the study of different methods to automatically extract bilingual lexicons from non-aligned parallel corpora and to update the translation model. Finally, PT-STAR targets the implementation of a proof-of-concept prototype.

PT-STAR involves, on the CMU side, the Language Technologies Institute (LTI) and, on the Portuguese side, a consortium of universities and research centers: the Spoken Language Systems Lab (L2F) of INESC-ID Lisboa, the Center of Linguistics of the University of Lisbon (CLUL), and the University of Beira Interior (UBI). Additionally, a third language (Chinese) will be the target of a PhD thesis on machine translation at the University of Macau. The informal cooperation of this university within the framework of the current proposal will therefore help broaden the project's scope to encompass typologically different languages.

Partnerships

  • Fundação da Universidade de Lisboa - CLUL (Other) - Lisbon, Portugal
  • R - INESC-ID Lisboa (Other) - Lisboa, Portugal
  • U - Carnegie Mellon University (University) - Pittsburgh, PA, USA
  • Universidade da Beira Interior (University) - Covilhã, Portugal

Principal Investigators

Members