Machine Translation for Microblogs - Tradução Automática para Microblogues (M4TM)

Type: National Project

Duration: from 2015 Jan 01 to 2015 Dec 01

Financed by: CMUP-EPB/TIC/0026/2013

Prime Contractor: INESC-ID (Other)

Our proposed work will improve machine translation (MT) in the microblog domain, helping to break down the linguistic barriers that exist in this increasingly important variety of content. As a starting point, we use a phrase-based normalizer that our team previously developed and showed to be useful in microblog translation. We seek to improve this already state-of-the-art baseline by adding structured abstraction to the normalizer model and by integrating it effectively into the MT module through joint optimization of the normalization and translation models. Improvements will be measured through the number of post-edits made by humans enrolled in a crowd translation platform. The post-edit data will also serve as a source of future training data.

Partnerships

  • INESC-ID (Other)
  • Unbabel (Company) - Lisboa, Portugal

Principal Investigators

Members