MAIA - PT2020/45909/20 - BI| 2022/272
Type of Position: Research Fellowship (Bolsa de Investigação)
Type of Contract: Research grant
Duration: 9 Months
Closed at: 2022-May-13
Chit-chat dialogue systems are currently trained end-to-end on large text corpora, resulting in pre-trained models that can be fine-tuned for various dialogue tasks. However, as several authors have noted, one of the bottlenecks of these systems is that they do not display a consistent profile/persona/personality. This characteristic is essential in task-oriented dialogues, such as customer support settings, where, for instance, the (in)formality of the conversation/bot should remain constant. We believe that improving the current state of the art in chit-chat dialogue systems requires more specifically annotated datasets, yet such datasets are scarce. Exceptions include the Persona-Chat dataset and subtitle datasets such as the Cornell Movie-Dialogs Corpus and the Friends dataset, which include basic profile information about the speakers. However, all these corpora exist only in English and cover a limited number of profiles/personas. Although not usually abundantly annotated, movie/series scripts contain information that could help improve chit-chat models: in movie scripts, each character line identifies the speaker, and the persona of that speaker is usually known. Most subtitle datasets, on the other hand, lack this information, but they exist in large quantities, cover many languages, and are publicly available. In this work, the candidate will explore how to extract and transfer profile/persona traits from movie/series script and subtitle datasets with the purpose of improving chit-chat dialogue systems. We will take advantage of deep learning models, but we will also resort to rule-based systems if needed. Moreover, we will build on recent studies that use latent action representations (VAEs, GANs, etc.) to capture persona features and speaker characteristics, and thus transfer this learned knowledge to other dialogue tasks.
Maria Luísa Torres Ribeiro Marques da Silva Coheur