Sentence End and Punctuation Prediction in NLG Text at SwissText2021

A group of researchers from INESC-ID, in partnership with Unbabel, is one of the winning teams of the Shared Task “Sentence End and Punctuation Prediction” at SwissText2021.

“This was a very competitive [task] and in the end we are really pleased to be one of the winning teams! For our team this was the icing on the cake, after 1 year of collaboration on the topic of Automatic Rich Transcription (ART)”, said Ricardo Rei, one of the INESC-ID researchers involved.

Seven teams participated, with the “Unbabel-INESC-ID” team sharing first place with two other teams.

The goal of the shared task was to build models that identify the end of a sentence by finding the right position for the right punctuation mark. The task comprised two subtasks:

Subtask 1 (fully unpunctuated sentences, full stop detection): Given the textual content of an utterance where the full stops are fully removed, correctly detect the end of sentences by placing a full stop in appropriate positions.

Subtask 2 (fully unpunctuated sentences, full punctuation marks): Given the textual content of an utterance where all punctuation marks are fully removed, correctly predict all punctuation marks.
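To make the two subtasks concrete, the sketch below turns a punctuated sentence into the kind of token and label pairs a model could be trained on, and then restores the punctuation from those labels. The label names, the set of punctuation marks, and the helper names `to_labels` and `restore` are assumptions made for illustration; they are not the official data format of the shared task or the method used by any participating team.

```python
import re

# Assumed label set for illustration; the official task format may differ.
PUNCT = {".": "PERIOD", ",": "COMMA", "?": "QUESTION", "!": "EXCLAMATION"}
INVERSE = {label: mark for mark, label in PUNCT.items()}


def to_labels(text, marks):
    """Strip punctuation from text and label each token with the mark
    that followed it ("O" if none, or if the mark is not in `marks`)."""
    tokens, labels = [], []
    # Split into runs of non-punctuation characters and single punctuation marks.
    for piece in re.findall(r"[^\s.,?!]+|[.,?!]", text):
        if piece in PUNCT:
            # Attach the mark to the preceding token, if we are tracking it.
            if tokens and piece in marks and labels[-1] == "O":
                labels[-1] = PUNCT[piece]
        else:
            tokens.append(piece)
            labels.append("O")
    return tokens, labels


def restore(tokens, labels):
    """Rebuild punctuated text from tokens and their predicted labels."""
    return " ".join(tok + INVERSE.get(lab, "") for tok, lab in zip(tokens, labels))


sentence = "Hello, how are you? I am fine."

# Subtask 1: only full stops are predicted.
tokens, stop_labels = to_labels(sentence, {"."})

# Subtask 2: every punctuation mark in the assumed set is predicted.
_, all_labels = to_labels(sentence, set(PUNCT))

print(list(zip(tokens, stop_labels)))
print(list(zip(tokens, all_labels)))
print(restore(tokens, all_labels))
```

Framing the problem as one label per token is a common choice for this kind of task, because a sequence tagger can then predict the labels and a function like `restore` can reinsert the marks.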

Because the results of the top three systems were extremely close, the evaluators decided to announce all three as joint winners of the shared task.

“We tried to look at the results from different angles, but couldn’t find any decisive criteria to select a single winner based on the scores. Congratulations to the HTW+t2k, Onpoint, and Unbabel-INESC-ID teams for their fine works”, said the organizers, Don Tuggener and Ahmad Aghaebrahimian.

The Unbabel-INESC-ID team is made up of researchers Ricardo Rei, Nuno Guerreiro, Luisa Coheur and Fernando Batista.
