Ivo Anastácio

INESC-ID Lisboa and IST

Abstract:

This talk will present the participation of the DMIR/INESC-ID team in the Monolingual Knowledge Base Population (KBP) track at the 2011 Text Analysis Conference (TAC-KBP, http://nlp.cs.qc.cuny.edu/kbp/2011/), for which we developed a supervised learning approach to the task of linking named entities in English text to globally unique identifiers. Formally, the KBP problem consists of linking named entities (queries), together with the texts in which they occur, to the corresponding entries in a knowledge base (i.e., a subset of the English Wikipedia). If no such entry exists, systems are required to cluster together the queries that refer to the same non-KB (NIL) real-world entity. We modeled this problem through three supervised learning tasks, namely (a) ranking the candidate knowledge base disambiguations for each named entity, (b) classifying the top-ranked disambiguations as correct or not (i.e., finding the NIL queries), and (c) classifying pairs of queries whose disambiguations were estimated to be incorrect as referring to the same entity or not, so that the transitive closure of the pairs classified as referring to the same entity forms the set of equivalence classes (i.e., the NIL clusters). In the talk, we will detail the features and learning methods used to model the three tasks, and present an analysis of the results obtained.
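To illustrate step (c), here is a minimal sketch, not the team's actual system, of how the transitive closure of the pairwise decisions can be turned into NIL clusters with a union-find (disjoint-set) structure. It assumes the pairwise classifier has already produced the list of query pairs judged to refer to the same entity; the function name `nil_clusters` and the query IDs such as `EL01` are hypothetical.

```python
from collections import defaultdict


def nil_clusters(queries, same_entity_pairs):
    """Group NIL queries into clusters (equivalence classes).

    queries: iterable of query IDs estimated to be NIL (hypothetical IDs).
    same_entity_pairs: iterable of (a, b) pairs that an assumed pairwise
    classifier labelled as referring to the same entity.
    Returns a sorted list of sorted clusters.
    """
    # Every query starts in its own singleton cluster.
    parent = {q: q for q in queries}

    def find(x):
        # Walk up to the cluster root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Tolerate IDs that appear only in pairs by adding them on the fly.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Merging each positive pair yields the transitive closure:
    # if a~b and b~c, then a, b and c end up under the same root.
    for a, b in same_entity_pairs:
        union(a, b)

    groups = defaultdict(list)
    for q in list(parent):
        groups[find(q)].append(q)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    pairs = [("EL01", "EL02"), ("EL02", "EL03")]
    print(nil_clusters(["EL01", "EL02", "EL03", "EL04"], pairs))
```

Here `EL01` and `EL03` are never paired directly, yet they land in the same cluster because both are linked to `EL02`; `EL04`, with no positive pair, stays a singleton. Union-find is one common way to compute this closure in near-linear time; the abstract does not say which method the team used.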

 

Date: 2011-Oct-18     Time: 15:30:00     Room: 336


For more information: