João de Almeida Varela Graça

D – Departamento de Engenharia Informática, Instituto Superior Técnico, Universidade de Lisboa

Abstract:

Natural language processing (NLP) systems typically follow a pipeline architecture, in which several independently developed NLP tools, connected as a chain of filters, apply successive transformations to the data flowing through the system. When integrating such tools, one may therefore face problems that lead to information loss, such as: i) tools discard information from their input that other tools require; ii) each tool has its own input/output format.

This work proposes a solution to these problems based on a client-server architecture, in which the server acts as a blackboard where all tools add and consult data. The data is kept in the repository under a conceptual model that is independent of the client tools and can represent a broad range of linguistic information.

The tools interact with the repository through a generic remote interface that allows the creation of new data and navigation through all existing data. Moreover, this work provides libraries, implemented in several programming languages, that abstract the connection and communication protocol details between the NLP tools and the server and provide several levels of functionality that simplify the creation of NLP tools.
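
The blackboard idea can be illustrated with a minimal sketch. It is not the interface proposed in this work: the names Annotation, Repository, tokenizer and capitalization_tagger are hypothetical, the layer names are made up for the example, and an in-memory object stands in for the remote server. The point it tries to show is that each tool only adds data to the shared repository and reads what earlier tools left there, so nothing is discarded and the origin of every annotation stays traceable.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """One piece of linguistic information attached to a span of the text."""
    tool: str    # name of the producing tool, kept for data lineage
    layer: str   # illustrative layer name, e.g. "token" or "case"
    start: int   # character offsets into the source text
    end: int
    value: str


@dataclass
class Repository:
    """In-memory stand-in for the server-side blackboard (assumption)."""
    text: str
    annotations: list = field(default_factory=list)

    def add(self, annotation):
        # Tools only append, so earlier information is never overwritten.
        self.annotations.append(annotation)

    def query(self, layer):
        # Returns a new list, so callers may add data while iterating over it.
        return [a for a in self.annotations if a.layer == layer]


def tokenizer(repo):
    """Hypothetical tool: adds one "token" annotation per whitespace-separated word."""
    position = 0
    for word in repo.text.split():
        start = repo.text.index(word, position)
        end = start + len(word)
        repo.add(Annotation("tokenizer", "token", start, end, word))
        position = end


def capitalization_tagger(repo):
    """Hypothetical tool: reads the token layer and adds a "case" layer."""
    for token in repo.query("token"):
        label = "capitalized" if token.value[0].isupper() else "lowercase"
        repo.add(Annotation("capitalization_tagger", "case",
                            token.start, token.end, label))


repo = Repository("Lisbon hosts the seminar")
tokenizer(repo)
capitalization_tagger(repo)
```

After both tools have run, the token layer is still available next to the new layer, and each annotation records which tool produced it. A real deployment would replace the direct method calls with remote calls made through a client library.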

Keywords: Natural Language processing systems, Natural Language tools integration, Repository, Linguistic Annotation, Data lineage, Information loss.

 

Date: 2006-Apr-26     Time: 14:30:00     Room: ANFITEATRO PA-3 DO EDIFÍCIO DE PÓS-GRADUAÇÃO DO IST

