Speeding up information extraction using sub-optimal algorithms
Gonçalo Fernandes Simões,
INESC-ID Lisboa and IST –
Information Extraction (IE) proposes techniques for extracting, from unstructured text, the segments that are relevant in a given domain and representing them in a structured format. Most scientific proposals in IE so far aim at increasing the accuracy of the extraction results. However, existing IE techniques still have efficiency problems when processing large data volumes. IE optimization aims at executing IE processes as fast as possible with minimal or no impact on the accuracy of the results.
In this talk, we will first describe the state of the art in IE optimization. Then, we will present a novel approach for IE optimization. The key idea is to make IE programs faster by using sub-optimal extraction algorithms, which are typically fast but may produce some erroneous results or miss some of the results of traditional algorithms (thus having a negative impact on recall and precision). We propose a cost model that evaluates not only the expected execution time of a given IE execution plan but also the quality of the results produced, in terms of the expected number of good and bad tuples. Using this cost model, our solution can choose a fast execution plan that fulfills a set of objectives imposed by the user (e.g., minimum number of good tuples desired, minimum precision desired). Finally, we will report preliminary experimental results obtained with two data sets and three IE programs, which show the gains brought by our approach with respect to state-of-the-art solutions.
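The plan selection described in the abstract can be pictured with a small sketch. This is not the speaker's cost model: the class `PlanEstimate`, the function `choose_plan`, the plan names, and all numbers are hypothetical. It also assumes that expected precision can be approximated as expected good tuples divided by expected total tuples, which is a simplification made here for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PlanEstimate:
    """Hypothetical cost-model output for one candidate IE execution plan."""
    name: str
    expected_seconds: float  # estimated execution time
    expected_good: float     # expected number of correct tuples
    expected_bad: float      # expected number of erroneous tuples

    @property
    def expected_precision(self) -> float:
        # Simplifying assumption: precision ~ good / (good + bad) on expected counts.
        total = self.expected_good + self.expected_bad
        return self.expected_good / total if total > 0 else 0.0


def choose_plan(
    plans: Iterable[PlanEstimate],
    min_good: float,
    min_precision: float,
) -> Optional[PlanEstimate]:
    """Return the fastest plan meeting both user objectives, or None if none does."""
    feasible = [
        p for p in plans
        if p.expected_good >= min_good and p.expected_precision >= min_precision
    ]
    return min(feasible, key=lambda p: p.expected_seconds, default=None)


# Made-up estimates for three illustrative plans.
candidates = [
    PlanEstimate("exact", expected_seconds=120.0, expected_good=1000.0, expected_bad=50.0),
    PlanEstimate("approximate", expected_seconds=30.0, expected_good=900.0, expected_bad=100.0),
    PlanEstimate("aggressive", expected_seconds=10.0, expected_good=600.0, expected_bad=400.0),
]

best = choose_plan(candidates, min_good=800, min_precision=0.85)
```

With these made-up estimates, the "aggressive" plan is the fastest but fails both objectives, so the sketch falls back to the next fastest plan that satisfies them. When no plan qualifies, `choose_plan` returns `None` rather than silently relaxing the user's constraints.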
Date: 2011-Apr-15 Time: 16:00:00 Room: N7.1
Workshop “Metabolism and mathematical models: Two for a tango” – 2nd Edition
Dates: October 25-26, 2022
Location: Online (the workshop will be held virtually)
The topic of this workshop is metabolism in general, with a special, although not exclusive, focus on parasitology. Besides exploring the biological, biochemical and biomedical aspects, the workshop will also present some of the mathematical modelling, algorithmic theory and software development that have become crucial for exploring such aspects.
This workshop is being organised in the context of two projects, both involving the Inria European Team Erable. The first, the Inria Associated Team Capoeira, is a partnership with the University of São Paulo (USP), in São Paulo, Brazil, more specifically with its Institute of Mathematics and Statistics (IME) and Institute of Biomedical Sciences. The second, the H2020 Twinning Project Olissipo, involves INESC-ID/IST in Portugal, ETH in Zürich and EMBL in Heidelberg.
The workshop is open to all members of these two projects but also, importantly, to the community in general.
The program and more details are available here.