Speeding up information extraction using sub-optimal algorithms
Gonçalo Fernandes Simões,
INESC-ID Lisboa and IST –
Abstract:
Information Extraction (IE) proposes techniques capable of extracting, from unstructured text, the segments that are relevant in a given domain and representing them in a structured format. Most scientific proposals in IE so far aim at increasing the accuracy of the extraction results. However, existing IE techniques still have efficiency problems when processing large data volumes. IE optimization aims at executing IE processes as fast as possible with minimal or no impact on the accuracy of the results.
In this talk, we will first describe the state of the art in IE optimization. Then, we will present a novel approach for IE optimization. The key idea is to make IE programs faster by using sub-optimal extraction algorithms, which are typically fast but may produce some erroneous results or miss some of the results produced by traditional algorithms (thus having a negative impact on precision and recall). We propose a cost model that estimates not only the expected execution time of a given IE execution plan but also the quality of the results produced, in terms of the expected number of good and bad tuples. Using this cost model, our solution chooses a fast execution plan that fulfills a set of objectives imposed by the user (e.g., minimum number of good tuples desired, minimum precision desired). Finally, we will report preliminary experimental results, obtained with two data sets and three IE programs, that show the gains brought by our approach with respect to state-of-the-art solutions.
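The plan-selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the speaker's implementation: the `PlanEstimate` class, the `choose_plan` function, the plan names, and all numbers are hypothetical, and the sketch assumes the cost model has already produced, for each candidate plan, an expected execution time and expected counts of good and bad tuples. Precision is approximated here as the ratio of those expected counts, which is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PlanEstimate:
    """Hypothetical cost-model output for one IE execution plan."""
    name: str
    expected_time_s: float  # estimated execution time, in seconds
    expected_good: float    # expected number of correct tuples
    expected_bad: float     # expected number of erroneous tuples

    @property
    def expected_precision(self) -> float:
        # Assumption: precision approximated from the expected counts.
        total = self.expected_good + self.expected_bad
        return self.expected_good / total if total > 0 else 0.0


def choose_plan(
    plans: Iterable[PlanEstimate],
    min_good: float = 0.0,
    min_precision: float = 0.0,
) -> Optional[PlanEstimate]:
    """Return the fastest plan meeting the user's objectives, or None."""
    feasible = [
        p for p in plans
        if p.expected_good >= min_good
        and p.expected_precision >= min_precision
    ]
    return min(feasible, key=lambda p: p.expected_time_s, default=None)


# Made-up estimates: one traditional plan and two faster, sub-optimal ones.
PLANS = [
    PlanEstimate("exact", 120.0, 1000.0, 50.0),
    PlanEstimate("approx_a", 40.0, 900.0, 100.0),
    PlanEstimate("approx_b", 15.0, 600.0, 200.0),
]

best = choose_plan(PLANS, min_good=800, min_precision=0.85)
print(best.name if best else "no plan satisfies the objectives")
```

With these illustrative numbers, the fastest plan is rejected because it falls short of both objectives, so the selection settles on the intermediate plan rather than the slower traditional one. When no plan meets the objectives, the function returns `None` instead of silently relaxing them.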
Date: 2011-Apr-15 Time: 16:00:00 Room: N7.1
Upcoming Events
INESC Brussels HUB Winter Meeting 2023

This edition of the HUB Winter Meeting will be co-organised with Science Business and will take place on 30 and 31 January, in Lisbon, at Instituto Superior Técnico, Department of Computer Science and Engineering.
A summary of the agenda is given below; it will be updated regularly on the INESC Brussels HUB website (confirmed speakers and other relevant information). Places for onsite participation are limited, so registration is mandatory. Online participants will be sent a Zoom link for each specific session on 27 January.
INESC Brussels HUB website: https://hub.inesc.pt/
Monday, 30 January
a) Digital Europe Programme & Chips Act: state of play and possibilities for INESC.
9h to 10h30 GMT
(Exclusive for INESC researchers and administrators).
b) Science Business: how INESC can tap into the Science Business network, activities and communication tools.
(Exclusive for INESC researchers and administrators).
c) Networking Lunch (for all onsite participants).
d) Roundtable: From rhetoric to reality – Embedding international strategy in the DNA of research organisations.
(Closed-door, roundtable workshop, Chatham House rules, open to INESC researchers and administrators, external participants by invitation only).
e) Networking Dinner
(By invitation only – INESC researchers participating onsite in the event are eligible to join).
Tuesday, 31 January
f) Workshop: How they did it? Strategic positioning for structural success in Horizon Europe: a discussion of best practices.
(Exclusive for INESC researchers, administrators and international invited speakers).
g) The public consultation on European R&I Programmes: Towards FP10.
(Closed-door, roundtable workshop, Chatham House rules, open to INESC researchers and administrators, external participants by invitation only).
h) Networking Lunch (for all onsite participants).
i) Management Committee meeting (Directors and POB members)
The HUB Winter Meeting aims at bringing together researchers and administrators from the 5 INESC institutes, affiliated higher education institutions in Portugal and abroad, with key European and global players, to:
– Discuss key research and innovation issues at EU level.
– Inform institutional policy and strategy.
– Exchange best practices about R&I management, career development and policy positioning.
– Promote, discuss and deliver vision, visibility, networking and impactful communication.
– Create, identify and deepen partnerships and collaboration opportunities for collaborative R&I.