DOTTED SUFFIX TREES: A STRUCTURE FOR APPROXIMATE TEXT INDEXING
Luis Coelho,
D – Departamento de Engenharia Informática, Instituto Superior Técnico, Universidade de Lisboa –
Abstract:
The problem we address is text indexing for approximate matching. We consider that we are given a text T which undergoes some preprocessing to generate an index. We can later query this index to identify the places where a string occurs up to a certain number of allowed errors k, where by error we mean the substitution, deletion or replacement of one character (edition or Levenstein distance).
We present a structure for indexing which occupies space O(n log^k n) in the average case, independent of alphabet size, n being the text size. This structure can be used to report the existence of a match with k errors in O (3k mk+1 ) and to report the occurrences in O(3^k m^{k+1} + ed) time, where m is the length of the pattern and where ed the number of matching edit scripts. These bounds are independent of alphabet size. The construction of the structure has time bound by O (k N |S|), where N is the number of nodes in the index and |S| the alphabet size.
Date: 2006-Jul-06 Time: 16:30:00 Room: 336
For more information:
Upcoming Events
INESC-ID ESR Talks – February 2023

If you are a masters/PhD student or a postdoctoral fellow, come and present your work in an informal and friendly environment – and savour some tasty snacks!
Individual talks will be 10-15 minutes plus time for feedback. Enroll on your selected date by emailing pedro.ferreira[at]inesc-id.pt.
Happening on the second Wednesday of every month (4pm-5pm):
- 15 February (Alves Redol, Room 9)
- 15 March (Alves Redol, Room 9)
- 12 April (Alves Redol, Room 9)
- 10 May (Alves Redol, Room 9)
- 14 June (Alves Redol, Room 9)
- 12 July (Alves Redol, Room 9)
We hope to see you there!