Luis Coelho,

R – INESC-ID Lisboa

Abstract:

The problem we address is text indexing for approximate matching. We consider that we are given a text T which undergoes some preprocessing to generate an index. We can later query this index to identify the places where a string occurs up to a certain number of allowed substitutions k. We present a structure for indexing which occupies space O(n log^k n) in the average case, independent of alphabet size, n being the text size. This structure support searching in O(m^{k+1}) time, for patterns of size m, again independent of alphabet size. The construction of the structure has time bound by O(kN|S|), where N is the number of nodes in the index and |S| the alphabet size.

 

Date: 2005-Nov-18     Time: 14:30:00     Room: 336


For more information: