Faculdade de Ciências da Universidade de Lisboa
This seminar will discuss the application of text mining to automate the identification
of the function of large sets of genes from the biomedical literature. An approach
will be presented that captures this knowledge as annotations associating biological entities
with Gene Ontology terms. This approach was validated by building the APEG
(Arabidopsis Pollen Expressed Genes) database system, which integrates information
about 147 Arabidopsis thaliana genes selectively expressed in pollen, drawn from various public
databases available on the Web. APEG operates with ProFAL, a text mining and
automatic database annotation tool. The effectiveness of the automatic annotation
of the genes was evaluated by comparing the set of annotations discovered by
ProFAL with those obtained by domain experts scanning the same literature. Functional
annotations were extracted with an average precision and recall of 61% and 78%, respectively.
ProFAL has also identified 21 probable functions for 8 genes, which, to
the best of my knowledge, have not been documented. The validation of the proposed
approach was done using an interactive web interface with curator-specific features.
The results show that mining the biomedical literature can effectively increase our
knowledge about a set of genes or proteins of interest, leading to more conclusive
answers to the underlying biological problems.
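
The evaluation described above compares automatically discovered annotations with those produced by expert curators. A minimal sketch of how precision and recall could be computed for such a comparison is given below. It is not ProFAL's code: the function name, the gene names, and the term labels are hypothetical placeholders, and the abstract does not state how the reported averages were taken, so the sketch scores a single pooled set of (gene, term) pairs.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) of predicted annotations against a curated set.

    Both arguments are iterables of (gene, term) pairs. An empty
    denominator yields 0.0 rather than raising ZeroDivisionError.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical annotations, for illustration only (not data from APEG).
tool_annotations = {
    ("geneA", "TERM_1"),
    ("geneA", "TERM_2"),
    ("geneB", "TERM_3"),
    ("geneC", "TERM_9"),  # not confirmed by the curators
}
curated_annotations = {
    ("geneA", "TERM_1"),
    ("geneA", "TERM_2"),
    ("geneB", "TERM_3"),
    ("geneB", "TERM_4"),  # missed by the tool
    ("geneC", "TERM_5"),  # missed by the tool
}

precision, recall = precision_recall(tool_annotations, curated_annotations)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case three of the four tool annotations are confirmed by the curators (precision 0.75) and three of the five curated annotations are recovered (recall 0.60). Averaging per gene instead of pooling all pairs would be an equally plausible reading of the reported figures.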
For more information: