A Sequential Method for Discovering Probabilistic Motifs in Proteins

K. Blekas; D. I. Fotiadis; A. Likas

doi:10.1055/s-0038-1633414

Methods Inf Med 2004; 43(01): 9-12

Original Article

Schattauer GmbH

Affiliation

All authors: Department of Computer Science, University of Ioannina, and Biomedical Research Institute, Foundation for Research and Technology – Hellas, Ioannina, Greece

Publication History

Publication Date: 07 February 2018 (online)

Summary

Objectives: This paper proposes a greedy algorithm that learns a mixture-of-motifs model by likelihood maximization in order to discover motifs, i.e. common substrings shared by a given collection of related biosequences.
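
As a rough illustration of what a mixture-of-motifs model and its likelihood maximization involve, the sketch below scores every length-W window of the input sequences under a finite mixture of a fixed background distribution and position-specific motif profiles, and performs one EM iteration. The representation (one dict per profile column), the fixed background, the function names and the `pseudo` smoothing parameter are assumptions made for this example, not details taken from the paper.

```python
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def windows(seqs, w):
    """All overlapping length-w substrings (candidate motif occurrences)."""
    return [s[i:i + w] for s in seqs for i in range(len(s) - w + 1)]


def prob(x, comp):
    """p(x | component): a dict is a position-independent background,
    a list of dicts is a motif profile with one distribution per column."""
    if isinstance(comp, dict):
        return math.prod(comp[a] for a in x)
    return math.prod(col[a] for col, a in zip(comp, x))


def log_likelihood(subs, weights, comps):
    """L = sum_i log sum_j pi_j p(x_i | theta_j) over all windows x_i."""
    return sum(math.log(sum(p * prob(x, c) for p, c in zip(weights, comps)))
               for x in subs)


def em_step(subs, weights, comps, pseudo=0.1):
    """One EM iteration. comps[0] is the background and is kept fixed;
    the mixing weights and the motif profiles comps[1:] are re-estimated.
    pseudo > 0 adds pseudocounts; pseudo = 0 gives plain maximum likelihood."""
    # E-step: posterior probability that window x came from component j.
    resp = []
    for x in subs:
        joint = [p * prob(x, c) for p, c in zip(weights, comps)]
        total = sum(joint)
        resp.append([v / total for v in joint])
    # M-step: mixing weights are the average responsibilities ...
    new_weights = [sum(r[j] for r in resp) / len(subs) for j in range(len(comps))]
    # ... and each profile column is a responsibility-weighted residue frequency.
    new_comps = [comps[0]]
    for j in range(1, len(comps)):
        cols = []
        for k in range(len(subs[0])):
            counts = {a: pseudo for a in AMINO}
            for x, r in zip(subs, resp):
                counts[x[k]] += r[j]
            total = sum(counts.values())
            cols.append({a: c / total for a, c in counts.items()})
        new_comps.append(cols)
    return new_weights, new_comps


# Toy data: three sequences sharing the substring WWHC, motif width 4.
seqs = ["GGWWHCGG", "AAWWHCAA", "TTWWHCTT"]
subs = windows(seqs, 4)
background = {a: 1 / 20 for a in AMINO}
motif = [{a: (0.5 if a == c else 0.5 / 19) for a in AMINO} for c in "WWHC"]
weights, comps = [0.9, 0.1], [background, motif]
```

With `pseudo=0.0` the M-step is an exact maximization over the free parameters, so repeated calls to `em_step` never decrease `log_likelihood`; the pseudocount variant trades that guarantee for protection against zero probabilities.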

Methods: The approach adds motif components to the mixture model one at a time, initializing the parameters of each new component through a combination of global and local search. A hierarchical clustering step, applied beforehand, identifies candidate motif models and thereby speeds up the global search.
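
The sequential scheme can be sketched as follows on the same window-based representation: a background-only model is grown one motif component at a time; candidate motif profiles come from a single-linkage hierarchical clustering of the length-W windows by Hamming distance; every candidate is tried as the new component (global search), and the best one is refined together with the existing components by EM (local search). This is a simplified, hypothetical illustration: the clustering criterion, the `max_dist` threshold, the initial weight 1/(k+1) of the new component, the pseudocounts and all function names are assumptions, and the paper's actual global and local search may differ in detail.

```python
import math
from itertools import combinations

AMINO = "ACDEFGHIKLMNPQRSTVWY"


def windows(seqs, w):
    return [s[i:i + w] for s in seqs for i in range(len(s) - w + 1)]


def prob(x, comp):
    """Background = one dict; motif profile = list of per-column dicts."""
    if isinstance(comp, dict):
        return math.prod(comp[a] for a in x)
    return math.prod(col[a] for col, a in zip(comp, x))


def log_likelihood(subs, weights, comps):
    return sum(math.log(sum(p * prob(x, c) for p, c in zip(weights, comps)))
               for x in subs)


def weighted_profile(subs, resp_j, pseudo):
    """Column-wise residue frequencies, each window weighted by resp_j."""
    cols = []
    for k in range(len(subs[0])):
        counts = {a: pseudo for a in AMINO}
        for x, r in zip(subs, resp_j):
            counts[x[k]] += r
        total = sum(counts.values())
        cols.append({a: c / total for a, c in counts.items()})
    return cols


def em(subs, weights, comps, iters=10, pseudo=0.1):
    """Local search: EM on all weights and motif profiles (background comps[0] fixed)."""
    for _ in range(iters):
        resp = []
        for x in subs:
            joint = [p * prob(x, c) for p, c in zip(weights, comps)]
            total = sum(joint)
            resp.append([v / total for v in joint])
        weights = [sum(r[j] for r in resp) / len(subs) for j in range(len(comps))]
        comps = [comps[0]] + [weighted_profile(subs, [r[j] for r in resp], pseudo)
                              for j in range(1, len(comps))]
    return weights, comps


def candidate_clusters(subs, max_dist=1):
    """Single-linkage clustering of distinct windows, cut where the Hamming
    distance exceeds max_dist; clusters holding >= 2 windows are kept."""
    uniq = sorted(set(subs))
    parent = list(range(len(uniq)))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for i, j in combinations(range(len(uniq)), 2):
        if sum(a != b for a, b in zip(uniq[i], uniq[j])) <= max_dist:
            parent[find(i)] = find(j)
    groups = {}
    for i, u in enumerate(uniq):
        groups.setdefault(find(i), []).extend([u] * subs.count(u))
    return [g for g in groups.values() if len(g) >= 2]


def greedy_motif_mixture(seqs, w, n_motifs, max_dist=1):
    subs = windows(seqs, w)
    counts = {a: 1.0 for a in AMINO}  # background from residue composition
    for res in "".join(seqs):
        counts[res] += 1
    total = sum(counts.values())
    weights, comps = [1.0], [{a: c / total for a, c in counts.items()}]
    # Candidate motif models: one smoothed profile per cluster, computed once.
    candidates = [weighted_profile(g, [1.0] * len(g), 0.5)
                  for g in candidate_clusters(subs, max_dist)]
    for k in range(1, n_motifs + 1):
        if not candidates:
            break
        alpha = 1.0 / (k + 1)  # assumed initial weight of the new component
        shrunk = [p * (1 - alpha) for p in weights]
        # Global search: the candidate giving the highest mixture likelihood.
        best = max(range(len(candidates)),
                   key=lambda i: log_likelihood(subs, shrunk + [alpha],
                                                comps + [candidates[i]]))
        # Local search: EM over the enlarged mixture.
        weights, comps = em(subs, shrunk + [alpha], comps + [candidates.pop(best)])
    return weights, comps
```

For example, `greedy_motif_mixture(seqs, 4, 2)` returns three mixing weights (background plus two motifs) and the matching components. Clustering once up front means the global search only scans a handful of profiles instead of every window, which is the speed-up the clustering step is meant to provide.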

Results: The performance of the proposed algorithm was evaluated on both artificial and real biological datasets. Compared with the well-known MEME approach, the algorithm identifies motifs with significant conservation and produces larger protein fingerprints.

Conclusion: The proposed greedy algorithm is a promising approach for discovering multiple probabilistic motifs in biological sequences. Its incremental mixture-modeling strategy overcomes a limitation of the MEME scheme, which erases motif occurrences each time a new motif is discovered.
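
To make the contrast concrete, here is a hypothetical sketch of probabilistic erasing, assuming a MEME-like rule in which, after a motif is found, each residue's weight is multiplied by the probability that no motif occurrence overlapping it is present. Residues inside confident occurrences end up with near-zero weight, so a later motif that overlaps them becomes hard to find; an incremental mixture instead keeps every window in the likelihood and simply adds a component. The function name `erase` and the exact update rule are assumptions, not MEME's documented implementation.

```python
def erase(position_weights, z, w):
    """Hypothetical MEME-style erasing for one sequence.

    position_weights[i]: current weight of residue i (1.0 = untouched).
    z[j]: posterior probability that an occurrence of width w starts at j,
          for j = 0 .. len(position_weights) - w.
    Each residue is scaled by the probability that none of the occurrences
    covering it (start positions i - w + 1 .. i) is present.
    """
    new_weights = []
    for i, weight in enumerate(position_weights):
        keep = 1.0
        for j in range(max(0, i - w + 1), min(i, len(z) - 1) + 1):
            keep *= 1.0 - z[j]
        new_weights.append(weight * keep)
    return new_weights


# A 10-residue sequence with a confident width-4 occurrence starting at position 2.
z = [0.0, 0.0, 0.99, 0.0, 0.0, 0.0, 0.0]
erased = erase([1.0] * 10, z, 4)
```

Under these assumed numbers, residues 2 to 5 keep only about 1% of their weight while all other residues are untouched.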

Keywords

Motif discovery - mixture of motifs - EM algorithm - protein fingerprints - MEME algorithm
