Insertion and deletion are operations that occur commonly in DNA processing and RNA editing. Since the building blocks of biological macromolecules can be viewed as symbols, gene sequences can be represented as strings and their structures can be interpreted as languages. This suggests that bio-molecular structures occurring at different levels can be studied theoretically using formal languages. However, the literature offers no single grammar formalism that captures the full range of bio-molecular structures. To address this gap, in this paper we introduce a simple grammar model called the matrix insertion–deletion system and use it to model several bio-molecular structures that occur at the intramolecular, intermolecular and RNA secondary-structure levels.
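To make the idea of a matrix insertion system more concrete, here is a minimal Python sketch of its insertion half. It assumes the usual convention that an insertion rule (u, x, v) rewrites uv into uxv and that all rules of a matrix are applied one after another in the listed order; deletion rules, which rewrite uxv into uv, are omitted. The hairpin matrices, the `#` loop marker and the names `insert_all`, `apply_matrix` and `generate` are hypothetical illustrations, not the constructions used in the paper.

```python
from typing import List, Set, Tuple

# Hypothetical encoding: an insertion rule (u, x, v) rewrites ...uv... into ...uxv...
Rule = Tuple[str, str, str]
Matrix = List[Rule]

# Watson-Crick complements for RNA (wobble pairs such as G-U are ignored here).
COMP = {"a": "u", "u": "a", "c": "g", "g": "c"}


def insert_all(s: str, rule: Rule) -> Set[str]:
    """All strings obtained from s by one application of the insertion rule."""
    u, x, v = rule
    ctx = u + v
    out = set()
    for i in range(len(s) - len(ctx) + 1):
        if s[i:i + len(ctx)] == ctx:      # context uv found at position i
            cut = i + len(u)              # x goes between u and v
            out.add(s[:cut] + x + s[cut:])
    return out


def apply_matrix(s: str, matrix: Matrix) -> Set[str]:
    """Apply every rule of the matrix in order (assumed convention).
    Returns an empty set if some rule finds no matching context."""
    current = {s}
    for rule in matrix:
        current = {t for w in current for t in insert_all(w, rule)}
    return current


def generate(axioms: Set[str], matrices: List[Matrix], max_steps: int) -> Set[str]:
    """Strings derivable from the axioms in at most max_steps matrix applications."""
    language = set(axioms)
    frontier = set(axioms)
    for _ in range(max_steps):
        frontier = {t for w in frontier for m in matrices for t in apply_matrix(w, m)}
        language |= frontier
    return language


# Hypothetical hairpin-stem matrices: insert x just before '#', then its
# complement just after '#', so both halves of the stem grow together.
HAIRPIN: List[Matrix] = [[("", x, "#"), ("#", COMP[x], "")] for x in "acgu"]

if __name__ == "__main__":
    print(sorted(generate({"#"}, HAIRPIN, 2)))
```

Each derived string has the form w#r, where r is the reverse complement of w, i.e. the paired stem of a hairpin around the loop marker. Grouping the two insertions into one matrix is what forces them to happen together; used as two independent rules, they would also derive strings with unpaired nucleotides.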