Towards an Encyclopaedia of Sequence Biology

Open access


In this review, I present several topics relevant to the present and future state of the scientific field that I propose to call sequence biology (SB). In some pertinent publications, this field has been called DNA linguistics. At the heart of SB lies the concept of a sequence code. I discuss three concepts: SB itself, an encyclopaedia of genetic codes, and corpus DNA linguistics.


