Towards an Encyclopaedia of Sequence Biology

Open access


In this review, I have presented several topics relevant to the present and future state of the scientific field that I propose to call sequence biology (SB). Some earlier publications referred to this field as DNA linguistics. At the heart of SB lies the concept of a sequence code. I have discussed three concepts: SB itself, an encyclopaedia of genetic codes, and a corpus DNA linguistics.
