Building the library of RNA 3D nucleotide conformations using the clustering approach

Open access


The growing number of known RNA 3D structures supports the recognition of various RNA families and the identification of their features. Both tasks rely on analyses of RNA conformations conducted at different levels of detail. Knowledge of native nucleotide conformations is also crucial for structure prediction and for understanding RNA folding. However, this knowledge is scattered across structural databases, so only automated methods for sampling the space of RNA structures can reveal plausible conformational representatives useful for further analysis. Here, we present a machine learning-based approach that inspects a dataset of RNA three-dimensional structures and creates a library of nucleotide conformers. A median neural gas algorithm is applied to cluster nucleotide structures based on their trigonometric description. The clustering procedure has two stages: (i) backbone-driven and (ii) ribose-driven. We present the resulting library, which contains RNA nucleotide representatives over the entire dataset, and we evaluate its quality by computing normal distribution measures and the average RMSD between data points, as well as the prototype, within each cluster.
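To make the clustering step more concrete, the following is a minimal sketch of a batch median neural gas update on a precomputed dissimilarity matrix. It is an illustration only, not the implementation used for the library: the function names, the annealing schedule, the number of epochs, and the dissimilarity `1 - cos(difference)` between torsion angles are all assumptions chosen for the example, and the two-stage (backbone, then ribose) procedure is not reproduced here.

```python
import numpy as np

def angular_dissimilarity(angles):
    """Pairwise dissimilarity between rows of torsion angles (radians).

    Assumption for this sketch: 1 - cos(angle difference), averaged over
    the angles, so that the measure respects the periodicity of angles.
    """
    diff = angles[:, None, :] - angles[None, :, :]
    return (1.0 - np.cos(diff)).mean(axis=2)

def median_neural_gas(D, init, n_epochs=30, lam_end=0.01):
    """Batch median neural gas on a symmetric dissimilarity matrix D.

    init holds the indices of the data points used as starting prototypes.
    Prototypes are always restricted to data points (hence "median").
    Returns the prototype indices and the cluster label of every point.
    """
    protos = np.asarray(init)
    lam0 = len(protos) / 2.0  # initial neighborhood range (assumed schedule)
    for t in range(n_epochs):
        # Exponentially shrink the neighborhood range from lam0 to lam_end.
        lam = lam0 * (lam_end / lam0) ** (t / max(n_epochs - 1, 1))
        # Rank of each prototype for each data point (0 = closest).
        d = D[:, protos]
        ranks = d.argsort(axis=1).argsort(axis=1)
        h = np.exp(-ranks / lam)
        # cost[l, i] = sum_j h[j, i] * D[j, l]: cost of moving prototype i
        # onto data point l. Each prototype jumps to its cheapest point.
        cost = D.T @ h
        protos = cost.argmin(axis=0)
    labels = D[:, protos].argmin(axis=1)
    return protos, labels

# Toy usage: six "nucleotides" described by a single torsion angle each.
angles = np.array([[0.0], [0.1], [0.2], [3.0], [3.1], [3.2]])
D = angular_dissimilarity(angles)
protos, labels = median_neural_gas(D, init=[0, 5])
print(protos, labels)
```

Because the update works only on the dissimilarity matrix, the same routine could in principle be run first on backbone angles and then, within each resulting cluster, on ribose angles. Two prototypes may collapse onto the same data point in this simplified version; a practical implementation would need to handle such duplicates.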


International Journal of Applied Mathematics and Computer Science

Journal of the University of Zielona Góra
