Simple scalable nucleotic FPGA based short read aligner for exhaustive search of substitution errors

Péter Fehér 1 , Ágnes Fülöp 2 , Gergely Debreczeni 3 , Máté Nagy-Egri 4  and György Vesztergombi 5
  • 1 Eötvös Loránd University
  • 2 Eötvös Loránd University
  • 3 Wigner Institute
  • 4 Wigner Institute
  • 5 Wigner Institute and Eötvös Loránd University

Abstract

With new and continuously improving technologies, DNA sequencing may become as commonplace as a simple blood test within a few years. Sequencing efficiency is growing with a larger exponent than the Moore's law growth of standard processors, so alignment and further processing of the sequenced data have become the bottleneck. FPGA (Field Programmable Gate Array) technology may provide an efficient alternative. We propose a simple algorithm for DNA sequence alignment that can be realized efficiently by nucleotic principal agents of non-Neumann nature. The prototype FPGA implementation runs on a small Terasic DE1-SoC demo board with a Cyclone V chip. We present test results and analyse the theoretical scalability of the system, showing that the execution time is independent of the length of the reference genome sequence. A special advantage of this parallel algorithm is that it performs an exhaustive search, producing all match variants up to a predetermined number of point (mutation) errors.
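To make the notion of an exhaustive, substitution-only search concrete, the following is a minimal sequential software sketch. It is not the FPGA design described in the paper: the function name `find_matches` and its parameters are hypothetical and chosen only for illustration. It reports every offset at which a read matches the reference with at most a given number of point mismatches.

```python
def find_matches(reference: str, read: str, max_mismatches: int):
    """Return (position, mismatch_count) for every offset at which `read`
    aligns to `reference` with at most `max_mismatches` substitutions.

    Illustrative software model only (an assumption, not the authors'
    hardware implementation). Insertions and deletions are not considered.
    """
    hits = []
    n, m = len(reference), len(read)
    for pos in range(n - m + 1):
        mismatches = 0
        for i in range(m):
            if reference[pos + i] != read[i]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many point errors at this offset
        else:
            # Inner loop finished without exceeding the error budget.
            hits.append((pos, mismatches))
    return hits


if __name__ == "__main__":
    # The read matches exactly at offset 0 and with one substitution at offset 4.
    print(find_matches("ACGTACGA", "ACGT", 1))
```

In this sketch the comparison at each offset is independent of all the others, so in principle every offset could be checked at the same time by a separate comparison unit. The sequential loop over `pos` here stands in for that parallelism.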


