Open Access

Evaluation of direct metagenomics and target enriched approaches for high-throughput sequencing of field rabies viruses



Introduction

Rabies virus (RABV), belonging to the Lyssavirus genus, has a negative-sense, single-stranded, non-segmented RNA genome approximately 12 kb long. Due to the low fidelity of the viral polymerase used for replication, RNA viruses are characterised by a high level of diversity (25). RABV RNA codes for five proteins, and the N gene, which codes for the nucleoprotein, is the most conserved region of the RABV genome (13). Traditionally, the N gene has been the preferred target not only for RABV detection but also for viral speciation and phylogenetic analysis (4, 20, 21). Partial or full nucleoprotein gene sequencing using the Sanger method is valuable mostly for preliminary phylogenetic studies and identification of RABV species. However, it provides limited information on viral genomes; therefore, the International Committee on Taxonomy of Viruses (ICTV) requires full-length genome sequencing when new lyssavirus species are proposed.

Initially, whole-genome sequences (WGS) of RABV isolates were obtained with a genome-walking procedure. This procedure was based on amplification of multiple RABV fragments covering the 12 kb RABV genome and their sequencing using the Sanger method (27). From the first WGS of the RABV prototype in 1988 (26) to date, many advances in molecular technology have been made and alternative platforms for high-throughput sequencing (HTS) have been developed (6, 18).

The application of HTS allows metagenomic-based identification of random viral fragments in environmental samples (15). HTS generates many raw reads from which a consensus sequence is derived, and it therefore increases the reliability of sequencing by avoiding the mismatches that can arise during Sanger sequencing. Owing to this high reliability, HTS is now broadly applied in studies of virus evolution, host-virus interaction, and pathogenicity (28). Whole-genome population studies also offer great potential for deep investigation of phylogenetic relationships among isolates and for a better understanding of what determines virus spread and persistence in the field, as the spread of disease leaves a genetic signature in pathogen genomes. Viral pathogens, particularly fast-evolving RNA viruses, are model systems for exploration of pathogen populations, as they rapidly accumulate genetic diversity on a timescale similar to that of epidemiological processes (1, 5, 9).

Rabies is an acute encephalitis, a fatal zoonosis affecting all warm-blooded animals. In Europe, it is most widespread in wild animals, particularly in red foxes. Brain samples delivered for rabies diagnosis are very often putrefied or autolysed, and the resulting degradation of the genetic material makes it unsuitable for HTS studies. In many cases, the carcasses of dead animals have remained in unfavourable environmental conditions for a long period before laboratory diagnosis, resulting in decomposition of the brain tissue and bacterial contamination. Such decomposition, the presence of many viral and bacterial pathogens, and the genetic material of the host organism adversely affect both the quality of RABV RNA and the effectiveness of full-length sequencing of the viral genome. Appropriate homogenisation of the brain tissue and effective purification of RNA isolated from field samples are also significant for the success of whole-genome sequencing or metagenomic studies. A further limitation of HTS is the low concentration of viral RNA in total RNA isolated from the brain tissue.

Taking into account the difficulties and limitations in deep sequencing, namely the high background of genetic material from host species and bacteria in field samples, the main objective of the work was to evaluate and validate the different protocols (the direct metagenomic and RABV-enriched approaches) used for deep sequencing of field RABV isolates and compare different protocols used for RNA extraction in terms of their application to HTS.

Material and Methods

Samples. To develop a method for WGS and evaluate alternatives for maximal efficiency, a total of 23 animal brains diagnosed as positive for RABV infection in the fluorescent antibody test (FAT) were used in the study. The samples (21 fox brains, 1 cat brain, and 1 dog brain) were collected in the southern part of Poland (Lesser Poland and Subcarpathian provinces) between 1997 and 2017. To assess the utility of virus propagation in cell culture for the metagenomic approach, a European bat lyssavirus-1 (EBLV-1) isolate cultivated in a neuroblastoma cell line (passage 6) was also included in the study.

Based on the RNA extraction procedure as well as on the scheme of RNA preparation for HTS, two HTS approaches were applied: the direct metagenomics approach (groups I, II, and III) and the RABV-enriched approach (group IV) as shown in Table 1.

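Table 1

Details of samples in the comparative study of extraction methods. RNA concentration, virus detection using real-time RT-PCR, and the number of reads obtained during whole-genome sequencing

Groups I–III: direct metagenomic approach. Group IV: RABV-enriched approach.

| Group | Isolate | Sample origin | Collection date | Extraction method/RT-PCR procedure | Concentration of dsDNA after clean-up (Qubit HS) (ng/µL) | Verification of HTS library | Total number of reads | Number of viral reads | % of viral reads | Number of RABV reads (Centrifuge) | Number of contigs | Average coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| I | 767121097L | red fox | 1997 | A | 1.45 | + | 22,250 | 117 | 0.0525 | 3 | - | - |
| I | 965180404L | red fox | 2004 | A | 1.06 | + | 15,282 | 67 | 0.438 | 0 | - | - |
| I | 1045120899L | red fox | 1999 | A | 1.08 | + | 158,723 | 496 | 0.31 | 8 | - | - |
| II | 767121097L | red fox | 1997 | B | 5.06 | - | - | - | - | - | | |
| II | 965180404L | red fox | 2004 | B | 1.21 | + | 160,354 | 11,170 | 6.965 | 132 | 19 | 3.5 |
| II | 1045120899L | red fox | 1999 | B | 1.15 | + | 200,756 | 4,218 | 2.101 | 133 | 6 | 2.5 |
| II | 1321180108L | red fox | 2008 | B | 1.2 | + | 517,696 | 70,436 | 13.605 | 1,359 | 1 | 32 |
| II | 1379120910L | red fox | 2010 | B | 2.21 | + | 948,295 | 68,491 | 7.222 | 4,765 | 3 | 152 |
| III | 1996181013L* | red fox | 2013 | C | 1.18 | + | 839,440 | 47,118 | 5.613 | 28,116 | 1 | 495 |
| III | 1992121113L | red fox | 2013 | C | 13.4 | + | 423,115 | 1,417 | 0.334 | 62 | 9 | 2 |
| III | 1739120912L | red fox | 2012 | C | 2.93 | + | 4,216,387 | 14,103 | 0.334 | 657 | 1 | 15 |
| III | 1679180512L* | red fox | 2012 | C | 2.04 | + | 4,543,264 | 21,683 | 0.4772 | 2,935 | 1 | 71 |
| III | 1577121111L | red fox | 2011 | C | 4.28 | + | 1,058,492 | 5,313 | 0.05019 | 564 | 1 | 13 |
| III | 1525180711L | red fox | 2011 | C | 4.38 | + | 849,062 | 2,023 | 0.238 | 23 | 3 | 1.5 |
| III | 1391180910L | red fox | 2010 | C | 3.86 | + | 2,401,009 | 8,487 | 0.3534 | 10 | 3 | 1 |
| III | EBLV-1 | Eptesicus serotinus | 2018 | C | | + | 460,751 | 178,828 | 38.812 | 32,232 | 1 | 571 |
| IV | 1045120899L | as above | as above | C + enrichment | 16.8 | + | 2,697,616 | 2,560,850 | 94.930 | 1,342,569 | 1 | 38,039 |
| IV | 965180404L | as above | as above | C + enrichment | 56.5 | + | 2,951,939 | 2,765,165 | 93.67 | 1,503,424 | 1 | 41,961 |
| IV | 767121097L | as above | as above | C + enrichment | 76.6 | + | 3,106,953 | 2,871,569 | 92.42 | 1,686,411 | 1 | 41,697 |
| IV | 1379120910L | as above | as above | C + enrichment | 55.2 | + | 754,635 | 711,112 | 94.23 | 372,141 | 1 | 11,346 |
| IV | 1321180108L | as above | as above | C + enrichment | 25.1 | + | 1,115,667 | 951,915 | 85.322 | 557,949 | 3 | 5,962 |
| IV | 1525180711L | as above | as above | C + enrichment | 75.5 | + | 869,224 | 824,305 | 94.832 | 463,772 | 1 | 12,144 |
| IV | 1739120912L | as above | as above | C + enrichment | 48.5 | + | 682,577 | 645,311 | 94.54 | 362,514 | 1 | 9,500 |
| IV | 1996181013L* | as above | as above | C + enrichment | 65.4 | + | 760,390 | 718,626 | 94.507 | 423,322 | 1 | 10,144 |
| IV | 1577121111L | as above | as above | C + enrichment | 81.4 | + | 904,532 | 846,126 | 93.54 | 507,207 | 1 | 12,122 |
| IV | 1992121113L | as above | as above | C + enrichment | 75.9 | + | 978,751 | 922,826 | 94.286 | 525,370 | 1 | 13,536 |
| IV | 1391180910L | as above | as above | C + enrichment | 47.3 | + | 985,144 | 943,962 | 95.819 | 539,481 | 1 | 13,075 |
| IV | 2191180915L | red fox | 2015 | C + enrichment | 93 | + | 967,059 | 908,214 | 93.915 | 528,055 | 1 | 13,089 |
| IV | 1990121113L | red fox | 2013 | C + enrichment | 70.2 | + | 893,272 | 834,573 | 93.428 | 464,788 | 1 | 12,217 |
| IV | 2176120515L | red fox | 2015 | C + enrichment | 87.6 | + | 1,294,593 | 1,196,143 | 92.395 | 737,332 | 1 | 17,535 |
| IV | 2068180814L | red fox | 2014 | C + enrichment | 69.6 | + | 1,192,314 | 1,112,910 | 93.340 | 654,514 | 1 | 16,190 |
| IV | 2214181115L | red fox | 2015 | C + enrichment | 87.6 | + | 1,230,082 | 1,140,045 | 92.680 | 709,820 | 1 | 16,316 |
| IV | 2067120814L | red fox | 2014 | C + enrichment | 62 | + | 1,265,627 | 1,150,880 | 90.933 | 703,662 | 1 | 17,091 |
| IV | 2066120814L | red fox | 2014 | C + enrichment | 68.4 | + | 1,934,161 | 1,741,947 | 90.062 | 1,062,064 | 1 | 25,144 |
| IV | 2226120916L | red fox | 2016 | C + enrichment | 84 | + | 1,133,386 | 1,011,316 | 89.229 | 633,117 | 1 | 14,546 |
| IV | 2235181116L | red fox | 2016 | C + enrichment | 103 | + | 1,947,768 | 1,828,286 | 93.865 | 1,119,736 | 1 | 26,791 |
| IV | 2236181216P | dog | 2016 | C + enrichment | 81.2 | + | 612,073 | 564,607 | 92.245 | 346,010 | 1 | 7,980 |
| IV | 2237120117L | red fox | 2017 | C + enrichment | 96.4 | + | 1,336,571 | 1,225,265 | 91.672 | 758,770 | 1 | 16,869 |
| IV | 2238181117K | cat | 2017 | C + enrichment | 62 | + | 1,034,833 | 918,601 | 88.768 | 562,828 | 1 | 12,576 |

Extraction method/RT-PCR procedure: A – QIAamp Viral RNA Mini Kit/RT + amplification of dsDNA with Klenow fragment; B – Direct-zol RNA MiniPrep (Zymo Research)/RT + amplification of dsDNA with Klenow fragment; C – TRIzol/chloroform/ethanol/RT + amplification of dsDNA with Klenow fragment; C + enrichment – TRIzol/chloroform/ethanol + virus enrichment

* – brain samples at a heavy decomposition stage; + – positive; − – negative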

RNA extraction. Total RNA was extracted from 20% homogenates (w/v) of the brain tissue. Three different methods of RNA isolation were tested:

A – RNA isolation from 140 μL of brain homogenate using a QIAamp Viral RNA Mini Kit (Qiagen, Germany) according to the manufacturer’s instructions;

B – RNA extraction using a Direct-zol RNA MiniPrep Kit (Zymo Research, USA) preceded by digestion of the brain tissue with DNase (TURBO DNase, Ambion, Thermo Fisher Scientific, USA). RNA extraction was carried out on tissue supernatant lysed with the TRIzol contained in the kit. All steps of RNA isolation were performed according to the manufacturer’s instructions;

C – RNA isolation with a method combining TRIzol/chloroform/ethanol and a QIAamp Viral RNA Mini Kit. The initial step consisted of lysis of brain homogenates with TRIzol. Subsequently, RNA was extracted with chloroform and precipitated with 75% ethanol. RNA was washed on the columns and eluted with water for molecular biology. For each sample, three successive elutions of RNA were performed. The third RNA eluate was subjected to long RT-PCR. All of the extracted RNA was used immediately for further studies or stored frozen at −20°C.

Reverse transcription (RT) and double-stranded DNA (dsDNA) synthesis. For complementary DNA (cDNA) synthesis, two protocols were evaluated, one of which used SuperScript III reverse transcriptase (Invitrogen, Thermo Fisher Scientific) and the other a PrimeScript RT-PCR Kit (TaKaRa Bio, Japan) with random hexamers. Synthesis of cDNA was performed as per the manufacturers’ protocols. To digest the remaining RNA, cDNA was treated with RNase H (EurX, Poland). The second strand of DNA was synthesised with Klenow fragment (New England Biolabs, USA).

Genome amplification RT-PCR. To generate amplicons covering the complete viral genome of approximately 12 kb, three primer pairs producing overlapping PCR products were designed. The PCR products ranged between 3.6 kb and 4.5 kb: amplicon A spanned nucleotides 1–4499 of the RABV genome, amplicon B nucleotides 4418–8276, and amplicon C nucleotides 8172–11801. In addition, two sets of primers producing overlapping hemi-nested PCR products within each of these three amplicons were also designed (amplicon length 1.6 kb to 2.3 kb). The primers targeted conserved regions of the rabies virus genome with particular consideration given to the sequences of Polish RABV reference genomes (GenBank accession nos. MF197743.1, MF197741.1 and MF197742.1). All primer details are shown in Table 2. For the purpose of this study, two distinct amplification protocols were tested:

Table 2

Primers employed for RT-PCR of field RABV samples

| Amplicon | Primer name | Primer sequence 5′–3′ | Location in RABV genome | Amplicon size (bp) |
|---|---|---|---|---|
| A | RVA_forward | ATGGATGCCGACAAGATTGTATT | 1–23 | 4499 |
| A | RVA_reverse | CAGGGGGTGCATCAGGGGAAT | 4478–4499 | |
| B | RVB_forward | ATCCCAGAGATGCAATCATCC | 4418–4439 | 3860 |
| B | RVB_reverse | TGAGTAGAATGGTAGGACTGGCACC | 8251–8276 | |
| C | RVC_forward | GAACCCAGATCTTGGAGAGAGAA | 8172–8195 | 3631 |
| C | RVC_reverse | TTCGGATTCAAGATCTTGTTTT | 11779–11801 | |
| A1 | RVA_forward | as above | | 2267 |
| A1 | RVA1_reverse | TGGAATTTCTTGGAATTGGCCAAAGC | 2241–2267 | |
| A2 | RVA2_forward | GCTCATGACGGATCCAAACTCCC | 2193–2216 | 2300 |
| A2 | RVA_reverse | as above | | |
| B1 | RVB_forward | as above | | 2330 |
| B1 | RVB1_reverse | GATTCAGGAATCTCAAAGATTTGCGT | 6724–6750 | |
| B2 | RVB2_forward | TTGACTCCTTATATCAAAACCCAGA | 6640–6665 | 1636 |
| B2 | RVB_reverse | as above | | |
| C1 | RVC_forward | as above | | 2016 |
| C1 | RVC1_reverse | GTCATGGTTCTAGCTGCATGGCG | 10155–10188 | |
| C2 | RVC2_forward | ATGAGGCAGGTGCTGGGTG | 10054–10073 | 1750 |
| C2 | RVC_reverse | as above | | |
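The tiling of the first-round amplicons can be checked with a few lines of arithmetic. The Python sketch below takes the coordinates stated in the text (A: 1–4499, B: 4418–8276, C: 8172–11801) and reports the overlap between neighbouring amplicons. The function and variable names are illustrative only and are not part of the original protocol.

```python
# Coordinates (1-based, inclusive) of the first-round amplicons, as given in the text.
AMPLICONS = {"A": (1, 4499), "B": (4418, 8276), "C": (8172, 11801)}


def overlaps(amplicons):
    """Return the overlap in bp between consecutive amplicons sorted by start.

    A value of zero or less would indicate that two neighbours do not overlap.
    """
    ordered = sorted(amplicons.items(), key=lambda item: item[1][0])
    result = {}
    for (name1, (_, end1)), (name2, (start2, _)) in zip(ordered, ordered[1:]):
        result[f"{name1}/{name2}"] = end1 - start2 + 1
    return result


def is_contiguous(amplicons):
    """True if every pair of neighbouring amplicons shares at least one base."""
    return all(bp > 0 for bp in overlaps(amplicons).values())


if __name__ == "__main__":
    for pair, bp in overlaps(AMPLICONS).items():
        print(f"{pair}: {bp} bp overlap")
    print("contiguous:", is_contiguous(AMPLICONS))
```

With these coordinates, neighbouring amplicons share 82 bp (A/B) and 105 bp (B/C), so the three products tile the region from nucleotide 1 to 11801 without gaps.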

I – Two-step RT-PCR. RNA in a 5 μL volume was mixed with 30 pmol of each of the two amplification primers from primer pairs A, B, or C and incubated at 70°C for 5 min and 37°C for 10 min. The hybridisation mixture was brought to 20 μL with the addition of 4 μL of 5X first strand buffer, 0.1 μL of 0.1 M DTT, 1 μL of RNase Out (Invitrogen, Thermo Fisher Scientific), 1 μL of SuperScript III enzyme, and 1 μL of 10 mM dNTPs.

After 2 h incubation at 50°C, the RT was terminated by heating at 70°C for 15 min and the mixture was chilled on ice. PCR was performed with 10 μL of RT product added to 40 μL of reaction mixture containing 5 μL of buffer, 1 μL of 10 mM dNTPs, 1 μL of TaKaRa PrimeSTAR GXL DNA polymerase (TaKaRa Bio) and 33 μL of water for molecular biology. The reactions were carried out in a ProFlex thermocycler (Thermo Fisher Scientific) with the following programme: 1 cycle at 98°C for 4 min, followed by 40 cycles at 98°C for 20 s, 55°C for 30 s, and 72°C for 5 min. Products of amplification were detected by separation in 1% agarose gel.

In cases of weak or no signal from the expected amplicons, hemi-nested PCRs were performed. Using the first-round products as a template, the reaction was carried out with the appropriate primers and slightly modified cycling: 35 cycles at 98°C for 20 s, 55°C for 30 s, and 72°C for 2 min.

II – One-step RT-PCR. The protocol used the SuperScript III One-Step RT-PCR Kit. The reaction was performed in 25 μL of mixture containing 2.5 μL of RNA, 12.5 μL of 2X reaction buffer, 1 pmol of each primer, 7 μL of water for molecular biology and 1 μL of SuperScript III/Platinum Taq enzyme mix. Again, the reactions were carried out in a ProFlex thermocycler. For the amplification of A, B or C fragments the following programme was applied: 1 cycle of reverse transcription at 50°C for 30 min, 1 cycle at 95°C for 15 min, 40 cycles at 95°C for 30 s, 55°C for 30 s, and 72°C for 5 min, and final elongation at 72°C for 10 min. The smaller products A1, A2, B1, B2, C1, and C2 were amplified under the following conditions: 1 cycle of reverse transcription at 50°C for 30 min, 1 cycle at 95°C for 15 min, 35 cycles at 95°C for 30 s, 55°C for 30 s, and 72°C for 3 min, and final elongation at 72°C for 10 min. The amplicons were visualised under UV after separation in 1% agarose gel.
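As a quick check of the one-step reaction set-up, the listed volumes can be summed and scaled for several samples. The Python sketch below is a hypothetical helper: the component volumes are those stated above, whereas the 10% pipetting overage and the function names are assumptions made for illustration. The primer volume is not stated in the text, so primers are left out of the master mix and appear only as the volume remaining in the 25 μL reaction.

```python
# Per-reaction volumes (µL) stated for the one-step RT-PCR; RNA is added per tube.
MASTER_MIX_UL = {"2X reaction buffer": 12.5, "water": 7.0, "enzyme mix": 1.0}
RNA_UL = 2.5
FINAL_VOLUME_UL = 25.0


def remaining_volume(master_mix=MASTER_MIX_UL, rna=RNA_UL, final=FINAL_VOLUME_UL):
    """Volume (µL) left for components not listed by volume, e.g. the primers."""
    return final - rna - sum(master_mix.values())


def scale_master_mix(n_reactions, overage=0.10, master_mix=MASTER_MIX_UL):
    """Scale each master-mix component for n reactions plus an assumed overage."""
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in master_mix.items()}


if __name__ == "__main__":
    print("volume left for primers (µL):", remaining_volume())
    for name, vol in scale_master_mix(10).items():
        print(f"{name}: {vol} µL")
```

The stated volumes add up to 23 μL, which leaves 2 μL of the 25 μL reaction for the two primers.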

Real-time RT-PCR (rtRT-PCR). rtRT-PCR was performed to assess relative viral load based on the detection of over 100 bp of the N gene (Ct value). The reaction was performed as described previously (11, 29).

High-throughput sequencing

DNA pre-treatment and evaluation. After the reverse transcription, dsDNA clean-up was performed with AMPure XP magnetic beads (Beckman Coulter, USA). For the purpose of discarding DNA fragments shorter than 1,000 bp, a 0.5 : 1 bead-to-sample ratio was applied.

The quantity and quality (A260/280 and A260/230 ratios) of DNA were measured with a Qubit 3.0 fluorimeter and dsDNA HS Assay Kit (Thermo Fisher Scientific) and a NanoDrop One spectrophotometer (Thermo Fisher Scientific), respectively. In addition, the integrity of the RT product was assayed by capillary electrophoresis using a 5200 Fragment Analyser with a DNF-488 High Sensitivity Genomic DNA Analysis Kit (Agilent, USA). The samples which passed quality control were then normalised to equal concentrations.

Library preparation. HTS libraries were prepared from 1 ng of dsDNA, according to the Nextera XT (Illumina, USA) protocol. The dual indexing system (Illumina) was used to label the samples uniquely. The libraries were then cleaned up with AMPure XP magnetic beads (Beckman Coulter) at a 0.8 : 1 ratio, removing fragments smaller than 300 bp. The quantity and quality of libraries were checked with the Qubit 3.0 fluorimeter and dsDNA BR Assay Kit, and an NGS DNF-473 Fragment Kit (Agilent), respectively. Each library was normalised with the library normalisation (LN) beads in the Nextera XT DNA Library Prep Kit (Illumina), then pooled and diluted to a concentration of 20 pM. PhiX Control v3 (Illumina) at 1% was used as an internal control for sequencing. Paired-end sequencing (2 × 300 bp) was performed on a MiSeq sequencer (Illumina) with a v3 kit (Illumina). A 10% portion of run capacity was dedicated to environmental samples of RABV and a 3% portion to amplified RABV genomes.
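Where libraries are quantified by fluorimetry rather than normalised on beads, dilution to a picomolar loading concentration relies on converting a dsDNA mass concentration into molarity. A commonly used approximation takes 660 g/mol as the average mass of one base pair. The Python sketch below applies that approximation as a generic illustration, not as the procedure used in this study; the example concentration and fragment length are hypothetical values, and the helper names are illustrative.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate average mass of one dsDNA base pair


def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA concentration (ng/µL) to molarity (nM)."""
    return conc_ng_per_ul / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp) * 1e6


def dilution_factor(stock_nm, target_pm=20.0):
    """Fold dilution needed to bring a stock in nM down to a target in pM."""
    return stock_nm * 1000.0 / target_pm


if __name__ == "__main__":
    # Hypothetical library: 1.32 ng/µL with a mean fragment length of 500 bp.
    stock = library_molarity_nm(1.32, 500)
    print(f"{stock:.2f} nM, dilute {dilution_factor(stock):.0f}-fold to reach 20 pM")
```

For the hypothetical library in the example, 1.32 ng/µL at 500 bp corresponds to 4 nM, so a 200-fold dilution would give 20 pM.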

Bioinformatics. The quality check was done by FastQC. Data was trimmed by Trimmomatic; the operation consisted of removal of low quality reads (PHRED score below 33) and reads shorter than 36 bp (2). Non-viral data was filtered by BBDuk with three different approaches: positive filtration of virus reads based on the Kraken database, positive filtration according to the RABV reference sequence, and negative filtration of host reads. Evaluation of RABV data was performed in both Kraken and Centrifuge software (30, 14). Cleaned RABV data was then assembled de novo by metaSPAdes software (19).
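The trimming criteria can be illustrated with a simplified filter. The Python sketch below is not Trimmomatic and does not reproduce its algorithms; it is a plain stand-in that drops reads shorter than 36 bp and, as one possible reading of the quality criterion, reads whose mean Phred score falls below the stated threshold. It assumes four-line FASTQ records with Phred+33 encoded quality strings; the function names and the mean-quality rule are assumptions.

```python
MIN_LENGTH = 36          # minimum read length stated in the text
MIN_MEAN_QUALITY = 33.0  # threshold stated in the text, applied here to the read mean


def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    if not quality_string:
        return 0.0
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def keep_read(sequence, quality_string):
    """True if the read passes both the length and the mean-quality filter."""
    if len(sequence) < MIN_LENGTH:
        return False
    return mean_phred(quality_string) >= MIN_MEAN_QUALITY


def filter_fastq(lines):
    """Yield (header, sequence, quality) for passing reads from 4-line FASTQ records."""
    records = [line.rstrip("\n") for line in lines]
    for i in range(0, len(records) - 3, 4):
        header, seq, _, qual = records[i:i + 4]
        if keep_read(seq, qual):
            yield header, seq, qual
```

A typical use would be to pass an open FASTQ file handle to `filter_fastq` and write the surviving records to a new file before filtering and assembly.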

Results

In order to evaluate complete genome sequencing of field RABV isolates from brain samples, three RNA extraction methods were compared to select the most efficient for the metagenomic approach. The efficiency of RNA extraction was evaluated by comparing dsDNA properties: post-clean-up concentration (estimated by fluorimeter) and RABV genome integrity (estimated by capillary electrophoresis). Measurement of dsDNA concentration on a spectrophotometer before clean-up is strongly discouraged because residues of the reverse transcription reaction give misleading dsDNA concentrations. Approximate determination of dsDNA concentration on the spectrophotometer is possible after the clean-up procedure. During the study, HTS libraries were created from dsDNA samples that passed the quality check. Additionally, the size distribution, adapter remains, and quantity of the libraries were evaluated. If these parameters were satisfactory, Illumina sequencing was carried out.

Testing of RNA extraction procedures for the field samples began with the QIAamp Viral RNA Mini Kit and moved on to the Direct-zol RNA MiniPrep Kit. The concentration of dsDNA was higher in samples processed with the Direct-zol RNA MiniPrep Kit (group II) than in samples from which the RNA was purified with the QIAamp Viral RNA Mini Kit (group I) (Table 1). In the next step, HTS libraries were prepared, and if the quantity and quality of the library were sufficient, deep sequencing was performed. Unfortunately, the number of viral reads was insufficient to obtain full-length sequences with reliable coverage for the RABV isolates in groups I and II, with one exception: the isolate 1379120910L was successfully sequenced in full length with an average coverage of 152. It was, however, the sample stored for the shortest time (over 8 years) of all samples in groups I and II, the others having been collected between 1997 and 2008. Due to the long storage period, RNA was considerably fragmented (capillary electrophoresis data not shown), and this negatively affected library preparation and ultimately the results of the metagenomic studies.

In the next step, the combined method of RNA isolation, using TRIzol/chloroform/ethanol extraction and RNA purification on a column (group III), was tested. To assess the concentration of RABV in the samples, real-time RT-PCR was performed in parallel. The dsDNA concentration of reverse-transcribed RNA from the combined TRIzol and column method was significantly higher than that of dsDNA obtained from the equivalents extracted with the Direct-zol RNA MiniPrep Kit. The relative Ct values ranged between 13.77 and 18.58 (data not shown), suggesting a high concentration of viral RNA. The number of total reads was significantly higher than in the sequencing results obtained for group II. Nevertheless, the percentage of viral reads was much lower, and complete RABV genomes were obtained for four out of seven brain samples, with average coverage between 13 and 71 for three RABV isolates and coverage of 495 for the isolate 1996181013L (Table 1, group III). The EBLV-1 isolate propagated in cell culture was deeply sequenced, yielding 178,828 viral reads (38.812% of total reads), of which 32,232 reads were from European bat lyssavirus, with an average coverage of 571.
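The percentage of viral reads is the number of viral reads divided by the total number of reads. As a worked check, the Python sketch below recomputes the share for the EBLV-1 isolate from the read counts quoted above; the helper name is illustrative.

```python
def percent_of_total(subset_reads, total_reads):
    """Share of reads in a subset, as a percentage of all reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * subset_reads / total_reads


if __name__ == "__main__":
    # EBLV-1: 178,828 viral reads out of 460,751 total reads.
    print(round(percent_of_total(178_828, 460_751), 3))
    # Of those viral reads, 32,232 were assigned to European bat lyssavirus.
    print(round(percent_of_total(32_232, 178_828), 1))
```

The first value reproduces the 38.812% reported for EBLV-1; the second shows that about 18% of the reads classified as viral were assigned to European bat lyssavirus.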

The two commercial kits use different techniques: the first is based on digestion of the brain homogenate with lysis buffer and carrier RNA, and the second applies TRIzol and ethanol (95–100%). A comparison of the two RNA extraction procedures revealed higher quality and quantity of extracted RNA with the second. Therefore, the TRIzol method was modified to include extraction with chloroform and 75% ethanol, followed by purification on a column. This modification further improved the quality and quantity of the extracted RNA.

In the last of the investigated approaches, the combined extraction method was followed by RABV enrichment (group IV). Specially designed primers (Table 2) were used in the RABV amplification. Initially, RT-PCR products up to 4.5 kb in size were obtained, according to the scheme of overlapping amplicons (A, B, and C) illustrated in Fig. 1. Due to the low yield of DNA polymerases, in the next step shorter products (A1, A2, B1, B2, C1, and C2), ranging in size from approximately 1.6 kb to 2.3 kb, were amplified by RT-PCR. Typical RT-PCR products generated during the study are illustrated in Fig. 2. A combination of amplification products covering the complete genomes of RABV isolates was subjected to HTS. Full-length sequences were obtained for all 23 samples. In this group the number of viral reads was significantly higher than in group III. The average coverage of consensus sequences ranged between 5,968 and 41,961.

Fig. 1

A schematic of the RABV genome indicating position of five ORFs and primers

Fig. 2

Gel analysis of representative purified products of amplification. A – first round (amplicons A, B, and C). B – second round (amplicons A1, A2, B1, B2, C1, and C2)

Discussion

High-throughput sequencing has been widely used for the characterisation of many pathogens, including viruses, bacteria, and parasites (10, 2123). The direct metagenomic approach enables the study of the structure of a whole microbiome: bacteria, fungi, and all viruses present in an environmental sample where the host material is present as contamination. Whole metagenome sequencing provides a total approach for direct detection of specific viruses and makes possible an accurate survey of the virus structure. The main problem in this approach is the relatively small quantity of viral RNA compared to contamination by host and bacterial material.

Many HTS platforms and RNA preparation protocols have been established for the WGS of RABV; however, the vast majority of samples were collected directly from fresh subjects without any signs of decomposition or were tissue culture–propagated viruses such as vaccine strains (7, 8). Difficulties are relatively common when metagenomic studies are conducted on field samples collected several days after an animal’s death. An additional obstacle which has to be taken into consideration in RABV research is the quality of the material, which is often highly degraded due to prolonged contact with the environment and the abundant presence of RNases.

Obtaining viral nucleic acids from the sample is a crucial prerequisite for successful pathogen detection. Therefore, three different methods of RNA extraction from brain samples collected in the field were tested in this study in terms of their applicability to the metagenomic approach. The best results in terms of RNA quantity and quality (data not shown) were obtained using the combined method of RNA extraction, comprising initial sample lysis with TRIzol and extraction with chloroform and 75% ethanol, which outperformed both kit-based methods of RNA extraction (the QIAamp Viral RNA Mini and the Direct-zol RNA MiniPrep kits). This suggests that treating brain samples with TRIzol significantly increases the efficacy and reliability of RNA extraction. Brain tissue is a difficult template for RNA isolation, mainly due to its high concentration of fats. Organic solvents dissolve fats effectively, improving the efficacy of RNA isolation.

For indirect pathogen detection based on the presence of a gene fragment, it is most effective to extract the RNA using standard column-based kits, because the isolation procedure is faster and easier. For high-throughput sequencing, however, the integrity of the genetic material is key for library preparation, and standard RNA extraction methods do not always provide RNA of sufficient quality for this approach. This is particularly true in the case of RNA viruses, which are much more sensitive to environmental conditions due to the fragile structure of RNA. Our results and suggestions are consistent with the findings of Wylezich et al. (31) that efficient RNA extraction is crucial in metagenomic studies.

Preliminary estimation of viral load in the samples was made using real-time RT-PCR, comparing Ct values between different RABV isolates. This comparison indicated high concentrations of RABV in the samples extracted with Direct-zol RNA MiniPrep and the combined TRIzol/column method (groups II and III). This prediction was not fully reflected in the success of deep sequencing, probably because the estimation provided by RT-PCR amplification was too approximate. In addition, RT-PCR detection is based on the amplification of short fragments of a viral genome, making this method less sensitive to sample fragmentation. During deep sequencing in our study, a large number of total reads and viral reads were obtained (mean 3.64%), and fragments of RABV nucleotide sequences were detected, but it was not always possible to determine full-length consensus sequences. Even where a consensus was obtained, the average coverage of contigs was in some cases too low (below 3.5). Metagenomic studies allow direct detection of pathogens but are characterised by a defined detection limit in terms of sequencer throughput and percentage throughput per sample.

Due to the nature of a viral cycle based on a host cellular system, the main obstacle with all viral metagenomic studies is a low viral load. It is important to understand that viral genomic material constitutes only a small fraction of all extracted RNA, where the overwhelming majority of such material will be high background from host species and bacteria. To overcome this issue different strategies of target enrichment may be applied.

The first recommended solution is propagation of viruses in cell culture before HTS. A good HTS result was obtained for the EBLV-1 isolate cultivated in a neuroblastoma cell line. However, it is not always possible to multiply viruses from decomposed field samples, as RABV may be unable to infect cells or be isolated in cell culture (16). Indeed, only three out of six RABV isolates originating from brain samples collected in the field were able to propagate in cell culture, and then only at low titres (data not shown). Passaging viruses in a cell culture system adds new artificial diversity to a viral population. The extent of alteration to the original consensus sequence of a RABV population depends on the number of passages necessary to obtain virus at a high enough titre to harvest (3). It has previously been reported that the number of single-nucleotide polymorphisms (SNPs) observed in cell-cultured RNA preparations was greater than that in tissue-extracted samples (16). Therefore, virus propagation in cell culture prior to WGS should be strictly limited when performing studies on the phylogeography of a population or on genomic diversity or virus evolution.

The second approach to viral enrichment is amplification of the whole RABV genome of 12 kb. This tactic results in a significant increase in specific RABV reads and consequently much greater coverage; however, it is limited to HTS of known pathogens only. Target enrichment, i.e. the amplification of viral RNA in a long-range PCR using specific primers designed for the detection of RABV, was a much more effective method of RNA preparation for HTS in our study. Primers were designed based on previously sequenced Polish RABV isolates. Two RT-PCR protocols were applied for the amplification of the RABV genome, in which three fragments of 4.5 kb, 3.8 kb, and 3.7 kb were amplified, covering almost 12 kb. We found that the TaKaRa PrimeSTAR GXL DNA polymerase could amplify longer amplicons than the SuperScript III One-Step RT-PCR Kit. However, there was no difference in RT-PCR results when smaller fragments of the viral genome (A1, A2, B1, B2, C1, and C2), ranging in size between 1.6 kb and 2.3 kb, were amplified. All tested field samples were successfully amplified using one of the two protocols: either the two-step RT-PCR with TaKaRa PrimeSTAR GXL DNA polymerase or, for fragments shorter than 2.5 kb, the SuperScript III One-Step RT-PCR Kit. The amplification of shorter fragments of the genome is much easier, whereas long-range PCR reactions require high-fidelity and high-yield polymerases. Hence in our study, for the detection of fragments around 2–2.5 kb, a single-tube one-step RT-PCR with a mix of SuperScript III/Taq DNA polymerase enzymes was sufficient. Random errors are generated during both reverse transcription and amplification; it is therefore vital to use high-fidelity polymerases with high DNA replication accuracy to minimise amplification errors. High-fidelity amplification is essential for experiments whose outcome depends upon the correct DNA sequence, e.g. cloning, single-nucleotide polymorphism (SNP) analysis, and HTS applications.

The HTS methodology described in this paper made it possible to obtain complete genomes of several RABV isolates originating from the brain tissue of animals collected in the field. Significantly greater RABV genome coverage was obtained with the RABV-enriched approach. Nevertheless, the metagenomic approach enabled full-length sequencing of 6 out of 16 field viruses, including EBLV-1 propagated in a neuroblastoma cell line. The direct metagenomic approach provides information on original genome sequences, but with lower coverage. Sequencing coverage describes the average number of reads that align to, or “cover,” known reference bases (12); a coverage value for viral reads of over 20 provides reliable nucleotide sequences. The enriched approach gives greater coverage, but with the risks of genome modification and artificial diversity caused by PCR amplification. Complete viral sequences with sufficient coverage make it possible to discriminate between isolates that are very closely related both genetically and geographically. The application of such a powerful tool in rabies cases is crucial for a better understanding of outbreaks as well as for implementing more effective rabies control strategies.
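A rough expectation for average coverage follows from the definition above: the total number of aligned bases divided by the genome length. The Python sketch below assumes a genome of about 12 kb and takes the mean read length as an input to be supplied by the user; it is an estimate only, and the coverage values in Table 1 are not necessarily derived this way. The threshold of 20 is the value stated in the text, and the function names are illustrative.

```python
RELIABLE_COVERAGE = 20  # threshold for reliable sequences given in the text


def estimated_coverage(n_reads, mean_read_length_bp, genome_length_bp=12_000):
    """Average depth = total aligned bases / genome length."""
    return n_reads * mean_read_length_bp / genome_length_bp


def reads_needed(target_coverage, mean_read_length_bp, genome_length_bp=12_000):
    """Smallest whole number of reads expected to reach the target coverage.

    Integer inputs are assumed so that ceiling division is exact.
    """
    bases = target_coverage * genome_length_bp
    return -(-bases // mean_read_length_bp)  # ceiling division on integers


if __name__ == "__main__":
    # Hypothetical mean read length of 250 bp after trimming.
    print(reads_needed(RELIABLE_COVERAGE, 250), "reads for 20-fold coverage")
    print(estimated_coverage(1_000, 240), "fold coverage from 1,000 reads of 240 bp")
```

Under these assumptions, fewer than a thousand RABV reads of a few hundred base pairs would reach 20-fold average coverage, although uneven read distribution along the genome can leave some regions less well covered.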

In conclusion, the study compares two approaches to the HTS of field rabies viruses. The crucial issues to consider before deep sequencing are summarised here. Direct metagenomics offers the most realistic illustration of a microbiome and is a straightforward approach for surveying a viral community in environmental samples. The major issues that have to be overcome are the high sequencing depth required because of host contamination, insufficient viral load in original samples, and higher detection limits compared to amplification-based methods. Low sample quality results in a low number of total reads, decreases the sequencing efficiency, and increases total costs. To overcome these problems, enrichment techniques may be applied: removal of host material, e.g. by enzymatic digestion; amplification of target sequences; ultracentrifugation of viral particles; or accumulation of the viral load via cell culturing.

During the study a set of recommendations for sequencing RABV samples was derived. Careful sample processing is crucial for successful library preparation and sequencing. Appropriate storage and preservation of collected material and the use of a pretreatment method (digestion of host genetic material or ultracentrifugation) significantly increase the number of viral reads. An appropriate nucleic acid extraction method and control of RNA/DNA parameters, both concentration (fluorimeter) and integrity (capillary electrophoresis), at each stage of sample processing are imperative for effective library preparation and sequencing.

If deep characterisation of viruses is intended, e.g. for spatial and temporal phylogeography of viral populations during outbreaks, target enrichment followed by deep sequencing is also recommended as it generates much greater coverage of obtained consensus sequences.

eISSN:
2450-8608
Language:
English
Publication timeframe:
4 times per year
Journal Subjects:
Life Sciences, Molecular Biology, Microbiology and Virology, other, Medicine, Veterinary Medicine