Rabies virus (RABV), belonging to the
Initially, whole-genome sequences (WGS) of RABV isolates were obtained with the genome-walking procedure. This procedure was based on amplification of multiple RABV fragments covering 12 kb of the RABV genome and their sequencing using the Sanger method (27). From the first WGS of the RABV prototype in 1988 (26) to date, many advances in molecular technology have been made and alternative platforms for high-throughput sequencing (HTS) have been developed (6, 18).
The application of HTS allows metagenomic-based identification of random viral fragments in environmental samples (15). HTS generates a large set of raw reads from which a consensus sequence is built, and it therefore increases the reliability of sequencing by avoiding the mismatches generated during Sanger sequencing. Owing to this high reliability, HTS is now broadly applied in studies of virus evolution, host-virus interaction, and pathogenicity (28). Whole-genome population studies also offer great potential for in-depth investigation of phylogenetic relationships among isolates and for a better understanding of what determines virus spread and persistence in the field, as the spread of disease is reflected in a genetic signature in pathogen genomes. Viral pathogens, particularly fast-evolving RNA viruses, are model systems for exploration of pathogen populations, as they rapidly accumulate genetic diversity on a timescale similar to that of epidemiological processes (1, 5, 9).
Rabies is an acute encephalitis, a fatal zoonosis affecting all warm-blooded animals. In Europe, it is most widespread in wild animals, particularly in red foxes. Brain samples delivered for rabies diagnosis are very often putrefied or autolysed, and the resulting disintegration makes the genetic material unsuitable for HTS studies. In many cases the carcasses of dead animals have stayed in unfavourable environmental conditions for a long period before laboratory diagnosis, resulting in decomposition of the brain tissue and bacterial contamination. Such decomposition, the presence of many pathogens both viral and bacterial, and host organism genetic material adversely affect both the quality of RABV RNA and the effectiveness of full-length sequencing of the viral genome. Appropriate homogenisation of the brain tissue and effective purification of RNA isolated from field samples are also significant for the success of whole-genome sequencing or metagenomic studies. A further limitation of HTS is the low concentration of viral RNA in the total RNA isolated from the brain tissue.
Taking into account the difficulties and limitations in deep sequencing, namely the high background of genetic material from host species and bacteria in field samples, the main objective of the work was to evaluate and validate the different protocols (the direct metagenomic and RABV-enriched approaches) used for deep sequencing of field RABV isolates and compare different protocols used for RNA extraction in terms of their application to HTS.
Based on the RNA extraction procedure as well as on the scheme of RNA preparation for HTS, two HTS approaches were applied: the direct metagenomics approach (groups I, II, and III) and the RABV-enriched approach (group IV) as shown in Table 1.
Details of samples in the comparative study of extraction methods. RNA concentration, virus detection using real-time RT-PCR, and the number of reads obtained during whole-genome sequencing
Approach | Group | Isolate | Sample origin | Collection date | Extraction method / RT-PCR procedure | Concentration of dsDNA after clean-up (Qubit HS) (ng/µL) | Verification of HTS library | Total number of reads | Number of viral reads | % of viral reads | Number of RABV reads (centrifuge) | Number of contigs | Average coverage |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Direct metagenomic | I | 767121097L | red fox | 1997 | A | 1.45 | + | 22,250 | 117 | 0.0525 | 3 | - | - |
Direct metagenomic | I | 965180404L | red fox | 2004 | A | 1.06 | + | 15,282 | 67 | 0.438 | 0 | - | - |
Direct metagenomic | I | 1045120899L | red fox | 1999 | A | 1.08 | + | 158,723 | 496 | 0.31 | 8 | - | - |
Direct metagenomic | II | 767121097L | red fox | 1997 | B | 5.06 | − | - | - | - | - | - | - |
Direct metagenomic | II | 965180404L | red fox | 2004 | B | 1.21 | + | 160,354 | 11,170 | 6.965 | 132 | 19 | 3.5 |
Direct metagenomic | II | 1045120899L | red fox | 1999 | B | 1.15 | + | 200,756 | 4,218 | 2.101 | 133 | 6 | 2.5 |
Direct metagenomic | II | 1321180108L | red fox | 2008 | B | 1.2 | + | 517,696 | 70,436 | 13.605 | 1,359 | 1 | 32 |
Direct metagenomic | II | 1379120910L | red fox | 2010 | B | 2.21 | + | 948,295 | 68,491 | 7.222 | 4,765 | 3 | 152 |
Direct metagenomic | III | 1996181013L* | red fox | 2013 | C | 1.18 | + | 839,440 | 47,118 | 5.613 | 28,116 | 1 | 495 |
Direct metagenomic | III | 1992121113L | red fox | 2013 | C | 13.4 | + | 423,115 | 1,417 | 0.334 | 62 | 9 | 2 |
Direct metagenomic | III | 1739120912L | red fox | 2012 | C | 2.93 | + | 4,216,387 | 14,103 | 0.334 | 657 | 1 | 15 |
Direct metagenomic | III | 1679180512L* | red fox | 2012 | C | 2.04 | + | 4,543,264 | 21,683 | 0.4772 | 2,935 | 1 | 71 |
Direct metagenomic | III | 1577121111L | red fox | 2011 | C | 4.28 | + | 1,058,492 | 5,313 | 0.05019 | 564 | 1 | 13 |
Direct metagenomic | III | 1525180711L | red fox | 2011 | C | 4.38 | + | 849,062 | 2,023 | 0.238 | 23 | 3 | 1.5 |
Direct metagenomic | III | 1391180910L | red fox | 2010 | C | 3.86 | + | 2,401,009 | 8,487 | 0.3534 | 10 | 3 | 1 |
Direct metagenomic | III | EBLV-1 | | 2018 | C | | + | 460,751 | 178,828 | 38.812 | 32,232 | 1 | 571 |
RABV-enriched | IV | 1045120899L | red fox | 1999 | C + virus enrichment | 16.8 | + | 2,697,616 | 2,560,850 | 94.930 | 1,342,569 | 1 | 38,039 |
RABV-enriched | IV | 965180404L | red fox | 2004 | C + virus enrichment | 56.5 | + | 2,951,939 | 2,765,165 | 93.67 | 1,503,424 | 1 | 41,961 |
RABV-enriched | IV | 767121097L | red fox | 1997 | C + virus enrichment | 76.6 | + | 3,106,953 | 2,871,569 | 92.42 | 1,686,411 | 1 | 41,697 |
RABV-enriched | IV | 1379120910L | red fox | 2010 | C + virus enrichment | 55.2 | + | 754,635 | 711,112 | 94.23 | 372,141 | 1 | 11,346 |
RABV-enriched | IV | 1321180108L | red fox | 2008 | C + virus enrichment | 25.1 | + | 1,115,667 | 951,915 | 85.322 | 557,949 | 3 | 5,962 |
RABV-enriched | IV | 1525180711L | red fox | 2011 | C + virus enrichment | 75.5 | + | 869,224 | 824,305 | 94.832 | 463,772 | 1 | 12,144 |
RABV-enriched | IV | 1739120912L | red fox | 2012 | C + virus enrichment | 48.5 | + | 682,577 | 645,311 | 94.54 | 362,514 | 1 | 9,500 |
RABV-enriched | IV | 1996181013L* | red fox | 2013 | C + virus enrichment | 65.4 | + | 760,390 | 718,626 | 94.507 | 423,322 | 1 | 10,144 |
RABV-enriched | IV | 1577121111L | red fox | 2011 | C + virus enrichment | 81.4 | + | 904,532 | 846,126 | 93.54 | 507,207 | 1 | 12,122 |
RABV-enriched | IV | 1992121113L | red fox | 2013 | C + virus enrichment | 75.9 | + | 978,751 | 922,826 | 94.286 | 525,370 | 1 | 13,536 |
RABV-enriched | IV | 1391180910L | red fox | 2010 | C + virus enrichment | 47.3 | + | 985,144 | 943,962 | 95.819 | 539,481 | 1 | 13,075 |
RABV-enriched | IV | 2191180915L | red fox | 2015 | C + virus enrichment | 93 | + | 967,059 | 908,214 | 93.915 | 528,055 | 1 | 13,089 |
RABV-enriched | IV | 1990121113L | red fox | 2013 | C + virus enrichment | 70.2 | + | 893,272 | 834,573 | 93.428 | 464,788 | 1 | 12,217 |
RABV-enriched | IV | 2176120515L | red fox | 2015 | C + virus enrichment | 87.6 | + | 1,294,593 | 1,196,143 | 92.395 | 737,332 | 1 | 17,535 |
RABV-enriched | IV | 2068180814L | red fox | 2014 | C + virus enrichment | 69.6 | + | 1,192,314 | 1,112,910 | 93.340 | 654,514 | 1 | 16,190 |
RABV-enriched | IV | 2214181115L | red fox | 2015 | C + virus enrichment | 87.6 | + | 1,230,082 | 1,140,045 | 92.680 | 709,820 | 1 | 16,316 |
RABV-enriched | IV | 2067120814L | red fox | 2014 | C + virus enrichment | 62 | + | 1,265,627 | 1,150,880 | 90.933 | 703,662 | 1 | 17,091 |
RABV-enriched | IV | 2066120814L | red fox | 2014 | C + virus enrichment | 68.4 | + | 1,934,161 | 1,741,947 | 90.062 | 1,062,064 | 1 | 25,144 |
RABV-enriched | IV | 2226120916L | red fox | 2016 | C + virus enrichment | 84 | + | 1,133,386 | 1,011,316 | 89.229 | 633,117 | 1 | 14,546 |
RABV-enriched | IV | 2235181116L | red fox | 2016 | C + virus enrichment | 103 | + | 1,947,768 | 1,828,286 | 93.865 | 1,119,736 | 1 | 26,791 |
RABV-enriched | IV | 2236181216P | dog | 2016 | C + virus enrichment | 81.2 | + | 612,073 | 564,607 | 92.245 | 346,010 | 1 | 7,980 |
RABV-enriched | IV | 2237120117L | red fox | 2017 | C + virus enrichment | 96.4 | + | 1,336,571 | 1,225,265 | 91.672 | 758,770 | 1 | 16,869 |
RABV-enriched | IV | 2238181117K | cat | 2017 | C + virus enrichment | 62 | + | 1,034,833 | 918,601 | 88.768 | 562,828 | 1 | 12,576 |
Extraction method / RT-PCR procedure: A – QIAamp Viral RNA Mini Kit / RT + amplification of dsDNA with Klenow fragment; B – Direct-zol RNA MiniPrep (Zymo Research) / RT + amplification of dsDNA with Klenow fragment; C – TRIzol/chloroform/ethanol / RT + amplification of dsDNA with Klenow fragment; C + virus enrichment – TRIzol/chloroform/ethanol + virus enrichment
* – brain samples at a heavy decomposition stage; + – positive; − – negative
Primers employed for RT-PCR of field RABV samples
Amplicon | Primer name | Primer sequence 5′–3′ | Location in RABV genome | Amplicon size (bp) |
---|---|---|---|---|
A | RVA_forward | ATGGATGCCGACAAGATTGTATT | 1–23 | 4499 |
A | RVA_reverse | CAGGGGGTGCATCAGGGGAAT | 4478–4499 | 4499 |
B | RVB_forward | ATCCCAGAGATGCAATCATCC | 4418–4439 | 3860 |
B | RVB_reverse | TGAGTAGAATGGTAGGACTGGCACC | 8251–8276 | 3860 |
C | RVC_forward | GAACCCAGATCTTGGAGAGAGAA | 8172–8195 | 3631 |
C | RVC_reverse | TTCGGATTCAAGATCTTGTTTT | 11779–11801 | 3631 |
A1 | RVA_forward | as above | as above | 2267 |
A1 | RVA1_reverse | TGGAATTTCTTGGAATTGGCCAAAGC | 2241–2267 | 2267 |
A2 | RVA2_forward | GCTCATGACGGATCCAAACTCCC | 2193–2216 | 2300 |
A2 | RVA_reverse | as above | as above | 2300 |
B1 | RVB_forward | as above | as above | 2330 |
B1 | RVB1_reverse | GATTCAGGAATCTCAAAGATTTGCGT | 6724–6750 | 2330 |
B2 | RVB2_forward | TTGACTCCTTATATCAAAACCCAGA | 6640–6665 | 1636 |
B2 | RVB_reverse | as above | as above | 1636 |
C1 | RVC_forward | as above | as above | 2016 |
C1 | RVC1_reverse | GTCATGGTTCTAGCTGCATGGCG | 10155–10188 | 2016 |
C2 | RVC2_forward | ATGAGGCAGGTGCTGGGTG | 10054–10073 | 1750 |
C2 | RVC_reverse | as above | as above | 1750 |
After 2 h of incubation at 50°C, the RT was terminated by heating at 70°C for 15 min and the product was chilled on ice. PCR was performed with 10 μL of RT product added to 40 μL of reaction mixture containing 5 μL of buffer, 1 μL of 10 mM dNTPs, 1 μL of TaKaRa PrimeSTAR GXL DNA polymerase (TaKaRa Bio), and 33 μL of molecular biology-grade water. The reactions were carried out in a ProFlex thermocycler (Thermo Fisher Scientific) with the following programme: 1 cycle at 98°C for 4 min, followed by 40 cycles at 98°C for 20 s, 55°C for 30 s, and 72°C for 5 min. Amplification products were detected by separation in 1% agarose gel.
In cases of weak or no signal from the expected amplicons, hemi-nested PCRs were performed. Using the first-round products as a template, the reaction was carried out with the appropriate primers and slightly modified cycling: 35 cycles at 98°C for 20 s, 55°C for 30 s, and 72°C for 2 min.
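The reaction set-up above lends itself to a small volume calculation when several samples are processed at once. The following is a minimal sketch, not part of the published protocol: it scales only the components named in the text, and the `overage` parameter (extra volume to allow for pipetting loss) is a hypothetical convenience introduced here.

```python
# Per-reaction volumes in µL, as stated in the protocol text.
PER_REACTION_UL = {
    "buffer": 5.0,
    "dNTPs (10 mM)": 1.0,
    "DNA polymerase": 1.0,
    "water": 33.0,
}
TEMPLATE_UL = 10.0  # RT product added to each tube separately


def master_mix(n_reactions: int, overage: float = 0.1) -> dict:
    """Scale the reaction mixture for n reactions.

    `overage` is an assumed safety margin (10% by default), not a value
    taken from the paper.
    """
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    if overage < 0:
        raise ValueError("overage must not be negative")
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}


if __name__ == "__main__":
    for component, volume in master_mix(10).items():
        print(f"{component}: {volume} µL")
    print(f"template per tube: {TEMPLATE_UL} µL")
```

The listed components add up to the 40 μL of reaction mixture given in the text, so each tube holds 50 μL once the template is added.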
The quantity and quality (A260/280 and A260/230) of DNA were measured with a Qubit 3.0 fluorimeter and dsDNA HS Assay Kit (Thermo Fisher Scientific) and a NanoDrop One spectrophotometer (Thermo Fisher Scientific), respectively. In addition, the integrity of the RT product was assayed by capillary electrophoresis using a 5200 Fragment Analyser with a DNF-488 High Sensitivity Genomic DNA Analysis Kit (Agilent, USA). The samples which passed quality control were then normalised to equal concentrations.
In order to evaluate complete genome sequencing of field RABV isolates from brain samples, three RNA extraction methods were compared to select the most efficient for the metagenomic approach. The efficiency of RNA extraction was evaluated by comparing two dsDNA properties: post-clean-up concentration (estimated by fluorimeter) and RABV genome integrity (estimated by capillary electrophoresis). Measurement of dsDNA concentration on a spectrophotometer before clean-up is strongly discouraged because residues of reverse transcription give misrepresentative dsDNA concentrations. Approximate determination of dsDNA concentration on the spectrophotometer is possible after the clean-up procedure. During the study, HTS libraries were created from dsDNA samples when the results of the quality check were positive. Additionally, the size distribution, adapter remains, and quantity of the libraries were evaluated. If these parameters were satisfactory, Illumina sequencing was carried out.
Testing of different RNA extraction procedures for the field samples began with the QIAamp Viral RNA Mini Kit and moved on to the Direct-zol RNA MiniPrep Kit. Procedure testing revealed a higher concentration of dsDNA in samples processed with the Direct-zol RNA MiniPrep Kit (group II) than in samples from which the RNA was purified with the QIAamp Viral RNA Mini Kit (group I) (Table 1). In the next step, HTS libraries were prepared, and if the quantity and quality of the library were sufficient, deep sequencing was performed. Unfortunately, the number of viral reads was insufficient to obtain full-length sequences with reliable coverage for all RABV isolates described in groups I and II. The isolate 1379120910L was successfully sequenced at full length with an average coverage of 152; however, it was the sample with the shortest storage time (over 8 years) of all samples in groups I and II, the others having collection dates between 1997 and 2008. Owing to the long storage period, RNA was considerably fragmented (capillary electrophoresis data not shown), and this negatively affected library preparation and ultimately the results of the metagenomic studies.
In the next step, the combined method of RNA isolation utilising TRIzol/chloroform/ethanol extraction and RNA purification on a column (group III) was tested. To assess the concentration of RABV in the samples, a real-time RT-PCR was performed simultaneously. The dsDNA concentration of reverse-transcribed RNA from the combined TRIzol and column method was significantly higher than that of dsDNA obtained from the equivalents extracted with Direct-zol RNA MiniPrep. The relative Ct values ranged between 13.77 and 18.58 (data not shown), suggesting a high concentration of viral RNA. The number of total reads was significantly higher than in the sequencing results obtained from group II. Nevertheless, the percentage of viral reads was much lower, and complete RABV genomes were obtained for four out of seven brain samples, with average coverage between 13 and 71 for three RABV isolates and coverage of 495 for the isolate 1996181013L (Table 1, group III). The isolate EBLV-1 propagated in cell culture was deeply sequenced with viral reads numbering 178,828 (38.812% of total reads), of which 32,232 reads were from European bat lyssavirus, with an average coverage of 571.
The two commercial kits used different techniques, one being based on the digestion of the brain homogenate with lysis buffer and carrier RNA and the second applying TRIzol and ethanol (95–100%). A comparison of the two RNA extraction procedures revealed higher quality and quantity of extracted RNA when the second was used. Therefore, the TRIzol method was modified to use chloroform and 75% ethanol, followed by purification on the column. This modification improved both the quality and the quantity of the extracted RNA.
In the last of the investigated approaches, the combined extraction method was followed by RABV enrichment (group IV). Specially designed primers (Table 2) were used in the RABV amplification. Initially, RT-PCR products up to 4.5 kb in size were obtained, according to the scheme of overlapping amplicons (A, B, and C) as illustrated in Fig. 1. Due to the low yield of DNA polymerases, in the next step shorter products (A1, A2, B1, B2, C1, and C2) were amplified by RT-PCR, ranging in size from 1.67 kb to 2.33 kb. Typical RT-PCR products generated during the study are illustrated in Fig. 2. A combination of amplification products covering complete genomes of RABV isolates was subjected to HTS. Full-length sequences were obtained for all 23 samples tested. In this group the number of viral reads was significantly higher than that of group III. The average coverage of consensus sequences ranged between 5,968 and 41,961.
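The overlapping-amplicon scheme can be checked arithmetically from the primer locations. The sketch below is illustrative only: it takes the outermost primer coordinates from the primer table as amplicon boundaries and reports the overlap between neighbouring amplicons. The function name `tile_overlaps` is hypothetical, and treating primer positions as exact amplicon ends is an assumption.

```python
# Amplicon boundaries (start, end) taken from the primer locations in the
# primer table; using them as exact amplicon ends is an assumption.
LONG_AMPLICONS = {"A": (1, 4499), "B": (4418, 8276), "C": (8172, 11801)}
SHORT_AMPLICONS = {
    "A1": (1, 2267), "A2": (2193, 4499),
    "B1": (4418, 6750), "B2": (6640, 8276),
    "C1": (8172, 10188), "C2": (10054, 11801),
}


def tile_overlaps(amplicons: dict) -> dict:
    """Return the overlap in bp between each pair of neighbouring amplicons.

    Raises ValueError if two neighbouring amplicons leave a gap, because a
    gap would break the tiling of the genome.
    """
    ordered = sorted(amplicons.items(), key=lambda item: item[1][0])
    overlaps = {}
    for (name_a, (_, end_a)), (name_b, (start_b, _)) in zip(ordered, ordered[1:]):
        overlap = end_a - start_b + 1  # coordinates are inclusive
        if overlap <= 0:
            raise ValueError(f"gap between {name_a} and {name_b}")
        overlaps[f"{name_a}/{name_b}"] = overlap
    return overlaps


if __name__ == "__main__":
    print(tile_overlaps(LONG_AMPLICONS))
    print(tile_overlaps(SHORT_AMPLICONS))
```

Under these assumptions every neighbouring pair overlaps, which is what allows the amplification products to be combined into one continuous template for sequencing.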
High-throughput sequencing has been widely used for the characterisation of many pathogens, including viruses, bacteria, and parasites (10, 21–23). The direct metagenomic approach enables the study of the structure of a whole microbiome (bacteria, fungi, and all viruses) present in an environmental sample where the host material is present as contamination. Whole metagenome sequencing provides a comprehensive approach to direct detection of specific viruses and makes possible an accurate survey of the virus structure. The main problem in this approach is the relatively small quantity of viral RNA compared to contamination by host and bacterial material.
Many HTS platforms and RNA preparation protocols have been established for the WGS of RABV; however, the vast majority of samples were collected directly from fresh subjects without any signs of decomposition or were tissue culture–propagated viruses such as vaccine strains (7, 8). Difficulties are relatively often faced when metagenomic studies are conducted on field samples collected several days after an animal’s death. An additional obstacle which has to be taken into consideration in RABV research is the quality of the material, which is often highly disintegrated due to prolonged contact with the environment and the abundant presence of RNases.
Obtaining viral nucleic acids from the sample is a crucial prerequisite for successful pathogen detection. Therefore, three different methods of RNA extraction from brain samples collected in the field were tested in this study in terms of their applicability to the metagenomic approach. The highest quantity and quality of RNA (data not shown) were obtained using the combined method of RNA extraction, which included initial sample lysis with TRIzol and extraction with chloroform and 75% ethanol and which outperformed both column-based methods of RNA extraction (the QIAamp Viral RNA Mini and the Direct-zol RNA MiniPrep kits). It should consequently be assumed that treating brain samples with TRIzol significantly increases the efficacy and reliability of RNA extraction. The brain sample constitutes a difficult template for RNA isolation, mainly due to its high concentration of fats. Organic solvents dissolve fats sufficiently, improving the efficacy of RNA isolation.
It is apparent that for indirect pathogen detection based on the presence of a gene fragment, it is most effective to extract the RNA using standard column-based kits, due to a faster and easier isolation procedure. But for high-throughput sequencing, the integrity of the genetic material is key for library preparation, and standard RNA extraction methods do not always provide sufficient quality of RNA for this approach. This is particularly true in the case of RNA viruses, which are much more sensitive to environmental conditions due to the fragile structure of RNA. Our results and suggestions correlate with the findings of Wylezich
Viral load in the samples was estimated preliminarily by RT-PCR, comparing Ct values between different RABV isolates. This comparison indicated high concentrations of RABV in the samples extracted with Direct-zol RNA MiniPrep and the combined TRIzol/column method (groups II and III). This prediction was not fully reflected in the success of deep sequencing, probably because the estimation provided by RT amplification was too approximate. In addition, RT-PCR detection is based on the amplification of short fragments of a viral genome, making this method less sensitive to sample fragmentation. During deep sequencing in our study, a large number of total reads and viral reads were obtained (mean 3.64%), and fragments of RABV nucleotide sequences were detected, but it was not possible to determine full-length consensus sequences. Even if a consensus was obtained, the average coverage of contigs was too low (below 3.5). Metagenomic studies allow direct detection of pathogens but are characterised by a defined detection limit in terms of sequencer throughput and percentage throughput per sample.
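The proportion of viral reads referred to above is a simple ratio of viral reads to total reads. As a hedged illustration, the sketch below recomputes it for two rows of Table 1 (group II isolate 1321180108L and group IV isolate 1045120899L); the function name `percent_viral` is hypothetical and not taken from the analysis pipeline used in the study.

```python
def percent_viral(total_reads: int, viral_reads: int) -> float:
    """Return viral reads as a percentage of all reads in a library."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= viral_reads <= total_reads:
        raise ValueError("viral_reads must lie between 0 and total_reads")
    return 100.0 * viral_reads / total_reads


if __name__ == "__main__":
    # Read counts copied from Table 1.
    direct = percent_viral(517_696, 70_436)         # group II, 1321180108L
    enriched = percent_viral(2_697_616, 2_560_850)  # group IV, 1045120899L
    print(f"direct metagenomic: {direct:.3f}%")
    print(f"RABV-enriched:      {enriched:.3f}%")
```

The contrast between the two values shows why the same sequencing throughput yields far more usable RABV data after enrichment.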
Because the viral cycle depends on the host cellular system, the main obstacle in all viral metagenomic studies is a low viral load. It is important to understand that viral genomic material constitutes only a small fraction of all extracted RNA, the overwhelming majority of which is background from the host species and bacteria. To overcome this issue, different strategies of target enrichment may be applied.
The first recommended solution is propagation of viruses in cell culture before HTS. A good HTS result was obtained for the EBLV-1 isolate cultivated in a neuroblastoma cell line. However, it is not always possible to multiply viruses from decomposed field samples, as RABV may be unable to infect cells or to be isolated in cell culture (16). Indeed, only three out of six RABV isolates originating from brain samples collected in the field could be propagated in cell culture, and then only at low titres (data not shown). Passaging viruses in a cell culture system adds new artificial diversity to a viral population. The extent of alteration to the original consensus sequence of a RABV population depends on the number of passages necessary to obtain virus at a high enough titre to harvest (3). It has previously been reported that the number of single-nucleotide polymorphisms (SNPs) observed in cell-cultured RNA preparations was greater than that in tissue-extracted samples (16). Therefore, virus propagation in cell culture prior to WGS should be strictly limited when performing studies on the phylogeography of the population, on genomic diversity, or on virus evolution.
The second approach to viral enrichment is amplification of the whole RABV genome of 12 kb. This tactic results in a significant increase in specific RABV reads and consequently much greater coverage; however, it is limited to HTS of known pathogens only. Target enrichment,
The HTS methodology described in this paper facilitated obtaining complete genomes of several RABV isolates originating from the brain tissue of animals collected in the field. Significantly greater rates of RABV genome coverage were obtained with the RABV-enriched approach. However, metagenomic studies enabled full-length sequencing of 6 out of 16 field viruses including EBLV-1 propagated in a neuroblastoma cell line. The direct metagenomic approach provides information on original genome sequences, but with lower coverage. Sequencing coverage describes the average number of reads that align to, or “cover,” known reference bases (12); if the coverage value for viral reads is over 20, it provides reliable nucleotide sequences. The enriched approach gives greater coverage, but with the risks of genome modification and artificial diversity caused by PCR amplification. Complete viral sequences with sufficient coverage provide the ability to discriminate between isolates that are very closely related both genetically and geographically. The application of such a powerful tool in rabies cases is crucial for better understanding of the outbreak as well as for implementing more effective rabies control strategies.
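The coverage criterion described above can be expressed as a short calculation. This is a minimal sketch under stated assumptions: mean coverage is taken as the total number of aligned bases divided by the reference length, the threshold of 20 is the value quoted in the text, and the read length (150 bp), read count, and 12,000 bp reference length in the usage example are hypothetical round numbers rather than values from the study.

```python
RELIABLE_COVERAGE = 20  # threshold quoted in the text


def mean_coverage(aligned_read_lengths, reference_length: int) -> float:
    """Average number of reads covering each reference base."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return sum(aligned_read_lengths) / reference_length


def is_reliable(coverage: float, threshold: float = RELIABLE_COVERAGE) -> bool:
    """True when coverage is strictly above the threshold ("over 20")."""
    return coverage > threshold


if __name__ == "__main__":
    # Hypothetical library: 2,000 aligned reads of 150 bp on a 12 kb genome.
    cov = mean_coverage([150] * 2000, 12_000)
    print(f"mean coverage: {cov:.1f}, reliable: {is_reliable(cov)}")
```

Applied to the average coverage column of Table 1, the same threshold separates the few direct metagenomic libraries with high coverage from those that yielded only fragmentary data.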
In conclusion, the study compares two approaches to the HTS of field rabies viruses. The crucial issues which should be considered before deep sequencing are summarised here. Direct metagenomics offers the most realistic illustration of a microbiome and is a straightforward approach for surveying a viral community in environmental samples. Major issues that have to be overcome are the high sequencing depth required because of host contamination, insufficient viral load in original samples, and higher detection limits compared to amplification-based methods. Low quality of the samples results in a low number of total reads, decreases the sequencing efficiency, and increases total costs. To overcome those problems, enrichment techniques may be applied: removal of host material,
During the study, a set of recommendations for sequencing RABV samples was derived. Careful sample processing is crucial for successful library preparation and sequencing. Appropriate storage and preservation of collected material and employment of a pretreatment method (digestion of host genetic material or ultracentrifugation) significantly increase the number of viral reads. An appropriate nucleic acid extraction method and control of RNA/DNA parameters, both concentration (fluorimeter) and integrity (capillary electrophoresis), during each stage of sample processing are imperative for effective library preparation and sequencing.
If deep characterisation of viruses is intended,