Assessment of clinical analytical sensitivity and specificity of next-generation sequencing for detection of simple and complex mutations

Chin, Ephrem LH; da Silva, Cristina; Hegde, Madhuri

doi:10.1186/1471-2156-14-6

Research article
Open access
Published: 19 February 2013

BMC Genetics volume 14, Article number: 6 (2013)

Abstract

Background

Detecting mutations in disease genes by full gene sequence analysis is common in clinical diagnostic laboratories. Sanger dideoxy terminator sequencing allows rapid development and implementation of sequencing assays in the clinical laboratory, but it has limited throughput and, because of cost constraints, allows analysis of only one or at most a few genes per patient. Next-generation sequencing (NGS), on the other hand, has evolved rapidly; to date it has mainly been used for large-scale genome sequencing projects and is only beginning to be used in clinical diagnostic testing. One advantage of NGS is that many genes can be analyzed at the same time, allowing mutation detection when there are many possible causative genes for a specific phenotype. In addition, regions of a gene not typically tested for mutations, such as deep intronic and promoter regions, can also be analyzed.

Results

Here we used 20 positive controls in disease-causing genes, previously characterized by Sanger sequencing, to demonstrate the utility of NGS in a clinical setting. Using standard PCR-based amplification, we assessed the analytical sensitivity and specificity of the technology for detecting all previously characterized changes (mutations and benign SNPs). The positive controls chosen for validation range from simple substitution mutations to complex deletion and insertion mutations occurring in autosomal dominant and recessive disorders. The NGS data were 100% concordant with the Sanger sequencing data, identifying all 119 previously identified changes in the 20 samples.

Conclusions

We have demonstrated that NGS technology is ready to be deployed in clinical laboratories. However, NGS and associated technologies are evolving, and clinical laboratories will need to invest significantly in staff and infrastructure to build the necessary foundation for success.

Background

The introduction of next-generation sequencing (NGS) has revolutionized the way sequencing is conducted in many research and clinical laboratories. Large genome centers were the early adopters of NGS and use it primarily for large-scale genome sequencing projects [1–3]. A single next-generation instrument can sequence a whole human genome at 7.4-fold coverage in two months [2]; by comparison, the International Human Genome Sequencing Consortium of 20 laboratories worldwide took approximately 15 months to perform the same work [4]. There are currently four major manufacturers of next-generation instruments; their platforms share the same fundamental process but use four different chemistries [5]. Third-generation sequencers, such as the Ion Torrent and Pacific Biosciences systems, have emerged as viable alternatives to these next-generation platforms and have started to appear in laboratories [6, 7].

In the last few years, clinical laboratories have begun to investigate how best to use the prodigious data-generation capacity of NGS for clinical testing, as this capacity opens up diagnostic possibilities that Sanger sequencing could not offer. Automated dideoxy Sanger sequencing has been the workhorse of clinical laboratories for many years and is considered the “gold standard” [8]. Clinical sequencing assays based on Sanger sequencing are easy to develop and can be deployed rapidly in a clinical laboratory; however, the technology has limited data-generation capacity and, mainly because of cost constraints, allows analysis of only one or at most a few genes per patient. Accurate and sensitive mutation identification is of paramount importance for diagnosis confirmation, genetic counseling, risk assessment, and carrier screening in patients and families affected by a genetic disorder. The ability of a single next-generation sequencer to generate massive amounts of data gives a laboratory the opportunity to analyze many more genes in a cost-effective manner [9]. Many candidate genes for a specific phenotype can be investigated at once, and NGS will allow regions of a gene not typically tested for mutations, such as deep intronic and promoter regions, to be analyzed on a routine basis. Here, we tested the analytical sensitivity and specificity of NGS for application in a clinical setting using previously identified simple and complex mutations.

The goal of a standard laboratory test development and validation process is to ensure the accuracy of the reported results. To achieve this, laboratories must carefully evaluate every step of the testing process and document the results to show that a procedure works as expected and consistently achieves the expected result. For a laboratory-developed test (LDT), laboratories are charged with establishing the following for the test: accuracy, precision, analytical sensitivity, analytical specificity, the reportable range of test results, the test’s normal values, and, for genotyping assays, the efficiency of the call rate, as indicated by the Centers for Disease Control and Prevention ACCE Model Process for Evaluating Genetic Tests as of January 3, 2010 (http://www.cdc.gov/genomics/gtesting/ACCE/). The analytical sensitivity of an assay is its ability to detect a low concentration of a given substance in a biological sample [10]. The sensitivity of NGS is vastly superior to that of Sanger sequencing: NGS can detect mutant alleles present at levels as low as 5%, as in mitochondrial testing [11]. A mutant allele at such a low level would be undetectable by conventional Sanger sequencing, so it may not be possible to confirm it as a “real” change. In our study, we considered two scenarios: mutant and wild-type alleles present in equal proportion, and only a single allele present, either mutant or wild-type. The analytical specificity of an assay is its ability to identify only a specific substance [10]. In this study, we have assessed NGS for its application in clinical testing.

Methods

Validation samples selected for this study

For the first validation SOLiD sequencing run, we selected 20 samples that had been referred to our laboratory for Sanger sequencing for a variety of single-gene disorders. Validation samples were selected on the basis of the type of mutation present, the number of exons in the gene, and the complexity of the gene, including GC content and the sequence context around the mutation. The following genes were included: ACADVL, BCKDHA, CBS, CFTR, DMD, GAA, GALC, GALT, GBA, GJB2, HEXB, IDUA, OPA1, RECQL4, SGSH, SMPD1, and ZEB2. Samples selected for the validation of the SOLiD v3 instrument carried 119 changes: 102 missense changes, seven deletions, nine duplications/insertions, and one indel mutation. These changes had initially been identified by conventional clinical Sanger sequencing assays.

DNA isolation and sample enrichment

Genomic DNA was purified from peripheral blood or saliva samples (DNA Genotek) using standard extraction conditions as recommended for the Puregene DNA extraction system (Qiagen). The coding region and at least 20 bp of flanking intronic sequence were amplified using custom-designed primers (Additional file 1) and the FastStart Taq PCR system (Roche Applied Sciences). PCR products ranged in size from 250 bp to 750 bp. PCR amplifications were performed in 50-µl reactions containing 50 ng of genomic DNA, 10X reaction buffer, 0.2 mM of each dNTP, 2 pM of each forward and reverse primer, and 2 U of Taq polymerase. Cycling conditions consisted of an initial denaturation at 95°C for 3 min; 10 step-down cycles of 1 min denaturation at 95°C, 1 min annealing starting at 60°C and decreasing by 0.5°C each cycle, and 1 min extension at 72°C; 25 cycles of 1 min denaturation at 95°C, 1 min annealing at 55°C, and 1 min extension at 72°C; and a final 7-min extension at 72°C. After amplification, PCR products were visualized on a 2% agarose gel and purified with Millipore MultiScreen PCR UF 96-well plates (Millipore). Enriched amplicons were quantitated in triplicate using PicoGreen (Life Technologies) and pooled in equimolar amounts.
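
The cycling program can be expanded step by step as a check on its arithmetic. The sketch below is our reading of the protocol, assuming the 0.5°C decrement applies to the annealing step from the first step-down cycle onward (60.0°C down to 55.5°C) and ignoring ramp times; the `Step` class and `step_down_program` function are hypothetical names, not part of any thermocycler software.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


def step_down_program(start_anneal: float = 60.0, decrement: float = 0.5,
                      step_down_cycles: int = 10, final_anneal: float = 55.0,
                      main_cycles: int = 25) -> List[Step]:
    """Expand the cycling conditions into an explicit list of hold steps."""
    program = [Step("initial denaturation", 95.0, 180)]
    # Step-down phase: the annealing temperature drops by `decrement` every cycle.
    for i in range(step_down_cycles):
        program += [Step("denature", 95.0, 60),
                    Step("anneal", start_anneal - i * decrement, 60),
                    Step("extend", 72.0, 60)]
    # Main phase: fixed annealing temperature.
    for _ in range(main_cycles):
        program += [Step("denature", 95.0, 60),
                    Step("anneal", final_anneal, 60),
                    Step("extend", 72.0, 60)]
    program.append(Step("final extension", 72.0, 420))
    return program


program = step_down_program()
anneal_temps = [s.temp_c for s in program if s.name == "anneal"]
total_seconds = sum(s.seconds for s in program)
print("step-down annealing temperatures:", anneal_temps[:10])
print(f"{len(anneal_temps)} cycles, {total_seconds / 60:.0f} min of hold time")
```

Under these assumptions the program has 35 cycles and 115 min of hold time, and the last step-down cycle anneals at 55.5°C, just above the 55°C used for the remaining 25 cycles.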

Next-generation sequencing (NGS) analysis on an ABI SOLiD v3 sequencer

Each pooled sample was end-repaired (Epicenter Biotechnologies) and concatenated (New England BioLabs) according to the manufacturers’ standard instructions. Concatenation was checked on an Agilent Bioanalyzer DNA 7500 chip (Agilent) to ensure that individual PCR fragments had been joined end to end to form a higher-molecular-weight product. The concatenated sample was then randomly sheared using a Covaris S2 sonicator and checked on an Agilent Bioanalyzer DNA 7500 chip to ensure that the sheared fragments were within 150 bp to 180 bp. Shearing the concatenated sample helps ensure even, non-biased coverage across the regions of interest. Sheared samples were then end-repaired and ligated to sequencing adaptors carrying a unique barcode for each sample. An Agilent Bioanalyzer high-sensitivity chip was run to assess the success of adaptor ligation, as ligation should increase fragment size by 90 bp, to a range of 240 bp to 270 bp. Each sample was then amplified using the Platinum Taq PCR system and the SOLiD fragment library oligo kit (Life Technologies) and quantified on an Agilent Bioanalyzer high-sensitivity chip by calculating the area under the peak with the Bioanalyzer manual integration feature. Each sample was diluted to 1 ng/µl, and all 20 individually barcoded samples were pooled to create a single SOLiD library; barcoding allows multiple small enriched targets to be combined and analyzed together. The pooled library was diluted to 60 pg/µl, and emulsion PCR was performed with the SOLiD ePCR kit (Life Technologies) at two titration points (1 pM and 1.5 pM). Beads were purified, enriched for those carrying amplified template, and quantified using a NanoDrop.
Approximately 15 million beads were deposited on a single quad of the glass slide (Life Technologies) to perform a work flow analysis (WFA) run on the SOLiD instrument, and the WFA data were used to determine the quality and quantity of beads in the sample. On the basis of these quantification data, 60 million beads were deposited onto a new quad, and a 50-bp barcoded fragment sequencing run was performed on the SOLiD v3 instrument.

Data analysis

Data were analyzed using the commercially available software package NextGENe™ (SoftGenetics LLC). Raw data from the 20 samples were analyzed in NextGENe™ according to the manufacturer’s standard analysis process, running the single nucleotide polymorphism (SNP) detection and the small and large indel-calling algorithms. Two projects were created per sample. In the first, the 50-bp reads from each sample were aligned against the reference sequence downloaded from NCBI. In the second, up to four cycles of condensation were run for each sample to ensure that small and large indels were detected. Analysis in NextGENe™ was performed on a desktop computer with dual quad-core 3.33-GHz processors, 48 GB of RAM, and 1 TB of storage.

Mutation and polymorphism nomenclature

The reference sequences used for the 20 samples are listed in Table 1. Nucleotide numbering reflects the cDNA numbering, with +1 corresponding to the A of the ATG translation initiation codon in the reference sequence. The initiation codon is codon 1.
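
With this convention, the codon affected by a coding change follows directly from its c. position: codon = floor((c - 1) / 3) + 1. A minimal sketch, handling only plain coding positions and simple intronic offsets such as c.973-45 (UTR forms such as c.-N and c.*N are deliberately not parsed); the function names are hypothetical.

```python
import re
from typing import Tuple

_C_POS = re.compile(r"^c\.(\d+)([+-]\d+)?$")  # e.g. c.1504, c.973-45, c.1234+2


def parse_c_position(text: str) -> Tuple[int, int]:
    """Split a simple c. position into (coding position, intronic offset)."""
    match = _C_POS.match(text)
    if not match:
        raise ValueError(f"unsupported position format: {text}")
    offset = int(match.group(2)) if match.group(2) else 0
    return int(match.group(1)), offset


def codon_of(c_pos: int) -> Tuple[int, int]:
    """Return (codon number, position within codon) for a coding position.

    c.1 is the A of the ATG, so codon 1 spans c.1 to c.3.
    """
    if c_pos < 1:
        raise ValueError("position must lie in the coding sequence")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


for text in ("c.1504", "c.35", "c.973-45"):
    pos, offset = parse_c_position(text)
    if offset:
        print(f"{text}: intronic, {offset:+d} nt from c.{pos}")
    else:
        codon, frame = codon_of(pos)
        print(f"{text}: codon {codon}, position {frame}")
```

For example, c.1504 falls at position 1 of codon 502, consistent with the ACADVL p.L502V change discussed in the Results.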

Table 1 Validation sample changes

Results

Pooled PCR

Despite considerable work to ensure that each coding region was represented equally in the pooled library, there was still considerable variability in the laboratory process that was hard to control. Coverage appeared lower for the first coding exon in each of the 20 samples, possibly because of its higher GC content, and some additional exons had low or no coverage (Table 2).
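
GC content of the kind summarized in Table 2 is the fraction of G and C bases in an exon. A minimal sketch; the exon sequences in the usage lines are hypothetical placeholders, and excluding ambiguous bases (e.g., N) from the denominator is our assumption.

```python
def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T) in a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


# Hypothetical exon sequences, ranked from highest to lowest GC content.
exons = {"exon_1": "ATGGCCGCGGGCGCC", "exon_2": "ATTAGCTAAAGTC"}
for name, seq in sorted(exons.items(), key=lambda kv: gc_content(kv[1]), reverse=True):
    print(f"{name}: {gc_content(seq):.1%} GC")
```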

Table 2 GC content for first coding and (*) low-coverage exons (<20X coverage)

Target matched reads

In this run, a single quad generated 38,779,652 50-bp reads on the ABI SOLiD v3 instrument, which equated to approximately 1.94 gigabases of data and provided in excess of 1.9 million 50-bp reads per sample (Table 3). Approximately 53% of these reads were of good quality and mapped to the genes of interest, giving an average of approximately 71,000 reads per coding region and in excess of 9,800 reads per base. All 119 expected changes identified by our Sanger sequencing assays were detected in the good-quality reads, giving an analytical sensitivity of 100% [12]. However, the data set also contained nine false-positive changes, which lowered the analytical specificity of this study to 92.7% [12].
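
The run-level figures follow from simple arithmetic. The sketch assumes, for illustration only, that reads were split evenly across the 20 barcodes and that the ~53% good-quality, on-target fraction applies uniformly; the actual per-sample counts are in Table 3.

```python
TOTAL_READS = 38_779_652   # reads from one quad (from the text)
READ_LENGTH = 50           # bp per read
N_SAMPLES = 20             # barcoded samples in the pool
ON_TARGET_FRACTION = 0.53  # approximate good-quality, mapped fraction

total_bases = TOTAL_READS * READ_LENGTH
reads_per_sample = TOTAL_READS / N_SAMPLES  # assumes an even barcode split
on_target_per_sample = reads_per_sample * ON_TARGET_FRACTION

print(f"total yield: {total_bases:,} bases = {total_bases / 1e9:.2f} Gb")
print(f"reads per sample: {reads_per_sample:,.0f}")
print(f"good-quality on-target reads per sample: {on_target_per_sample:,.0f}")
```

The total of 1,938,982,600 bases is about 1.94 gigabases (1,939 megabases).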

Table 3 Run statistics

Data analysis

Initial analysis with the NextGENe™ software detected 116 of the 119 expected changes (Table 4). The three changes missed during the initial analysis (IDUA c.973-45G>C, OPA1 c.93_96dupAAAA, and SGSH c.664-39_664-38delCT) were complex changes or changes at the ends of PCR fragments, where good-quality data had been discarded because of the initial software settings. The entire data set was therefore assessed for the quality of each 50-bp read, with good-quality reads retained for additional analysis and poor-quality reads removed. The additional rounds of analysis in NextGENe™ used only good-quality reads for alignment of the three samples with missed mutations. This alternative strategy enabled the laboratory to detect the remaining three mutations, so all 119 changes present in the data set were detected. NextGENe™ detected not only single nucleotide changes, such as ACADVL c.1504C>G (p.L502V), but also small deletion and insertion events, such as CFTR c.1521_1523delCTT and CFTR c.2052_2053insA. The real power of the NextGENe™ software was its ability to detect larger deletions, duplications, and indels, such as SMPD1 c.785_807del23, SGSH c.337_345delins11, and GBA c.1265_1319del55, from a 50-bp fragment sequencing run. It does so with SoftGenetics’ proprietary condensation algorithm, which lengthens good-quality 50-bp fragment reads and thereby enables detection of larger deletions and duplications (Figure 1). The ability to detect the entire spectrum of mutations, from single nucleotide changes to large deletions and duplications, is a capability that a clinical laboratory must have if it is to offer clinical sequencing tests based on NGS data.
This single run demonstrates that NGS software such as NextGENe™ has matured sufficiently for use in a clinical environment and that next-generation sequencers, such as the ABI SOLiD, are ready to be deployed in clinical laboratories. The NGS data were 100% concordant with the Sanger sequencing data, identifying all 119 known changes in the 20 samples. However, nine additional changes (six single nucleotide changes and three deletions) that had not been identified by Sanger sequencing were also called, giving a false-positive rate of 7.56% (Table 5).
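
The concordance figures reduce to counts of true positives (TP), false negatives (FN), and false positives (FP). A minimal sketch, assuming sensitivity = TP/(TP + FN) and, as in Table 5, a false-positive rate expressed as false calls per expected change (FP/TP). Analytical specificity in the strict sense also needs a count of true-negative positions, which is not given here, so the sketch does not compute it; the function name is hypothetical.

```python
def validation_metrics(tp: int, fn: int, fp: int) -> dict:
    """Summarize a validation run against a Sanger-confirmed truth set."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of known changes detected
        "fp_per_true_change": fp / tp,  # false calls per expected change
    }


m = validation_metrics(tp=119, fn=0, fp=9)
print(f"sensitivity {m['sensitivity']:.1%}, "
      f"false-positive rate {m['fp_per_true_change']:.2%}")
```

With 119 changes detected, none missed, and nine extra calls, this gives 100% sensitivity and 9/119 = 7.56%.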

Table 4 Number of changes

Table 5 False-positive rate

Coverage

Coverage per coding region ranged from 643,999 reads per exon for a small gene such as GJB2 to an average of over 8,000 reads across the 79 coding regions of DMD, the largest gene. For substitution changes, coverage ranged from 34 to 42,340 reads; for deletions, from 20 to 34,879 reads; and for duplications or insertions, from 179 to 33,377 reads. The single indel mutation had a coverage of 7,735 reads (Table 1).

Discussion

It is critical that samples selected for validation of NGS carry representative changes and mutations that a clinical laboratory expects to detect in real-world samples.

NGS is able to detect complex mutations using targeted amplification

The genes selected were ACADVL, BCKDHA, CBS, CFTR, DMD, GAA, GALC, GALT, GBA, GJB2, HEXB, IDUA, OPA1, RECQL4, SGSH, SMPD1, and ZEB2. Duchenne muscular dystrophy (DMD) is caused by mutations in the DMD gene, the largest human gene, spanning 2.2 Mb on the X chromosome [13, 14]. Gaucher disease is an autosomal recessive disorder in which mutations in the GBA gene reduce the activity of acid β-glucosidase. Diagnostic testing of GBA is particularly difficult because of a pseudogene that is >98% identical to the active gene [15, 16]. The RECQL4 gene has an atypical structure: it is a very compact gene of ~6.5 kb in which most introns are less than 100 bp long. It is also highly repetitive and GC rich, making it difficult to amplify and sequence cleanly [17, 18]. Other genes were selected for the validation run mainly on the basis of the changes they carry. One example is a sample with two mutations in the GJB2 gene: c.35delG on one allele and c.35dupG on the other (Table 1). In conventional Sanger sequencing, data are very difficult to interpret when two different length-changing mutations, here a deletion and a duplication, occur at the same nucleotide position [19]. Both GJB2 mutations were identified in the NGS run. Because each NGS read is derived from a single template molecule, the two alleles are read independently, providing our laboratory not only with the genotype but also with the data to determine which change is on which allele.
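
Why per-read data resolve this case can be shown with a toy example: at a heterozygous deletion/duplication site in a homopolymer, each read carries either the shortened or the lengthened run, so tallying run lengths separates the alleles, whereas a Sanger trace superimposes both. The flanking sequences, the run of six Gs, and the reads below are hypothetical and are not the actual GJB2 sequence; `g_run_lengths` is a hypothetical helper.

```python
import re
from collections import Counter


def g_run_lengths(reads, left_flank, right_flank):
    """Tally reads by the length of the G run between two fixed flanks."""
    pattern = re.compile(re.escape(left_flank) + r"(G+)" + re.escape(right_flank))
    counts = Counter()
    for read in reads:
        match = pattern.search(read)
        if match:  # reads that do not span both flanks are ignored
            counts[len(match.group(1))] += 1
    return counts


REF_RUN = 6  # hypothetical reference length of the G run
reads = (["AATCAC" + "G" * 5 + "ACCTT"] * 3    # one G fewer: deletion allele
         + ["ATCAC" + "G" * 7 + "ACCTA"] * 4   # one G more: duplication allele
         + ["TTTTTTTTTT"])                      # read not spanning the site
counts = g_run_lengths(reads, "TCAC", "ACCT")
for length, n in sorted(counts.items()):
    kind = ("deletion" if length < REF_RUN
            else "duplication" if length > REF_RUN else "reference")
    print(f"{length} x G: {n} reads ({kind} allele)")
```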

Target amplification method needs to be chosen carefully for NGS

In this study, we used a standard PCR approach to test the sensitivity and specificity of NGS. We faced many challenges during the initial start-up phase of acquiring and deploying an NGS instrument in a clinical laboratory environment. Clinical laboratories routinely set up hundreds if not thousands of PCR reactions a day for Sanger sequencing, but this enrichment strategy does not translate well to NGS: it involves too many labor-intensive steps to accurately quantitate individual PCR amplicons before they can be pooled for the NGS chemistry pipeline. This manual process raises costs and lengthens turnaround time. Because the full capacity of the NGS instrument must be exploited to minimize costs, laboratories will find it hard to continue using standard Sanger sequencing enrichment techniques on a routine basis. On the SOLiD v3 instrument, we are able to interrogate up to 2.4 Mbp of target sequence in a single quad. The time and effort needed to generate individual PCR amplicons for an entire 2.4-Mbp region of interest is prohibitive and raises the chance that a mistake will occur. Even if long PCR were used as the enrichment technique, enriching a 2.4-Mbp region would require 240 individual 10-kb reactions.
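
The amplicon-count argument is simple division. A sketch, assuming non-overlapping amplicons that tile the target exactly (real designs need overlaps and intronic flanks, so these are lower bounds); the sizes shown are the 250 to 750 bp amplicons used here and the 10-kb long-PCR case, and `reactions_needed` is a hypothetical helper.

```python
import math


def reactions_needed(target_bp: int, amplicon_bp: int) -> int:
    """Minimum number of non-overlapping amplicons needed to tile a target."""
    return math.ceil(target_bp / amplicon_bp)


TARGET_BP = 2_400_000  # 2.4 Mbp of target per SOLiD v3 quad (from the text)
for size in (250, 750, 10_000):
    print(f"{size:>6} bp amplicons: {reactions_needed(TARGET_BP, size):,} reactions")
```

Even at the upper end of the Sanger-style amplicon sizes, thousands of individually quantitated reactions would be needed per quad.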

To manage the workflow of the larger number of amplicons needed for gene panels, clinical laboratories will need to consider target enrichment methods such as multiplex PCR (Fluidigm™), microdroplet-based PCR (RainDance™), or in-solution hybrid capture (Agilent SureSelect™). Jones et al. [20] recently demonstrated the use of microdroplet-based PCR to test 25 genes for congenital disorders of glycosylation (CDG) in a clinical laboratory. Their work showed that, even with target enrichment, some exons fail to give adequate coverage and still require Sanger sequencing to complete the clinical test. Sanger sequencing will therefore continue to play an important role in the clinical laboratory, both for assay completeness (sequencing low-coverage and difficult regions of a gene) and for confirmatory studies once a mutation has been identified in a proband and additional family members need to be tested. By adapting the enrichment method used for standard Sanger sequencing, we have demonstrated that any change within the boundaries of the custom-designed primers flanking the region of interest (e.g., exons) can be detected successfully.

Coverage

Using coverage as the sole indicator of whether a change is real is difficult. The nine false-positive changes had a median coverage of approximately 400 reads and a mean of approximately 3,600 reads. By contrast, confirmed changes had a median coverage of approximately 5,300 reads and a mean of approximately 7,000 reads, so the median coverage of confirmed changes was roughly 13-fold higher than that of false-positive changes. Because coverage of confirmed and false-positive changes overlaps considerably, we are unable to use the number of reads as the sole indicator. In this study, coverage for substitution mutations also overlapped greatly with that for smaller insertion/deletion mutations. Detecting larger deletions/duplications with NextGENe’s™ condensation function effectively reduced the number of reads supporting the call. The c.1265_1319del55 mutation in the GBA_2 sample had only 20 reads, compared with 34,879 reads for the single-base deletion c.35delG in the GJB2 sample. Similarly, the OPA1 c.93_96dupAAAA mutation had only 179 reads, compared with 33,377 reads for the GJB2 c.35dupG mutation. To determine an appropriate coverage threshold, we ran simulation experiments for mutation c.2052_2053insA in the CFTR gene, in which varying numbers of reads aligned to the region were randomly selected and used for analysis. We performed 80 simulations, selecting from 15 to 50 reads for every 10,000 reads; coverage of the insertion varied from 8 to 43, and in some simulations NextGENe™ detected the insertion with coverage as low as 8 reads. We chose 20 reads as our coverage threshold, a view that other groups have also expressed [21–25]. De Leeneer et al. performed a detailed analysis of the coverage needed in an NGS run to detect heterozygous changes, given two variables (the quality score of the data and the sequencing error rate); they determined that data with a quality score of 30 require a minimum of 18X coverage if the sequencing error rate is 15% [24]. Dohm et al. found bona fide SNPs by applying high coverage of >20X [24].
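
The downsampling experiment can be sketched as follows. This is not the procedure used with NextGENe™; it is an illustrative stand-in that keeps each read independently with a given probability (15 to 50 per 10,000, as in the text) and then applies a simple call rule. The locus composition (5,000 reference and 5,000 mutant reads), the 30% minimum mutant fraction (taken from the heterozygous range discussed under Proportion of bases), and the function name are assumptions.

```python
import random


def downsample_trials(n_ref, n_alt, keep_prob, trials, min_depth=20,
                      min_alt_frac=0.30, seed=0):
    """Randomly thin the reads at one locus; record (depth, called) per trial."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        alt = sum(rng.random() < keep_prob for _ in range(n_alt))
        ref = sum(rng.random() < keep_prob for _ in range(n_ref))
        depth = alt + ref
        called = depth > 0 and depth >= min_depth and alt / depth >= min_alt_frac
        results.append((depth, called))
    return results


# 4 sampling rates x 20 trials = 80 simulations.
for per_10k in (15, 25, 35, 50):
    runs = downsample_trials(5_000, 5_000, per_10k / 10_000, trials=20, seed=per_10k)
    depths = [d for d, _ in runs]
    rate = sum(c for _, c in runs) / len(runs)
    print(f"{per_10k}/10,000: depth {min(depths)}-{max(depths)}, "
          f"called in {rate:.0%} of trials")
```

Seeding each sampling rate makes the runs reproducible, which is useful when comparing candidate thresholds on the same simulated data.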

Confidence score

NextGENe™ reports a Phred-like confidence score calculated with a novel SoftGenetics algorithm, which takes multiple variables into account to calculate a final probability that a given change is true. A Phred score of 10 means there is approximately a 1 in 10 chance that the change is the result of an error, while a Phred score of 30 represents a 1 in 1,000 chance. This Phred-like score gives us greater confidence in distinguishing true from false-positive changes. In our study, real changes had Phred-like confidence scores averaging 24, with a minimum of 9.4 and a maximum of 34.6 (Table 1). Some changes detected with the condensation algorithm do not have a Phred-like confidence score. A confidence score of nine or above, together with coverage above 20X, makes it more likely that a change is real.
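
The Phred scale described here is Q = -10 log10(P), where P is the probability that the call is an error. A short sketch converting between the two; NextGENe™ computes its Phred-like score with its own algorithm, so only the scale is illustrated, and the function names are hypothetical.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability for a Phred-scaled score Q (P = 10^(-Q/10))."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred-scaled score for an error probability P (Q = -10 log10 P)."""
    return -10 * math.log10(p)


for q in (9.4, 10, 24, 30, 34.6):  # scores discussed in the text
    print(f"Q{q}: about 1 in {1 / phred_to_error(q):,.0f} chance of error")
```

A score of 9.4, the lowest seen for a real change, corresponds to roughly a 1 in 9 chance of error.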

Proportion of bases

Another indicator is the proportion of mutant relative to wild-type bases. One of the samples we ran carries a heterozygous c.1504C>G (p.L502V) missense mutation in the ACADVL gene. This mutation had 5,869 reads, with approximately equal proportions of the wild-type C allele (60%) and the mutant G allele (40%). Our validation data set suggests that real heterozygous calls should be present in approximately equal proportion but can be skewed as far as 70% wild-type to 30% mutant, whereas homozygous/hemizygous calls should consist almost exclusively of the mutant allele but can range as far as 20% wild-type to 80% mutant. The proportion of bases called will never be exact, because nonspecific amplification products are sequenced and aligned back to the regions of interest. This is compounded by errors introduced during the NGS wet-bench process and by the SOLiD instrument during sequencing.
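
These allele-proportion ranges translate into a simple classification rule. A minimal sketch: the 30% lower bound for heterozygotes and the 80% lower bound for homozygous/hemizygous calls come from the text, while treating 70% mutant as the upper heterozygous bound (by symmetry) is our assumption; anything else is flagged for review. The function name is hypothetical.

```python
def classify_zygosity(ref_reads: int, alt_reads: int,
                      het_range=(0.30, 0.70), hom_min=0.80) -> str:
    """Classify a call from its mutant (alt) allele fraction."""
    depth = ref_reads + alt_reads
    if depth == 0:
        return "no coverage"
    alt_frac = alt_reads / depth
    if alt_frac >= hom_min:
        return "homozygous/hemizygous"
    if het_range[0] <= alt_frac <= het_range[1]:
        return "heterozygous"
    return "review"  # outside both expected ranges


# ACADVL c.1504C>G: 5,869 reads, about 60% C (wild-type) and 40% G (mutant).
print(classify_zygosity(ref_reads=3521, alt_reads=2348))
```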

NGS pipeline in a clinical laboratory

Most clinical laboratories are well equipped for, and accustomed to, high-complexity testing that requires multiple steps. While most clinical laboratories will not find it difficult to perform the wet-bench work required for an NGS run, it is a challenge to maintain the level of consistency easily achieved with a Sanger sequencing pipeline.

Current NGS pipelines involve many interdependent steps, and a major challenge for our laboratory was how to accurately and consistently quantitate the small amounts of enriched library present at each step of the process. A subtle error in quantity could result in a poor library preparation and a less than ideal data set, especially when loading the quad to its maximum capacity. Even, deep coverage of at least 20 reads per base across every region of interest is needed to ensure that all changes are detected accurately.
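
Finding positions that fall below a per-base depth threshold takes a single pass over a depth track. A sketch, assuming a list of per-base depths for one region of interest; the returned half-open intervals are the kind of regions that would need Sanger sequencing to complete a test. The depth values in the usage line are made up, and `low_coverage_intervals` is a hypothetical helper.

```python
def low_coverage_intervals(depths, min_depth=20):
    """Return half-open (start, end) index intervals where depth < min_depth."""
    intervals = []
    start = None
    for i, depth in enumerate(depths):
        if depth < min_depth and start is None:
            start = i  # a low-coverage stretch begins
        elif depth >= min_depth and start is not None:
            intervals.append((start, i))  # the stretch ends before position i
            start = None
    if start is not None:  # stretch runs to the end of the region
        intervals.append((start, len(depths)))
    return intervals


print(low_coverage_intervals([35, 40, 12, 8, 25, 30, 5]))
```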

Changes in laboratory structure

Clinical laboratories often lack experienced bioinformatics staff and the necessary computing infrastructure. Only a few NGS 50-bp fragment analysis programs are available on the market, and most of those were developed for use by programmers and bioinformatics specialists. This dearth of software packages that are both laboratorian-friendly and powerful enough to perform de novo detection of the entire mutation spectrum hinders the use of NGS fragment runs for targeted resequencing projects. We selected the SoftGenetics NextGENe™ software package because it is designed to detect the entire mutation spectrum, including small and large indels, from data generated in a 50-bp fragment run. Our laboratory has demonstrated that we can leverage SOLiD’s 50-bp fragment run to detect not only single nucleotide changes but also small and large indels. This is possible because of a proprietary indel detection process called condensation, developed by SoftGenetics [26]. The condensation tool polishes and lengthens short sequence reads into longer, more accurate fragments. The short reads from the SOLiD System are often not unique within the genome being analyzed. By clustering similar reads that contain a unique anchor sequence, data of adequate coverage are condensed: short reads are lengthened, and instrument errors are filtered out. This stage prepares data for applications such as SNP/indel detection by statistically removing many errors while maintaining true variations. The reads used for each condensed read are recorded to maintain allele frequency information. The condensation tool can also be set to run multiple cycles automatically, further increasing read lengths. Condensation does not use a reference sequence; reads are clustered using 12-bp anchor sequences within the reads.
Each possible 12-bp sequence within the reads is considered for indexing. All reads containing this exact sequence are clustered together to form a group. The group of reads is further sorted by the flanking shoulder sequences, immediately upstream and downstream from the anchor sequence, into subgroups. A consensus read, generally 1.6 times the original read length, is created for each subgroup. By removing many low-frequency, biased calls and improving alignment accuracy by lengthening reads, the condensation tool is useful for preparing data prior to indel detection. NextGENe™ then aligns the consensus reads to the reference sequence. NextGENe™ can be run by a laboratory technician, which is an important consideration for a clinical laboratory. A laboratory technician who has been trained to analyze Sanger sequencing data does not necessarily have the programming skills to perform NGS analysis. Skilled professional programmers or bioinformatics specialists are needed to work in partnership with laboratory directors, genetics counselors, and clinicians to interpret the massive amount of data generated in a single NGS run.
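
The anchor-and-shoulder clustering described above can be illustrated with a toy implementation. This is not SoftGenetics’ proprietary condensation algorithm; it is a simplified sketch under our own assumptions: exact 12-bp anchors, 2-bp shoulders, reads whose shoulders run off the read end are skipped, at least three reads per subgroup, and a per-column majority vote as the consensus rule. The 40-bp region, the reads, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_anchor_index(reads, anchor_len=12):
    """Index every anchor_len-mer to the (read index, offset) pairs containing it."""
    index = defaultdict(list)
    for r_idx, read in enumerate(reads):
        for off in range(len(read) - anchor_len + 1):
            index[read[off:off + anchor_len]].append((r_idx, off))
    return index


def condense_anchor(reads, hits, anchor_len=12, shoulder=2, min_reads=3):
    """Subgroup one anchor's reads by shoulder sequences; build a consensus per subgroup."""
    subgroups = defaultdict(list)
    for r_idx, off in hits:
        read = reads[r_idx]
        down = read[off + anchor_len:off + anchor_len + shoulder]
        if off < shoulder or len(down) < shoulder:
            continue  # shoulder runs off the read end; skipped in this sketch
        up = read[off - shoulder:off]
        subgroups[(up, down)].append((r_idx, off))

    consensi = {}
    for key, members in subgroups.items():
        if len(members) < min_reads:
            continue  # too few reads to out-vote sequencing errors
        columns = defaultdict(Counter)  # column = position relative to anchor start
        for r_idx, off in members:
            for i, base in enumerate(reads[r_idx]):
                columns[i - off][base] += 1
        consensi[key] = "".join(columns[c].most_common(1)[0][0] for c in sorted(columns))
    return consensi


# Hypothetical 40-bp region sampled by 20-bp reads at every offset.
region = "GATTACAGCTCGATCCGTAAGCTTGCATGCCTGCAGGTCG"
reads = [region[s:s + 20] for s in range(len(region) - 20 + 1)]
# Inject one sequencing error: read 11, base 18 (region position 29, C -> A).
reads[11] = reads[11][:18] + "A" + reads[11][19:]

index = build_anchor_index(reads)
anchor = region[14:26]
for (up, down), consensus in condense_anchor(reads, index[anchor]).items():
    print(f"shoulders {up}/{down}: {len(consensus)}-bp consensus {consensus}")
```

In this toy case, five reads share the anchor and its shoulders; their consensus spans 24 bp (1.2 times the 20-bp read length), and the single error in one read is out-voted by the two other reads covering that position.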

Because of the immense data-generating capacity of an NGS platform, clinical laboratories are unlikely to perform single-gene analysis on it; instead, the increased capacity can be used by raising the number of genes analyzed at a time. As the number of genes in a panel increases, the number of false positives identified will rise correspondingly, and clinical laboratories will have to manage a larger number of false-positive changes in order to avoid missing any real disease-causing mutations. As with any clinical test, changes identified on an NGS platform will need to be confirmed using an alternative technology, such as Sanger sequencing, and it is important that clinical laboratories perform such confirmation to establish the validity of NGS calls. We have identified three indicators that help determine whether a detected change is real: coverage above 20 reads; a confidence score of nine or above; and a proportion of bases that, for heterozygotes, can be skewed as far as 70% wild-type to 30% mutant and, for homozygotes, as far as 20% wild-type to 80% mutant.
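
The three indicators can be combined into a simple triage step that decides which calls look credible before Sanger confirmation. A sketch, assuming a minimum depth of 20 reads, the confidence threshold of nine given under Confidence score (skipped when a condensation call has no score), and the allele-fraction ranges from Proportion of bases; the `Call` record, the `triage` function, and the example calls are hypothetical. Passing triage does not replace confirmation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Call:
    """Hypothetical record for one NGS variant call."""
    name: str
    depth: int                   # reads covering the position
    alt_reads: int               # reads supporting the change
    confidence: Optional[float]  # Phred-like score; None for some condensation calls


def triage(call: Call, min_depth: int = 20, min_conf: float = 9.0,
           het=(0.30, 0.70), hom_min: float = 0.80) -> List[str]:
    """Return the indicators a call fails; an empty list means it looks credible."""
    problems = []
    if call.depth < min_depth:
        problems.append(f"depth {call.depth} < {min_depth}")
    if call.confidence is not None and call.confidence < min_conf:
        problems.append(f"confidence {call.confidence} < {min_conf}")
    frac = call.alt_reads / call.depth if call.depth else 0.0
    if not (het[0] <= frac <= het[1] or frac >= hom_min):
        problems.append(f"mutant fraction {frac:.2f} outside expected ranges")
    return problems


calls = [
    Call("hypothetical heterozygous call", depth=5869, alt_reads=2348, confidence=24.0),
    Call("hypothetical low-depth call", depth=12, alt_reads=6, confidence=15.0),
    Call("hypothetical condensation call", depth=179, alt_reads=90, confidence=None),
]
for c in calls:
    issues = triage(c)
    print(c.name, "->", "confirm by Sanger" if not issues else "review: " + "; ".join(issues))
```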

Cost considerations when implementing NGS in a clinical laboratory

The cost of implementing an NGS system in a laboratory is not confined to the cost of the instrument package provided by the manufacturer. Many pieces of ancillary equipment are required, and their availability will be critical to the success of the NGS setup in the laboratory. Equipment such as a powerful computer and secure data storage is required to handle the massive amounts of data. Cloud computing has emerged as an option as NGS has developed over the last few years; however, a clinical laboratory choosing this route will need to identify a secure, HIPAA-compliant cloud provider able to support clinical needs. While the cost of such a computer and storage cluster is reasonable, laboratories will need to budget additional funds to cover the purchase of such ancillary equipment.

Conclusions

In conclusion, we have demonstrated that NGS technology is ready to be deployed in clinical laboratories. NGS detected all 119 changes previously identified by Sanger sequencing (an analytical sensitivity of 100%), with an analytical specificity of 92.7% owing to nine false-positive calls. However, NGS and associated technologies are still in their infancy, and clinical laboratories will need to invest significantly in staff and infrastructure to build the necessary foundation for success. Many have suggested that targeted gene sequencing panels will become less important as the cost of NGS decreases, since there is no need to perform a targeted sequencing run when the same information can be extracted from a whole-exome or whole-genome data set. However, a recent study by Snyder et al. [27] suggests that, because of the size of the target being interrogated (exomes/genomes versus 2.4 Mbp), the lower depth of coverage reduces the sensitivity of variant detection. This lowers a clinical laboratory’s confidence that it has detected all pertinent variants in its target genes. As such, targeted gene sequencing panels will continue to play an important role in clinical sequencing until whole exomes and genomes can reach the same high, even coverage as a targeted sequencing panel.

References

Wang J, Wang W, Li R, Li Y, Tian G, Goodman L, Fan W, Zhang J, Li J, Guo Y: The diploid genome sequence of an Asian individual. Nature. 2008, 456: 60-65. 10.1038/nature07484.
Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, He W, Chen YJ, Makhijani V, Roth GT: The complete genome of an individual by massively parallel DNA sequencing. Nature. 2008, 452: 872-876. 10.1038/nature06884.
Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR: Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008, 456: 53-59. 10.1038/nature07517.
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W: Initial sequencing and analysis of the human genome. Nature. 2001, 409: 860-921. 10.1038/35057062.
Voelkerding KV, Dames SA, Durtschi JD: Next-generation sequencing: from basic research to diagnostics. Clin Chem. 2009, 55: 641-658. 10.1373/clinchem.2008.112789.
Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, Peluso P, Rank D, Baybayan P, Bettman B: Real-time DNA sequencing from single polymerase molecules. Science. 2009, 323: 133-138. 10.1126/science.1162986.
Rothberg JM, Hinz W, Rearick TM, Schultz J, Mileski W, Davey M, Leamon JH, Johnson K, Milgrew MJ, Edwards M: An integrated semiconductor device enabling non-optical genome sequencing. Nature. 2011, 475: 348-352. 10.1038/nature10242.
Sanger F, Nicklen S, Coulson AR: DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA. 1977, 74: 5463-5467. 10.1073/pnas.74.12.5463.
Oetting WS: Impact of next generation sequencing: the 2009 Human Genome Variation Society Scientific Meeting. Hum Mutat. 2010, 31: 500-503. 10.1002/humu.21210.
Saah AJ, Hoover DR: "Sensitivity" and "specificity" reconsidered: the meaning of these terms in analytical and diagnostic settings. Ann Intern Med. 1997, 126: 91-94.
Huang T: Next generation sequencing to characterize mitochondrial genomic DNA heteroplasmy. Curr Protoc Hum Genet. 2011, 71: 19.8.1-19.8.12.
Association for Molecular Pathology Clinical Practice Committee: Molecular Diagnostic Assay Validation. 2009
Koenig M, Hoffman EP, Bertelson CJ, Monaco AP, Feener C, Kunkel LM: Complete cloning of the Duchenne muscular dystrophy (DMD) cDNA and preliminary genomic organization of the DMD gene in normal and affected individuals. Cell. 1987, 50: 509-517. 10.1016/0092-8674(87)90504-6.
Mehler MF: Brain dystrophin, neurogenetics and mental retardation. Brain Res Brain Res Rev. 2000, 32: 277-307. 10.1016/S0165-0173(99)00090-9.
Horowitz M, Wilder S, Horowitz Z, Reiner O, Gelbart T, Beutler E: The human glucocerebrosidase gene and pseudogene: structure and evolution. Genomics. 1989, 4: 87-96. 10.1016/0888-7543(89)90319-4.
Martinez-Arias R, Calafell F, Mateu E, Comas D, Andres A, Bertranpetit J: Sequence variability of a human pseudogene. Genome Res. 2001, 11: 1071-1085. 10.1101/gr.GR-1677RR.
Kitao S, Lindor NM, Shiratori M, Furuichi Y, Shimamoto A: Rothmund-Thomson syndrome responsible gene, RECQL4: genomic structure and products. Genomics. 1999, 61: 268-276. 10.1006/geno.1999.5959.
Kitao S, Shimamoto A, Goto M, Miller RW, Smithson WA, Lindor NM, Furuichi Y: Mutations in RECQL4 cause a subset of cases of Rothmund-Thomson syndrome. Nat Genet. 1999, 22: 82-84. 10.1038/8788.
Hjelm LN, Chin EL, Hegde MR, Coffee BW, Bean LJ: A simple method to confirm and size deletion, duplication, and insertion mutations detected by sequence analysis. J Mol Diagn. 2010, 12: 607-610. 10.2353/jmoldx.2010.100011.
Jones MA, Bhide S, Chin E, Ng BG, Rhodenizer D, Zhang VW, Sun JJ, Tanner A, Freeze HH, Hegde MR: Targeted polymerase chain reaction-based enrichment and next generation sequencing for diagnostic testing of congenital disorders of glycosylation. Genet Med. 2011, 13: 921-932. 10.1097/GIM.0b013e318226fbf2.
Smith DR, Quinlan AR, Peckham HE, Makowsky K, Tao W, Woolf B, Shen L, Donahue WF, Tusneem N, Stromberg MP: Rapid whole-genome mutational profiling using next-generation sequencing technologies. Genome Res. 2008, 18 (10): 1638-1642. 10.1101/gr.077776.108.
Besaratinia A, Li H, Yoon JI, Zheng A, Gao H, Tommasi S: A high-throughput next-generation sequencing-based method for detecting the mutational fingerprint of carcinogens. Nucleic Acids Res. 2012, 40 (15): e116-10.1093/nar/gks610.
Mokry M, Nijman IJ, van Dijken A, Benjamins R, Heidstra R, Scheres B, Cuppen E: Identification of factors required for meristem function in Arabidopsis using a novel next generation sequencing fast forward genetics approach. BMC Genomics. 2011, 12: 256-10.1186/1471-2164-12-256.
De Leeneer K, De Schrijver J, Clement L, Baetens M, Lefever S, De Keulenaer S, Van Criekinge W, Deforce D, Van Nieuwerburgh F, Bekaert S: Practical tools to implement massive parallel pyrosequencing of PCR products in next generation molecular diagnostics. PLoS One. 2011, 6 (9): e25531-10.1371/journal.pone.0025531.
Dohm JC, Lottaz C, Borodina T, Himmelbauer H: Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res. 2008, 36 (16): e105-10.1093/nar/gkn425.
Reducing Error in Next Generation Sequencing Data with NextGENe™ Software’s Condensation Tool™. Application Note. [http://www.softgenetics.com/ReducingError_NextGenerationSequencing_AppNote.pdf]
Clark MJ, Chen R, Lam HY, Karczewski KJ, Euskirchen G, Butte AJ, Snyder M: Performance comparison of exome DNA sequencing technologies. Nat Biotechnol. 2011, 29: 908-914. 10.1038/nbt.1975.

Acknowledgements

Supported by grants from NIH RC1NS 069541–01 and MDA G6396330.

Author information

Authors and Affiliations

Department of Human Genetics, Emory University, Michael Street, Atlanta, GA, USA
Ephrem LH Chin, Cristina da Silva & Madhuri Hegde

Corresponding author

Correspondence to Madhuri Hegde.

Additional information

Competing interests

The authors declare no competing interests.

Authors’ contributions

EC participated in the design and coordination of the work and in drafting the manuscript. CDS participated in data analysis and in drafting the manuscript. MRH conceived the study, participated in its design and helped to draft the manuscript. All authors read and approved the final manuscript.

Electronic supplementary material

Additional file 1: Table of primers used in the amplification of the 20 validation samples (DOC 551 KB).

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Chin, E.L., da Silva, C. & Hegde, M. Assessment of clinical analytical sensitivity and specificity of next-generation sequencing for detection of simple and complex mutations. BMC Genet 14, 6 (2013). https://doi.org/10.1186/1471-2156-14-6

Received: 01 February 2012
Accepted: 08 February 2013
Published: 19 February 2013
DOI: https://doi.org/10.1186/1471-2156-14-6
