Blood samples
Samples evaluated in this study were part of ongoing efforts to monitor Plasmodium species infections in the Wosera, East Sepik Province, Papua New Guinea (PNG) [22, 23]. This region of northern, lowland PNG is holoendemic for malaria (parasite rate in one-year-old children > 0.75 [24–26]), and all four human malaria parasite species, P. falciparum, P. vivax, P. malariae and P. ovale, are observed. Informed consent was obtained from all study participants. This study was approved by the Medical Research Advisory Committee of PNG and by the Institutional Review Board for Human Investigation at University Hospitals of Cleveland, Ohio.
P. falciparum (Pf) laboratory strains
Pf laboratory-adapted strains obtained from the Malaria Research and Reference Reagent Resource (MR4, ATCC, Manassas, Virginia) included 3D7 (MRA-102), HB3 (MRA-155), K1 (MRA-159) and VS/1 (MRA-176). In vitro growth of Pf was performed as described previously [27]. Thin blood smears were fixed with 100% methanol for 30 seconds, stained with 4% Giemsa for 30 minutes, and examined by microscopy with an oil-immersion objective (100×). Parasitemia was calculated as the number of infected red blood cells (IRBCs) divided by the total number of red blood cells counted (infected plus uninfected RBCs; n = 1,000).
DNA template preparation
DNA was extracted from whole blood field samples (200 μL) using the QIAamp 96 DNA Blood Kit (Qiagen, Valencia, CA). Genomic DNA was extracted from cultures (200 μL) of laboratory-adapted Pf strains using the QIAamp DNA Blood Mini Kit (Qiagen, Valencia, CA).
PCR amplification of Pf SNPs
All reactions (25 μL) were performed in a buffer containing 3 pmol of the appropriate upstream and downstream primers, 67 mM Tris-HCl, pH 8.8, 6.7 mM MgSO4, 16.6 mM (NH4)2SO4, 10 mM 2-mercaptoethanol, 100 μM dATP, dGTP, dCTP, and dTTP, and 2.5 units of thermostable DNA polymerase. Amplification reactions were performed in a Peltier Thermal Cycler, PTC-225 (MJ Research, Watertown, MA). Specific primers and thermocycling conditions used to amplify the Pf dihydrofolate reductase (dhfr) target sequence for evaluating polymorphisms associated with pyrimethamine resistance were described in Carnevale et al. [3]. Additional, presumably neutral SNPs, chosen irrespective of location or function, were used to assess the sensitivity of the histogram segmentation analysis and to examine the variability of the diagnostic thresholds. These SNPs were amplified using the primers and conditions listed in Additional File 1: Primers and PCR amplification conditions for additional SNPs. Following PCR amplification, products were loaded on 2% agarose gels (Amresco, Solon, OH), and electrophoresis was performed in 1× TBE buffer (8.9 mM Tris, 8.9 mM boric acid, 2.0 mM EDTA). The gels were stained for 30 min with SYBR Gold (Molecular Probes, Eugene, OR), diluted 1:10,000 in 1× TBE buffer, and DNA products were visualized on a Storm 860 using ImageQuant software (Molecular Dynamics, Sunnyvale, CA).
LDR-FMA evaluation of Pf SNPs
The methods and strategies used to perform the LDR-FMA evaluation of Pf mutations in the dhfr gene have been previously described in detail [3]. Primers for the LDR-FMA diagnosis of the additional, presumably neutral SNPs chr1SNP, chr7SNP, chr8SNP, chr9SNP, and chr13SNP are listed in Additional File 4: Ligase detection reaction primers for genotyping additional SNPs. The following brief description and summary in Figure 6 provide an overview of the three-step, post-PCR, LDR-FMA procedure.
Following PCR amplification of the gene-specific target sequences carrying the locus of interest, products were combined into a multiplex LDR (Step #1) where allele-specific upstream primers ligate to conserved sequence downstream primers. Upstream, allele-specific primers include 5' extensions of unique "TAG" sequences. Downstream, conserved sequence primers are modified by 5' phosphorylation and 3' biotinylation. The 5' ends of the LDR products receive "classification" labeling in a second multiplex (Step #2) reaction where hybridization occurs between the TAG sequences added to the allele-specific primers and anti-TAG (complementary sequence) oligonucleotide probes bound to fluorescent microspheres. Following this hybridization reaction, products are incubated (Step #3) in a solution containing streptavidin-R-phycoerythrin (SA-PE) to allow "reporter" labeling through binding to the 3'-biotin on the conserved sequence primers. Detection of doubly labeled ligation products occurs through dual fluorescence flow cytometry in the BioPlex array reader (Bio-Rad Laboratories, Hercules, CA) and leads to collection of "reporter" signal in unique allele-specific bins. Anti-TAG oligonucleotide probes bound to fluorescent microspheres (2.5 × 10^5 beads/mL/US$25) are available from Luminex Corporation (Austin, TX).
Specific LDR primers/probes used for the dhfr locus have been previously described [3] and primers/probes for additional SNPs utilized here are listed in Additional Table 2. Individual reactions were performed in a solution (15 μL) containing 20 mM Tris-HCl buffer, pH 7.6, 25 mM potassium acetate, 10 mM magnesium acetate, 1 mM NAD+, 10 mM dithiothreitol, 0.1% Triton X-100, 10 nM (200 fmol) of each LDR probe, 1 μL of each PCR product, and 2 units of Taq DNA ligase (New England Biolabs, Beverly, MA). Reactions were initially heated at 95°C for one minute, followed by 32 thermal cycles at 95°C for 15 seconds (denaturation) and 58.0°C for 2 minutes (annealing/ligation). The multiplex LDR product (5 μL) was then added to 60 μL of hybridization solution (3 M tetramethylammonium chloride [TMAC], 50 mM Tris-HCl, pH 8.0, 3 mM EDTA, pH 8.0, 0.10% sodium dodecyl sulfate) containing 250 Luminex FlexMAP microspheres from each allelic set (total number of alleles = 9). Mixtures were heated to 95°C for 90 seconds and incubated at 37°C for 40 minutes to allow hybridization between SNP-specific LDR products and bead-labeled anti-TAG probes. Following hybridization, 6 μL of streptavidin-R-phycoerythrin (Molecular Probes, Eugene, OR) in TMAC hybridization solution (20 ng/μL) was added to the post-LDR mixture and incubated at 37°C for 40 minutes in Costar-6511M polycarbonate 96-well V-bottom plates (Corning Inc., Corning, NY). Detection of SNP-specific LDR:microsphere-labeled anti-TAG hybrid complexes was performed using a BioPlex array reader (Bio-Rad Laboratories, Hercules, CA); the plate temperature was set to 37°C throughout detection. Fluorescence data, reported as median fluorescence intensity (MFI; range 0 to 25,000), were collected using Bio-Rad software, BioPlex Manager 3.0 (Bio-Rad Laboratories, Hercules, CA).
As a first step in evaluating background signals generated by our LDR-FMA diagnostic assays, we analyzed 70 samples from randomly selected American Red Cross blood donors who had no history of malaria exposure. The MFI signals from these samples were normally distributed. We therefore considered the conventional threshold between negative and positive fluorescence (the mean plus three standard deviations of the negative-control signal) an appropriate comparator for our polar transformation method (Table 2).
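The conventional threshold mentioned above can be illustrated with a minimal Python sketch. The function name, the use of the sample (n - 1) standard deviation, and the blank readings are assumptions made for illustration; they are not values or code from the study.

```python
import statistics

def mean_plus_k_sd_threshold(blank_mfi, k=3.0):
    """Return mean + k * SD of negative-control MFI values.

    Assumption: the sample (n - 1) standard deviation is used.
    """
    mean = statistics.mean(blank_mfi)
    sd = statistics.stdev(blank_mfi)
    return mean + k * sd

# Hypothetical blank readings in MFI units (not data from the study).
blanks = [180.0, 220.0, 200.0, 190.0, 210.0]
threshold = mean_plus_k_sd_threshold(blanks)

# A sample is called positive when its MFI exceeds the threshold.
calls = {mfi: mfi > threshold for mfi in (150.0, 240.0, 900.0)}
```

With these made-up readings the threshold falls near 247 MFI, so only the 900 MFI sample would be called positive.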
Statistical analyses and graphing
All statistical analyses were performed using MATLAB version 7.7 (R2008a or b) (MathWorks Inc., Boston, MA). After transforming a set of N bivariate (x,y) fluorescence values into polar (r,θ) coordinates and forming the histogram as described above, the "first minimum after the initial maximum" in the magnitude variable was found by applying three criteria. Let n(r) denote the number of counts in the histogram centered at magnitude r and let Δr denote the bin width. (1) The histogram count at the minimum should be less than the counts on either side, or possibly equal to the number on the right, i.e. n(r-Δr) > n(r) ≤ n(r + Δr). (2) The location of the minimum should exceed the location of the first maximum. The first maximum is required to be a local maximum and to exceed a minimum count requirement, to avoid finding spurious local max/min combinations before the true first population maximum. We used bins of size Δr = 100 MFI units, approximately equal to a single standard deviation of the MFI distribution for uninfected blank samples. We find the following heuristic to be robust in practice: assuming at least 10% of the population falls in the "uninfected" class, we expect approximately 50% of that number to fall within one standard deviation of the uninfected population mean. Therefore we use a threshold for the bin count at the "first maximum" of max(8,N/20). (3) We impose a minimum value for the relevant "first minimum after the initial maximum." Assuming bin widths and standard deviation of MFI blank signals of 100 each, we set the bin occupancy threshold equal to twice the number of counts expected to fall between two and three standard deviations from the blank mean, if all N samples were actually uninfected. That is, we require
where erf denotes the standard error function.
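The search for the "first minimum after the initial maximum" can be sketched in Python as follows. This is an illustration under stated assumptions, not the MATLAB code used in the study. The bin-occupancy limit is written here as 2N[Φ(3) - Φ(2)], with Φ the standard normal cumulative distribution expressed through erf; this one-sided form, the direction of the inequality (the count at the minimum must not exceed the limit), the treatment of bins beyond the histogram as empty, and the choice to report the cutoff as a bin centre are all assumptions. The function names and the example magnitudes are hypothetical.

```python
import math

def to_polar(x, y):
    """Convert a bivariate (x, y) fluorescence pair to (r, theta)."""
    return math.hypot(x, y), math.atan2(y, x)

def bin_counts(values, width):
    """Histogram counts for bins [k*width, (k+1)*width)."""
    counts = [0] * (int(max(values) // width) + 1)
    for v in values:
        counts[int(v // width)] += 1
    return counts

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def magnitude_cutoff(r_values, width=100.0):
    """Return the centre of the first qualifying minimum, or None."""
    n = len(r_values)
    padded = [0] + bin_counts(r_values, width) + [0]  # empty bins at both ends
    min_peak = max(8, n / 20)                          # criterion (2)
    # Criterion (3): ASSUMED form of the occupancy limit (see lead-in).
    max_valley = 2 * n * (normal_cdf(3) - normal_cdf(2))
    peak_found = False
    for i in range(1, len(padded) - 1):
        left, here, right = padded[i - 1], padded[i], padded[i + 1]
        if not peak_found:
            # First local maximum that meets the minimum count requirement.
            peak_found = here >= min_peak and here >= left and here >= right
        elif left > here <= right and here <= max_valley:  # criteria (1), (3)
            return (i - 1 + 0.5) * width
    return None

# Hypothetical magnitudes: an uninfected cluster and two infected clusters.
r_values = [50] * 3 + [150] * 12 + [250] * 4 + [350] + [5050] * 10 + [8050] * 10
r_star = magnitude_cutoff(r_values)
r_example, theta_example = to_polar(3.0, 4.0)
```

For these made-up values the first maximum is the 100 to 200 MFI bin and the cutoff lands in the first empty bin to its right, centred at 450 MFI.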
After obtaining a magnitude cutoff (r*) for distinguishing infected from uninfected samples, we produce a histogram of the angles θ for all N_inf ≤ N "infected" samples, i.e. those with r > r*, in 45 bins with a width of 2 degrees ranging from zero to π/2 (ninety degrees). Here let n(θ) denote the number of counts in the histogram centered at angle θ and let Δθ denote the bin width for the angle histogram. Again we apply three criteria to find the wild-type/mixed cutoff (θ_lo, close to θ = 0) and the mutant/mixed cutoff (θ_hi, close to θ = π/2). First we find the largest count in the range 0 to π/4 (0 to 45 degrees) or π/4 to π/2 (45 to 90 degrees), respectively. (1) Given maximum counts at θ = θ_0 (the local maximum close to θ = 0) and θ = θ_90 (the local maximum close to θ = π/2) we require θ_0 < θ_lo and θ_hi < θ_90. (2) We require that the diagnosis thresholds be placed at local minima of the histogram, i.e. n(θ_lo - Δθ) > n(θ_lo) ≤ n(θ_lo + Δθ) and n(θ_hi - Δθ) ≥ n(θ_hi) < n(θ_hi + Δθ). (3) In order to be an appropriate local minimum, the count at θ_lo or θ_hi should not exceed the expected number of counts for the uniform distribution of N_inf samples over 45 bins, or N_inf/45.
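The three angle criteria can be sketched in Python as below. This is a minimal illustration, not the authors' implementation. The text does not state which qualifying minimum is chosen when several exist, so the sketch assumes the one nearest each peak (scanning upward from the peak near 0 and downward from the peak near π/2). The handling of the central bin, the bin-centre convention, the function names, and the example angles are likewise assumptions.

```python
import math

def angle_cutoffs(thetas, n_bins=45):
    """Return (theta_lo, theta_hi) in radians; either may be None."""
    width = (math.pi / 2) / n_bins                 # 2 degrees for 45 bins
    counts = [0] * n_bins
    for t in thetas:
        counts[min(int(t / width), n_bins - 1)] += 1
    limit = len(thetas) / n_bins                   # criterion (3): N_inf / 45
    mid = n_bins // 2                              # bin that contains pi/4
    peak0 = max(range(0, mid + 1), key=counts.__getitem__)
    peak90 = max(range(mid, n_bins), key=counts.__getitem__)
    theta_lo = theta_hi = None
    # Criterion (1): search above the low peak and below the high peak.
    for k in range(peak0 + 1, n_bins - 1):
        if counts[k - 1] > counts[k] <= counts[k + 1] and counts[k] <= limit:
            theta_lo = (k + 0.5) * width           # bin centre (assumption)
            break
    for k in range(peak90 - 1, 0, -1):
        if counts[k - 1] >= counts[k] < counts[k + 1] and counts[k] <= limit:
            theta_hi = (k + 0.5) * width
            break
    return theta_lo, theta_hi

def genotype(theta, theta_lo, theta_hi):
    """Classify an infected sample by its angle."""
    if theta < theta_lo:
        return "wild-type"
    if theta > theta_hi:
        return "mutant"
    return "mixed"

# Hypothetical angles in degrees (count per angle), not data from the study.
spec = {3: 20, 5: 5, 7: 1, 43: 2, 45: 6, 47: 2, 85: 4, 87: 15}
thetas = [math.radians(d) for d, c in spec.items() for _ in range(c)]
theta_lo, theta_hi = angle_cutoffs(thetas)
```

For these made-up angles the cutoffs fall at 9 and 83 degrees, so a sample at 45 degrees would be classified as mixed.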
Confidence Intervals
As described in the Results section, we estimated the probability of classification error via bootstrap resampling with replacement to establish percentage confidence interval distributions for the three thresholds (uninfected vs. infected; X+/Y- vs. X+/Y+; X-/Y+ vs. X+/Y+) [12]. With each resampled data set, we used the histogram method to determine thresholds. After running the bootstrap 1,000 to 100,000 times, we determined (1 - α) percentile confidence intervals that were placed at the α/2 and 1 - α/2 quantiles of the bootstrap estimated thresholds (Figure 4C). We used these confidence intervals to obtain upper bounds on the probability that the classification of a randomly chosen member of the population might change upon resampling as follows. Let ρ(r,θ) denote the density of signal throughout the polar plane, and let f(r,θ) denote the classification confidence associated with any given point on the polar plane. We would like to approximate the following integral in order to bound the total probability of error by our thresholds (that is, the probability of misclassifying a sample):
for θ- = min(θ) and θ+ = max(θ) (respectively for r- and r+). We can rewrite this integral as:
where x_i = (r_i, θ_i) and N is our total sample size. If we choose a subset of confidences, such as {α_1, α_2, ..., α_n} for α_j ∈ (0,1), then we can obtain an upper bound on the probability of error as the sum of the desired confidences, α_j, multiplied by the fraction of the total sample size that lies inside those α_j confidence intervals. (For example, a value of α = 0.05 corresponds to a 95% confidence.) From these estimates, we obtain a lower bound on the overall confidence in our classifications (see Table 1).
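The bootstrap percentile interval and the resulting error bound can be sketched in Python for a single threshold. Several pieces here are assumptions made for illustration: the threshold estimator (midpoint of the largest gap between sorted values) is a simple stand-in for the histogram method, the quantiles use linear interpolation, and the bound follows a literal reading of the sentence above (each α_j multiplied by the fraction of samples inside the corresponding interval, then summed). The function names and data are hypothetical.

```python
import random

def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an ascending list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def percentile_interval(estimates, alpha):
    """(1 - alpha) interval at the alpha/2 and 1 - alpha/2 quantiles."""
    s = sorted(estimates)
    return quantile(s, alpha / 2), quantile(s, 1 - alpha / 2)

def largest_gap_midpoint(values):
    """Stand-in threshold estimator (NOT the histogram method)."""
    s = sorted(values)
    _, i = max((s[k + 1] - s[k], k) for k in range(len(s) - 1))
    return (s[i] + s[i + 1]) / 2

def bootstrap_estimates(values, estimator, n_boot=1000, seed=0):
    """Re-estimate the threshold on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    return [estimator(rng.choices(values, k=len(values))) for _ in range(n_boot)]

def error_upper_bound(values, estimates, alphas):
    """Sum over alphas of alpha * fraction of samples inside that interval."""
    bound = 0.0
    for alpha in alphas:
        lo, hi = percentile_interval(estimates, alpha)
        inside = sum(lo <= v <= hi for v in values)
        bound += alpha * inside / len(values)
    return bound

# Hypothetical magnitudes: two well-separated clusters of 20 samples each.
data = [100 + 10 * k for k in range(20)] + [5000 + 10 * k for k in range(20)]
estimates = bootstrap_estimates(data, largest_gap_midpoint)
ci_low, ci_high = percentile_interval(estimates, alpha=0.05)
bound = error_upper_bound(data, estimates, alphas=[0.05, 0.01])
```

Because no sample in this made-up data set lies inside the threshold intervals, the bound is zero; samples that fall close to a threshold would raise it.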