
Developing molecular tools and insights into the Penstemon genome using genomic reduction and next-generation sequencing



Penstemon’s unique phenotypic diversity, hardiness, and drought-tolerance give it great potential for the xeric landscaping industry. Molecular markers will accelerate the breeding and domestication of drought tolerant Penstemon cultivars by enabling the creation of genetic maps and clarifying phylogenetic relationships. Our objectives were to identify and validate interspecific molecular markers from four diverse Penstemon species in order to gain specific insights into the Penstemon genome.


We used 454 pyrosequencing and GR-RSC (genome reduction using restriction site conservation) to identify homologous loci across four Penstemon species (P. cyananthus, P. davidsonii, P. dissectus, and P. fruticosus) representing three diverse subgenera with considerable genome size variation. From these genomic data, we identified 133 unique interspecific markers containing SSRs and INDELs, of which 51 produced viable PCR-based markers. These markers produced simple banding patterns in 90% of the species × marker interactions (~84% were polymorphic). Twelve of the markers were tested across 93, mostly xeric, Penstemon taxa (72 species), of which ~98% produced reproducible marker data. Additionally, we identified an average of one SNP per 2,890 bp per species and one per 97 bp between any two apparent homologous sequences from the four source species. We selected 192 homologous sequences, meeting stringent parameters, to create SNP markers. Of these, 75 demonstrated repeatable polymorphic marker functionality across the four sequence source species. Finally, sequence analysis indicated that repetitive elements were approximately 70% more prevalent in the P. cyananthus genome, the largest genome in the study, than in the smallest genome surveyed (P. dissectus).


We demonstrated the utility of GR-RSC to identify homologous loci across related Penstemon taxa. Though PCR primer regions were conserved across a broadly sampled survey of Penstemon species (93 taxa), DNA sequence within these amplicons (12 SSR/INDEL markers) was highly diverse. With the continued decline in next-generation sequencing costs, it will soon be feasible to use genomic reduction techniques to simultaneously sequence thousands of homologous loci across dozens of Penstemon species. Such efforts will greatly facilitate our understanding of the phylogenetic structure within this important drought tolerant genus. In the interim, this study identified thousands of SNPs and over 50 SSRs/INDELs which should provide a foundation for future Penstemon phylogenetic studies and breeding efforts.


Interest is increasing in drought tolerant landscape plants due to water shortages experienced by many municipalities, especially in the Southwestern US [1, 2]. However, the increased use of drought tolerant species also carries concerns regarding the introduction of non-native and potentially invasive species [3, 4]. One way to address both issues is to landscape with native xeric flora [3]. Penstemon Mitchell (Plantaginaceae) has excellent potential for xeric landscapes and some Penstemon cultivars, adapted to mild climates, are already used throughout Europe as landscape plants [5–10]. Despite its potential, few Penstemon cultivars are used in xeric landscapes and there has been little to no drought or cold tolerant cultivar development for such landscapes [6–8, 10–12]. Penstemon, with over 270 species, is one of the largest and most diverse plant genera of those that are strictly indigenous to North and Central America. This genus features a deep diversity in morphology, including a broad assortment of colors, flowers, and leaf structures. Penstemon’s putative center of origin is the arid Intermountain West of the United States [13, 14], and the genus has frequently been discussed as an untapped resource for xeric landscape cultivar development [5–7, 9–11, 15–17]. Because domestication and cultivar development of any species is slow, costly, and time consuming, few in the landscape industry have invested in native species breeding. However, given the recent and dramatic decrease in costs and relative ease of genotyping, we anticipate the wider utilization of marker assisted selection to accelerate breeding programs of native species, including drought tolerant Penstemon [18–20].

PCR-based markers are now essential tools to facilitate plant domestication, plant breeding, germplasm conservation, phylogenetics, and genetic mapping studies [19–22]. Not surprisingly, little molecular or traditional genetic work has been reported for Penstemon [23]. To achieve broad resolution of the genome with three of the most efficient markers, SSRs (simple sequence repeats or microsatellites), INDELs (insertions/deletions), and SNPs (single nucleotide polymorphisms), vast amounts of DNA sequence are needed, particularly for SNPs where sufficient read depth is needed to distinguish true polymorphisms from sequence noise [24–26]. With the development of next-generation sequencing (e.g., Roche 454-pyrosequencing) the cost of high-throughput marker discovery has been dramatically reduced [18]. Additionally, Maughan et al. [25] described a simple genome reduction method, known as GR-RSC (genome reduction using restriction site conservation), which reduces the genome by > 90%, thereby making it feasible to redundantly sequence the remaining genome with next-generation sequencing technologies. This process is repeated across multiple cultivars or species, with comparisons identifying many inter- and intraspecific homologous loci. Genomic reduction techniques consistently identify homologous loci between related species [20, 27], and GR-RSC has enabled the identification and development of interspecific homologous SNPs [20].

We utilized GR-RSC to identify homologous sequences in four diploid (2n = 2x = 16) Penstemon species chosen to represent a range of taxonomic and genome size diversity [5, 14]. Included in our analysis are two closely related species from the subgenus Dasanthera (P. davidsonii Greene and P. fruticosus (Pursh) Greene var. fruticosus), one from the subgenus Habroanthus (P. cyananthus Hook. var. cyananthus), and one (P. dissectus Elliot) from the monophyletic subgenus Dissecti, which is phenotypically divergent from all other Penstemon species. This experimental design allowed us to make broad inter- and intra-subgenera comparisons in Penstemon. The objectives of our study were three-fold: First, identify homologous SSR and INDEL markers from the four diverse species and test their conservation across 93, mostly xerophilic, Penstemon taxa. Second, identify conserved homologous sequences for SNPs for use in future interspecific studies. Third, assess observed variation in the GR-RSC sequences to gain insights into the Penstemon genome and possible reasons for the large size variation previously identified among the diploid taxa [5].


Plant material and DNA extraction

DNA from P. cyananthus, P. davidsonii, P. dissectus, and P. fruticosus leaf tissue was extracted using the CTAB purification method [28] with modifications [29] for the GR-RSC technique. The source localities and identification of these plants have been reported previously [5]. A single sample from each species with the highest DNA quality and concentration, as determined using an ND-1000 spectrophotometer (NanoDrop Technologies Inc., Montchanin, DE), was selected to provide the 500 ng of DNA necessary for the genome reduction protocol.

For the molecular marker experiments, we used 93 Penstemon taxa. Leaf tissue was collected mostly from wild populations in the United States Intermountain West (Table 1). Each field-collected sample was identified to species and (or) variety using taxonomic keys specific to the area [30, 31]. We extracted DNA using Qiagen DNeasy Plant Mini Kit (Qiagen Inc., Valencia, CA), and concentrations were diluted to 25–35 ng/μL.

Table 1 Penstemon taxa (with collection counties) utilized in the 12 marker analysis with respective marker sizes

Genome reduction, barcode addition and 454 pyrosequencing

Genome reduction followed Maughan et al. [25]. Briefly, for each sample, EcoRI and BfaI were used for the initial restriction digest, after which a biotin-labeled adapter was ligated to the EcoRI restriction site and a non-labeled adapter was ligated independently to the BfaI restriction site. Next, a non-labeled size exclusion step using Chroma Spin + TE-400 columns (Clontech Laboratories, Inc., Mountain View, CA) and magnetic biotin-streptavidin separation (Dynabeads M-280 Streptavidin, Invitrogen Life Science Corporation, Carlsbad, CA) was performed. Unique multiplex identifier (MID) barcodes were added independently to each species using primers complementary to the adapter and cut sites (Table 2). Preliminary amplification was performed using 95°C for 1 min, followed by 22 cycles of 95°C for 15 s, 65°C for 30 s, and 68°C for 2 min. PCR products were loaded into a 1.2% agarose Flashgel DNA Cassette (Lonza Corporation; Rockland, ME) to verify smearing and adequate amplification in preparation for pyrosequencing.

Table 2 The four multiplex identifier (MID) barcode (adapter) primers used for the genomes of Penstemon cyananthus, P. dissectus, P. davidsonii, and P. fruticosus

After the initial PCR, concentrations of each of the four species samples were determined fluorometrically using PicoGreen® dye (Invitrogen, Carlsbad, CA). Samples were then pooled using approximately equal molar concentrations of each species except for P. cyananthus (genome size = 1C = 893 Mbp), where the molar concentration was doubled to maintain a similar genomic representation compared to the other three species with smaller genome sizes (P. dissectus, 1C = 462 Mbp; P. davidsonii, 1C = 483 Mbp; and P. fruticosus, 1C = 476 Mbp; [5]). DNA fragments between 500–600 bp were selected following Maughan et al. [25]. Sequencing was performed by the Brigham Young University DNA Sequencing Center (Provo, UT) using a half 454-pyrosequencing plate, a Roche-454 GS FLX instrument, and Titanium reagents (Branford, CT).

Sequence assembly

Sequence data were sorted by species using their unique MID species barcode (Table 2) by means of the software package CLC Bio Workbench (v. 2.6.1; Katrinebjerg, Aarhus N, Denmark). Following sorting (Table 2), assemblies were performed using Roche’s de novo assembler, Newbler (v. 2.6), which yields consensus sequences (contigs) of all individual reads, from each independent species, for use in subsequent analyses.

A full assembly (all individual reads of all four species pooled together) was performed by Newbler with “complex genome” parameter set and a trim file with MID barcodes specified; all other parameters were left to their defaults. For all subsequent species assemblies (all individual reads of one species), these same parameters were used with a few added conservative options selected: an expected depth of ‘10’ (20 default), a minimum overlap length of ‘50’ (40 default), and a minimum overlap identity of 95% (90% default).

Repeat element identification

Assembled sequences from all four species were masked for possible genome wide repetitive elements using a combination of RepeatModeler and RepeatMasker [32]. RepeatModeler is a de novo repeat element family identification and modeling algorithm that implements RECON [33] and RepeatScout [34]. RepeatModeler scanned all contigs from the four Penstemon species assemblies and produced a library of predicted repeat element models. Using this reference library, RepeatMasker then scanned the four species to filter out repetitive elements. Singletons were omitted from the analysis. To assess possible repetitive element biases with RepeatMasker when implementing a de novo library from RepeatModeler, we analyzed the GR-RSC data from the Arabidopsis RILs (recombinant inbred lines) Ler-O and Col-4 from the study of Maughan et al. [35] and compared them to the Arabidopsis non-reduced genome downloaded from TAIR (The Arabidopsis Information Resource) [36].

Marker development, verification, and use

To identify SSRs and INDELs, and to identify SNPs, we used the software MISA and SNP_Finder_Plus (a custom Perl script), respectively [25, 37, 38]. RepeatMasker was used to identify and mask transposable elements. MISA parameters were set as follows: di-nucleotide motifs had a minimum of eight repeats, tri-nucleotide motifs had a minimum of six repeats, tetra-nucleotide motifs had a minimum of five repeats, and 100 bp was set as the interruption (max difference between two purported SSR alleles). For the comparison of SSR frequency and repeat motifs across species, “unmasked” assembly files were used to remove bias caused by masking low complexity reads. The following parameters were used to define the heuristic thresholds for SNP_Finder_Plus: 8× minimum read depth for the SNP, 30% proportion of the reads representing the minor allele and 90% identity (an indication of homozygosity within a single species used in a dual-species assembly) required for each SNP locus. These parameters also helped compensate for sequencing and assembly errors, which allows greater confidence in calling base pair discrepancies as actual SNPs in the dual-species assemblies and the confident identification of heterozygosity in the individual assemblies. For both individual assemblies and dual-species assemblies, the SNPs reported are those conforming to the aforementioned parameters.

All genomic sequences matching the above criteria were used for marker development. Primer3 v2.0 [39] was used to identify primers for amplifying these markers, with the following parameters: optimal primer size = 20 (range = 18–27); product size range = 100–500 bp; Tm range = 50–60°C with 55°C optimum; and maximum polynucleotide = 3. Allowing PCR products greater than 200 bp greatly increased the possibility of INDELs in the PCR products.
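As a rough illustration, the primer design settings listed above could be written as a Primer3 Boulder-IO record. The helper `primer3_record` is hypothetical, and mapping "maximum polynucleotide = 3" to the `PRIMER_MAX_POLY_X` tag is an assumption; the exact input used in the study is not given.

```python
def primer3_record(seq_id, template):
    """Render one Boulder-IO record using the primer settings from the text."""
    settings = [
        ("SEQUENCE_ID", seq_id),
        ("SEQUENCE_TEMPLATE", template),
        ("PRIMER_OPT_SIZE", 20),
        ("PRIMER_MIN_SIZE", 18),
        ("PRIMER_MAX_SIZE", 27),
        ("PRIMER_PRODUCT_SIZE_RANGE", "100-500"),
        ("PRIMER_MIN_TM", 50.0),
        ("PRIMER_OPT_TM", 55.0),
        ("PRIMER_MAX_TM", 60.0),
        # Assumed mapping for "maximum polynucleotide = 3".
        ("PRIMER_MAX_POLY_X", 3),
    ]
    lines = ["%s=%s" % (tag, value) for tag, value in settings]
    lines.append("=")  # a lone "=" terminates a Boulder-IO record
    return "\n".join(lines) + "\n"


# Placeholder ID and sequence, invented for illustration.
print(primer3_record("example_contig", "ACGTACGTACGT"))
```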

The PCR (SSR/INDEL) markers were validated using the original four species as template DNA. Each 10 μl PCR reaction had ~ 30 ng genomic DNA, 0.05 mM dNTPs, 0.1 mM cresol red, 1.0 μl of 10X PCR buffer (Sigma-Aldrich, St. Louis, MO), 0.5 units of JumpStart™ Taq DNA Polymerase (Sigma-Aldrich, St. Louis, MO) and 0.5 μM (each) of the forward and reverse primers. The thermal cycler (Mastercycler® Pro; Eppendorf International; Hamburg, Germany) was set as follows: 94°C for 30 s, 45 cycles of 92°C for 20 s, (primer specific annealing temperature)°C for 1 min. 30 s, 72°C for 2 min., and 72°C for 7 min. (final extension). Following PCR reactions, DNA was loaded into 3% Metaphor® agarose (Lonza Corporation; Rockland, ME) gels and run using a gel electrophoresis box at 100 V for 2 h. Optimal annealing temperatures for each SSR/INDEL marker were selected based on clarity of bands produced over varying annealing temperatures. Only SSR/INDEL markers with one or two reproducible bands are reported in the marker studies (Tables 1 and 3). The same conditions used for marker validation were used in the SSR/INDEL marker studies, except gel electrophoresis times were increased to 4 h at 100 V.

Table 3 Summary of marker characteristics including the primary SSR motif identified in the original GR-RSC (genome reduction using restriction site conservation) sequence, primer sequences, EFL (expected fragment length), total bands, and fragment sizes

The gels were evaluated and scored as: 1 = marker present; 0 = marker absent based upon molecular weight. The results were then analyzed to assess the strength of hierarchical signal in these data using 10,000 replications of fast bootstrapping as implemented in PAUP* v. 4.0b10 [40].
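The present/absent scoring described above can be sketched as follows. The coding here, in which each distinct marker and band size pair becomes one binary character, is an assumption about how the matrix was built; the taxon labels and band sizes are invented for illustration, and `score_bands` is a hypothetical helper.

```python
def score_bands(band_calls):
    """Convert {taxon: {marker: [band sizes]}} into a 0/1 character matrix."""
    # Every distinct (marker, band size) pair becomes one binary character.
    characters = sorted(
        {
            (marker, size)
            for markers in band_calls.values()
            for marker, sizes in markers.items()
            for size in sizes
        }
    )
    matrix = {}
    for taxon, markers in band_calls.items():
        present = {(m, s) for m, sizes in markers.items() for s in sizes}
        # 1 = band present, 0 = band absent, as in the scoring rule in the text.
        matrix[taxon] = [1 if ch in present else 0 for ch in characters]
    return characters, matrix


# Hypothetical band sizes (bp), not data from the study.
example_calls = {
    "taxon_A": {"PS011": [210], "PS012": [180]},
    "taxon_B": {"PS011": [210, 230], "PS012": [180]},
    "taxon_C": {"PS011": [], "PS012": [195]},
}
characters, matrix = score_bands(example_calls)
for taxon, row in matrix.items():
    print(taxon, row)
```

A matrix of this shape is the kind of input a bootstrap analysis of binary characters would resample by column.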

Our interspecific SNP genotyping was accomplished using Fluidigm (Fluidigm Corp., South San Francisco, CA) nanofluidic Dynamic Array Integrated Fluidic Circuit (IFC) Chips [40] on the EP-1™ System (Fluidigm Corp., South San Francisco, CA) and competitive allele-specific PCR KASPar chemistry (KBioscience Ltd., Hoddesdon, UK). A 5 μL sample mix, consisting of 2.25 μL genomic DNA (20 ng μL⁻¹), 2.5 μL of 2x KBiosciences Allele Specific PCR (KASP) reagent Mix (KBioscience Ltd.), and 0.25 μL of 20x GT sample loading reagent (Fluidigm Corp., South San Francisco, CA) was prepared for each DNA sample. Similarly, a 4 μL 10x KASP Assay, containing 0.56 μL of the KASP assay primer mix (allele specific primers at 12 μM and the common reverse primer at 30 μM), 2 μL of 2x Assay Loading Reagent (Fluidigm Corp., South San Francisco, CA), and 1.44 μL DNase-free water was prepared for each SNP assay.

The two assay mixes were added to the dynamic array chip, mixed, and then thermal cycled using an integrated fluidic circuit Controller HX and FC1 thermal cycler (Fluidigm Corp., South San Francisco, CA). The thermal cycler was set as follows: 70°C for 30 min and 25°C for 10 min for thermal mixing of components, followed by hot-start Taq polymerase activation at 94°C for 15 min, then a touchdown amplification protocol consisting of 10 cycles of 94°C for 20 s and 65°C for 1 min (decreasing 0.8°C per cycle), 26 cycles of 94°C for 20 s and 57°C for 1 min, and then a hold at 20°C for 30 s. Five end-point fluorescent images of the chip were acquired using the EP-1™ imager (Fluidigm Corp., South San Francisco, CA), once after the initial touchdown cycles were complete and then after each additional run on “additional touchdown cycles.” The extra cycles were run four times, with an analysis of the chip after each run.
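The touchdown portion of this program can be written out cycle by cycle. The sketch below assumes that the first touchdown cycle anneals at 65°C and that each subsequent cycle is 0.8°C lower; the function name is hypothetical.

```python
def touchdown_annealing_temps(start=65.0, step=0.8, touchdown_cycles=10,
                              final=57.0, final_cycles=26):
    """List the annealing temperature (degrees C) of every amplification cycle."""
    # Touchdown phase: the temperature drops by `step` after each cycle.
    temps = [round(start - step * i, 1) for i in range(touchdown_cycles)]
    # Fixed phase: remaining cycles all anneal at the final temperature.
    temps += [final] * final_cycles
    return temps


schedule = touchdown_annealing_temps()
print(len(schedule), "cycles")
print("touchdown phase:", schedule[:10])
```

Under this reading the last touchdown cycle anneals just above the 57°C used for the remaining 26 cycles.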

The determination of each SNP allele was based on agreement in at least two of three SNP genotyping experiments. The primers were then analyzed for functionality using the results from each of the five stops for each chip, which were compared to determine the most accurate call. Functionality was determined by the number of calls versus no calls, and by consistency.

Cross species sequencing verification

To evaluate the DNA sequence homology and polymorphism type (SSR or INDEL) at specific marker amplicons (Table 1) across the Penstemon genus, DNA samples from each of five species (P. cyananthus, P. davidsonii, P. dissectus, P. fruticosus, and P. pachyphyllus) were amplified and Sanger sequenced. We accomplished the PCR amplification using Qiagen HotStarTaq Plus Master Mix (Valencia, CA, USA) according to the manufacturer’s recommendations. The amplification protocol consisted of an initial denaturation step of 5 min at 95°C, followed by 40 cycles of amplification consisting of 30 sec denaturation at 94°C, 30 sec for primer annealing at 55°C and 1 min of extension at 72°C. PCR products were separated on 1% agarose gels run in 0.5X TBE and visualized by ethidium bromide staining and UV transillumination. PCR products were purified using a standard ExoSAP (Exonuclease I/Shrimp Alkaline Phosphatase) protocol and sequenced directly as PCR products. DNA sequencing was performed at the Brigham Young University DNA Sequencing Center (Provo, UT, USA) using standard ABI Prism Taq dye-terminator cycle-sequencing methodology. DNA sequences were analyzed, assembled and aligned using Geneious software (Biomatters, Auckland, New Zealand).

Gene ontology

We used BLASTX [41] on assembled sequences of all four species to compare with the GenBank refseq-protein database [42] with an e-value threshold of < 1.0e-15. Blast2GO (v2.4.2) was used to map the blast hits and annotate them to putative cellular components, biological processes, and molecular functions found in the blast database [43]. For species comparisons, the GO level 3 was used for cellular components and level 2 was used for both biological processes and molecular functions.

Assembled sequences of all four species were also compared to all available Antirrhinum and Mimulus (genera more or less related to Penstemon) genes on GenBank (downloaded 23 June 2011). Comparisons were made using BLASTN [41] with an e-value threshold of <1.0e-13.

Results and discussion

Genome reduction, pyrosequencing and species assemblies

Given that a full 454 pyrosequencing plate using Titanium reagents is capable of producing 1.3 million reads averaging ~400 bp each [25], we expected a half plate to produce approximately 250 Mbp from 650,000 reads. Our reaction produced 287 Mbp from 733,413 reads, 20% more than expected, with an average read length of 392 bp. In total, 93.8, 46.4, 48.8, and 53.3 Mbp were sequenced from P. cyananthus, P. dissectus, P. davidsonii and P. fruticosus, respectively, closely resembling the 2:1:1:1 ratio of DNA pooled from each species for sequencing (Table 4). Likewise, from our de novo assemblies, we identified nearly twice as many contigs in P. cyananthus (9,714) as in, for example, P. fruticosus (4,777), which was expected because we sequenced approximately twice as much DNA from P. cyananthus as from each of the other three species. There was 0.6% of the P. cyananthus genome represented compared to 0.5% average coverage of the other three species (Table 4); thus, essentially an equal genome representation from each species was realized using the GR-RSC technique by pooling approximately equal genome molar concentrations in the sequencing reaction. The contigs of this study have been deposited at DDBJ/EMBL/GenBank as a Whole Genome Shotgun project under the accessions AKKG00000000 (P. cyananthus), AKKH00000000 (P. dissectus), AKKI00000000 (P. davidsonii), and AKKJ00000000 (P. fruticosus). The version described in this paper is the first version for each accession, XXXX01000000.
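The yield and pooling figures above can be checked with simple arithmetic. The sketch below uses only numbers quoted in the text; the helper names are hypothetical, and it compares each species' share of the per-species sequence totals with the 2:1:1:1 pooling ratio.

```python
def expected_yield_mbp(n_reads, mean_read_len_bp):
    """Total sequence in Mbp for a given read count and mean read length."""
    return n_reads * mean_read_len_bp / 1e6


def as_fractions(amounts):
    """Normalise a dict of amounts so the values sum to 1."""
    total = sum(amounts.values())
    return {name: value / total for name, value in amounts.items()}


# Pooling ratio and per-species sequence totals (Mbp) as quoted in the text.
POOL_RATIO = {"P. cyananthus": 2, "P. dissectus": 1,
              "P. davidsonii": 1, "P. fruticosus": 1}
SEQUENCED_MBP = {"P. cyananthus": 93.8, "P. dissectus": 46.4,
                 "P. davidsonii": 48.8, "P. fruticosus": 53.3}

expected = as_fractions(POOL_RATIO)
observed = as_fractions(SEQUENCED_MBP)

print("half plate, nominal:", expected_yield_mbp(650_000, 400), "Mbp")
print("half plate, observed:", round(expected_yield_mbp(733_413, 392), 1), "Mbp")
for species in POOL_RATIO:
    print(f"{species}: pooled {expected[species]:.2f}, sequenced {observed[species]:.2f}")
```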

Table 4 Summary data from 454-pyrosequencing and Newbler de-novo assembly (v.2.0.01) of Penstemon cyananthus, P. dissectus, P. davidsonii, and P. fruticosus

DNA sequences produced by the GR-RSC technique represent a broad sample of the genome. With this sample, we can begin to estimate genome-wide characteristics, such as GC content and the frequency of repeat elements. From the genome reduction, GC content was measured to be 36.4%, 34.5%, 35.3%, and 35.15% for P. cyananthus, P. dissectus, P. davidsonii and P. fruticosus, respectively (Table 4), matching the average 35% GC content reported for dicots [44]. Using the dicot average GC content a priori, we estimated a theoretical frequency of the BfaI and EcoRI recognition sites. The theoretical GC content in combination with estimated genome sizes of the four species [5] suggested the GR-RSC should have rendered a 104-fold reduction of the genome of each species. With a reduced genome of these species, the 650,000 reads that were sequenced suggest an average of 11× coverage; however, the observed read depth was 8.5×, 22.7% less than expected (Table 4). This lighter coverage is partly due to the lower than expected specificity of reads. An average of 48.2% of the reads were matched to contigs, with the other 51.8% either too short or lacking in homology to successfully match to a contig (Table 4).
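One simple way to estimate a theoretical recognition site frequency from GC content is to treat bases as independent, with G and C each at half the GC fraction and A and T each at half the remainder. The sketch below makes that assumption explicitly; it is not necessarily the calculation used in the study and does not attempt to reproduce the 104-fold figure. The function names are hypothetical.

```python
def site_probability(site, gc_content):
    """Probability that a random position starts the given recognition site."""
    base_prob = {
        "G": gc_content / 2, "C": gc_content / 2,
        "A": (1 - gc_content) / 2, "T": (1 - gc_content) / 2,
    }
    prob = 1.0
    for base in site.upper():
        prob *= base_prob[base]
    return prob


def coverage_shortfall(expected, observed):
    """Fractional drop of observed read depth relative to the expected depth."""
    return (expected - observed) / expected


GC = 0.35  # dicot average GC content cited in the text
for enzyme, site in (("EcoRI", "GAATTC"), ("BfaI", "CTAG")):
    spacing = 1 / site_probability(site, GC)
    print(f"{enzyme} ({site}): about one site per {spacing:.0f} bp")

print(f"coverage shortfall: {coverage_shortfall(11.0, 8.5):.1%}")
```

Under this model the four-base BfaI site is expected far more often than the six-base EcoRI site, which is why fragments bounded by one site of each kind are a small fraction of the genome.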

The full assembly of all four Penstemon species, using the Newbler de novo assembler, produced a total of 44,966 contigs, representing 16.4 Mbp, or 5.7% of our total sequence. In the individual species assemblies of P. cyananthus, P. dissectus, P. davidsonii, and P. fruticosus, a total of 9,714, 5,364, 4,882, and 4,777 contigs were created, representing 4.6, 2.6, 2.4, and 2.3 Mbp of assembled bases, respectively. These contigs represent, on average, 0.5% of the total genomes being sequenced (Table 4).

Marker analysis

We utilized assembly contigs from genomic sequence of all four species with “masked” multiple repeats, such as transposons, to identify SSRs. Penstemon cyananthus, P. dissectus, P. davidsonii, and P. fruticosus had 97, 113, 49, and 58 SSRs identified, respectively (Table 5). There were more SSRs identified in P. dissectus than in P. cyananthus, which has a 1.9 times larger genome and a higher representation of sequence than P. dissectus (Table 5). This inverse relationship between genome size and SSR content agrees with observations in other plant genomes [45]. Some SSRs were found as putative homologs in multiple species; after eliminating redundancies, we tallied 133 unique SSRs (Table 3). We generated primer pairs surrounding 77 of these SSRs, with amplicons large enough to potentially capture INDELs. Of these, 51 produced one or two reproducible bands with no or few faint superfluous bands. From those 51, there was an overall success rate of 94%, with 42 (82%) being polymorphic between the four species (Table 3).

Table 5 Data obtained from MISA (SSR), Blast2GO (GO) and RepeatMasker (RM)

To assess the possibility of utilizing these markers in interspecific plant improvement studies, 12 of the 51 SSR/INDEL markers (Table 3) were tested on 93 mostly xeric Penstemon taxa (72 species [Table 1]) representing five of six subgenera recognized in the genus [14]. The overall success rate of the markers was 98% with 100% being polymorphic across the 93 taxa. Without sequencing each band and/or doing inheritance studies on each marker, it is not possible to clearly determine if a polymorphism of a given marker is a variant of an allele or a new locus. However, we did amplify and sequence the amplicon produced at 11 of these markers in five Penstemon species (P. cyananthus, P. davidsonii, P. dissectus, P. fruticosus, and P. pachyphyllus). P. pachyphyllus var. pachyphyllus represents the largest subgenus (Penstemon) in the genus. These five species represented four of the presently classified six Penstemon subgenera. Of the 55 attempted sequences, 60% produced high quality sequence results which could be compared to the original 454 contigs containing the microsatellites. Using BLASTN (v2.2.25+) [41] we found that 33 sequences matched the respective microsatellite-containing contigs from which the SSR/INDEL markers were derived with an e-value of no more than 1.0e-36. An example of the types of polymorphism (SSRs and INDEL) found at these loci across the various species is represented graphically for the marker PS035 (Figure 1). For 22 (40%) of the 55 attempted sequences, we were unable to obtain high quality sequence information. In the majority of these cases (94%) the lack of high quality data was clearly due to the amplification of multiple amplicons (seen as multiple bands in gel electrophoresis), which impeded the sequencing of the PCR reaction. The multiple amplicons may result from heterozygosity at the locus or from the amplification of paralogous loci.

Figure 1

An example of SSR and INDEL found in the comparisons of four Penstemon species in the sequences of marker Pen035.

Both the sequence data (Figure 1) and the marker size data (Tables 1 and 3) are clear evidence of sequence conservation, and probable homologous loci, in many of the SSR/INDEL markers. Marker PS012, apparently the most conserved marker, had six unique molecular weight bands and was present in all 93 taxa. The marker with the most diversity in its molecular weights was PS011, which had 18 variants and was not readable in seven of the 93 taxa. Of the 1,116 possible marker × taxa interactions, 22 (2.0%) did not produce reliable data. Seven of those 22 (0.5%) lacked any product, with the remaining 15 producing multiple bands (reported as ambiguous data). Clearly readable double bands were found in 135 of the 1,116 (12.1%) marker × taxa interactions (Table 1).

Our data suggest a high degree of sequence conservation across the genus, favoring the present hypothesis of a recent and rather rapid evolutionary radiation of the genus [13, 14]. Furthermore, our data agree with Morgante et al. [45], who suggest that SSRs in non-coding sequence are highly conserved and predate recent genome expansions of many plants. Some of our markers differed in length by as much as 570 bp (Tables 1 and 3), suggesting the presence of INDELs and possibly additional SSRs (Table 3). We confirmed the presence of INDELs in the sample of 11 markers which we sequenced (Figure 1). In some instances, primers producing these large fragment length variances may be amplifying a different locus, which is a recognized concern when using SSR based markers above the species level [46, 47]. INDELs are useful as PCR based markers since they, like SSRs, are codominant and abundant in the genome and are commonly used in genetic mapping [26]. By combining the SSRs we identified in the source sequence for each of these markers with potential INDELs, alleles will be easily and inexpensively identified by gel electrophoresis.

To assess the possibility of phylogenetic (i.e., hierarchical) structure of the variation within these SSR/INDEL data at the broad taxonomic scale of our survey, we analyzed the 12 marker data set (Table 1) with PAUP*. Fast bootstrapping recovered a largely unresolved topology, suggesting either rampant homoplasy or that one or more of these markers represent more than one locus. These results are similar to what others have reported about SSR type markers. SSRs have demonstrated utility for population and intraspecific relationships, such as cultivar differentiation; however, they can be problematic when used to reconstruct relationships above the species level, where length differences are expected to poorly reflect homology [47, 48]. Nonetheless, with over 96% of these SSR/INDEL regions being conserved across Penstemon, these markers have potential for studies of interspecific hybridization and cultivar development.

Interspecific Penstemon breeding is complex [7, 11, 15, 49]; thus, having a set of inexpensive and easily used SSR/INDEL markers, which amplify across the genus, will have utility in understanding the results of some wide crosses. Empirical studies of various Penstemon interspecific crosses have ranged from a clearly recognizable intermediate phenotype of the two parents, to the F1 essentially mimicking one of the two parents, usually mirroring the female parent. Furthermore, in some instances the F2s and additional generations continue to mimic the female parent to the point that Viehmeyer [49] began to question if apomixis was involved. An example of this phenomenon was a ‘Flathead Lake’ × P. cobaea interspecific cross. It was not until the hybrid progeny of this cross was crossed with other interspecific hybrids that the progeny gave a much wider range of phenotypes [49]. A probable reason for this phenomenon is “unequal segregation,” which has been described in other wide crosses [50, 51]. Through the use of these SSR/INDEL markers, regions of the genome with unusual genotypic combinations for that specific cross can be identified and selections made accordingly [51–54], thus increasing the number of unique genotype/phenotype plants to be grown out to maturity from thousands of seedlings. Since many Penstemon require two years before their first anthesis, using markers to identify the greatest number of genotypically diverse plants is potentially very useful in the breeding of this crop.

Beyond amplification ability, we also assessed the composition and trends of all SSRs identified. On average, adenine and thymine rich repeat motifs were the most common repeat type in the di-, tri-, and tetra-nucleotide repeat motifs (Figure 2). In general, AT motifs are the most common motifs in noncoding regions of most plant genomes [45]. More variation was observed in the repeat motifs in the tetra-nucleotide repeats across the four species. Even closely related P. fruticosus and P. davidsonii had completely distinct tetra-nucleotide repeat motifs (Figure 2). This is likely due, in part, to the rarity of the motifs and high number of possible nucleotide combinations. Several studies have found that the hypothetical origins of some SSRs are retrotransposition events [48, 55, 56] and, as such, may be useful in developing part of a unique “fingerprint” for a given species.

Figure 2

Simple sequence repeat (SSR) motif distributions identified in each of the four Penstemon (P. cyananthus, P. dissectus, P. davidsonii, and P. fruticosus) sequences using the program MISA.

SNP analysis

Using our SNP discovery parameters of 8× minimum coverage and 30% minimum representation of the minor allele, we identified an average of one SNP per 2,890 bp across the four species, ranging from 1/1,855 bp in P. cyananthus to 1/3,777 bp in P. fruticosus. The three species with similar genome sizes all had similar SNP frequencies (Table 6). As reported in other plant species [57, 58], transitions (A↔G or C↔T) were more common than transversions (A↔T, A↔C, G↔C, G↔T) in Penstemon, by an average factor of 1.5 (Table 6). This is close to the factor of 1.4 reported in Arabidopsis [35]. In the dual species assemblies, using the same parameters and a 90% SNP identity, the average transition-to-transversion ratio was lower, at 1.2 (Table 6).
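The transition-to-transversion comparison can be illustrated with a short Python sketch. The function names and the example list of allele pairs are hypothetical and are not taken from SNP_Finder_Plus; the sketch only shows how a SNP is classified from its two alleles and how the ratio is formed.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(allele_1, allele_2):
    """Classify a biallelic SNP as a 'transition' or a 'transversion'."""
    pair = {allele_1.upper(), allele_2.upper()}
    if len(pair) != 2 or not pair <= PURINES | PYRIMIDINES:
        raise ValueError("expected two different bases from A, C, G, T")
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def ts_tv_ratio(snps):
    """Return transitions divided by transversions for (allele, allele) pairs."""
    counts = Counter(classify_snp(a, b) for a, b in snps)
    if counts["transversion"] == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return counts["transition"] / counts["transversion"]


# Made-up example: three transitions and two transversions.
example_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "T"), ("G", "C")]
print(ts_tv_ratio(example_snps))
```

Because there are four possible transversion pairs but only two transition pairs, a ratio above 1 indicates that transitions are over-represented relative to a uniform expectation.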

Table 6 SNP type and distributions along with SNP comparisons of sequences found within and between species (homologous sequence comparisons) using SNP_Finder_Plus (8X min. coverage, 30% min. minor allele, 90% min. identity)

In the dual species assembly, we found an average of 1 SNP/97 bp between homologous sequence assemblies of any two of the four species. The frequency of SNPs between homologous sequences of P. dissectus and P. davidsonii was the highest at 1/64 bp, with the lowest being between P. cyananthus and P. davidsonii at 1/119 bp. These results are in line with previous molecular based studies [5, 14]. Penstemon davidsonii and P. fruticosus both belong to subgenus Dasanthera, while P. cyananthus and either P. davidsonii or P. fruticosus homologous sequences had fewer SNPs at 1/124 and 1/119, respectively. All homologous sequence comparisons involving P. dissectus had the highest density of SNPs (Table 6), suggesting that P. dissectus is the most evolutionarily distant of the four species.

For a high degree of confidence in the results when using the “SNP identity” parameter in SNP_Finder_Plus, it is important to have two or more independent samples from the same species. This requirement was not met for each of the species assemblies, which introduces a weakness in our interspecific SNP comparisons. However, with a minimum of 8× coverage and a minor allele frequency of at least 30%, a putative SNP must be present in at least three of eight contig reads, which provides some protection against mislabeling a sequencing and/or assembly error as a SNP. Furthermore, in the across-species comparisons the average SNP coverage was actually 14.4× (Table 6). Therefore, on average, five identical putative SNPs represented the minor allele.
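The read counts quoted above follow from rounding the minor-allele threshold up to a whole read. A minimal Python sketch of that arithmetic is given below; the function name and the small tolerance used to guard against floating-point noise are assumptions made for illustration and are not part of SNP_Finder_Plus.

```python
import math


def min_minor_allele_reads(coverage, min_minor_fraction=0.30):
    """Smallest whole number of reads that reaches the minor-allele threshold.

    coverage may be a per-site read depth or an average depth (e.g. 14.4).
    A tiny tolerance is subtracted so that products such as 10 * 0.3 are not
    pushed up to the next integer by floating-point rounding.
    """
    return math.ceil(coverage * min_minor_fraction - 1e-9)


print(min_minor_allele_reads(8))     # minimum coverage used in this study
print(min_minor_allele_reads(14.4))  # average coverage across species
```

At 8× coverage the threshold is 2.4 reads, which rounds up to three; at the average 14.4× coverage it is 4.32 reads, which rounds up to five.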

To assess the viability of our interspecific SNPs as markers, we utilized the 1,958 P. davidsonii × P. cyananthus and 2,348 P. davidsonii × P. dissectus SNPs identified in the 14,254 and 10,053 respective homologous contig pairings (Tables 4 and 6). After removing contigs lacking identifiable SNPs, putative repetitive elements, and nonnuclear plastid DNA, 431 contigs remained. Of these, 99 were homologous across all three species (P. cyananthus, P. davidsonii and P. dissectus), another 164 were found only in the P. davidsonii × P. cyananthus comparisons, and the remaining 168 were found only in the P. davidsonii × P. dissectus comparisons. Of those 431 contigs, we selected the first 192 for SNP marker development, 96 from each of the two species comparisons. These contigs were utilized for competitive allele-specific PCR SNP primer design using PrimerPicker (KBioscience Ltd., Hoddesdon, UK).

Of the 192 SNP markers tested using KASPar genotyping chemistry, 75 (39%) produced consistent results for P. cyananthus, P. davidsonii, P. dissectus, and P. fruticosus (Table 7). All 75 SNP markers indicated polymorphisms between P. cyananthus, P. davidsonii, and P. dissectus, whereas only 16 (21% of the 75) produced results in P. fruticosus (Table 7). These results suggest that it is possible to develop intrageneric SNPs for Penstemon. However, it is unclear how viable these markers will be across all the species of the genus, since only 21% worked on all the species used in this GR-RSC study.

Table 7 Penstemon SNP marker name, GenBank dbSNP accession ID, polymorphism type, KASPar™ primer sequences (A1, A2 and common allele specific reverse) for all 75 functional SNP assays

Repetitive elements

Using RepeatModeler and RepeatMasker, we identified 28.5%, 16.8%, 17.4% and 16.1% of the sequence from P. cyananthus, P. dissectus, P. davidsonii, and P. fruticosus, respectively, as repeat elements. Of these elements, 3.0-7.8% were identified as LTR (long terminal repeat) retroelements, 0.3-1.0% as transposons, and the remainder were unclassified (Table 5). Since RepeatModeler utilizes RECON and RepeatScout to create a de novo model for RepeatMasker in place of the Arabidopsis model, details about the subcategories of LTRs and transposons included in the model could not be addressed. Maughan et al. [35] utilized GR-RSC on the Arabidopsis lines Ler-0 and Col-4. Applying RepeatModeler and then RepeatMasker to their sequence data from these lines, we found that an average of 6.2% was identified as repetitive elements, of which 4.4% were identified as LTR retroelements and 0.4% as transposons. By way of comparison, the downloaded full “non-genome reduced” Arabidopsis sequence (TAIR10) had a similar 7.4% of the sequence identified as repeat elements, of which 3.0% were LTR retroelements and 0.2% were transposons (Table 5 and Figures 3 and 4). These data suggest that, at least for repetitive elements, the GR-RSC method reflects proportions similar to those found in the full sequence of Arabidopsis.

Figure 3

Percentage of retroelements, DNA transposons and other unclassified repeats in Penstemon cyananthus, P. dissectus, P. davidsonii, P. fruticosus, and both genome reduced and non-genome reduced Arabidopsis¹. ¹Genome reduced A. thaliana sequence from Maughan et al. [35]; A. thaliana RILs Ler-0 and Col-4; non-genome reduced A. thaliana sequence downloaded from TAIR (The Arabidopsis Information Resource) as whole chromosomes; the diploid (2n = 2x = 16) genome size as reported by Broderick et al. and Schmuths et al. [5, 59].

Figure 4

Relationship between genome size and repeat elements in Penstemon including the relationship of both LTRs and total repeat elements to genome size for both genome reduced Penstemon and genome reduced/non-genome reduced Arabidopsis¹ (yellow). ¹Genome reduced A. thaliana sequence from Maughan et al. [35]; A. thaliana RILs Ler-0 and Col-4; non-genome reduced A. thaliana sequence downloaded from TAIR (The Arabidopsis Information Resource) as whole chromosomes; genome size as reported by Broderick et al. and Schmuths et al. [5, 59].

Broderick et al. [5] hypothesized that the broad range of genome sizes found among Penstemon of the same ploidy may be explained by retrotransposons. Lynch [60] detailed a relationship between genome size and repeat elements, suggesting a linear relationship between the number of elements and genome size [60–62]. The four Penstemon species used in this study provide insufficient evidence to establish a linear relationship between genome size and repeat elements in Penstemon. However, the three smaller, similarly sized Penstemon genomes possess comparable quantities of repetitive elements, whereas P. cyananthus (the largest genome) has roughly 1.6 to 1.8 times the proportion of repeat elements found in the other three species (28.5% versus 16.1-17.4%; Figure 3).
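The relative repeat content of the four species can be checked directly from the percentages reported for the RepeatModeler/RepeatMasker analysis. The Python sketch below is only an arithmetic illustration; the dictionary and function names are hypothetical, and the values are percentages of surveyed sequence rather than counts of individual elements.

```python
# Percent of genome-reduced sequence identified as repeat elements,
# as reported in the text for each species.
REPEAT_PERCENT = {
    "P. cyananthus": 28.5,
    "P. dissectus": 16.8,
    "P. davidsonii": 17.4,
    "P. fruticosus": 16.1,
}


def fold_relative_to(reference, repeat_percent=REPEAT_PERCENT):
    """Return the reference species' repeat percentage divided by that of
    every other species, rounded to two decimal places."""
    ref_value = repeat_percent[reference]
    return {
        species: round(ref_value / value, 2)
        for species, value in repeat_percent.items()
        if species != reference
    }


for species, fold in fold_relative_to("P. cyananthus").items():
    print(f"P. cyananthus vs {species}: {fold}x")
```

The ratio against P. dissectus, the smallest genome surveyed, is about 1.7, which corresponds to repetitive elements being approximately 70% more prevalent in P. cyananthus.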

Not only do repetitive elements largely influence genome size, but they are also likely to evolve more rapidly than low-copy sequence [62, 63]. Consequently, the repetitive elements of a species take on unique “fingerprints”, which become valuable in studies of phylogenetic relationships [64, 65]. Our limited genomic data set of four Penstemon species is consistent with two hypotheses: first, that repetitive elements are a major component of the genome size variation identified by Broderick et al. [5]; and second, that these elements vary between the species we tested, suggesting the possibility of identifying species-specific repetitive elements. However, without further comparisons we were unable to identify specific repetitive elements associated with the four Penstemon species used in this study.

Gene ontology

Using BLASTX we identified an average of 21.5% of the contigs across the four species as putative genes with an average of 13.9% annotated by Blast2GO (Table 5). These putative genes were compared and contrasted in a more detailed study by Dockter [23]. Furthermore, he compared the Penstemon sequences to known genes from the related genera Antirrhinum and Mimulus, and identified nine putative Penstemon genes from Antirrhinum and 14 from Mimulus with an e-value below 1.0e-13. Three genes (NADH dehydrogenase from M. aurantiacus, ribosomal protein L10 from M. guttatus, and ribosomal protein subunit 2 from M. aurantiacus, M. szechuanensis, and M. tenellus var. tenellus) were perfect hits (e-value = 0.0).


Penstemon are recognized for their phenotypic variation and their adaptation to multiple environments [6–8, 13, 14, 17, 30, 31]. Broderick et al. [5] found that this diversity is reflected by a wide range in their genome sizes. Nevertheless, even with this demonstrated plasticity, we have identified evidence of a high level of sequence conservation across the genus. This apparent sequence conservation is in harmony with the hypothesis that Penstemon has radiated rapidly into its variety of species rather recently in evolutionary time [13, 14]. Furthermore, our study identified evidence that the genome size variation in Penstemon is rooted in the amount of repetitive elements in each species.

Despite the large differences in Penstemon genome sizes, the finding that the genus has a great deal of sequence conservation is invaluable for the development of interspecific markers. The further development and mapping of conserved markers will facilitate the domestication of xeric Penstemon cultivars via interspecific hybridization, which remains largely unexploited, mainly because of crossing barriers [6–8, 10–12, 15]. Viehmeyer [16] hypothesized that it might be possible to develop Penstemon breeding lines that would facilitate the indirect interspecific hybridization of any two species within the genus. He and others have used traditional breeding techniques to develop a number of interspecific hybrids [7, 11, 15, 17, 66]. Clarifying the phylogenetic relationships within the genus should facilitate these objectives [67]. In the largest Penstemon phylogenetic study conducted to date, Wolfe et al. [14] sequenced the ITS and two chloroplast genes in 163 species. They concluded that many species are polyphyletic in their origins, making them difficult to discriminate from one another and requiring additional molecular studies to define taxonomic relationships more accurately.

We tested 51 SSR/INDEL based markers (Table 3), and identified several thousand inter- and intraspecific SNPs (Table 6), all of which have potential as both inter- and intraspecific markers. Of the 51 SSRs/INDELs, we selected 12 to test across 93 Penstemon taxa. The resulting data were used in an attempt to define more clearly the phylogenetic relationships of those taxa, but our results were incoherent. It is possible that some of these markers represent more than one locus in the Penstemon genome. This situation has been identified by others as a potential weakness in using SSR based markers in interspecific phylogenetic studies [46, 47]. A major reason for the vagueness of Penstemon’s phylogeny is that the genus appears to have evolved quite recently and radiated rapidly, leaving weak species boundaries [13, 14]. Furthermore, there are a number of reports of speciation via natural interspecific hybridization within the genus [14, 68–73]. Therefore, like Wolfe et al. [14], we concluded that better marker data sets will be required to reduce present phylogenetic ambiguity.

Gaining clearer insights into the relationships within Penstemon will require carefully designed, large-scale sequencing studies. Methods are emerging that promise to make such studies economical. One example is GR-RSC, or similar methods, which sample large quantities of homologous sequence from a genome at ever-decreasing cost [18, 20, 74]. Since our SSR/INDEL, sequence, and SNP data have demonstrated broad applicability across Penstemon, further studies applying this same GR-RSC protocol and downstream analysis to additional species would allow broader comparisons of putative genes, repeat elements, SNPs and SSRs, facilitating a much better understanding of the genus. Furthermore, using this technique on carefully selected parents and their segregating progeny would allow Penstemon genetic mapping studies, which would greatly enhance breeding and domestication efforts within the genus. Historically, studies of this nature would have been unthinkable; however, mass sequencing of homologous loci is rapidly becoming feasible [18, 20, 74]. In the interim, the 75 SNPs reported here, along with others not yet developed, can be tested further in a much broader study for around US$0.05 per data point [18, 20]. Studies of homologous SNPs across many Penstemon taxa, similar to the Amaranthus study of Maughan et al. [20], should improve insights into Penstemon phylogenetic relationships and, with carefully designed segregating Penstemon populations, produce high-quality genetic maps.


  1. St Hilaire R, Arnold MA, Wilkerson DC, Devitt DA, Hurd BH, Lesikar BJ, Lohr VI, Martin CA, McDonald GV, Morris RL, Pittenger DR, Shaw DA, Zoldoske DF: Efficient water use in residential urban landscapes. HortScience. 2008, 43: 2081-2092.

  2. Martin CA: Landscape water use in Phoenix, Arizona. Desert Plants. 2001, 17: 26-31.

  3. Bradley BA, Blumenthal DM, Early R, Grosholz ED, Lawler JJ, Miller LP, Sorte CJB, D’Antonio CM, Diez JM, Dukes JS, Ibanez I, Olden JD: Global change, global trade, and the next wave of plant invasions. Front Ecol Environ. 2012, 10: 20-28. 10.1890/110145.

  4. Burt JW, Muir AA, Piovia-Scott J, Veblen KE, Chang AL, Grossman JD, Weiskel HW: Preventing horticultural introductions of invasive plants: potential efficacy of voluntary initiatives. Biol Invasions. 2007, 9: 909-923. 10.1007/s10530-007-9090-4.

  5. Broderick SR, Stevens MR, Geary B, Love SL, Jellen EN, Dockter RB, Daley SL, Lindgren DT: A survey of Penstemon’s genome size. Genome. 2011, 54: 160-173. 10.1139/G10-106.

  6. Lindgren D, Wilde E: Growing Penstemons: Species, Cultivars and Hybrids. 2003, Haverford, PA: Infinity Publishing Com

  7. Lindgren DT: Breeding Penstemon. Breeding Ornamental Plants. Edited by: Callaway DJ, Callaway MB. 2000, Portland, Oregon: Timber Press, 196-212.

  8. Nold R: Penstemons. 1999, Portland, Oregon: Timber Press

  9. Viehmeyer G: Let’s breed better Penstemon. Bul Amer Penstemon Soc. 1955, 14: 275-288.

  10. Way D, James P: The Gardener’s Guide to Growing Penstemon. 1998, Portland, OR: Timber Press

  11. Lindgren DT, Schaaf DM: Penstemon: a summary of interspecific crosses. HortScience. 2007, 42: 494-498.

  12. Lindgren D: List and Description of Named Cultivars in the Genus Penstemon (2006). 2006, Lincoln, Nebraska: University of Nebraska-Lincoln Extension; EC1255

  13. Straw RM: A redefinition of Penstemon (Scrophulariaceae). Brittonia. 1966, 18: 80-95. 10.2307/2805112.

  14. Wolfe AD, Randle CP, Datwyler SL, Morawetz JJ, Arguedas N, Diaz J: Phylogeny, taxonomic affinities, and biogeography of Penstemon (Plantaginaceae) based on ITS and cpDNA sequence data. Amer J Bot. 2006, 93: 1699-1713. 10.3732/ajb.93.11.1699.

  15. Uhlinger RD, Viehmeyer G: Penstemon in your Garden. 1971, Lincoln, Nebraska: University of Nebraska College of Agriculture The Agricultural Experiment Station, Station Circular 105

  16. Viehmeyer G: Reversal of evolution in the genus Penstemon. Am Nat. 1958, 92: 129-137. 10.1086/282021.

  17. Viehmeyer G: Advances in Penstemon breeding. Bul Amer Penstemon Soc. 1973, 32: 16-21.

  18. Cronn R, Knaus BJ, Liston A, Maughan PJ, Parks M, Syring JV, Udall J: Targeted enrichment strategies for next-generation plant biology. Amer J Bot. 2012, 99: 291-311. 10.3732/ajb.1100356.

  19. Heslop-Harrison JS: Exploiting novel germplasm. Aust J Agric Res. 2002, 53: 873-879. 10.1071/AR02078.

  20. Maughan PJ, Smith SM, Fairbanks DJ, Jellen EN: Development, characterization, and linkage mapping of single nucleotide polymorphisms in the grain amaranths (Amaranthus sp.). Plant Gen. 2011, 4: 1-10. 10.3835/plantgenome2011.12.0001.

  21. Bernardo R: Molecular markers and selection for complex traits in plants: learning from the last 20 years. Crop Sci. 2008, 48: 1649-1664. 10.2135/cropsci2008.03.0131.

  22. Tanksley SD, McCouch SR: Seed banks and molecular maps: unlocking genetic potential from the wild. Science. 1997, 277: 1063-1066. 10.1126/science.277.5329.1063.

  23. Dockter RB: Genome snapshot and molecular marker development in Penstemon (Plantaginaceae). M.S. Thesis. 2011, Brigham Young University, Department of Plant and Wildlife Sciences

  24. Santana QC, Coetzee MPA, Steenkamp ET, Mlonyeni OX, Hammond GNA, Wingfield MJ, Wingfield BD: Microsatellite discovery by deep sequencing of enriched genomic libraries. Biotechniques. 2009, 46: 217-223. 10.2144/000113085.

  25. Maughan PJ, Yourstone SM, Jellen EN, Udall JA: SNP discovery via genomic reduction, barcoding and 454-pyrosequencing in amaranth. Plant Gen. 2009, 2: 260-270. 10.3835/plantgenome2009.08.0022.

  26. Păcurar DI, Păcurar ML, Street N, Bussell JD, Pop TI, Gutierrez L, Bellini C: A collection of INDEL markers for map-based cloning in seven Arabidopsis accessions. J Exp Bot. 2012, 63: 2491-2501. 10.1093/jxb/err422.

  27. Althoff DM, Gitzendanner MA, Segraves KA: The utility of amplified fragment length polymorphisms in phylogenetics: a comparison of homology within and between genomes. Syst Biol. 2007, 56: 477-484. 10.1080/10635150701427077.

  28. Sambrook J, Fritcsh EF, Maniatis T: Molecular Cloning: A Laboratory Manual. 1989, Cold Spring Harbor, N.Y: Cold Spring Harbor Lab

  29. Todd JJ, Vodkin LO: Duplications that suppress and deletions that restore expression from a chalcone synthase multigene family. Plant Cell. 1996, 8: 687-699.

  30. Holmgren NH: Penstemon. Intermountain Flora: Vascular Plants of the Intermountain West. Volume 4. Edited by: Cronquist A, Holmgren AH, Holmgren NH, Reveal JL, Holmgren PK. 1984, Bronx, New York, USA: New York Botanical Garden, 370-457.

  31. Welsh SL, Atwood ND, Goodrich S, Higgins LC: A Utah Flora. 2008, Provo, Utah: Brigham Young University, 4

  32. RepeatMasker. []

  33. Bao Z, Eddy SR: Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 2002, 12: 1269-1276. 10.1101/gr.88502.

  34. Price AL, Jones NC, Pevzner PA: De novo identification of repeat families in large genomes. Bioinformatics. 2005, 21 (Suppl 1): I351-I358. 10.1093/bioinformatics/bti1018.

  35. Maughan PJ, Yourstone SM, Byers RL, Smith SM, Udall JA: Single-nucleotide polymorphism genotyping in mapping populations via genomic reduction and next-generation sequencing: proof-of-concept. Plant Gen. 2010, 3: 1-13.

  36. Rhee SY, Beavis W, Berardini TZ, Chen GH, Dixon D, Doyle A, Garcia-Hernandez M, Huala E, Lander G, Montoya M, Miller N, Mueller LA, Mundodi S, Reiser L, Tacklind J, Weems DC, Wu YH, Xu I, Yoo D, Yoon J, Zhang PF: The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res. 2003, 31: 224-228. 10.1093/nar/gkg076.

  37. Thiel T, Michalek W, Varshney RK, Graner A: Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor Appl Genet. 2003, 106: 411-422.

  38. Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JGR, Korf I, Lapp H, Lehväslaiho H, Matsalla C, Mungall CJ, Osborne BI, Pocock MR, Schattner P, Senger M, Stein LD, Stupka E, Wilkinson MD, Birney E: The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 2002, 12: 1611-1618. 10.1101/gr.361602.

  39. Rozen S, Skaletsky HJ: Primer3 on the WWW for general users and for biologist programmers. Bioinformatics Methods and Protocols: Methods in Molecular Biology. Edited by: Krawetz S, Misener S. 2000, Totowa, NJ: Humana Press, 365-386.

  40. PAUP* Phylogenetic analysis using parsimony (*and other methods). []

  41. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.

  42. GenBank. []

  43. Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21: 3674-3676. 10.1093/bioinformatics/bti610.

  44. Kawabe A, Miyashita NT: Patterns of codon usage bias in three dicot and four monocot plant species. Genes Genet Syst. 2003, 78: 343-352. 10.1266/ggs.78.343.

  45. Morgante M, Hanafey M, Powell W: Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat Genet. 2002, 30: 194-200. 10.1038/ng822.

  46. Robinson JP, Harris SA: Amplified fragment length polymorphisms and microsatellites: a phylogenetic perspective. EU-Compendium: Which DNA Marker for Which Purpose?. Edited by: Gillet EM. 1999, Göttingen, Germany: Institut für Forstgenetik und Forstpflanzenzüchtung, Universität Göttingen, 95-121.

  47. Ochieng JW, Steane DA, Ladiges PY, Baverstock PR, Henry RJ, Shepherd M: Microsatellites retain phylogenetic signals across genera in eucalypts (Myrtaceae). Genet Mol Biol. 2007, 30: 1125-1134. 10.1590/S1415-47572007000600016.

  48. Nadir E, Margalit H, Gallily T, Ben-Sasson SA: Microsatellite spreading in the human genome: evolutionary mechanisms and structural implications. Proc Natl Acad Sci USA. 1996, 93: 6470-6475. 10.1073/pnas.93.13.6470.

  49. Viehmeyer G: Reports dealing in large part with hybridization and selection. Bul Amer Penstemon Soc. 1965, 24: 95-100.

  50. Zamir D, Tadmor Y: Unequal segregation of nuclear genes in plants. Bot Gaz. 1986, 147: 355-358. 10.1086/337602.

  51. Eshed Y, Zamir D: A genomic library of Lycopersicon pennellii in L. esculentum: A tool for fine mapping of genes. Euphytica. 1994, 79: 175-179. 10.1007/BF00022516.

  52. Robbins MD, Masud MAT, Panthee DR, Gardner RG, Francis DM, Stevens MR: Marker assisted selection for coupling phase resistance to Tomato spotted wilt virus and Phytophthora infestans (late blight) in tomato. HortScience. 2010, 45: 1424-1428.

  53. Canady MA, Meglic V, Chetelat RT: A library of Solanum lycopersicoides introgression lines in cultivated tomato. Genome. 2005, 48: 685-697. 10.1139/g05-032.

  54. Canady MA, Ji YF, Chetelat RT: Homeologous recombination in Solanum lycopersicoides introgression lines of cultivated tomato. Genetics. 2006, 174: 1775-1788. 10.1534/genetics.106.065144.

  55. Temnykh S, DeClerck G, Lukashova A, Lipovich L, Cartinhour S, McCouch S: Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): frequency, length variation, transposon associations, and genetic marker potential. Genome Res. 2001, 11: 1441-1452. 10.1101/gr.184001.

  56. Parida SK, Kalia SK, Kaul S, Dalal V, Hemaprabha G, Selvi A, Pandit A, Singh A, Gaikwad K, Sharma TR, Srivastava PS, Singh NK, Mohapatra T: Informative genomic microsatellite markers for efficient genotyping applications in sugarcane. Theor Appl Genet. 2009, 118: 327-338. 10.1007/s00122-008-0902-4.

  57. Zhang FK, Zhao ZM: The influence of neighboring-nucleotide composition on single nucleotide polymorphisms (SNPs) in the mouse genome and its comparison with human SNPs. Genomics. 2004, 84: 785-795. 10.1016/j.ygeno.2004.06.015.

  58. Morton BR, Bi IV, McMullen MD, Gaut BS: Variation in mutation dynamics across the maize genome as a function of regional and flanking base composition. Genetics. 2006, 172: 569-577.

  59. Schmuths H, Meister A, Horres R, Bachmann K: Genome size variation among accessions of Arabidopsis thaliana. Ann Bot. 2004, 93: 317-321. 10.1093/aob/mch037.

  60. Lynch M: The Origins of Genome Architecture. 2007, Sunderland, MA: Sinauer Associates, Inc

  61. Lynch M, Conery JS: The origins of genome complexity. Science. 2003, 302: 1401-1404. 10.1126/science.1089370.

  62. Kidwell MG: Transposable elements and the evolution of genome size in eukaryotes. Genetica. 2002, 115: 49-63. 10.1023/A:1016072014259.

  63. Raskina O, Barber JC, Nevo E, Belyayev A: Repetitive DNA and chromosomal rearrangements: speciation-related events in plant genomes. Cytogenet Genome Res. 2008, 120: 351-357. 10.1159/000121084.

  64. Kolano B, Gardunia BW, Michalska M, Bonifacio A, Fairbanks D, Maughan PJ, Coleman CE, Stevens MR, Jellen EN, Maluszynska J: Chromosomal localization of two novel repetitive sequences isolated from the Chenopodium quinoa Willd. genome. Genome. 2011, 54: 710-717. 10.1139/g11-035.

  65. Kubis S, Schmidt T, Heslop-Harrison JS: Repetitive DNA elements as a major component of plant genomes. Ann Bot. 1998, 82 (Suppl A): 45-55.

  66. Meyers B: A summary of Bruce Meyers’ Penstemon hybridizations. Bul Amer Penstemon Soc. 1998, 57: 2-11.

  67. Friedt W, Snowdon RJ, Ordon F, Ahlemeyer J: Plant breeding: assessment of genetic diversity in crop plants and its exploitation in breeding. Prog Bot. 2007, 68: 151-178. 10.1007/978-3-540-36832-8_7.

  68. Wolfe AD, Elisens WJ: Diploid hybrid speciation in Penstemon (Scrophulariaceae) revisited. Amer J Bot. 1993, 80: 1082-1094. 10.2307/2445754.

  69. Wolfe AD, Elisens WJ: Nuclear ribosomal DNA restriction site variation in Penstemon section Peltanthera (Scrophulariaceae): an evaluation of diploid hybrid speciation and evidence for introgression. Amer J Bot. 1994, 81: 1627-1635. 10.2307/2445341.

  70. Wolfe AD, Elisens WJ: Evidence of chloroplast capture and pollen-mediated gene flow in Penstemon sect. Peltanthera (Scrophulariaceae). Syst Bot. 1995, 20: 395-412. 10.2307/2419800.

  71. Datwyler SL, Wolfe AD: Phylogenetic relationships and morphological evolution in Penstemon subg. Dasanthera (Veronicaceae). Syst Bot. 2004, 29: 165-176. 10.1600/036364404772974077.

  72. Wolfe AD, Xiang Q-Y, Kephart SR: Assessing hybridization in natural populations of Penstemon (Scrophulariaceae) using hypervariable intersimple sequence repeat (ISSR) bands. Mol Ecol. 1998, 7: 1107-1125. 10.1046/j.1365-294x.1998.00425.x.

  73. Wolfe AD, Xiang Q-Y, Kephart SR: Diploid hybrid speciation in Penstemon (Scrophulariaceae). Proc Natl Acad Sci USA. 1998, 95: 5112-5115. 10.1073/pnas.95.9.5112.

  74. Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, Mitchell SE: A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One. 2011, 6: e19379-10.1371/journal.pone.0019379.


We acknowledge Shaun Broderick, a graduate student, and Tiffany Austin and Aaron King, undergraduates, for their laboratory assistance, and Robert Byers, a graduate student, and Scott Yourstone, an undergraduate, for their bioinformatic assistance, all from Brigham Young University. This research was funded in part by an Annaley Naegle Redd Assistantship from the Brigham Young University Charles Redd Center for Western Studies and a Year-End Funding Grant from the Department of Plant and Wildlife Sciences, Brigham Young University.

Author information


Corresponding author

Correspondence to Mikel R Stevens.

Additional information

Competing interests

The authors declare no competing interests.

Authors’ contributions

Rhyan B Dockter, David B Elzinga, Brad Geary, P Jeff Maughan, Leigh A Johnson, Danika Tumbleson, JanaLynn Franke, Keri Dockter, and Mikel R Stevens. RBD performed the GR-RSC technique, either carried out or oversaw all other steps of the study, participated in the planning and design of all experiments as well as their analysis, and did the initial drafting of the manuscript. DBE performed or assisted in all bioinformatics in this study. BG participated in the design of all aspects of the study, advised RBD, and was involved in the editing and revising of the manuscript. PJM advised and assisted in the GR-RSC technique, advised RBD on relevant bioinformatics issues of the study, and was involved in the editing and revising of the manuscript. LAJ advised and assisted RBD and MRS in the taxonomy-related issues of the study and was involved in the editing and revising of the manuscript. DT, JF, and KD carried out all aspects, including basic analysis, of the marker studies reported. MRS was the senior advisor of RBD and was intricately involved in all aspects of the study and the manuscript. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Dockter, R.B., Elzinga, D.B., Geary, B. et al. Developing molecular tools and insights into the Penstemon genome using genomic reduction and next-generation sequencing. BMC Genet 14, 66 (2013).


  • Breeding domesticated Penstemon
  • Genome reduction
  • Homologous sequences
  • LTR retroelements
  • Plantaginaceae
  • Pyrosequencing
  • Repetitive elements