Chromosome-scale assembly of the Verbenaceae species Queen’s Wreath (Petrea volubilis L.)

Abstract

Objectives

Petrea volubilis, a member of the Order Lamiales and the Verbenaceae family, is an important horticultural species that has been used in traditional folk medicine. To provide a genome sequence for comparative studies within the Order Lamiales, which includes important families such as Lamiaceae (mints), we generated a long-read, chromosome-scale genome assembly of this species.

Data description

Using a total of 45.5 Gb of Pacific Biosciences long-read sequence, we generated a 480.2 Mb assembly of P. volubilis, of which 93% is chromosome-anchored. Representation of genic regions was robust, with 96.6% of the Benchmarking Universal Single Copy Orthologs (BUSCO) present in the genome assembly. A total of 57.8% of the genome was annotated as repetitive sequence. Using a gene annotation pipeline that included refinement of gene models using transcript evidence, 30,982 high-confidence genes were annotated. Access to the P. volubilis genome will facilitate evolutionary studies in the Lamiales, a key order of Asterids that includes significant crop and medicinal plant species.

Objective

The Asterid species Petrea volubilis L., also known as Queen’s Wreath, Purple Wreath, Bluebird vine or Sandpiper vine, is a member of the Verbenaceae family within the Order Lamiales. As a perennial woody vine, P. volubilis is a key ornamental species due to its intense violet flowers. Historically, leaves of P. volubilis have been used in Mexico as folk medicine to remedy kidney stones, rheumatism, diarrhea, and urinary infections [1] and as an abortifacient in Jamaica [2]. P. volubilis extracts have been found to have antipyretic, analgesic, and antimicrobial [3, 4] as well as insecticidal [4] activities. Recently, P. volubilis was included as one of four outgroup species in a study that revealed the evolutionary basis of chemical diversity in the Lamiaceae [5]. In this project, we sequenced and annotated the P. volubilis genome to facilitate our understanding of genome and chemodiversity evolution within the Lamiales.

Data description

High molecular weight DNA was isolated using a modified cetyl trimethylammonium bromide (CTAB) method (2% CTAB, 100 mM Tris, 1.4 M sodium chloride, 20 mM EDTA) [6] followed by RNase treatment and cleanup using the DNeasy PowerClean Pro Cleanup Kit (Qiagen). Pacific Biosciences (PacBio) SMRTbell Express Template libraries were constructed and sequenced on a PacBio Sequel instrument, generating 45.5 Gb of total sequence (Table 1, Data file 1, Data sets 1 & 2, [7]). Reads shorter than 5 kb were filtered out and the remaining reads were assembled using Canu v1.8 [8] with the options minOverlapLength=2000 minReadLength=5000 genomeSize=450m, resulting in an initial assembly of 630.0 Mb with 6,515 contigs and an N50 contig length of 369,179 bp. The genome was polished with two rounds of GCpp (v1.9.0) [9], followed by three rounds of polishing with Pilon (v1.23) [10] using Illumina whole genome shotgun reads (Table 1, Data file 1, Data set 3, [7, 11]). A k-mer distribution plot generated with GenomeScope [12] revealed that the genome was heterozygous (Table 1, Data file 2, Data set 3, [7]). Haplotigs were removed using two rounds of purge_dups (v1.0.0) [13, 14] with the default parameters, and Hi-C libraries constructed by Phase Genomics (Table 1, Data file 1, Data sets 4 & 5, [7, 15, 16]) were used to place the final scaffolds into 17 chromosomes using the Juicer (v1.6)/3D-DNA pipeline (git commit: 529ccf4; Table 1, Data file 3) [7, 17, 18]. The final assembly size is 480.2 Mb (478.8 Mb ungapped, 93% chromosome-anchored), consistent with the size estimated by flow cytometry of 455 Mb per 1C [5] (Table 1, Data files 4 & 5, [7]). A comparison of k-mers in the Illumina whole genome shotgun reads vs the genome assembly using KAT (v2.4.1) [19] with a k-mer size of 21 revealed that P. volubilis is heterozygous (estimated heterozygosity rate, 1.45%) and that the assembly is near-complete (estimated completeness, 98.8%; Table 1, Data file 6, [7]). The majority of k-mers in the reads are present in one copy, indicating that the haplotigs were successfully purged from the final assembly (Table 1, Data files 1 & 6, Data set 3, [7]). Assessment of the representation of genic regions using the Benchmarking Universal Single Copy Orthologs [20] (BUSCO; v5.4.3 with embryophyta_odb10) revealed that 96.6% of the BUSCO genes are present in the genome assembly (Table 1, Data file 7, [7]). While the scaffold N50 was 25.6 Mb, the contig N50 was 0.53 Mb, potentially because heterozygosity reduced the ability of the assembler to generate longer contigs (Table 1, Data file 6, [7]; see Limitations).
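The k-mer comparisons described above rest on counting how often each k-mer occurs in a set of sequences. As a rough illustration only (this is not the KAT or GenomeScope implementation), the following Python sketch counts canonical k-mers and tabulates how many distinct k-mers occur at each multiplicity. The function names and the toy reads are assumptions made for the example; the study itself used a k-mer size of 21.

```python
from collections import Counter

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k=21):
    """Count canonical k-mers across reads, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def multiplicity_histogram(counts):
    """Map multiplicity -> number of distinct k-mers observed that many times."""
    return Counter(counts.values())


if __name__ == "__main__":
    # Hypothetical toy reads; k=3 keeps the output small enough to inspect.
    toy_reads = ["ACGTAC", "ACGTAC", "ACGAAC"]
    histogram = multiplicity_histogram(count_kmers(toy_reads, k=3))
    for multiplicity in sorted(histogram):
        print(multiplicity, histogram[multiplicity])
```

In a heterozygous genome, a histogram of this kind computed from real reads typically shows two peaks, one at roughly half the coverage of the other, which is the signal that tools such as GenomeScope model.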

Table 1 Overview of data files and data sets used in this study

The P. volubilis genome was annotated as described previously [29]. In brief, repetitive sequences were identified in the unscaffolded contigs using RepeatModeler (v2.0.1) [30] and protein-coding genes were removed from the library using ProtExcluder (v1.2) [31]. The custom repetitive sequences were then added to the Repbase Viridiplantae repeats (v20150807) [32] and used to mask repeats using RepeatMasker (v4.1.0) [30] with the parameters -s -nolow -no_is -gff (Table 1, Data file 8, [7]); 57.8% of the genome was masked. RNA-seq reads from five libraries (Table 1, Data file 1, Data sets 6, 7, 8, 9, & 10, [7, 23, 24, 25, 26, 27]) were cleaned with Cutadapt (v2.9) [33] using a quality cutoff of 10 and a minimum length of 100 nt and then aligned using HISAT2 (v2.2.0) [34] with a maximum intron length of 5000 bp. Gene predictions were generated with BRAKER2 (v2.1.5) [35] using the RNA-seq alignments as hints. Final gene models were refined with two rounds of PASA2 (v2.4.1) [36, 37] using genome-guided transcript assemblies created from the RNA-seq alignments with StringTie (v2.1.1) [38]. Gene models were annotated using alignments to the predicted Arabidopsis thaliana proteome, the Pfam database, and transcript evidence as described previously [29]; a total of 49,169 high confidence models (30,982 genes) within the 56,052 working models (37,610 genes) were annotated (Table 1, Data file 9, [7]). High confidence models within the working model set were defined by either protein evidence (alignment to Arabidopsis or a Pfam domain) and/or expression evidence (TPM > 0). Representative models, both working and high confidence, were defined as the model for each locus (gene) with the longest CDS. BUSCO assessments (v5.4.3 and embryophyta_odb10) of the annotation revealed 89.9% and 88.5% of BUSCO genes in the working gene model set and the representative high confidence gene model set, respectively (Table 1, Data file 7, [7]). The final genome annotation was transferred from the scaffolds to the chromosomes using Liftoff (v1.6.3) [39] with the parameters -a 0.9 -s 0.95 -exclude_partial -cds -polish.
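The two selection rules stated above (high confidence status from protein and/or expression evidence, and one representative model per locus chosen by longest CDS) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline code: the GeneModel record, its field names, and the tie-breaking behaviour (the first model encountered wins when CDS lengths are equal) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    # Hypothetical record layout, invented for this example.
    model_id: str
    gene_id: str               # locus the model belongs to
    cds_length: int            # total CDS length in nucleotides
    has_arabidopsis_hit: bool  # alignment to the Arabidopsis proteome
    has_pfam_domain: bool      # at least one Pfam domain match
    tpm: float                 # expression estimate from RNA-seq


def is_high_confidence(model: GeneModel) -> bool:
    """Protein evidence (Arabidopsis alignment or Pfam domain) and/or TPM > 0."""
    protein_evidence = model.has_arabidopsis_hit or model.has_pfam_domain
    expression_evidence = model.tpm > 0
    return protein_evidence or expression_evidence


def representative_models(models):
    """Pick, for each locus, the model with the longest CDS.

    Ties keep the first model encountered; that choice is an assumption.
    """
    best = {}
    for model in models:
        current = best.get(model.gene_id)
        if current is None or model.cds_length > current.cds_length:
            best[model.gene_id] = model
    return best


if __name__ == "__main__":
    working = [
        GeneModel("g1.1", "g1", 900, True, False, 0.0),
        GeneModel("g1.2", "g1", 1200, False, False, 3.5),
        GeneModel("g2.1", "g2", 600, False, False, 0.0),
    ]
    high_confidence = [m for m in working if is_high_confidence(m)]
    for gene_id, model in representative_models(high_confidence).items():
        print(gene_id, model.model_id)
```

Applying the representative filter separately to the working set and to the high confidence subset mirrors the text, which defines representative models for both sets.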

Limitations

Petrea volubilis is heterozygous, and we purged haplotigs during the assembly process. This likely contributed to the reduced contig N50 (0.53 Mb) and the slightly larger assembly size (480.2 Mb) compared to the genome size estimated by flow cytometry (445 Mb). However, based on BUSCO scores, only 4.3% of the orthologs were duplicated in the assembly, suggesting that we removed the majority of alternative haplotigs. Future efforts using near-perfect long genomic reads, such as those from the PacBio HiFi or Oxford Nanopore Technologies Q20+ platforms, would permit a haplotype-resolved genome assembly.
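For readers less familiar with the N50 statistic referred to above, the sketch below computes it from a list of sequence lengths and shows how fragmentation lowers N50 even when the total assembled length is unchanged. It uses the common definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length); the function name and the toy lengths are assumptions, and the numbers are not taken from the P. volubilis assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated until
    at least half of the total length is covered; the length of the last
    sequence added is the N50.
    """
    if not lengths:
        raise ValueError("n50() requires at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical toy assemblies with the same total length (240).
    contiguous = [100, 80, 60]
    fragmented = [50, 50, 40, 40, 30, 30]
    print(n50(contiguous), n50(fragmented))
```

The same function applies to contigs or scaffolds; only the input lengths differ, which is why an assembly can have a large scaffold N50 and a much smaller contig N50.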

Availability of data and materials

All raw sequence data are available in the National Center for Biotechnology Information under BioProject ID PRJNA534065 (https://identifiers.org/bioproject:PRJNA534065; [11, 15, 16, 21, 22, 23, 24, 25, 26, 27]). The assembled genome is available in GenBank under the accession JAOWBU000000000 (https://identifiers.org/assembly:GCA_026212405.1; [28]) and in Figshare (https://doi.org/10.6084/m9.figshare.21429219.v3, [7]). A summary of the data sets is available in Table 1 and on Figshare (https://doi.org/10.6084/m9.figshare.21429219.v3, [7]).

Abbreviations

BUSCO:

Benchmarking Universal Single Copy Orthologs

PacBio:

Pacific Biosciences

References

  1. Josabad Alonso-Castro A, Jose Maldonado-Miranda J, Zarate-Martinez A, Jacobo-Salcedo MDR, Fernández-Galicia C, Alejandro Figueroa-Zuñiga L, et al. Medicinal plants used in the Huasteca Potosina, México. J Ethnopharmacol. 2012;143:292–8.

  2. Mitchell SA, Ahmad MH. A review of medicinal plant research at the University of the West Indies, Jamaica, 1948–2001. West Indian Med J. 2006;55:243–69.

  3. Abdelwahab M, Abdel-Lateff A, Fouad M, Desoukey S, Kamel M. Phytochemical and biological study of Petrea volubilis L. (Verbenaceae). Bull Pharm Sci. 2011;34:9–20.

  4. El-Hela AA, Al-Amier H, Craker LE. Phytochemical and Biological Investigation of Bluebird Vine (Petrea volubilis). Planta Med. 2009;75:P-56.

  5. Mint Evolutionary Genomics Consortium. Phylogenomic Mining of the Mints Reveals Multiple Mechanisms Contributing to the Evolution of Chemical Diversity in Lamiaceae. Mol Plant. 2018;11:1084–96.

  6. Doyle JJ, Doyle LJ. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem Bull. 1987;19:11–5.

  7. Hamilton JP, Vaillancourt B, Wood JC, Buell CR. Data files and Data sets for Hamilton et al. “Chromosome-scale assembly of the Verbenaceae species Queen’s Wreath (Petrea volubilis L.).” 2023. https://doi.org/10.6084/m9.figshare.21429219.v3.

  8. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27:722–36.

  9. GCpp. 2022. https://github.com/PacificBiosciences/gcpp.

  10. Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014;9:e112963.

  11. Hamilton JP, Vaillancourt B, Wood JC, Buell CR. Illumina whole genome shotgun reads, SRR11516645. 2023. https://identifiers.org/ncbi/insdc.sra:SRR11516645.

  12. Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, Gurtowski J, et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics. 2017;33:2202–4.

  13. purge_dups. 2022. https://github.com/dfguan/purge_dups.

  14. Guan D, McCarthy SA, Wood J, Howe K, Wang Y, Durbin R. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 2020;36:2896–8.

  15. Hamilton JP, Vaillancourt B, Wood JC, Buell CR. Illumina Hi-C DNA sequence reads, SRR15904679. 2023. https://identifiers.org/ncbi/insdc.sra:SRR15904679.

  16. Hamilton JP, Vaillancourt B, Wood JC, Buell CR. Illumina Hi-C DNA sequence reads, SRR15904680. 2023. https://identifiers.org/ncbi/insdc.sra:SRR15904680.

  17. Dudchenko O, Batra SS, Omer AD, Nyquist SK, Hoeger M, Durand NC, et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–5.

  18. Durand NC, Shamim MS, Machol I, Rao SSP, Huntley MH, Lander ES, et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst. 2016;3:95–8.

  19. Mapleson D, Garcia Accinelli G, Kettleborough G, Wright J, Clavijo BJ. KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies. Bioinformatics. 2017;33:574–6.

  20. Waterhouse RM, Seppey M, Simão FA, Manni M, Ioannidis P, Klioutchnikov G, et al. BUSCO Applications from Quality Assessments to Gene Prediction and Phylogenomics. Mol Biol Evol. 2018;35:543–8.

  21. Hamilton JP, Vaillancourt B, Wood JC, Buell CR. PacBio reads from high molecular weight DNA, SRR11516643. 2023. https://identifiers.org/ncbi/insdc.sra:SRR11516643.

  22. Hamilton JP, Vaillancourt B, Wood JC, Buell CR. PacBio reads from high molecular weight DNA, SRR11516644. 2023. https://identifiers.org/ncbi/insdc.sra:SRR11516644.

  23. Hamilton JP, Vaillancourt B, Wood JC, Buell CR. Illumina RNA-Seq - Root, SRR8937863. 2023. https://identifiers.org/ncbi/insdc.sra:SRR8937863.

  24. Hamilton JP, Vaillancourt B, Wood JC, Buell CR. Illumina RNA-Seq - Petiole, SRR8937861. 2023. https://identifiers.org/ncbi/insdc.sra:SRR8937861.

  25. Hamilton JP, Vaillancourt B, Wood JC, Buell CR. Illumina RNA-Seq - Stem, SRR8937862. 2023. https://identifiers.org/ncbi/insdc.sra:SRR8937862.

  26. Hamilton JP, Vaillancourt B, Wood JC, Buell CR. Illumina RNA-Seq - Immature leaf, SRR8937859. 2023. https://identifiers.org/ncbi/insdc.sra:SRR8937859.

  27. Hamilton JP, Vaillancourt B, Wood JC, Buell CR. Illumina RNA-Seq - Mature leaf, SRR8937860. 2023. https://identifiers.org/ncbi/insdc.sra:SRR8937860.

  28. Hamilton JP, Vaillancourt B, Wood JC, Buell CR. Chromosome-scale assembly of the Verbenaceae species Queen’s Wreath (Petrea volubilis L.) Genome Assembly. Petrea volubilis L. genome assembly. 2023. https://identifiers.org/assembly:GCA_026212405.1.

  29. Pham GM, Hamilton JP, Wood JC, Burke JT, Zhao H, Vaillancourt B, et al. Construction of a chromosome-scale long-read reference genome assembly for potato. Gigascience. 2020;9:giaa100.

  30. Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci U S A. 2020;117:9451–7.

  31. Campbell MS, Law M, Holt C, Stein JC, Moghe GD, Hufnagel DE, et al. MAKER-P: a tool kit for the rapid creation, management, and quality control of plant genome annotations. Plant Physiol. 2014;164:513–24.

  32. Bao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA. 2015;6:11.

  33. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011;17:10–2.

  34. Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019;37:907–15.

  35. Hoff KJ, Lomsadze A, Borodovsky M, Stanke M. Whole-Genome Annotation with BRAKER. In: Kollmar M, editor. Gene Prediction: Methods and Protocols. New York, NY: Springer; 2019. p. 65–95.

  36. Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr, Hannick LI, et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31:5654–66.

  37. Campbell MA, Haas BJ, Hamilton JP, Mount SM, Buell CR. Comprehensive analysis of alternative splicing in rice and comparative analyses with Arabidopsis. BMC Genomics. 2006;7:327.

  38. Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, Pertea M. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 2019;20:278.

  39. Shumate A, Salzberg SL. Liftoff: accurate mapping of gene annotations. Bioinformatics. 2020;37:1639–43.

Acknowledgements

We acknowledge the efforts of Dr. Dongyan Zhao in preliminary genome assembly efforts of the genome. We acknowledge the sequencing performed at the Michigan State University Research Technology Support Facility and the University of Georgia Genomics and Bioinformatics Core. We thank Pamela and Doug Soltis of the University of Florida for providing a Petrea volubilis plant.

Funding

Funding for this work was provided via grants to CRB from the National Science Foundation (IOS-1444499), the Georgia Research Alliance, and the University of Georgia. The funders had no role in the design, execution, interpretation, or written summary of this study. 

Author information

Contributions

B.V. and J.C.W. generated sequence, performed quality assessments, and performed data management. J.P.H. assembled and annotated the genome. J.P.H. and C.R.B. wrote the manuscript. C.R.B. conceived of the study and obtained project funding. All authors approved the manuscript.

Corresponding authors

Correspondence to John P. Hamilton or C. Robin Buell.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Data file 1. Petrea volubilis libraries used in this study. Data file 2. GenomeScope k-mer frequency distribution plot. Data file 3. Hi-C contact map. Data file 4. Assembly metrics for the Petrea volubilis assembly. Data file 5. Pseudomolecule lengths and gap content for the Petrea volubilis assembly. Data file 6. KAT k-mer comparison plot. Data file 7. Benchmarking Universal Single Copy Orthologs (BUSCO) results on the Petrea volubilis assembly and annotation. Data file 8. Repetitive sequence content in the Petrea volubilis assembly. Data file 9. Petrea volubilis gene annotation summary.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Hamilton, J.P., Vaillancourt, B., Wood, J.C. et al. Chromosome-scale assembly of the Verbenaceae species Queen’s Wreath (Petrea volubilis L.). BMC Genom Data 24, 14 (2023). https://doi.org/10.1186/s12863-023-01110-z

  • DOI: https://doi.org/10.1186/s12863-023-01110-z
