Genome assembly of Erythrophleum fordii, a special “ironwood” tree in China

Abstract

Objectives

Erythrophleum is a genus in the Fabaceae family. The genus contains only about 10 species and is best known worldwide for its hardwood and medicinal properties. Erythrophleum fordii Oliv. is the only species of this genus distributed in China. It has superior wood and is used in folk medicine, which has led to its overexploitation in the wild. To support its effective conservation and to help elucidate the distinctive genetic traits underlying wood formation and medicinal components, we present its first genome assembly.

Data description

This work generated ~160.8 Gb of raw Nanopore whole genome sequencing (WGS) long reads, ~126.0 Gb of raw MGI WGS short reads and ~29.0 Gb of raw RNA-seq reads from E. fordii leaf tissues. The de novo assembly of the E. fordii genome comprised 864,825,911 bp in 59 contigs, with a contig N50 of 30,830,834 bp. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis indicated 98.7% completeness of the assembly. The assembly contained 471,006,885 bp (54.4%) of repetitive sequences and 28,761 genes coding for 33,803 proteins. The protein sequences were functionally annotated against multiple databases, facilitating comparative genomic analysis.

Objective

Erythrophleum is a genus in the Fabaceae family and contains only about 10 accepted species in total [1]. However, these species are widely distributed throughout the world, with six found in Africa, three in Asia and one in Australia, displaying a clearly disjunct distribution pattern. All Erythrophleum species grow as medium-sized or large trees, reaching tens of metres in height [2,3,4]. Erythrophleum species have high-quality wood that is hard, dense, heavy and tough, and they contain a variety of secondary metabolites (e.g. alkaloids, terpenoids and flavonoids) in different parts (leaf, bark, stem or seed), which are valuable for the treatment of many illnesses [1, 4,5,6,7]. Consequently, Erythrophleum species are threatened across their distribution areas by exploitation for their hardwood and/or biomedical properties [2,3,4, 6, 7]. In addition to timber and medicinal uses, Erythrophleum species can be used as ornamental and agroforestry trees [8, 9].

Erythrophleum fordii Oliv. is the only species of this genus distributed in China [10]. Besides China, E. fordii is also found in Vietnam. In both countries, it is best known for its superior wood, which has a highly condensed lignin structure that makes it hard, heavy and durable [11]. Erythrophleum fordii is also a medicinal plant containing various bioactive components [1, 12,13,14] and a high alkaloid content [1]. Some triterpenoids in E. fordii are species specific [1]. Owing to its high economic value, it has historically been overexploited in both China and Vietnam, leaving it endangered in the wild [3, 10, 11]. For endangered species, contiguous, accurate and annotated genome assemblies greatly enhance conservation efforts [15]. Therefore, we present here the first fully annotated E. fordii genome to support its effective conservation in the future. The genome will also help elucidate distinctive genetic traits related to wood formation and secondary metabolites in E. fordii, aiding the molecular breeding of trees.

Data description

Leaf samples were collected from one E. fordii individual planted in the South China Botanical Garden. After total RNA and genomic DNA were extracted from the samples, three sequencing libraries were constructed for whole genome and transcriptome sequencing. The Nanopore PromethION sequencer was used for long-read whole genome sequencing (WGS), and the MGI DNBSEQ-T7 sequencer was used for short-read WGS and RNA-seq in 150 bp paired-end mode. After sequencing, different programs were used for analysis, with default parameters unless otherwise mentioned.

Sickle v1.33 [16] was used to trim the WGS short reads with the parameters “-q 30 -l 80”. The trimmed reads were used to estimate the E. fordii genome size with KmerGenie v1.7044 [17] using the parameters “-k 141 --diploid”. Porechop v0.2.4 [18] was used to trim adapters from the WGS long reads with the parameter “--check_reads 500000”. The reads were then filtered by ontbc v1.1 [19] with the parameters “-min_score 7 -min_length 10000”. The filtered long reads were assembled using NextDenovo v2.3.1 [20]. Pseudohaploid [21] and Purge_Dups v1.2.6 [22] were used to remove duplicated sequences from the assembly. The assembly was further polished by racon v1.5.0 [23], Hapo-G v1.3.2 [24] and Polypolish v0.5.0 [25], with racon and Hapo-G each run for two rounds. The completeness of the assembly was assessed by BUSCO v5.4.6 [26] using the Eudicots odb10-2020-09-10 database.
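To make the long-read filtering criteria concrete, the following Python sketch applies a minimum mean quality of 7 and a minimum length of 10,000 bp to reads held in memory. It is an illustration only and not the ontbc implementation: the function names are hypothetical, and it assumes Phred+33 quality encoding and that the mean quality is computed by averaging per-base error probabilities, which may differ from how ontbc scores reads.

```python
import math
from typing import Iterable, Iterator, Tuple

# A read is (name, sequence, quality string); this layout is an assumption
# made for the sketch, not a format defined by any of the cited tools.
Read = Tuple[str, str, str]


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean read quality, averaged over per-base error probabilities."""
    if not quality:
        return 0.0
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(
    reads: Iterable[Read],
    min_score: float = 7.0,
    min_length: int = 10000,
) -> Iterator[Read]:
    """Yield only reads that meet both the length and the quality threshold."""
    for name, seq, qual in reads:
        if len(seq) >= min_length and mean_phred(qual) >= min_score:
            yield name, seq, qual
```

Averaging error probabilities rather than raw Phred scores prevents a few very high-quality bases from hiding a generally noisy read, which is why the sketch takes that route.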

The assembly was analysed with RED v2.0 [27] and EDTA v2.1.0 [28] to identify repeat sequences, and the repeat regions were subsequently soft-masked. Genes were first predicted by BRAKER v2.0 [29] using both transcriptome data and reference protein sequences (Data file 1) [30]. The BRAKER results were then integrated into the Funannotate pipeline v1.8.16 [31] to obtain a non-redundant gene set. Funannotate gene prediction comprised three steps: “train”, “predict” and “update”. In each step, the parameter “--max_intronlen 1000000” was used. In the “predict” step, the additional parameters “--busco_seed_species arabidopsis --organism other --busco_db embryophyta” were used. The predicted genes were functionally annotated against multiple databases using the “funannotate annotate” command in the Funannotate pipeline.
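As an illustration of how repeat calls from two predictors can be combined and applied as a soft mask, the Python sketch below merges overlapping intervals and converts the masked bases to lowercase. The text does not state how the RED and EDTA results were combined, so treating the combination as a simple union of intervals is an assumption, and the function names and the 0-based half-open coordinate convention are hypothetical choices for this sketch.

```python
from typing import Iterable, List, Tuple

# Intervals are 0-based and half-open: [start, end). This convention is an
# assumption of the sketch, not taken from RED or EDTA output formats.
Interval = Tuple[int, int]


def merge_intervals(*interval_sets: Iterable[Interval]) -> List[Interval]:
    """Return the union of all intervals as sorted, non-overlapping spans."""
    merged: List[Interval] = []
    for start, end in sorted(iv for ivs in interval_sets for iv in ivs):
        if merged and start <= merged[-1][1]:
            # Overlapping or directly adjacent: extend the previous span.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def soft_mask(sequence: str, repeats: Iterable[Interval]) -> str:
    """Lowercase the bases inside repeat intervals; leave the rest unchanged."""
    chars = list(sequence)
    for start, end in repeats:
        chars[start:end] = [c.lower() for c in chars[start:end]]
    return "".join(chars)


def masked_length(repeats: Iterable[Interval]) -> int:
    """Total number of bases covered by non-overlapping intervals."""
    return sum(end - start for start, end in repeats)
```

Because overlapping calls are merged before counting, the combined repeat length is smaller than the sum of the two individual totals whenever the predictors agree on a region.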

The three sequencing libraries produced ~126.0 Gb of raw data for WGS short-read sequencing (Data file 2) [32], ~160.8 Gb for WGS long-read sequencing (Data files 3–7) [33,34,35,36,37] and ~29.0 Gb for RNA-seq (Data file 8) [38]. The genome size estimated by KmerGenie was 853,550,132 bp. The genome assembly measured 864,825,911 bp in 59 contigs (N50 = 30,830,834 bp) (Data file 9) [39], with a BUSCO completeness of 98.7% (Data file 10) [40]. Repeat prediction by RED and EDTA identified 376,075,788 bp (43.5%) (Data file 11) [41] and 417,133,422 bp (48.2%) (Data file 12) [42] of repetitive sequences, respectively. Combined, the repeats covered 471,006,885 bp, accounting for 54.4% of the genome (Data file 13) [43]. A total of 28,761 genes coding for 33,803 proteins were predicted (Data files 14–16) [44,45,46], and their annotations are provided in Data files 17 and 18 [47, 48].
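For readers unfamiliar with the contig N50 statistic reported above, the following Python sketch shows the standard definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The function name and the toy contig lengths in the usage note are illustrative and are not taken from the E. fordii assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Contig N50: walk contigs from longest to shortest and return the
    length at which the running total reaches half of the assembly size."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            return length
    return ordered[-1]  # unreachable for positive lengths; kept for safety
```

For example, for toy contigs of 10, 8, 6, 4 and 2 bp (30 bp in total), the two longest contigs already cover 18 bp, so the N50 is 8 bp.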

Limitations

The contiguity of the assembled genome could be further improved using ultra-long Nanopore sequencing and Hi-C data.

Table 1 Overview of all data files/data sets

Data Availability

Raw sequencing reads have been deposited in the NCBI Sequence Read Archive under accession numbers SRR26105794 for short-read WGS data [32], SRR26143820, SRR26143821, SRR26143822, SRR26152992 and SRR26152993 for long-read WGS data [33,34,35,36,37], and SRR26075053 for RNA-seq data [38]. The assembled genome is available from NCBI under accession number JAVQMF000000000 [39]. See Table 1 for details and references [30, 41,42,43,44,45,46,47,48] for the annotation results deposited in figshare.

References

  1. Son NT. Genus Erythrophleum: Botanical description, traditional use, phytochemistry and pharmacology. Phytochem Rev. 2019;18:571–99. https://doi.org/10.1007/s11101-019-09640-0.

  2. Cook GD, Taylor RJ, Williams RJ, Banks JCG. Sustainable harvest rates of ironwood, Erythrophleum chlorostachys, in the Northern Territory, Australia. Aust J Bot. 2005;53(8):821–6. https://doi.org/10.1071/BT05003.

  3. Wang ZF, Liu HL, Dai SP, Cao HL, Wang RJ, Wang ZM. Endangered but genetically stable—Erythrophleum fordii within Feng Shui woodlands in suburbanized villages. Ecol Evol. 2019;9:10950–63. https://doi.org/10.1002/ece3.5513.

  4. Rufai SO, Olaniyi MB, Lawal IO, Iroko OA, Olaniyi AA. Growth response of Erythrophleum suaveolens (Gill and Perr.) Brenan as influenced by different organic manures. J for Res Manag. 2021;18(2):60–70.

  5. Okhale SE, Ugbabe GE, Abubakar I, Mohammed SB, Egharevba HO, Adamu A, Ibrahim JA, Kunle OF. Chemical composition and antimicrobial activity of the leaf essential oil of Erythrophleum suaveolens Guill. and Perr. (Brenan) (Family: Fabaceae/Caesalpinioideae). Int. J. Modern Pharm. Res. 2018;2(4):8–12. https://doi.org/10.1080/14786419.2012.696252.

  6. Muvatsi P, Kahindo J-M, Snook L-K. Can the production of wild forest foods be sustained in timber concessions? Logging and the availability of edible caterpillars hosted by sapelli (Entandrophragma cylindricum) and tali (Erythrophleum suaveolens) trees in the Democratic Republic of Congo. For Ecol Manag. 2018;410:56–65. https://doi.org/10.1016/j.foreco.2017.12.028.

  7. Miapia LM, Ariza-Mateos D, Lacerda-Quartín V, Palacios-Rodríguez G. Deforestation and Biomass production in Miombo forest in Huambo (Angola): a balance between local and global needs. Forests. 2021;12:11. https://doi.org/10.3390/f12111557.

  8. Zhao Z, Guo J, Sha E, Lin K, Zeng J, Xu J. Geographic distribution and phenotypic variation of fruit and seed of Erythrophleum fordii in China. Chin Bull Bot. 2009;44(3):338–44. https://doi.org/10.3969/j.issn.1674-3466.2009.03.011.

  9. Gorel AP, Fayolle A, Doucet JL. Ecology and management of the multipurpose Erythrophleum species (Fabaceae-Caesalpinioideae) in Africa. A review. Biotechnol Agron Soc Environ. 2015;19(4):415–29.

  10. Huang S, Wu W, Chen Z, Zhu Q, Ng WL, Zhou Q. Characterization of the chloroplast genome of Erythrophleum fordii (Fabaceae). Conserv Genet Resour. 2019;11:165–7. https://doi.org/10.1007/s12686-018-0990-7.

  11. Nguyen TD, Nishimura H, Imai T, Watanabe T, Kohdzuma Y, Sugiyama J. Natural durability of the culturally and historically important timber: Erythrophleum fordii wood against white-rot fungi. J Wood Sci. 2018;64:301–10. https://doi.org/10.1007/s10086-018-1704-1.

  12. Li L, Chen L, Li Y, Sun S, Ma S, Li Y, Qu J. Cassane and nor-cassane diterpenoids from the roots of Erythrophleum fordii. Phytochemistry. 2020;174:112343. https://doi.org/10.1016/j.phytochem.2020.112343.

  13. Vo PHT, Nguyen TDT, Tran HT, Nguyen YN, Doan MT, Nguyen PH, Lien GTK, To DC, Tran MH. Cytotoxic components from the leaves of Erythrophleum fordii induce human acute Leukemia cell apoptosis through caspase 3 activation and PARP cleavage. Bioorg Med Chem Lett. 2021;31:127673. https://doi.org/10.1016/j.bmcl.2020.127673.

  14. Chen Z, Mou Y, Zhong H, Xu J, Zhang X, Li G, He J, Zhang W, Huang W, Tian H. Cassaine diterpenoids from the seeds of Erythrophleum fordii Oliv. and their antiangiogenic activity. Phytochemistry. 2022;203:113399. https://doi.org/10.1016/j.fitote.2018.02.028.

  15. European Reference Genome Atlas (ERGA) Consortium. The era of reference genomes in conservation genomics. Trends Ecol Evol. 2022;37(3):197–202. https://doi.org/10.1016/j.tree.2021.11.008.

  16. Joshi NA, Fass JN. Sickle: a sliding-window, adaptive, quality-based trimming tool for FastQ files (Version 1.33) [Software]. 2011. Available at: https://github.com/najoshi/sickle. Accessed 24 Aug 2022.

  17. Chikhi R, Medvedev P. Informed and automated k-mer size selection for genome assembly. Bioinformatics. 2014;30:31–7. https://doi.org/10.1093/bioinformatics/btt310.

  18. Porechop v0.2.4. Available at: https://github.com/rrwick/Porechop. Accessed 4 November 2022.

  19. Ontbc v1.1: pipeline for Oxford Nanopore barcoding. Available at: https://github.com/FlyPythons/ontbc. Accessed 26 Aug 2022.

  20. NextDenovo v2.3.1: fast and accurate de novo assembler for long reads. Available at: https://github.com/Nextomics/NextDenovo. Accessed 24 January 2023.

  21. Pseudohaploid: create a pseudohaploid assembly from a partially resolved diploid assembly. Available at: https://github.com/schatzlab/pseudohaploid. Accessed 26 January 2023.

  22. Guan DF, McCarthy SA, Wood J, Howe K, Wang YD. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 2020;36:2896–8. https://doi.org/10.1093/bioinformatics/btaa025.

  23. Vaser R, Sović I, Nagarajan N, Šikić M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 2017;27(5):737–46. https://doi.org/10.1101/gr.214270.116.

  24. Aury JM, Istace B. Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads. NAR Genom Bioinform. 2021;3(2):lqab034. https://doi.org/10.1093/nargab/lqab034.

  25. Wick RR, Holt KE. Polypolish: short-read polishing of long-read bacterial genome assemblies. PLoS Comput Biol. 2022;18(1):e1009802. https://doi.org/10.1371/journal.pcbi.1009802.

  26. Seppey M, Manni M, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness. Methods Mol Biol. 2019;1962:227–45. https://doi.org/10.1007/978-1-4939-9173-0_14.

  27. Girgis HZ. Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale. BMC Bioinform. 2015;16:227. https://doi.org/10.1186/s12859-015-0654-5.

  28. Ou S, Su W, Liao Y, Chougule K, Agda JRA, Hellinga AJ, Lugo CSB, Elliott TA, Ware D, Peterson T, Jiang N, Hirsch CN, Hufford MB. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 2019;20:275. https://doi.org/10.1186/s13059-019-1905-y.

  29. Bruna T, Hoff KJ, Lomsadze A, Stanke M, Borodovsky M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom Bioinform. 2021;3(1):lqaa108. https://doi.org/10.1093/nargab/lqaa108.

  30. Wen C-Y, Lian J-Y, Peng W-X, Wang Z-F, Yang Z-G, Cao H-L. Genome assembly of Erythrophleum fordii, a special “ironwood” tree in China. Figshare. 2023. https://doi.org/10.6084/m9.figshare.24303265.v1.

  31. Palmer J. Funannotate: eukaryotic genome annotation pipeline. Available at: https://github.com/nextgenusfs/funannotate. Accessed 20 Sep 2022.

  32. Wen C-Y, Lian J-Y, Peng W-X, Wang Z-F, Yang Z-G, Cao H-L. Genome assembly of Erythrophleum fordii, a special “ironwood” tree in China. NCBI Sequence Read Archive. 2023. https://identifiers.org/ncbi/insdc.sra:SRR26105794.

  33. Wen C-Y, Lian J-Y, Peng W-X, Wang Z-F, Yang Z-G, Cao H-L. Genome assembly of Erythrophleum fordii, a special “ironwood” tree in China. NCBI Sequence Read Archive. 2023. https://identifiers.org/ncbi/insdc.sra:SRR26143820.

  34. Wen C-Y, Lian J-Y, Peng W-X, Wang Z-F, Yang Z-G, Cao H-L. Genome assembly of Erythrophleum fordii, a special “ironwood” tree in China. NCBI Sequence Read Archive. 2023. https://identifiers.org/ncbi/insdc.sra:SRR26143821.

  35. Wen C-Y, Lian J-Y, Peng W-X, Wang Z-F, Yang Z-G, Cao H-L. Genome assembly of Erythrophleum fordii, a special “ironwood” tree in China. NCBI Sequence Read Archive. 2023. https://identifiers.org/ncbi/insdc.sra:SRR26143822.

  36. Wen C-Y, Lian J-Y, Peng W-X, Wang Z-F, Yang Z-G, Cao H-L. Genome assembly of Erythrophleum fordii, a special “ironwood” tree in China. NCBI Sequence Read Archive. 2023. https://identifiers.org/ncbi/insdc.sra:SRR26152992.

  37. Wen C-Y, Lian J-Y, Peng W-X, Wang Z-F, Yang Z-G, Cao H-L. Genome assembly of Erythrophleum fordii, a special “ironwood” tree in China. NCBI Sequence Read Archive. 2023. https://identifiers.org/ncbi/insdc.sra:SRR26152993.

  38. Wen C-Y, Lian J-Y, Peng W-X, Wang Z-F, Yang Z-G, Cao H-L. Genome assembly of Erythrophleum fordii, a special “ironwood” tree in China. NCBI Sequence Read Archive. 2023. https://identifiers.org/ncbi/insdc.sra:SRR26075053.

  39. Wen C-Y, Lian J-Y, Peng W-X, Wang Z-F, Yang Z-G, Cao H-L. Genome assembly of Erythrophleum fordii, a special “ironwood” tree in China. NCBI Nucleotide. 2023. https://identifiers.org/nucleotide:JAVQMF000000000.1.

  40. Wen C-Y, Lian J-Y, Peng W-X, Wang Z-F, Yang Z-G, Cao H-L. Genome assembly of Erythrophleum fordii, a special “ironwood” tree in China. Figshare. 2023. https://doi.org/10.6084/m9.figshare.24303397.v1.

  41. Wen C-Y, Lian J-Y, Peng W-X, Wang Z-F, Yang Z-G, Cao H-L. Genome assembly of Erythrophleum fordii, a special “ironwood” tree in China. Figshare. 2023. https://doi.org/10.6084/m9.figshare.24304657.v1.

  42. Wen C-Y, Lian J-Y, Peng W-X, Wang Z-F, Yang Z-G, Cao H-L. Genome assembly of Erythrophleum fordii, a special “ironwood” tree in China. Figshare. 2023. https://doi.org/10.6084/m9.figshare.24303487.v1.

  43. Wen C-Y, Lian J-Y, Peng W-X, Wang Z-F, Yang Z-G, Cao H-L. Genome assembly of Erythrophleum fordii, a special “ironwood” tree in China. Figshare. 2023. https://doi.org/10.6084/m9.figshare.24305008.v1.

  44. Wen C-Y, Lian J-Y, Peng W-X, Wang Z-F, Yang Z-G, Cao H-L. Genome assembly of Erythrophleum fordii, a special “ironwood” tree in China. Figshare. 2023. https://doi.org/10.6084/m9.figshare.24305032.v1.

  45. Wen C-Y, Lian J-Y, Peng W-X, Wang Z-F, Yang Z-G, Cao H-L. Genome assembly of Erythrophleum fordii, a special “ironwood” tree in China. Figshare. 2023. https://doi.org/10.6084/m9.figshare.24305245.v1.

  46. Wen C-Y, Lian J-Y, Peng W-X, Wang Z-F, Yang Z-G, Cao H-L. Genome assembly of Erythrophleum fordii, a special “ironwood” tree in China. Figshare. 2023. https://doi.org/10.6084/m9.figshare.24305251.v1.

  47. Wen C-Y, Lian J-Y, Peng W-X, Wang Z-F, Yang Z-G, Cao H-L. Genome assembly of Erythrophleum fordii, a special “ironwood” tree in China. Figshare. 2023. https://doi.org/10.6084/m9.figshare.24305284.v1.

  48. Wen C-Y, Lian J-Y, Peng W-X, Wang Z-F, Yang Z-G, Cao H-L. Genome assembly of Erythrophleum fordii, a special “ironwood” tree in China. Figshare. 2023. https://doi.org/10.6084/m9.figshare.24305290.v1.

Acknowledgements

We thank the reviewers for their time, expertise, and helpful suggestions to improve our manuscript.

Funding

This study was supported by the Key-Area Research and Development Program of Guangdong Province (2022B1111230001) and its sub-project (2022B1111230001-2-5); the Project of the Department of Natural Resources of Guangdong Province: Monitoring and Evaluation of Nature Reserves in Guangdong Province; and the Guangdong Provincial Forestry Bureau Project: Planning of the Provincial Plant Ex Situ Protection System and National Key Protected Plant Ex Situ Protection and Propagation.

Author information

Contributions

C-Y W collected the samples, analyzed the data, and wrote the manuscript. J-Y L generated the sequencing data and wrote the manuscript. W-X P collected the samples. Z-F W collected the samples, analyzed the data and wrote the manuscript. Z-G Y and H-L C conceived and designed the project. All of the authors have read and approved the final version of this manuscript.

Corresponding authors

Correspondence to Zheng-Feng Wang or Hong-Lin Cao.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Wen, CY., Lian, JY., Peng, WX. et al. Genome assembly of Erythrophleum fordii, a special “ironwood” tree in China. BMC Genom Data 24, 73 (2023). https://doi.org/10.1186/s12863-023-01176-9

  • DOI: https://doi.org/10.1186/s12863-023-01176-9
