High-quality genome assembly and annotation of five bacteria isolated from the Abu Dhabi sabkha-shore region

Objectives: Sabkhas represent polyextreme environments characterized by elevated salinity, intense ultraviolet (UV) radiation, and extreme temperature fluctuations. In this study, we present the complete genomes of five bacterial isolates obtained from the sabkha-shore region and investigate their genomic organization and gene annotations. A better understanding of the genomic organization and genetic adaptations of these bacteria holds promise for engineering microbes with tailored functionalities for diverse industrial and agricultural applications, including bioremediation and promotion of plant growth under salinity stress.
Data description: We present the genome sequencing and annotation of five bacteria (kcgeb_sa, kcgeb_sc, kcgeb_sd, kcgeb_S4, and kcgeb_S11) obtained from the shores of the Abu Dhabi sabkha region. Initial bacterial identification was conducted through 16S rDNA amplification and sequencing. Using a hybrid genome assembly technique that combines Illumina short reads (NovaSeq 6000) and Oxford Nanopore long reads (MinION), we obtained complete, gap-free, high-quality annotated genome sequences. The genome sizes of the kcgeb_sa, kcgeb_sc, kcgeb_sd, kcgeb_S4, and kcgeb_S11 isolates were determined to be 2.4 Mb, 4.1 Mb, 2.9 Mb, 5.05 Mb, and 4.1 Mb, respectively. Our analysis assigned the bacterial isolates to Staphylococcus capitis (kcgeb_sa), Bacillus spizizenii (kcgeb_sc and kcgeb_S11), Pelagerythrobacter marensis (kcgeb_sd), and Priestia aryabhattai (kcgeb_S4).


Objective
Sabkhas, also known as salt flats, are polyextreme environments with high temperatures, salinities, and light intensities, and are distributed globally in arid regions of the Middle East, North Africa, the USA, and Australia. Their extreme conditions make survival challenging for plants, animals, and other organisms [1,2]. Despite the harsh environmental conditions, these salt flats host remarkably robust microbial communities that are highly adaptable, metabolically diverse, and resilient to abiotic stress [3,4,5].
Previously, we cataloged the rich microbial diversity and distribution dynamics of the Abu Dhabi sabkha region using a combination of 16S rDNA profiling and whole-genome metagenomic approaches [6]. However, there is a paucity of high-quality complete genome sequences of bacteria isolated from the Abu Dhabi sabkha region. Consequently, in this study, we present complete genome sequences and gene annotations for five salt-tolerant bacterial isolates obtained from the Abu Dhabi sabkha-shore region. The genomic resources and datasets generated in this study will serve as a valuable repository for exploring genes and pathways associated with abiotic stress tolerance, as well as for understanding the mechanisms that bacteria use to survive in extreme environments. In addition, the information gleaned from these bacterial species could be exploited in comparative genomics research and could pave the way for engineering microbes with high plant growth promotion activity under high salt-stress conditions, opening up new avenues for sustainable agriculture to feed a burgeoning population.

Methodology
The five bacterial isolates used for whole-genome sequencing (WGS) were isolated from soil samples collected from the Abu Dhabi sabkha-shore region. Details on the systematic sample collection, bacterial culture strategy, and storage procedure are described in our previously published report [6]. A snapshot of our data analysis workflow is presented in Data file 1 (Table 1).
For WGS, shotgun and long-read libraries were prepared as previously described [7] and sequenced on an Illumina NovaSeq 6000 (PE reads, 150 bp) and a MinION, respectively. The genome sequencing read statistics generated for each isolate are summarized in Data file 2 (Table 1). Trimmomatic v.0.39 [8] was used to trim low-quality bases and adapters from the raw Illumina reads, whereas ONT-MinION reads were error-corrected and trimmed using Canu [9]. Whole genomes were assembled with a hybrid approach using the Unicycler pipeline [10]. The assembled genomes were polished with Illumina and ONT reads using Pilon v.1.23 [11]. Plausible plasmid sequences were extracted from the genome assembly using a homology-based approach. In addition, the species assignment of each assembly was confirmed using the average nucleotide identity (ANI) method [12]. Gene prediction and annotation of the assembled genomes were performed using Prokka and NCBI-PGAP [13,14].
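The order of the main steps above can be sketched as a small command planner. This is only an illustrative sketch, not the authors' actual scripts: the file names, output directories, adapter file, and Trimmomatic trimming thresholds are all assumptions, and the Canu read-correction, Pilon polishing, plasmid extraction, and ANI steps are omitted because their exact settings are not given in the text. The functions only build command lines; they do not run anything.

```python
from typing import List


def trimmomatic_cmd(r1: str, r2: str, prefix: str,
                    jar: str = "trimmomatic-0.39.jar",
                    adapters: str = "adapters.fa") -> List[str]:
    """Paired-end trimming command. The jar path, adapter file, and the
    ILLUMINACLIP/SLIDINGWINDOW/MINLEN thresholds are assumed values."""
    return [
        "java", "-jar", jar, "PE", r1, r2,
        f"{prefix}_1P.fastq.gz", f"{prefix}_1U.fastq.gz",  # forward paired/unpaired
        f"{prefix}_2P.fastq.gz", f"{prefix}_2U.fastq.gz",  # reverse paired/unpaired
        f"ILLUMINACLIP:{adapters}:2:30:10",
        "SLIDINGWINDOW:4:20",
        "MINLEN:36",
    ]


def unicycler_cmd(r1: str, r2: str, long_reads: str, outdir: str) -> List[str]:
    """Hybrid assembly from paired short reads (-1/-2) and long reads (-l)."""
    return ["unicycler", "-1", r1, "-2", r2, "-l", long_reads, "-o", outdir]


def prokka_cmd(assembly: str, outdir: str, prefix: str) -> List[str]:
    """Gene prediction and annotation of an assembled genome."""
    return ["prokka", "--outdir", outdir, "--prefix", prefix, assembly]


def plan(sample: str, r1: str, r2: str, long_reads: str) -> List[List[str]]:
    """Return the trimming, assembly, and annotation commands in order.
    All derived file and directory names are hypothetical."""
    trimmed = f"{sample}_trimmed"
    asm_dir = f"{sample}_unicycler"
    return [
        trimmomatic_cmd(r1, r2, trimmed),
        unicycler_cmd(f"{trimmed}_1P.fastq.gz", f"{trimmed}_2P.fastq.gz",
                      long_reads, asm_dir),
        # Unicycler writes its final assembly to assembly.fasta in the output dir.
        prokka_cmd(f"{asm_dir}/assembly.fasta", f"{sample}_prokka", sample),
    ]


if __name__ == "__main__":
    for cmd in plan("kcgeb_sa", "sa_R1.fastq.gz", "sa_R2.fastq.gz", "sa_ont.fastq.gz"):
        print(" ".join(cmd))
```

Returning argument lists rather than shell strings keeps the sketch easy to inspect and to pass to `subprocess.run` later, once real paths and parameters are known.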
Our hybrid assembly strategy produced a gap-free, high-quality single circular genome for each of the five bacterial isolates. The kcgeb_sa isolate, identified as Staphylococcus capitis, had a genome size of 2,471,401 bp (G + C: ~33.1%), a BUSCO score of 100%, and 2484 genes, including 2340 protein-coding, 63 tRNA, 22 rRNA, and 5 ncRNA genes, as well as two plasmids of 47,919 bp and 3530 bp (Table 1, Data files 3, 4, 5, 6 and 7).
The kcgeb_sc isolate, identified as Bacillus spizizenii, had a genome size of 4,130,445 bp (G + C: ~43.9%), a BUSCO score of 100%, and 4179 gene models, including 3963 protein-coding, 86 tRNA, 30 rRNA, and 5 ncRNA genes (Table 1, Data files 8, 9 and 10).
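Summary statistics such as genome size and G + C percentage can be recomputed directly from an assembled FASTA file. The following is a minimal sketch using only the Python standard library; the function names and the toy sequences are hypothetical and are not taken from the study's data files.

```python
from typing import Dict, Iterable, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA-formatted lines into {record_id: uppercase sequence}."""
    records: Dict[str, str] = {}
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]  # record id is the first token of the header
            chunks = []
        else:
            if name is None:
                raise ValueError("sequence data found before the first FASTA header")
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def gc_percent(seq: str) -> float:
    """G + C as a percentage of unambiguous bases (A, C, G, T)."""
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def summarize(records: Dict[str, str]) -> Dict[str, Tuple[int, float]]:
    """Map each record id to (length in bp, G + C percent rounded to 0.1)."""
    return {name: (len(seq), round(gc_percent(seq), 1))
            for name, seq in records.items()}


if __name__ == "__main__":
    # Toy input only; a real run would pass an open assembly file instead.
    toy = [">chromosome toy example", "ATGC", "GGCC", ">plasmid1", "AATT"]
    for name, (length, gc) in summarize(parse_fasta(toy)).items():
        print(f"{name}\t{length} bp\tG+C {gc}%")
```

Reporting each record separately keeps the chromosome and any plasmids apart, which matches how the sizes are listed above.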

Limitations
We used a hybrid genome assembly method with high-coverage WGS data (both long and short reads) to produce a gap-free, high-quality single circular genome for each of the bacterial isolates. In addition, we used Illumina and ONT-MinION reads to error-correct and polish the assembled genomes, and the Benchmarking Universal Single-Copy Orthologs (BUSCO) v.4.1.4 [35] tool, which was used to assess the final genome assemblies, confirmed their completeness. As a result, the authors are unaware of any limitations in their genome assembly and annotation approaches.
Nevertheless, this data note is limited to the description and annotation of high-quality genomes of five bacteria isolated from the Abu Dhabi sabkha-shore region. More in-depth research is needed to understand the phylogenetics, gene functions, and metabolic pathways of these isolates, as well as the distinct biosynthetic gene clusters that allow them to survive in harsh environments.

Table 1
Overview of the data files/datasets