Fig. 4 | BMC Genomic Data

Fig. 4

From: A detailed workflow to develop QIIME2-formatted reference databases for taxonomic analysis of DNA metabarcoding data

Comparison of database classification accuracy at the species level under different dereplication and amplicon restriction settings. Prediction accuracies are presented as F-measures for the ITS2 (A) and rbcL (B) databases developed using DB4Q2. Accuracy scores were computed by cross-validation (CV) tests in pseudo-realistic (k-fold) and ideal (leaked) situations. No_derep: without sequence dereplication; Derep_uniq: dereplication in ‘uniq’ mode, i.e. identical sequences with different taxonomies are all retained with their respective taxonomic labels; derep_majority: dereplication in ‘majority’ mode, i.e. only one sequence is retained from a set of identical sequences with different taxonomies, together with the most abundant taxonomic label among them; Restriction: amplicon restriction of the database, i.e. extraction from each reference sequence of the portion amplified by a specific primer set. Dereplication in ‘majority’ mode was tested here but is neither advised nor proposed in the DB4Q2 workflow, at least for rbcL, as it can lead to a higher proportion of mislabeled sequences after dereplication.
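The difference between the two dereplication modes described in the caption can be illustrated with a minimal Python sketch. This is not the DB4Q2 implementation: the function name `dereplicate`, the `(sequence, taxonomy)` record layout, and the toy records are all hypothetical, and only exact sequence identity is considered.

```python
from collections import Counter, defaultdict


def dereplicate(records, mode="uniq"):
    """Dereplicate (sequence, taxonomy) pairs (hypothetical helper).

    mode="uniq":     keep one record per distinct (sequence, taxonomy) pair.
    mode="majority": keep one record per sequence, labeled with the most
                     frequent taxonomy among its identical copies.
    """
    # Group taxonomic labels by exact sequence string.
    labels_by_seq = defaultdict(list)
    for seq, taxonomy in records:
        labels_by_seq[seq].append(taxonomy)

    kept = []
    for seq, labels in labels_by_seq.items():
        if mode == "uniq":
            # dict.fromkeys drops duplicate labels while preserving order.
            for taxonomy in dict.fromkeys(labels):
                kept.append((seq, taxonomy))
        elif mode == "majority":
            # Ties are resolved in favor of the label seen first.
            taxonomy, _count = Counter(labels).most_common(1)[0]
            kept.append((seq, taxonomy))
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return kept


# Toy records (invented for illustration): three identical sequences,
# two labeled as one species and one labeled as another.
toy_records = [
    ("ACGTACGT", "Genus_a species_x"),
    ("ACGTACGT", "Genus_a species_x"),
    ("ACGTACGT", "Genus_a species_y"),
    ("TTGACCAA", "Genus_b species_z"),
]

uniq_result = dereplicate(toy_records, mode="uniq")
majority_result = dereplicate(toy_records, mode="majority")
```

In this toy case, ‘uniq’ mode keeps both species labels for the shared sequence, whereas ‘majority’ mode keeps only `Genus_a species_x`. If the minority label were the correct one, ‘majority’ mode would discard it, which is the kind of mislabeling risk the caption mentions.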
