Sample collection, DNA extraction and molecular drug resistance assays
Randomly selected sputum samples (n = 10) were collected in July 2019 from patients referred for diagnosis to the Centre of Tuberculosis Research (CTBR), a TB reference laboratory in Lagos State. Patients with severe TB symptoms who tested positive on Xpert® MTB/RIF (Cepheid, USA), which detects MTBC and resistance to RIF, were recruited for this study. DNA was extracted from these samples using the GenoLyse kit (Hain Lifescience GmbH, Germany). The extracted genomic DNA was then tested by line probe assay (LPA) using GenoType MTBDRplus version 2.0 (Hain Lifescience GmbH, Germany) to detect resistance to RIF and INH. Second-line drug resistance was tested with the GenoType MTBDRsl version 2.0 (Hain Lifescience GmbH, Germany) strip-based assay. Samples of interest were cultured on Löwenstein-Jensen (LJ) medium and in Middlebrook 7H9 broth using the BACTEC™ MGIT system (BD, Erembodegem, Belgium), and phenotypic susceptibility to first- and second-line drugs was tested using the proportion method.
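As a rough illustration of how the proportion method classifies an isolate, the Python sketch below reports resistance when colony growth on drug-containing medium reaches a critical proportion of growth on the drug-free control. The 1% critical proportion is the conventional default and is assumed here rather than taken from this study; the function name and colony counts are hypothetical, and both counts are assumed to come from the same inoculum dilution.

```python
def is_resistant(drug_colonies: int, control_colonies: int,
                 critical_proportion: float = 0.01) -> bool:
    """Proportion-method call: resistant if drug/control growth >= critical proportion.

    Assumes both counts come from the same inoculum dilution; the 1% default is
    an assumption, not a value reported in this study.
    """
    if control_colonies <= 0:
        # Without growth on the drug-free control the test cannot be interpreted.
        raise ValueError("drug-free control shows no growth")
    if drug_colonies < 0:
        raise ValueError("colony counts cannot be negative")
    return drug_colonies / control_colonies >= critical_proportion


# Hypothetical counts: 5/200 = 2.5% (resistant), 1/200 = 0.5% (susceptible).
print(is_resistant(5, 200))  # True
print(is_resistant(1, 200))  # False
```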
These molecular processes were carried out at CTBR, Microbiology Department, Nigerian Institute of Medical Research (NIMR), Lagos.
Library preparation and whole genome sequencing
DNA extracts were quantified on a Qubit fluorometer (Thermo Fisher Scientific) using the dsDNA High Sensitivity assay and diluted to 0.2 ng/µl. Sequencing libraries were prepared using the Nextera XT DNA Library Preparation Kit (Illumina, USA), following a library preparation protocol adopted from the US CDC PulseNet SOP [24]. Libraries were quantified with the Qubit fluorometer, and library fragment size was estimated using the Agilent 2100 Bioanalyzer (Agilent Technologies, USA). Individual libraries were sequenced on the Illumina iSeq 100 (Illumina, USA) using 151 cycles to generate paired-end reads, according to the manufacturer's instructions. TB genome sequencing was performed at the African Centre of Excellence for Genomics of Infectious Diseases (ACEGID), Redeemer's University, Nigeria.
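The two calculations implied by this step, diluting each extract to 0.2 ng/µl and converting a Qubit concentration plus a Bioanalyzer fragment size into molarity, can be sketched as follows. The 10 µl final volume, the example concentrations and the function names are hypothetical; the 660 g/mol per base pair figure is the standard approximation for double-stranded DNA.

```python
MASS_PER_BP = 660.0  # approximate g/mol per base pair of double-stranded DNA


def dilution_volumes(stock_ng_per_ul: float, target_ng_per_ul: float = 0.2,
                     final_volume_ul: float = 10.0) -> tuple[float, float]:
    """Apply C1*V1 = C2*V2 and return (ul of DNA stock, ul of diluent)."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a mass concentration (ng/ul) and mean fragment length (bp) to nM."""
    return conc_ng_per_ul * 1e6 / (MASS_PER_BP * mean_fragment_bp)


# Hypothetical extract at 4 ng/ul -> 0.5 ul stock + 9.5 ul diluent.
print(dilution_volumes(4.0))
# Hypothetical library at 2 ng/ul with a 500 bp mean size -> about 6.06 nM.
print(round(library_molarity_nm(2.0, 500), 2))
```

The molarity conversion matters because equal masses of libraries with different fragment sizes contain different numbers of molecules.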
Bioinformatics analysis
The paired-end reads were demultiplexed using the GenerateFASTQ module (version 2.0) of the Local Run Manager in the iSeq 100 System Suite software. In addition, 230 publicly available MTBC genomes were retrieved from the NCBI Sequence Read Archive (SRA) using sra-toolkit v2.10.9 (https://github.com/ncbi/sra-tools), selecting runs with paired-end reads generated on the Illumina platform, from the following BioProjects: PRJEB15857, PRJEB25506, PRJNA300846, PRJEB28842, PRJNA633244, PRJNA480117, PRJEB27244, PRJEB36076, PRJNA534674 and PRJNA655747.
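Selecting paired-end Illumina runs from the listed BioProjects can be sketched as a filter over SRA run metadata followed by one fasterq-dump call per accession. This assumes the metadata has been exported as an SRA RunInfo CSV with `Run`, `BioProject`, `LibraryLayout` and `Platform` columns; the helper names and the accessions in the example are hypothetical, and the commands are only built here, not executed.

```python
import csv
import io

BIOPROJECTS = {
    "PRJEB15857", "PRJEB25506", "PRJNA300846", "PRJEB28842", "PRJNA633244",
    "PRJNA480117", "PRJEB27244", "PRJEB36076", "PRJNA534674", "PRJNA655747",
}


def select_runs(runinfo_csv: str, bioprojects: set[str]) -> list[str]:
    """Return run accessions that are paired-end, Illumina, and in an allowed BioProject."""
    selected = []
    for row in csv.DictReader(io.StringIO(runinfo_csv)):
        if (row.get("BioProject") in bioprojects
                and row.get("LibraryLayout", "").upper() == "PAIRED"
                and row.get("Platform", "").upper() == "ILLUMINA"):
            selected.append(row["Run"])
    return selected


def fasterq_dump_commands(accessions: list[str], outdir: str = "sra_reads") -> list[list[str]]:
    """Build sra-toolkit fasterq-dump commands that write _1/_2 FASTQ files per run."""
    return [["fasterq-dump", "--split-files", "--outdir", outdir, acc]
            for acc in accessions]


# Hypothetical metadata: only the first run meets all three criteria.
example = (
    "Run,BioProject,LibraryLayout,Platform\n"
    "ERR0000001,PRJEB15857,PAIRED,ILLUMINA\n"
    "ERR0000002,PRJEB15857,SINGLE,ILLUMINA\n"
    "SRR0000003,PRJNA000000,PAIRED,ILLUMINA\n"
)
runs = select_runs(example, BIOPROJECTS)
print(runs)  # ['ERR0000001']
print(fasterq_dump_commands(runs)[0])
```

Each command list could then be passed to `subprocess.run(cmd, check=True)` on a machine with sra-toolkit installed.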
All reads were processed with the Bacterial Genome Pipeline (BAGEP) [25], which performs quality control of raw reads, taxonomic classification and variant detection by mapping reads to the reconstructed ancestral MTBC sequence [26]. Variants were called with a minimum base quality of 20 and a minimum site depth of 10 for calling alleles (the default value), and duplicate reads were excluded. The average coverage of each genome was calculated using SAMtools v1.12 [27], and genomes with an average coverage ≤ 20× were excluded from further downstream analysis.
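The coverage filter can be sketched from the tab-separated output of `samtools depth -a` (contig, position, depth), where `-a` makes samtools report every reference position, including those with zero depth, so that the mean is taken over the whole reference. The function names and toy depth values are hypothetical; only the ≤ 20× exclusion rule comes from the text.

```python
def mean_depth(depth_lines) -> float:
    """Mean depth from `samtools depth -a` lines of the form contig<TAB>pos<TAB>depth."""
    total = 0
    positions = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        depth = int(line.split("\t")[2])
        total += depth
        positions += 1
    if positions == 0:
        raise ValueError("depth output contains no positions")
    return total / positions


def keep_genomes(mean_coverages: dict[str, float], min_exclusive: float = 20.0) -> list[str]:
    """Keep genomes whose average coverage is strictly above 20x (<= 20x is excluded)."""
    return sorted(name for name, cov in mean_coverages.items() if cov > min_exclusive)


# Toy example: a 4-position "reference" with depths 10, 30, 20, 20 -> mean 20.0, excluded.
toy = ["ref\t1\t10", "ref\t2\t30", "ref\t3\t20", "ref\t4\t20"]
print(mean_depth(toy))  # 20.0
print(keep_genomes({"sampleA": 20.0, "sampleB": 57.3}))  # ['sampleB']
```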
Drug resistance prediction and lineage classification
TBProfiler v3.0.4 [23] was used to analyze all 222 genomes in silico, inferring phylogenetic lineages and drug resistance profiles from small variants and large deletions associated with drug resistance. Variants were called with freebayes [28] and annotated against a recently updated mutation database that includes newer and second-line anti-TB drugs such as cycloserine, delamanid, bedaquiline, clofazimine, para-aminosalicylic acid, linezolid and ethionamide (tbdb commit b2af444, 21 December 2020; https://github.com/jodyphelan/tbdb). To validate these results, the newly sequenced Nigerian samples were also analyzed with Mykrobe predictor [29] and Resistance Sniffer [30] for comparison.
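Comparing TBProfiler calls against a second tool such as Mykrobe amounts to checking, drug by drug, whether both tools reach the same resistant/susceptible call for each sample. The sketch below assumes each tool's output has already been parsed into a mapping from sample ID to the set of drugs called resistant; that parsing step, the sample IDs and the function name are hypothetical and do not reflect either tool's actual output format.

```python
def per_drug_agreement(calls_a: dict[str, set[str]],
                       calls_b: dict[str, set[str]],
                       drugs: list[str]) -> dict[str, float]:
    """Fraction of shared samples on which two tools agree, for each drug.

    A sample agrees for a drug when both tools call it resistant or both do not.
    """
    shared = sorted(set(calls_a) & set(calls_b))
    if not shared:
        raise ValueError("no samples were analysed by both tools")
    agreement = {}
    for drug in drugs:
        matches = sum((drug in calls_a[s]) == (drug in calls_b[s]) for s in shared)
        agreement[drug] = matches / len(shared)
    return agreement


# Hypothetical calls for two samples.
tbprofiler = {"S1": {"rifampicin", "isoniazid"}, "S2": set()}
mykrobe = {"S1": {"rifampicin"}, "S2": set()}
print(per_drug_agreement(tbprofiler, mykrobe, ["rifampicin", "isoniazid"]))
# {'rifampicin': 1.0, 'isoniazid': 0.5}
```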
Phylogenetic reconstruction
The Snippy variant-calling outputs for all 223 samples (including Mycobacterium canettii NC_015848.1 as the outgroup) were combined with Snippy-core (https://github.com/tseemann/snippy) to generate a core genome alignment, excluding PPE and PE-PGRS genes, insertion sequences, phages and repeat regions at least 50 bp long. The core genome alignment of polymorphic sites was used to infer a maximum-likelihood phylogenetic tree with IQ-TREE v1.6.12 [31] under the general time reversible (GTR) model with 10,000 ultrafast bootstrap replicates. The phylogenetic tree was rooted on M. canettii and annotated with Interactive Tree Of Life (iTOL) v5 [32].
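Conceptually, the masking and core-SNP steps can be sketched as replacing excluded reference intervals (PPE and PE-PGRS genes, insertion sequences, phages, repeats) with `N`, then keeping only alignment columns where every sample has an unambiguous base and at least two different bases occur. This is a simplified stand-in written for illustration, not Snippy-core's implementation; the intervals are assumed to be 0-based and half-open as in BED files, and the toy sequences are hypothetical.

```python
def mask_alignment(alignment: dict[str, str],
                   intervals: list[tuple[int, int]]) -> dict[str, str]:
    """Replace positions in 0-based, half-open (BED-style) intervals with 'N'."""
    masked = {}
    for name, seq in alignment.items():
        chars = list(seq)
        for start, end in intervals:
            for i in range(max(0, start), min(end, len(chars))):
                chars[i] = "N"
        masked[name] = "".join(chars)
    return masked


def core_snp_alignment(alignment: dict[str, str]) -> dict[str, str]:
    """Keep columns where all samples have A/C/G/T and at least two bases differ."""
    names = list(alignment)
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    keep = []
    for i in range(lengths.pop()):
        column = [alignment[n][i] for n in names]
        if all(b in "ACGT" for b in column) and len(set(column)) > 1:
            keep.append(i)
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


# Toy alignment: position 0 is masked, so only column 2 remains as a core SNP.
toy = {"ref": "ACGTA", "s1": "ACTTA", "s2": "GCTTA"}
print(core_snp_alignment(mask_alignment(toy, [(0, 1)])))
# {'ref': 'G', 's1': 'T', 's2': 'T'}
```

Restricting the tree input to variable core sites keeps the alignment small, which is why SNP-only alignments are commonly passed to maximum-likelihood tools.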
Ethical approval
The study was performed in accordance with the Declaration of Helsinki, and the protocol was approved by the ethical review boards of Redeemer’s University (Osun State, Nigeria) (RUN-IREC/19/012) and Nigerian Institute of Medical Research (Lagos State, Nigeria) (IRB/19/053). Sputum samples used for this study were obtained under a waiver of consent granted by the Institutional Review Board of the Nigerian Institute of Medical Research (ref: IRB/19/053).

