Sequencing performance of Precision ID NGS STR panel v2
Quality control parameters, namely locus balance (LB), heterozygous balance (HB), and stutter ratio, of the 31 autosomal STR markers are shown in Fig. 2. Among all the STR markers, D4S2408 had the average LB value closest to the ideal value of 1.0 (0.992), whereas D16S539 showed the greatest deviation, with an average value of 1.925. Other markers with a marked deviation from the ideal LB value included D18S51 (0.394), D2S1338 (0.411), D3S1358 (1.513), FGA (0.371), Penta D (0.167), Penta E (0.371), TH01 (1.708), and TPOX (1.579). Against an ideal value of 1.0, the HB values of the STR markers ranged from 1.031 (D8S1179) to 1.722 (TH01). Of the 31 STR markers tested, relatively higher heterozygous imbalance was observed at D12ATA63 (1.396), D19S433 (1.307), D1S1656 (1.376), D22S1045 (1.325), and TH01 (1.722). None of the markers exceeded the threshold set for the stutter ratio, i.e., 1.4. Stutter products were most numerous at D1S1656, and no stutter product was observed at D3S4529. The average stutter ratio ranged from 0.104 (D16S539) to 0.127 (D6S474). As the use of NGS technology is still at a nascent stage in forensic DNA applications, the quality issues of some STR markers need to be addressed by kit manufacturers before the kit can be used efficiently in routine forensic casework.


Sequencing performance of the Precision ID NGS STR panel v2. (a) Locus balance for all STRs, measured as the coverage of each locus divided by the average coverage of all loci per sample. (b) Heterozygous balance for all STRs, measured as the coverage ratio of one allele to the other, for heterozygous genotypes only. (c) Stutter ratio, measured as the ratio of the coverage of the stutter peak to that of the allele peak.
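The three metrics defined in this caption can be sketched in a few lines of Python. This is a minimal illustration and not the analysis software used in the study: the function names, the dictionary layout, and the read counts are hypothetical, and heterozygous balance is taken here as the higher coverage divided by the lower coverage, an assumption that is consistent with all reported HB values being at least 1.0.

```python
from statistics import mean

def locus_balance(locus_coverage):
    """Coverage of each locus divided by the mean coverage of all loci in one sample."""
    avg = mean(locus_coverage.values())
    return {locus: cov / avg for locus, cov in locus_coverage.items()}

def heterozygous_balance(cov_a, cov_b):
    """Coverage ratio of the two alleles of a heterozygous genotype.

    Assumption: higher coverage over lower coverage, so the ideal value is 1.0.
    """
    hi, lo = max(cov_a, cov_b), min(cov_a, cov_b)
    return hi / lo

def stutter_ratio(stutter_cov, allele_cov):
    """Coverage of the stutter product divided by the coverage of its parent allele."""
    return stutter_cov / allele_cov

# Hypothetical read counts for one sample (not data from the study).
sample = {"D4S2408": 1000, "D16S539": 1900, "Penta D": 100}
lb = locus_balance(sample)
```

In this toy sample the mean coverage is 1000 reads, so a locus with 1000 reads has an LB of 1.0, while over- and under-represented loci fall above and below 1.0, respectively.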
Concordance study, allele frequency, forensic and paternity parameters
Of the 31 autosomal STR markers analyzed in this study (CSF1PO, D10S1248, D12ATA63, D12S391, D13S317, D14S1434, D16S539, D18S51, D19S433, D1S1656, D1S1677, D21S11, D22S1045, D2S1338, D2S1776, D2S441, D3S1358, D3S4529, D4S2408, D5S2800, D5S818, D6S1043, D6S474, D7S820, D8S1179, FGA, Penta D, Penta E, TH01, TPOX, and vWA), 22 overlapping STRs were compared with the length-based allele data obtained by CE analysis. For all the samples, the length-based allele data were consistent between the CE and NGS analyses. To the best of our knowledge, this is the first report of sequence-based analysis of these 31 STR markers in any Indian population. It is also the first allelic report on nine STR markers, i.e., D12ATA63, D14S1434, D1S1677, D2S1776, D3S4529, D4S2408, D5S2800, D6S1043, and D6S474, in the Indian population. The calculated length-based allele frequencies are given in Supplementary Table S1. Forensic and paternity parameters of the length-based and sequence-based alleles are provided in Table 1. The average number of alleles per marker was 9.26; the highest number of size-based alleles (18) was observed at Penta E, whereas D1S1677, D4S2408, and D6S474 showed the lowest number of alleles, i.e., 6 (Fig. 3). The newly analyzed markers, i.e., D12ATA63, D14S1434, D1S1677, D2S1776, D3S4529, D4S2408, D5S2800, D6S1043, and D6S474, yielded 8, 7, 6, 8, 7, 6, 8, 11, and 6 alleles, respectively. Penta E showed the highest power of discrimination (0.978), polymorphic information content (0.90), and expected heterozygosity (0.905), and the lowest matching probability (0.022), whereas FGA showed the highest power of exclusion (0.778), typical paternity index (4.60), and observed heterozygosity (0.891).
These findings suggest the usefulness of the Penta E and FGA markers in the central Indian population on the basis of length-based allele analysis. D2S441 was the least useful marker in terms of polymorphic information content (0.64), power of exclusion (0.329), typical paternity index (1.35), and observed and expected heterozygosity (0.630 and 0.690, respectively). Similarly, the calculated power of discrimination (0.855) and matching probability (0.145) did not support the usefulness of the D5S818 marker in the studied population. In contrast, when sequence-based forensic and paternity parameters were calculated for the 31 autosomal STR markers, D2S1338 emerged as the most useful marker in the studied population, with the highest power of discrimination (0.984), polymorphic information content (0.920), power of exclusion (0.822), and typical paternity index (5.75), and the lowest matching probability (0.016). This suggests that individual markers should be assessed on the basis of sequence-based alleles to get a clear idea of their usefulness in a specific population.
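For readers who wish to see how such parameters relate to the underlying genotype data, the following Python sketch computes them for a single locus from a list of genotypes. It uses widely used textbook definitions; whether the software used in this study applies exactly these formulas (for example, a sample-size correction for expected heterozygosity) is an assumption, and the function name and input layout are hypothetical.

```python
from collections import Counter
from itertools import combinations

def forensic_parameters(genotypes):
    """genotypes: list of (allele_1, allele_2) tuples for one locus."""
    n = len(genotypes)
    # Treat (a, b) and (b, a) as the same genotype.
    geno_counts = Counter(tuple(sorted(g)) for g in genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    p = [c / (2 * n) for c in allele_counts.values()]  # allele frequencies

    h_obs = sum(1 for a, b in genotypes if a != b) / n  # observed heterozygosity
    hom = 1.0 - h_obs                                   # homozygote frequency
    sum_p2 = sum(x * x for x in p)

    pm = sum((c / n) ** 2 for c in geno_counts.values())  # matching probability
    pic = 1.0 - sum_p2 - sum(2 * (x * y) ** 2 for x, y in combinations(p, 2))
    return {
        "Ho": h_obs,
        "He": 1.0 - sum_p2,  # uncorrected expected heterozygosity
        "PM": pm,
        "PD": 1.0 - pm,      # power of discrimination
        "PIC": pic,
        "PE": h_obs ** 2 * (1.0 - 2.0 * h_obs * hom ** 2),        # power of exclusion
        "TPI": float("inf") if hom == 0 else 1.0 / (2.0 * hom),  # typical paternity index
    }

# Hypothetical two-allele locus with 63 heterozygotes among 100 samples.
result = forensic_parameters([(10, 11)] * 63 + [(10, 10)] * 37)
```

With an observed heterozygosity of 0.630 in this hypothetical data set, the sketch gives a power of exclusion of about 0.33 and a typical paternity index of about 1.35, in line with the D2S441 values reported above, which suggests that these two definitions match those used for Table 1.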


Allele gains by sequencing at 31 autosomal STR markers due to SNPs in flanking regions and isometric heterozygous conditions.
Previous studies also suggested the utility of the Penta E marker, with high forensic and paternity parameters, in the Indian population16,17,18. This marker has already been shown to have high forensic efficiency for personal identification in the Portuguese population19, the Austrian Caucasian population20, the Northern Italian population21, and the Mexican population22. When the newly included STR markers, i.e., D12ATA63, D14S1434, D1S1677, D2S1776, D3S4529, D4S2408, D5S2800, D6S1043, and D6S474, were analyzed, they showed an allelic range and other statistical parameters similar to those in the limited published literature, from Inner Mongolia, China23, and the Tujia population24.
Of the 81 male samples, four were AMELY deletion cases, in which AMELY could not be amplified but positive amplification was obtained at three alternative sex-determining markers, i.e., DYS391, SRY, and Y InDel. This result was consistent with the corresponding CE data. At DYS391, allele 10 was predominant (63 samples), followed by allele 11 (16 samples) and allele 9 (2 samples). Similarly, Y InDel showed allele 2 in 74 samples and allele 1 in only 7 male samples. AMELY deletion is a global problem25, and simultaneous amplification of alternative sex-determining markers26,27 is highly useful for assigning the sex of a sample correctly, as evidenced by the four samples in the current study.
Increment in allele number by sequencing
A large increase in the number of sequence-based alleles relative to length-based alleles was detected in the studied STRs (Fig. 3). Previous work has shown that SNPs in STR flanking regions and sequence variation among alleles of the same length are the main contributors to such an increase in allele numbers28. A substantial gain in allele numbers was detected at D13S317, D16S539, D1S1656, D5S2800, D5S818, D7S820, and vWA; D5S2800 showed a significant increase due to variation in the flanking region, and D3S1358 showed the highest allele gain due to differing repeat sequences. In contrast, the markers that showed no gain in allele numbers, either from SNPs in flanking regions or from repeat sequence variation, were CSF1PO, D18S51, D19S433, D1S1677, D22S1045, D3S4529, D6S1043, FGA, Penta D, Penta E, and TPOX. The markers that showed an increase in allele number only because of SNPs in flanking regions were D10S1248, D13S317, D14S1434, and D7S820. The increased allele number at D12ATA63, D12S391, D21S11, D2S1338, D3S1358, D4S2408, D8S1179, and TH01 was due to variation in the repeat sequences only.
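The two sources of allele gain described here can be illustrated with a small Python sketch that counts the alleles of one locus at three levels of resolution. It is a simplified illustration under stated assumptions: the function name, the representation of each allele as a (repeat region, flanking haplotype) pair, and the toy sequences are hypothetical, and the length of the repeat region stands in for the fragment size measured by CE.

```python
def allele_counts(observations):
    """Count distinct alleles of one locus at three levels of resolution.

    observations: iterable of (repeat_region, flank) string pairs.
    Returns (alleles by length, alleles by repeat-region sequence,
    alleles by full sequence including flanking variants).
    """
    obs = list(observations)
    by_length = {len(rep) for rep, _ in obs}
    by_repeat_seq = {rep for rep, _ in obs}
    by_full_seq = set(obs)
    return len(by_length), len(by_repeat_seq), len(by_full_seq)

# Hypothetical toy locus with a 4-bp repeat unit (not data from the study).
toy = [
    ("TCTA" * 3 + "TCTG" * 2, "ref"),  # 5 repeats
    ("TCTA" * 2 + "TCTG" * 3, "ref"),  # 5 repeats, different internal sequence
    ("TCTA" * 3 + "TCTG" * 2, "alt"),  # 5 repeats, SNP in the flanking region
    ("TCTA" * 4 + "TCTG" * 2, "ref"),  # 6 repeats
]
n_length, n_repeat, n_full = allele_counts(toy)

gain_from_repeat_variation = n_repeat - n_length
gain_from_flanking_snps = n_full - n_repeat
```

In this toy locus, two length-based alleles become three when the repeat sequence is read and four when the flanking SNP is also considered, so each source contributes one additional allele.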
Single nucleotide polymorphisms (SNPs) associated with the flanking regions of STRs have been widely reported across the globe13,29,30. A SNP-STR links a SNP with the STR polymorphism, which allows an STR allele subtype to be defined on the basis of the SNP allele observed in the flanking region. Although other marker combinations, such as deletion-insertion polymorphisms amplified with STRs (DIP-STR), are widely used, a recent study advocated the use of SNP-STRs for forensic applications in which an imbalanced DNA mixture is expected31. In this regard, the current study revealed many SNPs in the flanking regions of STRs in the studied population (Table 2). rs25768, located upstream of the D5S818 marker, showed the highest occurrence in the central Indian population, whereas rs73250432, rs369257353, and rs561924992, located upstream of D13S317, downstream of D5S818, and downstream of D16S539, respectively, showed the lowest occurrence.
Detection of alleles with identical size but different internal sequences has been acknowledged as one of the advantages of using NGS for studying STRs32,33. The marker-wise isoalleles observed in the central Indian population are reported in Table S2. Of the 31 autosomal STR markers analyzed in this study, the isometric heterozygous pattern was observed at only 16 loci, i.e., D3S1358, D21S11, vWA, D5S2800, D6S474, D2S441, D12ATA63, D2S1338, D1S1656, D16S539, D8S1179, D12S391, D2S1776, TH01, D5S818, and D4S2408. Allele 15 of D3S1358, allele 19 of D2S1338, and allele 22 of D12S391 showed the maximum number of isoalleles with the same size but different intervening sequences (Fig. 4).
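An isometric heterozygote is a genotype that a length-based method would call homozygous, because both alleles have the same size. The distinction can be expressed as a simple rule, sketched below in Python; the function name and the example sequences are hypothetical and serve only to illustrate the definition.

```python
def classify_genotype(seq_a, seq_b):
    """Classify a diploid genotype from the two allele sequences at one locus."""
    if seq_a == seq_b:
        return "homozygous"
    if len(seq_a) == len(seq_b):
        # Same size but different sequence: indistinguishable by length alone.
        return "isometric heterozygous"
    return "heterozygous"

# Hypothetical alleles with five 4-bp repeats each (not data from the study).
allele_a = "TCTA" * 3 + "TCTG" * 2
allele_b = "TCTA" * 2 + "TCTG" * 3
call = classify_genotype(allele_a, allele_b)
```

Here the two alleles are both 20 bp long but differ internally, so the genotype is classified as isometric heterozygous.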


A representative image showing allele sequence variations with the same length (a) 15 repeats at D3S1358, (b) 19 repeats at D2S1338, and (c) 22 repeats at D12S391.
A previous report suggested a correlation between the allele number and various paternity and forensic parameters of an STR marker, such as total possible genotypes, power of discrimination, matching probability, polymorphic information content, power of exclusion, total paternity index, and gene diversity18. In view of this, the substantial increase in sequence-based allele numbers observed in the present study increases the evidentiary value of these STRs. As the allele number increases, the potential forensic and paternity applications of the STR markers increase substantially. An increase in allele number has further been correlated with an increase in the heterozygosity of an STR marker, which also increases its informativeness9.
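The link between allele number and heterozygosity can be made explicit with a short worked step. It assumes the usual definition of expected heterozygosity as one minus the sum of squared allele frequencies; the symbols are introduced here for illustration only. Suppose sequencing splits one length-based allele of frequency p into two sequence variants with frequencies p_1 and p_2:

```latex
H_e = 1 - \sum_i p_i^2, \qquad p = p_1 + p_2, \quad p_1, p_2 > 0
\;\;\Longrightarrow\;\;
p_1^2 + p_2^2 = p^2 - 2 p_1 p_2 < p^2 .
```

The sum of squared frequencies therefore falls by 2 p_1 p_2, and the expected heterozygosity rises by the same amount, whenever a length-based allele is resolved into two sequence-based alleles.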
Population genetics
When the observed size-based allelic data were compared with those of different populations at 15 common STR markers and a neighbor-joining tree was constructed (Fig. 5a), the dendrogram showed two distinct branches of population clusters. One cluster included the population of Tibet; Nepal; the Chinese Han population from Yunnan Province, Southwest China; the northeastern Thai people of Thailand; the Hainan Li population from China; and the Kathmandu and Newar population, Nepal. The studied central Indian population showed a close affinity with the population of Rajasthan, India, and the population of Odisha, India. Further, a consistent result was obtained in the PCA plot based on components 1 and 2 (Fig. 5b), in which clustering of populations from Madhya Pradesh (Gond), Jharkhand, Uttar Pradesh, Tamil Nadu, Rajasthan, Himachal Pradesh, and Odisha states was observed. Therefore, genetic sharing largely mirrored geographical clustering. The heat map drawn using Nei’s Da distance matrix is shown in Fig. 6. The overall result of the heat map was in concordance with the outcomes of the NJ tree and PCA plot for the interpopulation comparison.
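For reference, Nei's Da distance between two populations is one minus the average, over loci, of the sum over alleles of the square root of the product of the two allele frequencies. The Python sketch below implements this definition for a pair of populations; the function name, the nested-dictionary input, and the toy frequencies are hypothetical, and the software actually used to build the distance matrix is not assumed here.

```python
from math import sqrt

def nei_da(pop_x, pop_y):
    """Nei's Da distance between two populations.

    pop_x, pop_y: {locus: {allele: frequency}}; only shared loci are used.
    """
    loci = pop_x.keys() & pop_y.keys()
    if not loci:
        raise ValueError("no shared loci")
    total = 0.0
    for locus in loci:
        fx, fy = pop_x[locus], pop_y[locus]
        # Alleles absent from either population contribute zero to the sum.
        total += sum(sqrt(fx[a] * fy[a]) for a in fx.keys() & fy.keys())
    return 1.0 - total / len(loci)

# Hypothetical allele frequencies at a single locus (not data from the study).
pop_a = {"L1": {"10": 0.5, "11": 0.5}}
pop_b = {"L1": {"10": 0.5, "11": 0.5}}
pop_c = {"L1": {"12": 1.0}}

d_same = nei_da(pop_a, pop_b)      # identical frequencies
d_disjoint = nei_da(pop_a, pop_c)  # no shared alleles
```

Identical allele frequencies give a distance of zero, and populations with no shared alleles give the maximum distance of one. Applying the function to every pair of populations yields the symmetric matrix from which a heat map or a neighbor-joining tree can be drawn.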


(a) Neighbor-joining phylogenetic tree, and (b) PCA plot showing the relatedness of the observed size-based alleles at 15 consistent autosomal STR markers with different populations.


Interpopulation genetic structure at 15 consistent autosomal STR loci.

