Curated variation benchmarks for challenging medically relevant autosomal genes
