
Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads
