Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads

1.

Altshuler, D. M. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).

Article
CAS

Google Scholar

2.

McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

CAS
PubMed
PubMed Central
Article

Google Scholar

3.

Li, W. & Freudenberg, J. Mappability and read length. Front. Genet. 5, 381 (2014).

PubMed
PubMed Central

Google Scholar

4.

Chaisson, M. J. P. et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat. Commun. 10, 1784 (2019).

5.

Belton, J. M. et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58, 268–276 (2012).

CAS
PubMed
Article

Google Scholar

6.

Falconer, E. & Lansdorp, P. M. Strand-seq: a unifying tool for studies of chromosome segregation. Semin. Cell Developmental Biol. 24, 643–652 (2013).

CAS
Article

Google Scholar

7.

Weisenfeld, N. I., Kumar, V., Shah, P., Church, D. M. & Jaffe, D. B. Direct determination of diploid genome sequences. Genome Res. 27, 757–767 (2017).

CAS
PubMed
PubMed Central
Article

Google Scholar

8.

Jain, M. et al. Improved data analysis for the MinION nanopore sequencer. Nat. Methods 12, 351 (2015).

CAS
PubMed
PubMed Central
Article

Google Scholar

9.

Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. Science 323, 133–138 (2009).

CAS
PubMed
Article

Google Scholar

10.

Jain, C., Rhie, A., Hansen, N., Koren, S. & Phillippy, A. M. A long read mapping method for highly repetitive reference sequences. Preprint at https://doi.org/10.1101/2020.11.01.363887 (2020).

11.

Miga, K. H. et al. Telomere-to-telomere assembly of a complete human X chromosome. Nature 585, 79–84 (2020).

CAS
PubMed
PubMed Central
Article

Google Scholar

12.

Logsdon, G. A. et al. The structure, function and evolution of a complete human chromosome 8. Nature 593, 7857 (2021).

13.

Shafin, K. et al. Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nat. Biotechnol. 38, 1044–1053 (2020).

14.

Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).

15.

Nurk, S. et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. 30, 1291–1305 (2020).

CAS
PubMed
PubMed Central
Article

Google Scholar

16.

Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).

CAS
PubMed
Article

Google Scholar

17.

Ruan, J. & Li, H. Fast and accurate long-read assembly with wtdbg2. Nat. Methods 17, 155–158 (2020).

CAS
PubMed
Article

Google Scholar

18.

nanoporetech/medaka: sequence correction provided by ONT Research, https://github.com/nanoporetech/medaka (Oxford Nanopore Technologies, 2018).

19.

Luo, R. et al. Exploring the limit of using a deep neural network on pileup data for germline variant calling. Nat. Mach. Intell. 2, 220–227 (2020).

Article

Google Scholar

20.

Edge, P. & Bansal, V. Longshot enables accurate variant calling in diploid genomes from single-molecule long read sequencing. Nat. Commun. 10, 1–10 (2019).

CAS
Article

Google Scholar

21.

Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 37, 1155–1162 (2019).

CAS
PubMed
PubMed Central
Article

Google Scholar

22.

Ebler, J., Haukness, M., Pesout, T., Marschall, T. & Paten, B. Haplotype-aware diplotyping from noisy long reads. Genome Biol. 20, 116 (2019).

PubMed
PubMed Central
Article

Google Scholar

23.

Huddleston, J. et al. Discovery and genotyping of structural variation from long-read haploid genome sequence data. Genome Res. 27, 677–685 (2017).

CAS
PubMed
PubMed Central
Article

Google Scholar

24.

Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 15, 461–468 (2018).

CAS
PubMed
PubMed Central
Article

Google Scholar

25.

Patterson, M. D. et al. WhatsHap: weighted haplotype assembly for future-generation sequencing reads. J. Comput. Biol. 22, 498–509 (2015).

CAS
PubMed
Article

Google Scholar

26.

Zook, J. M. et al. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci. Data 3, 160025 (2016).

CAS
PubMed
PubMed Central
Article

Google Scholar

27.

Wagner, J. et al. Benchmarking challenging small variants with linked and long reads. Preprint at https://doi.org/10.1101/2020.07.24.212712 (2020).

28.

Olson, N. D. et al. precisionFDA Truth Challenge V2: calling variants from short-and long-reads in difficult-to-map regions. Preprint at https://doi.org/10.1101/2020.11.13.380741 (2020).

29.

Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol. 36, 338 (2018).

CAS
PubMed
PubMed Central
Article

Google Scholar

30.

Jain, M. et al. Linear assembly of a human centromere on the Y chromosome. Nat. Biotechnol. 36, 321 (2018).

CAS
PubMed
PubMed Central
Article

Google Scholar

31.

Fiddes, I. T. et al. Comparative Annotation Toolkit (CAT)—simultaneous clade and personal genome annotation. Genome Res. 28, 1029–1038 (2018).

CAS
PubMed
PubMed Central
Article

Google Scholar

32.

Eichler, E. E., Clark, R. A. & She, X. An assessment of the sequence gaps: unfinished business in a finished human genome. Nat. Rev. Genet. 5, 345 (2004).

CAS
PubMed
Article

Google Scholar

33.

Euskirchen, P. et al. Same-day genomic and epigenomic diagnosis of brain tumors using real-time nanopore sequencing. Acta Neuropathol. 134, 691–703 (2017).

CAS
PubMed
PubMed Central
Article

Google Scholar

34.

Rang, F. J., Kloosterman, W. P. & de Ridder, J. From squiggle to basepair: computational approaches for improving nanopore sequencing read accuracy. Genome Biol. 19, 90 (2018).

PubMed
PubMed Central
Article
CAS

Google Scholar

35.

Chin, C.-S. et al. A diploid assembly-based benchmark for variants in the major histocompatibility complex. Nat. Commun. 11, 1–9 (2020).

Article
CAS

Google Scholar

36.

Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983 (2018).

CAS
PubMed
Article

Google Scholar

37.

Edge, P., Bafna, V. & Bansal, V. HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies. Genome Res. 27, 801–812 (2017).

CAS
PubMed
PubMed Central
Article

Google Scholar

38.

Rodriguez, O. L. et al. A novel framework for characterizing genomic haplotype diversity in the human immunoglobulin heavy chain locus. Front. Immunol. 11, 2136 (2020).

CAS
PubMed
PubMed Central
Article

Google Scholar

39.

Chin, C.-S. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods 13, 1050 (2016).

CAS
PubMed
PubMed Central
Article

Google Scholar

40.

Koren, S. et al. De novo assembly of haplotype-resolved genomes with trio binning. Nat. Biotechnol. 36, 1174 (2018).

CAS
Article

Google Scholar

41.

Porubsky, D. et al. Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads. Nat. Biotechnol. 39, 302–308 (2021).

CAS
PubMed
Article

Google Scholar

42.

Harrow, J. et al. GENCODE: The reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).

CAS
PubMed
PubMed Central
Article

Google Scholar

43.

Zook, J. M. et al. An open resource for accurately benchmarking small variant and reference calls. Nat. Biotechnol. 37, 561 (2019).

CAS
PubMed
PubMed Central
Article

Google Scholar

44.

Baid, G. et al. An extensive sequence dataset of gold-standard samples for benchmarking and development. Preprint at https://doi.org/10.1101/2020.12.11.422022 (2020).

45.

Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 47, D766–D773 (2019).

CAS
PubMed
Article

Google Scholar

46.

Heller, D. & Vingron, M. SVIM-asm: Structural variant detection from haploid and diploid genome assemblies. Bioinformatics 36, 22–23 (2020).

Google Scholar

47.

Zook, J. M. et al. A robust benchmark for detection of germline large deletions and insertions. Nat. Biotechnol. 38, 1347–1355 (2020).

48.

Tewhey, R., Bansal, V., Torkamani, A., Topol, E. J. & Schork, N. J. The importance of phase information for human genomics. Nat. Rev. Genet. 12, 215–223 (2011).

CAS
PubMed
PubMed Central
Article

Google Scholar

49.

Browning, S. R. & Browning, B. L. Haplotype phasing: existing methods and new developments. Nat. Rev. Genet. 12, 703–714 (2011).

CAS
PubMed
PubMed Central
Article

Google Scholar

50.

Glusman, G., Cox, H. C. & Roach, J. C. Whole-genome haplotyping approaches and genomic medicine. Genome Med. 6, 1–16 (2014).

Article
CAS

Google Scholar

51.

Li, H. et al. A synthetic-diploid benchmark for accurate variant-calling evaluation. Nat. Methods 15, 595–597 (2018).

PubMed
PubMed Central
Article
CAS

Google Scholar

52.

Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

CAS
PubMed
PubMed Central
Article

Google Scholar

53.

Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

PubMed
PubMed Central
Article

Google Scholar

54.

Krusche, P. et al. Best practices for benchmarking germline small-variant calls in human genomes. Nat. Biotechnol. 37, 555–560 (2019).

CAS
PubMed
PubMed Central
Article

Google Scholar

55.

Cleary, J. G. et al. Joint variant and de novo mutation identification on pedigrees from high-throughput sequencing data. J. Comput. Biol. 21, 405–419 (2014).

CAS
PubMed
Article

Google Scholar

56.

Newey, W. K. Adaptive estimation of regression models via moment restrictions. J. Econom. 38, 301–339 (1988).

Article

Google Scholar

57.

K. Shafin, et al. PEPPER-Margin-DeepVariant (version r0.4), https://doi.org/10.5281/zenodo.5275510 (Zenodo, 2021).

Vasiprak Blog

You might also like

Late-onset retinal degeneration mechanism and potential Rx — ScienceDaily

Advanced Housekeeping Keeps Space Station in Tip-Top Shape

Ned Seeman (1945–2021)

Stay tuned!