
Ultra-deep sequencing data from a liquid biopsy proficiency study demonstrating analytic validity

Study design

A set of contrived reference samples was created using the established Sample A, which was genotyped to have ~40,000 “known variants” and ~10.2 Mb of “known negatives” in defined coding regions (the consensus target region, CTR), and Sample B, a non-cancer background cell line5. Sample A and Sample B were mixed at different ratios and enzymatically fragmented to create Sample Df (a.k.a., LBx-high), Ef (a.k.a., LBx-low), and Ff, which harbor “known variants” at different variant allele frequency (VAF) levels. Enzymatically fragmented Accugenomics spike-in control6 and 0.1% enzymatically fragmented AcroMetrix synthetic hotspot controls7 were added to test their usability. Five ctDNA assays were used in this study: Burning Rock Biotech Lung Plasma v4 ctDNA assay (Burning Rock Biotech), Integrated DNA Technologies (IDT) xGen Non-small Cell Lung Cancer ctDNA assay (Integrated DNA Technologies), Illumina TruSight Tumor 170 + UMI (Illumina, Inc.), Roche AVENIO ctDNA Expanded Kit (Roche Sequencing Solutions), and Thermo Fisher Oncomine Lung Cell-Free Total Nucleic Acid Research Assay (Thermo Fisher Scientific). Twelve independent laboratories across the United States, United Kingdom, China, and Australia were recruited to perform the ctDNA assays. Each assay was performed at 2–3 independent testing laboratories, with four technical replicates per lab for each ctDNA sample at each input quantity. The LBx-low sample (Sample Ef or EfIS) was analyzed at three input quantity levels, while the other samples were analyzed at one input quantity level. Not all reference samples at every input quantity level were analyzed with all five ctDNA assays. A total of 359 DNA libraries were prepared and sequenced on one of three sequencing platforms: Illumina NextSeq 500, Illumina NovaSeq 6000, and Thermo Fisher Ion S5 XL (Fig. 1).

Fig. 1

Study design. (a) Sample A and Sample B were mixed at ratios of 0:1, 1:4, 1:24, and 1:124, and then enzymatically fragmented to create Bf, Df, Ef, and Ff. Enzymatically fragmented Accugenomics spike-in control was added to Bf, Df, and Ef to create BfIS, DfIS, and EfIS. 0.1% enzymatically fragmented AcroMetrix synthetic hotspot controls were added to Sample Bf to create Sample AC01. Sample Ef (Sample EfIS for panel IDT) was suspended in synthetic plasma solution to create Sample Ep. (b) Workflow of sample distribution, library preparation, and sequencing data processing. A set of reference samples was distributed to each of the testing laboratories. Four technical library replicates were prepared for each sample at each laboratory. DNA libraries were then sequenced using one of the three sequencing platforms. This figure is modified from Figure 3 in our related work4.

Genomic DNA libraries construction of reference samples

Sample A is composed of an equal mass pooling of 10 gDNA samples prepared from Agilent’s Universal Human Reference RNA (UHRR) cancer cell lines8. Over 42K small variants are known with high confidence in the defined regions of over 22 million bases5.

Sample B is a gDNA sample from a normal male cell line (Agilent Human Reference DNA, Male, Agilent part #: 5190-8848). Over 10M negative positions (positions absent of variation in all cell-lines) were genotyped with high confidence in the defined regions5.

Sample A and Sample B were mixed at ratios of 1:4, 1:24, and 1:124 to create Sample D, E, and F. To mimic the short fragment size typical of ctDNA, Sample B, D (a.k.a., LBx-high), E (a.k.a., LBx-low), and F were enzymatically fragmented to create Sample Bf, Df, Ef, and Ff. Enzymatically fragmented Accugenomics spike-in control6 was added to Bf, Df, and Ef to create another set of samples, namely BfIS, DfIS, and EfIS. 0.1% enzymatically fragmented AcroMetrix synthetic hotspot controls were added to Sample Bf to create Sample AC01. Sample Ef was suspended in synthetic plasma solution to create Sample Ep (a.k.a., LBx-low-plasma) for all panels except IDT. After the initial IDT experiment failed, Sample EfIS and the corresponding Sample Ep were prepared in a second batch for panel IDT. Details of the ctDNA sample preparation can be found in the related research manuscript4. Ff and FfIS were not described in that manuscript, but they were prepared in the same way except for the dilution ratio (Fig. 1a).
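The effect of the mixing ratios on variant allele frequency can be illustrated with a short calculation. The sketch below is only an approximation: it assumes that a variant is absent from Sample B and that Sample A and Sample B contribute alleles in proportion to their mixed mass. The 10% starting VAF is a hypothetical value chosen for illustration, not a measurement from this study.

```python
from fractions import Fraction

# Mixing ratios (Sample A : Sample B) for Samples D, E, and F, as stated in the text.
MIX_RATIOS = {"D": (1, 4), "E": (1, 24), "F": (1, 124)}


def sample_a_fraction(parts_a, parts_b):
    """Fraction of the mixture contributed by Sample A."""
    return Fraction(parts_a, parts_a + parts_b)


def expected_vaf(vaf_in_a, parts_a, parts_b):
    """Expected VAF after dilution, assuming the variant is absent from Sample B."""
    return vaf_in_a * float(sample_a_fraction(parts_a, parts_b))


# Hypothetical variant present at 10% VAF in undiluted Sample A.
HYPOTHETICAL_VAF_IN_A = 0.10

for name, (parts_a, parts_b) in MIX_RATIOS.items():
    fraction = sample_a_fraction(parts_a, parts_b)
    vaf = expected_vaf(HYPOTHETICAL_VAF_IN_A, parts_a, parts_b)
    print(f"Sample {name}: Sample A fraction = {fraction}, expected VAF = {vaf:.2%}")
```

Under these assumptions, each successive sample dilutes Sample A variants fivefold relative to the previous one (fractions of 1/5, 1/25, and 1/125).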

Participating panels sign-up, test sites recruitment and sample distribution

Five oncopanel providers signed up to participate in this study. Each panel provider recruited 2–3 independent laboratories to perform its panel, following the provider’s standard operating procedure. A total of 12 testing laboratories were initially recruited. We then distributed the reference samples to each laboratory. Four DNA libraries were made as technical replicates at each laboratory for each sample at each DNA input quantity level. A total of 359 DNA libraries were prepared (Fig. 1b). Detailed information on the five participating oncopanels is listed in Table 1. For brevity, panel codes are used to identify the associated panels. Laboratory codes are listed in Table 1 to identify the test laboratories for each oncopanel.

Table 1 Detailed information for five participating ctDNA assays.

Experiment protocols

Basic information about the experimental procedures and wet-lab QC metrics are summarized in Table 1. These experimental protocols are expanded versions of descriptions in our related work4.

BRP: Burning Rock Biotech, Lung Plasma v4 ctDNA assay

Sample Ep was extracted using the QIAamp Circulating Nucleic Acid kit (Qiagen) according to the manufacturer’s instructions. After extraction, DNA concentration was quantified using a Qubit 3.0 Fluorometer, and the concentration was adjusted following the organizers’ recommendation. Library preparation and enrichment were performed using the Burning Rock HS UMI library preparation kit without modification. In brief, pre-fragmented SEQC2 DNA samples were end-repaired, ligated to UMI adapters, and PCR enriched. About 1 μg of purified pre-enrichment UMI library was hybridized to the LungPlasma™ panel and further enriched following the manufacturer’s instructions. The LungPlasma™ panel is about 250 Kb in size and covers 168 human lung cancer related genes. Final DNA libraries were quantified using a Qubit Fluorometer with the dsDNA HS assay kit (Life Technologies, Carlsbad, CA). A LabChip GX Touch System, Agilent 2100 Bioanalyzer, or Agilent 4200 TapeStation D1000 ScreenTape was then used to assess the quality and size distribution of the library. The libraries were sequenced on a NovaSeq 6000 sequencer (Illumina, Inc., California, US) with 2 × 150 bp paired-end reads and unique dual indexes.

IDT: Integrated DNA Technologies, xGen Non-small Cell Lung Cancer ctDNA assay

Sample LBx-low-plasma was purified using the QIAamp® Circulating Nucleic Acid kit and quantified according to the methods described in the SEQC2 WG2 Sample Processing and Sequence Data Reporting SOP. Libraries were constructed using mock cfDNA samples in quadruplicate using the KAPA Hyper Prep Kit (Roche Sequencing Solutions) and IDT custom adapters. End repair and A-tailing were performed according to the manufacturer’s recommendations. For adapter ligation, 3 μM, 7.5 μM, and 15 μM stocks were used for 10 ng, 25 ng, and 50 ng input samples, respectively. Libraries were purified using 0.8X AMPure and amplified using unique dual index primers with 10, 9, and 8 cycles of PCR for 10 ng, 25 ng, and 50 ng input samples. Libraries were purified using 1X AMPure and quantified using Qubit. 500 ng of each library was captured with a custom NSCLC xGen Lockdown® Probe Panel (Integrated DNA Technologies) using the xGen Universal Blockers–TS Mix (Integrated DNA Technologies). After enrichment, libraries were amplified with the KAPA HiFi HotStart ReadyMix (Roche Sequencing Solutions) using 13 cycles for amplification. Post-capture libraries were purified with 1.5X AMPure, quantified, and pooled for sequencing on the Illumina NovaSeq S4.
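The input-dependent settings described above can be captured in a small lookup table. This is a minimal sketch for readers scripting their own sample sheets; the class and function names are hypothetical, and only the numeric values are taken from the protocol text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryPrepSettings:
    adapter_stock_um: float      # adapter stock concentration used for ligation (μM)
    pre_capture_pcr_cycles: int  # PCR cycles with unique dual index primers


# Values from the IDT protocol description, keyed by DNA input in ng.
IDT_SETTINGS = {
    10: LibraryPrepSettings(adapter_stock_um=3.0, pre_capture_pcr_cycles=10),
    25: LibraryPrepSettings(adapter_stock_um=7.5, pre_capture_pcr_cycles=9),
    50: LibraryPrepSettings(adapter_stock_um=15.0, pre_capture_pcr_cycles=8),
}


def settings_for_input(input_ng):
    """Return the settings for a DNA input amount, or fail loudly if it is not listed."""
    try:
        return IDT_SETTINGS[input_ng]
    except KeyError:
        raise ValueError(f"no settings described for {input_ng} ng input") from None


print(settings_for_input(25))
```

Raising an error for unlisted inputs, rather than interpolating, avoids implying settings that the protocol does not describe.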

ILM: Illumina, TruSight Tumor 170 + UMI

Libraries were prepared using the TruSight Tumor 170 Reference Guide, with modifications outlined in the TruSight UMI toolkit reference guide. Briefly, DNA samples, provided as enzymatically fragmented material to mimic cfDNA, were end-repaired and A-tailed in a single reaction, followed by ligation to a universal adapter containing UMIs to uniquely tag each molecule going into the library preparation. Post-ligation clean-up was performed using Solid Phase Reversible Immobilization (SPRI) beads, and libraries were then indexed with unique dual indexes by PCR. Target regions were captured using an overnight hybridization to biotinylated target-specific oligos that cover ~533 Kb of genomic targets across 154 genes, followed by capture with streptavidin magnetic beads. A second hybridization and capture reaction was performed, followed by PCR amplification using universal primers compatible with the sequencing flowcell. Libraries were quantified and manually normalized to 6 nM before being pooled in equal parts per library. Libraries were then further diluted and loaded using the Xp workflow on a NovaSeq 6000 S4 flowcell, with 6 libraries per lane. Sequencing was performed as 2 × 151 bp with 8 bp dual-indexed reads.
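The manual normalization to 6 nM follows the standard dilution relation C1 × V1 = C2 × V2. The sketch below shows that arithmetic only; the function name and the 20 µL final volume are assumptions made for illustration and are not taken from the Illumina reference guides.

```python
def dilution_volumes(stock_nm, target_nm=6.0, final_volume_ul=20.0):
    """Return (library_ul, diluent_ul) needed to reach target_nm in final_volume_ul.

    Uses C1 * V1 = C2 * V2. The default final volume is a hypothetical value.
    """
    if stock_nm < target_nm:
        raise ValueError("library is already below the target concentration")
    library_ul = target_nm * final_volume_ul / stock_nm
    return library_ul, final_volume_ul - library_ul


# Example: a library quantified at 12 nM (hypothetical value).
library_ul, diluent_ul = dilution_volumes(12.0)
print(f"Mix {library_ul:.1f} µL of library with {diluent_ul:.1f} µL of diluent")
```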

ROC: Roche Sequencing Solutions, AVENIO ctDNA Expanded Kit

The AVENIO ctDNA Expanded Kit (For Research Use Only; not for use in diagnostic procedures) is a hybridization-based workflow requiring only DNA, allowing the detection of single nucleotide variations (SNVs), insertions and deletions (Indels), fusions, and copy number variants (CNVs). Prior knowledge of the fusion breakpoint is not required, since the hybridization method targets whole introns of the genes of interest. In brief, the extracted cell-free DNA sample is initially ligated with adapters containing unique molecular identifiers, which allows for the deduplication of the eventual sequencing reads back to the original input molecules, significantly reducing undesired errors. After the ligation, PCR is used to universally amplify the ligated material; gene enrichment does not occur during the PCR. The sample is then incubated overnight with the gene panel, consisting of biotinylated probes designed for optimal enrichment of the genes of interest. The desired DNA-probe complexes are then captured on streptavidin beads, and after a series of washes, the samples are PCR-amplified. The final product of the workflow is enriched libraries ready for sequencing. The final sequencing libraries were sequenced using the Illumina NextSeq 500 sequencing platform. Sequencing results were analyzed by the AVENIO ctDNA Analysis Server v1.1 (for Research Use Only; not for use in diagnostic procedures).
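The idea of collapsing reads back to their original input molecules can be illustrated with a toy example. The following is a generic sketch of UMI-based consensus calling, not the algorithm used by the AVENIO ctDNA Analysis Server: it assumes reads are already aligned, groups them by UMI and start position, and takes a simple per-base majority vote. The function name and input layout are hypothetical.

```python
from collections import Counter, defaultdict


def collapse_by_umi(reads):
    """Collapse reads sharing a UMI and start position into one consensus sequence.

    reads: iterable of (umi, start_position, sequence) tuples.
    Returns a dict mapping (umi, start_position) to the consensus sequence.
    """
    families = defaultdict(list)
    for umi, start, sequence in reads:
        families[(umi, start)].append(sequence)

    consensus = {}
    for key, sequences in families.items():
        # Only compare positions covered by every read in the family.
        length = min(len(s) for s in sequences)
        consensus[key] = "".join(
            Counter(s[i] for s in sequences).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus


# Three reads from one molecule (one carries a sequencing error) plus one read
# from a second molecule with a different UMI.
example_reads = [
    ("AACG", 100, "ACGT"),
    ("AACG", 100, "ACGT"),
    ("AACG", 100, "ACTT"),
    ("GGTA", 100, "ACGT"),
]
print(collapse_by_umi(example_reads))
```

In this example the isolated error in the third read is outvoted by the other two reads in its family, which is how UMI deduplication suppresses errors introduced during PCR or sequencing. Production pipelines typically also tolerate errors within the UMI itself and weight bases by quality, which this sketch omits.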

TFS: Thermo Fisher Scientific, Oncomine Lung Cell-Free Total Nucleic Acid Research Assay

For samples that required nucleic acid extraction, the MagMAX™ Cell-Free Total Nucleic Acid Isolation Kit (https://www.thermofisher.com/order/catalog/product/A36716) (Cat. No. A36716) was used, and extraction was carried out according to the manufacturer’s instructions. Sequencing libraries were constructed according to the manufacturer’s specifications found in the Oncomine™ Lung cfTNA Assay User Guide (https://www.thermofisher.com/order/catalog/product/A35864). The user guide includes the protocol for constructing and templating sequencing libraries using the Ion Chef™ Instrument (Cat. No. 4484177). Subsequently, each library was loaded onto an Ion 530™ chip with the Ion 530™ Kit – Chef (Cat. Nos. A27757, A30010), which was then loaded onto the Ion S5™ XL (Cat. No. A27214) next-generation sequencing system. Each sequencing library has a sample-specific Tag Sequencing barcode (Tag Sequencing Barcode Set 1–24, Cat. No. A31830) attached to each amplicon, which identifies an individual sample that has been pooled with other multiplexed samples on an Ion 530™ chip.

Sequencing, data processing, and collection

The libraries from each panel were sequenced on one of three sequencing platforms chosen by the panel provider: Illumina NextSeq 500, Illumina NovaSeq 6000, or Thermo Fisher Ion Torrent S5 XL. Each library was sequenced on only one platform, so this dataset does not support comparing sequencing platforms on the same DNA library. Sequencing data was required to be shared within the SEQC2 Oncopanel Working Group via Illumina BaseSpace Sequence Hub or by upload to the sFTP server hosted at Stanford University. Either FASTQ or BAM format was used for data sharing. All the data was collected at the National Center for Toxicological Research (NCTR), organized, and renamed in a consistent manner. The data was then submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) data repository. The data became publicly available upon the publication of the related research manuscript4.

Bioinformatics pipelines and variant calling results

In this study, we considered each recommended or in-house pipeline to be part of the variant detection solution, together with the corresponding oncopanel. In the related research manuscript, the variant calling results were generated by the panel providers’ recommended or in-house pipelines for the associated panels. The reproducibility, sensitivity, and false positive rate of the participating oncopanels were reported in the related research manuscript4. This data descriptor focuses on describing the raw sequencing data for possible reuse. Researchers who want to compare their own variant calling results with those of the recommended or in-house pipelines can download the results presented in our related manuscript4 from figshare9 in VCF format.
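A comparison of this kind can start from a simple set intersection of variant records. The sketch below is a minimal, standard-library-only illustration, assuming uncompressed VCF text and exact matching on chromosome, position, reference allele, and alternate allele. It does not normalize indel representations, and the function names are hypothetical; the inline records are made-up examples, not data from this study.

```python
def load_variant_keys(vcf_lines):
    """Return a set of (chrom, pos, ref, alt) keys from VCF text lines."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information, and the header row
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # split multi-allelic records
            keys.add((chrom, int(pos), ref, allele))
    return keys


def compare_calls(own, reference):
    """Partition two call sets into shared and set-specific variants."""
    return {
        "shared": own & reference,
        "own_only": own - reference,
        "reference_only": reference - own,
    }


# Made-up records standing in for two VCF files.
own_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t1000\t.\tA\tG\t.\tPASS\t.\n",
    "chr2\t2000\t.\tC\tT,G\t.\tPASS\t.\n",
]
reference_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t1000\t.\tA\tG\t.\tPASS\t.\n",
    "chr3\t3000\t.\tG\tA\t.\tPASS\t.\n",
]

result = compare_calls(load_variant_keys(own_vcf), load_variant_keys(reference_vcf))
for label, variants in result.items():
    print(label, sorted(variants))
```

To use real files, pass an open file handle to `load_variant_keys` in place of the inline lists. For indels, normalizing both call sets with a dedicated tool before comparison is advisable, since the same variant can be written in more than one way.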
