
Comparison of seven single cell whole genome amplification commercial kits using targeted sequencing

Large-scale SC experiments are in increasing demand, but the choice of the correct WGA technology cannot easily be guided by true head-to-head comparisons of kits, as such comparisons are costly and laborious. Several comparative studies have been performed previously, but they are either based on non-NGS analysis9, sequence non-eukaryotic cells5, or are limited in the number of cells per kit (< 9 cells, and in some cases only 2–3 cells)6,7,8,10,12. The high costs involved in SC genomics, which include the cost of the scWGA reaction and the costs of downstream analyses (e.g. whole exome sequencing (WES) or whole genome sequencing (WGS) of many cells), are the reason for the lack of large-scale comparison experiments.

In this study, we conducted the largest-scale (125 cells) scWGA kit comparison to date, containing, for the first time, all currently available commercial scWGA kits. We chose to analyze the scWGA products with a proven targeted sequencing benchmark analysis system3. To rule out batch effects, we analyzed cells from the same clone on the same day and randomly distributed the scWGA products across the PCR wells of the analysis system. Notably, improvements on existing kits have been developed, mainly upon MDA, which is simpler than PCR-based methods; however, these modifications require specific equipment15,16 or a particular experimental design13 (cell stage or limited amplification). In this study, we therefore compared only commercially available kits and followed their manuals. The advantages of our experimental design are: (a) it is cheap and therefore enables a large cohort of examined cells per kit, (b) it comprises a large amplicon panel (3401 amplicons) for improved statistics, and (c) it relies on the genome template of a diploid normal cell line, which, when analyzed for the X chromosome only, yields a mono-allelic signal. We compared the following categories: genome coverage, reliability, reproducibility and error rate.

The reproducibility of a kit is sometimes of higher importance than its genomic coverage. Phylogenetic algorithms compare the same data points (e.g. loci, SNPs) in every analyzed sample and then generate trees that reflect that comparison17. When analyzing SCs for their cell lineage relationships18, ADO plays an important role, as it reduces the number of analyzed loci17. However, in such algorithms the effect of the number of comparable loci is even larger than that of the successful coverage per cell. For example, a data set with 70% genome coverage for each of two cells can range from fully reproducible loci (70% of the data is comparable) to low reproducibility (as little as 40% comparable loci). Taking Ampli1's internal bias against MseI-containing amplicons (Supplementary Fig. 2) into account, it is the best kit in coverage and reproducibility. This biased amplification should be considered when planning an experiment. In most cases, one can order targeting probes/primers that account for Ampli1's biased amplification; however, this kit may not be suitable for several application types, and its protocol is far more laborious and not as automation-friendly as MDA-based technologies.
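The 40–70% range in the example above follows from simple inclusion–exclusion over the covered loci of the two cells. A minimal sketch of this bound is given below; the coverage values are the illustrative 70% from the text and are not taken from the study's data.

```python
# Illustration of how per-cell coverage bounds the fraction of loci that are
# comparable between two cells. The 70% values mirror the example in the text;
# this is not part of the original analysis pipeline.

def comparable_fraction_bounds(coverage_a: float, coverage_b: float) -> tuple[float, float]:
    """Return (worst_case, best_case) fraction of loci covered in BOTH cells.

    Best case: the dropout patterns fully overlap, so the shared fraction is
    limited only by the less-covered cell. Worst case (inclusion-exclusion):
    the two covered sets overlap as little as possible.
    """
    best = min(coverage_a, coverage_b)
    worst = max(0.0, coverage_a + coverage_b - 1.0)
    return worst, best

worst, best = comparable_fraction_bounds(0.70, 0.70)
print(f"comparable loci between the two cells: {worst:.0%} to {best:.0%}")
# -> comparable loci between the two cells: 40% to 70%
```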

In this experiment, PicoPlex was the most reliable kit, showing reproducible results for all cells in terms of both coverage and reproducibility, with low variance across all analyzed cells (Fig. 2a,c). We also chose to present the data of the upper median of the most successful cells (Fig. 2b,d) as a simulation of a real experiment, in which the best cells are chosen for analysis. In specific cases, such as rare cell populations, this selection is not an option. Moreover, the high cost per sample, ranging between $15 and $36 per cell, makes improved reliability a key cost factor for a large-scale experiment. We believe that fine calibration of every step of an experiment, from cell picking to the WGA procedure, can improve results for all kits. Our results show that GenomePlex and TruePrime did not work in our hands. Further calibration of their protocols may improve their results, but this was beyond the scope of this study.

The current methodology for tracking and comparing scWGA error rates is to compare the sequencing data generated from scWGA products to a reference genome; it therefore relies on prior knowledge, which in the case of STRs can be prone to errors or may not even exist. We used our STR genotyping tool for "de novo" interpretation of the error rate per locus, without prior knowledge of, or assumptions about, its original STR length14. As expected, RepliG-SC excels, as it is based on isothermal amplification, which has previously been described as having a lower error rate than other WGA protocols2. This makes it favorable for variant analysis, specifically in SC experiments. GPHI-SC and TruePrime, which are also MDA-based kits, are among the three next-best kits, together with GenomePlex. Nevertheless, both TruePrime and GenomePlex have far fewer data points, due to their low success in this experiment. Although the starting templates for the PCR analysis were scWGA products that were not normalized for concentration, all of the kit manufacturers declare a yield per SC of micrograms to tens of micrograms, a difference of at most one order of magnitude. Since every PCR process yields sufficient amplification that presumably reaches a plateau, the difference in amplification cycles between kits should be at most 3–4 cycles. The data presented in Fig. 3 simulate the number of noisy amplification cycles per kit. Even after adding these 3–4 amplification cycles to RepliG-SC, it remains the best kit in terms of error rate.
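The 3–4 cycle figure follows from the fact that, under ideal doubling per cycle, bridging a roughly tenfold yield gap requires about log2(10) ≈ 3.3 extra cycles. A minimal back-of-the-envelope sketch is shown below; the yield values are illustrative only and were not measured in this study.

```python
# Back-of-the-envelope check of the cycle-difference argument in the text:
# if kit yields differ by at most one order of magnitude, the number of extra
# PCR cycles needed to compensate is roughly log2 of that ratio, assuming
# ideal doubling per cycle before the reaction plateaus.

import math

low_yield_ug, high_yield_ug = 1.0, 10.0  # micrograms vs tens of micrograms (illustrative)
extra_cycles = math.log2(high_yield_ug / low_yield_ug)
print(f"extra cycles to bridge a {high_yield_ug / low_yield_ug:.0f}x yield gap: ~{extra_cycles:.1f}")
# -> extra cycles to bridge a 10x yield gap: ~3.3
```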

This study has several limitations. (1) The experiment is limited to a targeted enrichment panel and is not a truly random WGS experiment. The use of targeted enrichment as a subset of the genome is biased by its technology and by panel selection. In addition, previous comparisons showed that coverage can be detected with high significance even at a low sequencing depth8. However, even at a low depth of coverage, the cost per genome does not scale to simulate a real SC experiment, which usually comprises tens to thousands of analyzed cells. Targeted sequencing therefore offers a cheap and reliable measurement that mimics a real experiment. (2) The panel is mostly composed of STR-containing amplicons, which may bias the probability of amplification and affect the conclusions of the genome coverage and reproducibility results. A bias of this kind would manifest as a change in the composition of read counts per sample (STR-containing versus non-STR-containing amplicons) compared with the original panel composition. To rule out this possibility, we examined the amplicon count composition of the H1 bulk templates and compared it to the composition of the original panel (see the sketch below). While the original panel composition is 95% and 5% (STR-containing and non-STR-containing amplicons, respectively), the amplicon count compositions of the H1 bulk template duplicates are 95.3% and 4.7% for duplicate 1, and 95.4% and 4.6% for duplicate 2, respectively; hence, the amplicon counts were not biased by amplicon composition. (3) One could increase the panel size to improve statistics (e.g. an exome panel). Increasing the probe/primer panel to a larger genomic panel would probably enable better statistical analysis; however, it would also dramatically increase the cost, as in most cases a change in the targeted enrichment protocol would be required, and the cost of sequencing more bases would also rise. (4) Other cell properties cannot be detected by amplicon sequencing: uniformity analysis may be hampered because amplification is template sensitive, making read coverage less informative for accurate inference of the original copy count, e.g. for CNV profiling. Chimaeras, the artefactual joining of two separate genomic regions associated with MDA-based analysis, are also overlooked in amplicon sequencing: they will either not be amplified (if the amplicon was not joined as a whole) or will be amplified without any trace of their occurrence.
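The composition check in limitation (2) amounts to summing read counts over STR-containing versus non-STR-containing amplicons and comparing the resulting fractions to the panel design. A minimal sketch under assumed inputs is given below; the amplicon names and counts are toy values for illustration, and the actual pipeline used in the study may differ.

```python
# Sketch of the composition check described in limitation (2): compare the
# fraction of reads mapping to STR-containing vs non-STR amplicons in a bulk
# template against the panel composition. Amplicon names and counts are
# hypothetical, chosen only to illustrate the calculation.

from collections import Counter

def composition(read_counts: dict[str, int], str_amplicons: set[str]) -> tuple[float, float]:
    """Return (fraction_STR, fraction_non_STR) of total reads."""
    totals = Counter()
    for amplicon, count in read_counts.items():
        key = "STR" if amplicon in str_amplicons else "non-STR"
        totals[key] += count
    total = sum(totals.values())
    return totals["STR"] / total, totals["non-STR"] / total

# Toy example: three STR-containing amplicons and one non-STR amplicon
str_amplicons = {"amp1", "amp2", "amp3"}
bulk_counts = {"amp1": 950, "amp2": 1020, "amp3": 890, "amp4": 140}
frac_str, frac_non = composition(bulk_counts, str_amplicons)
print(f"STR: {frac_str:.1%}, non-STR: {frac_non:.1%}")
# -> STR: 95.3%, non-STR: 4.7%
```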

It is clear from previous scWGA kit comparison experiments, and from the data presented here, that there is no single winner in the race for the best scWGA kit; rather, several kits excel, depending on the category of interest. Overall, this comparative assay demonstrates a cost-effective benchmark for comparing different WGA kit properties in analyzed SCs and enables an educated selection of the WGA of choice, depending on the required application.
