
Versioning biological cells for trustworthy cell engineering

CellRepo is a cloud version control for engineering biology

We created a cloud-based community resource built on top of a modern software engineering stack for web applications. As in any cloud-based application, the user needs to register, providing a name, e-mail address and password. An avatar picture can be uploaded to personalize the experience and to be more recognizable by other users (e.g., collaborators). After registering, the user will receive a confirmation by e-mail. Finally, the user can sign in by typing the registration e-mail and password. Once a user signs in, they land on the homepage (Fig. 2a) that contains everything needed to build repositories of engineered strains, manage their accounts and the teams they work with. From the initial page, it is also possible to access the system documentation (“Knowledge Base”). The blue upper horizontal quick menu links to all the aforementioned features and is present on every page on the website. This menu also contains a search bar. This allows users to look for repositories and commits accessible to them (i.e., cell repositories they own, that belong to their teams or public cell repositories) and look for identifiers to find the documentation on specific strains/plasmids.

Fig. 2: CellRepo workspace.

a Homepage after a user signs in. From there, users can search and browse their own strain repositories or those in which they participate as team members. They also have access to any strain repository that has been made public. Users can also create new version control repositories, make commits to them, add new species, etc. The landing page also shows a recent activity registry of the users and the repositories they have access to. b Species search functionality: users can look up a species database and select the ones they want to use as a base for a cell engineering project. If a species is not available in the database, users can request that it be added.

The first step to start a repository is to select a species. The server is linked to up-to-date databases of organisms (Fig. 2b). This ensures that users can always use the species they need and that these are well documented. To make it easier to find the species they work with, users first pre-select them from the database and add them to their personal list of in-use organisms.

Repositories are projects or experiments (e.g., compound production, protein expression, etc.) and are usually linked to a specific species. Metadata such as the name and description of the project, as well as information about the purpose of a specific strain repository or how to use it, can be added. Repositories may have different “visibilities”: public (anyone can see the content of the repository), team (visible only to members of the same team or laboratory) and private (only the user can see it and add changes). A user may change the visibility of their repository at any time. A repository can have many branches and, in turn, branches are made of commits. The name of the initial branch can be set during the creation of the repository.
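The relationship between repositories, branches, commits and visibilities can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, field names and the `can_view` helper are hypothetical and are not taken from the CellRepo code base, and the sketch assumes that a repository owner can always see their own repository.

```python
from dataclasses import dataclass, field
from enum import Enum


class Visibility(Enum):
    PUBLIC = "public"    # anyone can see the repository
    TEAM = "team"        # only members of the same team can see it
    PRIVATE = "private"  # only the owning user can see it


@dataclass
class Commit:
    commit_id: str   # hypothetical identifier format
    message: str


@dataclass
class Branch:
    name: str
    commits: list[Commit] = field(default_factory=list)


@dataclass
class Repository:
    name: str
    species: str
    owner: str
    visibility: Visibility = Visibility.PRIVATE
    team_members: set[str] = field(default_factory=set)
    branches: dict[str, Branch] = field(default_factory=dict)

    def can_view(self, user: str) -> bool:
        """Return True if `user` may see this repository (assumed rules)."""
        if self.visibility is Visibility.PUBLIC:
            return True
        if user == self.owner:
            return True
        return self.visibility is Visibility.TEAM and user in self.team_members


# Example: a private repository with one branch and one commit.
repo = Repository(name="barcoding-demo", species="S. cerevisiae",
                  owner="alice", team_members={"bob"})
repo.branches["main"] = Branch("main", [Commit("c1", "Insert barcode")])
```

In this sketch, changing `repo.visibility` from `Visibility.PRIVATE` to `Visibility.TEAM` is enough to make the repository visible to `bob`, which mirrors the statement that visibility can be changed at any time.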

A repository (Fig. 3a) has a main or leader branch (named during the repository creation) and may have many other branches. Each branch represents a new direction or idea the users want to pursue in their cell engineering activities (e.g., novel protocols, a different order of gene edits, etc.).

Fig. 3: Cell repository details.

a Strain Engineering Repositories contain the entire digital footprint produced during the engineering of a cell line. The repository provides general information about the cell line project (in this example, the barcoding proof of principle of S. cerevisiae). It also contains all the different “commits” that were made during the engineering process. b A commit represents a related set of changes introduced into a cell line. All commits have a unique digital identifier, and some commits (decided by the user) may also have a physical identifier: a barcode that is inserted into the cell chromosome. Recovering the barcode by sequencing allows a cell engineer to recover the id of the commit containing the entire digital footprint of the cell line.

A commit (Fig. 3b) captures the status of the engineered strain at a specific point in time. The amount of information contained in a commit is up to the user (it can be as simple as a new strain name or as complex as a brand-new strain creation by modifying the genome and adding several documents). In addition, the commits are the containers of the uploaded documentation (which can be in the form of documents, models, sequences, etc.).

Once in a repository, the user can choose a branch to commit to. The “new commit” button opens a form in which various types of information can be entered. For example, the user can name and describe the commit (what is being done? why? what for?). Importantly, all types of documentation supporting the commit can be uploaded on this page, such as construct sequences, electrophoresis gel pictures, SBOL24 files, growth and fluorescence curves, sequencing results, automation worklist instructions, computer models, etc. The user can also provide genotype and phenotype information, the storage location of the strain, safety information, acceptable material transfer agreements for the strain, etc.

The user can choose the level of granularity of commits that best fits their laboratory practice, e.g., a commit might represent a single cell modification or multiple multi-loci genetic changes.

When creating a new commit, the user can decide whether the change is important enough (e.g., a milestone) to be physically barcoded into the cell. If that is the case, the system allows the generation of a unique barcode sequence. The barcode can then be synthesized and inserted into the strain. Once created, the commit will be linked unequivocally to the strain carrying the barcode sequence.

CellRepo allows users to be part of collaboration “teams” for cell engineering. Team members of a strain repository can make commits and add new branches to the cell line history. Furthermore, teams of researchers can share repositories, track strains and stay up to date on the experiments being carried out in their projects. Creating a new team is as easy as naming it and adding CellRepo users to it. Once a team is established, it is possible to see all its members and shared repositories and to keep track of the activity taking place in them.

In vivo barcoding experiments

Different barcoding protocols were assessed for E. coli, B. subtilis, S. albidoflavus, P. putida, S. cerevisiae and K. phaffii (previously known as P. pastoris); detailed step-by-step protocols can be found in the extensive Supplementary material. These protocols are used to introduce into the chromosome of the cell the barcodes that CellRepo generates automatically when a user creates a new commit in the version control system. CellRepo maps the unique commit identifier into DNA sequences that are then used as barcodes. All the tested barcoding procedures successfully barcoded the target species (Supplementary Table 5). URL links and QR codes for all CellRepo repositories for these experiments can be found in Supplementary Table 6.
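To make the idea of mapping a commit identifier to a DNA sequence concrete, the following is a minimal sketch of one possible reversible encoding. It is not CellRepo's actual scheme, which is not described in this section: the integer identifier, the base-4 mapping (A=0, C=1, G=2, T=3), the fixed length and the function names are all assumptions made for illustration.

```python
BASES = "ACGT"  # assumed mapping: A=0, C=1, G=2, T=3


def id_to_barcode(commit_id: int, length: int = 16) -> str:
    """Encode a non-negative integer as a fixed-length base-4 DNA string."""
    if commit_id < 0 or commit_id >= 4 ** length:
        raise ValueError("commit_id does not fit in the requested barcode length")
    digits = []
    for _ in range(length):
        # Peel off the least significant base-4 digit each iteration.
        commit_id, remainder = divmod(commit_id, 4)
        digits.append(BASES[remainder])
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(digits))


def barcode_to_id(barcode: str) -> int:
    """Decode a DNA string produced by id_to_barcode back to the integer."""
    value = 0
    for base in barcode:
        value = value * 4 + BASES.index(base)
    return value


barcode = id_to_barcode(27, length=4)
recovered = barcode_to_id(barcode)
```

The point of the sketch is the round trip: sequencing the barcode and decoding it recovers the identifier, which then points to the commit. A scheme used in practice would presumably need further properties that this sketch ignores, such as tolerance to sequencing errors.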

For all species tested, barcodes are genetically and physiologically innocuous and stable over a range of growth conditions

Barcoding a strain should have little to no effect on its growth profile; growth profiles of barcoded cells were compared to wild-type (i.e., non-barcoded) strains (Supplementary Fig. 5).

The six growth profiles show no significant differences between barcoded strains and the corresponding non-barcoded parental strains. This confirms that the barcode insertion has little effect on the growth of the different species.

We also evaluated whether the barcoding protocols introduced unplanned mutations in the recipient cell. This information can, for instance, help users choose one barcoding method over another. To do this, the whole genomes of the barcoded strains and of the wild-type strains used were sequenced. The results can be found in Supplementary Tables 7–12.

E. coli results show that the three clones barcoded using the Lambda-Red method had the same point mutation in an intergenic region (Supplementary Table 7). Either the initial colony chosen to start the insertion process already contained the mutation, or the mutation was acquired during the process. In any case, the mutation is intergenic and does not seem to affect the cells. In one of the clones barcoded using gRNA1, two point mutations appear in different CDS. Strains barcoded using gRNA2 do not show any mutation.

B. subtilis was barcoded using three different methods. Cells barcoded with either CRISPR (only one gRNA tested) or the toxin-mediated method show no mutations in any of the clones. One of the strains barcoded using Cre-Lox carries one mutation, and a different clone carries two different SNPs. All the mutations are in different CDS (Supplementary Table 8).

For P. putida, two clones were barcoded using the CRISPR/targetron system and both showed one mutation in different CDS (Supplementary Table 9).

S. cerevisiae was barcoded using two methods. In both of them, most of the mutations that appear are tandem repeat related. These could be acquired during the insertion process or could be sequencing artifacts related to this type of repetitive sequence. Two strains barcoded using Cre-Lox show single point mutations. Four mutations (tandem repeats) are observed in Strain 2 (CRISPR). A single point SNP can be observed in Strain 3 (Supplementary Table 10).

One of the two strains of K. phaffii shows two point mutations in intergenic regions (Supplementary Table 11).

Finally, S. albidoflavus was barcoded using CRISPR. NGS analysis of S. albidoflavus shows a larger number of variants. Strain 1 barcoded with gRNA1 shows two different base-pair changes in different CDS. However, no mutations were found for Strain 2 (gRNA1) (Supplementary Table 12). The three strains produced using gRNA2 carry six, three and two SNPs, respectively. In eukaryotes, CRISPR-caused double-strand breaks (DSB) can be repaired by non-homologous end joining (NHEJ) or, in the presence of a repair template, by homologous recombination (HR). NHEJ repair is usually imprecise and indels occur. In the case of S. cerevisiae, however, it has been observed that NHEJ hugely decreases cell survival and, when a repair template is provided, HR is the prevalent repair mechanism in this species25. Prokaryotes, on the other hand, usually lack NHEJ repair mechanisms. Nevertheless, NHEJ has been described in Streptomyces coelicolor, among other bacterial species. In this actinobacterium, closely related to S. albidoflavus, researchers knocked out genes using CRISPR without providing a repair template, allowing the native NHEJ system to repair the DSB incorrectly26. It is possible that the gRNAs caused CRISPR off-target activity that was repaired by NHEJ, causing mutations to appear. Together with the fact that the genome of S. albidoflavus has a high GC content and produced the worst sequencing quality of the analyzed species, this may explain the higher mutation count. Even though other mechanisms may explain the observed variation (see below), users wanting to use pSA-CRISPR-gRNA2 should bear these extra mutations in mind.

The NGS analysis shows that, for all the sequenced strains, the total number of mutations found with each method is low. The mutations are not consistent across the different replicates sent for sequencing. We hypothesize that the mutations (if not sequencing artifacts) are caused by the natural mutation rate in each species during several cycles of growth (both in liquid and solid media). This is supported by ref. 26. In that S. coelicolor CRISPR editing experiment, the control strain, which received an empty (no target) gRNA, carried a total of seven mutations, three of which were in coding regions. Similar results were found in a CRISPR experiment in S. cerevisiae, in which 10 SNPs were detected that were probably caused by the successive transformation rounds required for the experiment27.

Importantly, we found no structural variants in any of the sequenced strains.

The NGS analysis suggests that the barcoding procedures do not change the genome of the strains more than what would be expected while carrying out conventional genetic engineering protocols. CellRepo users can use this information to choose the barcoding method specific for each species that best fits them.

We also evaluated the stability of the barcode sequences under five different growth conditions. The barcodes were stable both in terms of presence (Supplementary Table 13) and sequence integrity after the long-term experiments (Supplementary Figs. 6–35). Finally, the use of barcoded strains as a way to track the dissemination of GMOs is described (Supplementary Fig. 36). In the particular case of a gene drive (which has been proposed as a solution to some infectious diseases transmitted to humans from animal and insect vectors), barcode sequences could pinpoint the source of modified organisms released (intentionally or accidentally) into the environment (29).

Barcode survival after long-term growth

Stationary phase mutagenesis occurs in microorganisms when they are deprived of nutrients. Mutations may arise without active cell division or global DNA replication28,29. This phenomenon has been demonstrated in E. coli30,31, B. subtilis32, P. putida33 and S. cerevisiae34,35. We therefore evaluated whether the introduced barcode DNA sequence is stable during continuous stationary phase growth and other non-exponential growth profiles that are common in laboratory and industrial processes, such as batch fermentation growth and restreaks on solid media.

To assess whether the barcode stays in the insertion site and whether its sequence is still retrievable even after long periods of growth, we ran five different experiments on all six species for 10 days.

As a preliminary experiment, for each condition, single colonies from the final day were restreaked and the barcode region was PCR amplified and sent for Sanger sequencing (Eurofins Genomics). We were able to confirm the barcode presence by PCR in all the colonies tested. Supplementary Table 13 describes in detail the sequencing results of this experiment.

To obtain a more thorough view of what happened to the barcode sequence during the long-term experiments, an NGS analysis of the purified PCR product of the barcoded region of the cell population in conditions 1–4 was performed.

Mutation analysis of the barcode sequences for all species can be found in Supplementary Figs. 6–35.

E. coli control (Supplementary Fig. 6) showed a base-pair change in 50% of the reads. To check if the glycerol stock had any mutation, ten colonies were isolated and the PCR product of the barcode region was sent to Sanger sequencing. No mutations were found. A point mutation in the initial PCR cycles of the reaction sent to NGS explains this result.

The percentage of reads showing either INDELs or base-pair changes stayed at the same value as the one observed in the control experiment (lower than 0.05%).
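The kind of percentage reported here can be illustrated with a deliberately simplified calculation. The sketch below assumes that each read covers exactly the barcode region and counts a read as a variant whenever it differs from the reference barcode in any way. It is not the analysis pipeline used in the study, which would involve read alignment and variant calling; the function name and the example sequences are hypothetical.

```python
def variant_read_percentage(reads: list[str], reference: str) -> float:
    """Percentage of reads that are not identical to the reference barcode."""
    if not reads:
        raise ValueError("no reads supplied")
    # Any substitution, insertion or deletion makes a read differ from the reference.
    variant_reads = sum(1 for read in reads if read != reference)
    return 100.0 * variant_reads / len(reads)


# Hypothetical example: 10,000 reads, one of which carries a base-pair change.
reference = "ACGTACGTAC"
reads = [reference] * 9999 + ["ACGTACGTAA"]
percentage = variant_read_percentage(reads, reference)
```

With one variant read out of 10,000, the function returns 0.01%, which would fall below the 0.05% level mentioned above.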

Both the single-colony and the population-level experiments show that the barcode was still present after the long-term incubation period and that the sequence was stable under all five experimental conditions.

Barcodes provide a backtrack signal for GMO dispersion experiments

Gene drive technology allows researchers to propagate a specific genetic modification through a population36,37. The scientific community needs to assess the risk of this kind of research. Barcode sequences can help in this matter by uniquely identifying the laboratory where a gene drive experiment was carried out, the purpose of the modification and any other relevant data (e.g., safety measures implemented).

Supplementary Fig. 36a graphically describes the gene drive molecular mechanism. The barcode identifier was coupled with the intended modification (ADE2 deletion cassette). Supplementary Fig. 36b shows that the barcoded cells (red pigment) can survive in SC-Uracil media. Haploid cells coming from the unmodified parent cell show red pigment only when the pCas9 plasmid was also present. In all cases, it was possible to PCR amplify and sequence the barcode from each haploid individual.
