Massively parallel characterization of engineered transcript isoforms using direct RNA sequencing

Designing transcriptional valves to control transcript isoforms

To demonstrate how transcriptional valves might be built, we attempted to construct proof-of-concept designs for T7 RNAP. T7 RNAP was selected due to its broad use in synthetic biology, which stems from the fact that it is a single-subunit RNAP with high processivity, making it ideal for both in vitro use²⁵ as well as an orthogonal transcription system in vivo^26,27. While diverse terminators are available for the native E. coli RNAP^7,16, for T7 RNAP only a single terminator exists in the T7 phage genome²⁸ and only a few alternatives have been characterized^22,29,30. Furthermore, while termination of RNAP in model microorganisms like E. coli and S. cerevisiae has been extensively studied³¹, T7 RNAP termination has many unknowns such as alternative intrinsic terminators beyond those in the T7 phage genome and the bidirectionality of termination.

Specific features of a terminator, such as its hairpin structure and U-tract, can strongly influence termination efficiency^7,32. Therefore, as a basis for an initial library of transcriptional valves, we chose 13 different intrinsic terminators to act as “core terminator” elements (T). We began by selecting the single terminator from the T7 phage genome (T27)²⁸, which has previously been characterized in vitro²². To test its possible bidirectionality, a feature that terminators for other RNAPs have been shown to exhibit⁷, it was included in a reverse orientation in our designs²². Beyond the native T7 phage terminator, E. coli terminators present another source of these parts and have been shown to terminate T7 RNAP in vitro^22,33. Therefore, 11 intrinsic rho-independent terminators were selected from the E. coli genome spanning a wide range of termination efficiencies in vivo⁷. Finally, a negative control terminator (T33) was designed. This consisted of a random non-coding sequence generated by R2oDNA designer³⁴ and was further verified to not contain a strong hairpin in the mRNA secondary structure using the Vienna RNAfold tool³⁵.

The genetic sequence immediately upstream of a terminator-hairpin also influences termination^16,17 and we reasoned that this region could be used to fine-tune the termination efficiency of a valve. We, therefore, included a “modifier” part (M) in our transcriptional valve design and developed 13 different modifier sequences (Fig. 1b). Our first set of modifiers contained motifs designed to interact with canonical regions of a terminator hairpin sequence. Specifically, modifiers M10 and M11 were designed to interact with possible U- and A-tracts within a terminator by containing complementary homopolymers of adenine or uracil, respectively⁷. A further modifier M12 was designed to encode a small RNA secondary structure with the goal of affecting RNA structure formation near the terminator part. Beyond tuning termination efficiency with RNA interactions and structures, it has been shown that inert random sequences can play an insulating role, improving the robustness of a genetic part’s performance when used in different genetic contexts^17,36. It is also known that the upstream genetic context of intrinsic terminators influences termination in a distance-dependent manner^16,17. Thus, we decided to include a selection of modifiers of different lengths (M13–M16: 15 bp, M17–M19: 30 bp, and M20–M22: 45 bp) where each was a random non-coding sequence generated by R2oDNA designer³⁴.

To assess the robustness of each valve’s termination efficiency to local upstream genetic context, our library also included “spacer” elements (S) (Fig. 1b). These did not form part of the transcriptional valve, but instead allowed us to see how a particular valve might behave when used in combination with other components (e.g., coding regions). Using the NullSeq tool³⁷, we generated 7 random and genetically diverse 33 bp long spacers with a nucleotide composition similar to coding regions of E. coli that could be placed at the 5′ end of a valve. Each spacer had a stop codon “TAA” at its 3′-end, though this was not utilized in our in vitro transcription assay. Taken together, our spacers, modifiers, and core terminators could be combinatorially assembled to create a total of 1183 unique designs able to regulate RNAP flux and provide valuable information regarding the design principles of transcriptional valves.
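The library size follows directly from the part counts given above (7 spacers, 13 modifiers, 13 core terminators). The short sketch below enumerates the combinations; the part names are placeholders chosen for illustration, and only the counts are taken from the text.

```python
from itertools import product

# Placeholder part names: only the number of parts in each class comes from the text.
spacers = [f"S{i}" for i in range(1, 8)]          # 7 spacers (hypothetical labels)
modifiers = [f"M{i}" for i in range(10, 23)]      # 13 modifiers, M10..M22
terminators = [f"T{i}" for i in range(1, 14)]     # 13 core terminators (hypothetical labels)

# Every design is one spacer, one modifier and one core terminator, in that order.
designs = [f"{s}-{m}-{t}" for s, m, t in product(spacers, modifiers, terminators)]
n_designs = len(designs)
print(n_designs)  # 7 * 13 * 13 = 1183
```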

Combinatorial assembly of a transcriptional valve library

A one-pot pooled combinatorial DNA assembly method was used to physically construct the final set of transcriptional valve designs (Fig. 1c, see “Methods” section)³⁸. E. coli cells were transformed with this pooled library and ~500,000 colonies (>400-fold library coverage) were selected from plates via scraping before their pooled DNA was extracted. Such a high fold-coverage ensured representation of each design in the sample (Supplementary Note 1)³⁹.
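As a rough illustration of why this coverage is considered sufficient, the sketch below estimates the chance that a given design is absent from the pool. It assumes every colony carries a design drawn uniformly at random, which is an idealization (the measured design distribution was uneven) and is not the calculation given in Supplementary Note 1.

```python
import math

n_designs = 1183       # library size stated in the text
n_colonies = 500_000   # approximate number of colonies stated in the text

fold_coverage = n_colonies / n_designs

# Probability that one particular design is never picked, assuming each colony
# independently carries a uniformly random design (an illustrative assumption).
p_missing = math.exp(n_colonies * math.log1p(-1.0 / n_designs))

# Expected number of designs absent from the pool under the same assumption.
expected_missing = n_designs * p_missing

print(f"fold coverage: {fold_coverage:.0f}x")
print(f"P(a given design is missing): {p_missing:.3g}")
```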

Nanopore-based long-read DNA sequencing (DNA-seq) was then used to verify the successful combinatorial assembly of every design. This showed that all designs were present with an even distribution of parts but an uneven distribution of designs (Supplementary Fig. 1). Part frequencies matched the ratios expected from the equiprobable assembly, except for short parts 15 bp long which were under-represented. This may be due to reduced assembly efficiency for shorter parts. These part frequencies were used to predict design frequencies, of which 91 had >20% absolute deviation between predicted and measured frequency. Certain parts were overrepresented in these designs (M14: 3-fold and T18: 2.3-fold) indicating that the abundance of constituent parts did not solely dictate design abundance within the library.
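The comparison between predicted and measured design frequencies can be sketched as follows. This assumes that a design's predicted frequency is the product of its part frequencies (independent assembly) and that the deviation is the absolute difference relative to the prediction; both are our reading of the text, and all frequencies shown are invented for illustration.

```python
from math import prod

def predicted_design_frequency(part_freqs, design):
    """Predicted frequency of a design as the product of its part frequencies."""
    return prod(part_freqs[part] for part in design)

def relative_deviation(predicted, measured):
    """Absolute deviation of the measured frequency, relative to the prediction."""
    return abs(measured - predicted) / predicted

# Invented part frequencies; each part class sums to 1.
part_freqs = {"S1": 0.5, "S2": 0.5, "M10": 0.25, "M11": 0.75, "T10": 0.5, "T16": 0.5}

predicted = predicted_design_frequency(part_freqs, ("S1", "M10", "T10"))
deviation = relative_deviation(predicted, measured=0.08)
flagged = deviation > 0.20  # threshold of 20% taken from the text
print(predicted, round(deviation, 2), flagged)
```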

Finally, we investigated DNA assembly fidelity by generating accurate consensus sequences from the long-read DNA-seq data (see “Methods” section). Comparing the reference and consensus sequence for each design we found a mean of 0.6 single nucleotide polymorphisms (SNPs) per design, with 40% of designs having no SNPs (Supplementary Fig. 1).

Pooled characterization using direct RNA sequencing

Existing fluorescence⁷ and sequencing-based^18,19,40 methods for measuring transcriptional termination are ill-suited to characterizing a large pooled library of genetic parts at nucleotide resolution. Therefore, we used nanopore-based direct RNA sequencing (dRNA-seq) to provide full-length reads of transcript sequences⁴¹. Crucially, each transcript isoform encodes its associated transcriptional valve sequence either at the 3′-end, if termination was successful, or within the body of the transcript, if transcriptional read-through had occurred. The design’s sequence, therefore, acts as an ‘intrinsic barcode’ that is present in every read. This allowed for individual reads to be attributed to a particular design without the need to separate and barcode them before preparing the sequencing library.

Such an approach is not possible when using more common short-read RNA-seq because the transcriptional valve sequence for reads generated by read-through events will not always be located near the transcript end, and so would not be captured by the short read length if sequenced using normal approaches. Targeted short read sequencing could potentially be used to overcome this issue but would suffer from biases present during the reverse transcription (RT) step and subsequent PCR amplification^42,43,44. Long-read dRNA-seq allows for the whole library to be directly assayed as a single pooled sample, without the need for RT, PCR or any assembly, and the data demultiplexed to simultaneously produce separate read depth profiles for each design. Finally, by comparing the ratio of transcript isoforms for each design (i.e., read depth directly before and after the transcriptional valve) a termination efficiency can be calculated (Fig. 1c).

While this approach removes the need to separate and attach unique ‘barcode’ sequences to each design when characterizing the library, it does rely on each transcriptional valve having a sufficiently different sequence for each read to be accurately mapped to a single design. Read accuracy for nanopore-based dRNA-seq is, at present, lower than for standard Illumina-based short-read RNA-seq (median read accuracy of 80–90% versus >99.9%, respectively⁴⁵) and although this gap is closing with improvements to basecallers and sequencing chemistries, analysis pipelines need to be carefully tuned and validated to ensure accurate demultiplexing of reads when dRNA-seq is used in this way.

Optimizing the computational analysis pipeline

To optimize the analysis pipeline and ensure that our library could be accurately characterized, we developed a simple computational model to simulate the error-ridden reads that would be obtained after dRNA-seq. In our simulations, errors took the form of random nucleotide substitutions that occurred at a 15% substitution frequency. While other types of error such as insertions, deletions and elevated error rates at homopolymers were not included in our model, we found that our simulations were able to identify key parameters and criteria for designing parts with a sufficient dissimilarity for effective demultiplexing.
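A minimal sketch of such an error model is shown below. It is not the authors' implementation: the function name is hypothetical, and we assume that a substitution always replaces a base with one of the three other bases, chosen uniformly.

```python
import random

def simulate_read(sequence, sub_rate=0.15, rng=None):
    """Return a copy of an RNA sequence with random substitutions.

    Each position is substituted independently with probability sub_rate.
    Insertions, deletions and homopolymer effects are not modeled.
    """
    rng = rng or random.Random()
    bases = "ACGU"
    read = []
    for base in sequence:
        if rng.random() < sub_rate:
            # Assumption: a substitution always changes the base.
            read.append(rng.choice([b for b in bases if b != base]))
        else:
            read.append(base)
    return "".join(read)

# Example: simulate one noisy read from a short, made-up sequence.
example_read = simulate_read("GGGAGACCACAACGGUUUCCC", rng=random.Random(0))
print(example_read)
```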

To demultiplex the dRNA-seq reads, the BLASTN tool was used to find all possible alignments between a read and the library of designs, with the best matching design being chosen^46,47. Optimizing the BLASTN parameters is crucial for accurate characterization and so computational analyses were performed where a smaller library (540 designs, Supplementary Fig. 2) was used to systematically explore the role of each BLASTN parameter. Each design was given a random termination efficiency and had a set of full-length terminated and non-terminated reads generated based on our model. These reads were then pooled for all the designs and attempts made to demultiplex and infer the original termination efficiencies for each design. This allowed us to generate an optimized set of parameters that allows each design to be accurately identified (see “Methods” section).
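The idea of assigning each read to its best-matching design can be illustrated with a toy stand-in for the alignment step. The sketch below scores a read by the number of k-mers it shares with each design; this is a deliberately simplified substitute for BLASTN, not the method used here, and the sequences and names are invented.

```python
def kmers(seq, k=6):
    """Set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def assign_read(read, designs, k=6):
    """Assign a read to the design sharing the most k-mers with it.

    Returns (design_name, score). A simplified stand-in for choosing the
    best-scoring alignment.
    """
    read_kmers = kmers(read, k)
    scores = {name: len(read_kmers & kmers(seq, k)) for name, seq in designs.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Invented valve sequences for illustration only.
designs = {
    "valve_A": "ACGUACGGAUCCGAUAGCUAGGCUA",
    "valve_B": "UUGACCAGUCAGGCAUUCGAGCCAU",
}

# A read containing valve_A with one substitution, flanked by other sequence.
read = "GGG" + designs["valve_A"][:12] + "C" + designs["valve_A"][13:] + "CCC"
best, score = assign_read(read, designs)
print(best, score)
```

Because the valve sequence appears in both terminated and read-through transcripts, the same scoring applies to either isoform.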

The final computational demultiplexing and analysis pipeline (Fig. 1d) involved aligning the sequences of all designs against all reads using BLASTN with optimized parameters and then associating each read with the design that had the best alignment score (see “Methods” section). Reads for each design were then mapped to the appropriate reference sequence and design-specific read depth profiles generated, filtering out any reads where no terminator was present after alignment and mapping. Finally, termination efficiencies were calculated for each read depth profile as T_e = [R(x_s) − R(x_e)]/R(x_s), where R(x) is the read depth at position x in the genetic construct, and x_s and x_e are the start and end nucleotide position of the transcriptional valve, respectively.
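The termination efficiency formula above can be written as a short function. The read depth values in the example are invented, and positions are treated as 0-based list indices for convenience.

```python
def termination_efficiency(read_depth, x_s, x_e):
    """T_e = [R(x_s) - R(x_e)] / R(x_s).

    read_depth: per-nucleotide read depth R(x) for one design.
    x_s, x_e: start and end positions of the transcriptional valve (0-based here).
    """
    r_start = read_depth[x_s]
    if r_start == 0:
        raise ValueError("no reads cover the start of the valve")
    return (r_start - read_depth[x_e]) / r_start

# Invented profile: depth falls from 100 to 25 across the valve.
depth = [100, 100, 100, 80, 40, 25, 25]
t_e = termination_efficiency(depth, x_s=2, x_e=5)
print(t_e)  # 0.75
```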

Characterizing transcription termination at nucleotide resolution

In vitro transcription using T7 RNA polymerase of the entire pool of DNA constructs followed by dRNA-seq enabled us to rapidly assay the performance of each design simultaneously. To ensure the accuracy of our measurements, a detailed analysis of the generated read depth profiles was performed, which revealed several key features in line with other dRNA-seq studies (Supplementary Note 2)⁴⁸. We also developed a mathematical model that allowed us to correct for unwanted deviations between actual and measured T_e (Supplementary Note 3; Supplementary Figs. 3–6). We found that transcript abundances were weakly correlated with DNA construct frequencies (R² = 0.22), with strong terminators over-represented in the dRNA-seq data. Good reproducibility was observed for T_e values between replicates (R² = 0.99) and for terminators shared across separately assembled libraries with different part compositions (Supplementary Fig. 7). This resulted in 98% of designs having a difference of <5% in T_e across the experimental replicates of our initial library.

A valuable feature of part characterization by RNA-seq is the ability to extract nucleotide resolution insights from the read depth profiles. To enable comparisons between our designs where total numbers of reads for each varied, we generated profiles normalized by the read depth at the start of the transcriptional valve such that drops due to termination corresponded to a fractional change (Fig. 2a). We also calculated Δ-values corresponding to the change in normalized read depth at each nucleotide position with respect to the previous nucleotide, enabling us to pinpoint and compare changes more easily. We found that the maximum Δ-value for each design was proportional to its T_e and amounted to approximately 40% of the total T_e value. Each terminator maintained a predominant termination pattern (as shown by its Δ-profile), which varied in amplitude depending on the upstream modifier and spacer (Fig. 2a). The ability to observe these nucleotide resolution changes demonstrates a further benefit of the pooled dRNA-seq over more commonly used methods based on fluorescent reporter proteins.
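The normalization and Δ-values described above can be sketched as follows. We assume here that a Δ-value is reported as the drop in normalized read depth relative to the previous nucleotide (so termination gives positive values); the sign convention and the example profile are our assumptions.

```python
def normalized_profile(read_depth, x_s):
    """Read depth divided by the depth at the start of the valve, x_s."""
    return [d / read_depth[x_s] for d in read_depth]

def delta_profile(normalized):
    """Drop in normalized depth at each position relative to the previous one."""
    return [prev - cur for prev, cur in zip(normalized, normalized[1:])]

# Invented read depth profile, with the valve starting at position 0.
depth = [100, 100, 80, 40, 25, 25]
norm = normalized_profile(depth, x_s=0)
deltas = delta_profile(norm)

max_delta = max(deltas)      # largest single-nucleotide drop
total_drop = sum(deltas)     # drops sum to the overall fractional change
print(max_delta, total_drop)
```

Because the drops telescope, their sum across the valve equals the T_e of the design, which is why the largest Δ-value can be expressed as a fraction of T_e.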

**Fig. 2: Nucleotide resolution read depth profiles reveal terminator phenotypes.**

The termination pattern is an important phenotype and we found that termination does not occur at a single nucleotide location; for each of our core terminators, it occurred over several nucleotides. While the T_e of a terminator did vary across genetic contexts, in general, the termination pattern remained consistent. These patterns revealed that termination often fluctuates nucleotide by nucleotide, resulting in multiple drops in the Δ-profiles and therefore multiple transcript isoforms.

As expected, drops in read depth for each valve occurred within the corresponding U-tract (Fig. 2b). We found that termination was possible with as few as 2 U’s, but that maximum drops in the profiles occurred at a similar point (after 4 or 5 U’s) for the stronger core-terminators. This likely captures a position where a combination of optimal T7 RNAP pausing and weakened stability of the transcription elongation complex leads to the formation of the core terminator hairpin sufficient to effectively facilitate termination. The number of U’s in the U-tract at the point of maximal termination showed a correlation with T_e, although there were some outliers (T20, T14; Fig. 2c). This matches a previous finding for E. coli RNAP termination⁷. However, termination was found to always reach a peak with a U-tract that comprises fewer than the maximum possible number of U’s in the U-tract.

Using this data, it was possible to predict RNA secondary structures at the various points of termination and assess their potential influence. To do this we simulated co-transcriptional folding⁴⁹ of the terminated sequence at the point of maximal termination (Fig. 2d), assuming a previously reported transcription rate of 333 nt/s⁵⁰ (see “Methods” section). We removed the final 8 nt, which have been shown to base-pair with the DNA template in the T7 RNAP transcription elongation complex⁵¹, and studied the structure with the lowest folding energy. The strongest terminators were predicted to form terminating hairpins at their 3′-end, proximal to the elongating T7 RNAP. On the contrary, T20 was found to get locked in a secondary structure involving a hairpin ending 16 nt upstream of the T7 RNAP, meaning that it remained a weak terminator despite having a long U-tract. To reach an effective terminating hairpin T20 would have to surpass a large energy barrier.

There were three inactive terminators (T17, T21, T27) and the negative control (T33). Each of these could have a maximum of 4 U’s in the U-tract and none were predicted to form a hairpin proximal to the U-tract, which is likely the cause of their inactivity. This showed that T27, the reverse oriented phage terminator, was not able to efficiently terminate T7 RNAP bidirectionally. The three weakest active terminators (T14, T18, and T15) all had only 2 U’s in the U-tract of their most common transcript isoform, saw termination at multiple separated points, and were predicted to form a hairpin proximal to the U-tract (Fig. 2d). It is not clear why T14 gives stronger termination than T15 and T18, though one element that might increase T_e is the unique double hairpin structure that can form in the core terminator at the later points of termination. The second peak in termination for T14 coincides with the last point at which this double hairpin is predicted to exist (Fig. 2e). T15 and T18, on the other hand, are predicted to form a single long hairpin structure.

An ability to engineer the precise point(s) of termination, and therefore the dominant 3′-UTR sequences, may be important in deciding gene stoichiometries in vivo as it could potentially affect mRNA degradation rates, although this is contested^52,53. To assess this, we reviewed the possible transcripts produced by the characterized terminators, which showed that the final nucleotide of transcript variants can be A, C, U or G. Frequently the dominant transcript terminated within a stretch of U’s, though less frequently observed transcripts were found to terminate immediately after a stretch of U’s. Further investigation revealed that in some cases, upstream sequence could tune the major point of termination. For example, modifiers M10, M11, M12, and M15 showed different stoichiometries of two types of transcripts produced by T13.

We also undertook an analysis of other general biophysical parameters that may play a role in termination (e.g., GC content and minimum free folding energy). However, none of these were correlated with measured T_e (Supplementary Fig. 8).

General termination properties of the initial valve library

Overall, T_e varied from 0 to 0.94 across the library with core terminators displaying varying levels of T_e and sensitivity to different modifier and spacer parts (Fig. 3). Grouping designs by their core terminator showed that each had a unique median T_e and variability that differed between terminators (Fig. 3b). We found that valves containing the non-terminator part (T33), reverse oriented T7 phage terminator (T27) and two of the E. coli terminators (T21 and T17) showed little to no termination (T_e < 0.05). The remaining nine E. coli terminators displayed a range of termination efficiencies for T7 RNAP with median T_e varying from 0.01 to 0.91, which was heavily influenced by upstream sequence context. The variety of T_e values observed would allow for a wide range of transcript isoform stoichiometries to be produced from 1:1 to 11:1. However, we were interested to know if patterns within this data might offer insight into the capacity of each terminator to be tuned or insulated. For example, valves displaying a wide range of T_e values for the same core terminator would indicate that the terminator is highly tunable, while a small range of T_e for a valve used with differing spacer elements would suggest that it is able to insulate its function from upstream sequence context.

**Fig. 3: Characterization of a T7 RNA polymerase transcriptional valve library.**

Tuning the strength of transcriptional valves

It is known that local sequence context can also be used to effectively alter the function of many types of genetic part^8,54,55,56. We, therefore, designed modifiers in our initial library with the aim of being able to tune the strength of a valve. Analysis of the characterization data showed that changes in upstream genetic context (both spacer and particularly modifier sequences) could significantly influence termination strength, allowing T_e to be varied over a range of up to 0.68. The ability to tune each core terminator varied, with the T_e of T10 being most tunable and T16 being the least. The capacity to tune terminator strength could arise from the diversity of co-transcriptional structures that form proximal to the U-tract when interacting with the modifier (Fig. 2d). Therefore, to create a library of highly tuned transcriptional valves, it is important to ensure the core terminators are themselves tunable.

Large variability in the magnitude of tuning was seen across the different valves we tested suggesting that sequence-specific features play a key role in modulating the precise termination efficiency. Spacers were found to not have such a systematic effect. Nonetheless, some valves were highly influenced by spacers, emphasizing the importance of upstream sequence in the region up to 120 nt upstream of the point of termination. For designs grouped by spacer, the median percentage deviation from the median T_e of the valve they contained was found to be less than 5% (Fig. 3c), suggesting that tuning of termination efficiency is best achieved by varying sequence context close to the core terminator part.
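One way to compute the spacer statistic described above is sketched below. It reflects our reading of the text (for each design, the percentage deviation of its T_e from the median T_e of its valve, then the median of these deviations per spacer); the function name and all T_e values are invented.

```python
from collections import defaultdict
from statistics import median

def median_deviation_by_spacer(te):
    """te maps (spacer, valve) -> T_e; returns spacer -> median % deviation."""
    by_valve = defaultdict(list)
    for (_spacer, valve), value in te.items():
        by_valve[valve].append(value)
    valve_median = {valve: median(values) for valve, values in by_valve.items()}

    deviations = defaultdict(list)
    for (spacer, valve), value in te.items():
        ref = valve_median[valve]
        if ref == 0:
            continue  # skip inactive valves to avoid dividing by zero
        deviations[spacer].append(100.0 * (value - ref) / ref)
    return {spacer: median(values) for spacer, values in deviations.items()}

# Invented measurements for two valves (V1, V2) behind three spacers.
te = {
    ("S1", "V1"): 0.50, ("S2", "V1"): 0.52, ("S3", "V1"): 0.47,
    ("S1", "V2"): 0.80, ("S2", "V2"): 0.84, ("S3", "V2"): 0.76,
}
result = median_deviation_by_spacer(te)
print(result)
```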

In general, each modifier tuned each terminator in a different way. However, some modifiers were found to have a similar tuning effect across many different core terminators (Fig. 3c). Some had a generally positive influence (e.g., M21) or negative influence (e.g., M20). Furthermore, the U- and A-tract interactors generally exerted opposite tuning effects on stronger terminators and weaker terminators, tuning them up and down in strength, respectively. Therefore, when tuning a T7 RNAP terminator, while bespoke modifiers are likely required, our library offers some starting points for features that are likely to have a desired effect.

Insulating transcriptional valves from local genetic context

It has been shown for many types of genetic part that more reliable performance can be achieved by inserting random non-coding sequences around a part to insulate its function from potential interactions with other nearby sequences^8,17,36. Our library specifically included random non-coding modifiers of varying length to assess the insulating effects for transcriptional valves. In general, we found that an increase in the length of these modifiers led to a reduction in T_e variability when an identical valve design was used across numerous genetic contexts (i.e., upstream spacer sequences; Fig. 3d). This matches findings for bacterial promoters and terminators where longer upstream insulating sequences resulted in more predictable gene expression^17,36. Notably, these effects were also terminator-specific, with some core terminators showing more predictable behavior across modifiers (e.g., T16) than others (e.g., T10) (Fig. 3f). This suggests that some terminators are better suited to tuning T7 RNAP in vitro, while others are better placed to maintain a consistent termination efficiency.

Exploring modifier-terminator base-pairing

It was evident from this initial library that the general modifiers we had used limited our ability to understand the role of key interactions between the modifier and terminator parts because no terminator-specific interactions had been designed. To rectify this, a further library was built to understand the effect of the modifier region upstream of the core-terminator in a more comprehensive way (Fig. 4a). Informed by our findings that longer modifiers were better insulators of T_e (Fig. 3d), we designed all new modifiers to be 45 nt long to enhance the robustness of the valves’ function across different genetic contexts. Co-transcriptional folding simulations had highlighted that the sequences we had designed to interact with the U-tract and A-tract were insufficient. These modifiers seldom formed structures that would influence termination by virtue of base-pairing because the A-tract is often short (<6 consecutive A’s) and, at the point of termination, the U-tract is sequestered by the RNAP. Therefore, we designed modifier sequences that would target specific sequences within three strong core terminators (T29, T16, T10).

**Fig. 4: Engineering modifiers that tune and insulate core terminators.**

Motifs containing an 8 nt reverse complement sequence of three different regions of the core terminators were designed into modifiers. Gaps were filled with non-structural RNA sequences (see “Methods” section). While ideally we would have used identical padding sequences, we instead chose padding sequences with identical RNA secondary structure (i.e., no predicted structure) as these unique sequences would help ensure accurate read demultiplexing after nanopore sequencing. These motifs targeted the 5′-stem, loop and 3′-stem regions of the terminator hairpin (Fig. 4a). Two variants of each motif were designed to explore the distance dependence of the engineered motifs: “near” which was incorporated into the modifier at its 3′-end (~30 nt from the U-tract), and “far” which was incorporated at the 5′-end of the modifier (~70 nt from the U-tract).
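Deriving an 8 nt reverse complement motif for a chosen region of a terminator can be sketched as below. The helper names and the hairpin sequence are invented for illustration; only the motif length of 8 nt and the use of the reverse complement come from the text.

```python
def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq))

def interactor_motif(terminator, start, length=8):
    """Motif expected to base-pair with terminator[start:start + length]."""
    return reverse_complement_rna(terminator[start:start + length])

# Invented hairpin: 5' stem, UUCG loop, 3' stem.
hairpin = "GGCUCACC" + "UUCG" + "GGUGAGCC"

# Motif targeting the first 8 nt (the 5' stem of this made-up hairpin).
motif = interactor_motif(hairpin, start=0)
print(motif)  # GGUGAGCC
```

In this made-up hairpin the motif against the 5′-stem is identical to the 3′-stem, which illustrates why such a motif could compete with the terminator's own stem for base-pairing.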

Characterization of this library revealed that base-pairing can significantly reduce termination efficiency in a distance-dependent manner (Fig. 4b). The motifs designed to base-pair with the loop consistently caused a reduction (13–16%) in T_e. Loop-modifier interaction during termination could alter the core terminator hairpin during its formation, at the point of termination, or both, affecting the probability of a termination event. The largest effect on T_e was caused by motifs designed to interact with the stem of the terminator. We hypothesize that this is because core terminator stem base-pairing is essential for hairpin formation, whereas the loop can base-pair with an interacting motif at the same time as the hairpin is completing. Therefore, a motif that can base-pair with the stem could outcompete the core terminator hairpin.

For the two strongest terminators (T29, T16) the 5′-stem interactor caused a large drop in T_e, while the 3′-stem interactor did not. In the case of T16 this is likely because 5 of the 8 targeted nucleotides are predicted to be concealed within the T7 RNAP (2 nt for T29 and 3 nt for T10), where they cannot base-pair at the point of termination since they are in the U-tract. This effect on T_e was greater than any other drop caused by a modifier tested so far. As with previous modifiers, T10 behaved differently to T16 and T29. The 3′-stem interactor had a large effect on termination, while the 5′-stem interactor did not. This could be a consequence of the extra native sequence context between the hairpin and the motifs (5 nt and 3 nt more than T29 and T16, respectively) resulting in a location in which the motif can base-pair with the 3′-stem. However, it may also arise from the inherent tunability of T10. In either case, the effect on T_e of motifs that base-pair with terminators diminishes when they are placed far from the terminator.

Exploring structural interactions

It has been shown that structure can stabilize the activity of genetic parts^1,57. Therefore, we looked to further investigate whether RNA structure could insulate terminator function by designing and testing a library of modifiers containing secondary structure motifs (Fig. 4a). To explore the effect of upstream structure on termination, we designed three short (stem length 3 nt) and three long (stem length 6 nt) hairpins. The sets of short and long hairpins contained one of three loops (UUCG, GAAA, GAGA) known to facilitate strong hairpin formation⁵⁸. Furthermore, modifiers were designed with all of these secondary structures at both the 5′-end and 3′-end to test the distance-dependence of secondary structure influence on termination efficiency. Again, gaps were filled with non-structural RNA sequences (see “Methods” section). A variety of other more complex RNA secondary structures are known and one modifier containing an “elbow” (the TAR element) and one containing a pseudoknot were also designed to see any role these might play⁵⁹.

After assembling and characterizing this new library, we were able to confirm that RNA secondary structure upstream of terminators affected the robustness of terminator function with T7 RNAP (Fig. 4c), similar to that reported previously for E. coli RNAP⁶⁰. We found that short hairpin structures and complex RNA structures were the best insulators of terminator function (Fig. 4d), while long hairpin structures made termination efficiency more sensitive to upstream genetic context. The rigid requirements of short hairpin formation mean that they are likely rarely influenced by base-pairing with upstream structure. This would explain why they are good insulators since they offer dependable upstream secondary structure that does not base-pair or interact structurally with the terminator hairpin. In contrast, since long hairpins can form with a variety of stem lengths they could influence and be influenced by neighboring sequences. The resultant diversity of secondary structures that can then arise upstream of the terminator hairpin would mean that these modifiers significantly affect T_e and therefore act as poor insulators, as seen in our results.

Since we only tested stronger terminators, conclusions cannot be drawn on the capacity of base-pairing and structural sequence motifs to increase T_e. Nonetheless, these results revealed motifs that could alter the T_e or robustness of terminator function and therefore should be considered when designing genetic circuits that involve uncharacterized gene-terminator combinations.

Understanding core terminator design principles

Our results had shown that many of the relationships observed were terminator dependent, and so a final library was designed and tested to investigate variations of the core terminator part that had the greatest influence on T_e (Fig. 5a). We constructed a library of designs including U-tract variants, native context variants, and terminators from diverse organisms. This set of core-terminators was assembled and tested in just one upstream genetic context (the gfp gene) and so our analysis is focused on understanding the influence of sequence context proximal to the core-terminator hairpin.

**Fig. 5: Exploring design features of the core terminator.**

Analysis showed that the T_e of weak core-terminators used in our initial library could be increased by increasing the number of U’s in the U-tract (Fig. 5b). Our initial results indicated that a U-tract of at least 5 U’s consistently resulted in termination (Fig. 2c). Therefore, we re-engineered weaker core-terminators so that they contained a U-tract of length 8 nt. This increased T_e to varying extents (Fig. 5b). Despite each of these newly designed terminators having the possibility to terminate at a U-tract complete with 8 U’s, the dominant transcript isoform had only 4 or 5 U’s in each case. Nonetheless, the increase in T_e showed some correlation with the number of extra U’s in the dominant U-tract (Supplementary Fig. 9). For active terminators, the largest increases in termination occurred when additional U’s increased the number of U’s in the dominant transcript isoform to 4 or 5 (T14, T15, T18). Finally, in two cases (T17 and T21), termination of inactive terminators was found to be rescued.

The location of termination, and thus the specific transcript produced, consistently changed following these modifications (Fig. 5b). For these designs, transcript isoforms became either 1–2 nt shorter or longer. The location of termination and termination prior to a complete “UUUUUUUU” U-tract may arise since the extra U’s were not added to the dominant transcript variant. Instead, they were added at positions that would give a sequence of 8 U’s with the minimal number of single nucleotide substitutions. Therefore, to generate strong terminators from a template sequence, our results suggest first characterizing the dominant transcript form and then increasing the number of U’s within its U-tract.

To complement the modifier library, the same core terminators (T10, T16, T29) were used as a basis for various U-tracts containing no U’s. Variants of each of these core terminators with an 8 nt tract of A, C, or G were designed. To ensure their transcripts could be distinguished after dRNA-seq we put unique non-structural RNA sequences as barcodes upstream of the core terminators (see “Methods” section). These variants were inspired by data showing that T7 RNAP can slip and terminate at sites of 8 consecutive A’s in vitro^59,61. However, we found this not to be the case for T29, T16, T10, or the T7 phage T-theta terminator (Fig. 5c). The poly-C tract showed very weak termination (T_e < 0.1). For our previous designs, some native sequence context immediately upstream of the terminator hairpin (and before the modifier) is retained. Changing this decreased T_e for all the terminators studied (Fig. 5c), indicating that to maintain T_e, there is an optimal position upstream of core terminators to add modifiers.

Characterization of the T7 phage terminator (T-theta) revealed wide diversity in the points of termination (Fig. 5d). We found T-theta to be strong (median T_e = 0.82) and tunable. The progression of minimum free energy structures predicted to form as each nucleotide is transcribed suggests that a variety of structures form approaching the point of maximum termination (Fig. 5e). This is likely to account for the ability for termination to occur at various positions. Changing the native context immediately upstream of the core-terminator hairpin decreased T_e by 59%. This was despite a change to the “GC” at the end of the U-tract to “UU”. Furthermore, these modifications to the terminator changed the distribution of transcript isoforms significantly, resulting in a single peak of termination (Fig. 5d). This native context variant changes the co-transcriptional structures predicted in the buildup to the point of maximal termination, preventing a hairpin immediately upstream of the U-tract from forming for the first two possible transcript variants (Fig. 5e). These insights into how upstream genetic context influences terminator hairpin structures are potentially important for not only tuning T_e, but also transcript isoform abundances.

T-theta was also tested with a variety of upstream modifiers designed to base-pair with the core-terminator or form secondary structures. Of these, one short hairpin near to the core-terminator (M81) tuned the ratios of transcript variants depending on the upstream genetic context. Co-transcriptional simulations of this valve indicated that a variety of secondary structures can form immediately upstream of the terminating hairpin, which can extend and therefore be influenced by sequences further upstream. The effect of versatile secondary structures upstream of the valve could potentially influence the transcripts produced as well as T_e. The ability for T-theta to form a complex mixture of transcript variants whose ratio can be tuned by upstream sequence could arise from co-evolution of this terminator with the T7 RNA polymerase. This would result in a high capacity for tuning both transcription (via T_e) and mRNA stability (via RNA degradation) following mutation of the core-terminator or upstream sequence.

Finally, strong terminators highlighted by previous studies were also characterized (Fig. 5f). These comprised a set of three strong (in their host) core-terminators from four different bacteria characterized by Lalanne et al.⁴, along with 3 further E. coli terminators with long U-tracts⁷. At least one example of a terminator with T_e > 0.5 in vitro was present in the selection for each organism. This selection sought to expand the options for engineering strong T7 RNAP valves. While each of these terminators has evolved to function in different cellular contexts, we found that they behave similarly with T7 RNAP in vitro: termination invariably occurred in a region with multiple U’s in the U-tract downstream of a hairpin. These results highlight that terminators sourced from many organisms can terminate T7 RNAP and provide yet more options for core terminator parts when designing valves.

Controlling expression stoichiometry of a CRISPR guide RNA array

The ability of our valves to control the stoichiometry of transcript isoforms makes them ideally suited for multiplexed regulation of RNA-based parts. To demonstrate how this might be achieved, we chose to focus on the expression of a CRISPR–Cas9 guide RNA (gRNA) array. While gRNAs have been co-expressed as arrays^62,63,64,65,66, few efforts have been made to rationally regulate the relative levels of gRNAs within an array. This could be important for implementing complex patterns of gene activation or repression. Promoters of varying strength have been used to achieve a similar goal⁶⁷. However, promoters do not couple gRNA stoichiometries to one another in the same way as can be achieved by using transcriptional valves and are sensitive to noise and genetic context that can affect each promoter independently.

We designed four arrays, each containing the same three gRNAs (complete with handles) separated by two unique valves (Fig. 6a; see “Methods” section). A set of designs was selected from the initial valve library (Fig. 3) to give a range of gRNA expression stoichiometries. We pooled the arrays, used T7 RNAP to transcribe the pool in vitro, and then performed dRNA-seq characterization to calculate the ratios of expressed gRNAs from each array. We found that, as expected, each design produced different stoichiometries of the gRNAs (Fig. 6a). We calculated predicted ratios based on the characterization of the valve library and compared those to the measured ratios from the arrays (Fig. 6b). While valves ranked the same in terms of T_e, the absolute termination observed was significantly lower in the array, with a decrease that correlated with proximity to the promoter. This feature has been previously observed⁶⁸, though to our knowledge the cause is not fully understood. One possible explanation comes from predictions that proximity to the promoter increases transcriptional read-through of protein “roadblocks” through the cumulative force of RNAP traffic⁶⁹; a similar effect could be occurring in our case.
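One simple way to turn library-measured T_e values into predicted gRNA ratios is sketched below. The text does not give the calculation that was used, so the model here is an assumption: each valve terminates independently with probability T_e, and a gRNA is present in every transcript that has read through all valves upstream of it. The function names and the T_e values are hypothetical.

```python
def predicted_grna_levels(valve_te):
    """Relative level of each gRNA in an array laid out as
    gRNA_1 - valve_1 - gRNA_2 - valve_2 - ... - gRNA_n.

    Assumption: valves act independently, so gRNA_k is present in the
    fraction of transcripts that read through valves 1..k-1.
    """
    levels = [1.0]  # the first gRNA is in every transcript
    for te in valve_te:
        if not 0.0 <= te <= 1.0:
            raise ValueError("T_e must lie between 0 and 1")
        levels.append(levels[-1] * (1.0 - te))
    return levels


def isoform_fractions(valve_te):
    """Fraction of transcripts ending at each valve, plus full read-through."""
    fractions = []
    remaining = 1.0
    for te in valve_te:
        fractions.append(remaining * te)  # terminated at this valve
        remaining *= 1.0 - te             # continued past this valve
    fractions.append(remaining)           # read through every valve
    return fractions


# Hypothetical T_e values for the two valves of one array (not measured data).
levels = predicted_grna_levels([0.5, 0.2])   # gRNA1 : gRNA2 : gRNA3
isoforms = isoform_fractions([0.5, 0.2])
```

Under this model the gRNA levels are coupled: lowering T_e of the first valve raises every downstream gRNA together, which is the property that separate promoters lack.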

**Fig. 6: Using transcriptional valves to regulate an array of CRISPR sgRNAs.**

To test this hypothesis further, we characterized a small library of 40 terminators using varying concentrations of T7 RNAP for the in vitro transcription reactions to vary the RNAP traffic present on the DNA (1× and 0.2× concentrations). We found a strong correlation (R² = 0.93) in the T_e values. However, for some terminators T_e significantly decreased at the higher concentration of T7 RNAP (Fig. 6c). Therefore, the systematic reduction in T_e for the CRISPR gRNA arrays may be a result of the much closer position of the promoter to the terminator in these constructs or of other effects that our normal termination assay does not capture. Nevertheless, these results demonstrate the ability to use transcriptional valves as a means of multiplexed regulation of RNA-based parts.
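A comparison of this kind can be sketched as follows. The text does not state how R² was computed; the sketch assumes the squared Pearson correlation between paired T_e values, which equals the coefficient of determination of a simple linear fit. The function name and the example values are hypothetical.

```python
from math import fsum


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x = fsum(x) / n
    mean_y = fsum(y) / n
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical paired T_e values at 0.2x and 1x T7 RNAP (not measured data).
te_low = [0.10, 0.35, 0.60, 0.85]
te_high = [0.08, 0.30, 0.58, 0.80]
agreement = r_squared(te_low, te_high)
```

A high R² only shows that the terminators keep their relative ordering and scaling across conditions; a drop in T_e for individual terminators at the higher RNAP concentration would have to be checked per terminator, for example by inspecting the paired differences.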
