Designing transcriptional valves to control transcript isoforms
To demonstrate how transcriptional valves might be built, we attempted to construct proof-of-concept designs for T7 RNAP. T7 RNAP was selected due to its broad use in synthetic biology, which stems from the fact that it is a single-subunit RNAP with high processivity, making it ideal for both in vitro use25 as well as an orthogonal transcription system in vivo26,27. While diverse terminators are available for the native E. coli RNAP7,16, for T7 RNAP only a single terminator exists in the T7 phage genome28 and only a few alternatives have been characterized22,29,30. Furthermore, while termination of RNAP in model microorganisms like E. coli and S. cerevisiae has been extensively studied31, T7 RNAP termination has many unknowns such as alternative intrinsic terminators beyond those in the T7 phage genome and the bidirectionality of termination.
Specific features of a terminator, such as its hairpin structure and U-tract, can strongly influence termination efficiency7,32. Therefore, as a basis for an initial library of transcriptional valves, we chose 13 different intrinsic terminators to act as “core terminator” elements (T). We began by selecting the single terminator from the T7 phage genome (T27)28, which has previously been characterized in vitro22. To test its possible bidirectionality, a feature that terminators for other RNAPs have been shown to exhibit7, it was included in a reverse orientation in our designs22. Beyond the native T7 phage terminator, E. coli terminators present another source of these parts and have been shown to terminate T7 RNAP in vitro22,33. Therefore, 11 intrinsic rho-independent terminators were selected from the E. coli genome spanning a wide range of termination efficiencies in vivo7. Finally, a negative control terminator (T33) was designed. This consisted of a random non-coding sequence generated by R2oDNA designer34 and was further verified to not contain a strong hairpin in the mRNA secondary structure using the Vienna RNAfold tool35.
The genetic sequence immediately upstream of a terminator-hairpin also influences termination16,17 and we reasoned that this region could be used to fine-tune the termination efficiency of a valve. We, therefore, included a “modifier” part (M) in our transcriptional valve design and developed 13 different modifier sequences (Fig. 1b). Our first set of modifiers contained motifs designed to interact with canonical regions of a terminator hairpin sequence. Specifically, modifiers M10 and M11 were designed to interact with possible U- and A-tracts within a terminator by containing complementary homopolymers of adenine or uracil, respectively7. A further modifier M13 was designed to encode a small RNA secondary structure with the goal of affecting RNA structure formation near the terminator part. Beyond tuning termination efficiency with RNA interactions and structures, it has been shown that inert random sequences can play an insulating role, improving the robustness of a genetic part’s performance when used in different genetic contexts17,36. It is also known that the upstream genetic context of intrinsic terminators influences termination in a distant-dependent manner16,17. Thus, we decided to include a selection of modifiers of different lengths (M13–M16: 15 bp, M17–M19: 30 bp, and M20–M22: 45 bp) where each was a random non-coding sequence generated by R2oDNA designer34.
To assess the robustness of each valves’ termination efficiency to local upstream genetic context, our library also included “spacer” elements (S) (Fig. 1b). These did not form part of the transcriptional valve, but instead allowed us to see how a particular valve might behave when used in combination with other components (e.g., coding regions). Using the NullSeq tool37, we generated 7 random and genetically diverse 33 bp long spacers with a nucleotide composition similar to coding regions of E. coli that could be placed at the 5′ end of a valve. Each spacer had a stop codon “TAA” at its 3′-end, though this was not utilized in our in vitro transcription assay. Taken together, our spacers, modifiers, and core terminators could be combinatorially assembled to create a total of 1183 unique designs able to regulate RNAP flux and provide valuable information regarding the design principles of transcriptional valves.
Combinatorial assembly of a transcriptional valve library
A one-pot pooled combinatorial DNA assembly method was used to physically construct the final set of transcriptional valve designs (Fig. 1c, see “Methods” section)38. E. coli cells were transformed with this pooled library and ~500,000 colonies (>400-fold library coverage) were selected from plates via scraping before their pooled DNA was extracted. Such a high fold-coverage ensured representation of each design in the sample (Supplementary Note 1)39.
Nanopore-based long-read DNA sequencing (DNA-seq) was then used to verify the successful combinatorial assembly of every design. This showed that all designs were present with an even distribution of parts but an uneven distribution of designs (Supplementary Fig. 1). Part frequencies matched the ratios expected from the equiprobable assembly, except for short parts 15 bp long which were under-represented. This may be due to reduced assembly efficiency for shorter parts. These part frequencies were used to predict design frequencies, of which 91 had >20% absolute deviation between predicted and measured frequency. Certain parts were overrepresented in these designs (M14: 3-fold and T18: 2.3-fold) indicating that the abundance of constituent parts did not solely dictate design abundance within the library.
Finally, we investigated DNA assembly fidelity by generating accurate consensus sequences from the long-read DNA-seq data (see “Methods” section). Comparing the reference and consensus sequence for each design we found a mean of 0.6 single nucleotide polymorphisms (SNPs) per design, with 40% of designs having no SNPs (Supplementary Fig. 1).
Pooled characterization using direct RNA sequencing
Existing fluorescence7 and sequencing-based18,19,40 methods for measuring transcriptional termination are ill-suited to characterizing a large pooled library of genetic parts at nucleotide resolution. Therefore, we used nanopore-based direct RNA sequencing (dRNA-seq) to provide full-length reads of transcript sequences41. Crucially, each transcript isoform encodes its associated transcriptional value sequence either at the 3′-end, if termination was successful, or within the body of the transcript, if transcriptional read-through had occurred—the design’s sequence, therefore, acts as an ‘intrinsic barcode’ that is present in every read. This allowed for individual reads to be attributed to a particular design without the need to separate and barcode them before preparing the sequencing library.
Such an approach is not possible when using more common short-read RNA-seq because the transcriptional valve sequence for reads generated by read-through events will not always be located near the transcript end, and so would not be captured by the short read length if sequenced using normal approaches. Targeted short read sequencing could potentially be used to overcome this issue but would suffer from biases present during the reverse transcription (RT) step and subsequent PCR amplification42,43,44. Long-read dRNA-seq allows for the whole library to be directly assayed as a single pooled sample, without the need for RT, PCR or any assembly, and the data demultiplexed to simultaneously produce separate read depth profiles for each design. Finally, by comparing the ratio of transcript isoforms for each design (i.e., read depth directly before and after the transcriptional valve) a termination efficiency can be calculated (Fig. 1c).
While this approach removes the need to separate and attach unique ‘barcode’ sequences to each design when characterizing the library, it does rely on each transcriptional valve having a sufficiently different sequence for each read to be accurately mapped to a single design. Read accuracy for nanopore-based dRNA-seq is, at present, lower than for standard Illumina-based short-read RNA-seq (median read accuracy of 80–90% versus >99.9%, respectively45) and although this gap is closing with improvements to basecallers and sequencing chemistries, analysis pipelines need to be carefully tuned and validated to ensure accurate demultiplexing of reads when dRNA-seq is used in this way.
Optimizing the computational analysis pipeline
To optimize the analysis pipeline and ensure that our library could be accurately characterized, we developed a simple computational model to simulate the error-ridden reads that would be obtained after dRNA-seq. In our simulations, errors took the form of random nucleotide substitutions that occurred at a 15% substitution frequency. While other types of error such as insertions, deletions and elevated error rates at homopolymers were not included in our model, we found that our simulations were able to identify key parameters and criteria for designing parts with a sufficient dissimilarity for effective demultiplexing.
To demultiplex the dRNA-seq reads, the BLASTN tool was used to find all possible alignments between a read and the library of designs, with the best matching design being chosen46,47. Optimizing the BLASTN parameters is crucial for accurate characterization and so computational analyses were performed where a smaller library (540 designs, Supplementary Fig. 2) was used to systematically explore the role of each BLASTN parameter. Each design was given a random termination efficiency and had a set of full-length terminated and non-terminated reads generated based on our model. These reads were then pooled for all the designs and attempts made to demultiplex and infer the original termination efficiencies for each design. This allowed us to generate an optimized set of parameters that allows each design to be accurately identified (see “Methods” section).
The final computational demultiplexing and analysis pipeline (Fig. 1d) involved aligning the sequences of all designs against all reads using BLASTN with optimized parameters and then associating each read with the design that had the best alignment score (see “Methods” section). Reads for each design were then mapped to the appropriate reference sequence and design-specific read depth profiles generated, filtering out any reads where no terminator was present after alignment and mapping. Finally, termination efficiencies were calculated for each read depth profile as Te = [R(xs) − R(xe)]/R(xs), where R(x) is the read depth at position x in the genetic construct, and xs and xe are the start and end nucleotide position of the transcriptional valve, respectively.
Characterizing transcription termination at nucleotide resolution
In vitro transcription using T7 RNA polymerase of the entire pool of DNA constructs followed by dRNA-seq enabled us to rapidly assay the performance of each design simultaneously. To ensure the accuracy of our measurements, a detailed analysis of the generated read depth profiles was performed, which revealed several key features in line with other dRNA-seq studies (Supplementary Note 2)48. We also developed a mathematical model that allowed us to correct for unwanted deviations between actual and measured Te (Supplementary Note 3; Supplementary Figs. 3–6). We found that transcript abundances were weakly correlated with DNA construct frequencies (R2 = 0.22), with strong terminators over-represented in the dRNA-seq data. A good reproducibility was observed for Te values between replicates (R2 = 0.99) and for terminators shared across separately assembled libraries with different part compositions (Supplementary Fig. 7). This resulted in 98% of designs having a difference of <5% in Te across the experimental replicates of our initial library.
A valuable feature of part characterization by RNA-seq is the ability to extract nucleotide resolution insights from the read depth profiles. To enable comparisons between our designs where total numbers of reads for each varied, we generated profiles normalized by the read depth at the start of the transcriptional valve such that drops due to termination corresponded to a fractional change (Fig. 2a). We also calculated Δ-values corresponding to the change in normalized read depth at each nucleotide position with respect to the previous nucleotide, enabling us to pinpoint and compare changes more easily. We found that the maximum Δ-value for each design was proportional to its Te and amounted to approximately 40% of the total Te value. Each terminator maintained a predominant termination pattern (as shown by its Δ-profile), which varied in amplitude depending on the upstream modifier and spacer (Fig. 2a). The ability to observe these nucleotide resolution changes demonstrates a further benefit of the pooled dRNA-seq over more commonly used methods based on fluorescent reporter proteins.


a Normalized dRNA-seq read depth profiles for functioning terminators and non-terminator control (T33). Each line corresponds to a design. Dotted lines denote the start and end of the core terminator. Red triangle indicates the final nucleotide of the dominant point(s) of termination. The gray-shaded region is expanded in the lower panel showing the change in normalized read depth at each nucleotide position with respect to the previous nucleotide (Δ). b Termination locations for functioning terminators. Many terminators terminate at more than one position and these points are indicated with shading. The most common point of termination is colored red. c Median termination efficiency (Te) for designs containing each core terminator compared to the number of U residues in the U-tract, which is the 8 nt sequence upstream of the point of termination. R2 is the square of the Pearson correlation coefficient. d Secondary structure predictions of transcripts using a co-transcriptional folding simulation at the measured point of termination for each terminator (for inactive terminators, the sequence downstream of the hairpin with a maximal number of U residues was used). Two structures are shown for T20; T20-A is formed first, and T20-B lies over a large energy threshold. For weak terminators, the structure prediction for both the shorter (S) and the longer (L) transcripts produced are shown. e Co-transcriptional RNA secondary structures predicted for T14 as it is transcribed, with dominant points of termination indicated. Source data are provided as a Source Data file.
The termination pattern is an important phenotype and we found that termination does not occur at a single nucleotide location; for each of our core terminators, it occurred over several nucleotides. While the Te of a terminator did vary across genetic contexts, in general, the termination pattern remained consistent. These patterns revealed that termination often fluctuates nucleotide by nucleotide, resulting in multiple drops in the Δ-profiles and therefore multiple transcript isoforms.
As expected, drops in read depth for each valve occurred within the corresponding U-tract (Fig. 2b). We found that termination was possible with as few as 2 U’s, but that maximum drops in the profiles occurred at a similar point (after 4 or 5 U’s) for the stronger core-terminators. This likely captures a position where a combination of optimal T7 RNAP pausing and weakened stability of the transcription elongation complex leads to the formation of the core terminator hairpin sufficient to effectively facilitate termination. The number of U’s in the U-tract at the point of maximal termination showed a correlation with Te, although there were some outliers (T20, T14; Fig. 2c). This matches a previous finding for E. coli RNAP termination7. However, termination was found to always reach a peak with a U-tract that comprises fewer than the maximum possible number of U’s in the U-tract.
Using this data, it was possible to predict RNA secondary structures at the various points of termination and assess their potential influence. To do this we simulated co-transcriptional folding49 of the terminated sequence at the point of maximal termination (Fig. 2d), assuming a previously reported transcription rate of 333 nt/s50 (see “Methods” section). We removed the final 8 nt, which have been shown to base-pair with the DNA template in the T7 RNAP transcription elongation complex51, and studied the structure with the lowest folding energy. The strongest terminators were predicted to form terminating hairpins at their 3′-end, proximal to the elongating T7 RNAP. On the contrary, T20 was found to get locked in a secondary structure involving a hairpin ending 16 nt upstream of the T7 RNAP, meaning that it remained a weak terminator despite having a long U-tract. To reach an effective terminating hairpin T20 would have to surpass a large energy barrier.
There were three inactive terminators (T17, T21, T27) and the negative control (T33). Each of these could have a maximum of 4 U’s in the U-tract and none were predicted to form a hairpin proximal to the U-tract, which is likely the cause of their inactivity. This showed that T27, the reverse oriented phage terminator, was not able to efficiently terminate T7 RNAP bidirectionally. The three weakest active terminators (T14, T18, and T15) all had only 2 U’s in their U-tract of their most common transcript isoform, saw termination at multiple separated points, and were predicted to form a hairpin proximal to the U-tract (Fig. 2d). It is not clear why T14 gives stronger termination than T15 and T18 though one element that might increase Te is the unique double hairpin structure that can form in the core terminator at the later points of termination. The second peak in termination for T14 coincides with the last point at which this double hairpin is predicted to exist (Fig. 2e). T15 and T18 on the other hand are predicted to form a single long hairpin structure.
An ability to engineer the precise point(s) of termination and therefore dominant 3′-UTR sequences may be important in deciding gene stoichiometries in vivo as it could potentially affect mRNA degradation rates, however, this is contested52,53. To assess this, we reviewed the possible transcripts produced by the characterized terminators and this showed that the final nucleotide of transcript variants can be either A, C, U or G. Frequently the dominant transcript terminated within a stretch of U’s though less frequently observed transcripts were found to terminate immediately after a stretch of U’s. Further investigation revealed that in some cases, upstream sequence could tune the major point of termination. For example, modifiers M10, M11, M12, and M15 showed different stoichiometries of two types of transcripts produced by T13.
We also undertook an analysis of other general biophysical parameters that may play a role in termination (e.g., GC content and minimum free folding energy). However, none of these were correlated with measured Te (Supplementary Fig. 8).
General termination properties of the initial valve library
Overall, Te varied from 0 to 0.94 across the library with core terminators displaying varying levels of Te and sensitivity to different modifier and spacer parts (Fig. 3). Grouping designs by their core terminator showed that each had a unique median Te and variability that differed between terminators (Fig. 3b). We found that valves containing the non-terminator part (T33), reverse oriented T7 phage terminator (T27) and two of the E. coli terminators (T21 and T17) showed little to no termination (Te < 0.05). The remaining nine E. coli terminators displayed a range of termination efficiencies for T7 RNAP with median Te varying from 0.01 to 0.91, which was heavily influenced by upstream sequence context. The variety of Te values observed would allow for a wide range of transcript isoform stoichiometries to be produced from 1:1 to 11:1. However, we were interested to know if patterns within this data might offer insight into the capacity of each terminator to be tuned or insulated. For example, valves displaying a wide range of Te values for the same core terminator would indicate that the terminator is highly tunable, while a small range of Te for a valve used with differing spacer elements would suggest that it is able to insulate its function from upstream sequence context.


a Structure of the transcriptional valve library. Gray ‘spacer’ elements fused to gfp are 33 bp random sequences used to assess the sensitivity of a transcriptional valve to nearby sequence context (full sequences available in Supplementary Data 1). b Termination efficiency (Te) for designs in the library (Supplementary Data 2). Each point denotes the Te value for a unique genetic construct color-coded by the modifier present. Points are grouped by core terminator with a box plot summarizing the data for all associated constructs. n = 91 for each core terminator, apart from T13 where n = 90 and T18 where n = 89. c Percentage deviation in Te (as a percentage of maximum possible deviation) for each construct from the median of all constructs containing the same core terminator. Each point corresponds to a single construct and points are grouped by modifier. Only data for active terminators (median Te > 0.05; T29, T16, T12, T10, T13, T14, T20, T18, T15) are shown. n = 819 for spacers and n = 63 for all modifiers, apart from M14 where n = 60. d Coefficient of variation (CV) of Te values across spacer variants for active terminators grouped by modifier. n = 9 for each modifier. e Terminator and modifier breakdown of the CV of Te values across spacer variants for active terminators. f Terminator and modifier breakdown of the percentage deviation in Te (as a percentage of maximum possible deviation) for active terminators. Random sequence modifier parts in all plots: (left–right) M10–M22. All box plots show the median as their center value, the bounds of the box are the interquartile range (IQR; 25–75%), and whiskers are from Q1 – 1.5 × IQR to Q3 + 1.5 × IQR, where Q1 and Q3 are the 25% and 75% quartile, respectively. Source data are provided as a Source Data file.
Tuning the strength of transcriptional valves
It is known that local sequence context can also be used to effectively alter the function of many types of genetic part8,54,55,56. We, therefore, designed modifiers in our initial library with the aim of being able to tune the strength of a valve. Analysis of the characterization data showed that changes in upstream genetic context (both spacer and particularly modifier sequences) could significantly influence termination strength, allowing Te to be varied over a range of up to 0.68. The ability to tune each core terminator varied, with the Te of T10 being most tunable and T16 being the least. The capacity to tune terminator strength could arise from the diversity of co-transcriptional structures that form proximal to the U-tract when interacting with the modifier (Fig. 2d). Therefore, to create a library of highly tuned transcriptional valves, it is important to ensure the core terminators are themselves tunable.
Large variability in the magnitude of tuning was seen across the different valves we tested suggesting that sequence-specific features play a key role in modulating the precise termination efficiency. Spacers were found to not have such a systematic effect. Nonetheless, some valves were highly influenced by spacers, emphasizing the importance of upstream sequence in the region up to 120 nt upstream of the point of termination. For designs grouped by spacer, the median percentage deviation from the median Te of the valve they contained was found to be less than 5% (Fig. 3c), suggesting that tuning of termination efficiency is best achieved by varying sequence context close to the core terminator part.
In general, each modifier tuned each terminator in a different way. However, some modifiers were found to have a similar tuning effect across many different core terminators (Fig. 3c). Some had a generally positive influence (e.g., M21) or negative influence (e.g., M20). Furthermore, the U- and A-tract interactors generally exerted opposite tuning effects on stronger terminators and weaker terminators, tuning them up and down in strength, respectively. Therefore, when tuning a T7 RNAP terminator, while bespoke modifiers are likely required, our library offers some starting points for features that are likely to have a desired effect.
Insulating transcriptional valves from local genetic context
It has been shown for many types of genetic part that more reliable performance can be achieved by inserting random non-coding sequencers around a part to insulate its function from potential interactions with other nearby sequences8,17,36. Our library specifically included random non-coding modifiers of varying length to assess the insulating effects for transcriptional valves. In general, we found that an increase in the length of these modifiers led to a reduction in Te variability when an identical valve design was used across numerous genetic contexts (i.e., upstream spacer sequences; Fig. 3d). This matches findings for bacterial promoters and terminators where longer upstream insulating sequences resulted in more predictable gene expression17,36. Notably, these effects were also terminator-specific, with some core terminators showing more predictable behavior across modifiers (e.g., T16) than others (e.g., T10) (Fig. 3f). This suggests that some terminators are better suited to tuning T7 RNAP in vitro, while others are better placed to maintain a consistent termination efficiency.
Exploring modifier-terminator base-pairing
It was evident from this initial library that the general modifiers we had used limited our ability to understand the role of key interactions between the modifier and terminator parts because no terminator specific interactions had been designed. To rectify this, a further library was built to understand the effect of the modifier region upstream of the core-terminator in a more comprehensive way (Fig. 4a). Informed by our findings that longer modifiers were better insulators of Te (Fig. 3d), we designed all-new modifiers as length 45 nt to enhance the robustness of the valves function across different genetic contexts. Co-transcriptional folding simulations had highlighted that the sequences we had designed to interact with the U-tract and A-tract were insufficient. These modifiers seldom formed structures that would influence termination by virtue of base-pairing because the A-tract is often short (<6 consecutive A’s) and at the point of termination, the U-tract is sequestered by the RNAP. Therefore, we designed modifier sequences that would target specific sequences within three strong core terminators (T29, T16, T10).


a Overview of the modifier library based on sequence interactions and structural elements used to explore tuning and insulation of core terminator function. A ‘near’ and ‘far’ modifier variant was designed for each motif (except for the random and structure options). Modifiers were designed to interact with the core terminator via sequence (upper) or structure (lower). For sequence interactions, 8 nt sequences were designed to be complementary to regions of the core terminator covering the loop (purple), 5′-stem (dark blue), 3′-stem (light blue). For structural elements, 3 short hairpins (light green), 3 long hairpins (dark green) and 2 further RNA structures (orange) were designed. b Termination efficiency (Te) for each valve designed to interact via sequence, grouped by modifier. Bars denote median Te values ± standard deviation (SD) and from left–right n = 6, 4, 2, 4, 5, 4, 3, 5, 4, 4, 3, 3, 6, 2, 7, 4, 3, 3, 4, 4 and 4. c Te for each designed structural element. Bars denote median Te values ± SD and from left–right n = 9, 10, 13, 12, 10, 17, 6, 14, 11, 9, 12, 14, 3, 8, 11, 8, 7 and 17. d Coefficient of variation of Te values for the structural elements. CV is calculated across spacers for each terminator and grouped by structural element. Bars denote mean CV values ± SD and from left–right n = 3, 3, 3, 3, 3 and 2. Source data are provided as a Source Data file.
Motifs containing an 8 nt reverse complement sequence of three different regions of the core terminators were designed into modifiers. Gaps were filled with non-structural RNA sequences (see “Methods” section). While ideally we would have used identical padding sequences, we instead chose padding sequences with identical RNA secondary structure (i.e., no predicted structure) as these unique sequences would help ensure accurate read demultiplexing after nanopore sequencing. These motifs targeted the 5′-stem, loop and 3′-stem regions of the terminator hairpin (Fig. 4a). Two variants of each motif were designed to explore the distance dependence of the engineered motifs: “near” which was incorporated into the modifier at its 3′-end (~30 nt from the U-tract), and “far” which was incorporated at the 5′-end of the modifier (~70 nt from the U-tract).
Characterization of this library revealed that base-pairing can significantly reduce termination efficiency in a distance-dependent manner (Fig. 4b). The motifs designed to base-pair with the loop consistently caused a reduction (13–16%) in Te. Loop-modifier interaction during termination could alter the core terminator hairpin during its formation, at the point of termination, or both, affecting the probability of a termination event. The largest effect on Te was caused by motifs designed to interact with the stem of the terminator. We hypothesize that this is because core terminator stem base-pairing is essential for hairpin formation whereas the loop can base-pair with an interacting motif at the same time as the completing hairpin. Therefore, a motif that can base-pair with the stem could outcompete the core terminator hairpin.
For the two strongest terminators (T29, T16) the 5′-stem interactor caused a large drop in Te, while the 3′-stem interactor did not. In the case of T16 this is likely because 5 of the 8 targeted nucleotides are predicted to be concealed within the T7 RNAP (2 nt for T29 and 3 nt for T10), where they cannot base-pair at the point of termination since they are in the U-tract. This effect on Te was greater than any other drop caused by a modifier tested so far. As with previous modifiers, T10 behaved differently to T16 and T29. The 3′-stem interactor had a large effect on termination, while the 5′-stem interactor did not. This could be a consequence of the extra native sequence context between the hairpin and the motifs (5 nt and 3 nt more than T29 and T16, respectively) resulting in a location in which the motif can base-pair with the 3′-stem. However, it may also arise from the inherent tunability of T10. In either case, the effect on Te of motifs that base-pair with terminators diminishes when they are placed far from the terminator.
Exploring structural interactions
It has been shown that structure can stabilize the activity of genetic parts1,57. Therefore, we looked to further investigate whether RNA structure could insulate terminator function by designing and testing a library of modifiers containing secondary structure motifs (Fig. 4a). To explore the effect of upstream structure on termination, we designed three short (stem length 3 nt) and three long (stem length 6 nt) hairpins. The sets of short and long hairpins contained one of three loops (UUCG, GAAA, GAGA) known to facilitate strong hairpin formation58. Furthermore, modifiers were designed with all of these secondary structures at both the 5′-end and 3′-end to test the distance-dependence of secondary structure influence on termination efficiency. Again, gaps were filled with non-structural RNA sequences (see “Methods” section). A variety of other more complex RNA secondary structures are known and one modifier containing an “elbow” (the TAR element) and one containing a pseudoknot were also designed to see any role these might play59.
After assembling and characterizing this new library, we were able to confirm that RNA secondary structure upstream of terminators affected the robustness of terminator function with T7 RNAP (Fig. 4c), similar to that reported previously for E. coli RNAP60. We found that short hairpin structures and complex RNA structures were the best insulators of terminator function (Fig. 4d), while long hairpin structures made termination efficiency more sensitive to upstream genetic context. The rigid requirements of short hairpin formation mean that they are likely rarely influenced by base-pairing with upstream structure. This would explain why they are good insulators since they offer dependable upstream secondary structure that does not base-pair or interact structurally with the terminator hairpin. In contrast, since long hairpins can form with a variety of stem lengths they could influence and be influenced by neighboring sequences. The resultant diversity of secondary structures that can then arise upstream of the terminator hairpin would mean that these modifiers significantly affect Te and therefore act as poor insulators, as seen in our results.
Since we only tested stronger terminators, conclusions cannot be drawn on the capacity of base-pairing and structural sequence motifs to increase Te. Nonetheless, these results revealed motifs that could alter the Te or robustness of terminator function and therefore should be considered when designing genetic circuits that involve uncharacterized gene-terminator combinations.
Understanding core terminator design principles
Our results had shown that many of the relationships observed were terminator dependent, and so a final library was designed and tested to investigate variations of the core terminator part that had the greatest influence on Te (Fig. 5a). We constructed a library of designs including U-tract variants, native context variants, and terminators from diverse organisms. This set of core-terminators was assembled and tested in just one upstream genetic context (the gfp gene) and so our analysis is focused on understanding the influence of sequence context proximal to the core-terminator hairpin.


a Overview of modifications made to core terminators, tested without spacers or modifiers: (left to right) U-tract enrichment to include up to 8 consecutive U residues down-stream of the core terminator hairpin; Context switch with the 19–25 nt immediately upstream of the terminator hairpin changed to a random sequence; U-tract switch to consecutive A, C, or G residues; Other source organisms used to find diverse terminator sequences. b Effect of U-tract enrichment on most common termination position and change in termination efficiency (Te). Substitution of nucleotides for U are indicated with gray shading. Termination position before the substitution is indicated in bold red text. Termination position after substitution is indicated with red dot. c Change in Te for U-tract enrichment (U Enrich, dark gray), context switch (Cxt, gray) and U-tract switches (A, C, G, light gray). d Normalized dRNA-seq read depth profiles for the T7 phage T-theta terminator (T99, gray solid line) and a variant (T99U, black dashed line). Dotted lines denote the start and end of the core terminator. The gray-shaded region is expanded in the lower panel showing the change in normalized read depth at each nucleotide position with respect to the previous nucleotide (Δ). e Secondary structures predicted by co-transcriptional folding simulation of terminator formation (left to right) for T99U and T99. Positions where measured termination occurs are indicated with red triangles. f Measured Te of T7 RNAP for terminators sourced from diverse organisms. Source data are provided as a Source Data file.
Analysis showed that the Te of weak core-terminators used in our initial library could be increased by increasing the number of U’s in the U-tract (Fig. 5b). Our initial results indicated that a U-tract of at least 5 U’s consistently resulted in termination (Fig. 2c). Therefore, we re-engineered weaker core-terminators so that they contained a U-tract of length 8 nt. This increased Te to varying extents (Fig. 5b). Despite each of these newly designed terminators having the possibility to terminate at a U-tract complete with 8 U’s, the dominant transcript isoform had only 4 or 5 U’s in each case. Nonetheless, the increase in Te showed some correlation with the number of extra U’s in the dominant U-tract (Supplementary Fig. 9). For active terminators, the largest increases in termination occurred when additional U’s increased the number of U’s in the dominant transcript isoform to 4 or 5 (T14, T15, T18). Finally, in two cases (T17 and T21), termination of inactive terminators was found to be rescued.
The location of termination and thus the specific transcript produced consistently changed following these modifications (Fig. 5b). For these designs, transcript isoforms became either 1–2 nt shorter and longer. The location of termination and termination prior to a complete “UUUUUUUU” U-tract may arise since the extra U’s were not added to the dominant transcript variant. Instead, they were added at positions that would give a sequence of 8 U’s with the minimal number of single nucleotide substitutions. Therefore, to generate strong terminators from a template sequence, our results suggest first characterizing the dominant transcript form and then increasing the number of U’s within its U-tract.
To complement the modifier library, the same core terminators (T10, T16, T29) were used as a basis for various U-tracts containing no U’s. Variants of each of these core terminators with an 8 nt tract of A, C, or G were designed. To ensure their transcripts could be distinguished after dRNA-seq we put unique non-structural RNA sequences as barcodes upstream of the core terminators (see “Methods” section). These variants were inspired by data showing that T7 RNAP can slip and terminate at sites of 8 consecutive A’s in vitro59,61. However, we found this not to be the case for T29, T16, T10, or the T7 phage T-theta terminator (Fig. 5c). The poly-C tract showed very weak termination (Te < 0.1). For our previous designs, some native sequence context immediately upstream of the terminator hairpin (and before the modifier) is retained. Changing this decreased Te for all the terminators studied (Fig. 5c), indicating that to maintain Te, there is an optimal position upstream of core terminators to add modifiers.
Characterization of the T7 phage terminator (T-theta) revealed wide diversity in the points of termination (Fig. 5d). We found T-theta to be strong (median Te = 0.82) and tunable. The progression of minimum free energy structures predicted to form as each nucleotide is transcribed suggests that a variety of structures form approaching the point of maximum termination (Fig. 5e). This is likely to account for the ability for termination to occur at various positions. Changing the native context immediately upstream of the core-terminator hairpin decreased Te by 59%. This was despite a change to the “GC” at the end of the U-tract to “UU”. Furthermore, these modifications to the terminator changed the distribution of transcript isoforms significantly, resulting in a single peak of termination (Fig. 5d). This native context variant changes the co-transcriptional structures predicted in the buildup to the point of maximal termination, preventing a hairpin immediately upstream of the U-tract from forming for the first two possible transcript variants (Fig. 5e). These insights into how upstream genetic context influences terminator hairpin structures are potentially important for not only tuning Te, but also transcript isoform abundances.
T-theta was also tested with a variety of upstream modifiers designed to base-pair with the core-terminator or form secondary structures. Of these, one short hairpin near to the core-terminator (M81) tuned the ratios of transcript variants depending on the upstream genetic context. Co-transcriptional simulations of this valve indicated that a variety of secondary structures can form immediately upstream of the terminating hairpin, which can extend and therefore be influenced by sequences further upstream. The effect of versatile secondary structures upstream of the valve could potentially influence the transcripts produced as well as Te. The ability for T-theta to form a complex mixture of transcript variants whose ratio can be tuned by upstream sequence could arise from co-evolution of this terminator with the T7 RNA polymerase. This would result in a high capacity for tuning both transcription (via Te) and mRNA stability (via RNA degradation) following mutation of the core-terminator or upstream sequence.
Finally, strong terminators highlighted by previous studies were also characterized (Fig. 5f). These comprised a set of three strong (in their host) core-terminators from four different bacteria characterized by Lalanne et al.4, along with 3 further E. coli terminators with long U-tracts7. At least one example of a terminator with Te > 0.5 in vitro was present in the selection for each organism. This selection sought to expand the options for engineering strong T7 RNAP valves. While each of these terminators has evolved to function in different cellular contexts, we found that they behave similarly with T7 RNAP in vitro: termination invariably occurred in a region with multiple U’s in the U-tract downstream of a hairpin. These results highlight that terminators sourced from many organisms can terminate T7 RNAP and provide yet more options for core terminator parts when designing valves.
Controlling expression stoichiometry of a CRISPR guide RNA array
The ability for our valves to control the stoichiometry of transcript isoforms makes them ideally suited for multiplexed regulation of RNA-based parts. To demonstrate how this might be achieved, we chose to focus on the expression of a CRISPR–Cas9 guide RNA (gRNA) array. While gRNAs have been co-expressed as arrays62,63,64,65,66, few efforts have been made to rationally regulate the relative levels of gRNAs within an array. This could be important for implementing complex patterns of gene activation or repression. Promoters of varying strength have been used to achieve a similar goal67. However, promoters do not couple gRNA stoichiometries to one another in the same way as can be achieved by using transcriptional valves and are sensitive to noise and genetic context that can affect each promoter independently.
We designed four arrays, each containing the same three gRNAs (complete with handles) separated by two unique valves (Fig. 6a; see “Methods” section). A set of designs were selected from the initial valve library (Fig. 3) to give a range of gRNA expression stoichiometries. We pooled the arrays and used T7 RNAP to transcribe the pool in vitro and then performed dRNA-seq characterization to calculate the ratios of expressed gRNAs from each array. We found that as expected each design produced different stoichiometries of the gRNAs (Fig. 6a). We calculated predicted ratios based on the characterization of the valve library and compared those to the measured ratios from the arrays (Fig. 6b). While valves ranked the same in terms of Te, the absolute termination observed was significantly lower in the array, with a decrease that correlated with proximity to the promoter. This feature has been previously observed68, though to our knowledge the cause is not fully understood. One hypothesis is that proximity to the promoter has been predicted to increase transcriptional read-through of protein “roadblocks” by virtue of an increased force from RNAP traffic, which is cumulative69, and a similar effect could be occurring in our case.


a Normalized dRNA-seq read depth profile for each array. Dotted lines denote the start and end of each valve. General array design is shown at the bottom, with the PT7 promoter followed by 3 guide RNAs (gRNAs) separated by two transcriptional valves (V1, V2). Resulting gRNA stoichiometries are indicated in top left of each profile (gRNA1:gRNA2:gRNA3) and valve designs are shown above their respective parts. b Comparison of predicted (lighter shading) and measured (darker shading) termination efficiency (Te) for each valve in the arrays. Bars are colored by valve position. c Comparison of Te measurements for in vitro transcription reactions performed with varying concentrations of T7 RNA polymerase (RNAP). Each point corresponds to a transcriptional valve. R2 is the square of the Pearson correlation coefficient. Source data are provided as a Source Data file.
To test this hypothesis further, we characterized a small library of 40 terminators using varying concentrations of T7 RNAP for the in vitro transcription reactions to vary the RNAP traffic present on the DNA (1× and 0.2× concentrations). We found a strong correlation (R2 = 0.93) in the Te values. However, for some terminators Te significantly decreased at the higher concentration of T7 RNAP (Fig. 6c). Therefore, the systematic reduction in Te for the CRISPR gRNA arrays, may be a result of the much closer position of the promoter to the terminator in these constructs or other affects our normal termination assay does not capture. Nevertheless, these results demonstrate the ability to use transcriptional valves as a means of multiplexed regulation of RNA-based parts.

