Coevolutionary cues help reestablish module-module interactions in hybrid repressors
In this study, we used our DBM-LBM compatibility model to predict mutations expected to improve the function of hybrid repressors. The compatibility score C(S) of a hybrid repressor was computed from the inter-modular coevolutionary coupling parameters, e_ij(A_i, A_j). These parameters were inferred from a multiple sequence alignment of LacI homologs by global inference of the joint distribution of sequences in the family22 (see Online Methods). For a given hybrid sequence, a mutation at residue i changes every parameter e_ij(A_i, A_j) that describes the interaction of the mutated residue i with one of its coevolving partners j, and therefore changes the C(S) score (Fig. 1a). Using this model, we systematically computed C(S) scores for mutations in LBMs. We did not consider mutations in DBMs because these modules are small (approximately 47 amino acid residues) and many of their residues are directly involved in DNA binding and recognition23, so mutating a DBM is likely to alter the DNA-binding properties of the protein. We then selected the mutants with the most favorable (lowest) C(S) scores as candidates for improving repressor activity.
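A minimal sketch of this scoring step is shown below. It assumes, for illustration only, that C(S) is the negated sum of the DBM-LBM couplings, C(S) = −Σ e_ij(A_i, A_j) over DBM positions i and LBM positions j, so that lower scores mean higher compatibility; the exact definition is given in the Online Methods. The positions, residues, coupling values, and the helper names `compatibility` and `delta_c` are hypothetical. The point it illustrates is that a mutation at LBM position j only changes the terms involving j, so ΔC(S) can be computed from those terms alone.

```python
"""Sketch: compatibility score C(S) and the effect of one LBM mutation.

Assumed (hypothetical) form: C(S) = -sum over DBM-LBM pairs (i, j) of
e_ij(A_i, A_j); lower scores mean more compatible modules.
"""

# (DBM position i, LBM position j) -> {(A_i, A_j): e_ij(A_i, A_j)}.
# Residue pairs missing from a table are treated as zero coupling.
Couplings = dict[tuple[int, int], dict[tuple[str, str], float]]


def compatibility(seq: dict[int, str], couplings: Couplings) -> float:
    """Full C(S): negated sum of all inter-module coupling terms."""
    return -sum(t.get((seq[i], seq[j]), 0.0) for (i, j), t in couplings.items())


def delta_c(seq: dict[int, str], couplings: Couplings, pos: int, new_aa: str) -> float:
    """ΔC(S) for mutating LBM position `pos` to `new_aa`.

    Only terms whose LBM partner is `pos` change, so all other terms are skipped.
    """
    delta = 0.0
    for (i, j), t in couplings.items():
        if j != pos:
            continue
        old = t.get((seq[i], seq[j]), 0.0)
        new = t.get((seq[i], new_aa), 0.0)
        delta -= new - old  # sign follows the assumed C(S) convention
    return delta


# Toy example: DBM positions 1-2, LBM positions 101-102 (all values hypothetical).
couplings = {
    (1, 101): {("K", "A"): 0.2, ("K", "V"): 0.8},
    (2, 101): {("L", "V"): 0.5},
    (2, 102): {("L", "F"): 0.3},
}
hybrid = {1: "K", 2: "L", 101: "A", 102: "F"}
print(compatibility(hybrid, couplings))      # about -0.5
print(delta_c(hybrid, couplings, 101, "V"))  # about -1.1 (more compatible)
```

Rescoring only the affected terms gives the same result as recomputing C(S) for the whole mutant, which keeps an exhaustive scan over all single LBM mutations cheap.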


a We identify a set of interactions between DNA-binding modules (DBMs) and ligand-binding modules (LBMs) based on coevolutionary cues among LacI family members. These interacting residue pairs were used to develop a computational model that computes a compatibility score, C(S), reflecting the compatibility between a DBM and an LBM. b We then applied this model to improve the repressor activity of a hybrid repressor, LacI-RbsR. The heatmap shows the ΔC(S) scores of all LacI-RbsR candidates carrying a single mutation in the LBM, relative to the score of the original repressor (C(S) score: −62.10). Similarly, we calculated the C(S) scores of (c) double mutants and (d) triple mutants and plotted their ΔC(S) scores relative to the best single mutant and the best double mutant, respectively.
To test whether our approach can rescue hybrid repressors, we applied it to 8 hybrid repressors. In our previous study9, we generated a total of 35 hybrid repressors: 18 are highly functional and produce a dynamic range of induction over 10-fold; 8 are poorly functional, with a significantly reduced dynamic range of induction (3- to 10-fold); and the remaining 9 have no significant activity (induction less than 3-fold). The 8 poorly functional hybrids still retain some biological activity, which suggests that their LBM and DBM can still interact to some extent, whereas in the 9 inactive hybrids all module-module interactions may be lost. We selected the 8 hybrids with reduced activity because they may represent cases in which most essential LBM-DBM interactions are maintained and only a few interacting pairs are disrupted, so only a few mutations may be needed to fully restore repressor activity. We first analyzed the hybrid repressor LacI-RbsR from our previous study, which generates a 5-fold induction in response to its inducer, ribose. This hybrid contains a LacI DBM and an RbsR LBM (all hybrids in this study are named following this DBM-LBM pattern). The heatmap in Fig. 1b shows the effect of all possible single mutations on the compatibility score of this hybrid repressor. Among the mutations that improve the compatibility score, the top 5 are K57V, K57A, F75G, N163Q, and K295F. Because we only considered mutations in the LBM, and our model only includes coevolutionary couplings between DBM and LBM residues, no coupling term involves two mutated residues; the effects of multiple mutations on the compatibility score are therefore additive. Consequently, combining two or three of these favorable mutations further improves the compatibility score.
Accordingly, the top double mutants (Fig. 1c) involve the same residue positions (57, 75, 163, and 295) as those in the single-mutation profile (Fig. 1b), and the top two triple mutants (Fig. 1d) also involve positions 57, 75, 163, and 295.
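The additivity argument can be checked on a toy model. The sketch below assumes the same hypothetical form of C(S) as a negated sum of DBM-LBM couplings; the couplings, positions, candidate mutations, and function names are illustrative and are not the LacI-RbsR values in Fig. 1. Because no term couples two LBM residues, the ΔC(S) of any combination equals the sum of its single-mutant ΔC(S) values, so double and triple mutants can be ranked from the single-mutant scores alone.

```python
from itertools import combinations

# Hypothetical DBM-LBM couplings: (DBM pos, LBM pos) -> {(A_i, A_j): e_ij}.
COUPLINGS = {
    (1, 101): {("K", "V"): 0.8},
    (1, 102): {("K", "G"): 0.4},
    (2, 102): {("L", "G"): 0.3},
    (2, 103): {("L", "Q"): 0.6},
}
HYBRID = {1: "K", 2: "L", 101: "A", 102: "F", 103: "N"}


def compatibility(seq):
    """Assumed C(S): negated sum of DBM-LBM couplings (lower = more compatible)."""
    return -sum(t.get((seq[i], seq[j]), 0.0) for (i, j), t in COUPLINGS.items())


def mutate(seq, muts):
    """Return a copy of `seq` with (position, residue) mutations applied."""
    out = dict(seq)
    out.update(muts)
    return out


def ranked_combos(singles, k):
    """Rank k-mutation combinations by the sum of their single-mutant ΔC(S).

    Summing is valid only because every coupling term pairs one DBM residue with
    one LBM residue, so two LBM mutations never share a term. Combinations that
    mutate the same position twice are skipped.
    """
    ranked = []
    for combo in combinations(sorted(singles), k):
        if len({pos for pos, _ in combo}) == k:
            ranked.append((sum(singles[m] for m in combo), combo))
    return sorted(ranked)


base = compatibility(HYBRID)
candidates = [(101, "V"), (101, "G"), (102, "G"), (103, "Q")]
singles = {m: compatibility(mutate(HYBRID, [m])) - base for m in candidates}
best_score, best_double = ranked_combos(singles, 2)[0]
# Additivity check: rescoring the double mutant directly gives the same ΔC(S).
direct = compatibility(mutate(HYBRID, best_double)) - base
print(best_double, round(best_score, 3), round(direct, 3))
```

If the model also contained couplings between LBM residues, this shortcut would no longer hold and each combination would have to be rescored directly.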
After analyzing LacI-RbsR, we applied the same approach to the other hybrid repressors with similar performance. In total, we studied 8 hybrids that originally generated only 3- to 10-fold induction of gene expression: LacI-RbsR, PurR-GalR, CelR-RbsR, RbsR-GalR, RbsR-LacI, XltR-GalR, MalR-LacI, and XltR-ScrR (Supplementary Fig. 1). For each hybrid repressor, we then predicted the triple mutations with the best C(S) scores.
Designing hybrid repressors to improve allosteric regulation activities
We then tested whether our coevolutionary modeling approach is sufficient to rationally improve the performance of hybrid repressors. For each of the eight hybrid repressors above, the two triple-mutation candidates with the top C(S) scores were selected for experimental characterization. In all eight cases, the two triple-mutation candidates share two mutations. From the three mutations of each candidate, we constructed all possible single, double, and triple mutants, yielding a set of 11 distinct mutants for each hybrid repressor (Source Data file). Because the effects of multiple mutations on the C(S) score are additive, the constituent single and double mutations were also among the top candidates in their respective cohorts (Figs. 1b to 1d).
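The count of 11 mutants per hybrid follows from the two shared mutations: each triple contributes 7 non-empty subsets (3 singles, 3 doubles, and 1 triple), and the 3 subsets built only from the shared mutations (2 singles and 1 double) are counted once, giving 7 + 7 − 3 = 11. The sketch below enumerates this library; the mutation labels m1 to m4 are hypothetical placeholders.

```python
from itertools import combinations


def mutant_set(triple_a, triple_b):
    """All distinct single, double, and triple mutants drawn from two triple mutants.

    Each triple contributes its 7 non-empty subsets; subsets shared by both
    triples (here, those built only from the two common mutations) count once.
    """
    mutants = set()
    for triple in (triple_a, triple_b):
        for k in (1, 2, 3):
            # Sorting gives each subset one canonical tuple form for deduplication.
            for subset in combinations(sorted(triple), k):
                mutants.add(subset)
    return mutants


# Hypothetical labels: the two top-scoring triples share m1 and m2.
library = mutant_set({"m1", "m2", "m3"}, {"m1", "m2", "m4"})
print(len(library))      # 11 mutants per hybrid
print(len(library) * 8)  # 88 mutants across the 8 hybrids
```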
Each mutant was characterized with our in vivo transcriptional assay in Escherichia coli, in which the repressor was constitutively expressed to repress the expression of GFP. Allosteric response and transcriptional regulation were assessed by comparing GFP levels in cells exposed and unexposed to the inducer of the target repressor. Characterization data for all 88 mutants of the 8 hybrid repressors are listed in the Source Data file. Our predicted mutations significantly improved the activities of four hybrid repressors, enabling them to generate GFP inductions above 10-fold. These improved repressors are LacI-RbsR (Fig. 2a and Supplementary Fig. 2a), PurR-GalR (Fig. 2b and Supplementary Fig. 2b), CelR-RbsR (Fig. 2c and Supplementary Fig. 2c), and RbsR-GalR (Fig. 2d and Supplementary Fig. 2d). The best-performing mutants of the rescued hybrids carry mutations at three homologous positions (Supplementary Fig. 3), some of which are far from the DBM-LBM interface. This suggests that residues at these positions are not directly involved in module-module interactions but may play key roles in modulating protein conformation at that interface to facilitate repressor function. These results also highlight the power of coevolutionary coupling analysis for discovering intra-protein interactions.
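The assay readout reduces to a ratio of induced to uninduced GFP signal. The sketch below assumes that fold-change is the mean induced fluorescence divided by the mean uninduced fluorescence over the three biological replicates (background correction and per-replicate ratios are not modeled). It applies the activity thresholds stated for the hybrid library (over 10-fold, 3- to 10-fold, below 3-fold) and the improvement criterion used in Fig. 3 (fold-change above 10 and more than 3 times that of the original hybrid). The replicate values and function names are hypothetical.

```python
from statistics import mean, stdev


def fold_change(uninduced, induced):
    """Assumed definition: mean induced GFP divided by mean uninduced GFP."""
    return mean(induced) / mean(uninduced)


def classify(fold):
    """Activity classes used for the hybrid library in the text."""
    if fold > 10:
        return "highly functional"
    if fold >= 3:
        return "poorly functional"
    return "no significant activity"


def is_improved(mutant_fold, original_fold):
    """Fig. 3 criterion: fold-change > 10 and > 3 times that of the original hybrid."""
    return mutant_fold > 10 and mutant_fold > 3 * original_fold


# Hypothetical GFP readings (arbitrary units) from three biological replicates.
original = fold_change([400, 420, 380], [2000, 2100, 1900])
mutant = fold_change([100, 110, 90], [2400, 2600, 2500])
print(classify(original), "->", classify(mutant), is_improved(mutant, original))
induced = [2400, 2600, 2500]
print(f"induced: {mean(induced):.0f} ± {stdev(induced):.0f}")  # mean ± S.D.
```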


We assessed gene expression regulation and allosteric response of hybrid repressors using a transcriptional reporter assay in E. coli, with GFP as a reporter of transcriptional activity. Four hybrid repressors were improved: (a) LacI-RbsR, (b) PurR-GalR, (c) CelR-RbsR, and (d) RbsR-GalR. As shown in the table in each panel, our compatibility model predicts improved performance for each mutant based on its compatibility score C(S). Indeed, these mutants showed an increased dynamic range of gene expression in response to their inducers. GFP fluorescence from three biological replicates is shown with gray (uninduced) and red (induced) markers. The inducer used is shown in brackets in each table. The blue number above each plot is the corresponding fold-change of GFP induction. The mean ± S.D. of the three biological replicates is also shown in each plot. Source data are provided as a Source Data file.
For the original versions of these four hybrid repressors, the poor dynamic range of induction can be attributed to defects in different protein properties. The original LacI-RbsR and CelR-RbsR exhibited high uninduced expression, indicating weak DNA binding; in contrast, the original PurR-GalR and RbsR-GalR generated low basal expression, but repression was not fully released upon induction, suggesting that the allosteric properties of these repressors were impaired. Intriguingly, our model predicted mutations that restore these different functions. K57 in LacI-RbsR and K60 in CelR-RbsR are homologous positions in the hinge helix motif (Supplementary Fig. 3) that directly contact the DNA backbone but not the nucleobases; the hinge helix has been proposed to facilitate DNA-protein binding but not to recognize the operator sequence8. Thus, our results suggest that this position plays a key role in interacting with specific groups of the DNA backbone, such that the DNA and LBM reach a favorable orientation for complex formation. For the other two rescued repressors, A85/A123 in PurR-GalR and A85/A123 in RbsR-GalR are distal to the DNA and are more likely to be involved only in the allosteric response24,25. These results strongly suggest that disrupting different DBM-LBM residue pairs can specifically affect DNA binding or the allosteric response.
On the other hand, we observed that similar functional defects in different hybrid repressors can be caused by the disruption of different interacting pairs. PurR-GalR and RbsR-GalR are structurally similar because they both contain a GalR LBM. The original versions of these two repressors also performed similarly: both bound tightly to DNA but were not efficiently released from the promoter upon induction. We therefore first hypothesized that the two hybrids had lost a homologous module-module interacting pair. However, our experimental characterization showed that different mutations are required to rescue the two repressors. PurR-GalR needed three mutations, A55V, A85C, and A123C, to reach its best performance (245-fold induction). The double mutant A85C/A123C yielded only 12-fold induction (Source Data file), suggesting that A55V is critical. In contrast, RbsR-GalR reached 69-fold induction with only A85C and A123C, which are homologous to A85C and A123C in PurR-GalR, respectively. These results suggest that there are no universal sites that can rescue hybrid repressors; instead, the relevant residues form a complex network of interactions that can only be revealed with a global model and metric like the one introduced here.
Folding and structural constraints on highly compatible mutants
While we rescued four hybrid repressors, the performance of the other four repressors was not improved by our model predictions. Moreover, the activities of some repressors were enhanced by one or two mutations but not by triple mutations, even though the triple mutants were predicted to have more favorable compatibility scores (Source Data file). For instance, the original RbsR-GalR generated a 3-fold induction of GFP fluorescence in response to its inducer, galactose; the single mutation A85C and the double mutation A85C/A123C enhanced the induction to 43-fold and 69-fold, respectively. According to our coevolutionary model, the compatibility scores are −69.08 for RbsR-GalR A85C and −71.93 for RbsR-GalR A85C/A123C. A third mutation in RbsR-GalR (G67T or H122M) was expected to further enhance the compatibility between the RbsR DBM and the GalR LBM; however, the resulting mutants, G67T/A85C/H122M and A85C/H122M/A123C, were inactive, with induced fold-changes of 1.2 and 1.5, respectively. Similarly, K57A in LacI-RbsR improved the induced fold-change to 260-fold from an original 6-fold, whereas the double mutant K57A/N163Q and the triple mutant K57A/F75G/K295F produced lower induction levels of 170-fold and 147-fold, respectively. These results suggest that residues at some positions may have multiple molecular roles: although our model identified them as participants in DBM-LBM interactions, they may also be critical for maintaining the structure and function of the LBM, so that mutating them leads to a loss of protein function. Our original compatibility model only considered coevolutionary cues between inter-module residue pairs and did not examine residue pairs within each module. Therefore, it did not evaluate how the designed mutations affect the structure and function of the resulting LBMs.
To improve our model for designing hybrid repressors with high induction, it is necessary to better understand molecular interactions within LBMs. For this purpose, we introduced an additional metric into our model based on residue proximity and coevolutionary couplings within LBMs. A structure-based score, SF(S), was computed by combining the coevolutionary coupling strengths of residue pairs within the LBM whose residue-residue distance in a LacI X-ray crystal structure is below 10 Å (see Online Methods). As with the C(S) score, a mutant with an increased (more positive) SF(S) is considered less structurally stable and may not retain its protein function.
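A minimal sketch of the structure-based score follows. It assumes, for illustration, that SF(S) is the negated sum of intra-LBM couplings e_ij(A_i, A_j) over residue pairs closer than 10 Å, with one representative coordinate per residue (for example, the C-alpha atom); the actual definition, atom selection, and any weighting are described in the Online Methods. The coordinates, couplings, and function names are hypothetical.

```python
import math
from itertools import combinations


def contact_pairs(coords, cutoff=10.0):
    """LBM residue pairs whose representative atoms are closer than `cutoff` (Å)."""
    return [(i, j) for i, j in combinations(sorted(coords), 2)
            if math.dist(coords[i], coords[j]) < cutoff]


def structure_score(seq, couplings, pairs):
    """Assumed SF(S): negated sum of intra-LBM couplings over contact pairs.

    Lower (more negative) values are treated as more structurally favorable.
    """
    return -sum(couplings.get(pair, {}).get((seq[pair[0]], seq[pair[1]]), 0.0)
                for pair in pairs)


# Hypothetical LBM with three residues: 101 and 102 are 5 Å apart, 103 is far away.
coords = {101: (0.0, 0.0, 0.0), 102: (5.0, 0.0, 0.0), 103: (30.0, 0.0, 0.0)}
couplings = {
    (101, 102): {("A", "F"): 0.5, ("V", "F"): -0.2},
    (101, 103): {("A", "N"): 2.0},  # ignored: this pair lies beyond the cutoff
}
wt = {101: "A", 102: "F", 103: "N"}
mut = {**wt, 101: "V"}
pairs = contact_pairs(coords)
print(pairs)                                   # [(101, 102)]
print(structure_score(wt, couplings, pairs))   # -0.5
print(structure_score(mut, couplings, pairs))  # 0.2, i.e. less favorable
```

The distance filter is what distinguishes this score from a plain sum over all intra-LBM couplings: strongly coupled but distant pairs, such as 101-103 here, do not contribute.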
We then tested whether SF(S) can serve as a selection tool to eliminate mutations that cause a loss of repressor activity. Among all mutants of LacI-RbsR (Fig. 3a), only K57A has a more favorable SF(S) score than the original repressor, and indeed it had the best performance (245-fold induction). Two other LacI-RbsR mutants, K57A/N163Q (170-fold) and K57A/F75G/K295F (147-fold), are also significantly improved, and their SF(S) scores rank second and fourth, respectively.


The structure-based scores, SF(S), are plotted against the compatibility scores, C(S), for the experimentally tested mutant candidates of each hybrid repressor: (a) LacI-RbsR, (b) PurR-GalR, (c) CelR-RbsR, (d) RbsR-GalR, (e) RbsR-LacI, (f) XltR-GalR, (g) MalR-LacI, and (h) XltR-ScrR. In these plots, mutants in the gray region have a more favorable SF(S) score than their original repressor. The colorbar indicates the ratio of GFP fluorescence induction of each mutant relative to the original hybrid repressor (white-filled marker), with red and blue representing an increase and a decrease in GFP fluorescence, respectively. For repressors with improved mutants (mutant fold-change >10 and >3 times that of the original hybrid repressor), the mutant with the largest dynamic range of induction is labeled in red. For repressors without improved mutants, mutants are labeled in black if their SF(S) score is more favorable than that of the original repressor; these are exceptions to our model prediction. Source data are provided as a Source Data file.
Similarly, for PurR-GalR (Fig. 3b), CelR-RbsR (Fig. 3c), and RbsR-GalR (Fig. 3d), the best-performing mutant has a better SF(S) score than the original protein (Source Data file). In total, 11 mutants from these four hybrid repressors have a better SF(S) score than the original, and 10 of them have fold induction improved to above 10. In contrast, for the four hybrid repressors that were not rescued, RbsR-LacI (Fig. 3e), XltR-GalR (Fig. 3f), MalR-LacI (Fig. 3g), and XltR-ScrR (Fig. 3h), most mutants have a worse SF(S) score than the original repressor, indicating that these mutants may be nonfunctional owing to negative structural effects within the LBM. The few exceptions are MalR-LacI S195I, XltR-GalR A301G, and XltR-GalR E226L/A301G, which have improved SF(S) scores but remain poorly functional. Based on the crystal structure of LacI24, the homologous residues at these positions (S195 of MalR-LacI and A301 of XltR-GalR) are close to the ligand-binding pocket, and mutating them may directly interfere with ligand binding (Supplementary Fig. 4), which provides a plausible explanation for the poor function of these three mutants. To make our computational tool more robust, we could prohibit mutations of residues that interact directly with the ligand, just as our current model excludes all mutations in the DNA-binding module because many of its residues interact with the DNA operator. Overall, these results strongly support the SF(S) score as a reliable indicator of whether an LBM mutation is expected to impair repressor activity.
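The selection rules discussed above can be combined into a simple post-filter on candidate mutants. In the sketch below, a candidate is kept only if it improves both C(S) and SF(S) relative to the original hybrid and mutates no position in a forbidden set (DBM positions plus ligand-contacting LBM positions). The `Candidate` type, the position numbering, the forbidden sets, and all scores are hypothetical; which residues count as ligand-contacting would have to be taken from a structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    positions: frozenset[int]  # mutated positions
    c_score: float             # C(S); lower = more compatible
    sf_score: float            # SF(S); lower = more structurally favorable


def select(candidates, original, forbidden):
    """Keep candidates that beat the original on both scores and avoid forbidden sites.

    Survivors are returned with the best (lowest) C(S) first.
    """
    kept = [
        c for c in candidates
        if c.c_score < original.c_score
        and c.sf_score < original.sf_score
        and not (c.positions & forbidden)
    ]
    return sorted(kept, key=lambda c: c.c_score)


# Hypothetical scores for one hybrid; 300 stands in for a ligand-pocket residue.
original = Candidate("original", frozenset(), -60.0, -40.0)
pool = [
    Candidate("mutA", frozenset({150}), -65.0, -41.0),
    Candidate("mutB", frozenset({150, 200}), -70.0, -38.0),  # SF(S) worse
    Candidate("mutC", frozenset({300}), -66.0, -42.0),       # in the ligand pocket
    Candidate("mutD", frozenset({120}), -62.0, -43.0),
]
dbm_positions = frozenset(range(1, 48))  # hypothetical numbering for the ~47-residue DBM
ligand_pocket = frozenset({300})
chosen = select(pool, original, dbm_positions | ligand_pocket)
print([c.name for c in chosen])  # ['mutA', 'mutD']
```

Without the ligand-pocket exclusion, mutC would rank first on C(S), which mirrors the exceptions described above: a favorable score does not protect against directly disrupting ligand binding.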
After evaluating the ability of the SF(S) model to predict mutant performance, we also used this model together with a native LacI crystal structure to understand how some mutations may disturb intra-module interactions. Taking RbsR-LacI as an example, none of its four mutations improves repressor function, and all of them lead to less favorable SF(S) scores (Fig. 3e and Source Data file). We then used our model to identify the interacting partners of these four mutated residues that contribute most to the change in SF(S) score (Supplementary Fig. 5). Examining these residues gave us insight into how the mutations may affect protein function. For V148W in RbsR-LacI (Supplementary Fig. 5a), V148 is surrounded by hydrophobic residues such as A131 and L194, and replacing it with a tryptophan may disrupt these hydrophobic interactions. Additionally, the crystal structure shows several polar residues, including D127, S189, and S191, that are close to V148 but face in the opposite direction; the V148W mutation may create new hydrogen bonds between the tryptophan and these polar residues, which could trigger a substantial change in protein conformation. For N123M (Supplementary Fig. 5b), N123 is likely to form a strong hydrogen bond with S67, which would be disrupted by a methionine. For A85C (Supplementary Fig. 5c), A85 lies in an α-helix and interacts with residues of a neighboring β-sheet, including V92 and L60; changing A85 to a cysteine may perturb this hydrophobic interface and destabilize the β-sheet. Finally, G56V (Supplementary Fig. 5d) lies in the hinge helix, which plays an essential role in transmitting allosteric signals that control DNA-binding affinity. G56 interacts with two other hinge-helix residues, R49 and A51, so G56V may destabilize the hinge helix and impair its function.
In addition to protein design, these examples show how we may also use the SF(S) model to facilitate the study of protein mechanisms at a molecular level.
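Identifying the partners that drive a change in SF(S) amounts to splitting ΔSF(S) into per-pair terms. The sketch below assumes the same hypothetical form of SF(S) as a negated sum of intra-LBM couplings over contact pairs; positive contributions mark partners that make the mutant less favorable. Residue numbers, couplings, and the function name are illustrative and do not correspond to the RbsR-LacI residues discussed above.

```python
def partner_contributions(seq, couplings, pairs, pos, new_aa):
    """Split ΔSF(S) for mutating `pos` to `new_aa` into per-partner terms.

    Assumes SF(S) = -sum of e_ij(A_i, A_j) over contact pairs (i, j). A positive
    value means that partner makes the mutant less favorable. Largest first.
    """
    mutant = {**seq, pos: new_aa}
    contrib = {}
    for i, j in pairs:
        if pos not in (i, j):
            continue  # pairs not involving the mutated residue are unchanged
        table = couplings.get((i, j), {})
        old = table.get((seq[i], seq[j]), 0.0)
        new = table.get((mutant[i], mutant[j]), 0.0)
        partner = j if i == pos else i
        contrib[partner] = -(new - old)
    return sorted(contrib.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical LBM fragment: residue 10 is mutated from V to W.
seq = {10: "V", 20: "A", 30: "L", 40: "S"}
pairs = [(10, 20), (10, 30), (10, 40), (20, 30)]
couplings = {
    (10, 20): {("V", "A"): 0.6, ("W", "A"): 0.1},
    (10, 30): {("V", "L"): 0.9},
    (10, 40): {("W", "S"): 0.2},
    (20, 30): {("A", "L"): 0.4},  # unaffected by the mutation
}
for partner, value in partner_contributions(seq, couplings, pairs, 10, "W"):
    print(partner, round(value, 2))
```

The per-partner terms sum to the total ΔSF(S), so the top-ranked partners are the residues worth inspecting in the crystal structure.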

