Engineering a monomeric and stable Corynactis californica GFP protein scaffold
Choosing a fluorescent protein
Corynactis californica, a bright red colonial anthozoan similar to sea anemones and scleractinian stony corals, expresses several fluorescent proteins in its morphs. Schnitzler et al. identified two red fluorescent proteins24. One displays an as-yet-uncharacterized timer phenotype (slow conversion of chromophore from green to red) that varies according to expression conditions. The second red fluorescent protein has very poor fluorescence quantum yield. There are also a yellow and an orange fluorescent protein. Morphs of C. californica express at least two green fluorescent proteins. One is partially folded when expressed in E. coli, while the other is mostly misfolded and non-fluorescent. We chose to pursue the C. californica ccGFPs for three reasons. First, the multimeric red proteins appeared to have disadvantages (see above) and the yellow and orange fluorescent proteins were poorly characterized. Second, previous work by Tsien25 and others showed that considerable engineering may be required to retain red fluorescent phenotypes while re-engineering monomeric mutants. We have already published a superfolder monomeric RFP (sfCherry)20 derived from mCherry, which has been used as a split protein19 and which we and others21,26 are engineering to make more efficient. Third, starting from our split GFP, we have engineered a split YFP (T203Y)22, as well as an efficient split CFP (Y66W) that contains several additional obligate folding mutations22. We chose to pursue the insoluble ccGFP variant in particular as a stringent test of our approach for engineering efficient split fluorescent proteins1,3,22 as well as to develop an orthogonal split fluorescent protein system for multiplex labeling in living cells.
Making a monomeric, cysteine-free scaffold
We posited that, to be useful as a protein tagging and detection system, the split protein should be monomeric and have no cysteines, enabling an accurate estimation of target proteins based on fluorescence signals. Predicted monomerizing mutations (V127E, N192R, I194E) were introduced to ccGFP following published protocols and based on structural homology with monomeric Azami Green27 (see “Methods”). The protein still failed to fold and was non-fluorescent when expressed in E. coli. Bright fluorescent colonies were obtained after six rounds of directed evolution using DNA shuffling (see “Methods”), converging on a small number of sequences. The brightest engineered protein retained all six native cysteine residues. Mutating the cysteines (C20S, C71A, C73S, C104S, C153S, C175A) to eliminate unwanted disulfide bond formation in the unfolded protein (or subsequent split protein fragments, below) resulted in misfolding and loss of fluorescence, likely due to unforeseen effects on folding intermediates. After three additional rounds of directed evolution and gene shuffling, bright colonies were again obtained. The optimal final version (ccGFP E6) contains 24 mutations compared to the wild-type protein (Fig. 1a): L3M, V8L, C21S, K36N, K42Q, E50K, A63P, C71V, C73T, E100D, C104S, G105A, H110R, N121K, V127E, K149T, C153S, H171Q, C175A, D184N, N192R, I194K, A207T, and I210L. Interestingly, an N121K mutation present in the template as the result of a gene synthesis error was retained. The other gene synthesis error, H120Q, reverted to wild-type H120. None of the amino acids replacing the six cysteines reverted to cysteine, but two had further mutated, A71V and S73T. Gel filtration chromatography confirmed the protein migrated as a monomer at ~ 15 mg/ml (Supplementary Fig. S1). ccGFP E6 was also crystallized as a monomer and its structure was successfully determined by X-ray crystallography (manuscript in preparation).
Absorption and emission spectra showed strong peaks at 501 nm and 520 nm, respectively (Supplementary Fig. S2).


Sequence alignment of Corynactis californica GFPs. (a) Sequences leading up to the well-folded ccGFP E6. Legend: ccGFP wt, starting sequence accession number AAZ14788.1; ccGFP m, monomerizing mutations; ccGFP syn, synthetic sequence containing the monomerizing mutations and additional unexpected mutations H120Q, N121K; ccGFP E6, optimal mutant after six rounds of gene shuffling to improve folding, and three additional rounds of gene shuffling after replacing cysteine residues with alanine or serine (see “Methods”). (b) Split protein fragments for strands 1–10 and S11 used in this study, showing the starting versions derived from ccGFP E6 (see a) and the indicated mutants.
Engineering an efficient split system from the engineered C. californica scaffold
Improving ccGFP 1–10 and eliminating autofluorescence
We followed the same strategy we used to engineer split GFP1. Using homology alignment with the structure (PDB 3ADF)28 of monomeric Azami GFP, the engineered ccGFP E6 protein scaffold was split into two pieces, the large ccGFP 1–10 E6 (amino acids 1–202, MSMSKQVLK•••RHKIEHRLVRS) and the small ccGFP S11 E6 (amino acids 205–221, GDTVQLQEHAVAKYFTV) (see also Fig. 1). Strand ccGFP S11 E6 was solubly expressed as a C-terminal tag on the carrier protein sulfite reductase1. The ccGFP 1–10 E6 protein aggregated when expressed alone in E. coli at either 37 °C or 20 °C from a pET vector, and soluble lysates did not complement with SR-ccGFP S11 E6. Directed evolution of ccGFP 1–10 (see “Methods”) dramatically improved the complementation rate and solubility. Unexpectedly, this version, termed ccGFP 1–10 v1 (Fig. 1), slowly gained fluorescence without the S11 fragment (at about 1% of the rate seen with excess ccGFP S11; Fig. 2a). To reduce the unwanted autofluorescence, after replating the ccGFP 1–10 library from the final round of directed evolution, we aligned images of plates taken after ccGFP 1–10 expression (to observe ccGFP 1–10 autofluorescence) and after SR-ccGFP S11 expression (to observe full complementation fluorescence). We identified several desirable colonies (8 out of 20,000) with ccGFP 1–10 clones that were faint or non-fluorescent alone, but that became highly fluorescent after SR-ccGFP S11 E6 expression. The best of these was isolated and termed ccGFP 1–10 v2 (Fig. 1). Relative to ccGFP 1–10 v1, ccGFP 1–10 v2 has the additional mutations D78Y, Q85R, and A109V. This variant exhibits no detectable autofluorescence (Fig. 2a).


Complementation and autofluorescence of purified ccGFP 1–10 fragments. (a) Progress curves for in vitro complementation after mixing indicated ccGFP 1–10 variant (800 pmol) with SR-ccGFP S11 v1 (50 pmol) in 200 µl reaction wells (upper traces); development of autofluorescence of each ccGFP 1–10 without added S11 (‘no S11’, lower traces). Due to lack of chromophore residues, S11 fragments are not autofluorescent as expected (not shown). Maximum arbitrary scale signal (~ 0.8) corresponds to 45,000 fluorescence units on BioTek instrument (99,999 full scale). Progress curves were normalized by dividing measured fluorescence by the fluorescence of sfGFP control to compensate for instrument drift and jitter noise (Supplementary Fig. S8). (b) Normalized progress curves for indicated ccGFP 1–10 variant from (a) after subtraction of progress curve of corresponding ccGFP 1–10 fragments alone (‘no S11’). (c) In vitro complementation of equal molar amounts of ccGFP S11 variants (50 pmol) with ccGFP 1–10 v2 (800 pmol) in 200 µl reaction wells.
Improving ccGFP S11
In our original work engineering a two-part split GFP, we found that the C-terminal GFP S11 wildtype dramatically reduced the solubility of hexulose phosphate synthase1 (HPS) from P. aerophilum, suggesting that the solubility and folding of this protein were sensitive to C-terminal split protein tags. Thus, we used HPS as ‘bait’ in a directed evolution scheme in E. coli to discover improved mutants of ccGFP S11 for which the HPS-ccGFP S11 fusion solubility matched that of HPS alone. Libraries of ccGFP S11 variants as C-terminal fusions with HPS and ccGFP 1–10 v2 were expressed in succession in the same cells from independently inducible compatible plasmids (see “Methods”) to avoid false positives caused by cotranslational rescue of the folding of insoluble variants of HPS-ccGFP S11 that might occur with co-expressed ccGFP 1–10 v2, as previously noted for GFP1. The brightest clones all contained the mutations D206E, V208I, and V221E, were brighter and matured faster than ccGFP S11 E6, and balanced a lack of perturbation of fusion protein solubility with good complementation (Fig. 2c). We termed this variant ccGFP S11 v1.
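The relationship between the S11 E6 fragment and the S11 v1 mutant can be illustrated with a short sketch. It assumes that the mutation positions use full-length ccGFP E6 numbering and that the fragment spans residues 205–221, as stated above for ccGFP S11 E6. The helper name `apply_mutations` is hypothetical, and the resulting sequence is derived here rather than copied from Fig. 5b, so it should be checked against that figure.

```python
import re

# ccGFP S11 E6 fragment and its first residue number, as given in the text.
S11_E6 = "GDTVQLQEHAVAKYFTV"
FIRST_RESIDUE = 205  # assumption: full-length ccGFP E6 numbering


def apply_mutations(seq, first_residue, mutations):
    """Apply point mutations such as 'D206E' to a fragment (hypothetical helper)."""
    residues = list(seq)
    for mut in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mut)
        if match is None:
            raise ValueError(f"cannot parse mutation {mut!r}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        idx = pos - first_residue
        if not 0 <= idx < len(residues):
            raise ValueError(f"{mut}: position outside the fragment")
        # Guard against numbering errors: the stated residue must be present.
        if residues[idx] != old:
            raise ValueError(f"{mut}: found {residues[idx]} at position {pos}")
        residues[idx] = new
    return "".join(residues)


s11_v1 = apply_mutations(S11_E6, FIRST_RESIDUE, ["D206E", "V208I", "V221E"])
print(s11_v1)
```

The residue check inside the loop is the useful part: all three mutations match the fragment sequence at the stated positions, which supports the assumed numbering.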
Supercharging the ccGFP 1–10 optima
ccGFP 1–10 v1 and v2 were each about 50% soluble expressed at 37 °C from pET T7 plasmids. In an attempt to increase the solubility, as had been done by others for fluorescent proteins29,30, we mutated some neutral or hydrophobic surface residues of ccGFP 1–10 v1 to charged residues such as Glu and Arg. The new version, ccGFP 1–10 v3, carried 8 additional negatively charged residues relative to ccGFP 1–10 v1: S4E, N23D, T28E, Q41E, S43E, N142E, S153E, T162E.
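As a quick check on the "−8 charged" description used for ccGFP 1–10 v3, the formal charge change implied by the listed mutations can be tallied. This is a minimal sketch that assigns −1 to Asp/Glu and +1 to Lys/Arg and treats every other residue (including His) as neutral, which is a simplification and not a pKa-based calculation.

```python
import re

# Formal side-chain charges near neutral pH (simplifying assumption).
FORMAL_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}

# Supercharging mutations of ccGFP 1-10 v3 relative to v1, as listed in the text.
V3_MUTATIONS = ["S4E", "N23D", "T28E", "Q41E", "S43E", "N142E", "S153E", "T162E"]


def net_charge_change(mutations):
    """Sum the formal charge difference (new minus old) over point mutations."""
    total = 0
    for mut in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mut)
        if match is None:
            raise ValueError(f"cannot parse mutation {mut!r}")
        old, new = match.group(1), match.group(3)
        total += FORMAL_CHARGE.get(new, 0) - FORMAL_CHARGE.get(old, 0)
    return total


delta = net_charge_change(V3_MUTATIONS)
print(delta)  # -8
```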
Characterization of split ccGFP fragments by renaturation, autofluorescence, and complementation
Renaturation yield after unfolding
GdnHCl-denatured inclusion bodies of ccGFP 1–10 variants were renatured in 100 mM Tris, 150 mM NaCl, 10% v/v glycerol (TNG) buffer as described in “Methods”. For the same amount of inclusion bodies (~ 75 mg/tube), after dilution of the denatured inclusion bodies in 20 ml TNG, ccGFP 1–10 v1 yielded ~ 0.46 mg/ml, while ccGFP 1–10 v2 yielded ~ 0.85 mg/ml. The − 8 charged version ccGFP 1–10 v3 yielded ~ 2.5 mg/ml, a 67% yield. To facilitate comparison of specific activities of complementation with S11, for subsequent experiments, all refolded ccGFP 1–10 samples were concentrated or diluted to ~ 0.75 mg/ml.
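The 67% figure follows from the stated inputs if the ~ 75 mg of inclusion bodies is treated as the mass of protein available for refolding in 20 ml, giving a theoretical maximum of 3.75 mg/ml. The sketch below applies the same arithmetic to all three variants. The percentages for v1 and v2 are derived here under that assumption and are not reported in the text.

```python
# Inputs as stated in the text; the inclusion-body mass is treated as protein
# mass available for refolding (assumption implied by the reported 67% yield).
INCLUSION_BODY_MG = 75.0
REFOLD_VOLUME_ML = 20.0

RECOVERED_MG_PER_ML = {
    "ccGFP 1-10 v1": 0.46,
    "ccGFP 1-10 v2": 0.85,
    "ccGFP 1-10 v3": 2.5,
}

# Concentration expected if every molecule refolded and stayed soluble.
max_mg_per_ml = INCLUSION_BODY_MG / REFOLD_VOLUME_ML

yields_percent = {
    name: 100.0 * conc / max_mg_per_ml
    for name, conc in RECOVERED_MG_PER_ML.items()
}

for name, pct in yields_percent.items():
    print(f"{name}: {pct:.1f}% renaturation yield")
```

Under this assumption, v1 and v2 correspond to roughly 12% and 23% yield, against about 67% for v3.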
Autofluorescence of ccGFP 1–10 variants
We monitored the development of autofluorescence of the ccGFP 1–10 variants alone over time. Referring to Fig. 2a, autofluorescence was significant for ccGFP 1–10 v1 and v3 but not ccGFP 1–10 v2. To test the relative in vitro complementation efficiency of the different ccGFP 1–10 variants, the same amount of SR-ccGFP S11 v1 (50 pmol) was added to a large molar excess of ccGFP 1–10 (800 pmol) (see “Methods”) (Fig. 2a). After subtraction of the blank autofluorescence progress curves as appropriate, both ccGFP 1–10 v1 and ccGFP 1–10 v2 have similar complementation kinetics, while the − 8 charged ccGFP 1–10 v3 is slower (Fig. 2b). Supplementary Fig. S4 shows the appearance of raw fluorescence progress curves for different concentrations of SR-ccGFP S11 v1 complemented with ccGFP 1–10 v3. The background autofluorescence progress curve for ccGFP 1–10 v3 could be easily subtracted. The same amount of either SR-ccGFP S11 E6 or SR-ccGFP S11 v1 (50 pmol) was added to the plate and a large molar excess of ccGFP 1–10 v2 was added (800 pmol) (see “Methods”). SR-ccGFP S11 v1 complemented significantly faster than SR-ccGFP S11 E6 (Fig. 2c).
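The two processing steps applied to the progress curves (division by the sfGFP control reading, then subtraction of the corresponding ‘no S11’ curve) can be sketched as follows. All readings below are hypothetical placeholder values chosen for illustration, not data from this work, and the function names are assumptions.

```python
def normalize(raw, control):
    """Divide each reading by the sfGFP control reading at the same time point."""
    if len(raw) != len(control):
        raise ValueError("curves must share the same time points")
    return [r / c for r, c in zip(raw, control)]


def subtract_blank(sample, blank):
    """Subtract the 'no S11' autofluorescence curve from a complementation curve."""
    if len(sample) != len(blank):
        raise ValueError("curves must share the same time points")
    return [s - b for s, b in zip(sample, blank)]


# Hypothetical raw instrument readings at four time points (illustration only).
sfgfp_control = [50000.0, 50500.0, 49500.0, 50000.0]
with_s11 = [1000.0, 10100.0, 19800.0, 30000.0]
no_s11 = [1000.0, 1010.0, 1485.0, 2000.0]

norm_with_s11 = normalize(with_s11, sfgfp_control)
norm_no_s11 = normalize(no_s11, sfgfp_control)
corrected = subtract_blank(norm_with_s11, norm_no_s11)

for t, value in enumerate(corrected):
    print(f"time point {t}: corrected signal {value:.3f}")
```

Normalizing both curves before subtraction keeps drift in the instrument from leaking into the difference, which is the stated purpose of the sfGFP control.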
Use of the split ccGFP system for in vitro protein quantification
We measured fluorescence progress curves for complementation of purified SR–ccGFP S11 v1 and ccGFP 1–10 v2 in 200 µl reactions in a microtiter plate (Fig. 3). We avoided potential higher-order kinetic effects by initiating the complementation using a high concentration and large molar excess of ccGFP 1–10 (800 pmol). Progress curves over a wide concentration range could be superimposed by linear scaling (Fig. 3a). Over the range of S11 analyte tested (1.56–200 pmol) it was not necessary to wait until the reactions approached their asymptotic limit (~ 6 h) to generate calibration curves. For example, linear calibration curves were easily generated at 1 h (Fig. 3b), or even as soon as 6 min (Fig. 3c) after the start of complementation. Progress curves were also measured for SR-ccGFP S11 v1 vs. either ccGFP 1–10 v1 (Supplementary Fig. S3a) or ccGFP 1–10 v3 (Supplementary Fig. S4a). After subtraction of the blank progress curves due to formation of intrinsic fluorescence (no SR-ccGFP S11) (Supplementary Figures S3b, S4b), calibration curves could be generated (Supplementary Figures S3c,d, S4c,d). The efficiency of complementation was measured as a function of pH (see Supplementary Fig. S5). The complementation rate was highest above pH 7.0, decreasing linearly with decreasing pH. Below pH 5.0 complementation was inefficient. Absorption and emission spectra of SR-ccGFP S11 v1 complemented with ccGFP 1–10 v3 showed strong peaks at 501 nm and 515 nm, respectively (Supplementary Fig. S6), indicating that the split ccGFP system has fluorescent properties similar to those of the full-length ccGFP E6.
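A fixed-time-point calibration of the kind described can be sketched as an ordinary least-squares line through signal versus pmol of S11-tagged protein, which is then inverted to estimate an unknown. The dilution series and the 800 pmol of ccGFP 1–10 are taken from the text. The signal values are synthetic, generated from an assumed slope and offset purely for illustration, and the function names are hypothetical.

```python
# Two-fold dilution series of SR-ccGFP S11 v1 (pmol per well), from the text.
S11_PMOL = [200, 100, 50, 25, 12.5, 6.25, 3.13, 1.56]
CCGFP_1_10_PMOL = 800


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_pmol(signal, slope, intercept):
    """Invert the calibration line to estimate pmol of tagged protein."""
    return (signal - intercept) / slope


# Synthetic signals at one time point (assumed slope and offset, NOT measured).
ASSUMED_SLOPE = 0.004
ASSUMED_OFFSET = 0.01
signals = [ASSUMED_SLOPE * p + ASSUMED_OFFSET for p in S11_PMOL]

slope, intercept = fit_line(S11_PMOL, signals)
unknown_pmol = estimate_pmol(0.21, slope, intercept)

# Molar excess of ccGFP 1-10 at the highest analyte amount in the series.
min_fold_excess = CCGFP_1_10_PMOL / max(S11_PMOL)

print(f"slope={slope:.4f}, intercept={intercept:.4f}")
print(f"estimated analyte: {unknown_pmol:.1f} pmol")
print(f"minimum fold excess of ccGFP 1-10: {min_fold_excess:.0f}")
```

With real data, a separate line would be fitted for each read time (for example 6 min or 1 h), since the slope depends on how far the reaction has progressed.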


In vitro characterization of split ccGFP complementation. (a) Superimposition of scaled progress curves for complementation of 200, 100, 50, 25, 12.5, 6.25, 3.13 and 1.56 pmol SR-ccGFP S11 v1 in 20 µl aliquots, mixed with 180 µl aliquots containing 800 pmol of ccGFP 1–10 v2. Maximum signal (~ 0.8) corresponds to 45,000 fluorescence units on BioTek instrument scale (99,999 full scale). Progress curves were normalized by dividing measured fluorescence by the fluorescence of sfGFP control to compensate for instrument baseline drift (Supplementary Fig. S8). The curves can be superimposed by linear scaling indicating that the shape of the progress curve does not depend on the concentration of the tagged protein or depletion of the pool of unbound ccGFP 1–10 fragment. Note, in the superposition (top), noisy traces naturally result from the required scaling of the lowest concentration progress curves. (b) In vitro sensitivity of SR-ccGFP S11 v1 complementation with ccGFP 1–10 v2. Values of progress curves at 1 h from (a) are plotted vs. concentration of SR-ccGFP S11 v1. (c) Same as Fig. 3b, but data from (a) taken at 6 min.
Expression and solubility screens of 18 control proteins from P. aerophilum
To test the utility of the split ccGFP screen for quantifying protein expression in vitro, 18 control proteins (see Supplementary Table S1) with different expression and solubility levels from P. aerophilum, carrying the C-terminal ccGFP S11 v1 tag, were expressed in E. coli at 37 °C from pET vectors using the strong T7 promoter, and split into soluble and pellet fractions. The same proteins had previously been used to test split GFP1, facilitating comparison with the performance of the new ccGFP (this work). Aliquots of the soluble fractions and solubilized denatured inclusion bodies, processed to allow direct comparison (see “Methods”), were complemented with ccGFP 1–10 v2 and the final fluorescence values were measured (Fig. 4, top). The final fluorescence was reflective of the amount of the corresponding protein in the soluble and inclusion body fractions as revealed by SDS-PAGE (Fig. 4, middle). Since several of the urea-solubilized inclusion bodies visibly aggregated soon after dilution in the assay buffer, the successful complementation implies that the ccGFP 1–10 fragment rapidly binds the S11 tag during the dilution step before the formation of insoluble aggregates, committing the chromophore to form regardless of the subsequent solubility of the complex. We previously observed rapid binding of complementary split GFP fragments1.


In vitro protein quantification and in vivo protein expression and solubility screens in E. coli. Protein quantification of eighteen P. aerophilum test proteins (see Supplementary Table S1) expressed as N-terminal fusions with ccGFP S11 v1 from the strong T7 promoter (bar graph, top). The ccGFP fragment complementation assay fluorescence of soluble (black bars) and unfolded pellet fractions (gray bars) using ccGFP 1–10 v2 (top). Arbitrary fluorescence units (A. U.). SDS-PAGE of the corresponding soluble (S), and pellet fractions (P) (middle). Note that protein #8, tartrate dehydratase β-subunit, shows a second lower band at ~ 13 kDa. #14, nirD protein, shows secondary bands at ~ 27 kDa and ~ 13 kDa. Original pictures of the 2 SDS-PAGE gels showing the soluble and pellet fractions of 18 test proteins are included as Supplementary Figs. S9, S10. In vivo solubility and expression screen using split ccGFP (lower). The same P. aerophilum test proteins cloned with a C-terminal ccGFP S11 v1 tag on tet promoter plasmid, in E. coli BL21 (DE3) strain carrying a pET plasmid for expression of ccGFP 1–10 v2. Fluorescence images of colonies on plates after total expression screen by coinduction of the tagged constructs and ccGFP 1–10 v2 (upper row of colonies); or after soluble expression screen by transient expression of the tagged constructs followed by expression of the ccGFP 1–10 v2 (lower row of colonies). Fluorescence images of colonies were cropped from the same pictures. Note 1 cm scale bar (lower right) illustrating the size of the colonies.
Estimating total protein expression in vivo in living bacterial cells
To estimate total protein expression in living E. coli, the C-terminally ccGFP S11 v1-tagged protein (expressed from the moderately strong AnTET regulated promoter) and the ccGFP 1–10 v2 detector protein (expressed from the very strong IPTG inducible T7 promoter) were co-expressed (see “Methods”). Referring to Fig. 4 (upper row of fluorescent colonies), the fluorescence can be easily detected regardless of the solubility of the protein, as estimated from the SDS-PAGE of the soluble and pellet fractions of the same protein expressed alone from the strong T7 promoter (Fig. 4, SDS gel, middle). As previously noted for split GFP1, this is consistent with a model where the 1–10 fragment can rapidly bind the S11 tag as soon as it appears in the cell, committing the complex to folding and chromophore formation regardless of the subsequent fate (soluble or aggregated) of the S11-tagged protein of interest. As expected, colonies expressing protein #6 (polysulfide reductase subunit) are fainter than colonies expressing the other 17 control proteins, because its expression leads to the accumulation of large amounts of a red product that absorbs the blue 488 nm excitation light.
Estimating soluble protein expression in vivo in living bacterial cells
To estimate soluble expression in an E. coli colony assay, the S11-tagged proteins were expressed first from the moderate AnTET regulated promoter, then the expression was shut off (see “Methods”). After resting for 1 h to allow the proteins to remain soluble or become aggregated according to their intrinsic properties, and for any remaining AnTET inducer to diffuse out, the ccGFP 1–10 v2 detector protein was then expressed from the strong T7 promoter to help ensure a molar excess of the 1–10 protein. Under these conditions, the 1–10 fragment should only bind soluble and accessible S11-tagged protein molecules. Referring to Fig. 4 (lower row of fluorescent colonies), with the exception of protein #7, nucleoside diphosphate kinase (see below), the fluorescence is reflective of soluble expression as estimated from the SDS-PAGE of the soluble and pellet fractions of the same protein expressed alone from the stronger T7 promoter (Fig. 4, SDS gel, middle, and Supplementary Table S1). The in vivo solubility of protein #7 is higher from the moderate AnTET regulated promoter (based on colony fluorescence, Fig. 4, bottom row of colonies) compared to its expression from the strong T7 promoter (SDS gel lanes, Fig. 4, middle, and Supplementary Table S1). This is consistent with our earlier observation1 using SDS-PAGE that showed protein #7 tagged with GFP S11 M3 is partially soluble as expressed from the moderate pTET AnTET promoter, and insoluble expressed from pET T7 promoter1. Taken together, the behavior of the 18 control proteins tagged with the ccGFP S11 v1 is consistent with earlier findings with the optimal GFP S11 M3 fragment1, suggesting that the optimized ccGFP S11 v1 tag does not strongly perturb the solubility behavior of the fusion proteins.
Testing cross-complementation between GFP and ccGFP split protein fragments
To test the ability of ccGFP fragments to recognize GFP fragments and vice versa, complementation reactions were set up between the non-cognate pairs, i.e. ccGFP 1–10 v2 with GFP S11 M31, and GFP 1–10 OPT1 with ccGFP S11 v1 (Fig. 5a). Complementation reactions were also set up between cognate pairs of fragments, i.e. ccGFP 1–10 v2 with ccGFP S11 v1, and GFP 1–10 OPT with GFP S11 M3. Referring to Fig. 5a, under the conditions of the assay, the GFP 1–10 OPT fragment weakly complements with ccGFP S11 v1, yielding a fluorescence value after ~ 12 h that is ~ 4% of that of its cognate interaction with GFP S11 M3. Notably, this reaction had not reached completion at 12 h. CFP 1–10 OPT22 also weakly complements with ccGFP S11 v1 but with lower efficiency compared to GFP 1–10 OPT (Supplementary Fig. S7). In contrast, under the same conditions, ccGFP 1–10 v2 does not detectably complement with GFP S11 M3 (Fig. 5a). The amino acid sequences of ccGFP S11 v1 and GFP S11 M3 are aligned in Fig. 5b.


(a) Normalized progress curves for complementation of cognate and non-cognate ccGFP and GFP fragments. Rapid complementation of cognate fragments (upper curves). ccGFP 1–10 v2 and SR-ccGFP S11 v1 (upper solid line); GFP 1–10 OPT and SR-GFP S11 M3 (upper dotted line). GFP fragments are from the original split GFP1 derived from A. victoria. Weak complementation between non-cognate fragments (bottom curves). Complementation between ccGFP 1–10 v2 and its non-cognate fragment SR-S11 M3 is not detectable and the trace is at the baseline (bottom solid line). Complementation of GFP 1–10 OPT with the non-cognate fragment ccGFP S11 v1 (lower dotted line) is ~ 4% of the final value for the cognate GFP S11 fragment (upper dotted line). Complementation was initiated by mixing 800 pmol of each 1–10 fragment with 50 pmol of S11 fragment in 200 µl reaction wells. (b) Amino acid sequence alignment showing sequences of ccGFP S11 v1 and GFP S11 M3.
Double-labeling experiments
To ascertain the utility of the new split ccGFP in cyan and green fluorescent protein double-labeling experiments, we tagged the soluble protein sulfite reductase from P. aerophilum with an N-terminal ccGFP S11 v1, and a C-terminal GFP S11 M3 fragment22. The construct had an N-terminal 6His tag for capture by metal affinity resin, followed by a thrombin tag for selective cleavage and release from the bead. Referring to Fig. 6, after mixing the tagged complex with CFP 1–10 OPT22 and ccGFP 1–10 v2, the fluorescent complex was captured on Talon resin. Imaging the washed beads with the appropriate excitation and emission filters revealed the expected cyan and green fluorescence. The complemented CFP was recovered in the wash after cleavage by thrombin, while the ccGFP was retained on the resin via the 6His tag (Fig. 6).


Double-labeling experiment with split CFP22 and ccGFP (this work). Sulfite reductase with an N-terminal 6HIS and ccGFP S11 v1 tag, and a C-terminal CFP S11 M3 tag was complemented with an excess of CFP 1–10 OPT and ccGFP 1–10 v2 (left). Talon metal affinity beads were added to capture the complex, washed and the beads imaged to show CFP and GFP bound (middle). After cleavage of the protein on the beads by added thrombin, the beads were washed and the resin and flow through imaged to reveal where the CFP and GFP localized (right). As expected, the ccGFP was retained on the beads (upper right), while the CFP was found in the flow-through (lower right).

