Constructs
DNA encoding nanobodies were obtained by gene synthesis (IDT) and cloned into a pET vector in frame with a C-terminal 6xHis tag or GST tag by Gibson assembly (NEBuilder® HiFi DNA Assembly Master Mix, New England Biolabs). DNA encoding SARS-CoV-2 S RBD (S a.a. 319–541) was obtained by gene synthesis and cloned into pcDNA3 with an N-terminal SARS-CoV-2 S signal peptide (S a.a. 1–16) and a C-terminal 3xFlag tag by Gibson assembly. EGFP was cloned into pcDNA3 with a C-terminal 3xFlag tag by Gibson assembly. SARS-CoV-2 S was amplified by PCR (Q5 High-Fidelity 2x Master Mix, New England Biolabs) from pUC57-nCoV-S (a kind gift from Jonathan Abraham lab). SARS-CoV-2 S was deleted of the 27 a.a. at the C-terminal and fused to the NRVRQGYS sequence of HIV-1, a strategy previously described for retroviruses pseudotyped with SARS-CoV S27. Truncated SARS-CoV-2 S fused to gp41 was cloned into pCMV by Gibson assembly to obtain pCMV-SARS2ΔC-gp41. psPAX2 and pCMV-VSV-G were previously described28. pTRIP-SFFV-EGFP-NLS was previously described29 (a gift from Nicolas Manel; Addgene plasmid # 86677; http://n2t.net/addgene:86677; RRID:Addgene_86677). cDNA for human TMPRSS2 and Hygromycin resistance gene was obtained by synthesis (IDT). pTRIP-SFFV-Hygro-2A-TMPRSS2 was obtained by Gibson assembly.
Cell culture
HEK293T cells were cultured in DMEM supplemented with 10% FBS (ThermoFisher Scientific) and PenStrep (ThermoFisher Scientific). HEK293T ACE2 cells were a kind gift from Michael Farzan. HEK293T ACE2 cells were transduced with pTRIP-SFFV-Hygro-2A-TMPRSS2 to obtain HEK293T ACE2/TMPRSS2 cells. The transduced cells were selected with 320 µg/ml of Hygromycin (Invivogen) and used as a target in SARS-CoV-2 S pseudotyped lentivirus neutralization assays. Transient transfection of HEK293T cells was performed using TransIT®-293 Transfection Reagent (Mirus Bio).
Amino acid profile construction and analysis of natural nanobodies
Nanobody protein sequences were downloaded from the Protein Data Bank (www.rcsb.org, date 2020-09-02, Supplementary Data 1) or abYsis (www.abysis.org/abysis, date 2021-05-01, Supplementary Data 1). Nanobodies were separated into CDRs and frames (segments) by finding regions of continuous sequence in each nanobody that best matched the following standard frame sequences:
frame1 standard: EVQLVESGGGLVQAGDSLRLSCTASG,
frame2 standard: MGWFRQAPGKEREFVAAIS,
frame3 standard: AFYADSVRGRFSISADSAKNTVYLQMNSLKPEDTAVYYCAA,
frame4 standard: DYWGQGTQVTVSS.
Each matched region is the corresponding frame of the nanobody; the region between frame1 and frame2 is CDR1, the region between frame2 and frame3 is CDR2, and the region between frame3 and frame4 is CDR3 (Fig. 1g). Only nanobody sequences with at least one unique CDR were selected to represent natural nanobodies and used for constructing amino acid profiles (a.a. profile). 298 sequences from Protein Data Bank (PDB298) and 1030 sequences from abYsis (abYsis1030) fit this selection criterion (Supplementary Data 1). The amino acid (a.a.) profile at each position within each segment was calculated by finding the percentage of each of the 20 universal proteinogenic amino acids at that position among all selected nanobodies. All frame lengths were set to the same length as the frame standards. CDR lengths were set to accommodate different CDR lengths: CDR1 and CDR2 lengths were set to 10, and CDR3 length was set to 30. Nanobodies with CDR lengths shorter than the corresponding set length had their CDR filled from the C-terminal end with empty position holders up to the set length. Numbers in the amino acid profile table are the percentage of each amino acid. CDR boundaries were defined by the position where the combined frequency of the top two most abundant amino acids dropped sharply.
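The profile calculation described above can be illustrated with a minimal Python sketch. It is not the authors' script: the placeholder symbol `-`, the function names, and the toy CDR sequences are assumptions made for illustration, and the percentage is taken over all sequences in the group, including those that carry a placeholder at the position.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 proteinogenic amino acids
GAP = "-"  # hypothetical symbol for an empty position holder


def pad_segment(seq, set_length):
    """Fill a segment from its C-terminal end with placeholders up to set_length."""
    if len(seq) > set_length:
        raise ValueError(f"segment {seq!r} is longer than the set length {set_length}")
    return seq + GAP * (set_length - len(seq))


def aa_profile(segments, set_length):
    """Return one dict per position: percentage of each amino acid among all sequences."""
    if not segments:
        raise ValueError("at least one sequence is required")
    padded = [pad_segment(s, set_length) for s in segments]
    n = len(padded)
    profile = []
    for pos in range(set_length):
        counts = Counter(p[pos] for p in padded)
        profile.append({aa: 100.0 * counts.get(aa, 0) / n for aa in AMINO_ACIDS})
    return profile


# Toy CDR1 sequences (made up), padded to the set length of 10.
toy_profile = aa_profile(["GRTFS", "GRT", "GFTFS"], set_length=10)
print(toy_profile[0]["G"])  # every toy sequence starts with G
```

Because shorter CDRs are padded at the C-terminal end, percentages at late positions sum to less than 100, reflecting the fraction of nanobodies that actually reach that position.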
We compared our annotation method to Kabat and Chothia annotation (www.abysis.org/abysis/sequence_input/key_annotation/key_annotation.cgi) and found all three methods (Kabat, Chothia, and ours) showed frame regions with the same core sequence, and with 1–2 amino acid differences in the exact CDR boundaries between the three methods. The performance of our library suggests our annotation faithfully captured the domain structure of nanobodies.
We used the 1-Gini index to measure the level of diversity at each amino acid position. The Gini index measures the degree of inequality among individuals in a population, ranging from 0, when resources are distributed equally across individuals, to 1, when one member has all the resources. Our diversity index of 1-Gini takes 0 when there is no diversity (one amino acid has an abundance of 100%) and 1 for the highest diversity (all amino acids have the same abundance). The diversity index is calculated for 8 positions for CDR1, 6 positions for CDR2, and 18 positions for CDR3 for all sequence groups; when no sequence in the group contains a certain CDR position, the diversity index is 0. For example, in CDR2, both the natural nanobody collection and our input library contained a very small percentage of nanobodies having a CDR2 of 6 a.a., while the output binder collection has no nanobody with a CDR2 of 6 a.a.; hence the diversity index has a value of 0 for the output binder plot in Supplementary Fig. 6b but a non-zero value for natural nanobodies and input library in Supplementary Fig. 1c, d.
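As an illustration of the diversity index, the following Python sketch computes 1-Gini from the 20 amino acid abundances at one position. The text does not give the exact Gini formula; the sketch assumes the mean-absolute-difference form with a finite-sample normalization of n/(n − 1), which is what makes the index reach exactly 1 (and the diversity index exactly 0) when a single amino acid has 100% abundance. Function names are hypothetical.

```python
def gini(abundances):
    """Normalized Gini index: 0 for equal abundances, 1 when one entry holds everything."""
    n = len(abundances)
    total = sum(abundances)
    if n < 2 or total <= 0:
        raise ValueError("need at least two entries and a positive total")
    # Mean absolute difference over all ordered pairs, relative to twice the mean.
    mean_abs_diff = sum(abs(a - b) for a in abundances for b in abundances) / (n * n)
    raw = mean_abs_diff / (2 * total / n)
    # Assumed normalization so that the maximum is 1 for a finite number of entries.
    return raw * n / (n - 1)


def diversity_index(abundances):
    """1 - Gini; defined as 0 when no sequence in the group contains the position."""
    if sum(abundances) == 0:
        return 0.0
    return 1.0 - gini(abundances)


uniform = [5.0] * 20            # all amino acids equally abundant
one_hot = [100.0] + [0.0] * 19  # a single amino acid at 100%
print(diversity_index(uniform), diversity_index(one_hot))
```

The all-zero case mirrors the rule in the text that a CDR position absent from every sequence in a group receives a diversity index of 0.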
Nanobody library design and construction
The nanobody library sequence was designed to recapitulate the sequence structure of frames and CDRs observed from analyzing natural nanobodies (PDB298, abYsis1030, Supplementary Data 2). Our design differs from prior designs6,7,8 in the length of CDRs, the positions selected for randomization, and the randomization strategy. Such differences likely arise from differences in the size of the natural nanobody collection retrieved from databases (93 in McMahon et al. 6 versus 298/1030 in this study) and/or in how the nanobodies are annotated and analyzed (“amino acid profile construction and analysis of natural nanobodies”). For example, our analysis showed that the percentages of nanobodies containing CDR2 with lengths of 4, 5, or 6 amino acids (a.a.) are 32%, 61%, and 1.7%, respectively; we thus chose to use CDR2 with a length of 5 a.a. to recapitulate the most prevalent CDR2 length. In contrast, McMahon et al. 6 used an equivalent CDR2 length of 4 a.a., while Moutel et al. 7 used an equivalent length of 6 a.a. (Supplementary Fig. 4).
Nanobody libraries were constructed by ligation of PCR products in three stages, with each stage randomizing one of the three CDRs. Primers used and PCR cycling conditions for each primer pair are listed in Supplementary Data 3. Primers were synthesized by IDT (www.idtdna.com) using standard DNA oligo synthesis and purified by desalting without PAGE purification; we find that the level of synthesis errors with standard oligo synthesis and desalting purification does not have a significant impact on the functionality of the nanobody library. At each stage, PCR was performed using a high-fidelity DNA polymerase without strand displacement activity (Phusion High-Fidelity DNA Polymerase, New England Biolabs). Importantly, 65 °C was used as the elongation temperature to avoid hairpin opening during DNA elongation. PCR products with the correct size were purified by DNA agarose gel extraction using NucleoSpin Gel and PCR Clean-Up Kit (Takara, this kit was used for all DNA agarose gel extraction steps in this study). Ligation and phosphorylation of PCR products were performed simultaneously using T4 DNA ligase (New England Biolabs) and T4 Polynucleotide Kinase (New England Biolabs). Ligation products with the correct size were purified by DNA agarose gel extraction. Purified ligation products were quantified with Qubit 1× dsDNA HS Assay Kit (ThermoFisher Scientific, this kit was used for all Qubit measurements in this study) using Qubit 3 Fluorometer.
CDR2 was randomized in stage one. PCR templates at this stage were equimolar mixtures of plasmids carrying DNA encoding frames, including three frame1 versions, one frame2, three frame3 versions, and one frame4. The three versions of frame1 and frame3 were derived from the consensus sequence extracted from the natural nanobody a.a. profile, the A3 nanobody10, and a GFP-binding nanobody16. Amino acid sequences of the frames are shown in Supplementary Fig. 2.
CDR1 was randomized in stage two. 200 ng of ligation product from the first stage was digested by NotI-HF (New England Biolabs) and heat-denatured, and the entire digestion product was used as a template for PCR in stage two. The ligation product of stage two was subjected to one round of ribosome display and anti-Myc selection (described in “In vitro selection”); the entire recovered RNA was reverse transcribed, and the resulting cDNA was PCR amplified and purified.
270 ng of this RT-PCR product was used as a template for PCR in stage three to randomize CDR3. The ligation product of stage three was purified by DNA agarose gel extraction. The purified ligation product was then digested by DraI (New England Biolabs) and a fragment of ~680 bp in size was purified by DNA agarose gel extraction to obtain the final nanobody library, referred to as the input library.
High throughput full-length sequencing of nanobody library
Sequencing libraries from nanobody DNA libraries were prepared by two PCR steps using primers and PCR cycling conditions listed in Supplementary Data 3. Equal mixtures of Phusion High-Fidelity DNA polymerase (New England Biolabs) and Deep Vent DNA polymerase (New England Biolabs) were used for both PCRs to ensure efficient amplification. PCR cycle number was chosen to avoid over-amplification and typically falls between 5 and 15.
In the first PCR, the Illumina universal library amplification primer binding sequence and a stretch of random nucleotides of variable length were introduced to the 5′ end of library DNA. Similarly, the Illumina universal library amplification primer binding sequence and an index sequence of variable length were introduced to the 3′ end of library DNA. Eight different lengths were used for both random nucleotides and index to create staggered nanobody sequences in the sequencing library; this arrangement is required for high-quality sequencing of single amplicon libraries on an Illumina MiSeq instrument. The product of the first PCR was purified by column clean-up using NucleoSpin Gel and PCR Clean-Up Kit, and the entire sample was used as the template for the second PCR.
In the second PCR, Illumina universal library amplification primers were used to generate a sequencing library. Sequencing libraries were purified by DNA agarose gel extraction, quantified using Qubit 3 Fluorometer, and sequenced on an Illumina MiSeq instrument using MiSeq Reagent Nano Kit v2 (500-cycles) (Illumina, MS-103-1003); no PhiX control library spike-in was used. The sequencing run setup was paired-end 2 × 258 with no index read. The index in the library was designed as an inline index, so a separate index read was not required. Raw reads were separated by index and trimmed to remove N bases and bases with a quality score of <10 prior to downstream analysis.
Ribosome display
Nanobody DNA library containing a specified amount of diversity was first amplified using a DNA recovery primer pair listed in Supplementary Data 3. Equal mixtures of Phusion High-Fidelity DNA polymerase (New England Biolabs) and Deep Vent DNA polymerase (New England Biolabs) were used for the PCR. PCR cycle number was chosen to avoid over-amplification and typically falls between 5 and 15. In a standard preparation, 200–500 ng of the purified PCR product was used as DNA template in 25 µl of coupled in vitro transcription and translation reaction using PURExpress In Vitro Protein Synthesis Kit (New England Biolabs). The reaction was incubated at 37 °C for 30 min, then placed on ice, and 200 µl ice-cold stop buffer (10 mM HEPES pH 7.4, 150 mM KCl, 2.5 mM MgCl2, 0.4 µg/µl BSA (New England Biolabs), 0.4 U/µl SUPERase•In (ThermoFisher Scientific), 0.05% TritonX-100) was then added to stop the reaction. This stopped ribosome display solution was used for binding to immobilized protein targets during in vitro selection. The amount of DNA template, the volume of coupled in vitro transcription and translation reaction, and the volume of stop buffer were scaled proportionally when different volumes of stopped ribosome display solution were needed. 1–8× standard preparations were used for each selection round, with the first round using 8× standard preparations, the second round using 2× standard preparations, and the third round using 1× standard preparation.
In vitro selection
Target proteins were immobilized to magnetic beads by first coating protein G magnetic beads (ThermoFisher Scientific, 10004D) with anti-Flag antibody (Sigma-Aldrich, F1804, at 1:50 dilution), then incubating antibody-coated beads with cell lysate or cell media containing 3×Flag tagged target proteins at 4 °C for 2 hours. For anti-Myc selection, magnetic beads were coated by anti-Myc antibody (ThermoFisher Scientific, 13-2500, at 1:50 dilution) only. 100 µl of antibody-coated beads were used for target immobilization and pre-clearing in the first round, and 50 µl were used for subsequent rounds. The beads were washed three times with PBST (PBS, ThermoFisher Scientific, with 0.02% TritonX-100). Stopped ribosome display solutions were first incubated with antibody-coated beads (without targets) at 4 °C for 30 minutes for pre-clearing of non-specific and off-target binders, the solution was then transferred to target immobilized beads and incubated at 4 °C for 1 hour, the target immobilized beads were then washed four times with wash buffer (10 mM HEPES pH 7.4, 150 mM KCl, 5 mM MgCl2, 0.4 µg/µl BSA (New England Biolabs), 0.1 U/µl SUPERase•In (ThermoFisher Scientific), 0.05% TritonX-100). After washing, beads were resuspended in TRIzol Reagent (ThermoFisher Scientific, 15596026), and RNA was extracted from the beads, 25 µg of linear acrylamide (ThermoFisher Scientific, AM9520) were used as co-precipitant during RNA extraction. Reverse transcription of extracted RNA was performed using Maxima H Minus Reverse Transcriptase (ThermoFisher Scientific) and primer as described in Supplementary Data 3, row 64. The reverse transcription reaction was purified using SPRIselect Reagent (Beckman Coulter) to obtain purified cDNA. Purified cDNA was amplified by PCR using equal mixtures of Phusion High-Fidelity DNA polymerase and Deep Vent DNA polymerase. PCR cycle number (Supplementary Data 3) was chosen to avoid over-amplification and typically falls between 10 and 25. 
This PCR condition ensures efficient full-length product synthesis at each cycle and is required to faithfully amplify nanobody genes without CDR shuffling, a phenomenon18 that could otherwise cause selection failure. The PCR product was purified by DNA agarose gel extraction. The purified PCR product was used for library generation for high throughput full-length sequencing or as DNA input for ribosome display reaction (coupled in vitro transcription and translation) to perform additional rounds of in vitro selection.
One round of anti-Myc selection was performed on the nanobody library with CDR1 and 2 randomized to enrich for correct-frame sequences. Several factors can in principle contribute to the presence of out-of-frame sequences after anti-Myc selection: (1) non-specific binding of RNA or protein to magnetic beads; (2) translation through alternative start codons downstream of areas containing out-of-frame errors; and/or (3) inefficient binding of the anti-Myc antibody to the expressed Myc peptide that is located between the VHH protein and ribosome. We disfavor (1), because although our input library contained 27.5% full-length sequences, the remaining sequences that contained errors do not interfere with full-length sequences and are reduced to <10% after three rounds of RBD selection (Fig. 2c), suggesting that these erroneous sequences or their encoded peptides do not non-specifically stick to beads at significant levels to impact binder selection.
As a control experiment to demonstrate the efficiency of our ribosome display and selection protocol, the SR6c3 sequence was linked with 5′ and 3′ sequence elements for ribosome display to serve as control input DNA. 100 ng of control input DNA was displayed by ribosome display in a reaction volume of 10 µl and bound to 500 µl RBD-coated beads; the beads were washed and total RNA was extracted from them. 7910 ng total RNA was recovered, of which 989 ng is estimated to be SR6c3 RNA (1/8 of the total, calculated by the mass ratio of nanobody RNA, 649 nt, to E. coli ribosomal RNAs, 4568 nt), representing a coverage rate of 19× in the output.
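The 1/8 estimate can be reproduced with a short calculation. The sketch below assumes that each recovered complex contributes one nanobody RNA (649 nt) and one set of ribosomal RNAs (4568 nt), so that the mass fraction of nanobody RNA equals its share of the combined length; this reading of the mass ratio is an assumption, and the function name is hypothetical. The sketch covers only the mass fraction, not the 19× coverage figure.

```python
def nanobody_rna_fraction(nanobody_nt, ribosomal_nt):
    """Share of total RNA mass contributed by nanobody RNA, approximated by length."""
    return nanobody_nt / (nanobody_nt + ribosomal_nt)


TOTAL_RNA_NG = 7910  # total RNA recovered from the beads, from the text

fraction = nanobody_rna_fraction(649, 4568)
print(f"exact length-based fraction: {fraction:.4f}")          # close to 1/8
print(f"estimate with 1/8 rounding: {TOTAL_RNA_NG / 8:.0f} ng")
```

With the fraction rounded to 1/8, 7910 ng of total RNA gives the 989 ng stated in the text; the unrounded length-based fraction gives a slightly lower value.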
CDR-directed clustering analysis
Computational analysis for CDR-directed clustering was performed using custom python scripts. Paired-end sequences were merged into full-length nanobody sequences. Merged nanobody sequences were quality trimmed and translated into nanobody protein sequences, which were separated into CDRs and frames (segments) as described in the “Amino Acid Profile Construction and analysis of natural nanobodies” section. Two nanobodies were determined to have similar CDRs via the following steps. First, the ungapped sequence alignment score (match score) was calculated for each CDR of the two nanobodies as the sum of BLOSUM6230 amino acid pair scores at each aligned position (if two CDRs have different lengths, their sequence alignment score was set to −5). The alignment scores of any two CDRs were summed to yield three scores, and if at least one of the three was larger than 35 (Fig. 2b), the two nanobodies were defined as having similar CDRs. Next, nanobodies with similar CDRs were grouped into a cluster by a two-step process. In the first step, we chose as nanobody cluster-forming “seeds” those nanobodies that were called similar to at least 5 other nanobodies (all remaining nanobodies were not considered for clustering). In the second step, we iteratively selected a seed nanobody with at least 5 other similar (>35 match score) seed nanobodies, and grouped all of them into one cluster, removing them from the seed nanobody pool, and iterated this procedure until no seed nanobodies remained. For RBD, there were 83,433 seeds in the first step, and 83,392 were grouped in clusters in the second step. For EGFP, 71,210 of 71,220 seeds were grouped in clusters (Supplementary Data 9). This heuristic was fast in a standard computing environment with multiprocessing capabilities.
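The similarity test and the two-step grouping can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' script: nanobodies are represented as (CDR1, CDR2, CDR3) tuples, the amino acid pair score is passed in as a function (in practice a BLOSUM62 lookup, for example one loaded with Biopython's `substitution_matrices.load("BLOSUM62")`), candidate seeds are visited in index order because the text does not specify an order, and the all-pairs comparison is quadratic and not optimized. The toy scoring function in the example is a stand-in, not BLOSUM62.

```python
from itertools import combinations

LENGTH_MISMATCH_SCORE = -5  # score assigned when two CDRs differ in length
SIMILARITY_THRESHOLD = 35   # a summed score of two CDRs must exceed this value
MIN_NEIGHBORS = 5           # minimum number of similar nanobodies for a seed


def cdr_match_score(a, b, pair_score):
    """Ungapped alignment score of two CDRs; -5 if their lengths differ."""
    if len(a) != len(b):
        return LENGTH_MISMATCH_SCORE
    return sum(pair_score(x, y) for x, y in zip(a, b))


def have_similar_cdrs(nb1, nb2, pair_score):
    """True if the summed score of any two of the three CDRs exceeds the threshold."""
    scores = [cdr_match_score(a, b, pair_score) for a, b in zip(nb1, nb2)]
    return any(x + y > SIMILARITY_THRESHOLD for x, y in combinations(scores, 2))


def cluster(nanobodies, pair_score):
    """Group nanobodies (tuples of CDR1, CDR2, CDR3) into clusters of indices."""
    n = len(nanobodies)
    neighbors = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if have_similar_cdrs(nanobodies[i], nanobodies[j], pair_score):
            neighbors[i].add(j)
            neighbors[j].add(i)
    # Step 1: seeds are nanobodies similar to at least MIN_NEIGHBORS others.
    pool = {i for i in range(n) if len(neighbors[i]) >= MIN_NEIGHBORS}
    # Step 2: repeatedly take a seed with enough similar seeds left in the pool.
    clusters = []
    while True:
        center = next(
            (i for i in sorted(pool) if len(neighbors[i] & pool) >= MIN_NEIGHBORS),
            None,
        )
        if center is None:
            break
        members = {center} | (neighbors[center] & pool)
        clusters.append(sorted(members))
        pool -= members
    return clusters


def toy_score(x, y):
    """Stand-in pair score for the example only (not BLOSUM62)."""
    return 5 if x == y else -1


toy = [("GRTFSSYA", "ISWSG", "AAGLGYDY")] * 6 + [("WWWWWWWW", "PPPPP", "KKKKKKKK")]
print(cluster(toy, toy_score))  # six identical nanobodies form one cluster
```

Seeds that no longer have at least five similar seeds in the pool are left unclustered, which is consistent with the text reporting slightly fewer clustered nanobodies than seeds.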
A representative sequence to illustrate each CDR in each cluster was chosen as the most frequent CDR sequence in the cluster (the chosen representatives for CDR1, 2, and 3 may not necessarily be from the same sequence, and are used only for illustrative purposes for each cluster as in Supplementary Data 4 and 5; whole nanobody sequences were used for gene synthesis and all downstream experiments). A consensus sequence was generated for each CDR, where each position in the CDR was represented by a six-character string, such that the first and fourth characters were the single letter code for the top and the second most abundant amino acid at the position, respectively, and the following two characters (second and third for the most abundant; fifth and sixth for the second most abundant), were their frequency, respectively (ranging from 00 for <34% to 99 for 100%). The consensus sequence for a CDR was recorded as a single “B00” when the standard deviation of the lengths of all CDRs was >1. CDR scores were calculated by summing a score for each position in the CDR consensus sequence, with scores of 3, 2, 1 for positions where the most abundant amino acid had frequencies >80%, 50%, or less, respectively, and a score of 0 for CDRs with a consensus sequence of a single “B00” (Supplementary Data 4 and 5). Representative whole nanobody sequence for each cluster was selected as the one with the maximal sum (max-sum) of all CDR similarity scores between the nanobody and all other nanobodies in the cluster. This max-sum representative nanobody sequence selection process minimizes the impact of random errors introduced during NGS library preparation and sequencing by imposing a scoring penalty on sequences containing random errors.
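The CDR score and the max-sum representative selection can be sketched in Python as below. The sketch takes the per-position frequency of the most abundant amino acid and the standard deviation of CDR lengths as inputs, since the text does not specify how positions are aligned when lengths differ; the `similarity` function is likewise left as a parameter because the exact CDR similarity score used is not spelled out here. All names and the toy sequences are hypothetical.

```python
def position_score(top_frequency_percent):
    """3, 2, or 1 depending on the frequency of the most abundant amino acid."""
    if top_frequency_percent > 80:
        return 3
    if top_frequency_percent > 50:
        return 2
    return 1


def cdr_score(top_frequencies, length_std):
    """Sum of position scores; 0 when the consensus is a single B00 (length s.d. > 1)."""
    if length_std > 1:
        return 0
    return sum(position_score(f) for f in top_frequencies)


def max_sum_representative(members, similarity):
    """Member with the largest summed similarity to all other members (first on ties)."""
    totals = [
        sum(similarity(m, other) for j, other in enumerate(members) if j != i)
        for i, m in enumerate(members)
    ]
    return members[totals.index(max(totals))]


def matches(a, b):
    """Toy similarity for the example: number of identical aligned positions."""
    return sum(x == y for x, y in zip(a, b))


print(cdr_score([95, 60, 40], length_std=0.0))
print(max_sum_representative(["AAAT", "AAAA", "AAAA", "CAAA"], matches))
```

In the toy example, the two sequences that each carry a single deviating residue receive lower summed similarity than the majority sequence, which illustrates how the max-sum rule penalizes sequences carrying random errors.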
Protein expression and purification
Target proteins used for in vitro selection and ELISA were prepared by transiently transfecting HEK293T cells with plasmids carrying either spike RBD with C-terminal 3×Flag tag and N-terminal signal peptide of the spike (RBD-3×Flag), or EGFP with C-terminal 3×Flag tag (EGFP-3×Flag). Cell culture media (for RBD-3×Flag) or lysate of cell pellet (for EGFP-3×Flag) was used for coating magnetic beads (for CeVICA) or plates (for ELISA). Nanobodies with C-terminal 6xHis tag (Nanobody-6xHis) were expressed in E. coli and purified using HisPur Cobalt Resin (ThermoFisher Scientific, 89964). Briefly, Nanobody-6xHis plasmids were transformed into T7 Express E. coli (New England Biolabs). Single colonies were transferred into 10 ml LB media and grown at 37 °C for 2–4 h (until OD reached 0.5–1); the culture was chilled on ice, and IPTG was then added to a final concentration of 10 μM. The culture was then incubated on an orbital shaker at room temperature (RT) for 16 hours. Bacterial cells were pelleted by centrifugation and lysed in B-PER Bacterial Protein Extraction Reagent (ThermoFisher Scientific) supplemented with rLysozyme (Sigma-Aldrich), DNase I (New England Biolabs), 2.5 mM MgCl2, and 0.5 mM CaCl2. Bacterial lysates were cleared by centrifugation and mixed with wash buffer (50 mM sodium phosphate pH 7.4, 300 mM sodium chloride, 10 mM imidazole) at a 1:1 ratio, and then incubated with 40 μl HisPur cobalt resin for 2 hours at 4 °C. The resins were then washed four times with wash buffer. Proteins were eluted by incubating resin in elution buffer (50 mM sodium phosphate pH 7.4, 300 mM sodium chloride, 150 mM imidazole) at RT for 5 minutes. Purified protein samples were quantified by measuring absorbance at 280 nm on a NanoDrop spectrophotometer.
ELISA assay for nanobody binding to RBD
Maxisorp plates (BioLegend, 423501) were coated with 1 µg/ml anti-Flag antibody (Sigma Aldrich, F1804) in coating buffer (BioLegend, 421701) at 4 °C overnight. Plates were washed once with PBST (PBS, ThermoFisher Scientific, with 0.02% TritonX-100), a 1:1 mixture of HEK293T cell culture media containing secreted RBD-3xFlag and blocking buffer (PBST with 1% nonfat dry milk) was added to the plates and incubated at room temperature (RT) for 1 hour. RBD coated plates were then blocked with blocking buffer at RT for 1 hour. Plates were washed twice with wash buffer and purified Nanobody-6xHis diluted in blocking buffer were added to the plates and incubated at RT for 1 hour. Plates were washed three times with wash buffer, HRP conjugated anti-His tag secondary antibody (BioLegend, 652503) diluted 1:2000 in blocking buffer was then added to the plates and incubated at RT for 1 hour. Plates were washed three times with wash buffer and TMB substrate (BD, 555214) was added to the plate and incubated at RT for 10–20 minutes. Stop buffer (1 N sulfuric acid) was added to the plates once enough color developed. Quantification of plates was performed by measuring absorbance at 450 nm on a BioTek synergy H1 microplate reader using Gen5 software 1.11.5. Data reported were background subtracted. Two levels of background subtraction were performed: (1) subtracting absorbance measured from wells incubated with blocking buffer only (without purified Nanobody-6xHis) from sample measurements (reflecting background absorbance by plates); and (2) subtracting absorbance from each nanobody incubated wells coated only with anti-Flag antibody and without RBD (reflecting non-specific binding of each nanobody).
Pseudotyped SARS-CoV-2 lentivirus production and lentivirus production for transductions
Lentivirus production was performed as previously described28. Briefly, HEK293T cells were seeded at 0.8 × 10⁶ cells per well in a six-well plate and were transfected the same day with TransIT-293 Transfection Reagent and a mix of DNA containing 1 µg psPAX2, 1.6 µg pTRIP-SFFV-EGFP-NLS, and 0.4 µg pCMV-SARS2ΔC-gp41. The medium was changed after overnight transfection. SARS-CoV-2 S pseudotyped lentiviral particles were collected 30–34 hours post-medium change and filtered on a 0.45 µm syringe filter. To transduce HEK293T ACE2 cells, the same protocol was followed with a mix containing 1 µg psPAX2, 1.6 µg pTRIP-SFFV-Hygro-2A-TMPRSS2, and 0.4 µg pCMV-VSV-G.
SARS-CoV-2 S pseudotyped lentivirus neutralization assay
The day before the experiment, 5 × 10³ HEK293T ACE2/TMPRSS2 cells per well were seeded in 96-well plates in 100 µl. On the day of lentivirus harvest, SARS-CoV-2 S pseudotyped lentivirus was incubated with nanobodies or nanobody elution buffer in 96-well plates for 1 hour at RT (100 µl virus + 50 µl of nanobody at appropriate dilutions). The medium was then removed from HEK293T ACE2/TMPRSS2 cells and replaced with 150 µl of the nanobody plus pseudotyped lentivirus solution. Wells in the outermost rows of the 96-well plate were excluded from the assay. After overnight incubation, the medium was changed to 100 µl of fresh medium. Cells were harvested 40–44 hours post-infection with TrypLE (Thermo Fisher), washed in medium, and fixed in FACS buffer containing 1% PFA (Electron Microscopy Sciences). Percentages of GFP positive cells were quantified on a Cytoflex LX (Beckman Coulter) and data were analyzed with FlowJo. During the development of the pseudotyped lentivirus neutralization assay, we found HEK293T ACE2/TMPRSS2 cells were highly susceptible to pseudovirus infection and produced consistent inhibition measurements, while Vero E6 and Caco-2 cells showed lower susceptibility in our GFP detection-based assays.
Affinity maturation
Error-prone PCR was used to introduce random mutations across the full length of selected nanobody DNA sequences. 0.1 ng of plasmid carrying DNA sequence encoding each selected nanobody were used as template in PCR reactions using Taq DNA polymerase with reaction buffer (10 mM Tris–HCl pH 8.3, 50 mM KCl, 7 mM MgCl2, 0.5 mM MnCl2, 1 mM dCTP, 1 mM dTTP, 0.2 mM dATP, 0.2 mM dGTP) suitable for causing mutations in PCR products. Mutagenized library (pre-affinity maturation) for input to CeVICA was made by ligating PCR products of error-prone PCR that carries nanobodies to DNA fragments containing the remaining elements required for ribosome display. Three rounds of ribosome display and in vitro selection were performed on the mutagenized library as described in the “In vitro selection” section, during which the incubation time of the binding step was kept between 5 seconds and 1 minute to impose a stringent selection condition, additional error-prone PCR was not performed between selection rounds. The output library (post-affinity maturation) was sequenced along with the pre-affinity maturation library as described in the “high throughput full-length sequencing of the nanobody library” section.
Identification and ranking of beneficial mutations
To identify potential beneficial mutations for each selected nanobody, we built an amino acid profile (a.a. profile) table for each nanobody family in the pre- and post-affinity maturation library and identified amino acids with increased frequency in the post-affinity maturation population compared to their pre-maturation frequency. For each nanobody parental sequence, an a.a. profile was built of the percent of each a.a. across all nanobody sequences originated from one parental nanobody in the pre-affinity maturation library (“pre-a.a. profile”) and in the post-affinity maturation library (“post-a.a. profile”). A percent point change table was generated by subtracting the pre-a.a. profile from the post-a.a. profile, describing the change of frequency of each observed amino acid at each position of the nanobody protein following affinity maturation.
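The percent point change table is a position-by-position subtraction of the two profiles. A minimal Python sketch, assuming each profile is a list with one dictionary per position that maps amino acids to their percentages (a hypothetical data layout), with amino acids missing from a dictionary treated as 0%:

```python
def percent_point_change(pre_profile, post_profile):
    """Subtract the pre-a.a. profile from the post-a.a. profile at every position."""
    if len(pre_profile) != len(post_profile):
        raise ValueError("profiles must cover the same number of positions")
    change = []
    for pre_pos, post_pos in zip(pre_profile, post_profile):
        observed = set(pre_pos) | set(post_pos)
        change.append(
            {aa: post_pos.get(aa, 0.0) - pre_pos.get(aa, 0.0) for aa in observed}
        )
    return change


# Toy single-position example (made-up numbers): parental A loses frequency
# while T and S gain after affinity maturation.
pre = [{"A": 99.0, "T": 1.0}]
post = [{"A": 95.0, "T": 4.0, "S": 1.0}]
print(percent_point_change(pre, post))
```

Positive entries in the resulting table mark amino acids that became more frequent after selection and are therefore candidates for the scoring described next.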
We defined a putative beneficial mutation and assigned a beneficial mutation score as either (1) the non-parental amino acid with the biggest increase in frequency if its increase is at least 0.5 percentage points; the score is the difference from the parental amino acid frequency; or (2) the non-parental amino acid with the biggest increase after the parental amino acid if the increase is at least 1.5 percentage points; the score is the percentage point change of the beneficial mutation. To avoid too many proximal putative beneficial mutations (which may cause structural incompatibility), a putative beneficial mutation was discarded if it (1) is outside the CDRs; (2) is <3 positions away from another beneficial mutation (“nearby mutation”) and has a smaller beneficial mutation score than the nearby mutation; and (3) co-occurs less than twice with the nearby mutation. From this final list of putative beneficial mutations, different combinations were chosen and incorporated into each nanobody parental sequence that includes one combination of all beneficial mutations in CDRs, one combination of the top-3 ranked (by beneficial mutation score) mutations in frames, and at least one combination of both CDR mutations and frame mutations (Supplementary Data 7).
Biolayer interferometry
Biolayer interferometry assays were performed on the Octet RED384 instrument (Sartorius) using anti-GST biosensors (Sartorius, 18-5096). Assays were performed in sample buffer (PBS with 0.05% Tween-20, 0.5 mg/ml BSA). Nanobodies were loaded on anti-GST biosensors in sample buffer containing bacterial lysates of E. coli expressing GST-tagged nanobodies (100-fold dilution), achieving loading levels of 1–1.2 nm. Nanobody-loaded sensors were dipped in sample buffer containing recombinant RBD (ThermoFisher, RP-87678) for 200 seconds to record association, then dipped in sample buffer for 1200 seconds to record dissociation. Nanobody-loaded sensors dipped in sample buffer containing no RBD were used as reference sample sensors for background subtraction. No signal increase was observed for reference sample sensors, indicating no non-specific binding to loaded nanobodies. Non-specific binding of RBD to anti-GST biosensors was tested by dipping anti-GST biosensors not loaded with nanobodies in 20 nM RBD. No signal increase was observed during the incubation, indicating that RBD does not bind non-specifically to anti-GST biosensors. Data analysis was performed using the Octet Data analysis software 10.0; Savitzky–Golay filtering was used to remove noise, and curves were fitted using a 1:1 binding model.
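For orientation, the textbook form of a 1:1 (Langmuir) binding model is sketched below in Python. This is the standard model, not a description of how the Octet software implements its fit; the function names and the rate constants in the example are hypothetical values chosen for illustration.

```python
import math


def association_response(t, conc, ka, kd, rmax):
    """Signal at time t (s) during association at analyte concentration conc (M)."""
    k_obs = ka * conc + kd            # observed rate constant (1/s)
    r_eq = rmax * ka * conc / k_obs   # equilibrium response at this concentration
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation_response(t, r0, kd):
    """Signal at time t (s) after the start of dissociation from response r0."""
    return r0 * math.exp(-kd * t)


# Hypothetical constants: ka = 1e5 1/(M s), kd = 1e-3 1/s, so KD = kd/ka = 10 nM.
ka, kd, rmax = 1e5, 1e-3, 1.0
r_end = association_response(200, conc=20e-9, ka=ka, kd=kd, rmax=rmax)
print(r_end, dissociation_response(1200, r0=r_end, kd=kd))
```

The equilibrium dissociation constant follows as KD = kd/ka; at an analyte concentration equal to KD, the equilibrium response is half of rmax.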
Size-exclusion chromatography
Size-exclusion chromatography was performed on an AKTA Pure 25M system with a Superdex 75 Increase 10/300 GL column (Cytiva). 50–100 μg of nanobodies were loaded onto the column in running buffer (20 mM HEPES, 150 mM NaCl, pH 7.5). A flow rate of 0.5 ml/min was used and UV280 readings were recorded for 1.25 column volumes. Peak analysis was performed using the UNICORN 7 software (Cytiva).
Thermal stability assays
Protein thermal shift assays were performed using the Protein Thermal Shift Dye Kit (ThermoFisher, 4461146) according to the manufacturer’s instructions. 4 μg of nanobodies were diluted in 1× reaction buffer and measurements were performed on a Bio-Rad CFX384 real-time PCR system using a melt curve protocol (30–98 °C, 1 °C increment, hold for 20 s then read plates using FRET channel). 98 °C heat denaturation was performed by diluting nanobody sample to 1 μM in PBS containing 100 ng/μl BSA, then heating at 98 °C for 10 min then holding at 4 °C using a PCR machine. ELISA assay of nanobody samples prior to and after complete thermal denaturation was performed as described above (“ELISA assay for nanobody binding to RBD”).
Figure plots generation
Plots in figures were generated using the Python package Matplotlib 3.3.0 (https://matplotlib.org/).
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

