Computational design of SARS-CoV-2 S protein binding peptides
Computational design of peptides requires the generation of tertiary structures in order to predict binding to a target of interest. Analysis of the neutralizing sites on the SARS-CoV-2 S protein indicated that most monoclonal antibodies (mAbs) targeted the receptor binding domain (RBD) of the S protein25,26, while only one targeted the N-terminal domain (NTD)27. Cryo-electron microscopy structure analysis of the S protein trimer revealed that the RBD can be in an open or closed state28. Based on structural information, Barnes et al.26 identified four different classes of RBD-targeting antibodies depending on whether they bound to the ACE2 binding domain, and whether the S1 RBD was in the closed or open state. Therefore, in the computational design of binding peptides we modeled three different structures of spike protein: S protein trimer in the open form (PDB ID: 6VYB), S protein trimer in the closed form (PDB ID: 6VXX), and RBD of the S protein (PDB ID: 6LZG).
Two approaches were used to generate a library of peptides. In the first approach, we used a fragment from the N-terminal α1 helix of the human ACE2 receptor, shown to be directly involved with binding to the RBD of the S protein9,12. Ten 18-mer peptide sequences spanning the length of the original ACE2 fragment were generated and their tertiary structures were modeled from the crystal structure of the RBD/hACE2 complex (PDB ID: 6LZG). These peptides were subsequently docked to the RBD of the spike protein to generate protein-peptide complexes for sequence optimization.
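The tiling step in the first approach can be sketched as a sliding window. This is a minimal illustration, assuming a 27-residue parent fragment tiled into 18-mers shifted by one residue (as stated in the Fig. 2 caption); the fragment string below is a placeholder and not the actual ACE2 sequence, and the function name is hypothetical.

```python
def sliding_windows(fragment: str, window: int = 18) -> list[str]:
    """Return every contiguous window of length `window`, shifted by one residue."""
    if window > len(fragment):
        raise ValueError("window is longer than the fragment")
    return [fragment[i:i + window] for i in range(len(fragment) - window + 1)]


# Placeholder 27-residue string (NOT the real ACE2 alpha-1 helix sequence).
PLACEHOLDER_FRAGMENT = "ACDEFGHIKLMNPQRSTVWYACDEFGH"

peptides = sliding_windows(PLACEHOLDER_FRAGMENT)
for index, peptide in enumerate(peptides, start=1):
    print(index, peptide)
```

A 27-residue fragment yields 27 - 18 + 1 = 10 windows, and each window shares 17 residues with its neighbor.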
In the second approach, we designed a set of peptides from a pool of random sequences. Since short peptides are very flexible and typically lack a distinct conformation in their unbound state, an ensemble of 20 structures was generated for every peptide sequence, as described in the Methods section. These peptides were docked to the S1 subunit of the S protein trimer in its open and closed forms, as well as to the monomer RBD structure, with no constraints on the location of the binding site. Analysis of docking complexes revealed several hot spots for peptide binding on both the NTD and RBD. About 300 protein-peptide complexes (with peptides bound to either the NTD or the RBD) were selected for sequence optimization.
Sequence optimization was performed as an iterative process in which mutations were introduced into every peptide sequence and kept if they resulted in an improved binding score. The most important, but also most difficult, part of the proposed approach was to correctly score and rank the protein-peptide complexes based on their binding affinity. Current scoring functions use different simplifications to enable search through both the sequence and conformational space. To account for these differences, we used the consensus score of two different scoring functions (ddg provided by Rosetta and ZRANK29 developed at Boston University) to rank sequences based on predicted binding affinity to the S protein. Using the consensus scores generated by computational analysis, we selected a library of 2,376 unique peptide sequences for screening. This library consisted of 10 wild-type ACE2 variants (as shown in Fig. 2a), 800 ACE2-optimized sequences as described in approach 1, and 1,566 S protein binding sequences as described in approach 2.
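The accept-if-improved loop described above can be sketched as a greedy point-mutation search. This is only an illustration under stated assumptions: the two scoring functions are toy stand-ins (they do not call Rosetta or ZRANK), the consensus is taken as a simple average, which the text does not specify, and lower scores are assumed to mean better predicted binding.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score_a(seq: str) -> float:
    """Hypothetical stand-in for one scoring function (lower is better)."""
    return -float(sum(seq.count(res) for res in "WYF"))


def toy_score_b(seq: str) -> float:
    """Hypothetical stand-in for a second scoring function (lower is better)."""
    return -float(sum(seq.count(res) for res in "KRW"))


def consensus(seq: str) -> float:
    """Assumed consensus: plain average of the two toy scores."""
    return 0.5 * (toy_score_a(seq) + toy_score_b(seq))


def optimize(seq: str, n_iter: int = 200, seed: int = 0) -> tuple[str, float]:
    """Propose random point mutations; keep one only if the consensus improves."""
    rng = random.Random(seed)
    best, best_score = seq, consensus(seq)
    for _ in range(n_iter):
        pos = rng.randrange(len(best))
        candidate = best[:pos] + rng.choice(AMINO_ACIDS) + best[pos + 1:]
        candidate_score = consensus(candidate)
        if candidate_score < best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


start = "A" * 18
optimized, optimized_score = optimize(start)
print(optimized, optimized_score)
```

In a real pipeline each candidate would need to be re-modeled in complex with the target before scoring, which is where most of the computational cost lies.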
Microarray screening of designed peptides
One critical capability missing from current efforts to design binding sequences against SARS-CoV-2 is the ability to test candidates in a simple and high-throughput format. Here, we applied a fast and simple microarray-based screening pipeline to select S protein binding peptides from our in silico designed library. All reagents used in this pipeline were commercially available and required no special modifications or equipment, thereby allowing for easy adoption in other laboratories. Since the S protein trimer was not commercially available at the time of screening, we screened for binding to the SARS-CoV-2 S1 subunit, which contains the receptor binding domain (RBD) and N-terminal domain (NTD)—two of the binding hot-spots identified during computer-based docking studies.
The top-ranking peptide sequences identified from in silico design were printed on a custom peptide microarray with side-by-side duplicates. The library of 2,376 sequences fit on a 1 × 2 design where two copies of the array were printed onto a single slide. Each subarray was exposed to biotinylated SARS-CoV-2 S1 protein at a single concentration between 2 and 50 µg/mL (or buffer-only control) and binding sequences were identified following incubation with streptavidin conjugated fluorescent dye, as shown in Fig. 1a.


Figure 1. Peptide microarray to identify SARS-CoV-2 S protein binding sequences. (a) Schematic showing microarray screening procedure for detecting binding of a biotinylated target protein. (b) Microarray images of peptide subarray following exposure to 50, 10, or 5 µg/mL of SARS-CoV-2 S1 protein. (c) Normalized binding signal of peptides after exposure to SARS-CoV-2 S1 protein at 50, 10, or 5 µg/mL concentration. Z-score at each S1 concentration is plotted for peptides listed in numerical order along the X-axis. P2-P811 correspond to peptides derived from the ACE2 N-terminal alpha helix (approach 1) while P812-P2376 were derived from a random starting library and screened in silico for docking to S protein (approach 2). Peptides with high Z-scores (> 1.95, indicated by dotted line) represent S1 protein binding sequences.
One advantage of this system is that processing of multiple arrays can easily be completed in a single day, allowing for rapid screening of the entire library under multiple experimental conditions. Representative images of the array are shown in Fig. 1b following exposure to 50, 10, and 5 µg/mL of S1 protein. In addition to the control spots, visible as bright and dark spots around the perimeter, a large number of apparent binders are visible on the left third of the array. These spots correspond to the portion of the library (P2-P811) designed from the ACE2 receptor fragment, as described in approach 1. Not surprisingly, the design of peptides from a known binding sequence resulted in numerous “hits” with a range of binding affinities, as observed by the variation in fluorescent intensity from this region. Following data normalization, these sequences also had the highest proportion of binders with 115 of 800 sequences showing a Z-score greater than 1.95 (Fig. 1c) at the highest S1 protein concentration tested.
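The hit-calling rule (Z-score above 1.95) can be sketched as follows. This is a minimal example, assuming Z-scores are computed against the mean and sample standard deviation of all peptide signals on one array; the actual normalization is described in the Methods and may differ. The intensities and function names are invented for illustration.

```python
from statistics import mean, stdev


def z_scores(signals: dict[str, float]) -> dict[str, float]:
    """Standardize each signal against the array-wide mean and sample SD."""
    mu = mean(signals.values())
    sigma = stdev(signals.values())
    return {name: (value - mu) / sigma for name, value in signals.items()}


def call_binders(signals: dict[str, float], threshold: float = 1.95) -> list[str]:
    """Return peptide names whose Z-score exceeds the threshold."""
    return [name for name, z in z_scores(signals).items() if z > threshold]


# Invented fluorescence intensities: 19 background-level spots and one bright spot.
signals = {f"P{i}": 100.0 + i for i in range(2, 21)}
signals["P21"] = 1000.0

hits = call_binders(signals)
print(hits)
```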
Perhaps most interesting, however, is that the ten wild-type ACE2 variants (P2-P11), used as the starting sequences for optimization in approach 1 (Fig. 2a), showed very little binding to S1 protein. Despite the low signals, there was a trend for increased binding from peptides overlapping the central region of the original ACE2 sequence. In fact, P7, which contained 9 of the 11 binding residues present in the original fragment (Fig. 2b), showed the highest binding to S1 protein, although it never achieved a normalized binding signal greater than 1.3. The increase in binding signal obtained from sequences derived from the wild-type ACE2 variants highlights the importance of the sequence optimization process for improving target binding affinity.


Figure 2. Screening and selection of peptide binders to SARS-CoV-2 S protein. (a) Peptides P2-P11 represent wild-type versions of the original 27-mer ACE2 N-terminal alpha helix. They were designed as 18-mers spanning the length of the original fragment, with adjacent peptides overlapping by 17 amino acids. The ACE2 fragment is predicted to bind to the SARS-CoV-2 S protein via residues shown in bold12. (b) Normalized binding signal of the ACE2-derived peptide variants shows little to no binding to SARS-CoV-2 S1 protein. There is an apparent trend for increased binding from the peptide fragments overlapping the center of the WT sequence, and P7 shows the highest binding signal with a Z-score of 1.3 when exposed to 50 µg/mL of S1 protein. (c) Normalized binding signal of the 14 peptides selected from microarray screening experiments for further characterization. Ten peptides were selected from the pool of ACE2 mutants, 3 were selected from the pool of modeled sequences, and one non-binding sequence was selected for comparison. All sequences selected had a Z-score > 2 on the 50 µg/mL S1 protein array, except for P481. (d) Sequences of the 14 peptides selected for synthesis with N-terminus biotin attached via a PEG4 spacer. **Note that P28 could not be synthesized by the vendor. (e) Binding curves of biotinylated peptides to immobilized SARS-CoV-2 S1 protein in ELISA plate-based assay. Four peptides (P89, P100, P168 and P180) showed higher binding affinity than the original ACE2 fragment (SBP1) and were selected for further characterization.
Additionally, a handful of binding sequences from the pool of random sequences were observed to have high affinity to the S protein (P812-P2377, right two-thirds of array in Fig. 1b). Importantly, these sequences were obtained de novo with no inputs from a priori binding characterization. After data normalization, a total of 5 sequences were identified with a Z-score greater than 1.95 at the highest S1 protein concentration tested. Taking into consideration normalized binding signal across all S1 protein concentrations tested, we selected 14 peptides for further characterization: 10 from the pool of ACE2 mutants, 3 from the pool of S1 modeled sequences, and one non-binding sequence (P481) as a negative control for comparison. A heatmap of Z-scores for each of the selected peptides across all S1 protein concentrations tested is shown in Fig. 2c with corresponding sequences in Fig. 2d. All sequences selected had a normalized binding signal greater than 2 on the 50 µg/mL S1 protein array, except for P481 (the non-binding sequence). Importantly, not including time for array production, the entire screening process could be completed in less than one week, showing that potential binding sequences can be identified rapidly in response to emerging targets.
Peptide binding characterization
To confirm and further characterize the binding observed on the microarray, we turned our attention to more traditional methods. We started by setting up an ELISA-like assay to measure binding of biotinylated peptides to immobilized SARS-CoV-2 S1 protein, as described in the Methods section. We validated this assay with a 23-mer control sequence (SBP1) from the N-terminal alpha helix of the ACE2 receptor, which had been shown to bind the SARS-CoV-2 S protein RBD with micromolar affinity7. In our ELISA-like assay, biotinylated SBP1 had a calculated KD of 2.2 µM to the SARS-CoV-2 S1 protein, while a 12-mer truncated version of the peptide (SBP2) showed no measurable binding (Fig. 2e). Using the same assay format, we also screened our selected peptide sequences for binding to S1 protein. Of the 14 sequences selected from the microarray, 13 could be synthesized with an N-terminus biotin attached via a PEG4 spacer (Fig. 2d). Testing of the peptides in the assay identified four lead candidates with stronger binding to the S1 protein compared to the benchmark peptide, SBP1 (Fig. 2e). As expected, P481 showed no measurable binding signal to S1 protein. Notably, all four of the lead peptide binders were from the pool of ACE2-derived sequences.
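One way an apparent KD could be extracted from a plate-based binding curve is sketched below. The fitting model is not stated in this section, so the example assumes a one-site saturation model, signal = Bmax · c / (KD + c), and fits it by scanning candidate KD values while solving for Bmax in closed form. The data are synthetic and the function name is hypothetical.

```python
def fit_one_site(conc: list[float], signal: list[float],
                 kd_grid: list[float]) -> tuple[float, float]:
    """Scan candidate KD values; for each, the least-squares Bmax has a closed form."""
    best = None
    for kd in kd_grid:
        occupancy = [c / (kd + c) for c in conc]  # fraction bound at each concentration
        bmax = (sum(x * y for x, y in zip(occupancy, signal))
                / sum(x * x for x in occupancy))
        sse = sum((y - bmax * x) ** 2 for x, y in zip(occupancy, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, bmax)
    return best[1], best[2]


# Synthetic, noise-free curve generated with KD = 2.2 (µM) and Bmax = 1.5 (a.u.).
conc_um = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0]
signal = [1.5 * c / (2.2 + c) for c in conc_um]
kd_grid = [0.1 * i for i in range(1, 101)]  # 0.1 to 10 µM in 0.1 µM steps

kd_fit, bmax_fit = fit_one_site(conc_um, signal, kd_grid)
print(kd_fit, bmax_fit)
```

With real, noisy data a nonlinear least-squares routine would normally replace the grid scan, and the tested concentrations should bracket the KD for the estimate to be reliable.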
Next, we used bio-layer interferometry (BLI) to further characterize the binding interactions between our top four peptide candidates and the S1 protein. This assay allows for the measurement of kinetic parameters, and the immobilization strategy is more relevant to our endpoint sensing platform (described below). Here, biotinylated peptides were immobilized onto streptavidin-coated biosensor tips and dipped into wells containing free S1 protein in solution. The presence of the biotin tag on the N-terminus of the peptide sequence forces the peptide into a specific orientation when immobilized, which may limit accessibility to the target protein, but is required for use as a diagnostic. The association and dissociation curves from serial dilutions of the S1 protein are shown for each of the peptides in Fig. 3a–d. After global 1:1 curve fitting, the dissociation constants (KD) for all peptides were determined to be between 100 and 250 nM (Fig. 3e), with P89 having the highest affinity (KD = 124 nM). While the top four peptides identified in this study were selected for binding to SARS-CoV-2 S protein, preliminary selectivity screening against other viral proteins indicated that some cross-reactivity may be present and needs to be further evaluated (Supplemental Fig. S1).
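For readers unfamiliar with 1:1 kinetic fitting, the standard Langmuir binding model that underlies such fits can be written out as below. This is a sketch of the textbook model only, not the instrument software's fitting routine; the rate constants are invented and were chosen so that KD falls in the range reported above.

```python
import math


def kd_from_rates(k_on: float, k_off: float) -> float:
    """Equilibrium dissociation constant (M) from k_on (1/(M*s)) and k_off (1/s)."""
    return k_off / k_on


def association(t: float, conc: float, k_on: float, k_off: float,
                r_max: float) -> float:
    """Response during association at analyte concentration `conc` (M)."""
    k_obs = k_on * conc + k_off
    r_eq = r_max * conc / (conc + k_off / k_on)
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation(t: float, r0: float, k_off: float) -> float:
    """Response during dissociation, starting from response `r0` at t = 0."""
    return r0 * math.exp(-k_off * t)


# Hypothetical rate constants (not measured values from this study).
K_ON, K_OFF = 1.0e5, 2.0e-2
kd = kd_from_rates(K_ON, K_OFF)
print(f"KD = {kd * 1e9:.0f} nM")
print(association(60.0, 2.0e-7, K_ON, K_OFF, r_max=1.0))
```

In a global fit, a single k_on and k_off pair is shared across all analyte concentrations, which constrains the fit more tightly than fitting each trace separately.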


Figure 3. Binding characterization of peptide binders. (a–d) BLI binding traces of S1 protein association and dissociation to each of the top four biotinylated peptides (immobilized onto streptavidin sensors). (e) The calculated binding constants for the top four peptides. (f) Potential binding sites for the top four peptides on the RBD subunit of the spike protein (colored in cyan) as identified from the docking simulations. Peptides P89 (red) and P100 (blue) bind to the ACE2 binding site, peptide P180 (yellow) binds to the same site as CR3022 and S2A4 antibodies, while the binding site for peptide P168 (purple) is located similarly to the antibody S309 binding site25,30,31,32.
We also performed computer modeling to identify the potential binding sites of these peptides on the SARS-CoV-2 S protein. The 3D structures of the peptide sequences were generated using the Rosetta package, and peptide docking to the RBD was performed using ZDOCK, as described in the Methods section. As shown in Fig. 3f, the predicted binding site for peptides P89 and P100 encompasses residues Y449–Q493 and overlaps with the ACE2-binding site32. Potential binding sites for the two other peptides are located outside of the ACE2-binding region. Peptide P180 binds to residues Y380–F392 of the S protein, which were identified as the binding site for the S304 and CR3022 antibodies31, while P168 binds to residues E340–K356 of the RBD, consistent with the binding site of the S309 antibody.
Electrochemical impedance spectroscopy (EIS) for detection of SARS-CoV-2 S protein
To demonstrate the application of these peptides as the sensing element for the detection of SARS-CoV-2 S protein, we integrated them into an electrochemical sensor to detect S protein in spiked saliva. The peptides were immobilized on self-assembled monolayer modified gold electrodes, and changes in the charge transfer resistance (ΔRct) measured by electrochemical impedance spectroscopy (EIS) were used to quantify the binding signal. Results for peptide P180 are shown in Fig. 4a (detailed results in Supplemental Fig. S2). EIS results were obtained in phosphate buffer solution (0.1 M, pH 7.2) containing 10 mM K3[Fe(CN)6]/K4[Fe(CN)6], with the frequency scanned from 100 kHz to 0.1 Hz at the potential of the redox probe (0.135 V vs. Ag/AgCl). Charge transfer resistance (Rct) values were obtained by fitting the EIS spectra using the Randles circuit model provided by Solartron. Fig. 4b plots the signal (ΔRct) for varying S protein levels (0.05–10 µg/mL) in a background of 10 µg/mL Ricin, used as the non-specific control protein. The signal rises steadily with S protein concentration because specific binding to the S protein progressively blocks electron transfer. To show the potential of this sensor platform to function in saliva matrices, the S protein was spiked at varying levels into 50% saliva. The plot in Fig. 4c shows a linear rise in the signal (ΔRct) as a function of S protein concentration in saliva. We estimate the limit of detection for this sensor as ~ 0.1 µg/mL in phosphate-buffered saline (PBS) and ~ 0.2 µg/mL in saliva matrices, based on signals that are at least three standard deviations above a negative control protein (10 µg/mL Ricin in PBS, Fig. 4b) and no target in saliva (Fig. 4c). We are currently developing alternate sensor paradigms based on nanoporous gold electrodes33 and redox signal amplification to improve sensitivity34. However, the current limit of detection would be relevant for detection of SARS-CoV-2 in clinical samples35,36.


Figure 4. Electrochemical sensing of S protein with peptide binder. (a) Schematic of the immobilization of the P180 peptide on gold electrodes for EIS measurements to quantify binding with SARS-CoV-2 S protein based on alteration in charge transfer resistance (ΔRct). (b) Signal (ΔRct) as a function of S protein levels in Ricin background at 10 µg/mL. (c) Signal (ΔRct) as a function of S protein levels in 50% saliva.

