Oligonucleotides
Oligonucleotide sequences are listed in Supplementary Table 1. Oligonucleotides were ordered from Integrated DNA Technologies (IDT) unless indicated otherwise.
Cloning of phagemids for display of PAC-tagged Nb–p3 fusions for PHAGE-ATAC
Based on the 10× scATAC bead oligo design (Extended Data Fig. 1c), we hypothesized that introduction of an RD1 flanking the Nb CDR3 barcode would enable barcode capture alongside accessible chromatin fragments during droplet-based indexing. To avoid premature termination of Nb–p3 fusion translation due to the introduction of RD1, we modified the RD1-spanning reading frame, which resulted in the expression of a 12-amino acid PHAGE-ATAC tag (PAC-tag). To generate a phagemid for C-terminal fusion of both PAC-tag and p3, 20 ng of pDXinit (Addgene, catalog no. 110101) was subjected to site-directed mutagenesis with primers EF77 and EF78 using PfuUltraII (Agilent) in 50-µl reactions. PCR conditions were: 95 °C for 3 min; 19 cycles at 95 °C for 30 s, 60 °C for 1 min and 68 °C for 12 min; and a final extension at 72 °C for 14 min. Template DNA was digested for 1.5 h at 37 °C by addition of 1.5 µl of DpnI (Fastdigest, Thermo Fisher Scientific). PCR reactions were then purified using GeneJet Gel Extraction Kit (Thermo Fisher Scientific) and eluted in 45 µl of water, after which 20 µl of eluate was transformed into chemically competent Escherichia coli (NEB Stable Competent) and plated on lysogeny broth containing ampicillin (LB-Amp), yielding pDXinit-PAC. For cloning of Nb-PAC–p3 fusion-encoding phagemids, Nb sequences (Supplementary Table 3) were ordered as gBlocks from IDT. Then, 25 ng of each Nb gBlock was first amplified by PCR to introduce SapI restriction sites. Primers EF87 and EF88 were used for CD4 Nb, primers EF87 and EF89 for CD16 Nb, primers EF104 and EF105 for CD8 Nb, primers EF299 and EF300 for CEACAM4 Nb, and primers EF176-EF213 for all 28 SARS-CoV-2-S-recognizing Nbs. The 50-µl PCR reactions using Q5 (New England Biolabs (NEB)) were cycled: at 98 °C for 1 min; 35 cycles at 98 °C for 15 s, 60 °C for 30 s and 72 °C for 30 s; and a final extension of 72 °C for 3 min.
PCR reactions were loaded on a 1% agarose gel, expected bands were cut and PCR products were extracted using a GeneJet Gel Extraction Kit (Thermo Fisher Scientific) and eluted in 40 µl of water. Cloning was performed using the FX system as described previously31. Briefly, each eluted insert was mixed with 50 ng of pDXinit-PAC in a molar ratio of 1:5 (vector:insert) in 10-µl reactions and digested with 0.5 µl of SapI (NEB) for 1 h at 37 °C. Reactions were incubated for 20 min at 65 °C to heat inactivate SapI and cooled down to room temperature, and constructs were ligated by addition of 1.1 µl of 10× T4 ligase buffer (NEB) and 0.25 µl of T4 ligase (NEB) and incubated for 1 h at 25 °C. Ligation was stopped by heat inactivation for 20 min at 65 °C, followed by cooling to room temperature. Ligation reactions, 2 µl, were transformed into chemically competent E. coli (NEB Stable Competent) and plated on 5% sucrose-containing LB-Amp, yielding pDXinit-CD4Nb-PAC, pDXinit-CD8Nb-PAC, pDXinit-CD16Nb-PAC, pDXinit-CEACAM4Nb-PAC and all 28 pDXinit-SARS2-SNb-PAC constructs. For cloning of CD8 hashtag phagemids, 20 ng of pDXinit-CD8Nb-PAC was used as a template for site-directed mutagenesis (as described earlier), using primers EF156 and EF157 to generate pDXinit-CD8Nb(PH-A)-PAC, primers EF158 and EF159 for pDXinit-CD8Nb(PH-B)-PAC, primers EF164 and EF165 for pDXinit-CD8Nb(PH-C)-PAC, and primers EF166 and EF167 for pDXinit-CD8Nb(PH-D)-PAC. For cloning of EGFP Nb-displaying phagemids, the EGFP Nb sequence from pOPINE GFP nanobody (Addgene, catalog no. 49172) was amplified in 50-µl PCR reactions with Q5 (NEB) using 25 ng of the plasmid template and EF05 and EF06 primers. The EGFP Nb insert was cloned into pDXinit using FX cloning (described earlier), yielding pDXinit-EGFPNb. 
EGFP Nb-displaying phagemids containing RD1 in different orientations were cloned by using pDXinit-EGFPNb and performing site-directed mutagenesis (described earlier) with EF73 and EF74 to obtain pDXinit-EGFPNb-PAC or using EF75 and EF76 to yield pDXinit-EGFPNb-RD1(5-3). For introduction of a PCR handle required for PDT library amplification, pDXinit-EGFPNb-PAC was subjected to site-directed mutagenesis (as described earlier) using primers EF78 and EF79, yielding pDXinit-EGFPNb(handle)-PAC. For cloning of mCherry Nb-displaying phagemids, the mCherry Nb sequence from pGex6P1 mCherry Nb (Addgene, catalog no. 70696) was amplified in 50-µl PCR reactions with Q5 (NEB) using 25 ng of the plasmid template and EF07 and EF08 primers. The mCherry Nb insert was cloned into pDXinit using FX cloning (as described earlier), yielding pDXinit-mCherryNb. All constructs are listed in Supplementary Table 2.
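The 1:5 (vector:insert) molar ratio used for FX cloning above translates into an insert mass only once the fragment lengths are known. The following is a minimal sketch of that conversion. The vector and insert lengths in the example are hypothetical placeholders, not values taken from this work, and the function name is an illustrative choice.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   insert_per_vector: float = 5.0) -> float:
    """Mass of insert giving the requested insert:vector molar ratio.

    For double-stranded DNA, moles scale as mass / length, so the
    per-base-pair molecular weight cancels out of the ratio.
    """
    if vector_bp <= 0 or insert_bp <= 0:
        raise ValueError("fragment lengths must be positive")
    return vector_ng * insert_per_vector * insert_bp / vector_bp


# 50 ng of vector comes from the text; 4,000 bp and 400 bp are HYPOTHETICAL lengths.
print(insert_mass_ng(vector_ng=50.0, vector_bp=4000, insert_bp=400))  # 25.0
```

With real construct lengths substituted, the same function gives the mass of gel-purified insert to add per 10-µl reaction.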
Analysis of RD1-mediated phagemid amplification using RD1-containing primers
Of pDXinit-EGFPNb, pDXinit-EGFPNb-PAC or pDXinit-EGFPNb-RD1(5–3), 5 ng was subjected to linear PCR (10-µl reaction volume) using primer EF170 and 5 µl of 2× KAPA HiFi HotStart ReadyMix (Roche) and cycling conditions of: 98 °C for 2 min; 12 cycles at 98 °C for 10 s, 59 °C for 30 s and 72 °C for 1 min; and a final extension of 72 °C for 5 min. After completion, 0.625 µl of each primer EF147 and EF57, 1.25 µl of water and 12.5 µl of 2× KAPA were added. Nb-specific PCR was performed using: 98 °C for 3 min; 30 cycles at 98 °C for 15 s, 65 °C for 20 s and 72 °C for 1 min; and a final extension of 72 °C for 5 min. PCR using primers EF57 and EF58 and indicated plasmid templates was used as an amplification control.
Phage production
Phagemid-containing SS320 (Lucigen) cultures were incubated overnight in 2YT/2%/A/T at 37 °C and 240 r.p.m. Cultures were diluted 1:50 in 2YT/2%/A/T and grown for 2–3 h at 37 °C and 240 r.p.m. until the optical density at 600 nm (OD600) = 0.4–0.5. Bacteria, 5 ml, were then infected with 200 µl of M13K07 helper phage (NEB) and incubated for 60 min at 37 °C. Bacteria were collected by centrifugation and resuspended in 50 ml of 2YT containing 50 µg ml−1 of ampicillin and 25 µg ml−1 of kanamycin (2YT/A/K). Phages were produced overnight by incubation at 37 °C and 240 r.p.m. Cultures were centrifuged and phages were precipitated from supernatants by addition of a quarter volume of 20% poly(ethylene glycol)-6000/2.5 M NaCl solution and incubation on ice for 75 min. Phages were collected by centrifugation (17 min, 12,500g, 4 °C). Phage pellets were resuspended in 1.2 ml of phosphate-buffered saline (PBS), suspensions were cleared (5 min, 12,500g, 25 °C) and supernatants containing phages were stored.
Cell culture
NIH3T3 (American Type Culture Collection (ATCC), catalog no. CRL-1658) and HEK293T cells (ATCC, catalog no. CRL-3216) were maintained in Dulbecco’s modified Eagle’s medium (DMEM) containing 10% fetal bovine serum (FBS), 2 mM L-glutamine and 100 U ml−1 of penicillin–streptomycin (Thermo Fisher Scientific), and cultured at 37 °C and 5% CO2. For subculturing, medium was aspirated, cells were washed with PBS and detached with trypsin–ethylenediaminetetraacetic acid (EDTA) 0.25% (Thermo Fisher Scientific). Detachment reactions were stopped with culture medium and cells were seeded at the desired densities. Cell stocks were prepared by resuspending cell aliquots in FBS with 10% dimethyl sulfoxide and freezing them slowly at −80 °C. Frozen aliquots were then moved to liquid nitrogen for long-term storage. All cell lines were regularly tested for Mycoplasma contamination.
Plasmid transfection of HEK293T cells
A day before transfection, 2 × 10^6 HEK293T cells were seeded in 10-cm dishes (Corning) in complete culture medium (as described in Cell culture). Transfection was performed using GeneJuice reagent (Thermo Fisher Scientific). Then, 600 µl of Opti-MEM and 12 µl of GeneJuice were mixed in 1.5-ml tubes, vortexed briefly and spun down. Plasmid DNA, 4 µg (pCAG (Addgene, catalog no. 11160), pCAG-EGFP (Addgene, catalog no. 89684), pCAG-EGFP-GPI (Addgene, catalog no. 32601) or pHDM-SARS2 Spike-delta21 (Addgene, catalog no. 155130)), was added, and tubes were vortexed briefly and spun down. The transfection mix was added dropwise to HEK293T cells. Cells were grown for 24 h at 37 °C and 5% CO2 to allow transgene expression. Successful transfection was assessed by fluorescence microscopy on an EVOS M5000.
Flow cytometry for detection of phage binding
Harvested antigen-expressing cell lines or thawed PBMCs (for the harvest and thawing protocol, see PHAGE-ATAC workflow) were resuspended in cold flow cytometry buffer (FC buffer: PBS containing 2% FBS) and incubated with respective pNbs for 20 min on a rotator at 4 °C. Cells were centrifuged and washed with cold FC buffer twice to remove unbound phages (all centrifugation steps were 350g, 4 min, 4 °C). For optimization of fixation and lysis conditions, cells were fixed using either 0.1% or 1% formaldehyde (Thermo Fisher Scientific) and permeabilized with lysis buffers containing varying concentrations of NP-40, digitonin or Tween-20. Cells were resuspended in FC buffer and anti-M13 antibody (Sino Biological, catalog no. 11973-MM05T-50) was added at 1:500 dilution. After 10 min on ice, cells were washed twice in FC buffer and anti-mouse Fc Alexa Fluor-647-conjugated secondary antibody (Thermo Fisher Scientific, catalog no. A-21236) was added at 1:500 dilution. Cells were incubated for 10 min on ice, washed twice in FC buffer and resuspended in Sytox Blue (Thermo Fisher Scientific) containing FC buffer for live/dead discrimination according to the manufacturer’s instructions. In indicated cases, cells were stained with anti-CD4-FITC (clone OKT4, BioLegend) at 1:500 dilution; in these cases, no anti-M13 or anti-mouse Fc antibodies were used. Stained cells were analyzed using a CytoFLEX LX Flow Cytometer (Beckman Coulter) at the Broad Institute Flow Cytometry Facility. Flow cytometry data were analyzed using FlowJo software v.10.6.1.
CEACAM4 Nb selection and validation
CEACAM4 Nbs were selected by biopanning with phage display using a previously described Nb library32. Selected Nbs were expressed as Fc-fusion proteins and assessed for binding to recombinant CEACAM4 (Enquire Bio, catalog no. QP5812-ec) by ELISA. Briefly, 96-well MaxiSorp plates (Thermo Fisher Scientific, catalog no. 442404) were coated with 50 μl per well of recombinant CEACAM4 protein or bovine serum albumin (BSA; Thermo Fisher Scientific, catalog no. BP1600100) at 5 μg ml−1 in PBS and incubated overnight at 4 °C. After coating, plates were washed four times with buffer PT (PBS with 0.05% Tween 20), 200 μl of blocking solution (PBS with 1% casein) was added, plates were incubated for 1 h at room temperature and then washed again four times. Nbs were first diluted to 0.5 µM and then serially diluted by half logs in blocking solution. Diluted Nb, 50 μl, was added for 1 h at room temperature. Plates were washed four times and 50 μl of horseradish peroxidase anti-human immunoglobulin G antibody (BioLegend, catalog no. 410603, 1:5,000) diluted in blocking solution was added to each well. After 30 min of incubation at room temperature, plates were washed six times with PT and once with PBS. Plates were developed with 100 μl of TMB Substrate Reagent Set (BD Biosciences, catalog no. 555214) and the reaction was stopped after 5 min by the addition of 100 μl of 1 M sulfuric acid. Plates were then read at wavelengths of 450 nm and 570 nm.
PHAGE-ATAC workflow
For the cell line ‘species-mixing’ experiment, culture medium was aspirated, cell lines were washed with PBS, harvested using trypsin–EDTA 0.25% (Thermo Fisher Scientific), resuspended in DMEM containing 10% FBS, centrifuged, washed with PBS and resuspended in FC buffer (above). For PBMC and CD8 T-cell experiments, cryopreserved PBMCs or CD8 T cells (AllCells) were thawed, washed in PBS and resuspended in cold FC buffer. All centrifugation steps were carried out at 350g for 4 min and 4 °C unless stated otherwise.
Cells were incubated with phages on a rotating wheel for 20 min at 4 °C. After three washes in FC buffer, cells were fixed in PBS containing 1% formaldehyde (Thermo Scientific) for 10 min at room temperature. Fixation was quenched by addition of 2.5 M glycine to a final concentration of 0.125 M. Cells were washed twice in FC buffer and permeabilized using lysis buffer (10 mM Tris-HCl, pH 7.5, 10 mM NaCl, 3 mM MgCl2, 0.1% NP-40, 1% BSA) for 3 min on ice. This buffer was used because we found that standard 10× Genomics scATAC lysis buffer results in loss of pNb cell staining (Extended Data Fig. 4). After lysis, cells were washed by the addition of 1 ml of cold wash buffer (lysis buffer without NP-40), inverted and centrifuged (5 min, 500g, 4 °C). Supernatant was aspirated and the cell pellet was resuspended in 1× Nuclei Dilution Buffer (10× Genomics). Cell aliquots were mixed with Trypan Blue and counting was performed using a Countess II FL Automated Cell Counter. Processing of cells for tagmentation, loading of 10× Genomics chips and droplet encapsulation via the 10× Genomics Chromium controller microfluidics instrument were performed according to the Chromium Single Cell ATAC Solution protocol.
For PHAGE-ATAC detection of intracellular EGFP, harvested cells were resuspended in cold FC buffer and immediately fixed in PBS containing 1% formaldehyde (Thermo Fisher Scientific) for 10 min at room temperature. Fixation was quenched by the addition of 2.5 M glycine to a final concentration of 0.125 M. Cells were washed twice in PBS and permeabilized using lysis buffer (10 mM Tris-HCl, pH 7.5, 10 mM NaCl, 3 mM MgCl2, 0.1% NP-40, 1% BSA) for 5 min on ice. After lysis, cells were washed by the addition of 1 ml of cold wash buffer (lysis buffer without NP-40), inverted and centrifuged (5 min, 500g, 4 °C). Cells were resuspended in FC buffer and incubated with anti-EGFP phage on a rotating wheel for 20 min at 4 °C. After three washes in FC buffer, the supernatant was aspirated and the cell pellet was resuspended in 1× Nuclei Dilution Buffer (10× Genomics). Downstream processing of cells for PHAGE-ATAC was as described above.
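The glycine quench in both fixation protocols above is specified by stock and final concentration only. As a minimal sketch of the underlying dilution arithmetic, the function below solves for the stock volume to add. The 1,000-µl fixation volume in the example is an assumption for illustration, because the fixation volume is not stated here.

```python
def stock_volume_to_add(sample_ul: float, stock_molar: float, final_molar: float) -> float:
    """Volume of stock to add so that the mixture reaches final_molar.

    Solves stock * v = final * (sample + v) for v, which accounts for the
    volume increase caused by adding the stock itself.
    """
    if not 0 < final_molar < stock_molar:
        raise ValueError("final concentration must lie between 0 and the stock concentration")
    return sample_ul * final_molar / (stock_molar - final_molar)


# 2.5 M stock and 0.125 M final come from the text; 1,000 µl is an ASSUMED sample volume.
volume = stock_volume_to_add(sample_ul=1000.0, stock_molar=2.5, final_molar=0.125)
print(f"{volume:.1f} µl of 2.5 M glycine")  # 52.6 µl of 2.5 M glycine
```

The result scales linearly with the fixation volume, so it corresponds to adding one volume of stock per 19 volumes of sample.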
For species mixing, a single 10× channel was ‘super-loaded’ with 20,000 cells. Linear amplification and droplet-based indexing were performed as described in the 10× ATAC protocol on a C1000 Touch Thermal cycler with 96-Deep Well Reaction Module (BioRad). After linear PCR, droplet emulsions were broken, barcoded products were purified using MyONE silane bead cleanup and eluted in 40 µl of elution buffer I (the Chromium Single Cell ATAC Solution protocol). At this point eluates were split for PDT and ATAC library preparation. Whereas 5 µl of eluate was used for PDT library preparation as described below, the remaining 35 µl of eluate was used for scATAC library generation (according to the Chromium Single Cell ATAC Solution protocol). Splitting samples at this point is not expected to result in a loss of library complexity because PDTs and ATAC fragments already underwent amplification via linear PCR.
The aliquot for PDT library preparation was used for PDT-specific PCR in a 100-µl reaction using 2× KAPA polymerase and primers EF147 and EF91; cycling conditions were: 95 °C for 3 min, 20 cycles at 95 °C for 20 s, 60 °C for 30 s and 72 °C for 20 s; and a final extension of 72 °C for 5 min. Amplified PDT products were purified by addition of 65 µl of SPRIselect beads (Beckman Coulter); 160 µl of supernatants were saved and incubated with 192 µl of SPRIselect. Beads were washed twice with 800 µl of 80% ethanol and the PDT library was eluted in 40 µl of buffer EB (QIAGEN).
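The two-step SPRIselect cleanup above can be read as a double-sided size selection. The sketch below works out the bead-to-sample ratios implied by the stated volumes. It assumes that volumes are additive and that the bead buffer from the first addition is evenly distributed in the saved supernatant; both are simplifying assumptions, and the function name is illustrative.

```python
def spri_ratios(sample_ul: float, first_beads_ul: float,
                supernatant_kept_ul: float, second_beads_ul: float):
    """Return (first ratio, cumulative ratio) of bead buffer to original sample."""
    first_ratio = first_beads_ul / sample_ul
    # Fraction of the first mixture that is original sample rather than bead buffer.
    sample_fraction = sample_ul / (sample_ul + first_beads_ul)
    sample_in_supernatant = supernatant_kept_ul * sample_fraction
    carried_bead_buffer = supernatant_kept_ul - sample_in_supernatant
    cumulative_ratio = (carried_bead_buffer + second_beads_ul) / sample_in_supernatant
    return first_ratio, cumulative_ratio


# Volumes from the text: 100-µl PCR, 65 µl beads, 160 µl supernatant, 192 µl beads.
first, cumulative = spri_ratios(100.0, 65.0, 160.0, 192.0)
print(f"first step {first:.2f}x, cumulative {cumulative:.2f}x")
```

Under these assumptions the first step is 0.65× and the cumulative ratio after the second addition is about 2.6×, consistent with discarding large fragments first and then retaining short amplicons.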
The concentration of PDT libraries was determined and 15 ng was used for 100-µl indexing PCR reactions using 50 µl of Amp-Mix (10× Genomics), 7.5 µl of SI-PCR Primer B (10× Genomics) and 2.5 µl of i7 sample index-containing primers (10× Genomics); cycling conditions were: 98 °C for 45 s; 6 cycles at 98 °C for 20 s, 67 °C for 30 s and 72 °C for 20 s; and a final extension of 72 °C for 1 min. Indexed PDT libraries were purified by the addition of 120 µl of SPRIselect and eluted in 40 µl of buffer EB. The concentration of the final libraries was determined using a Qubit dsDNA HS Assay kit (Invitrogen) and size distribution was examined by running a High Sensitivity DNA chip on a Bioanalyzer 2100 system (Agilent).
PDT and ATAC libraries were pooled and paired-end sequenced (2 × 34 cycles) using Nextseq High Output Cartridge kits on a Nextseq 550 sequencer (Illumina). Raw sequencing data were demultiplexed with CellRanger-ATAC mkfastq. ATAC fastqs were used for alignment to the GRCh38 or mm10 reference genomes using CellRanger-ATAC count v.1.0.
ASAP-seq and PHAGE-ASAP of PBMCs
For ASAP-seq and PHAGE-ASAP, PBMCs were resuspended in cold FC buffer and first blocked with Human TruStain FcX (BioLegend) for 10 min at 4 °C. Cells were then either stained with TotalSeq-A antibodies (BioLegend; listed in Supplementary Table 5) or costained with both pNbs and TotalSeq-A antibodies by incubation on a rotating wheel for 20 min at 4 °C. Cells were then washed and processed as outlined above for PHAGE-ATAC with the following modifications: before droplet encapsulation and barcoding, 0.5 µl of 1 µM TotalSeq-A bridge oligo EF369 was added per 65 µl of Chromium ATAC barcoding master mix, as described previously9. After MyONE silane bead cleanup, samples were eluted in 43 µl of elution buffer I (the Chromium Single Cell ATAC Solution protocol) and 3 µl of eluate was used for ADT library preparation by PCR using primers EF147, and EF370, EF371 or EF372, to obtain indexed ADT libraries as reported9. For PHAGE-ASAP, 5 µl of eluate was used for PDT library preparation as described above.
Computational workflow for generation of PDT count matrices
PDT fastqs were obtained by running CellRanger-ATAC mkfastq on raw sequencing data and customized UNIX code was used to derive PDT-cell barcode count tables. Customized Python code (‘phage_to_kite-R3.py’) was used to reformat the 10× scATAC R1/R2/R3 file conventions into a paired-end read file format compatible with kallisto|bustools for quantification. Using a kmer length of 13 for CDR3 regions, PDT libraries were pseudo-aligned to a user-specific reference and per-cell counts were determined using error-corrected barcodes and bustools (up to one mismatch for both cell barcodes and CDR3 barcodes). Notably, as phages do not have unique molecular identifiers, we used a dummy poly(A) sequence for compatible processing.
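To make the one-mismatch error correction concrete, the following is a simplified sketch of how reads could be assigned to cells and phage barcodes and then tallied. It is not the kallisto|bustools implementation and not the authors' code: it compares each read against every whitelist entry, counts every read once (no UMI collapsing), and uses toy sequences and hypothetical function names throughout.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(seq: str, whitelist, max_mismatch: int = 1):
    """Return the unique whitelist entry within max_mismatch of seq, else None."""
    if seq in whitelist:
        return seq
    hits = [w for w in whitelist
            if len(w) == len(seq) and hamming(seq, w) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None  # ambiguous or unmatched reads are dropped


def count_pdts(reads, cell_whitelist, cdr3_to_name):
    """Tally reads per (cell barcode, phage name) after error correction."""
    counts = Counter()
    for cell_seq, cdr3_seq in reads:
        cell = correct_barcode(cell_seq, cell_whitelist)
        tag = correct_barcode(cdr3_seq, cdr3_to_name)
        if cell is not None and tag is not None:
            counts[(cell, cdr3_to_name[tag])] += 1
    return counts


# Toy data: barcodes and 13-nt tag sequences are invented for illustration.
cells = {"AAAACCCC", "GGGGTTTT"}
tags = {"ACGTACGTACGTA": "CD4", "TTGCATTGCATTG": "CD8"}
reads = [
    ("AAAACCCC", "ACGTACGTACGTA"),  # exact match
    ("AAAACCCA", "ACGTACGTACGTT"),  # one mismatch in each barcode, rescued
    ("AAAAGGCC", "ACGTACGTACGTA"),  # two mismatches in the cell barcode, dropped
    ("GGGGTTTT", "TTGCATTGCATTG"),
]
print(count_pdts(reads, cells, tags))
```

In practice a precomputed mismatch index is far faster than the all-against-all comparison used here; the sketch only illustrates the matching rule.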
Analysis of species-mixing PHAGE-ATAC experiment
PHAGE-ATAC-seq data from the species-mixing experiment were demultiplexed using CellRanger-ATAC mkfastq, and generated ATAC fastqs were processed with CellRanger-ATAC count to filter reads, trim adapters, align reads to both GRCh38 and mm10 reference genomes, count barcodes, identify transposase cut sites, detect accessible chromatin peaks and identify cutoffs for cell barcode calling. The ‘force-cells’ parameter was not set. Barcodes were classified as human or mouse if >90% of barcode-associated fragments aligned to GRCh38 or mm10, respectively. Cutoffs for cell barcode calling were >3,000 ATAC fragments overlapping peaks for human and >10,000 for mouse barcodes (based on empirical density). Doublet barcodes were defined as containing >10% ATAC fragments aligning to both GRCh38 and mm10 reference genomes. The EGFP PDT count table was generated as described above by searching PDT fastqs for the corresponding phage barcode (Supplementary Table 4) and deriving PDT-associated cell barcodes via filtering using the entire list of called cell barcodes (human and mouse).
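The species and doublet rules above can be written as a short decision function. This is a minimal sketch using the thresholds stated in the text; the function names, the "unassigned" label for barcodes that meet no rule, and the example fragment counts are illustrative assumptions.

```python
def classify_barcode(human_fragments: int, mouse_fragments: int,
                     purity: float = 0.90, doublet_floor: float = 0.10) -> str:
    """Label a barcode from its fragments aligned to GRCh38 versus mm10."""
    total = human_fragments + mouse_fragments
    if total == 0:
        return "unassigned"
    human_frac = human_fragments / total
    mouse_frac = mouse_fragments / total
    if human_frac > purity:
        return "human"
    if mouse_frac > purity:
        return "mouse"
    if min(human_frac, mouse_frac) > doublet_floor:
        return "doublet"  # more than 10% of fragments align to each genome
    return "unassigned"


def passes_cell_call(species: str, fragments_in_peaks: int) -> bool:
    """Apply the species-specific cutoffs on fragments overlapping peaks."""
    cutoffs = {"human": 3000, "mouse": 10000}
    return species in cutoffs and fragments_in_peaks > cutoffs[species]


# Example fragment counts are invented for illustration.
for human, mouse in [(9500, 500), (400, 9600), (6000, 4000)]:
    print(human, mouse, classify_barcode(human, mouse))
```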
After flow cytometry measurement of HEK293T-EGFP-GPI (EGFP+) and HEK293T cells (EGFP−), FCS files were exported using CytExpert Software (Beckman Coulter). Values for forward scatter (FSC area) and EGFP fluorescence (FITC area) were derived from FCS files. Human EGFP+ and EGFP− cells were defined based on the distribution of EGFP PDT counts (for PHAGE-ATAC) or EGFP fluorescence represented by FITC-area values (for flow cytometry) by setting a gate at the minimum value in between both populations.
Analysis of PBMC PHAGE-ATAC, ASAP-seq and PHAGE-ASAP experiments
Sequencing data from PHAGE-ATAC, ASAP-seq and PHAGE-ASAP libraries of PBMCs were processed using CellRanger-ATAC count to the GRCh38 reference genome using all default parameters, yielding 1,408 (PHAGE-ATAC), 5,654 (PHAGE-ASAP) and 4,806 (ASAP-seq) high-quality PBMCs, respectively (no filtering was applied beyond the CellRanger-ATAC knee call). Per-library ADTs9 and PDTs were computed using the processing pipelines described above. We further downloaded processed CITE-seq PBMC data8 from the Gene Expression Omnibus (GEO, accession no. GSE100866), which resulted in recovery of 7,660 PBMCs after removal of spiked-in mouse cells. This published dataset was jointly analyzed with the newly generated datasets described above. We performed data integration using canonical correlation analysis20 and the 2,000 most variable RNA genes, as is the default in Seurat. Next, we performed RNA imputation for the ATAC-seq data using Seurat v.3 with the default settings33. Reduced dimensions and cell clusters were inferred using this merged object via the first 25 canonical correlation components, with the default Louvain clustering in Seurat v.3. Centered log(ratio) (CLR)-normalized PDTs were visualized in the reduced dimension space and a per-tag, per-cluster mean was computed to further assess staining correlation between the modalities (Fig. 2d).
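For readers unfamiliar with CLR normalization of tag counts, the sketch below shows one commonly used pseudocount variant applied to a single vector of counts. Whether this matches the exact Seurat implementation and margin (per tag across cells, or per cell across tags) used for the analysis above is an assumption; the example counts are invented.

```python
import math


def clr(counts):
    """Centered log-ratio transform with a pseudocount of 1.

    Each count is divided by a geometric-mean-like term computed from
    log1p of the positive counts, then log1p-transformed.
    """
    n = len(counts)
    if n == 0:
        return []
    log_center = sum(math.log1p(c) for c in counts if c > 0) / n
    center = math.exp(log_center)
    return [math.log1p(c / center) for c in counts]


# Invented counts for one tag across five cells.
print([round(v, 3) for v in clr([0, 3, 10, 250, 1200])])
```

The transform preserves the rank order of counts while compressing the dynamic range, which is why it is convenient for visualizing tag abundance on an embedding.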
Cell annotations were derived based on well-established marker genes for PBMCs (Extended Data Fig. 6h). For protein-based clustering and analyses, we identified T-cell clusters from the integrated embedding (using the chromatin/RNA data) and then further stratified them into subpopulations based on CD4 and CD8 PDT CLR (Extended Data Fig. 6d,f). Differential gene activity scores between these populations were then computed using the default functionality in Seurat/Signac (Wilcoxon’s rank-sum test). To compare the protein quantification of each modality, we utilized the labeled clusters of CD4 and CD8 T cells (computed using only accessible chromatin and RNA abundances) as positive labels and other cell types as negatives (thus, the labels are a function of clustering and are imperfect). Utilizing these per-cell positive and negative annotations, we determined the receiver operating characteristic curves for each protein in each modality (Fig. 2e).
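The comparison of per-cell tag abundance against cluster-derived labels can be summarized by the area under the receiver operating characteristic curve. The following is a minimal sketch of that statistic using the pairwise (Mann–Whitney) definition. It is an illustration rather than the authors' analysis code, it is quadratic in the number of cells, and the example scores are invented.

```python
def auroc(scores, labels) -> float:
    """Probability that a random positive cell scores above a random negative cell.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented CLR-like scores; True marks cells in the cluster used as the positive label.
print(auroc([0.1, 0.4, 0.35, 0.8], [False, False, True, True]))  # 0.75
```

Because the labels come from clustering, an area below 1 can reflect either imperfect staining or imperfect labels, as noted in the text.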
To verify the high-quality capture of somatic mtDNA mutations in this experiment, sequencing reads aligning to chrM were processed using mgatk, as previously described11. A total of 518 high-quality variants were identified in at least one cell using the standard variant thresholds (variance mean ratio >−2; strand correlation >0.65), and the enrichment of nucleotide substitutions matched our previously identified patterns of strand-specific transitions11.
Analysis of cell hashing PHAGE-ATAC experiment
One channel of sequencing data from the hashed, combined, CD8-enriched T cells was processed using CellRanger-ATAC count via the GRCh38 reference genome and all default parameters, yielding 8,366 high-quality PBMCs (no filtering was applied beyond the CellRanger-ATAC knee call). As we suspected the presence of contaminating B cells, we first characterized cell states using latent semantic indexing (LSI)-based clustering and dimensionality reduction using Signac and Seurat33. Specifically, all detected peaks were used as input into LSI. The first 20 LSI components (except for the first component, which was found to be correlated with the per-cell sequencing depth) were used to define cell clusters using the default Louvain clustering algorithm in Seurat. Per-cluster chromatin accessibility tracks were computed using a per-million fragments abundance for each cluster, as previously implemented11. These chromatin accessibility tracks were used to annotate cell clusters based on promoter accessibility of known marker genes.
To assign hash identities to cell barcodes, we utilized the HTODemux function from Seurat23 with the positive.quantile parameter set at 0.98. This yielded 703 doublets, 1,225 negatives and 6,438 singlets based on the abundance and distribution of CD8 hashtag PDTs.
To verify PHAGE-ATAC hashtag-based assignments, we performed mtDNA genotyping using mgatk11, and nuclear genotyping and donor assignment using souporcell24 with ‘--min_alt 8 --min_ref 8 --no_umi True -k 4 --skip_remap True --ignore True’ options, which resulted in 92.9% accuracy (99.3% singlet accuracy, 74% overlap in called doublets), confirming the concordance of our hashing design.
Analysis of PBMC–HEK293T mixture PHAGE-ATAC experiment
Because the default CellRanger-ATAC knee call yielded few cells (probably due to the mixture of PBMCs and HEK293T cells), we manually identified high-confidence cells that had a TSS score >4 and at least 500 accessible chromatin fragments in peaks, yielding 4,690 cells. Using components 2–30 from LSI, we produced a dimensionality reduction and clustering with Signac33. PDTs were quantified using kallisto|bustools as described above for all phages used in the library.
Cloning of PANL, a synthetic high-complexity pNb library
To generate randomized library inserts, three separate primer mixes (for long CDR3, medium CDR3 and short CDR3 inserts) were used for PCR-mediated assembly. For short CDR3 inserts, the primer mix contained 0.5 µl of each of polyacrylamide gel electrophoresis-purified EF42, EF43, EF64, EF44, EF65, EF45, EF46, EF47, EF66 and EF48 (each 100 µM) (EllaBiotech). For medium CDR3 inserts, EF67 was used instead of EF66. For long CDR3 inserts, EF68 was used instead of EF66. Primer mixes were diluted 1:25 and 1 µl of each mix was used for overlap-extension PCR using Phusion (NEB). Four 50-µl reactions for each mix were performed using the following cycling conditions: 98 °C for 1 min; 20 cycles at 98 °C for 15 s, 60 °C for 30 s and 72 °C for 30 s; and a final extension of 72 °C for 5 min. PCR reactions of the same mix were pooled and purified by addition of 280 µl of AMPure XP beads (Beckman Coulter). The beads were washed twice with 800 µl of 80% ethanol and assembled inserts were eluted in 100 µl of water. Concentrations of each insert (long, medium, short) were determined and pooled in a 1:2:1 molar ratio. Five identical 50-µl PCR reactions with pooled inserts and primers EF40 and EF41 were performed using Phusion (NEB), with the following cycling conditions: 98 °C for 1 min; 30 cycles at 98 °C for 15 s, 62 °C for 30 s and 72 °C for 30 s; and a final extension of 72 °C for 5 min. The amplified library insert was pooled and purified by adding 350 µl of AMPure XP beads (Beckman Coulter). Beads were washed twice with 1 ml of 80% ethanol and the library insert was eluted in 60 µl of water. Five identical 60-µl restriction digest reactions for the digest of 7.5 µg of library vector pDXinit-PAC with 2.5 µl of SapI were performed. Library insert (4.8 µg) was digested in a 30-µl reaction using 2.5 µl of SapI. Digests were incubated for 4 h at 37 °C and loaded on to 1% agarose gels.
Bands corresponding to digested library vector and insert were cut and products were extracted using a GeneJet Gel Extraction Kit (Thermo Fisher Scientific) and eluted in 40 µl of water. Five identical 100-µl ligation reactions were performed, each containing 1.25 µg of digested pDXinit-PAC, 450 ng of digested insert and 0.5 µl of T4 ligase (NEB). Ligations were incubated for 16 h at 16 °C, heat inactivated for 20 min at 65 °C and cooled to room temperature. Then, 100 µl of AMPure XP beads was added to each ligation reaction, the beads were washed twice using 300 µl of 80% ethanol and ligation products were eluted in 15 µl of water, and pooled. Five electroporations in 2-mm cuvettes (BioRad) were performed, each using 90 µl of electrocompetent SS320 E. coli (Lucigen) and 12 µl of ligation product. Pulsing was performed on a GenePulserXcell instrument (BioRad) with parameters 2.5 kV, 200 Ω and 25 µF. After electroporation, bacterial suspensions were added to 120 ml of prewarmed super optimal broth with catabolite repression (SOC) and incubated for 30 min at 37 °C and 225 r.p.m. An aliquot of library-carrying bacteria was saved at this point and used to prepare a dilution series. Each dilution was plated on LB-Amp plates. After overnight incubation at 37 °C, colonies were counted, transformation efficiency was determined and library complexity was estimated. The remaining 120 ml of library-containing culture were added to 1.125 l of 2YT medium containing 2% glucose, 50 µg ml−1 of ampicillin and 10 µg ml−1 of tetracycline (2YT/2%/A/T) and incubated overnight at 37 °C and 240 r.p.m. The library-containing culture was harvested, glycerol stocks were prepared and library aliquots were stored.
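The estimate of library complexity from the dilution series above is a back-calculation from colony counts. The following is a minimal sketch of that arithmetic. The 120-ml recovery volume comes from the text, whereas the colony count, dilution factor and plated volume in the example are hypothetical values chosen only to illustrate the calculation.

```python
def total_transformants(colonies: int, dilution_factor: float,
                        plated_volume_ml: float, culture_volume_ml: float = 120.0) -> float:
    """Estimate total transformants in the recovery culture from one counted plate."""
    if plated_volume_ml <= 0:
        raise ValueError("plated volume must be positive")
    cfu_per_ml = colonies * dilution_factor / plated_volume_ml
    return cfu_per_ml * culture_volume_ml


# HYPOTHETICAL plate: 50 colonies from 0.1 ml of a 100,000-fold dilution.
estimate = total_transformants(colonies=50, dilution_factor=1e5, plated_volume_ml=0.1)
print(f"{estimate:.1e} transformants")  # 6.0e+09 transformants
```

The number of transformants is an upper bound on library complexity, because independent transformants can carry identical inserts.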
Analysis of picked PANL clones using PCR and Sanger sequencing
Library-containing bacteria were plated on LB-Amp, incubated overnight, and colonies were picked and inoculated in 8 ml of LB-Amp. Cultures were incubated for at least 8 h at 37 °C and 240 r.p.m. Bacteria were harvested and plasmids isolated using GeneJet Plasmid Miniprep kit (Thermo Fisher Scientific). PCR was performed to evaluate clone inserts; 10-µl PCR reactions were set up that contained 10 ng of isolated plasmid, 0.5 µl each of primers EF52 and EF53, and 4.5 µl of 2× OneTaq Quick Load Master Mix (NEB). The cycling conditions were: 94 °C for 4 min; 28 cycles at 94 °C for 15 s, 62 °C for 15 s and 68 °C for 30 s; and a final extension at 68 °C for 5 min. PCR reactions were analyzed on 2% agarose gels. Selected clones were analyzed by Sanger sequencing using primer EF17.
Phage Nb library production
A PANL aliquot corresponding to 3 × 10^10 bacterial cells (around 5× coverage of the library) was transferred to 200 ml of 2YT/2%/A/T and cultures were grown until OD600 = 0.5 was reached (~2 h). Cultures were infected with 8 ml of M13K07 helper (NEB) for 60 min at 37 °C. They were then harvested, supernatants discarded and bacterial pellets resuspended in 1 l of 2YT/A/K. Cultures were incubated overnight at 37 °C and 250 r.p.m. for production of the input library of pNb particles. Bacterial cultures were harvested, supernatants collected and phages precipitated using poly(ethylene glycol)/NaCl as described earlier. Final phage pellets were resuspended in a total of 20 ml of PBS and stored. Phage titers were determined by infecting a log-phase culture of SS320 with a dilution series of the produced phage library and plating bacteria on LB-Amp. Colonies were counted and titers calculated. Produced phage libraries were characterized by titers >4 × 10^11 plaque-forming units (p.f.u.) ml−1.
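The coverage figure quoted for the inoculum implies a library size and, under a simple sampling model, a fraction of clones expected to be missed. The sketch below makes both explicit. The Poisson model with equally abundant clones is an assumption introduced here for illustration and is not stated in the text.

```python
import math


def implied_library_size(cells: float, fold_coverage: float) -> float:
    """Library size implied by an inoculum of `cells` at a given fold coverage."""
    return cells / fold_coverage


def fraction_of_clones_missed(fold_coverage: float) -> float:
    """Expected fraction of clones absent from the inoculum.

    ASSUMES Poisson sampling of equally abundant clones, so the chance that
    a given clone is drawn zero times is exp(-coverage).
    """
    return math.exp(-fold_coverage)


# Inoculum size and coverage are the values quoted in the text.
print(f"{implied_library_size(3e10, 5):.1e} clones")
print(f"{fraction_of_clones_missed(5):.4f} of clones expected to be missed")
```

Under these assumptions, 5× coverage corresponds to a library of about 6 × 10^9 clones, with fewer than 1% of clones absent from the inoculum.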
Phage display selection
HEK293T cells were transiently transfected with either pCAG or pCAG-EGFP-GPI for 24 h as described above. Cells were harvested, 10^7 pCAG-transfected cells were resuspended in 1 ml of PBS containing 2% BSA (PBS–BSA) and 8 ml of the PANL library (1.6 × 10^12 p.f.u.) in PBS–BSA was added for counterselection. Samples were incubated for 1 h on a rotating wheel at 4 °C and then centrifuged at 350g for 5 min at 4 °C. Supernatants containing phages were added to 10^7 pCAG-EGFP-GPI-expressing cells for positive selection. After 1 h on a rotating wheel at 4 °C, samples were centrifuged (350g, 5 min and 4 °C) and washed six times with PBS–BSA to remove unbound phages. Cells were washed once in PBS and centrifuged, and cell pellets were resuspended in 500 µl of trypsin solution (1 mg ml−1 of trypsin (Sigma-Aldrich) in PBS) to elute bound phages. Cells were incubated for 30 min on a rotating wheel at room temperature and digests were stopped by the addition of AEBSF protease inhibitor (Sigma-Aldrich) to a final concentration of 0.5 mg ml−1. Samples were centrifuged (400g and 4 min at room temperature) and the supernatant containing eluted phages was used to infect 10 ml of log-phase SS320 (OD600 = 0.4). After infection for 40 min at 37 °C, cultures were added to 90 ml of 2YT/2%/A/T and incubated overnight at 37 °C and 250 r.p.m. Cultures containing output libraries were aliquoted and glycerol stocks were prepared. Output library phage particles were prepared as described earlier for PANL and used in subsequent selection rounds using the same protocol described here.
Sequencing of PANL and selection output libraries
Bacterial cultures harboring phagemid libraries were grown overnight at 37 °C and 240 r.p.m. in 50 ml of LB containing 2% glucose and 50 µg ml−1 of ampicillin. Bacteria were harvested and plasmids isolated using ZymoPURE II Plasmid Midiprep Kit (Zymo Research). A first PCR was performed to amplify Nb inserts; 100-µl PCR reactions were set up that contained 100 ng of isolated plasmid library, 2.5 µl of primer mix EF235–EF241 and 2.5 µl of mix EF249–EF255, and 50 µl of 2× KAPA HiFi HotStart ReadyMix (Roche). The cycling conditions were: 95 °C for 3 min; 16 cycles at 95 °C for 20 s, 60 °C for 30 s and 72 °C for 20 s; and a final extension of 72 °C for 5 min. Nb amplicons were purified by addition of 120 µl of SPRIselect beads (Beckman Coulter), beads were washed twice with 200 µl of 80% ethanol and Nb product libraries were eluted in 40 µl of buffer EB (QIAGEN).
Concentration of amplicon libraries was determined and 20 ng was used for 100-µl indexing PCR reactions using 50 µl of 2× KAPA HiFi HotStart ReadyMix (Roche), 2.5 µl of primer EF242 and 2.5 µl of primer EF256; cycling conditions were: 95 °C for 45 s; 6 cycles at 95 °C for 20 s, 67 °C for 30 s and 72 °C for 20 s; and a final extension of 72 °C for 1 min. Indexed amplicon libraries were purified by the addition of 120 µl of SPRIselect and eluted in 40 µl of buffer EB. The concentration of the final libraries was determined using a Qubit dsDNA HS Assay kit (Invitrogen) and size distribution was examined by running a High Sensitivity DNA chip on a Bioanalyzer 2100 system (Agilent). Amplicon libraries were pooled and paired-end sequenced (read 1: 96 cycles, read 2: 184 cycles) on a MiSeq sequencer (Illumina).
Analysis of phagemid library sequencing experiments
Customized Python code (‘process_phage_library_construct.py’) was written to parse out the variable CDR1, CDR2 and CDR3 sequences using a positional sequence logic relative to constant regions in the PANL library design. Sequencing reads where constant regions could not be identified (up to two mismatches) were discarded, noting that all libraries had at least 90% parsing efficiency (range: 90–94%). Library complexity was estimated from annotated duplicate reads with identical variable CDR sequences using the Lander–Waterman method34. Nucleotide sequences were converted to amino acid sequences using a standard codon dictionary. To account for sequencing errors in annotating clones, we determined clones based on a rank ordering of sequences and collapsed any sequences within a Hamming distance of 2 (based on nucleotide identity of the variable CDR1, CDR2 and CDR3 sequences). Per-position amino acid frequencies were estimated using the top 1,000 collapsed clones.
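As an illustration of how duplicate reads inform a complexity estimate, the sketch below solves the Lander–Waterman relation C = X(1 − exp(−N/X)) for the library size X, given N parsed reads and C distinct CDR combinations. This is a generic numerical solution and not the authors' script; whether their implementation uses exactly this parametrization is an assumption, and the read counts in the example are invented.

```python
import math


def lander_waterman_size(total_reads: float, distinct: float) -> float:
    """Solve distinct = X * (1 - exp(-total_reads / X)) for X by bisection."""
    if not 0 < distinct < total_reads:
        raise ValueError("need 0 < distinct < total_reads (some duplicates must be observed)")

    def expected_distinct(size: float) -> float:
        # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is small.
        return size * -math.expm1(-total_reads / size)

    low = high = float(distinct)  # the library has at least `distinct` members
    while expected_distinct(high) < distinct:
        high *= 2.0               # expand until the root is bracketed
    for _ in range(200):          # bisection on a monotonically increasing function
        mid = (low + high) / 2.0
        if expected_distinct(mid) < distinct:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# Invented example: 2,000,000 parsed reads containing 1,900,000 distinct sequences.
print(f"{lander_waterman_size(2_000_000, 1_900_000):.3e}")
```

The estimate grows rapidly as the duplicate fraction approaches zero, so shallow sequencing of a very complex library gives only a rough lower-bound style estimate.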
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

