Study subjects
Peripheral blood was drawn from healthy controls who were recruited as part of an institutional review board-approved study at Yale University, and written consent was obtained. All experiments conformed to the principles set out in the World Medical Association’s Declaration of Helsinki and the Department of Health and Human Services Belmont Report.
Human T cell isolation and culture
Peripheral blood mononuclear cells (PBMCs) were prepared from whole blood by Ficoll gradient centrifugation (Lymphoprep, STEMCELL Technologies) and used directly for total T cell enrichment by negative magnetic selection using Easysep magnetic separation kits (STEMCELL Technologies). Cell suspension was stained with anti-CD4 (RPA-T4), anti-CD8 (catalog no. RPA-T8), anti-CD25 (clone 2A3), anti-CD45RO (catalog no. UCHL1), anti-CD45RA (catalog no. HI100) and anti-CD127 (catalog no. hIL-7R-M21) (all from BD Biosciences) for 30 min at 4 °C. Naive CD4+ T cells (CD4+/CD25neg/CD127+/CD45ROneg/CD45RA+) and naive CD8+ T cells (CD8++/CD25neg/CD127+/CD45ROneg/CD45RA+) were sorted on a FACSAria (BD Biosciences). Sorted cells were plated in 96-well, round-bottomed plates (Corning) and cultured in RPMI 1640 medium supplemented with 5% human serum, 2 nM l-glutamine, 5 mM Hepes, 100 U ml−1 of penicillin, 100 μg ml−1 of streptomycin, 0.5 mM sodium pyruvate, 0.05 mM nonessential amino acids and 5% human AB serum (Gemini Bio-Products). Cells were seeded (30,000–50,000 per well) into wells pre-coated with anti-human CD3 (2 μg ml−1, clone UCHT1, BD Biosciences) along with soluble anti-human CD28 (1 μg ml−1, clone 28.2, BD Biosciences) in the presence or absence of human IFN-β (500 U ml−1: Pestka Biomedical Laboratories) or IL-27 (100 ng ml−1: BioLegend) without adding IL-2. To avoid interindividual batch effects, T cells used for RNA-seq and ATAC-seq were isolated from one individual (37-year-old, healthy man with no evidence of viral infection, cancer or autoimmune diseases at the time of sample collection).
Lentiviral and Vpx-VPL production
Lentiviral plasmids encoding shRNA for gene knockdown or open reading frame (ORF) of STAT5A for overexpression were obtained from Sigma-Aldrich (MISSION shRNA, Sigma-Aldrich) and Horizon Discovery Biosciences (Precision LentiORF), respectively. Scrambled shRNA (Sigma-Aldrich; for perturbation experiment) or GFP control vectors (Origene, for protein validation experiment; Horizon Discovery Bioscience, for overexpression experiment) are used as the controls for lentiviral transduction. Each plasmid was transformed into One Shot Stbl3 chemically competent cells (Invitrogen) and purified by ZymoPURE plasmid Maxiprep kit (Zymo Research). Lentiviral pseudoparticles were obtained after plasmid transfection of 293T cells using Lipofectamine 2000 (Invitrogen) or TurboFectin 8.0 Transfection Reagent (Origene). To prepare Vpx-VLPs, 293T cells were cotransfected by Lipofectamine 2000 or TurboFectin 8.0 Transfection Reagent with the 5 μg of pMDL-X, 2.5 μg of pcRSV-Rev, 3.5 μg of X4-tropic HIV Env and 1 μg of pcVpx/myc, as described previously with some modifications37,38. The medium was replaced after 6–12 h with fresh medium with 1× viral boost (Alstem). The lentivirus- or Vpx-VLP-containing medium was harvested 72 h after transfection and concentrated 80× using Lenti-X concentrator (Takara Clontech) or Lenti Concentrator (Origene). Lentiviral particles were then resuspended in RPMI 1640 medium without serum and stored at −80 °C before use. Virus titer was determined by using Jurkat T cells and Lenti-X GoStix Plus (Takara Clontech).
Lentiviral transduction with Vpx-VPLs
Two-step Vpx-VLP and lentiviral transduction were performed as described previously with some modiciations36. Vpx are pseudotyped with X4-tropic HIV Env to promote efficient entry of Vpx-VLPs into quiescent human T cells37. FACS-sorted naive CD4+ T cells were plated at 50,000 cells per well in round-bottomed, 96-well plates and chilled on ice for 15 min. Then 25–50 μl of Vpx-VLPs was added to each well and mixed with cold cells for an additional 15 min, then spinfected with high-speed centrifugation (1,200g) for 2 h at 4 °C. Immediately after centrifugation, cells are cultured overnight at 37 °C. Vpx-transduced cells are spinoculated again with lentiviral particles containing shRNAs or ORF with high-speed centrifugation (1,000g) for 1.5 h at room temperature. After 24 h of incubation, the second transduction of lentiviral particles with shRNAs or ORF was performed as well as the first-time spinoculation. After a second lentiviral transduction, cells were washed and plated into 96-well, round-bottomed plates pre-coated with anti-human CD3 (2 μg ml−1) and soluble anti-human CD28 (1 μg ml−1), in the presence or absence of human IFN-β (500 U ml−1). Cells were collected at days 4–6 after anti-CD3/CD28 stimulation and GFP-positive cells were sorted by FACSAria or analyzed by Fortessa.
Real-time qPCR
Total RNA was extracted using RNeasy Micro Kit (QIAGEN) or ZR-96 Quick-RNA kit (Zymo Research), according to the manufacturer’s instructions. RNA was treated with DNase and reverse transcribed using TaqMan Reverse Transcription Reagents (Applied Biosystems) or SuperScript IV VILO Master Mix (Invitrogen). Complementary DNAs were amplified with Taqman probes (Supplementary Table 2) and TaqMan Fast Advanced Master Mix on a StepOne Real-Time PCR System (Applied Biosystems) according to the manufacturer’s instructions. Relative mRNA expression was evaluated after normalization with B2M expression.
Flow cytometry analysis
Cells were stained with a LIVE/DEAD Fixable Near-IR Dead Cell Stain kit (Invitrogen) and surface antibodies for 30 min at 4 °C. For intracellular cytokine staining, cells were treated with 50 nM phorbol-12-myristate-13-acetate (MilliporeSigma) and 250 nM ionomycin (MilliporeSigma) for 4 h in the presence of Brefeldin A (BD Biosciences) before harvesting. Cells were washed and fixed with BD Cytofix Fixation Buffer (BD Biosciences) for 10 min at room temperature, then washed with phosphate-buffered saline. Intracellular cytokines were stained in permeabilization buffer (eBioscience) for 30 min at 4 °C. Antibody details are provided in Supplementary Table 3. Cells were acquired on a BD Fortessa flow cytometer with FACSDiva (BD Pharmingen) and data were analyzed with FlowJo software v.10 (Threestar).
RNA-seq library preparation and data analysis
Then, 10,000 cells per condition were subjected to RNA-seq. The cDNAs were generated from isolated RNAs using SMART-Seq v.4 Ultra Low Input RNA Kit for sequencing (Takara/Clontech). Barcoded libraries were generated by the Nextera XT DNA Library Preparation kit (Illumina) and sequenced with a 2 × 100 bp paired-end protocol on the HiSeq 4000 Sequencing System (Illumina).
After sequencing, adapter sequences and poor-quality bases (quality score <3) were trimmed with Trimmomatic. The remaining bases were trimmed if their average quality score in a 4-bp sliding window fell <5. FastQC was used to obtain quality control metrics before and after trimming. Remaining reads were aligned to the GRCh38 human genome with STAR 2.5.2 (ref. 46). We used Picard to remove optical duplicates and to compile alignment summary statistics and RNA-seq summary statistics. After alignment, reads were quantified to gene level with RSEM (RNA-seq by expectation–maximization)47 using the Ensembl annotation. Normalized fragments per kilobase million (FPKM) expression values of the three replicates were used for the analysis below.
Identification of three transcriptional waves
The correlation matrix is created by Pearson correlating48 the IFN-β expression profile of each timepoint with all the other timepoints, creating a symmetrical matrix of Pearson’s correlation coefficients.
Differential expression calculation
The differential expression49 for each timepoint was calculated using the DEseq2 (ref. 50) R package. Three separate testing methods were used for calculating DEGs: Wald51, likelihood ratio test52 and time course53. For each of the three methods, genes with a false discovery rate (FDR)-adjusted P value <0.05 were regarded as differentially expressed. More specifically, for CD4+ T cells, if TFs appeared in two of the three calculating methods (agree by two), they were regarded as differentially expressed. For CD8+ T cells, if TFs appeared in any of the methods above, they were regarded as differentially expressed.
Selection of regulators for perturbation
The list of TFs for perturbation was selected based on the following criteria: (1) overlapped DETFs across the timepoints between CD4+ and CD8+ T cells based on the above-mentioned method; intersection of DETFs in our in vitro data were chosen; (2) DETFs in human TILs22,23,24,25 that were significantly correlated with exhausted T cell cluster (defined by the upregulation of LAG-3/PD-1/TIM-3): ‘human TIL score’, defined as the number of times a gene was shared between the four different human cancer TIL datasets; (3) HIV-specific T cell signature upregulated in progressive patients compared with stable patients26; and (4) TFs that were induced by IL-27 and categorized as IL-27-driven coinhibitory receptor modules1; ‘human ISG score’ is defined as the number of times it was shared across the three different categories (T cells, PBMCs and all immune cells) of human ISGs identified by the Interferome database. All perturbed TFs were confirmed as IFN-I-induced genes and showed ‘human ISG score’ >1.
DEGs heatmap
Expression data of all timepoints under IFN-β treatment expression were divided by the control expression data, creating fold-change ratio data of treatment over control. For visualization purposes, the fold-change data in Fig. 2a were then shown for the DEGs only, logged to a base of 10 and scaled per row. The results were clustered hierarchically. In Fig. 2c specific genes of interest were picked, such as ISGs, coinhibitory receptors, T cell-related genes and TFs.
Heatmap of perturbed TFs
Twenty-one TFs were perturbed using lentiviral shRNA, together with a scrambled shRNA (SCR) as control. ISGs (IFN score B54) and coinhibitory receptors were selected to visualize the expression differences in the two modules, in Extended Data Fig. 5b,c. The effect of TF perturbation on IFN-β response for each gene of interest (GOI) was calculated as follows: for each shRNA perturbation, the effect of IFN-β treatment over control condition was calculated for GOIs. This fold-change value was then logged to a base of 2. To normalize the effect of the shRNA protocol, we further normalized the data by the SCR, in an identical manner to the shRNA perturbation described above. Finally, the log2(transformed shRNA values) were subtracted by the log2(transformed SCR values) as shown in the equation bellow:
$$begin{array}{l}mathop {{log }}nolimits_2 left( {frac{{{mathrm{GOI}},{mathrm{expression}},{mathrm{in}},{mathrm{perturbed}},{mathrm{TF}} + {{{mathrm{IFN}}}}beta ,{{{mathrm{treatment}}}}}}{{{mathrm{GOI}},{mathrm{expression}}}},{{{mathrm{in}},{mathrm{perturbed}},{mathrm{TF}}}}} right)\ – mathop {{log }}nolimits_2 left( {frac{{{mathrm{GOI}},{mathrm{expression}},{mathrm{in}},{mathrm{SCR}} + {{{mathrm{IFN}}}}beta ,{{{mathrm{treatment}}}}}}{{{mathrm{GOI}},{mathrm{expression}},{mathrm{in}},{mathrm{SCR}}}}} right)end{array}$$
For the heatmap visualization, the normalized values calculated by the equation above were clustered hierarchically.
Differential expression analysis for each TF perturbation was conducted as follows: perturbed TF + IFN-β treatment expression versus perturbed TF untreated expression. DEGs in this analysis that were regarded as significant (FDR-adjusted P < 0.05) were marked with a white plus sign on their tile.
PCA and PCA biplot analysis
The PCA was applied to the DEGs (defined above) as variables and perturbed genes as observations. The data were normalized by the SCR control, in the same manner as the heatmap of perturbed TFs. Although PDCD1 was not defined as a DEG across the time course, it was differentially expressed at later timepoints in mRNA-seq and qPCR, so we manually added it to Fig. 5f. Genes of IFN score B54 (Supplementary Table 1) were represented as ISGs in Fig. 3e. The PCA and biplot analysis were calculated and visualized using the R package FactoMineR55.
Regulatory networks
After the differentially expressed analysis in each transcriptional wave, the DEGs were defined as TFs and their targets (from → to). The targets of all DETFs were determined using chromatin immunoprecipitation with sequencing (ChIP-seq) data from the database GTRD56. TFs and target genes defined as differentially expressed were added to the network as nodes (circles) and edges (lines) were added between them. The network plots were created using the Cytoscape57 software.
Top regulatory TFs heatmaps
We ranked the DETFs of each transcriptional time wave to identify the dominant regulators in each stage of the differentiation process using two methods. HG values were the values in the heatmap that are the log(P) value of the HG enrichment test –log10(P value). The HG calculation was conducted using the python SciPy package58. The second method used network Cent, which is a parameter that is given to each node, based on the shortest path from the node to the other nodes in the network. It represents how central and connected a node is to the rest of the network59. The Cent calculation was conducted using the python NetworkX60 package. The rank column is an average of both HG and Cent values, after scaling to the range of 0–1.
Integration of perturbation data to regulatory networks
After the generation of the regulatory network mentioned above, we further refined the network by integrating the TF perturbation data. Genes that were significantly affected by a TF perturbation were added as ‘validated’ edges between the perturbed TF and the target gene. If a gene was upregulated by a TF perturbation, the interaction between them was registered as downregulation. If a gene was downregulated by a TF perturbation, the interaction between them was registered as upregulation.
Reduced TF networks
For visualization purposes we used a network reduction algorithm. The algorithm presents all the TFs in the network (nodes) but reduces some of the connections between them (edges). The reduced TF networks were calculated using the NetworkToolbox61 package in Rstudio. The method of the reduction calculation was ECOplusMaST32, which applies the efficiency cost optimization, neural network-filtering method combined with the Maximum Spanning Tree-filtering62 method.
The barplots depicted at the bottom represent the mean level of the degree63 value for either up- or downregulated TFs, in the complete networks (not reduced ones). The degree of a node is the number of connections that it has to other nodes. An independent two-way Student’s t-test was conducted between the degree values of the up and down TFs for each time wave. The P value of each test is presented in the caption of each barplot.
Bridging TFs network
DETFs and their target genes from the three transcriptional waves were combined to create a comprehensive network emphasizing the dynamics between transcriptional waves. In this network TFs and their target genes were annotated by the transcriptional wave in which they appear as differentially expressed. TFs that participate in more than one transcriptional wave are regarded as bridging TFs.
ATAC-seq library preparation and data analysis
Cells were split into RNA-seq (10,000 cells) and ATAC-seq (15,000 cells) at the time of collection. Cells for ATAC-seq were cryopreserved using Recovery Cell Culture Freezing Medium (Gibco) with a concentration of 15,000 cells per 100 μl. Cryopreserved cells were thawed and immediately proceeded to Tn5 digestion (50 μl of transposase mixture: 25 μl of 2× TD buffer, 2.5 μl of TDE1 (both from Illumina), 0.5 μl of 1% digitonin (Promega) and 22 μl of nuclease-free water)64. Transposition reactions were incubated at 37 °C for 30 min in a thermal shaker with agitation at 300 r.p.m. Transposed DNA was purified using a Zymo DNA Clean and Concentrator-5 Kit (Zymo Research) and purified DNA was eluted in 10 μl of elution buffer. Transposed fragments were amplified and purified as described previously64 with modified primers65. Libraries were quantified using qPCR (KAPA Library Quantification Kit) before sequencing. All ATAC libraries were sequenced using a 2× 150-bp paired-end protocol on the Nova-seq Sequencing System (Illumina).
ATAC-seq alignment, feature calling and quantification
We trimmed adapter sequences before aligning reads to the GRCh38 human genome assembly with Bowtie 2. We used FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc) to obtain sequence quality control metrics both before and after read trimming. We obtained alignment summary metrics with Picard tools (http://broadinstitute.github.io/picard). We retained reads that mapped uniquely to the primary assembly with mapping quality ≥30 and removed optical duplicates. We shifted cut sites by +4 bp and −5 bp on positive and negative strands, respectively.
We used MACS2 (ref. 66) to call ATAC-seq peaks within individual samples using a q < 0.2 threshold. For each library, we use ataqv67 to calculate transcription start site enrichment and the fraction of reads in peaks. We excluded peaks within ENCODE blacklist regions.
We used Markov clustering68 to identify a preliminary set of consensus features across libraries. Edges were weighted by the proportion of overlapping bases between each pair of peaks. Peaks that did not overlap with at least one peak from another library were excluded from further analysis. After clustering, we defined preliminary clusters to be the maximal (union) coordinates of all constituent peaks.
We used a two-state hidden Markov model to identify the central, maximally replicated portion of each preliminary feature. Consensus features were defined as the maximal extent of all ‘open’ segments within each cluster.
We used featureCounts69 to quantify the number of reads within each peak for each library. Outliers were identified through PCA.
ATAC-seq sample PCA
The PCA was conducted on ATAC-seq peaks from 19 CD4+ T cell samples and 21 CD8+ T cell samples. Two samples were removed from the CD4+ samples due to large variability relative to the other replicates.
Differential chromatin accessibility
Differentially accessible ATAC peaks were calculated using DEseq2 R package. Peaks with FDR-adjusted P < 0.05 and fold-change of ±1 are regarded as differentially accessible peaks.
TF footprint analysis
We used TOBIAS30 to identify TF footprints within accessible chromatin features. We pooled libraries by cell type (CD4+ and CD8+ T cells) and treatment status (control and IFN-β) at 2-, 8- and 72-h timepoints. Within each pool, we corrected for Tn5 cut site bias and calculated footprint scores with TOBIAS. We used TOBIAS to identify TF motifs differentially footprinted at each timepoint.
Reanalysis of COVID-19 scRNA-seq data
A PBMC scRNA-seq dataset of 10 COVID-19 patients and 13 matched controls, which had been previously performed and reported by us36, was reanalyzed. We have described the full cohort and detailed methods36. From eight of the ten COVID-19 samples, PBMCs from two different timepoints had been analyzed. Four of the COVID-19 patients had been classified as progressive and the other six as stable. Briefly, single-cell barcoding of PBMCs and library construction had been performed using the 10× Chromium NextGEM 5prime kit. Libraries had been sequenced on an Illumina Nova-seq 6000 platform. Raw reads had been demultiplexed and processed using Cell Ranger (v.3.1) mapping to the GRCh38 (Ensembl 93) reference genome. The resulting gene–cell matrices had been analyzed using the package Seurat63 in the software R, including integration of data, clustering, multiplet identification and cell-type annotation. The final annotated R object was used and reanalyzed in Seurat with default settings—unless otherwise specified—as follows.
The three cell populations ‘Dividing T & NK’, ‘Effector T’ and ‘Memory CD4 & MAIT’ were each subsetted and reclustered to obtain a finer cell-type granularity because they included a mix of CD4, CD8, MAIT and γδ T cells. Per subset, the top 500 variable genes were determined by the ‘FindVariableFeatures’ function using the ‘vst’ method. Data were scaled using the ‘ScaleData’ function regressing out the total number of unique molecular identifiers (UMIs) and the percentage of UMIs arising from the mitochondrial genome. After PCA, the first ten PCs were utilized to detect the nearest neighbors using the ‘FindNeighbors’ function and clustered by Seurat’s Louvain algorithm implementation ‘FindClusters’ using a resolution of 0.2 for ‘Dividing T & NK’, 0.3 for ‘Effector T’ and 0.1 for ‘Memory CD4 & MAIT’ subsets. Cluster-specific gene expression profiles were established using the ‘FindAllMarkers’ per cluster and per subset to annotate the clusters. New cell-type annotations were then transferred back to the full dataset.
A new Uniform Approximation and Projection (UMAP) embedding was created by integrating the datasets on a subject level as follows: a subset containing all T cells was generated, which was then split by subject. For each subject, the top 2,000 variable genes were selected, then integration anchors determined by ‘FindIntegrationAnchors’ (with k.filter = 150). These anchors were used to integrate the data using the ‘IntegrateData’ function with the top 30 dimensions. The integrated data were scaled, subjected to a PCA and the top 13 PCs used as input for the ‘RunUMAP’ function on 75 nearest neighbors.
Module scores were calculated using the ‘AddModuleScore’ function using (1) all genes within the gene ontology list ‘RESPONSE TO TYPE I INTERFERON’ (GO:0034340)64 and (2) all genes significantly associated with either of the three waves in our in vitro perturbation experiments (Supplementary Table 1). Differential gene expression was established using Seurat’s implementation of Wilcoxon’s rank-sum test within the ‘FindMarkers’ function with Bonferroni’s correction for multiple testing.
Statistical analysis
Detailed information about statistical analysis, including tests and values used, is provided in the figure legends. P values ≤0.05 were considered to be significant.
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

