Mice
C57BL/6J mice were purchased from Beijing HFK Bioscience Co. Ltd (Beijing, China), housed and bred under specific pathogen-free (SPF) conditions at the Laboratory Animal Center of West China Second University Hospital, and given access to diet and water ad libitum. All animal experiments were carried out following protocols approved by the Institutional Animal Care and Use Committee of West China Second University Hospital [(2018) Animal Ethics Approval No.004].
Sample collection
The sample preparation and the overall workflow are summarized in Fig. 1. Adult female mice aged 8-9 weeks were anesthetized and sacrificed by cervical dislocation. Bone marrow cells were isolated from the femur and tibia and filtered through a 70 μm cell strainer (BD Falcon). Hematopoietic stem and progenitor cells (HSPCs) were first enriched using the EasySep™ Mouse Hematopoietic Cell Isolation Kit (STEMCELL, Cat No. 19856) with a lineage cocktail (biotinylated CD11b, B220, Gr-1, TER-119 and CD3e) to remove mature cells, according to the manufacturer’s instructions.


Fig. 1 Sample collection and workflow. (a) Sample collection and sequencing methods. Bone marrow cells (BMC) were isolated, and lineage-negative (LIN−) cells were enriched using magnetic beads. Long-term (LT) and short-term (ST) hematopoietic stem cells (HSC) and multipotent progenitors (MPP) were sorted according to their surface markers: LT-HSC (Sca-1+c-Kit+CD34−CD135−), ST-HSC (Sca-1+c-Kit+CD34+CD135−) and MPP (Sca-1+c-Kit+CD34+CD135+). Single cells or 100 cells (bulk P100) were sorted for library construction following the Smart-seq2 protocol. The cDNA libraries were used for short-read (Illumina HiSeq) or long-read (Nanopore or PacBio) sequencing. (b) Gating strategy for cell sorting. The main population was gated via FSC-A and SSC-A, and single cells were gated via FSC-A and FSC-H. FSC, forward scatter; SSC, side scatter; A, area; H, height.
To purify LT/ST-HSC and MPP, cells were sorted on a BD FACSAria SORP according to the following markers: LT-HSC [lineage (LIN)−Sca-1+c-Kit+CD34−CD135−], ST-HSC (LIN−Sca-1+c-Kit+CD34+CD135−) and MPP (LIN−Sca-1+c-Kit+CD34+CD135+). The following antibodies and stains were used (all purchased from BD): PerCP/Cyanine5.5 anti-CD11b (Clone M1/70, Cat No. 550993), PerCP/Cyanine5.5 anti-CD3e (Clone 145-2C11, Cat No. 551163), PerCP/Cyanine5.5 anti-TER119 (Clone TER119, Cat No. 560512), PerCP/Cyanine5.5 anti-Gr-1 (Clone RB6-8C5, Cat No. 552093), PerCP/Cyanine5.5 anti-B220 (Clone RA3-6B2, Cat No. 552771), BV421 anti-CD135 (Clone A2F10.1, Cat No. 562898), PE/Cyanine7 anti-Sca-1 (Clone D7, Cat No. 558162), APC/Cyanine7 anti-CD117 (c-Kit) (Clone 2B8, Cat No. 560185), FITC anti-CD34 (Clone RAM34, Cat No. 553733) and Fixable Viability Stain 510 (BD Pharmingen, Cat No. 564406). Data were analyzed using FlowJo software (v10.7.1).
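The marker definitions above can be read as a small decision rule. The following is a minimal sketch only: real sorting applies gates to continuous fluorescence intensities on the cytometer, whereas here each marker is assumed to have already been called positive or negative, and the function name `classify_hspc` is hypothetical.

```python
from typing import Optional


def classify_hspc(lin: bool, sca1: bool, ckit: bool,
                  cd34: bool, cd135: bool) -> Optional[str]:
    """Map marker positivity calls to a sorted population name.

    Each argument is True when the cell is positive for that marker.
    Returns None for cells outside the three sorted populations.
    """
    # All three populations are lineage-negative, Sca-1+ and c-Kit+.
    if lin or not (sca1 and ckit):
        return None
    if not cd34 and not cd135:
        return "LT-HSC"
    if cd34 and not cd135:
        return "ST-HSC"
    if cd34 and cd135:
        return "MPP"
    # CD34- CD135+ cells are not among the populations described here.
    return None
```

For example, a LIN−Sca-1+c-Kit+CD34+CD135− cell would be labeled ST-HSC under this rule.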
For single-cell RNA-sequencing (scRNA-seq), cells were individually sorted into 8-strip PCR tubes containing lysis buffer. For bulk RNA-seq, 100 cells (P100) were sorted into one PCR tube as a biological replicate. The batch information for cell sorting is provided in Online-only Tables 1, 2.
Library construction and sequencing
The cDNA libraries were generated following the Smart-seq2 protocol7, and the Illumina Nextera XT DNA preparation kit was used. The batch information for cDNA generation and sequencing is provided in Online-only Tables 1-3.
Short-read sequencing
cDNA was then randomly sheared with a Bioruptor Pico sonication device (Diagenode) for the Illumina library preparation protocol, which includes DNA fragmentation, end repair, 3′-end A-tailing, adapter ligation, PCR amplification and library validation. After library preparation, the PerkinElmer LabChip® GX Touch and the StepOnePlus™ Real-Time PCR System were used for library quality inspection. Qualified single-cell or P100 libraries were then loaded on the Illumina HiSeq X Ten platform for PE150 sequencing. The average sequencing depth was 23.3 million reads (SD = 3.41) for P100 and 4.2 million reads (SD = 1.1) for single cells. Library construction and sequencing were performed at Annoroad Gene Technology (Beijing, China).
PacBio sequencing
The full-length cDNA libraries were generated as described above. A total of 5 µg of full-length cDNA was used for size selection with the BluePippin™ Size Selection System (Sage Science, Beverly, MA, USA). The SMRTbell library was constructed from 1 μg of size-selected (above 4 kb) cDNA with the Pacific Biosciences SMRTbell template prep kit. SMRTbell templates were bound to polymerases using the Sequel II Binding Kit, and primer annealing was then performed. Sequencing was carried out on the Pacific Biosciences (PacBio) Sequel II platform. Library construction and sequencing were performed at Annoroad Gene Technology (Beijing, China).
Oxford Nanopore Technologies cDNA sequencing
Similarly, the full-length cDNA was generated using the Smart-seq2 protocol as described above. About 250 ng of cDNA was subjected to DNA repair, end repair and dA-tailing using NEBNext FFPE DNA Repair Mix (NEB, Cat No. M6630L) and NEBNext Ultra II End Repair/dA-Tailing Module (NEB, Cat No. E7645). PromethION library preparation was performed according to the manufacturer’s instructions (Oxford Nanopore Technologies, SQK-LSK109 Kit, EXP-NBD104 and EXP-NBD114). Sequencing was performed on the PromethION P48 platform. Library construction and sequencing were performed at Biomarker Technologies Co., Ltd (Beijing, China).
Sequencing data analysis
All programs used default parameters unless stated otherwise in the corresponding section.
Single-cell short-read sequencing
Single-cell analysis was carried out using the Seurat R package (v3.2)8. We removed cells that met any of the following criteria: (a) feature counts below 500 or above 8000, (b) UMI counts below 70,000 or above 5,000,000, (c) identified as doublets by DoubletFinder (v2.0.3)9 with the parameters “PCs = 1:6, pN = 0.25, pK = 0.09, nExp = 24”. A total of 109 LT-HSCs, 98 ST-HSCs and 107 MPPs were kept for further analyses. The quality statistics are provided in Online-only Table 1. We performed principal component analysis (PCA) with variable genes as input and used the top 6 significant PCs as input for t-distributed stochastic neighbor embedding (tSNE). Cell markers from previously published studies10,11 were used to verify the cell types. Differentially expressed genes among the cell types were determined using the FindAllMarkers function in the Seurat R package (v3.2)8.
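The three exclusion criteria can be expressed as a single predicate. This is an illustrative sketch in plain Python rather than the Seurat workflow itself; the `CellQC` record and `passes_qc` function are hypothetical names, the doublet flag is assumed to come from a separate DoubletFinder run, and cells lying exactly on a threshold are assumed to be kept.

```python
from dataclasses import dataclass

# Thresholds taken from the filtering criteria in the text.
MIN_FEATURES, MAX_FEATURES = 500, 8000
MIN_COUNTS, MAX_COUNTS = 70_000, 5_000_000


@dataclass
class CellQC:
    cell_id: str
    n_features: int   # number of detected genes
    n_counts: int     # total counts for the cell
    is_doublet: bool  # assumed to be supplied by a doublet caller


def passes_qc(cell: CellQC) -> bool:
    """Return True if the cell meets none of the exclusion criteria."""
    if cell.is_doublet:
        return False
    if not (MIN_FEATURES <= cell.n_features <= MAX_FEATURES):
        return False
    return MIN_COUNTS <= cell.n_counts <= MAX_COUNTS


def filter_cells(cells):
    """Keep only the cells that pass all three checks."""
    return [c for c in cells if passes_qc(c)]
```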
Bulk short-read sequencing
The quality of the raw sequence data was analyzed using FastQC software (v0.11.8) (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and RSeQC software12 (v3.0.1) (http://rseqc.sourceforge.net/). The paired-end short reads were aligned to the mouse reference genome GRCm38 with annotation from ENSEMBL release 93 using STAR (v2.7.1a)13 with the parameters “--outSJfilterOverhangMin 12 12 12 12 --alignSJoverhangMin 3 --alignSJDBoverhangMin 3 --chimSegmentMin 12 --chimScoreMin 2 --chimScoreSeparation 10 --chimJunctionOverhangMin 12 --outFilterMultimapNmax 1 --chimOutType Junctions SeparateSAMold”.
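For readers who want to reproduce the alignment, the listed parameters can be assembled into a STAR command line as sketched below. The function name, the genome index directory and the FASTQ paths are placeholders, not values from this study; only the non-default alignment options are taken from the text, and input or output options such as compression handling or output prefixes are left out.

```python
from typing import List


def build_star_command(genome_dir: str, read1: str, read2: str) -> List[str]:
    """Assemble a STAR argument list with the non-default options from the text.

    genome_dir, read1 and read2 are placeholder paths supplied by the caller.
    """
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", read1, read2,
        "--outSJfilterOverhangMin", "12", "12", "12", "12",
        "--alignSJoverhangMin", "3",
        "--alignSJDBoverhangMin", "3",
        "--chimSegmentMin", "12",
        "--chimScoreMin", "2",
        "--chimScoreSeparation", "10",
        "--chimJunctionOverhangMin", "12",
        "--outFilterMultimapNmax", "1",
        "--chimOutType", "Junctions", "SeparateSAMold",
    ]


# Example with hypothetical paths; pass the list to subprocess.run to execute.
example_cmd = build_star_command("star_index", "sample_R1.fastq", "sample_R2.fastq")
```

Building the command as a list rather than a single string avoids shell quoting problems if the list is later handed to `subprocess.run`.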
Gene body coverage, the distribution of aligned reads over genome features and RNA integrity at the transcript level were calculated using RSeQC12. RNA integrity at the transcript level was evaluated using the Transcript Integrity Number (TIN) algorithm14. These quality statistics are provided in Online-only Table 2. Gene expression was quantified using the summarizeOverlaps function from GenomicAlignments (v1.24.0)15 with the parameters “mode = "Union", singleEnd = FALSE, ignore.strand = TRUE, fragments = TRUE”. The read counts were then normalized with the rlog function in DESeq2 (v1.28.1)16 with the parameter “blind = FALSE”. Principal component analysis (PCA) and sample distances were calculated using FactoMineR (v2.4)17 and DESeq2 (v1.28.1)16, respectively. The batch effect (batches are listed in Online-only Table 2) was corrected using ComBat from the sva package (v3.36.0)18.
Bulk long-read sequencing
The alignment was performed using Minimap2 (v2.17-r974-dirty)19 with the parameter “-ax splice”. The number of reads, read length and quality were analyzed using NanoComp software (v1.33.1)20 with the parameters “--raw --store --tsv_stats” and visualized using the R package ggplot2 (v3.3.2)21 after removing reads shorter than 200 bp or longer than 150,000 bp. These quality statistics are provided in Online-only Table 3.
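The read-length filter applied before visualization amounts to keeping reads of 200 to 150,000 bp. A minimal sketch follows, assuming read lengths are already available as integers (for example, parsed from the NanoComp raw output); the function names are hypothetical and reads exactly at either bound are assumed to be retained.

```python
import statistics
from typing import Dict, Iterable

MIN_LEN, MAX_LEN = 200, 150_000  # bp, bounds from the text


def keep_read(length: int) -> bool:
    """True for reads that are neither shorter than 200 bp nor longer than 150,000 bp."""
    return MIN_LEN <= length <= MAX_LEN


def summarize_lengths(lengths: Iterable[int]) -> Dict[str, float]:
    """Filter read lengths and report simple summary statistics."""
    kept = [n for n in lengths if keep_read(n)]
    if not kept:
        return {"n_reads": 0, "mean_length": 0.0, "median_length": 0.0}
    return {
        "n_reads": len(kept),
        "mean_length": statistics.mean(kept),
        "median_length": statistics.median(kept),
    }
```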
StringTie2 (v2.1.7) was used to assemble and quantify both short and long reads with or without reference guidance. The parameters for the assembly process were “-p 20 -e -G -A”, with “-L” added for long reads. The read counts of genes were extracted with a Python script (prepDE.py3) with “-l 600” and normalized with the rlog function in DESeq2 (v1.28.1)16 with the parameter “blind = FALSE”. The batch effects in gene expression among bulk Illumina, PacBio and Oxford Nanopore Technologies sequencing were corrected using ComBat from the sva package (v3.36.0)18. Principal component analysis (PCA) was performed using the plotPCA function in DESeq2 (v1.28.1)16. Correlation was calculated with the cor function from the R package stats (v4.0.2) with the parameter “method = "pearson"”. GffCompare (v0.12.1)22 was used to compare and evaluate the accuracy of the StringTie2 transcript assembly23.
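The Pearson correlation used to compare expression between platforms is the covariance of two vectors divided by the product of their standard deviations. The sketch below is a plain-Python illustration of that formula, not the R `cor` call used in the analysis; the function name `pearson` is hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of centered cross-products and squares.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)
```

In this setting, `x` and `y` would be normalized expression values of the same genes measured on two sequencing platforms.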
SUPPA2 (v2.3)24 was used to classify alternative splicing (AS) events with the parameters “-f ioe -e SE SS MX RI FL”. Seven AS types were identified: skipping exon (SE), retained intron (RI), alternative 5′ splice site (A5), alternative 3′ splice site (A3), mutually exclusive exons (MX), alternative first exon (AF), in which alternative first-exon use results in mRNA isoforms with distinct 5′ UTRs, and alternative last exon (AL), in which alternative use of multiple polyadenylation sites results in distinct terminal exons.
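Five event codes are passed to SUPPA2 but seven AS types are reported, because the SS option yields both A5 and A3 events and the FL option yields both AF and AL events. The mapping can be written out as a small lookup; the dictionary and function names here are illustrative and are not part of SUPPA2.

```python
from typing import Dict, List

# How each "-e" option expands into reported event types.
SUPPA_EVENT_EXPANSION: Dict[str, List[str]] = {
    "SE": ["SE"],        # skipping exon
    "SS": ["A5", "A3"],  # alternative 5' and 3' splice sites
    "MX": ["MX"],        # mutually exclusive exons
    "RI": ["RI"],        # retained intron
    "FL": ["AF", "AL"],  # alternative first and last exons
}


def expand_event_options(options: List[str]) -> List[str]:
    """Return the event types produced by a list of "-e" options, in order."""
    event_types: List[str] = []
    for opt in options:
        event_types.extend(SUPPA_EVENT_EXPANSION[opt])
    return event_types
```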
To visualize long reads for transcripts, the gene regions were extracted using Samtools (v1.10.2)25. Bedtools (v2.30.0)26 was used to convert all BAM files to GTF format. Visualization was performed using the R package ggbio (v1.36.0)27. Sashimi plots of short-read sequencing data were generated using pysashimi (https://github.com/ygidtu/pysashimi).

