Preloader

Transcriptional changes in the mammary gland during lactation revealed by single cell sequencing of cells from human milk

Our research complies with all relevant ethical regulations as detailed in the below sections.

Tissue collection and non-lactating cell isolation

Non-lactating human breast tissue was donated by participants who provided informed consent who were undergoing elective aesthetic reduction mammoplasty at the Nymphenburg Clinic for Plastic and Aesthetic Surgery in accordance with the regulations of the ethics committee of the Ludwig-Maximilian University, Munich, Germany (proposal 397-12). Limited demographic information on the participants was provided that included the age and parity of the participant (Supplementary Fig. 4A). Single-cell suspensions of mammary cells were generated using an adaptation of a previously described protocol41, however, in this case using a fast tissue digestion protocol (see Engelbrecht et al.42). Briefly, the freshly collected mammary tissue was collected and minced using scalpels into <2–3 mm3 pieces. Twenty millilitres of minced tissue in digestion buffer (Dulbecco’s Modified Eagle Medium (DMEM)/F12 w/o phenol red, 2% w/v bovine serum albumin (BSA), 10 mM HEPES, 2 mM glutamine, 100 U/mL penicillin-streptomycin) supplemented with 1 µg/mL insulin (Sigma, I6634) was added together with 800 U/mL collagenase (Sigma, C9407) and 100 U/mL hyaluronidase (Sigma, H3506) and made up to the 25 mL mark. Falcon tubes were then sealed with parafilm to ensure they were airtight and mixed at 100 r.p.m. at 37 °C for 3–4 h. Following digestion, the resulting dissociated organoid fragments were washed twice in washing buffer (DMEM/F12 w/o phenol red, 10 mM HEPES, 2 mM glutamine, 100 U/mL penicillin–streptomycin) and pelleted by 300 × g centrifugation for 5 min. Once the organoids were isolated, they were cryopreserved in 50% washing buffer, 40% foetal calf serum (FCS) and 10% dimethyl sulfoxide and stored in liquid nitrogen until required. When required, organoid fragments were gently defrosted in a 37 °C water bath for approximately 5 min before being treated with trypsin and dispase (Life Technology) to yield a highly viable single-cell suspension.

Human milk cell isolation

Human milk donors were recruited from Pippagina English Prenatal and Postnatal classes or through the Helmholtz Zentrum München in accordance with regulations of the ethics committee of the Ludwig-Maximilian University, Munich, Germany (proposal 17–715). Participants provided written informed consent and filled out a detailed questionnaire to provide demographic information. Briefly, human milk was freshly collected using either a provided double electric breast pump (Medela, Symphony) or participants personal pump, under aseptic conditions. Milk collections were obtained either within the lactation room at the Helmholtz Centre Munich, at the participants’ homes or post-partum educational classes depending on the preferences of the participant. Fresh milk samples were immediately transported on ice to the laboratory and processed as soon as possible (<2 h after collection). Human milk cells were isolated by diluting milk samples in an equal volume of sterile phosphate-buffered saline (PBS) (ThermoFisher Scientific, Waltham, U.S.) and centrifuged at 870 × g for 20 min at 20 °C in a Rotanta 460R centrifuge (Hettich, Tuttlingen, Germany). The pellet was washed by removing the supernatant and resuspending in 5–10 mL of cold PBS before transferring the sample to a new 15 mL tube (Corning, Corning, U.S.) and centrifuging at 490 × g for 5 min at 4 °C. Following a second washing step, 100–550 µL of mammary epithelial cell growth medium (MECGM) (PromoCell, Heidelberg, Germany) was added to the human milk cell aggregations according to the pellet size. Overall samples sequenced in this study ranged from 38 to 102.5 mL and yielded between 0.82 × 106 and 66.96 × 106 total membrane-enclosed structures (including cells) from milk (see Schultz-Pernice et al.43 for more details). To examine the cells more closely, nuclear stain DRAQ5TM (62254, ThermoFisher Scientific, Waltham, U.S.) and Nile red (N3013-100MG, Merck, Darmstadt, Germany) were added to a final concentration of 0.4 µg/mL (1 mM) and 0.1 µg/mL, respectively, and the cells incubated for a further 5 min in the dark. Cells were then loaded onto a Neubauer Improved counting chamber and examined on an immunofluorescence microscope. Subsequently, single cells were either frozen or used immediately.

Cell culture

Both NMC and LMCs were cultured in 2D and 3D using previously described methods44. Briefly, single cells were mixed with MECGM supplemented with 1% pen/strep (Invitrogen), 0.5% FCS (Pan Biotech), 3 μM Y-27632 (Biomol) and 10 μM forskolin (Biomol) and seeded onto polystyrene cell culture plates for 2D culture. After an establishment period of 5 days, the medium was changed to MECGM supplemented with 1% pen/strep and 10 μM forskolin. Where samples were cultured for immunofluorescence analysis, cells were seeded on 1% gelatine coated coverslips. Colonies of 2D-cultured cells were fixed using 4% paraformaldehyde and washed before staining the cells with a 1:100 primary CK8/18 antibody (DLN-010750, Dianova, Castelldefels, Spain) in blocking buffer (PBS 0.1% BSA (Merck, Darmstadt, Germany) with 10% normal donkey serum (GeneTex, Irvine, U.S.)) solution for 3 h at room temperature. Following washing, a 1:250 secondary antibody in (anti-mouse Alexa Fluor 488 Donkey Anti-Mouse IgG, A21202, ThermoFisher Scientific, Waltham, U.S.) PBS 0.1% BSA solution was added and samples were incubated for a further 45 min at room temperature. After antibody incubation, 500 μL of a 0.4 μg/mL DAPI solution (D9542-1MG, Merck, Darmstadt, Germany) were added and incubated for 2 min, followed by mounting of the slides. Images were acquired using a Zeiss “Axio Imager.M2” microscope in combination with the “Zeiss Zen 2.3 pro” software. For 3D culture, single cells were mixed with neutralizing solution and acidified rat tail collagen I (Corning) to generate collagen gels in siloxane coated 24-well plates. After allowing the gels to polymerize for an hour, supplemented media (as above) was added on top of the gels that were then gently encircled to generate floating collagen gels. Similar to 2D culture, after an establishment period of 5 days, the media on the floating collagen gels was changed to MECGM supplemented with pen/strep and forskolin only.

Flow cytometry

Flow cytometry was employed to determine the similarity of expression of mammary markers between LMC and NMCs. Cells were stained with CD45-PB (V450, dilution of 1:100, catalogue number: 560368, BD, Heidelberg, Germany), EpCAM-FITC (dilution of 1:10, catalogue number: GTX79849-100, GeneTex, Eching, Germany) and CD49f-PE (dilution of 1:20, BD, catalogue number: 555736, Heidelberg, Germany). After a 45-min incubation, stained cells were diluted in MECGM and filtered through 35 µm cell strainer caps of round-bottom tubes (Corning, Corning, U.S.). Ten minutes prior to analysis, cells were further stained with DRAQ5TM (as above) to a final concentration 1 µM. Flow cytometry was performed using a FACSAriaTM III cell sorter (BD Biosciences, Franklin Lakes, U.S.) with a 100 µm nozzle in combination with the FACSDivaTM 6.0 Software. Alternatively, flow cytometry was performed on LMCs to examine FR expression in live, nucleated epithelial cells. LMCs were prepared by staining the cells with CD45-PB (dilution of 1:100, catalogue number: 304021, Biolegend, CA, U.S., catalogue number: 304021) and anti-human FRs α and β (FR-αβ)-PE (dilution of 1:40, catalogue number: 391805, Biolegend, CA, U.S., catalogue number: 391805) for 45 min in the dark on ice. Subsequently, the samples were stained for DAPI (prepared to 1 mg/mL, BD Pharmingen, Berkshire, U.K., catalogue number: 564907) and DRAQ5 (same as above) for 10 min prior to analysis. In this case, flow cytometry was performed on a BD LSRFortessa machine (BD Biosciences, Franklin Lakes, U.S.) in combination with the FACSDivaTM 9.0 Software. In both cases, small volumes of cells from each sample were mixed prior to staining and used as comparisons and single stain controls. Both machines were used with a 100 µm nozzle. Laser settings were adjusted using unstained and single stain controls. Obtained data were analysed using the FlowJo_V10 Software (FlowJo LLC, Ashland, U.S.).

Library preparation, sequencing and data processing

We examined cells that were prepared on three separate 10× genomics chips. Batch 1 contained four freshly dissociated NMCs (NMC1–4) and four samples of freshly collected and isolated LMCs (LMC1–4). Batch 2 contained three samples of NMCs (NMC5–7) and five viably frozen samples of LMCs (LMC5–8), including one sample that had been sequenced in the previous batch (LMC2B). Finally, batch 3 contained a new freshly collected and isolated LMC donor (LMC9) and a repeat sample of a freshly dissociated NMC donor (NMC1B), to examine batch effects (Supplementary Figs. 4a and 5a). Overall, we examined NMCs taken from nulliparous (n = 3) and parous (n = 4) females aged 19–65 years, as well as LMCs from uniparous or multiparous (n = 5 and n = 4, respectively) females aged 27–44 years, with lactation stages ranging from 2 to 12 months (Supplementary Fig. 4a). Library preparation for batch 1 and 2 was performed 10× Chromium single-cell kit using version 3 chemistry according to the instructions in the kit. The libraries were then pooled and sequenced on a NovaSeq6000 S2. Read processing was performed using the 10× Genomics workflow using the Cell Ranger Single-Cell Suite version 3.0.2. Samples were demultiplexed using barcode assignment and unique molecular identifier (UMI) quantification. The reads were aligned to the hg19 reference genome using the pre-built annotation package obtained from the 10× Genomics website (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references). All lanes per sample were processed using the “cell ranger count” function. The output from different lanes was then aggregated using “cellranger aggr” with -normalise set to “none”. Batch 3 was prepared in a similar fashion, however, using version 2 chemistry, sequenced on an Illumina HiSeq 4000 and read processing was done using Cell Ranger Single-Cell Suite version 2.1.1.

Quality control and data pre-processing

Each batch had quality control and pre-processing performed separately using similar pipelines. Barcodes identified as containing low counts of UMIs likely resulting from ambient RNA were removed using the function “emptyDroplets” from the DropletUtils package45. For all batches, barcodes arising from single droplets were then filtered to ensure that cleaned barcodes contained at least 1000 UMIs and that the percentage of mitochondrial genes compared to overall annotated genes were not higher than 1× the median absolute deviation. Following filtering, expression values were normalized and log-transformed using the “computeSumFactors” from scran and “logNormCounts” from the scater46 package.

Once each batch had undergone quality control and normalization the SingleCellExperiment objects were merged, intersecting genes kept and cells were normalized using the function “multiBatchNorm” from the batchelor package in R. Batch effects were removed by adding LMCs to NMCs by batch and performing mutual nearest neighbours (MNN) correction using the “fastMNN” function from the batchelor package. After performing quality control, combining, normalizing and performing MNN correction, we retained a total of high quality 54,714 NMCs and 56,030 LMCs (Fig. 1d and Supplementary Figs. 3 and 4) which had similar UMI, gene and % mitochondrial counts per cell, despite differences in cell source (Supplementary Fig. 3). Variance between the transcriptomic profile of single cells was examined by PC analysis using the “runPCA” function using BiocSingular version 1.6.0.

To visualize the global structure of the data, UMAP graphing (umap version 0.2.7.0) and Louvain clustering using scran47 was performed on the batch corrected data (Fig. 1e). The resulting UMAP contained an even distribution of cells from each batch (Supplementary Fig. 5a) and sample (Supplementary Fig. 6a, b), suggesting that adequate correction was performed. In the case of the LMC and NMC samples that were repeatedly sequenced, we performed PC analysis and found no residual batch effects despite different sequencing days or in the case of LMCs whether the cells were prepared from freshly isolated or viably frozen cells (Supplementary Fig. 5c, d). Overall, 22 clusters were identified, which mapped to 9 major cell types including MY/basal, LP, HR, VA, FB, EN, IM and LC1 and LC2 (Supplementary Fig. 6b, c). We noted that the IM cell cluster appeared to have many subclusters, hence we performed subclustering analysis which revealed 12 clusters that were identified to represent 5 major IM cell subtypes based on gene expression profiles (Fig. 3c). Plots were generated using either ggplot2 or pheatmap packages with custom colours generated by the RcolourBrewer package.

Differential gene expression analysis

DEGs were identified between subsetted groups by first generating pseudobulk samples using “aggregateAcrossCells” function in the scater package. edgeR version 3.34.3 was used to compute DEGs between groups by filtering and scaling sample library size using the “filterByExpr” and “calcNormFactors” functions. Next the common, trended and tagwise negative binomial dispersions of the genes were calculated using “estimateDisp”. Quasi-likelihood negative binomial generalized log-linear models were fitted using “glmQLFit” and “glmQLFTest”. FDR corrections were applied to the resulting p values using the Benjamini–Hochberg method. To visualize the DEGs, volcano plots were generated using the EnhancedVolcano package the FDR corrected p value cut off FDR < 1 × 10−8. Significant genes were ranked according to their FC and the top 5/10% of the positive (upregulated) or bottom 5/10% of the negative (downregulated) genes had gene set enrichment analysis performed on them using the “weight01” algorithm and “fisher” statistic using “runTest” in the “getSigGroups” function from topGO package. “GenTable” was used to generate a table with the top 50 biological process GO terms. Plots of selected GO terms were generated using ggplot2, plotting the resulting p value from the classic Fisher test and gene ratio, which is the number of significant genes for the term divided by the total number of significant genes used in the gene enrichment test.

CellChat analysis

CellChat analysis was performed using the R CellChat package version 1.1.0. First, LMC-derived cells were subsetted from the total cells and annotated according to their cell type: “LC1”, “LC2”, “Tcell”, “Mono”(cytes), “Macro”(phages), “Bcell_plasma” or “Myeloid” cells. A CellChat object was made using the “createCellChat” function. After annotating the object with relevant labels and identifying overexpressed genes, the communication probability was inferred using “computeCommunProb” function. Cell–cell communications for each cell signalling pathway were generated with “computeCommunProbPathway” function. Graphs were generated using the “netVisual_chord_gene” function.

Mammary cell signature score comparisons

Mammary signatures31 from previously published data11 from LP, mature luminal, MY and ST cells were investigated in our data using the “AddModuleScore” function from the Seurat package48 version 4.0.2. For each test, the overall expression of the genes/features from each signature was calculated after subtracting 20 randomly selected genes (from the same bin as the signature features) as a control feature per cell. The resulting signature score is unitless but is indicative of signature enrichment per cell, which was then compared between clusters. Few genes in the published score were47 not found in our data set and these have been reported in Supplementary Dataset 6.

Cell signatures in TCGA tumours

Cell signatures were derived from each cell subtype by using the “findMarkers” from scran package version 1.20.1. Genes that intersected between signature lists of multiple cell types were identified by the “intersect” function and were removed. Final gene lists for each cell type can be found in Supplementary Dataset 10. Samples were downloaded from the TCGA using the package TCGAbiolinks using the “GDCquery” function, where the parameters were set as follows project = “TCGA-BRCA”, platform = “Illumina HiSeq”, experimental.strategy = “RNA-Seq”, sample.type = “c(“Primary Tumor”, “Solid Tissue Normal”). Only tumours that had been annotated for “paper_BRCA_Subtype_PAM50” were kept for analysis (n = 1083). Cell signature scores were then examined in the TCGA data using the “AddModuleScore” function as described above.

SCENIC analysis

For the sake of computational speed, a random sample of 20,000 cells was selected from all luminal epithelial cells. The analysis was performed as previously described21. First, the gene regulatory network was constructed using “grn” function in pyscenic (version 0.11.1) with standard settings. The regulons were then identified using the “ctx” (–mask_dropouts) function. Finally, the area under the curve was computed using the “aucell” function with standard settings. The regulon specificity score was computed per regulon and cell type to receive a cell type-specific ranking of regulons. For the visualization of regulons (FIGREF to the graphs of e.g. SOX10), the top 30 downstream targets (ranked by “importance”, see Supplementary Dataset 4 for all regulons) and top 5 secondary targets were plotted in a directed graph.

Statistics and reproducibility

No statistical methods were used to predetermine sample size. Unless stated otherwise, no data were excluded from the analyses. The experiments were not randomized and investigators were not blinded to allocation during experiments and outcome assessment.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Source link