Hardware-based reduction of cell aggregates
In the present work, we build upon the existing soRT-FDC technology to improve the reliability of measurement, analysis, and sorting of enzymatically dissociated tissues. To develop and showcase the methods, we used dissociated retina cells originating from human retinal organoids (HROs) and mouse eyes (see Fig. 1A). HROs differentiated from a photoreceptor-specific reporter human induced pluripotent stem cell line (hiPSC-Crx mCherry14) were cultured for 125 days. Mice expressing GFP restricted to rod photoreceptors (Nrl-eGFP mouse15) were at postnatal day 4 (P04) when the dissociation protocol was applied (see Materials and Methods). For flow cytometry measurement in RT-FDC or sorting using soRT-FDC, cells were resuspended in a measurement buffer with elevated viscosity (see Materials and Methods), as illustrated in Fig. 1A.


Cell preparation, soRT-FDC setup, and chip design. (A) Retinae from reporter mice (Nrl-eGFP) or human retinal organoids (Crx-mCherry) are dissociated and resuspended in measurement buffer for soRT-FDC. (B) Sketch of the soRT-FDC setup. Two syringe pumps supply a microfluidic chip with sample and sheath fluid. Lasers excite a fluorescence signal, which is measured by avalanche photodetectors, and the cell is imaged by a high-speed camera. A high-power LED illuminates the cell. Interdigital transducers (IDTs) excite surface acoustic waves, which push selected cells towards the target outlet. (C) 2D-CAD design of the entire sorting chip; zoomed-in views show specific parts. The red rectangles indicate filter assemblies, which consist of a cascade of pillars with decreasing spacing. The orange rectangles indicate a unit of several serpentines, which helps to divide aggregates of cells and to increase the spacing between cells. The layout was designed using KLayout 0.25.3.
soRT-FDC is a microfluidic technique that not only captures bright-field images and fluorescence information from single cells at 1,000 cells/s, but also sorts specific cells at 200 cells/s based on the decision of a DNN. In soRT-FDC, suspended cells and sheath fluid are pumped into a microfluidic chip by means of two syringe pumps. The sheath flow focuses the sample flow into a narrow channel. At the end of the channel, the cells are imaged by a high-speed camera and, optionally, fluorescence information is retrieved for up to three wavelengths. After the narrow channel, the microfluidic system widens and divides into paths towards the default and target outlets (see Fig. 1B and Figures S1A and S1B, Supporting Information). The narrow channel is a distinct feature of soRT-FDC, as it deforms cells and thereby provides information about their mechanical properties. Furthermore, cells are aligned in the channel, which simplifies image analysis tasks due to the reduced degrees of freedom.
However, any constriction in a microfluidic design introduces the risk of being blocked by debris or large objects contained in the processed sample. A blocked or partially blocked channel will impair sorting. Moreover, the presence of cell clumps in a dataset can skew analysis results. In samples containing suspension cells, such as blood, this is rarely a problem. In dissociated tissue samples, however, cell aggregates such as doublets are very common. To prevent such objects from reaching highly confined parts of the chip, filter pillars were implemented and their design improved11. We introduced multiple columns of increasingly narrowly spaced filter pillars, allowing interfering objects to be successively retained or broken up. The distance between pillars in the first column is 60 µm (indicated as d1 in Fig. 1C), which catches larger objects (see Fig. 1C). The pillars of the final column are spaced 15 µm apart, which catches smaller objects and also contributes to dividing aggregates into single cells. In the sheath inlet, the first and last filter columns have inner distances of 60 µm and 10 µm, respectively. Separation of cells is further promoted by serpentine channels with a width of 30 µm (see Fig. 1C and Figure S1, Supporting Information). We observed that debris particles tend to get stuck in the curvature of the serpentines. To prevent full blockage of the chip, multiple serpentines were placed in parallel, enabling practically undisturbed measurements or sorting experiments for hours.
The microfluidic design shown in Fig. 1C decreases the probability of large aggregates occurring (see Figure S1, Supporting Information) but does not guarantee a pure single-cell suspension. In the following, a method for the detection of aggregates such as cell doublets is introduced, allowing such events to be excluded during data analysis.
DNN-based detection of cell aggregates
In flow cytometry, cell doublets can skew datasets, and any subsequent analysis requires the exclusion of such events. For example, when a non-fluorescent cell is attached to a fluorescent cell, the event would be assigned to the fluorescence-positive group, while other features such as granularity are affected by both cells. Image flow cytometers like RT-FDC and soRT-FDC provide a bright-field image, so doublets of cells could be identified by the human eye. As datasets typically contain several thousands of images, this task would be extremely labor intensive, creating a need for automation. Therefore, we visually assessed more than 60,000 cells (42,583 single cells and 21,137 cell doublets) from RT-FDC measurements of HROs to create a labelled dataset. To speed up the labelling process, we developed a dedicated software (YouLabel) with a graphical user interface (Figure S2, Supporting Information). Using the generated dataset, we trained supervised machine learning models, more specifically convolutional neural nets (CNNs, Fig. 2A), a type of DNN commonly used for image classification tasks. The input image size for the CNN is 36 × 36 pixels (= 24.5 × 24.5 µm), which is large enough to cover aggregates of cells and cells in close proximity (Fig. 2A). Accidental sorting of multiple cells and erroneous assignment of fluorescence intensities is a problem not only when cells are directly attached to each other but also when they travel at a close distance (see Figure S3 A, Supporting Information). To train the CNN to detect such events, they were assigned to the doublet class during the manual labeling process.


CNN for detection of cell aggregates. (A) Bright-field images of single cells and cell aggregates of human retinal organoid cells. Images are used to train a CNN for discrimination between single cells and cell aggregates. (B) Confusion matrix obtained when applying the CNN to the validation set. The validation accuracy is 80.3%. (C) Probability distribution obtained when applying the model to a testing dataset of dissociated Nrl-eGFP retina. Despite the different origin of the cells, the model is able to distinguish between single cells (left, low probability) and aggregated cells (right, high probability).
In order to span a wide variety of phenotypes, we used images of dissociated HRO cultures16. Based on the resulting dataset, we trained a CNN (Fig. 2A) to identify doublets; the resulting model (CNNdoublet) reaches a validation accuracy of 80.3% (Fig. 2B). To test the applicability of the model to new data, we recorded a dataset of murine Nrl-eGFP cells. In Nrl-eGFP transgenic mice, GFP expression is restricted to rod photoreceptors17. Each event was forwarded through CNNdoublet to obtain the probability that the event is a doublet (pdoublet), and the histogram in Fig. 2C shows the resulting distribution of probabilities. The corresponding testing accuracy is 97.4% (see confusion matrix in Figure S3 B, Supporting Information). Interestingly, the model confidently assigns single cells and doublets to the correct class, as shown by example images and the confusion matrix (Figure S3 B, Supporting Information). The CNN classifies an event as a doublet if a second cell is closer than approximately 15 µm (Figure S3 C, Supporting Information). The model also delivers sensible results for a measurement of whole blood (Figure S3 D, Supporting Information, data taken from12), indicating that the model could serve as a general-purpose doublet detection algorithm.
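The layer configuration of CNNdoublet is not spelled out in this section. Purely as an illustration of how a 36 × 36 bright-field image can be mapped to a doublet probability, the following toy forward pass (all function names and weights are hypothetical, not the trained model) combines one convolutional layer, global average pooling, and a softmax over the two classes:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

def conv_valid(img, kernels, bias):
    """Valid 2D convolution of one grayscale image with a stack of kernels."""
    kh, kw, nf = kernels.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1, nf))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.tensordot(img[i:i + kh, j:j + kw], kernels,
                                     axes=([0, 1], [0, 1]))
    return relu(out + bias)

def p_doublet(img, kernels, bias, w_out, b_out):
    """Toy CNN forward pass: conv layer, global average pooling,
    dense softmax over [single, doublet]. Weights would normally be learned."""
    feat = conv_valid(img / 255.0, kernels, bias).mean(axis=(0, 1))
    logits = feat @ w_out + b_out
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return p[1]  # probability of the doublet class
```

A trained network learns the kernels and dense weights from the labelled single-cell/doublet images; the sketch only shows the data flow.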
CNN-based detection of cell aggregates is a helpful tool for analyzing RT-DC or RT-FDC data; it can be applied to many datasets and comes at low computational cost. Forwarding a single image through CNNdoublet requires only 1.4 ms (Intel Core i7 3930K @ 3.2 GHz). Processing 10,000 images at once (batch processing) reduces the inference time to 0.75 ms per image. While these times are sufficient for processing large datasets, sorting demands an inference time below 250 µs, calling for faster doublet detection methods.
Detection and separation of cell aggregates for single cell sorting
In RT-DC, RT-FDC, and soRT-FDC, a real-time contour detection algorithm evaluates acquired images using efficient OpenCV implementations. By counting the number of contours in an image, we implemented a switch that suppresses sorting if more than n = 1 contour is detected (see Fig. 3A). The additional contour-counting step comes at no additional computational cost. To reduce the chance of having multiple cells within the ROI, the cell concentration could be decreased, but since that would also decrease the frequency of measurement and sorting, an optimal cell concentration needs to be determined.


Detection and separation of cell aggregates. (A) Examples of images captured during sorting. A single contour is detected in the upper image, while three contours are detected in the lower image. Sorting trigger is omitted when more than one contour is detected. Scale bar: 20 µm. (B) The histogram shows the probability to have n cells in a unit volume. The chance of having more than one cell in the sorting region during a sorting pulse is 26.4% (blue) and 6.2% (red) for an initial cell concentration of 50 million cells/ml and 20 million cells/ml, respectively. (C) Plot shows the measurement time and number of captured events of a measurement of Nrl-eGFP mice retina cells. Color code indicates the time difference between two events. While most of the time, events are captured with a time difference of > 0.02 s, during an avalanche, each captured frame contains cells, resulting in a time difference of approximately 0.00033 s = 0.33 ms. Scale bar: 10 µm.
The duration of a standing surface acoustic wave (SSAW) pulse is 2 ms. No additional cell should enter the SSAW region during that time to avoid accidental sorting of wrong cells. For the common flow rate of 0.04 µl/s, a volume of \(V_{2\,\mathrm{ms}}=0.04\,\frac{\mathrm{\mu l}}{\mathrm{s}}\cdot 2\,\mathrm{ms}=0.08\,\mathrm{nl}\) passes through the chip during an SSAW pulse. One cell contained in \(V_{2\,\mathrm{ms}}\) corresponds to a concentration of 12.5 million cells/ml. To reach that concentration, an initial sample concentration of c1 = 50 million cells/ml has to be applied, since the sample flow (\(Q_{sample}=0.01\,\mathrm{\mu l/s}\)) is diluted by the sheath fluid (\(Q_{sheath}=0.03\,\mathrm{\mu l/s}\)). As a result, \(V_{2\,\mathrm{ms}}\) contains on average a single cell, but the presence of a cell in a volume element is a random process and individual cells arrive independently. Therefore, the number of cells \(n\) in a volume element \(V_{2\,\mathrm{ms}}\) can be described by a Poisson distribution:
$$p\left(n\right)=\frac{\mu^{n}e^{-\mu}}{n!},$$
where \(\mu\) is the expected (average) number of cells in the volume element \(V_{2\,\mathrm{ms}}\). Figure 3B shows the Poisson distribution for \(\mu=1\) (blue, corresponding to c1 = 50 million cells/ml). The area under the curve (pale blue) gives the probability that more than one cell is contained in \(V_{2\,\mathrm{ms}}\), which is p1 = 26.4%. For sorting experiments, we reduced the concentration to c2 = 20 million cells/ml, which corresponds to an average of \(\mu=0.4\) cells and a probability of p2 = 6.2% of finding multiple cells in \(V_{2\,\mathrm{ms}}\) (see red plot in Fig. 3B and pale red area under the curve).
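The quoted probabilities p1 and p2 follow directly from the Poisson formula; a quick numerical check (function names are ours):

```python
from math import exp, factorial

def poisson_pmf(n, mu):
    """Probability of exactly n cells in the volume element V_2ms."""
    return mu**n * exp(-mu) / factorial(n)

def p_multiple(mu):
    """Probability of more than one cell in the volume element."""
    return 1.0 - poisson_pmf(0, mu) - poisson_pmf(1, mu)

# mu = 1.0 corresponds to 50 million cells/ml, mu = 0.4 to 20 million cells/ml
print(round(100 * p_multiple(1.0), 1))  # 26.4
print(round(100 * p_multiple(0.4), 1))  # 6.2
```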
The underlying assumption of the Poisson distribution is that cells travel independently, which is not entirely true, as they can stick together and form aggregates18. As a result, avalanches of cells occasionally traverse the channel (see Fig. 3C). Figure 3C shows the measurement time versus the event number, and the color code indicates the time difference between two captured events. Two steep increases of the curve indicate occasions where avalanches of cells were flushed through the channel. During these avalanches, each captured image contained an object, resulting in an average time difference of \(\Delta T=\frac{1}{3000\,\mathrm{fps}}=0.33\,\mathrm{ms}\) (purple regions in the plot). For the rest of the plot, the event number rises steadily and the time difference between captured events is on average 0.09 s (yellow regions of the line), corresponding to a somewhat lower rate than expected. This is likely caused by cell sedimentation over time. Figure 3C suggests that avalanches of cells can be identified based on the characteristic time difference between captured events of \(\Delta T=\frac{1}{\mathrm{fps}}\). Therefore, we implemented a timer, allowing the sorting pulse to be suppressed if \(\Delta T\) is below a set threshold. In practice, we found that a \(\Delta T\) threshold of 0.38 ms results in reliable omission of sorting during cell avalanches. The image insets in Fig. 3C deliberately show only events with multiple cells in an image. While such events occur more often during avalanches, the majority of the images still shows a single cell. This fact highlights the advantage of time-delay analysis over contour counting. All methods were implemented in the C++ based sorting software.
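The timer logic can be sketched as follows (a Python illustration of what the text describes; the actual implementation is part of the C++ sorting software, and the class and method names here are hypothetical):

```python
class AvalancheGuard:
    """Suppress sorting pulses during cell avalanches.

    Events arriving faster than dt_min seconds apart are treated as part
    of an avalanche and are not sorted.
    """

    def __init__(self, dt_min=0.38e-3):  # 0.38 ms threshold from the text
        self.dt_min = dt_min
        self.last_t = None

    def allow_sorting(self, t):
        """Return True if the event at time t (seconds) may trigger a pulse."""
        ok = self.last_t is None or (t - self.last_t) >= self.dt_min
        self.last_t = t
        return ok
```

During an avalanche at 3,000 fps, consecutive events arrive 0.33 ms apart, below the 0.38 ms threshold, so their sorting pulses are suppressed.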
DNN architecture for optimized CPU utilization
Intelligent image-activated cell sorting sorts cells based on the decision of a trained DNN. While a CNN would be the preferred architecture for image classification tasks, CNNs usually require more computational time and are thus too slow for rapid cell sorting on commonly used hardware. As an alternative to a CNN, Ref.12 used a multilayer perceptron (MLP) due to its considerably better computational efficiency. Originally, the MLP was optimized to provide an inference time \(t<200\,\mathrm{\mu s}\), allowing real-time inference to trigger a cell sorting mechanism while conserving a high classification accuracy for the distinction between different blood cell types. However, CPU specifications were not considered in the choice of the MLP design. Modern CPUs provide methods for parallel computation (Hyper-Threading, Intel Advanced Vector Extensions), allowing the complexity of an MLP to be increased without changing its inference time. We therefore chose to screen various MLP architectures, first for computational speed and, second, for image classification performance (see Sect. 2.5). The screening was carried out on the same PC that operates the soRT-FDC setup (Intel Core i7 3930K @ 3.2 GHz).
The MLP base architecture is designed as follows. The input layer of the MLP model accepts grayscale values of an 8-bit raw image divided by 255. Hidden layers then transform the input information via a set of weights and biases and an activation function (rectified linear unit, ReLU), as indicated in Fig. 4A. The complexity of an MLP depends on its number of parameters, which increases with the number of layers and nodes in the neural net. Therefore, we built MLPs with \(k\) (\(1\le k\le 4\)) hidden layers and iterated through a set of node counts \(n_i\) (Fig. 4A). The number of nodes \(n_i\) of each layer was set to a multiple of 8 between 8 and 240, and for every possible combination, a model was built to determine the inference time and the number of trainable parameters (\(N\)). To limit the computational resources, we omitted models containing \(N>80{,}000\) parameters from the screening, resulting in a total number of 396,521 models (30; 671; 16,527; and 379,293 models for \(k\)=1, 2, 3, and 4, respectively), whose results are shown in Fig. 4B (red, orange, blue, and magenta indicate models for \(k\)=1, 2, 3, and 4, respectively).
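The enumeration of candidate architectures can be sketched as below. The input dimensionality of the MLP is not stated in this section, so `n_in` is left as a parameter; with an arbitrary `n_in` the model counts will not exactly reproduce the numbers above, and the function names are ours:

```python
from itertools import product

def mlp_params(hidden, n_in, n_out=2):
    """Trainable parameters of a dense MLP: weights plus biases per layer."""
    sizes = [n_in, *hidden, n_out]
    return sum((a + 1) * b for a, b in zip(sizes, sizes[1:]))

def screen(k, n_in, n_max=80_000, nodes=range(8, 248, 8)):
    """All k-hidden-layer architectures (node counts a multiple of 8,
    between 8 and 240) with at most n_max trainable parameters."""
    return [h for h in product(nodes, repeat=k)
            if mlp_params(h, n_in) <= n_max]
```

Each surviving architecture would then be built once to time its inference on the target CPU, which is the quantity the screening actually selects on.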


MLP screening. (A) Sketch shows the general design of multilayer perceptrons. The input layer contains all pixels of the provided image. Each of the following \(k\) hidden layers contains \(n_i\) (\(1\le i\le k\)) nodes. Each node represents a linear combination of the input values, which is modulated by an activation function (ReLU for the hidden layers and Softmax for the output layer). The output layer returns probabilities for each class of the classification task. (B) The scatterplot shows the inference time and number of trainable parameters of 396,521 different MLP architectures with \(k\)=1 (red), \(k\)=2 (orange), \(k\)=3 (blue), and \(k\)=4 (magenta). Chosen models with identical inference time but more trainable parameters compared to MLPNawaz are indicated by MLP1, MLP2, and MLP3.
As expected, MLPs with more layers but the same number of parameters have a higher inference time due to the reduced potential for parallel computation. The MLP architecture suggested by Nawaz et al.12 is included in our screening and results in an inference time of \(t_{Nawaz}=174\,\mathrm{\mu s}\) (indicated as MLPNawaz in Fig. 4B). Interestingly, no 4-layer MLP reached an inference time \(\le t_{Nawaz}\). Multiple models with k = 1, 2, and 3 comprise more trainable parameters while having an inference time close to \(t_{Nawaz}\). We searched for models with the maximum number of parameters in the range \(170\,\mathrm{\mu s}\le t\le 175\,\mathrm{\mu s}\). The identified models with \(k\)=1, 2, and 3 layers are indicated in Fig. 4B by MLP1, MLP2, and MLP3, respectively. The models MLP1, MLP2, and MLP3 contain 2.7 to 9.1 times more trainable parameters than MLPNawaz, and the total number of parameters for each model is shown in Table 1. The screening is independent of actual classification performance but finds models with optimized CPU utilization. In the following, these models are employed to solve an image classification problem to assess the resulting accuracy levels.
DNN classifier for photoreceptor detection and sorting
We performed seven independent experiments using RT-FDC to acquire data from dissociated retinae of Nrl-eGFP mice at postnatal day 4 (P04) ± 1 day. To that end, we used the Nrl-eGFP mouse line, which expresses eGFP under the control of the Nrl promoter, labelling rod photoreceptors from an early stage onwards. Figure 5A shows an example measurement, with gates indicating certain subpopulations of cells. In a size region between 20 and 35 µm2, there are cells with various fluorescence expression levels. To minimize wrongly labelled cells in the dataset, we employed CNNdoublet to remove all events with pdoublet > 0.3, excluding doublets and cells in too close proximity. Furthermore, we used a conservative gating strategy by only keeping cells with very low and very high fluorescence for the classes of small GFP– and small GFP+ cells, respectively (see gray and green rectangles in Fig. 5A). Debris (area < 20 µm2) and objects larger than 35 µm2 were not considered for the deep learning image classification task, as they can be gated out based on their size during sorting. The challenging classification task to be solved using DNNs is to distinguish small GFP+ (green in Fig. 5A) from small GFP– cells (gray in Fig. 5A).


Dataset assembly. (A) The scatterplot shows a measurement of dissociated retina (Nrl-eGFP) in soRT-FDC. Axes show the cell size (area in µm2) and the fluorescence expression of Nrl-eGFP. Red, green and gray rectangles indicate regions in the plot which correspond to debris, small GFP+, and small GFP– cells, respectively. Images show examples of the appearance of cells at different locations in the scatterplot. The color code indicates the density of data points. Scale bars: 10 µm. (B) Images show three different measurements with various brightness levels. To evaluate the background brightness and image noise, a region above the channel was used (red rectangle). Scale bars: 10 µm. (C) Histogram shows the absolute tilt of contours of small GFP+ events (same measurement as shown in A). The red line indicates the median tilt at 13°. Image insets show exemplary phenotypes of cells at low (left) and high (right) tilt. While a low tilt indicates a good alignment with the flow, a tilt of 90° shows a cell aligned orthogonal to the flow direction. Scale bars: 10 µm.
In the current experimental setup, the focus is adjusted manually, resulting in slight differences between sessions and even focus drifts during long sorting procedures. To include phenotypes from different focus positions in the dataset, the focus was manually altered during acquisition of the training dataset. The alteration was kept within a range that would in practice be used for sorting or measurement. For acquisition of the validation dataset, the focus was left at a fixed position. Table 2 shows the number of events captured for small GFP– and small GFP+ cells.
As focus alteration increases the variety of phenotypes contained in the training dataset, we would like to introduce the phrase “experimental data augmentation”. In contrast, “mathematical data augmentation” refers to computational operations applied to the image data after the measurement. Mathematical data augmentation allows the image phenotype to be modified during DNN training and was shown to be an effective tool to improve the accuracy and robustness of DNNs19. A strong modification of the phenotype may enable the DNN to become robust to such alterations, but also makes convergence more difficult. Therefore, data augmentation should ideally modify the images within a range that could occur in practice. In the following, image augmentation operations are introduced and assessed to identify sensible parameter settings. Each augmentation option is implemented in AIDeveloper, a software for training DNNs for image classification without the need for programming. All model training in this study was performed using AIDeveloper 0.2.320.
In the current soRT-FDC setup, brightness varies between experiments. Alteration of brightness can be performed computationally. To get an intuition for the range of brightness levels across experiments, we assessed pixels at the upper border (10 × 255 pixels, see red rectangles in Fig. 5B) of one image from each of 29 measurements (Figure S4 A, Supporting Information). This region is located outside the measurement channel and therefore provides information about the background brightness. Based on these pixels, we also computed the standard deviation to estimate the image noise. Furthermore, we assessed the alignment of cells in the channel and found an average tilt of 11° (see Fig. 5C). The typical ranges of brightness difference, image noise, and rotation were used to tune image augmentation methods that slightly alter images during model training. Moreover, we employed random vertical flipping and random shifting (left–right and up-down) of the cropped images by one pixel during model training. For more details on each data augmentation method, please see Materials and Methods.
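A sketch of such an augmentation step using the parameter ranges reported in this work (additive brightness ± 12, multiplicative brightness 0.6…1.3, Gaussian noise with standard deviation 3.0, one-pixel shifts, random flipping); rotation is omitted for brevity, the function names are ours, and AIDeveloper's own implementation may differ in detail:

```python
import numpy as np

rng = np.random.default_rng()

def augment(img, add_b=12, mult_lo=0.6, mult_hi=1.3, noise_sd=3.0):
    """Randomly perturb a 2D uint8 image within experimentally observed ranges."""
    out = img.astype(np.float32)
    out += rng.uniform(-add_b, add_b)              # additive brightness
    out *= rng.uniform(mult_lo, mult_hi)           # multiplicative brightness
    out += rng.normal(0.0, noise_sd, out.shape)    # Gaussian pixel noise
    if rng.random() < 0.5:                         # random flip
        out = out[::-1, :]
    # random +/-1 pixel shift, left-right and up-down
    out = np.roll(out, int(rng.integers(-1, 2)), axis=1)
    out = np.roll(out, int(rng.integers(-1, 2)), axis=0)
    return np.clip(out, 0, 255).astype(np.uint8)
```

Applying a fresh random perturbation in every training iteration means the network effectively never sees the exact same image twice.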
Learning rate screening
The learning rate (\(l\)) is one of the most important hyper-parameters when training DNNs, as it controls how strongly the weights (\(W\)) of a model are adjusted in each training iteration. To discover a sensible value for \(l\), a screening over a range of learning rates can be performed21. To provide easy access to this method, we implemented it in AIDeveloper. Graphical software elements guide the user through the analysis and tooltip annotations offer basic information (see Figure S5, Supporting Information). To our knowledge, this is the first time the learning rate screening method has been implemented in a software with a graphical user interface for easy accessibility.
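The core of such a screening is to train briefly while exponentially increasing the learning rate and to record the loss at each step; a sketch is given below. The selection heuristic in `suggest_lr` (steepest descent of the loss, divided by a safety margin) is an assumption for illustration, not necessarily the rule AIDeveloper applies:

```python
import numpy as np

def lr_schedule(n_steps, lr_min=1e-7, lr_max=1e-1):
    """Exponentially increasing learning rates for a range test."""
    return lr_min * (lr_max / lr_min) ** (np.arange(n_steps) / (n_steps - 1))

def suggest_lr(lrs, losses, frac=10.0):
    """Learning rate at the steepest loss descent, divided by a safety margin.

    This heuristic is an illustrative assumption.
    """
    grads = np.gradient(np.asarray(losses, dtype=float), np.log(np.asarray(lrs)))
    return lrs[int(np.argmin(grads))] / frac
```

In practice, each learning rate in the schedule is applied for one training batch and the recorded loss curve is inspected for the region of steepest decrease.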
MLP training
During acquisition of the training and validation datasets, the number of available cells differed between samples, so some measurements contained more events than others. To avoid overfitting of the model to the phenotype of the measurement with the most events, we performed random sampling to achieve an equal contribution from each measurement. In each training iteration, a different batch of training images was sampled from each measurement. Using the same routine, the validation dataset was assembled before the first training iteration and kept constant throughout all training iterations.
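The per-measurement sampling can be sketched as follows (function name and batch size are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def balanced_batch(measurements, per_measurement=32):
    """Draw an equal number of images from each measurement so that
    measurements with many events do not dominate a training batch.

    `measurements` is a list of image stacks of shape (n_i, 36, 36);
    sampling is with replacement.
    """
    picks = [m[rng.integers(0, len(m), per_measurement)] for m in measurements]
    return np.concatenate(picks, axis=0)
```

Because every measurement contributes the same number of images per batch, a measurement with 10× more events does not pull the model 10× harder towards its phenotype.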
Training and validation data were loaded into AIDeveloper and the following data augmentation parameters were set: rotation: ± 10°, left–right shift: ± 1 pixel, up-down shift: ± 1 pixel, additive brightness: ± 12, multiplicative brightness: 0.6…1.3, standard deviation of Gaussian noise: 3.0, and random vertical flipping. A learning rate screening was performed (see Fig. 6A), taking the image augmentation parameters into account. For all MLP models, we found a steep decrease of the loss at approximately \(l=10^{-5}\), which is 100 times smaller than the default learning rate (\(l=10^{-3}\)), as shown in Fig. 6A and Figure S5 B, Supporting Information. Using the learning rate \(l=10^{-5}\), the models MLPNawaz, MLP1, MLP2, and MLP3 were trained for 30,000 training iterations (see Fig. 6B, Figure S6 F, Supporting Information). Table 3 shows the maximum validation accuracy for MLP1, MLP2, MLP3, and MLPNawaz, indicating that the architecture of MLP2 is the best choice for this classification task. To obtain a benchmark for the classification accuracy without restriction of the inference time, we trained two different convolutional neural net architectures, containing two (CNNLeNet) and four (CNNNitta) convolutional layers (see Figure S6 D, E, Supporting Information). Interestingly, CNNLeNet performs worse than all MLPs (see Figure S6 F, Supporting Information). Only CNNNitta was able to outperform the MLPs. For comparison, we also trained each model using the default learning rate (\(10^{-3}\)), but the overall performance was lower for each model (see Figure S6 F).


MLP training and assessment. (A) Plot shows a learning rate screening for all MLP architectures. During screening, MLPs are trained using the available training data and data augmentation methods are applied. The learning rate screening was performed using AIDeveloper 0.2.3. (B) Plot shows the validation accuracy during training of four MLPs to distinguish GFP– and GFP+ cells. For a smooth appearance, each line shows the rolling median (window size = 50). (C) Green and gray histograms show the probabilities returned by MLP2 for each event of the GFP+ and GFP– classes of the validation set. (D) Scatterplot shows the concentration and yield of GFP+ rod photoreceptors when applying MLP2 to the validation set using different threshold values P(GFP+)thresh for prediction. (E) Confusion matrices when using a threshold P(GFP+)thresh of 0.5 and 0.67. The red rectangle indicates the events that are predicted to be GFP+. Those events would be sorted during a sorting experiment, resulting in a particular concentration of GFP+ cells (cGFP+) in the sorted sample.
When applying MLP2 to an image, the model returns the probability that the image contains a small GFP+ cell: P(GFP+). The histogram in Fig. 6C shows P(GFP+) for all events of the validation set. As expected, events that are actually GFP+ cells return high values of P(GFP+) (green histogram), while GFP– cells tend to return lower P(GFP+) values (gray histogram). However, there is also a considerable overlap between the distributions, which explains the imperfect classification performance of the model. Typically, a threshold of P(GFP+)thresh = 0.5 is used to assign events to classes. By increasing this threshold, only cells for which the model returns a sufficiently high P(GFP+) are predicted to be GFP+. Increasing P(GFP+)thresh increases the precision (see Materials and Methods), which in practice corresponds to a higher concentration of GFP+ cells in the target sample after sorting. At the same time, increasing the threshold reduces the sensitivity of the model, which in practice means a reduced yield of GFP+ cells after sorting. The evolution of concentration and yield for different threshold values is plotted in Fig. 6D.
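The threshold sweep behind this trade-off can be sketched as follows, with precision corresponding to the concentration of GFP+ cells in the sorted sample and recall to the yield (function names are ours):

```python
import numpy as np

def precision_yield(p_gfp, is_gfp, thresh):
    """Concentration (precision) and yield (recall) of GFP+ cells when
    sorting every event with P(GFP+) >= thresh."""
    p_gfp = np.asarray(p_gfp, dtype=float)
    is_gfp = np.asarray(is_gfp, dtype=bool)
    sorted_mask = p_gfp >= thresh          # events that would be sorted
    tp = np.count_nonzero(sorted_mask & is_gfp)
    precision = tp / max(np.count_nonzero(sorted_mask), 1)
    recall = tp / max(np.count_nonzero(is_gfp), 1)
    return precision, recall
```

Evaluating this function over a grid of thresholds on the validation set reproduces the kind of concentration-versus-yield curve shown in Fig. 6D.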
For one photoreceptor transplantation experiment, 100,000 cells are required, and the sorting duration should be limited to one hour to assure high viability of the cells22. The calculations above showed that on average 0.4 cells pass the camera within 2 ms (for a sample concentration of 20 million cells/ml). As a result, on average one cell is captured every 5 ms, which corresponds to a measurement frequency of 200 cells/s. As approximately 50% of the cells are GFP+, 100 cells/s could potentially be sorted. Due to the presence of cell aggregates, a more realistic sorting rate is 75 cells/s. Based on these boundary conditions, the minimum yield can be computed as follows:
$$yield_{min}=\frac{100{,}000\ \mathrm{cells}}{75\,\frac{\mathrm{cells}}{\mathrm{s}}\cdot 3600\,\mathrm{s}}=37.0\,\%\approx 40\,\%$$
The yield of 40% is reached for a P(GFP+)thresh of 0.67 (marked in the plot), which corresponds to a GFP+ cell concentration of 77%. Figure 6E shows confusion matrices for P(GFP+)thresh = 0.5 and P(GFP+)thresh = 0.67.
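The minimum-yield estimate generalizes to other cell requirements and sorting rates; a small helper (names are ours):

```python
def minimum_yield(n_cells_needed=100_000, sort_rate=75.0, max_seconds=3600.0):
    """Fraction of target cells that must be recovered to collect
    n_cells_needed within the allowed sorting time."""
    return n_cells_needed / (sort_rate * max_seconds)

print(round(100 * minimum_yield(), 1))  # 37.0
```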
Photoreceptor sorting and transplantation
To verify the working principle, we employed the methods introduced in this work for image-based sorting of rod photoreceptors from dissociated Nrl-eGFP mouse retina. After sorting, the initial sample and the sorted target sample were both measured using RT-FDC to evaluate the number of fluorescent cells. The color code of the scatter plots in Fig. 7 illustrates the event density, which suggests that the maximum density is located at 300 and 4,000 a.u. of fluorescence for the initial and target sample, respectively. An elevated fluorescence of cells in the target sample is also confirmed by the medians of the fluorescence intensity (MInit = 728 and MTarg = 1,684 in Fig. 7A). To evaluate the number of GFP+ and GFP– events, a gate was chosen manually (solid green line in Fig. 7A). The percentage of events within that gate is \(c_{GFP+}^{Init}=\frac{3957}{7428}\cdot 100\,\%=53.2\,\%\) for the initial sample and \(c_{GFP+}^{Targ}=\frac{1516}{2180}\cdot 100\,\%=69.5\,\%\) for the target sample.


Label-free photoreceptor sorting of dissociated Nrl-eGFP mouse retina cells & transplantation. (A) Scatterplots show RT-FDC measurements of the initial sample and the target sample after label-free sorting. The axes show the area and fluorescence expression and the color code represents the density of events. The median fluorescence expressions are given as MInit (= 728) and MTarg (= 1,684). The gating strategy for selection of GFP+ events is indicated by a green rectangle, resulting in 53.2% and 69.5% GFP+ cells in the initial and target sample, respectively. (B) Immunofluorescence images showing sorted GFP+ cells in the murine SRS, two weeks after transplantation. GFP+ cell bodies and segments can be found in the host ONL (magnification), likely as a result of cytoplasmic material transfer from donor to host cells. SRS subretinal space, ONL outer nuclear layer, INL inner nuclear layer.
Cells contained in the target fraction were washed and subretinally transplanted into adult female C57Bl/6JRj mice. Two weeks after transplantation, GFP+ signal marking transplanted cells could be detected in the subretinal space of recipient mice (Fig. 7B), as well as in photoreceptor cell bodies within the host ONL (Fig. 7B, inset), the latter likely as a result of material transfer from donor to host cells23. Although control eyes, in which similar numbers of unsorted cells were transplanted, contained more GFP+ cells at analysis (Figure S9, Supplementary Information), this is a proof of concept that cells enriched via soRT-FDC can be used for transplantation and survive in the murine retina, making soRT-FDC a useful method to provide cells for downstream applications.

