Bayesian-based decipherment of in-depth information in bacterial chemical sensing beyond pleasant/unpleasant responses

Experimental design

To analyse the collective responses of E. coli cells with reasonable accuracy, an experimental setup was constructed to monitor the responses of multiple cells stimulated simultaneously using a tethered cell assay (ref. 10), in which the rotation of a single flagellar motor is monitored as the rotation of the cell body (Fig. 1b). Under optimised conditions, more than 100 of the approximately 500 E. coli cells observed in each microscopic field were found to be rotating. Rotating cells are considered to be attached to the surface through only one flagellum, so this population was extracted and analysed. The rotational directions of the cells were analysed concurrently at 10-ms intervals with custom-made image and motion analysis programs (Fig. S2). Care was taken to reduce the influence of the viscous drag force of the solution by minimising the flow rate (see the supplementary information [SI] and Fig. S3 for details). Using this setup, the responses of many cells to two standard amino acid attractants, l-glutamate (l-Glu) and l-asparagine (l-Asn), were measured. When exposed to either attractant, each cell rotated exclusively CCW and, within a relatively short time, resumed its pre-stimulation behaviour, i.e. alternating between CCW and CW rotation (Fig. 1c). However, the responses of individual cells, even to the same attractant, differed substantially in the time course of stimulation and adaptation. Nevertheless, ensemble averaging of a large set of ‘noisy’ data revealed essentially similar, but apparently distinct, traces for the two attractants (Fig. 1d).
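The ensemble averaging described above can be sketched as follows. This is a minimal illustration, not the authors' custom analysis programs: the 0/1 encoding of rotational direction, the function names and the window length are all assumptions made for the example.

```python
from typing import List

CW, CCW = 1, 0  # hypothetical encoding of the rotational direction in each 10-ms frame


def cw_bias(directions: List[int], window: int) -> List[float]:
    """Fraction of frames spent rotating CW in consecutive, non-overlapping windows."""
    return [
        sum(directions[i:i + window]) / window
        for i in range(0, len(directions) - window + 1, window)
    ]


def ensemble_average(traces: List[List[float]]) -> List[float]:
    """Mean CW bias across cells in each time bin (all traces must have equal length)."""
    n_cells = len(traces)
    return [sum(values) / n_cells for values in zip(*traces)]


# Two toy cells, 8 frames each, binned in windows of 4 frames (40 ms at 10-ms sampling).
cell_a = [CW, CCW, CW, CCW, CCW, CCW, CCW, CCW]
cell_b = [CW, CW, CW, CCW, CCW, CCW, CCW, CW]
per_cell = [cw_bias(cell_a, 4), cw_bias(cell_b, 4)]
mean_trace = ensemble_average(per_cell)
```

With more than 100 cells per field, the same averaging smooths out the cell-to-cell variability that dominates individual traces.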

Comparison of ensemble averaged responses of E. coli cells to different attractants

Further analyses of the collective responses of E. coli cells to different attractants were conducted to determine whether the attractant responses were chemical-specific. First, the attractant responses (time courses of mean CW bias values) of more than 100 cells (hereinafter referred to as CW bias traces) to various concentrations of l-Glu and l-Asn were measured (Fig. 2a). Although the strength of stimulation differed between the two attractants, the higher the amino acid concentration, the longer the attractant response (CW bias ≈ 0) persisted (Fig. 2a). Next, for mathematical treatment, the intracellular signalling system was treated as a black box and each CW bias trace as a geometric figure that can presumably be described by a simple template. Each time course of CW bias was fitted with a combination of six straight lines (Figs. 2b and S4, L1–L6), where L1 represents the pre-stimulation state, L2 the stimulated state, L3 to L5 the recovery processes and L6 the post-adaptation state. The trace was thereby converted to a vector of 15 index values \(\{y_{1}, \ldots, y_{15}\}\) (see SI for details) to collect training data of responses to each chemical, where y1 is the duration of L2, and y2 and y3 are the amplitude and slope of L3, respectively. In spite of the large deviations among cells, which presumably result from preparation-to-preparation variations, the distributions of these index values differed between the attractant species, which are shown as differently coloured data points in Fig. 2c. These results raised the possibility that information identifying the input stimulation (chemical species and concentration) could be extracted by analysing the ensemble-averaged time course of CW bias.
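As a rough illustration of how a trace can be reduced to a few index values, the sketch below fits a much simpler template than the six-line one used in the paper: a baseline, a plateau at CW bias 0 and a single linear recovery. The three-segment shape, the grid search and all names and numbers are assumptions for the example; the actual 15-parameter fit is described only in the SI.

```python
from typing import List, Sequence, Tuple


def template(t: float, baseline: float, t_stim: float,
             duration: float, recovery: float) -> float:
    """Reduced template: baseline, then 0 for `duration`, then a linear ramp back to baseline."""
    if t < t_stim:
        return baseline
    if t < t_stim + duration:
        return 0.0
    ramp = (t - t_stim - duration) / recovery
    return baseline * min(ramp, 1.0)


def fit_trace(times: Sequence[float], bias: Sequence[float], baseline: float,
              t_stim: float, durations: Sequence[float],
              recoveries: Sequence[float]) -> Tuple[float, float]:
    """Grid search for the (duration, recovery) pair with the smallest squared error."""
    best = None
    for d in durations:
        for r in recoveries:
            err = sum((template(t, baseline, t_stim, d, r) - y) ** 2
                      for t, y in zip(times, bias))
            if best is None or err < best[0]:
                best = (err, d, r)
    return best[1], best[2]


# Synthetic trace generated from the template itself (plateau 100 s, recovery 60 s).
times: List[float] = [float(t) for t in range(0, 300, 5)]
bias = [template(t, 0.4, 30.0, 100.0, 60.0) for t in times]
grid = [float(v) for v in range(20, 201, 20)]
duration, recovery = fit_trace(times, bias, 0.4, 30.0, grid, grid)
```

Here `duration` plays the role of y1 (the duration of the stimulated state); the full template yields 15 such values per trace.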

Figure 2

Decipherment of two input chemicals from output responses of E. coli cells. (a) Typical responses to l-Glu (left column) and l-Asn (right column). Output traces of CW bias show a common profile consisting of a strong attractant response immediately after chemical stimulation (CW bias ≈ 0.0) and a recovery phase returning to near the initial CW bias after sufficient time. Each graph is coloured according to chemical species (purple, l-Glu; green, l-Asn). The concentrations of chemicals increase from top to bottom. (b) Construction of characteristic vectors. Individual vectors representing each CW bias trace were calculated by matching a geometric template consisting of 6 lines (L1–L6). The template shape (both amplitudes and durations of lines) was modified to fit CW bias traces and 15 arbitrary parameters were obtained (see SI for details). As examples, parameters y1, y2 and y4 are shown, where y1 is the duration of L2, and y2 and y4 are the amplitude and duration of L3, respectively. (c) Concentration dependencies of 2 indexes, \((y_{1}, y_{4})\), of characteristic vectors (all in Fig. S5). The values of \((y_{1}, y_{4})\) (positions are indicated in b) of characteristic vectors obtained as response activities to l-Glu (purple) and l-Asn (green) are plotted. Each graph is coloured according to chemical species as in (a). Plot markers of ‘o’ indicate data used to successfully identify the blind samples, whereas ‘x’ indicates failure. Solid lines show model functions representing concentration dependencies. Based on these model functions, Bayesian estimation is performed using the observed values of blind samples to decipher the type of each blind sample (see text for details). Dashed red lines and green and purple arrows are added as examples of observations for explanation. The accuracy rates of decipherment and model functions were evaluated with leave-one-out cross validation. (d) Accuracy rate of decipherment for the attractant group consisting of l-Glu (n = 32) and l-Asn (n = 32). The left bar shows the theoretical accuracy rate of RS (≈ 1/2). The black line on the left bar represents the standard deviation calculated numerically (± 0.09 = ± 5.7/64, see Table S1 and SI for details). The right bar shows the accuracy rate by DeSIRAM (decipherment accuracy = 0.91). The probability of an accurate identification rate of 0.91 by RS was 4.1 × 10⁻¹². This small probability confirms the validity of the procedure.

Use of a Bayesian inference framework to analyse ensemble averaged responses

To deduce the identity of the input stimulation from complicated output responses (i.e. excitation-adaptation time courses), a statistical framework, Decipherment procedure of chemical Stimulus Inversely from Response Activities with Machine learning (DeSIRAM), was developed based on Bayesian inference and supported by machine learning. The index values indeed showed characteristic dependencies on the attractant species and concentrations (Fig. 2c). In DeSIRAM, model functions are acquired from a training data set prepared with various concentrations of sample chemicals (Figs. 2c, 4b and SI). Using these functions as templates for Bayesian inference, the species of a blind sample of unknown concentration and species can be discriminated. The procedure for deciphering the chemical type is briefly as follows. Consider a case where the parameter values indicated by the red dashed lines (y1 = 150, y4 = 70) are observed as a result of stimulation with a blind sample. When the sample concentrations estimated from the two indexes (green and purple arrows in the upper and lower panels of Fig. 2c) are compared for the l-Asn and l-Glu models (green and purple lines), the two estimates agree more closely for l-Asn. In this case, the blind sample is likely to be l-Asn. These probabilities are calculated by Eq. (1). Figure 2d shows the rates of accurate identification of chemical species (l-Glu or l-Asn) with the use of DeSIRAM or random selection (RS) (see SI for details; the error bar indicates the theoretical standard deviation). The accurate identification rate obtained with DeSIRAM was significantly higher than that obtained by RS (0.91 vs. 0.50, respectively). The probability of an accurate identification rate of 0.91 by RS was 4.1 × 10⁻¹². Such an extremely low probability clearly demonstrates that DeSIRAM can distinguish responses to the two attractants.
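The decision rule described above can be illustrated with a small Bayesian sketch. It is not the paper's Eq. (1), which is not reproduced in this section: the Gaussian likelihood, the uniform priors over species and log-concentration, the noise level and in particular the two linear model functions are all invented for the example. Only the observed values (y1 = 150, y4 = 70) are taken from the text.

```python
import math
from typing import Callable, Dict, Sequence

Model = Callable[[float], Sequence[float]]  # log10(concentration) -> predicted (y1, y4)


def gaussian(x: float, mu: float, sigma: float) -> float:
    """Gaussian probability density, used here as an assumed noise model."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def species_posterior(observed: Sequence[float], models: Dict[str, Model],
                      log_conc_grid: Sequence[float], sigma: float) -> Dict[str, float]:
    """Posterior over species, marginalising the unknown concentration on a grid."""
    evidence = {}
    for name, model in models.items():
        total = 0.0
        for log_c in log_conc_grid:
            likelihood = 1.0
            for y, mu in zip(observed, model(log_c)):
                likelihood *= gaussian(y, mu, sigma)
            total += likelihood
        evidence[name] = total / len(log_conc_grid)  # uniform prior over the grid
    norm = sum(evidence.values())  # uniform prior over species cancels out
    return {name: e / norm for name, e in evidence.items()}


# Hypothetical model functions; the real ones are fitted to training data (Fig. 2c).
models: Dict[str, Model] = {
    "l-Asn": lambda lc: (100.0 + 50.0 * lc, 40.0 + 30.0 * lc),
    "l-Glu": lambda lc: (60.0 + 40.0 * lc, 80.0 + 10.0 * lc),
}
grid = [i * 0.1 for i in range(31)]  # log10 concentration from 0 to 3, arbitrary units
posterior = species_posterior((150.0, 70.0), models, grid, sigma=10.0)
best = max(posterior, key=posterior.get)
```

In this toy setting a single concentration explains both observed indexes under the l-Asn model but not under the l-Glu model, so the posterior favours l-Asn, mirroring the graphical argument with the arrows in Fig. 2c.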

Decipherment of input signals from collective responses of E. coli to well-defined attractants

The performance of DeSIRAM was evaluated using six well-defined amino acid attractants with subtle differences in structure: l-aspartate (l-Asp), d-aspartate (d-Asp), l-Asn, l-Glu, l-cysteine (l-Cys) and l-serine (l-Ser) (Figs. S6 and S7). Here, 2 of the 6 standard attractants were chosen at a time, giving 15 (= 6C2) groups. In all combinations of blind tests, DeSIRAM was able to deduce the identity of the input chemical with varied accuracy rates (such identification is hereinafter referred to as decipherment). Accuracy rates of decipherment by DeSIRAM were higher than those by RS (≈ 0.5) in all combinations (Figs. 3a and S8a, Table S1). Notably, even subtle changes in physicochemical properties that cannot be easily detected by artificial analysis, such as chirality (l-Asp vs. d-Asp) and the presence or absence of a single methylene group (l-Asp vs. l-Glu), were well discriminated. A blind test was also conducted with all six attractants at the same time. In this case, the accuracy rate of decipherment by DeSIRAM was 0.43, while the probability of reaching this accuracy rate by RS was 1.3 × 10⁻¹⁹ (Fig. 3b, Table S5). This occurrence probability for the six-attractant group was much lower than that for any two-attractant group (Table S1), indicating that DeSIRAM performed better on the former even though the accuracy rate was lower. To compare groups consisting of 2, 3, 4, 5 and 6 standard attractants, values of self-information (i.e. entropy of a random variable) were calculated for all groups, where a decrease in self-information (DSI) corresponds to an increase in the accuracy of decipherment. Although the accuracy rate decreased as the number of choices increased (Fig. 3c, blue markers), DSI increased as the number of attractants in a group increased (Fig. 3c, red markers). Thus, DeSIRAM works well to decipher information about input chemicals by analysing the chemotactic behaviour of E. coli cells, despite lacking complex detector arrays and neural networks.
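The DSI comparison can be checked with a few lines of arithmetic. The sketch uses the definition given in the legend of Fig. 3, DSI = -[-log2(accuracy rate by DeSIRAM) - {-log2(accuracy rate by RS)}], and applies it to two accuracy rates quoted in the text. Note that the two-attractant value is for the single l-Glu/l-Asn pair, not the average over all 15 pairs plotted in Fig. 3c; the function name is an assumption.

```python
import math


def dsi(accuracy_desiram: float, accuracy_rs: float) -> float:
    """Decrease in self-information, in bits, following the Fig. 3 legend."""
    return -(-math.log2(accuracy_desiram) - (-math.log2(accuracy_rs)))


dsi_two = dsi(0.91, 1 / 2)  # l-Glu vs. l-Asn, RS accuracy 1/2
dsi_six = dsi(0.43, 1 / 6)  # all six attractants, RS accuracy 1/6
```

The expression simplifies to log2(accuracy by DeSIRAM / accuracy by RS), so DSI measures how many bits better than chance the decipherment is. This gives about 0.86 bits for the pair and about 1.37 bits for the six-attractant group, even though the raw accuracy rate is lower in the latter.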

Figure 3

Discrimination of various attractants with DeSIRAM. (a) Accuracy rate spectrum of decipherment of chemical types of blind samples using DeSIRAM. Accuracy rates of decipherment are shown for 15 groups, each consisting of 2 of the 6 standard attractants. Vertical and horizontal axes indicate the attractants included in the tested group. The six amino acid attractants are numbered arbitrarily: l-Asp (1), l-Glu (2), d-Asp (3), l-Asn (4), l-Cys (5) and l-Ser (6). The black square shows the data of Fig. 2d. (b) Accuracy rate of decipherment for the attractant group consisting of all 6 standard attractants: l-Asp (dark blue), l-Glu (purple), d-Asp (red), l-Asn (green), l-Cys (yellow) and l-Ser (light blue). The left bar shows the theoretical accuracy rate of RS (≈ 1/6 = 0.17). The black line on the left bar shows the standard deviation calculated numerically (= 0.04, see SI for details). The right bar shows the accuracy rate by DeSIRAM. The decipherment accuracy of 0.43 is high relative to that by RS. The probability of an identification success rate of 0.43 by RS is 1.3 × 10⁻¹⁹. (c) Dependencies of accuracy rate (blue markers) and DSI (red markers) on the number of attractants (NA) included in the tested groups. Plotted values are averages and error bars indicate the standard deviations (n = 15, 20, 15, 6, 1 for NA = 2, 3, 4, 5, 6, respectively). The number of data points (N) varies with NA. Here, 6 amino acids were prepared as standard attractants. For instance, for groups consisting of three amino acid attractants, N is 6C3 = 20. The DSI of each group was calculated as \(DSI = -\left[ -\log_{2}\left(\text{accuracy rate by DeSIRAM}\right) - \left\{ -\log_{2}\left(\text{accuracy rate by RS}\right) \right\} \right]\) and averaged within each NA group.

The accuracy rates of DeSIRAM varied from 0.61 (discrimination of l-Asp and d-Asp) to 0.96 (l-Glu and l-Ser). This variation could be explained by the types of chemoreceptors involved. l-Glu and l-Ser are sensed mainly by the Tar and Tsr receptors, respectively, whereas both l-Asp and d-Asp bind to Tar. The involvement of distinct receptors may cause distinct features in the response time course, which can then be discriminated with relatively high accuracy. Notably, although l-Glu and l-Asn mainly activate the same receptor (Tar), they were discriminated with high accuracy (black square in Fig. 3a), which is remarkable and may indicate subtle differences in signal processing as well as in the metabolism of the individual chemicals.

Decipherment of unorthodox chemical mixtures

To test the versatility of DeSIRAM for decipherment of input chemicals, a blind test was performed between two similar, but distinct, cola beverages (cola A and B) that the laboratory strain of E. coli had likely never encountered before. Interestingly, the cells responded sufficiently to both cola A and B even when diluted to 1:2000 (Fig. 4a). These traces were very similar and indistinguishable from one another by eye. However, there was a significant difference in the concentration dependency of the index values of colas A and B (Fig. 4b). Accordingly, the accuracy of DeSIRAM in distinguishing cola A from B was 0.80 (Fig. 4c). These results reinforce the notion that information about input chemicals can be extracted from the responses of E. coli cells to those chemicals. DeSIRAM can, therefore, also be applied to identify unusual chemical mixtures by analysing the response activities of organisms, rather than by preparing an array of multiple cell types with distinct receptors.
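The figure legends quote, for each blind test, the probability that RS would reach the observed accuracy rate. One way to read these values, assumed here rather than stated in this section, is as the binomial probability of exactly k correct calls out of n trials with success probability 1/2, with k taken as the accuracy rate times n rounded to the nearest integer (35 of 44 for the colas, 58 of 64 for l-Glu vs. l-Asn). The function name is hypothetical.

```python
import math


def prob_exactly_k_correct(n: int, k: int, p: float) -> float:
    """Binomial probability of exactly k correct identifications in n random guesses."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Cola A vs. cola B: n = 22 + 22 samples, accuracy 0.80 -> assumed 35 correct.
p_cola = prob_exactly_k_correct(44, 35, 0.5)

# l-Glu vs. l-Asn: n = 32 + 32 samples, accuracy 0.91 -> assumed 58 correct.
p_amino = prob_exactly_k_correct(64, 58, 0.5)
```

Under this assumption the results are about 4.0 × 10⁻⁵ and 4.1 × 10⁻¹², which agree with the values quoted in the legends of Fig. 4 and Fig. 2, respectively.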

Figure 4

Discrimination of unidentified chemical mixtures with DeSIRAM. (a) CW bias traces in response to stimulation with two similar, but distinct, cola beverages, cola A (left) and cola B (right). Chemotactic responses to solutions diluted to 1:2000 are shown. (b) Concentration dependencies of the first 3 indexes, \((y_{1}, y_{2}, y_{3})\), of characteristic vectors. Each graph is coloured according to the solution (purple, cola A; green, cola B). Plot markers and lines have the same meaning as in Fig. 2c. (c) Accuracy rate of decipherment for the chemical mixtures (cola A, n = 22; cola B, n = 22). The left bar shows the theoretical decipherment accuracy of RS. The black line on the left bar indicates the standard deviation calculated numerically. The right bar shows the accuracy rate by DeSIRAM. The decipherment accuracy of 0.80 is high relative to that of RS. The probability of an identification success rate of 0.80 by RS is 4.0 × 10⁻⁵. DeSIRAM with E. coli distinguished the two solutions, although their compositions are not reported.