BarBIQ, a high-throughput cell-based method
In BarBIQ (Fig. 1b), we first prepared a bacterial sample (mock community and murine cecal content; see below) in a buffer and broke the bacterial clumps by vortexing (Supplementary Fig. 1 and Supplementary Note 3). Next, we mixed the bacterial sample with a solution including cellular barcodes29,30,31,32, primers, and reagents for DNA amplification and then encapsulated them in droplets (~120 µm in diameter) using the Bio-Rad Droplet DigitalTM PCR (ddPCR) system. We adjusted the concentrations of barcodes and bacteria and their ratio based on the Poisson distribution. For barcodes, ~93% of the barcode occupied droplets each had a single barcode; multiple barcodes in the remaining 7% droplets did not significantly affect the measurement since the quantification of bacteria was on average changed by the factor 1.07 (=0.93 × 1 + 0.07 × 2). Moreover, this factor is common for all bacteria so that their relative proportions were not affected. We found that the bacteria did not affect barcode (83-base single-strand DNA) encapsulation (Supplementary Fig. 3a), which should have followed a Poisson distribution33,34. For bacteria, ~90% of the bacterial occupied droplets each had a single cell; multiple cells in the remaining 10% droplets were mostly different bacteria under this condition (Supplementary Note 1), and they can be distinguished by their 16S rRNA sequences (Supplementary Note 2). We confirmed that the cecal sample bacteria were successfully encapsulated into the droplets following Poisson distribution by microscopic imaging of individual bacteria (Fig. 1d and Supplementary Fig. 2). For subsequent sequencing, bacterial cell lysis by heating35, amplifications of barcode and 16S rRNA genes (V3–V4 region, ~450 bases), attachment of sequencing adapters, and linkage36 between amplified barcodes and 16S rRNA genes were performed in a single step (Fig. 1c and Supplementary Fig. 4). This process mostly avoids the generation of chimeras from different bacteria, which were largely generated by bulk PCR in 16S rRNA gene-amplicon sequencing28. We found that the proportion of droplets in which bacteria were detected by microscopic imaging was close to the proportion of droplets in which 16S rRNA genes were amplified for a cecal sample (Fig. 1e). We also confirmed that the amplification of 16S rRNA genes, including the cell lysis process, was robust by adjusting the initial heating time for cell lysis (Supplementary Fig. 3b). After amplification, we disrupted the droplets, purified the library (the linked amplicons), and sequenced both cellular barcodes and 16S rRNA sequences for individual amplified molecules using a high-throughput MiSeq sequencer. We analyzed the sequenced molecules (i.e., reads) for each sequence type of the barcodes (i.e., cell), identified 16S rRNA sequence(s) (termed Bar sequence(s)) for each cell, and counted the number of cells for each type of 16S rRNA sequence (Methods section, Supplementary Fig. 5, and Supplementary Note 2). More than 105 cells were determined in a MiSeq run. Notably, this analysis also worked for a bacterium that had multiple 16S rRNA sequence types in its genome27 because the same cellular barcode was attached to the multiple amplified 16S rRNA sequences from the same cell, and these same bacterial 16S rRNA sequences were distinguished from incidentally coexisting 16S rRNA sequences, which were from different bacteria in a droplet, based on the quantification of cooccurrence and a statistical model (Supplementary Note 2, step 15). We finally obtained the absolute cell number per unit weight or volume for each 16S rRNA sequence type(s) (termed cOTU, explained below) in the sample by normalizing the sequencing-determined cell number using the total cell number per unit weight or volume of the same sample measured by ddPCR (Supplementary Note 1).
An essential difference between BarBIQ and conventional methods is the unit used for defining the composition of the microbiota and the quantification with the unit (Fig. 1a). In conventional methods, one of the commonly used units is the operational taxonomic unit (OTU)37,38, which represents a group of similar 16S rRNA sequences that are obtained by clustering based on the identities of sequences detected from a bulk sample23,24. Another widely used unit in conventional methods is amplicon sequence variant (ASV)19,39 which is used for a detected unique 16S rRNA sequence in the sample. OTUs and ASVs do not always represent bacterial cells because there are bacteria that have multiple 16S rRNA sequence types in their genome, as described above. However, BarBIQ identifies 16S rRNA sequence(s) (i.e., Bar sequence(s)) from each barcoded cell. To distinguish our cell-based method from conventional methods using OTUs, we named the 16S rRNA sequence(s) (i.e., Bar sequence(s)) from the same cell ‘cell-based operational taxonomy unit (cOTU)’. Furthermore, BarBIQ quantifies the number of cells for each cOTU, while conventional methods measure the number of amplified 16S rRNA gene molecules (16S rRNA gene abundance)12,13.
Efficacy of BarBIQ performed using human gut bacterial strains
We demonstrated that BarBIQ robustly worked for a mock community that contained designed abundances of 10 cultured human gut bacterial strains (Methods section, Supplementary Table 1); these strains have been used as a model community that represented the four most prominent bacterial phyla (Actinobacteria (1 strain), Bacteroidetes (3 strains), Firmicutes (4 strains), and Proteobacteria (2 strains)) in the healthy human gut microbiota40 and included both Gram-positive (5 strains) and Gram-negative (5 strains) bacteria. We found 16 Bar sequences from the mock community (Fig. 2a and Supplementary Data 1). These identified Bar sequences were completely consistent among the three technical replicates (Supplementary Data 1). All 16 Bar sequences were identical to one of the Sanger sequencing-identified 16S rRNA sequences (San sequences) that we identified from each of the 10 cultured strains (Fig. 2a, b, Methods section, Supplementary Data 2). On the other hand, there were another 13 San sequences that had one or two substitutions from the 16 sequences above (San-Bar matched sequences) (Fig. 2b and Supplementary Data 2). We found that these 13 San sequences were reasonably explained by PCR errors in the amplification for Sanger sequencing with the error rate of the polymerase we used (Supplementary Fig. 6a), suggesting that the San sequences and Bar sequences were completely consistent. We then identified 10 cOTUs from the 16 Bar sequences based on the cellular barcodes (Methods section, Supplementary Note 2), and each corresponded to one of the 10 strains, which showed as well that two pairs of Bar sequences that each differed in one base and was from the same bacterial cell (Fig. 2a). Therefore, we concluded that BarBIQ identified the correct number of strains (cOTUs) in the mock community and had single-base accuracy and resolution for 16S rRNA sequence identification.


a Comparison of the 16S rRNA sequences identified by Sanger sequencing and by BarBIQ. Edit distance, Levenshtein distance68, defined as the minimum number of substitutions, insertions, and deletions; San sequence, Sanger sequencing-identified 16S rRNA sequence; ATCC/JCM/DSM <number>, strain ID; A, B, or C, San sequences for each strain; Bar-sequence-MK-<number>, BarBIQ-identified sequences (Bar sequences); COTU-MK-<number>, cell-based operational taxonomy units (cOTUs); red star, Bar sequences that had one base difference. Source data are provided as a Source Data file. b Comparison (Venn diagram) among San sequences identified from each of the ten strains, Bar sequences, amplicon sequence variants (ASVs), and the representative sequences of operational taxonomy units (OTU-RepSeqs) identified from the mock community. Circle, total sequences from each method; numbers in circles, number of unique or identical sequences detected by the given method(s); numbers in the parentheses, total number of sequences detected by the given method. c Comparison of the absolute cell abundance per unit volume of 10 strains in the mock community measured by BarBIQ ([C]BarBIQ) (Supplementary Data 6) and by microscopic imaging ([C]Microscope) (Supplementary Table 1). Data are presented as mean values ± SD (n = 3 for [C]BarBIQ, n = 5 for [C]Microscope). Blue thin line, a fitting line with a fixed slope of one in log scale (intercept: -0.035) by considering the standard errors of both [C]BarBIQ and [C]Microscope, indicating the averaged ratio [C]BarBIQ/[C]Microscope as 0.92; gray thick line, 95% confidence interval of the fitted line, indicating the 95% confidence interval of the ratio [C]BarBIQ/[C]Microscope as 0.68~1.25; r, Pearson coefficient; R2, coefficient of determination. d Comparison of the proportional abundances of 15 ASVs in the mock community measured by Conv_ASV (proportional [C]ASV) (Supplementary Data 5) with the proportional [C]Microscope measured by microscopic imaging. Data are presented as mean values ± SD (n = 2 for proportional [C]ASV, n = 5 for proportional [C]Microscope). Strains that had commonly detected sequence(s) are shown. The strains that were compared with multiple identical ASVs are shown in colors. By the same fitting as c (intercept: −0.28), the averaged ratio of proportional [C]ASV and proportional [C]Microscope was 0.52, and its 95% confidence interval was 0.28–0.99.
In comparison, we analyzed the same mock community by conventional methods with ASV-based analysis using DADA219 and with OTU-based analysis using Mothur41 (Methods section) and identified 28 ASVs and 12 OTUs (Supplementary Data 3 and 4). The numbers of ASVs and OTUs were larger than the number of strains in the mock community, indicating that at least some of the ASVs and OTUs did not represent the strains. Furthermore, we compared the identity of ASVs and the representative sequences of OTUs (OTU-RepSeqs, Methods section) to both the San sequences from the ten strains and the Bar sequences from the mock community. For ASV, although 15 ASVs were identical to one of the San-Bar matched sequences, the other 13 ASVs were not identical to any San sequence (Fig. 2b). We found that the abundances of the 13 nonidentical ASVs varied more than the estimated sampling noise, while the 15 identical ASVs followed the estimated sampling noise well (Supplementary Fig. 6b), and the abundances of the 13 nonidentical ASVs were globally much different from that of the 15 identical ASVs (Supplementary Fig. 6b and Supplementary Data 5), suggesting that the 13 nonidentical ASVs were not correct 16S rRNA sequences from the ten strains. For OTUs, only two of 12 OTU-RepSeqs were identical to one of the San-Bar matched sequences from different strains, and the other 10 OTU-RepSeqs were not identical to any San sequence (Fig. 2b and Supplementary Fig. 6c), meaning that the OTU-based analysis did not detect the correct 16S rRNA sequences from at least eight strains in the 10-strain mock community. Importantly, the 13 San sequences that were not identical to any Bar sequence did not match any ASV or OTU-RepSeqs (Fig. 2b), which supported our interpretation that these 13 San sequences included amplification error(s). Thus, BarBIQ had the highest accuracy of the 16S rRNA sequence identification for the mock community in comparison with that of the conventional methods that we used here.
By BarBIQ, we then measured the cell abundance (number of cells) per unit volume of each cOTU ([C]BarBIQ) in the mock community (Supplementary Data 6) and compared [C]BarBIQ with the cell abundance per unit volume in the mock community, which was calculated by the dilution factor from the measured abundance by microscopic imaging ([C]Microscope) for each strain (Fig. 2c, Methods section, Supplementary Fig. 7, and Supplementary Note 4). First, the reproducibility among three technical replications of [C]BarBIQ was high (standard deviation/mean: 0.01~0.25, N = 3; Fig. 2c). Second, the Pearson product-moment correlation coefficient r (Pearson’s r) between [C]BarBIQ and [C]Microscope was 0.98, and the averaged ratio [C]BarBIQ/[C]Microscope was 0.92 (95% confidence interval: 0.68–1.25; Fig. 2c). Thus, BarBIQ accurately measured the cell abundance of each bacterial strain in the mock community with high reproducibility.
We then compared the 16S rRNA gene abundance of the 15 San sequence-identical ASVs ([C]ASV) to the [C]Microscope (the strains that had multiple 16S rRNA sequences were compared to multiple identical ASVs) with proportional normalization. As expected, the proportional [C]ASV was not well correlated with the proportional [C]Microscope (Fig. 2d). To understand this difference, we estimated the cell abundance from [C]ASV ([C]ASV-estimated); for 6 of the 10 strains that have 16S rRNA gene copy numbers in their genome that are registered in the rrnDB database42, we summed the abundance of multiple ASVs that were identical to the San sequences from the same strain and calculated the proportional [C]ASV-estimated from the [C]ASV using their 16S rRNA gene copy numbers (Supplementary Fig. 8). The Pearson’s r between the proportional [C]ASV-estimated and proportional [C]Microscope (0.97) were increased from that between proportional [C]ASV and proportional [C]Microscope (0.91), which was consistent with the design principle of the conventional method that measured the number of gene copies but not cells.
Measurement conditions for murine cecal microbiota
As an application of BarBIQ for biological samples, we surveyed the microbiota of the murine cecum at two locations (distal and proximal, Fig. 1b) in C57BL/6J male mice mainly depending on diets. The mice maintained under three conditions as follows were analyzed (Methods section): (i) three 6-week-old mice (CEa, CEb, and CEc) from CLEA Japan that was maintained by a balanced nutrient diet CE-2 for their full life span (CE2-nutrient group); (ii) four 8-week-old mice (VSa, VSb, VSc, and VSd) that were purchased from Japan SLC, Inc. and maintained for 5 more weeks by being fed with a compositionally well-defined nutrition-balanced diet (Supplementary Table 2) (VA-sufficient group) (Fig. 3a); and (iii) four mice (VDa, VDb, VDc, and VDd) that underwent the same as those in (ii) except that the vitamin A was not included in the diet (Supplementary Table 2) for the last 3 weeks (VA-deficient group) (Fig. 3a). Investigating the effect of dietary vitamin A on gut microbiota is important because bacteria are essential for vitamin A absorption and storage, and vitamin A deficiency affects the vision, growth, and immune function of the human body, which causes public health problems43,44. For the BarBIQ measurement, we first separated cells and extracellular DNA (ecDNA) from the cecal content of each location in each mouse by filtration, and then measured the filter-residue (cell-sample) and flow-through (ecDNA-sample) (Methods section, Supplementary Fig. 9, and Supplementary Note 5), because extracellular bacterial DNA may affect the quantification of the intestinal microbiota, as recently reported45. We note that this filtration procedure may be used for different types of samples, e.g., oral and skin samples, even if purification and/or concentration is required. We performed three technical replicates for both cell- and ecDNA-samples at each location in the mouse CEa and one measurement for all others; each measurement required <1 mg content.


a Schematic of the experimental design of the dietary vitamin A experiments (Methods). b Sequence identity profile of Bar sequences and ASVs; identity, the identity between each Bar sequence or ASV and its closest 16S rRNA sequence in three public databases: GreenGenes, Ribosomal Database Project, and Silva. Source data are provided in Supplementary Data 1 and 3. c Comparison (Venn diagram) among San sequences, Bar sequences, ASVs, and OTU-RepSeqs identified from the cell-sample at the proximal location of the mouse VDd. Numbers, same as Fig. 2b.
Accurate identification of the 16S rRNA sequences of cOTUs for murine ceca
In total for all cell-samples, we counted 3.4 × 105 bacterial cells and identified 810 cOTUs containing 954 Bar sequences (Supplementary Data 1). In addition, we uniquely identified another 50 Bar sequences from the ecDNA-samples (Supplementary Data 1); we did not define these Bar sequences as cOTUs since the ecDNAs do not represent cells. Importantly, 383 identified Bar sequences (38% of the 1004 (954 + 50)) were not registered in three widely used public databases (GreenGenes46, Ribosomal Database Project47, and Silva48), and all Bar sequences had >93% identity to their closest 16S rRNA sequences in these three databases (Fig. 3b, Methods section, Supplementary Data 1). To confirm that the database-unregistered Bar sequences were accurate, we performed Sanger sequencing for the PCR amplicons of 16S rRNA genes from randomly selected single bacterial cells of the cell-sample from the proximal location of the mouse VDd (VDdprox) and identified 34 unique San sequences (Methods section, Supplementary Data 2). Seventeen (six were not registered in the databases) of the 34 were identical to one of the Bar sequences identified from the same sample VDdprox (Fig. 3c); the remaining 17 San sequences had one, two, or three substitutions or one deletion from the Bar sequences, which was reasonably explained by simulated PCR errors for Sanger sequencing with the error rate of the polymerase we used (Supplementary Fig. 10a). These results suggested that the identified Bar sequences are accurate, which was also evidenced by the mock community experiment above.
In comparison, we also measured 16 cell-samples (4 mice × 2 locations × 2 diet conditions) from the VA-sufficient and VA-deficient groups (VA group) using the ASV-based and OTU-based analyses, respectively (Supplementary Data 3, 4, 5, and 7). To investigate the accuracy of sequence identification for both the ASV-based and OTU-based analyses here, we again used the results of cell-sample VDdprox as above. For ASV, 16 San sequences were identical to one of the ASVs, and the other 18 nonidentical San sequences were reasonably explained by simulated PCR errors for Sanger sequencing (Fig. 3c and Supplementary Fig. 10b). However, for all 16 cell-samples, 4% of ASVs (21 ASVs) had a low (<70%) identity (7% ASVs had <93% identity) with the closest registered 16S rRNA sequence in the databases (Fig. 3b), while the identity between almost all pairs of registered 16S rRNA sequences in the databases was >70% (Supplementary Fig. 10c), suggesting that these low-identity ASVs were not 16S rRNA sequences but probably PCR-errored sequences, e.g., chimeras. For OTUs, although 13 San sequences were identical to one of the OTU-RepSeqs, the other 21 nonidentical San sequences had large differences from the OTU-RepSeqs, which cannot be explained by PCR errors for Sanger sequencing globally (Fig. 3c and Supplementary Fig. 10d), suggesting that the 16S rRNA sequence identification accuracy by the OTU-based analysis was not high. Importantly, the 17 San sequences that were not identical to any Bar sequence did not match any ASV or OTU-RepSeqs (Fig. 3c), which supported our interpretation that these 17 San sequences included amplification error(s). Collectively, for both the mock community and the murine cecal samples, BarBIQ showed the highest accuracy of 16S rRNA sequence identification, which enables BarBIQ to identify unknown 16S rRNA sequences without any predetermined sequences.
Highly reproducible cell abundance quantification of cOTUs in murine ceca
For each identified cOTU in each cell-sample, we quantified the absolute cell abundance per unit weight by BarBIQ (Supplementary Data 8) and calculated the relative cell abundance by normalization among samples using the median of ratios method49 (Methods section). We found that the total absolute cell abundances per unit weight were different among mice (maximum fold change: 2.6 (distal) and 3.2 (proximal)), and the abundance at the distal location was always higher (1.1–3.4 times) than that at the proximal location for all mice (Supplementary Fig. 11). For each cOTU, we confirmed that the results of both relative and absolute cell abundance were highly reproducible by technical replicates for the same sample (CEadist and CEaprox) (Fig. 4a and Supplementary Fig. 12a, b) and that the noise for quantification was mainly from sampling by simulation (Supplementary Fig. 12c–e).


a Comparison (dots) of the relative abundances of cOTUs between a pair of technical replicates (other pairs in Supplementary Fig. 12). Orange lines, theoretical confidential interval (99.9%) for sampling noise based on the Poisson distribution and normalization to the relative abundance (to the proportional abundance for b and c); cyan lines, 2-fold change; magenta lines, 10-fold change; blue dots, cOTUs that showed larger difference than that of the sampling noise and 2-fold change and was smaller than the 10-fold change; proportions and numbers in the parentheses, proportions and the numbers of dots in corresponding colors, respectively; CEa, mouse; dist and prox, locations; 1 and 2, technical replicates (see text). b Comparison between the proportional abundances of the common pairs of cOTUs and ASVs with all 16 cell-samples shown in the same format as a. Red dots, common pairs of cOTUs and ASVs that showed larger difference than the sampling noise and 10-fold change. c Comparison between the proportional abundances of the common pairs of cOTUs and OTUs with all 16 cell-samples shown in the same format as a and b. Source data are provided as a Source Data file.
To investigate the difference in the quantifications by BarBIQ with those by the ASV-based and OTU-based analyses, we compared the proportional cell abundances (divided by the total number of cells) of the cOTUs to the proportional 16S rRNA gene abundances of the ASVs and OTUs with the commonly detected 16S rRNA sequences in each of the 16 cell-samples in the VA group; the cOTU that contained multiple Bar sequences was compared with each corresponding ASV or OTU. As expected, the differences were large (Fig. 4b, c and Supplementary Fig. 13a, b): in all samples, between cOTUs and ASVs, 25% of the compared pairs were >2 times different, and 2% were >10 times; between cOTUs and OTUs, 26% were >2 times and 5% were >10 times. In addition, the 16S rRNA gene abundances of ASVs and OTUs for the commonly detected 16S rRNA sequences were also different (12% >2 times and 2% >10 times) (Supplementary Fig. 13a–c). To understand this difference, we considered the copy numbers of 16S rRNA genes in the genome as described for the mock community (Supplementary Fig. 14), which was again consistent with the design principle of these methods in which BarBIQ measures cell abundance and the ASV-based and OTU-based analyses measure 16S rRNA gene copies.
cOTU-based alpha and beta diversities
BarBIQ enabled both alpha and beta diversity analyses of the bacterial microbiota based on cells (organisms); defining alpha and beta diversity based on organisms is important for understanding the bacterial ecosystem, and this was proposed approximately a half century ago50. As an example of alpha diversities, we defined cOTU-based taxon richness (cOTU richness) as the number of observed cOTUs in a certain total number of detected cells from a sample; taxon richness is a common biodiversity assessment51. We found that the cOTU richness was very different (~2 times) between the CE2-nutrient and VA groups (Fig. 5a), which may be due to the different diets, maintaining facilities, or ages among the groups. On the other hand, the cOTU richness was consistent between the VA-sufficient and VA-deficient groups as well as between distal and proximal locations within each group (Fig. 5a).


a The cOTU richness of each cell-sample determined by subsampling 6608 cells using the function rarefy in the R package Vegan. CE2, CE2 nutriment group; VA-suf, VA-sufficient group; VA-def, VA-deficient group; dist and prox, locations. b, d, and f, Principal coordinate analysis (PCoA) of Bray-Curtis dissimilarities calculated based on the relative cell abundances of cOTUs (b), ASVs (d), and OTUs (f) between each pair of cell-samples in the VA group. Labels, same as (a); gray line, linkage from the same mouse; circles, 95% confidence ellipses for each group. c, e, and g, Quantitative comparison of Bray-Curtis dissimilarities in b, d, and f, respectively. Distal, all possible pairs from VA-sufdist and VA-defdist, respectively; Proximal, all possible pairs from VA-sufprox and VA-defprox, respectively. Boxes in a, c, e, and g represent 25th to 75th percentiles (the interquartile range), horizontal black lines indicate medians, and whiskers show 1.5 times the interquartile range (n = 3 for CE2dist and CE2prox; n = 4 for VA-sufdist, VA-sufprox, VA-defdist, and VA-defprox; n = 16 for Distal and Proximal). P values were calculated by the Kruskal–Wallis rank-sum test. Source data are provided as a Source Data file.
As an example of beta diversities, we calculated Bray-Curtis dissimilarity (abundance-based beta diversity)52 based on the relative cell abundances of the detected cOTUs (Methods section), which quantified the global difference for each pair of samples, and performed principal coordinates analysis (PCoA) for the CE2-nutrient (Supplementary Fig. 15a) and VA groups (Fig. 5b). First, the distal and proximal location groups were not separated for the CE2-nutrient, VA-sufficient, and VA-deficient groups. In detail, the dissimilarities between samples from different mice at the distal or proximal location (magenta symbols in Supplementary Fig. 15b) were higher than those at different locations from the same mouse (blue symbols in Supplementary Fig. 15b), indicating that the mouse variances were globally larger than the location variances. Second, the VA-sufficient and VA-deficient groups were well separated for both the distal and proximal locations (Fig. 5b). Interestingly, the difference between the VA-sufficient and VA-deficient groups at the proximal location was significantly larger than those at the distal location (Fig. 5c). This phenomenon was also found by the Bray-Curtis dissimilarities calculated based on the absolute cell abundances of the cOTUs (Supplementary Fig. 15c). This location difference suggested that the microbiota at the proximal location of the murine cecum was globally affected more than that at the distal location by VA-deficient diet feeding for 3 weeks.
We also calculated the Bray-Curtis dissimilarities between cell-samples using the 16S rRNA gene abundance measured by the ASV-based and OTU-based analyses and performed PCoA for the VA group (Fig. 5d, f, Methods section). Smaller dissimilarities between the distal and proximal locations were found again within the same mice compared to that between mice at each location (Supplementary Fig. 15d, e). However, the dissimilarities based on 16S rRNA gene abundance between VA-sufficient and VA-deficient groups at the distal location were not significantly different from those at the proximal location, which was inconsistent with the cOTU-based BarBIQ results (Fig. 5e, g). These results suggested that biological conclusions may be tilted by the difference between 16S rRNA gene abundance and cell abundance.
Differential cell abundance of cOTUs between cecal locations
We first compared the relative cell abundances of each cOTU between the distal and proximal locations in each mouse (one example in Fig. 6a; all in Supplementary Fig. 16a). The results showed that the proportions of differentially abundant cOTUs (differences were larger than the sampling noise and 2-fold) between the locations of the same mouse in the CE2-nutrient (4–13%; 22–75 cOTUs), VA-sufficient (7–16%; 15–43 cOTUs), and VA-deficient groups (2–8%; 5–18 cOTUs) were all significantly larger than those between technical replicates (0.2–0.9%; 1–5 cOTUs) and that the mice in the VA-deficient group had the lowest proportions of differentially abundant cOTUs (Fig. 6b). We also found similar results for the absolute cell abundances (Supplementary Fig. 16b, c). These results suggested that the bacterial compositions were slightly different between the two locations in each mouse, which was consistent with the Bray–Curtis dissimilarities above (Fig. 5b and Supplementary Fig. 15b). However, most of the differentially abundant cOTUs between locations in the same mouse were not consistent among mice (Supplementary Fig. 17), suggesting that each cOTU should be investigated to accurately characterize the microbiota.


a Comparison between the relative cell abundances of cOTUs detected from the distal location and proximal location of the mouse VSa, shown in Fig. 4a (other mice in Supplementary Fig. 16a). b Proportion of location-dependent differentially abundant cOTUs (differences were larger than the sampling noise and 2-fold) in each mouse. Technical, all pairs from three technical replicates within the distal and proximal locations for the mouse CEa; CE2, the mice in the CE2 nutriment group; VA-suf, the mice in the VA-sufficient group; VA-def, the mice in the VA-deficient group. Boxes represent 25th to 75th percentiles (the interquartile range), horizontal black lines indicate medians, and whiskers show 1.5 times the interquartile range (n = 6 for Technical, n = 3 for CE2, n = 4 for VA-suf and VA-def). P values were calculated by the Kruskal–Wallis rank-sum test. Source data are provided as a Source Data file.
Differential cell abundance of cOTUs regarding dietary vitamin A deficiency
To investigate the effect of dietary vitamin A deficiency on each cOTU, we performed differential abundance analysis53 between the VA-sufficient and VA-deficient groups using the relative cell abundances of the cOTUs for the distal and proximal locations; 153 cOTUs for distal location and 150 cOTUs for proximal location were compared after removing low abundant noisy cOTUs (Methods section, Supplementary Fig. 18a, b). We found that eight cOTUs from five genera (Acetatifactor, Barnesiella, Lactobacillus, Marvinbryantia, and Romboutsia) and two phyla (predicted by the Bar sequence(s) for each cOTU; Methods section) were significantly different (false discovery rate (FDR) < 0.05 and fold change > 2) between the VA-sufficient and VA-deficient groups at the proximal location, while only one (cOTU-CM-2002) (of the eight above) showed a significant change at the distal location (Fig. 7a, b and Supplementary Fig. 18a, b). This result is consistent with a study showing that only a few strains were significantly increased or decreased when a colonized human gut-derived bacterial community (92 strains) in mice were fed a vitamin A-deficient diet for 3 weeks43. The maximum compositions of the eight cOTUs respectively among all 16 cell-samples in the VA-group were in the range of 0.4% (COTU-CM-0025) to 6.5% (cOTU-CM-2002) (Supplementary Data 8). The cOTU-CM-2002 exhibited the largest increase upon the vitamin A-deficient diet (fold change: 320–1100 (log2-fold change with standard error: 9.2 ± 0.9) at the proximal location and 140–320 (7.7 ± 0.6) at the distal location) (Fig. 7b and Supplementary Fig. 18a, b). This cOTU was from the genus Barnesiella; however, 11 of 13 analyzed cOTUs in the genus Barnesiella did not show a significant difference. Similarly, in the genera Acetatifactor, Lactobacillus, and Marvinbryantia, some cOTUs showed significant differences, but others did not (Fig. 7a). These results indicated that vitamin A deficiency shaped the microbiota at the cOTU level and that the cOTU level difference was larger at the proximal location than at the distal location.


a The detected cOTUs and taxonomies (predicted by the RDP classifier, see Methods section). The genera that included at least one differentially abundant cOTU between the VA-sufficient and VA-deficient groups at either the distal or proximal location are shown (other in Supplementary Data 8). Different, FDR < 0.05 and fold change > 2. b The estimated fold changes (dots) and standard errors (error bars, n = 4) by DESeq2 between the VA-sufficient group (VA-suf) and VA-deficient group (VA-def) of the differentially abundant cOTUs (FDR < 0.05, fold change > 2) at either distal (Dist) or proximal (Prox) location. The differential analysis of COTU-CM-0025 at the distal location was not performed since its abundances were too low (Methods section). c The ratio between the number of differentially abundant cOTUs, ASVs, or OTUs at the proximal location (Nprox) and at the distal location (Ndist) as a function of threshold for FDR (left) and for fold change (right). FDR < 0.05 and a fold change > 2 (gray dotted lines) were considered “significant differences” in other analyses. Source data for (b) and (c) are provided as a Source Data file.
We again compared BarBIQ with conventional methods in vitamin A diet experiments by performing differential abundance analysis between VA-sufficient and VA-deficient groups using the relative 16S rRNA gene abundances of the ASVs and OTUs, respectively, for each location (distal location: 223 ASVs and 131 OTUs were compared; proximal location: 236 ASVs and 138 OTUs) (Methods section; Supplementary Fig. 18c–f). The number of significantly different ASVs or OTUs (FDR < 0.05 and fold change > 2) between VA-sufficient and VA-deficient groups at the distal location (6 ASVs or 6 OTUs) was similar to that at the proximal location (8 ASVs or 9 OTUs), which was inconsistent with the results based on cOTUs. This discrepancy between cOTUs and ASVs or OTUs was robust even when we changed the thresholds of FDR and fold change, which defined the significance (Fig. 7c). These results again suggested that the inconsistency between the cell abundance and 16S rRNA gene abundance, which has been pointed out for years,12,13,25 is not negligible for understanding this biological phenomena. Importantly, the largest differentially abundant cOTU (cOTU-CM-2002) determined by BarBIQ (mentioned above) was determined as two separate units by both the ASV-based (ASV28 and ASV31) and OTU-based analyses (Otu029 and Otu032), which also indicated the essential difference between cell-based identification by BarBIQ and gene-based identification by conventional methods. In addition, the OTU-RepSeq of Otu032 had one base difference from ASV31, which is an example of an inconsistency between the ASV-based and OTU-based analyses. Notably, five of the six differentially abundant ASVs at distal location and seven of the eight ASVs at proximal locations were detected by BarBIQ, while four of the six OTUs at distal location and five of the nine OTUs at proximal locations were detected by BarBIQ (Supplementary Data 1, 3, and 4).
Quantification of ecDNA in murine ceca
Although our main goal of ecDNA separation was to accurately measure cell abundance, the measurement of ecDNA may be useful for some studies, e.g., measuring the drug killing efficiency for specific bacteria by monitoring ecDNAs from dead cells. Here, we applied BarBIQ to DNA fragments containing 16S rRNA gene(s), which were segmented by vortexing, (Methods section, Supplementary Note 5) from the ecDNA-samples and detected 664 cOTUs that were determined by the cell-samples and uniquely identified 50 Bar sequences. For these cOTUs and Bar sequences, we quantified the number of DNA fragments (Supplementary Data 8). We found that the ratios of total DNA fragment abundances (Supplementary Fig. 19a) and total cell abundances (ecDNA/cell) per unit weight (Supplementary Fig. 11) for each location and mouse in the CE2-nutrient group (0.01–0.04) were much smaller than those in the VA group (0.08–1.3) (Supplementary Fig. 19b). We then compared the ecDNA/cell ratios for five cOTUs that were commonly detected in all cell- and ecDNA-samples. Interestingly, the ratios of cOTU-CM-0074 and cOTU-CM-2009 in the CE2-nutrient group were significantly smaller than those in the VA group, which was consistent with the tendency of the total ecDNA/cell ratios, while the ratios of cOTU-CM-0823 were comparable between the CE2-nutrient and VA groups (Supplementary Fig. 19c). The results showing the variance in cOTUs suggested that the distinct tendencies of the ratio between ecDNA and cells, which may represent the ratio between broken and nonbroken cells in different bacteria, might be related to bacteria-microenvironment interactions.

