
Construction of integrative transcriptome to boost systematic exploration of Bougainvillea

Sample collection

The tissue samples were collected from potted Bougainvillea (Bougainvillea × buttiana 'Miss Manila') at the same time point during the normal flowering period (September to November) in Xiamen, PR China. The samples comprised thorns, buds, bracts, leaves, stems, and flowers (Table 1). Tissues were excised either from different parts of the same plant or from different plants at the same time point, washed with distilled water, and briefly air-dried in a clean environment. The tissues were then pooled, randomly divided into two replicates, wrapped in foil (silver paper), frozen in liquid nitrogen, and stored in Chen's lab, School of Life Science, Xiamen University.

Table 1 Illumina RNA-sequencing datasets used in this study.

RNA library construction and deep sequencing

The sample mixtures were lysed in 1 mL TRIzol reagent (Invitrogen, Carlsbad, CA, USA), and total RNA was extracted according to the manufacturer's instructions. RNA purity was evaluated with a NanoPhotometer spectrophotometer (Implen USA, Westlake Village, CA, USA). RNA concentration was measured with a Qubit RNA assay kit and a Qubit 2.0 fluorometer (Life Technologies, Waltham, MA, USA). RNA integrity was evaluated with the RNA Nano 6000 assay kit on an Agilent Bioanalyzer 2100 system (Agilent Technologies, Santa Clara, CA, USA). RNA samples that passed quality control were stored at −20 °C until use.

The RNA library was constructed by Novogene Co. Ltd. (Beijing, China). Three micrograms of RNA per sample was used as the input material for library preparation. Polyadenylated (poly(A)) RNA was purified from total RNA with poly-T oligo-attached magnetic beads. Fragmentation was carried out using divalent cations at elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5X). After reverse transcription, the fragments were purified with the AMPure XP system (Beckman Coulter, Beverly, USA), and cDNA fragments of 150–200 bp were preferentially selected for PCR amplification. Finally, the PCR products were purified (AMPure XP system) and library quality was assessed on the Agilent Bioanalyzer 2100 system. RNA sequencing was performed on an Illumina HiSeq 4000 system (Illumina, San Diego, CA, USA) in 125-bp, strand-specific, paired-end mode.

RNA-seq data preprocessing

Before transcriptome assembly, the raw RNA-seq reads were filtered with Trimmomatic [10] to remove reads containing adaptors, reads with an N content > 10%, and low-quality reads in which more than 50% of the bases had a sequence quality (SQ) ≤ 5.
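
Trimmomatic applies filtering through its own processing steps (for example ILLUMINACLIP for adaptor removal), so the following is only a minimal plain-Python restatement of the three stated criteria, intended to make the thresholds unambiguous. It assumes Phred+33 quality strings, uses a hypothetical adaptor sequence, and detects adaptors by exact substring matching; none of these details come from the original pipeline.

```python
# Sketch of the read-filtering criteria stated above (not Trimmomatic's code).
# Assumptions: Phred+33 quality encoding, exact-substring adaptor matching,
# and a hypothetical adaptor sequence (the common Illumina TruSeq prefix).

ADAPTOR = "AGATCGGAAGAGC"  # hypothetical adaptor; the real adaptor list is not given


def n_fraction(seq: str) -> float:
    """Fraction of bases called as N."""
    return seq.upper().count("N") / len(seq)


def low_quality_fraction(qual: str, max_q: int = 5, offset: int = 33) -> float:
    """Fraction of bases whose Phred score is <= max_q."""
    scores = [ord(ch) - offset for ch in qual]
    return sum(q <= max_q for q in scores) / len(scores)


def keep_read(seq: str, qual: str, adaptor: str = ADAPTOR) -> bool:
    """True if the read passes all three filters described in the text."""
    if adaptor in seq.upper():
        return False  # read contains adaptor sequence
    if n_fraction(seq) > 0.10:
        return False  # N content above 10%
    if low_quality_fraction(qual) > 0.50:
        return False  # more than half of the bases have quality <= 5
    return True


if __name__ == "__main__":
    reads = {
        "clean": ("ACGTACGTAC", "IIIIIIIIII"),   # 'I' = Phred 40
        "n_rich": ("ACGTNNACGT", "IIIIIIIIII"),  # 20% N
        "low_q": ("ACGTACGTAC", "######IIII"),   # '#' = Phred 2 on 60% of bases
    }
    for name, (seq, qual) in reads.items():
        print(name, keep_read(seq, qual))
```

Only the "clean" toy read passes. The boundaries are strict, as in the text: a read with exactly 10% N, or with exactly half of its bases at quality ≤ 5, is kept.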

Construction of the integrative transcriptome

The integrative transcriptome for Bougainvillea was constructed with TransIntegrator [11] by integrating multiple RNA-seq transcriptomes. In this study, we define the term "integrative transcriptome" as the collection of all transcripts expressed across different tissues. It combines the individual tissue transcriptomes but goes beyond a simple union: redundant sequences are removed and overlapping fragments are joined into longer transcripts. The TransIntegrator algorithm is described in detail in [11]; here we briefly specify only the parameters used in this application. Construction proceeded in four consecutive steps: (1) Twenty Illumina RNA-seq datasets were each assembled de novo with Trinity (version 1.0; parameters: contig length > 200 bp, K-mer > 5, and max reads per graph > 200 bp) [12]. (2) All transcripts from the individual transcriptomes were pooled and clustered with CD-HIT-EST (version 4.5.4) [13] using a sequence identity threshold (-c) of > 90% and an alignment coverage (-aS) of > 80%. Within each cluster, the longest sequence was retained as the representative and the shorter sequences were discarded. (3) The representative transcripts were bridged into longer sequences with CAP3 (version 12/21/07, default parameters) [14] whenever two sequences shared an overlap of > 40 bp with > 90% sequence identity. (4) Finally, sequences < 300 bp were removed to yield the integrative transcriptome.
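
To make steps (2) to (4) concrete, the sketch below reimplements their logic in a deliberately simplified form; it is not the TransIntegrator, CD-HIT-EST, or CAP3 code. CD-HIT-EST's alignment is replaced by an ungapped comparison at every offset, CAP3 assembly by a single suffix-to-prefix join, and the thresholds from the text (90% identity, 80% coverage, 40 bp overlap, 300 bp minimum length) are treated as inclusive. All function names are illustrative.

```python
# Simplified sketch of steps (2)-(4); NOT the TransIntegrator, CD-HIT-EST or CAP3 code.
# Assumptions: ungapped comparison at every offset replaces real alignment,
# a single suffix/prefix join replaces CAP3 assembly, thresholds are inclusive.
from __future__ import annotations

import random
from typing import Optional


def is_redundant(short: str, long: str, min_identity: float = 0.9,
                 min_coverage: float = 0.8) -> bool:
    """True if `short` matches `long` at some ungapped offset with enough
    identity over a span that covers enough of `short`."""
    n, m = len(short), len(long)
    for offset in range(-n + 1, m):
        start, end = max(0, offset), min(m, offset + n)
        span = end - start
        if span / n < min_coverage:
            continue
        matches = sum(long[i] == short[i - offset] for i in range(start, end))
        if matches / span >= min_identity:
            return True
    return False


def greedy_cluster(seqs: dict[str, str], **kw) -> dict[str, str]:
    """CD-HIT-style greedy incremental clustering: longest sequences first;
    a sequence redundant with an existing representative is discarded."""
    reps: dict[str, str] = {}
    for name, seq in sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True):
        if not any(is_redundant(seq, rep, **kw) for rep in reps.values()):
            reps[name] = seq
    return reps


def join_overlap(a: str, b: str, min_overlap: int = 40,
                 min_identity: float = 0.9) -> Optional[str]:
    """Join a's suffix to b's prefix using the longest qualifying overlap."""
    for olen in range(min(len(a), len(b)), min_overlap - 1, -1):
        same = sum(x == y for x, y in zip(a[-olen:], b[:olen]))
        if same / olen >= min_identity:
            return a + b[olen:]
    return None


def bridge(seqs: list[str], **kw) -> list[str]:
    """Repeatedly join any ordered pair that overlaps until no join applies."""
    pool = list(seqs)
    changed = True
    while changed:
        changed = False
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                joined = join_overlap(a, b, **kw) if i != j else None
                if joined is not None:
                    pool = [s for k, s in enumerate(pool) if k not in (i, j)]
                    pool.append(joined)
                    changed = True
                    break
            if changed:
                break
    return pool


def integrate(seqs: dict[str, str], min_len: int = 300) -> list[str]:
    """Steps (2)-(4): cluster, bridge the representatives, drop short sequences."""
    reps = greedy_cluster(seqs)
    return [s for s in bridge(list(reps.values())) if len(s) >= min_len]


if __name__ == "__main__":
    rng = random.Random(7)  # toy sequences for illustration only
    base = "".join(rng.choice("ACGT") for _ in range(500))
    other = "".join(rng.choice("ACGT") for _ in range(200))
    toy = {"frag_a": base[:300], "frag_b": base[250:], "short": other}
    print([len(s) for s in integrate(toy)])
```

Sorting longest-first is what lets the greedy pass keep the longest member of each cluster as its representative, as described for step (2). Real tools add short-word filters and banded alignment so that the comparison scales to hundreds of thousands of transcripts.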

A conventional procedure was used to annotate the integrative transcriptome. Coding transcripts were identified with Annocript (version 1.1.3; default parameters) [15] using the SwissProt and Pfam databases. Ribosomal RNAs (rRNAs) were annotated with RNAmmer (version 1.0.0; default parameters) [16], and transfer RNAs (tRNAs) with tRNAscan-SE (version 2.1.3; default parameters) [17]. ncRNAs and miRNA precursors were identified with BLAST (version 1.3.1; e-value: 1e−5, identity: 90%) against the NONCODE database [18] and miRBase [19]. All annotations were then integrated, discarding low-confidence annotations according to the default thresholds of the corresponding annotation tools.
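
The BLAST-based step can be illustrated by filtering tabular BLAST output with the stated thresholds (e-value ≤ 1e−5, identity ≥ 90%). This sketch assumes the default column order of BLAST+ `-outfmt 6` and, as an additional assumption, keeps only the highest-scoring passing hit per transcript; the original text does not say how multiple hits were resolved.

```python
# Filter tabular BLAST hits by the thresholds in the text; a sketch.
# Assumptions: BLAST+ "-outfmt 6" default columns, and keeping only the best
# (highest bit score) passing hit per query.
from __future__ import annotations

from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Hit]:
    """Columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        yield Hit(f[0], f[1], float(f[2]), float(f[10]), float(f[11]))


def best_confident_hits(lines: Iterable[str], max_evalue: float = 1e-5,
                        min_identity: float = 90.0) -> dict[str, Hit]:
    """Best-scoring hit per query among hits passing both thresholds."""
    best: dict[str, Hit] = {}
    for hit in parse_outfmt6(lines):
        if hit.evalue > max_evalue or hit.pident < min_identity:
            continue  # low-confidence hit: discard
        if hit.query not in best or hit.bitscore > best[hit.query].bitscore:
            best[hit.query] = hit
    return best
```

Queries with no passing hit simply do not appear in the result, which corresponds to leaving that transcript without an ncRNA or miRNA-precursor annotation.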

Validation of the integrative transcriptome

We validated the integrative transcriptome both computationally and experimentally. Computationally, we mapped the transcriptomic reads against all assembled transcripts with BLASTn to confirm that every base of each transcript was supported by multiple (≥ 5) reads. In addition, we assessed the integrative transcriptome (as a gene set) with BUSCO (version 4.1.4; lineage = "embryophyta_odb10") [20]. For experimental validation, we selected transcripts that met the following criteria: (1) low similarity (identity < 50%) to the matched A. thaliana homolog; (2) a length substantially longer or shorter than that of the A. thaliana homolog; and (3) a length that allows the transcript to be fully sequenced in a single Sanger read. Accordingly, eight transcripts were selected; because several fall into more than one category, they comprise three transcripts dissimilar to their A. thaliana homologs (NBP35, RGL2, and ATZNMP), four shorter transcripts (PAF2, PGY2, NBP35, and ATZNMP), three longer transcripts (EXO70C1, IMPA, and RGL2), and one nearly identical transcript (PRT6). The eight transcripts were reverse-transcribed into cDNA (TransGen, China) from the extracted total RNA and amplified by the polymerase chain reaction (PCR) with specifically designed primers (Supplementary Table 1). The size of each PCR product was checked by agarose gel electrophoresis, and products of the expected size were sequenced by Sanger sequencing.
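
The computational check that every base is supported by at least five reads amounts to a per-base depth calculation. The sketch below assumes that BLASTn hits have already been reduced to intervals on each transcript (a helper converts BLAST's 1-based, inclusive subject coordinates, which are reversed for minus-strand hits) and counts depth with a difference array; it is an illustration, not the authors' script.

```python
# Per-base read support check; a sketch, not the authors' script.
# Assumption: each read alignment is reduced to a 0-based, half-open
# (start, end) interval on the transcript.
from __future__ import annotations


def blast_to_interval(sstart: int, send: int) -> tuple[int, int]:
    """Convert 1-based inclusive BLAST subject coordinates (reversed on the
    minus strand) into a 0-based half-open interval."""
    lo, hi = sorted((sstart, send))
    return lo - 1, hi


def per_base_depth(length: int, intervals) -> list[int]:
    """Depth at every transcript position, computed with a difference array."""
    diff = [0] * (length + 1)
    for start, end in intervals:
        start, end = max(0, start), min(length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth


def fully_supported(length: int, intervals, min_depth: int = 5) -> bool:
    """True if every base is covered by at least `min_depth` reads."""
    return min(per_base_depth(length, intervals), default=0) >= min_depth
```

The difference array keeps the check linear in transcript length plus the number of reads, rather than incrementing every covered position once per read.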

Integrative transcriptome-based transcriptome assembly and performance evaluation

Reference-based transcriptome assembly is usually more accurate and more stable than de novo assembly. To make the integrative transcriptome usable as an alternative reference, we annotated it and formatted it as two files, BougainvilleaXbuttiana_Manila.fa and BougainvilleaXbuttiana_Manila.gtf, which can be downloaded from the project website. We then performed reference-based transcriptome assembly using the processed integrative transcriptome as the reference: reads were aligned with HISAT2 (version 2.1.0; default parameters) [21], transcripts were assembled and their abundances estimated with StringTie (version 1.3.4d; default parameters) [22], and expression was quantified with the R package Ballgown (default parameters) [23].
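
How a transcript collection can serve as an alignment reference is sketched below. The original does not describe the contents of the GTF file, so this sketch assumes each transcript is its own reference sequence carrying one transcript record and one exon record that span its full length on the + strand. The index prefix, read file names, thread count, and the samtools sorting step are hypothetical, and StringTie's `-e -B` options are added because StringTie needs `-B` to write the tables that Ballgown reads.

```python
# Sketch: preparing a transcript FASTA as a HISAT2/StringTie reference.
# Assumptions (hypothetical): one transcript + one exon record per sequence on
# the + strand, the index/read file names below, and the samtools sorting step.
from __future__ import annotations


def read_fasta_lengths(lines) -> dict[str, int]:
    """Sequence lengths keyed by the first word of each FASTA header."""
    lengths: dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def gtf_lines(lengths: dict[str, int], source: str = "integrative") -> list[str]:
    """One transcript and one exon record per sequence (GTF is 1-based, inclusive)."""
    records = []
    for tid, n in lengths.items():
        attrs = f'gene_id "{tid}"; transcript_id "{tid}";'
        for feature in ("transcript", "exon"):
            fields = [tid, source, feature, "1", str(n), ".", "+", ".", attrs]
            records.append("\t".join(fields))
    return records


def pipeline_commands(ref_fa: str, ref_gtf: str, r1: str, r2: str,
                      sample: str, threads: int = 4) -> list[list[str]]:
    """Command lines for index, alignment, sorting and StringTie (not executed)."""
    index = "integrative_index"  # hypothetical index prefix
    return [
        ["hisat2-build", ref_fa, index],
        ["hisat2", "-p", str(threads), "-x", index,
         "-1", r1, "-2", r2, "-S", f"{sample}.sam"],
        ["samtools", "sort", "-o", f"{sample}.bam", f"{sample}.sam"],
        # -e: estimate abundances of reference transcripts only;
        # -B: write Ballgown input tables next to the output GTF
        ["stringtie", f"{sample}.bam", "-G", ref_gtf, "-e", "-B",
         "-o", f"{sample}/{sample}.gtf"],
    ]


if __name__ == "__main__":
    demo_fasta = [">tx1 example", "ACGTACGT", "ACG", ">tx2", "AAAAAAAA"]
    for record in gtf_lines(read_fasta_lengths(demo_fasta)):
        print(record)
    for cmd in pipeline_commands("BougainvilleaXbuttiana_Manila.fa",
                                 "BougainvilleaXbuttiana_Manila.gtf",
                                 "sample_1.fq", "sample_2.fq", "sample"):
        print(" ".join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)`; the per-sample directories written by StringTie can then be loaded in R with Ballgown.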

The quality of the transcriptomes assembled by different methods was evaluated by examining transcript completeness and fragmentation ratio with BUSCO (version 4.1.4; lineage = "embryophyta_odb10", number of BUSCOs = 1614) [20]. Before the BUSCO evaluation, transcripts < 300 bp were excluded from all transcriptomes because the transcript length distributions differed markedly among them (Fig. 4a). In addition, the number of potential bridges and the assembly score were determined with TransRate (version 1.3, default parameters) [24]. A high-quality transcriptome is expected to show higher completeness, a lower fragmentation ratio, fewer potential bridges, and a higher assembly score. As controls, we also performed de novo transcriptome assembly, the only applicable approach to date, with Trinity (version 2.8.5; default parameters) and Velvet (version 1.2.10; default parameters) [25]. The quality assessment was conducted on an external Illumina sequencing dataset generated from young leaves of Bougainvillea × buttiana (SRA: SRR10076832), the only other transcriptomic experiment on the same species as this study. For a fair evaluation, this external dataset was also incorporated into the integrative transcriptome to capture leaf-specific transcripts.
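
Completeness and fragmentation ratio can be read directly from BUSCO's counts. BUSCO classifies each of the n = 1614 embryophyta BUSCOs as complete (single-copy or duplicated), fragmented, or missing; this sketch assumes completeness is (single + duplicated)/n and fragmentation ratio is fragmented/n, both as percentages, and also shows the < 300 bp pre-filter. Class and function names are illustrative.

```python
# BUSCO-derived quality ratios; a sketch with illustrative names.
# Assumption: completeness = (single + duplicated) / n and
# fragmentation ratio = fragmented / n, both expressed as percentages.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BuscoCounts:
    single: int      # complete, single-copy
    duplicated: int  # complete, duplicated
    fragmented: int
    missing: int

    @property
    def total(self) -> int:
        return self.single + self.duplicated + self.fragmented + self.missing

    def completeness(self) -> float:
        return 100.0 * (self.single + self.duplicated) / self.total

    def fragmentation_ratio(self) -> float:
        return 100.0 * self.fragmented / self.total


def drop_short(lengths: dict[str, int], min_len: int = 300) -> dict[str, int]:
    """Remove transcripts shorter than min_len bp before running BUSCO."""
    return {tid: n for tid, n in lengths.items() if n >= min_len}
```

Because all four categories sum to n, the two ratios computed from the same BUSCO run are directly comparable across the Trinity, Velvet, and integrative transcriptome-based assemblies.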

Database construction

For user convenience, the transcript library is presented as an online interactive database, InTransBo, built on a Linux–Apache–JSP platform. MySQL was used to manage data storage, access, and maintenance, and efficient, user-friendly interfaces for interactive transcript search and retrieval were implemented in JavaScript.
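
A minimal stand-in for the kind of transcript table and keyword search such a database provides is sketched below. It substitutes Python's built-in SQLite for MySQL so that it runs without a server, and the table and column names are hypothetical because the InTransBo schema is not described; queries are parameterized so that user input cannot inject SQL.

```python
# Minimal stand-in for a transcript table with keyword search; a sketch.
# Assumptions: SQLite replaces MySQL, and the schema below is hypothetical.
import sqlite3

SCHEMA = """
CREATE TABLE transcript (
    id TEXT PRIMARY KEY,
    length INTEGER NOT NULL,
    annotation TEXT,
    sequence TEXT NOT NULL
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and create the transcript table."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_transcript(conn: sqlite3.Connection, tid: str, annotation: str,
                   sequence: str) -> None:
    """Insert one transcript; its length is derived from the sequence."""
    conn.execute("INSERT INTO transcript VALUES (?, ?, ?, ?)",
                 (tid, len(sequence), annotation, sequence))


def search(conn: sqlite3.Connection, keyword: str) -> list:
    """Keyword search over IDs and annotations (SQLite LIKE ignores ASCII case)."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT id, length, annotation FROM transcript "
        "WHERE id LIKE ? OR annotation LIKE ? ORDER BY id",
        (pattern, pattern))
    return rows.fetchall()
```

In a MySQL deployment the same parameterized query pattern applies, although whether `LIKE` is case-insensitive there depends on the column collation.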

Statement of consent

This study on Bougainvillea complies with relevant institutional, national, and international guidelines and legislation. All samples in this study were collected from potted Bougainvillea provided by The Wanyin Environmental Technology Limited Company.
