Fish collection and acclimatisation
Healthy three-month-old L. rohita fingerlings weighing 10 ± 2 g were collected from the Powarkheda Regional Centre of ICAR-CIFE, Madhya Pradesh, India. Laboratory conditions for fingerling acclimatisation included 24 h aeration, 12 h daylight, 10% daily water exchange, a water temperature of 28–30 °C and feeding twice daily at 2% of body weight. After seven days of acclimatisation, five healthy fish were placed in an aquarium under starving conditions for one day and then euthanised for sample collection. Nineteen different types of samples were collected, as shown in Table 1: one whole embryonic tissue sample, blood plasma and 17 tissues. Fifteen of the tissues were collected from fingerlings, whereas plasma and gonadal tissues were collected from adult fish. Blood plasma was collected from female fish, and embryos were sampled four days after fertilisation. Collected samples were stored at −80 °C until further use.
Protein extraction for in-depth proteomic profiling
For protein extraction, organ-wise samples collected from individual fish were pooled and taken forward. Tissue was lysed in urea buffer containing 8 M urea, 50 mM Tris-HCl, 1 mM MgCl2 and 75 mM NaCl. For fifteen of the tissues (spleen, spinal cord, skin, scales, muscle, male gonad, liver, kidney, heart, gut, gill, female gonad, eye, brain and air bladder), the pH shift solubilisation method20 was used for protein extraction: proteins were extracted using urea buffer at three different pH values, i.e., pH 2.5, 8 and 13. To around 75–100 mg of tissue sample, 300 µl of lysis buffer was added, followed by sonication 2–3 times (Vibra-Cell™ Ultrasonic Liquid Processor, VCX 130, Sonics). The sample was then bead-beaten using zirconium/silica beads (Cat. No. 11079110z) for 90 s, followed by centrifugation at 8000 rpm at 4 °C for 15 min to obtain a clear supernatant containing proteins. For the embryo sample, whole embryos were processed using the Trizol method21 of protein extraction. Plasma samples were taken directly (without any depletion) for downstream analysis.
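As a worked example of the buffer composition above, the following sketch computes the mass of each solute needed for a chosen batch volume. The molar masses are standard values for Tris hydrochloride and the anhydrous salts; the 10 ml batch volume and the function name are assumptions for illustration, and the text does not state which salt forms were used (for instance, MgCl2 hexahydrate would need a different molar mass). Adjusting the buffer to pH 2.5, 8 or 13 is a separate step that is not modelled here.

```python
# Molar masses in g/mol (standard values; the salt forms are assumed, not stated in the text).
MOLAR_MASS = {
    "urea": 60.06,
    "Tris-HCl": 157.60,
    "MgCl2 (anhydrous)": 95.21,
    "NaCl": 58.44,
}

# Final concentrations of the lysis buffer described above, in mol/l.
CONCENTRATION = {
    "urea": 8.0,
    "Tris-HCl": 0.050,
    "MgCl2 (anhydrous)": 0.001,
    "NaCl": 0.075,
}


def solute_masses_mg(volume_ml):
    """Return the mass in mg of each solute for a batch of `volume_ml` millilitres."""
    if volume_ml <= 0:
        raise ValueError("volume_ml must be positive")
    volume_l = volume_ml / 1000.0
    # mass (g) = concentration (mol/l) * volume (l) * molar mass (g/mol); * 1000 gives mg
    return {
        name: CONCENTRATION[name] * volume_l * MOLAR_MASS[name] * 1000.0
        for name in CONCENTRATION
    }


if __name__ == "__main__":
    # 10 ml is a hypothetical batch size chosen only for illustration.
    for name, mg in solute_masses_mg(10.0).items():
        print(f"{name}: {mg:.1f} mg")
```

At 8 M, urea dominates the recipe (roughly 4.8 g per 10 ml), so in practice the solution is usually brought to final volume after the urea has dissolved rather than by adding solids to a fixed volume of water.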
Protein quantification and quality check on SDS-PAGE
Protein quantification was performed by the Bradford protein assay, using bovine serum albumin (BSA) as a standard. Absorbance was measured at 595 nm, a standard curve was plotted using BSA dilutions, and the concentration of each unknown sample was determined from it. To check the quality of the protein extracts, one-dimensional SDS-PAGE was performed, for which 15 µg of protein from each sample was loaded onto a mini-vertical gel (Bio-Rad Mini PROTEAN® 3 Cell, Bio-Rad Laboratories) in accordance with the Laemmli protocol22. As the extracted protein was in urea-containing buffer, no heating step was performed before SDS-PAGE, to avoid the risk of carbamylation. Gel electrophoresis was performed for 1–2 hours, followed by staining in Coomassie blue R350 solution in methanol and acetic acid. The gel was destained to visualise the protein bands (Supplementary Fig. S1a).
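The standard-curve step described above can be sketched as a least-squares line fitted to the BSA standards, followed by interpolation of an unknown and calculation of the volume needed for a 15 µg gel load. The standard concentrations and absorbance readings below are made-up, perfectly linear values chosen for illustration, and the function names are hypothetical. A simple linear fit is also an assumption: real Bradford curves flatten at higher concentrations, and the text does not say which fitting model was used.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_a595(a595, slope, intercept, dilution_factor=1.0):
    """Invert the standard curve to get the concentration of an unknown sample."""
    return (a595 - intercept) / slope * dilution_factor


# Hypothetical BSA standards (mg/ml) and blank-corrected A595 readings.
bsa_mg_per_ml = [0.0, 0.25, 0.5, 1.0, 2.0]
a595_standards = [0.0, 0.125, 0.25, 0.5, 1.0]

slope, intercept = fit_line(bsa_mg_per_ml, a595_standards)

# Hypothetical unknown sample reading.
unknown_mg_per_ml = concentration_from_a595(0.375, slope, intercept)

# Volume needed for a 15 µg gel load; mg/ml is numerically equal to µg/µl.
volume_ul_for_15_ug = 15.0 / unknown_mg_per_ml

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"unknown={unknown_mg_per_ml:.2f} mg/ml, load volume={volume_ul_for_15_ug:.1f} µl")
```

Readings that fall outside the range of the standards should be re-measured after dilution rather than extrapolated, which is why the sketch exposes a `dilution_factor` argument.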
Fractionation, in-gel digestion and peptide preparation
For in-gel digestion, 30 µg of protein from each sample was run on SDS-PAGE as above, except that electrophoresis was performed for only 30–40 minutes, i.e., until the sample had migrated ~1 cm into the resolving gel. Each sample was run in duplicate and at least six slices per lane were excised (Fig. 1a). For the plasma sample, 11 gel fractions were processed for in-gel digestion. Before digestion, the stain was removed and the proteins were reduced and alkylated. The gel pieces were destained by alternating treatments with ammonium bicarbonate (NH4HCO3) buffer and the organic solvent acetonitrile (ACN). Proteins were reduced using dithiothreitol (DTT) and alkylated using iodoacetamide (IAA). For protein digestion, trypsin was used at an enzyme-to-protein ratio of ~1:30 (w/w). After 16–18 hours of digestion, peptides were extracted from the gel pieces using an increasing gradient of ACN solution. Peptides were desalted using C18 Empore™ SPE Disks matrix (Merck). Peptide quantification was done using the Scopes method23, and 1 µg of peptide was subjected to mass spectrometric analysis.
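As a small arithmetic illustration of the ~1:30 (w/w) enzyme-to-protein ratio above, the sketch below computes the trypsin mass for a given protein load. The function name is hypothetical, and the calculation is applied to the full 30 µg lane load; how the enzyme was divided among the individual gel slices is not stated in the text.

```python
def trypsin_mass_ug(protein_ug, protein_per_enzyme=30.0):
    """Mass of trypsin (µg) for a 1:`protein_per_enzyme` (w/w) enzyme-to-protein ratio."""
    if protein_ug < 0 or protein_per_enzyme <= 0:
        raise ValueError("protein_ug must be >= 0 and protein_per_enzyme must be > 0")
    return protein_ug / protein_per_enzyme


if __name__ == "__main__":
    # 30 µg of protein per lane, as loaded for in-gel digestion.
    print(f"{trypsin_mass_ug(30.0):.2f} µg trypsin per lane")
```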


Fig. 1. An overview of the experimental design and analysis workflow. (a) Fish were dissected to collect the tissues/samples, followed by protein extraction and SDS-PAGE. Gel slices were excised and processed for in-gel tryptic digestion, followed by liquid chromatography tandem mass spectrometry (LC-MS/MS) and analysis in the Trans-Proteomic Pipeline (TPP). (b) Raw data obtained from DDA-MS were processed along the pipeline for building the PeptideAtlas. Raw files were first converted to mzML, followed by a Comet search and an analysis pipeline including PeptideProphet, reSpect, iProphet and ProteinProphet, and final filtering and validation to compile the atlas.
Data-dependent acquisition by liquid chromatography tandem mass spectrometry (LC-MS/MS)
An Easy-nLC 1200 nano-flow liquid chromatography system was used to separate the peptides obtained from in-gel digestion (Fig. 1a). One µg of desalted peptides was loaded at a flow rate of 5 µl/min onto the pre-analytical column (Thermo Scientific, PN 164564-CMD, trap column nanoViper C18, 5 µm, 100 Å, Acclaim PepMap 100, 100 µm × 2 cm). The peptides were run over a 120 min gradient of solvent B, a solution of 80% ACN with 0.1% formic acid (FA). The flow rate was kept at 300 nl/min for resolving peptides on the analytical column (Thermo Scientific, PN ES903, C18, 75 μm × 50 cm, 2 μm particle, PepMap RSLC, 100 Å pore size). Mass spectrometric data were acquired using the Orbitrap mass analyser in DDA mode over a full scan range of 375–1700 m/z at a mass resolution of 60,000. For dynamic exclusion, the mass tolerance was set to ± 10 for 40 s, and for MS2 precursors the isolation mass window was set to 1.2 Da. Higher-energy collisional dissociation (HCD) was used for MS/MS fragmentation. The AGC target was set to 400000 for MS1 and 10000 for MS2. A lock mass of 445.12003 m/z was used for positive internal calibration.
The mass spectrometric data used in this study for developing the PeptideAtlas of Labeo rohita have also been utilised for tissue-wise profiling of post-translational modifications (PTMs) and comparative protein expression analysis, as reported in our recent study24.
Protein identification, TPP analysis and PeptideAtlas assembly
The raw mass spectrometry data (.raw) generated from the Orbitrap Fusion mass spectrometer were converted to .mzML files using the MSConvert 3.0.5533 tool25. The converted mzML files were searched using Comet (2019.01 rev. 1)26 against the L. rohita NCBI protein database. This database consisted of protein sequences generated by translating the coding sequences (CDS) obtained through gene prediction after whole genome sequencing of Labeo rohita (BioProject: PRJNA437789). The database had locus tag IDs (prefix Rohu_) and EMBL/GenBank/DDBJ CDS IDs (prefix RXN). The UniProt database for this species (Proteome ID UP000290572) provides a UniProt protein identifier for each CDS. The NCBI database had 32687 entries, and the UniProt database, which was downloaded on 16th August 2019, had 32379 entries and is a subset of the NCBI database. The initial Comet search used the NCBI database, whereas all downstream steps, including protein identification and PeptideAtlas assembly, were performed using the combined NCBI and UniProt database. We used the combined database so that proteins not yet included in the UniProt database could also be covered in the PeptideAtlas build.
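The idea of combining the two databases described above can be illustrated with a minimal sketch that parses two FASTA texts, merges them by identifier, and reports which entries of the larger database have no identical sequence in the smaller one. The identifiers, headers and sequences below are invented toy data, the function names are hypothetical, and the text does not describe how the combined database was actually assembled (for example, whether identical sequences were collapsed).

```python
def parse_fasta(text):
    """Parse FASTA text into {identifier: sequence}; identifier is the first header token."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {ident: "".join(chunks) for ident, chunks in records.items()}


def combine(first, second):
    """Merge two databases by identifier; entries with distinct identifiers are all kept."""
    combined = dict(first)
    combined.update(second)
    return combined


def ids_without_matching_sequence(first, second):
    """Identifiers in `first` whose sequence does not occur anywhere in `second`."""
    second_sequences = set(second.values())
    return {ident for ident, seq in first.items() if seq not in second_sequences}


# Toy inputs with made-up identifiers and sequences (not real Labeo rohita entries).
ncbi_text = ">RXN00001.1 hypothetical protein\nMKT\nAAA\n>RXN00002.1 hypothetical protein\nMPPQ\n"
uniprot_text = ">tr|A0A000|A0A000_LABRO hypothetical protein\nMKTAAA\n"

ncbi = parse_fasta(ncbi_text)
uniprot = parse_fasta(uniprot_text)
combined = combine(ncbi, uniprot)

print(len(combined), "entries in the combined database")
print("NCBI-only sequences:", sorted(ids_without_matching_sequence(ncbi, uniprot)))
```

Keeping both identifier sets means a peptide can be reported against either naming scheme, while the NCBI-only check shows which proteins would be missed if the UniProt set alone were used.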
To the protein database, an equal number of decoy sequences, together with contaminant sequences, was added. Decoy sequences were generated using the “randomize sequences and interleave entries” decoy algorithm, whereas the contaminant sequences were taken from the common Repository of Adventitious Proteins (cRAP) database (http://www.thegpm.org/crap/). The parameters used for data analysis in the Trans-Proteomic Pipeline (TPP) suite were: peptide mass tolerance 20 ppm; fragment ion bin tolerance 0.05 m/z with a monoisotopic mass offset of 0.0 m/z; two allowed missed cleavages; fully tryptic and semi-tryptic peptides; oxidation of tryptophan and methionine (+15.994915 Da) as variable modifications; and carbamidomethylation of cysteine (+57.021464 Da) as a static modification. Protein identification was performed using TPP v5.2.0 Flammagenitus27. Peptide-spectrum matches (PSMs) were scored using the integrated tools PeptideProphet and iProphet, which were applied to individual files and used to score unique peptides in the combined PeptideProphet files. Finally, ProteinProphet was used for protein identification with the iProphet results as input, and true identifications were selected at less than 1% FDR28,29,30. The whole workflow is represented in Fig. 1b.
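Two of the ideas above lend themselves to a short sketch: building an interleaved target-decoy database from randomised sequences, and converting a ppm tolerance into an absolute mass window. The code below is an illustrative approximation, not the TPP decoy tool itself; the `DECOY_` prefix, the fixed random seed and the function names are assumptions, and the actual algorithm may differ in detail (for instance, in how cleavage sites are handled).

```python
import random


def make_decoy(sequence, rng):
    """Shuffle the residues of a sequence, preserving its length and composition."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


def interleave_with_decoys(targets, seed=0, prefix="DECOY_"):
    """Return a list in which each (id, sequence) target is followed by its decoy.

    `prefix` is a hypothetical decoy label; `seed` makes the output reproducible.
    """
    rng = random.Random(seed)
    entries = []
    for ident, seq in targets:
        entries.append((ident, seq))
        entries.append((prefix + ident, make_decoy(seq, rng)))
    return entries


def ppm_window(mass, ppm):
    """Return the (low, high) mass bounds for a symmetric tolerance given in ppm."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


if __name__ == "__main__":
    # Toy target entries with made-up identifiers and sequences.
    toy_targets = [("prot1", "MKTAYIAKQR"), ("prot2", "GLSDGEWQLV")]
    for ident, seq in interleave_with_decoys(toy_targets):
        print(ident, seq)
    # A 20 ppm window around a hypothetical 1000 Da peptide mass.
    print(ppm_window(1000.0, 20.0))
```

Because every target contributes exactly one decoy, the decoy set matches the target set in size and amino acid composition, which is what makes decoy hits usable as a model of random matches.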
Chimeric spectra were assessed by reanalysing the iProphet files using the reSpect algorithm31. In brief, the reSpect search was performed on the iProphet files with the precursor mass tolerance increased to 3.0 Da. TPP analysis was performed as described above, and the cycle of reSpect and TPP analysis was repeated once. A minimum iProphet probability of ≥ 0.0 was used for the reSpect search. The PeptideAtlas processing pipeline was used to build the PeptideAtlas by combining the iProphet results from the regular TPP and reSpect searches. The spectra were filtered at a variable probability threshold to obtain a constant peptide-spectrum match (PSM) FDR of 0.0008% for each experiment. The statistically significant results were organised in the “Rohu PeptideAtlas”, which is built and maintained by ISB and is available at http://www.peptideatlas.org/builds/rohu/.
Ortholog analysis for the identified proteome
Ortholog analysis for the total canonical proteins was performed using the eggNOG-mapper genome-wide functional annotation tool32 (http://eggnog-mapper.embl.de/). First, the FASTA sequences of all the protein IDs were acquired from UniProt33 and used as the input list (Supplementary Table S1). For this analysis, the taxonomic scope was set to Actinopterygii, the orthology restriction to ‘transfer annotation from any ortholog’, and the seed ortholog detection criterion to 0.001.
Acquisition of selected reaction monitoring (SRM) data for targeted verification
The targeted proteomic data were acquired using a Thermo TSQ Altis Triple Quadrupole Mass Spectrometer coupled to a Thermo Vanquish HPLC system, in SRM/MRM (selected/multiple reaction monitoring) acquisition mode. A Hypersil GOLD analytical column (Thermo Fisher Scientific, 100 × 2 mm, C18) was used for the reverse-phase separation of peptides. Samples were run at a flow rate of 450 µl/min. One µg of desalted peptide sample was loaded onto the column and run for 10 minutes. The mobile phases were 0.1% formic acid (FA) in Milli-Q water as solvent A and 80% acetonitrile (ACN) with 0.1% FA as solvent B. Throughout the run, the column temperature was set to 45 °C and the cycle time was kept at 2 s. The Skyline-daily software34 (version 20.2.1) was used to analyse the data.

