Preloader

Integration of genome-wide association studies and gene coexpression networks unveils promising soybean resistance genes against five common fungal pathogens

Data summary and genomic distribution of SNPs

After filtering the datasets to keep only fungal species represented by both SNP and transcriptome information, we kept five common phytopathogenic fungi: Cadophora gregata, Fusarium graminearum, Fusarium virguliforme, Macrophomina phaseolina, and Phakopsora pachyrhizi (Fig. 1A). Overall, SNPs were located in gene-rich regions of the genome (Fig. 1B). SNPs were unevenly distributed across chromosomes, except for F. virguliforme (Fig. 1C). Further, we found that most SNPs were located in intergenic regions (Fig. 1D). Hence, predicting SNP effect on genes would not be suitable for this trait.

Figure 1
figure1

Data summary and genomic distribution of SNPs. (A) Frequency of SNPs and RNA-seq samples included in this study. (B) Genomic coordinates of resistance SNPs against each fungal pathogen. The outer track represents gene density, whereas inner tracks represent the SNP positions for each species. (C) SNP distribution across chromosomes. Overall, there is an uneven distribution of SNPs across chromosomes. (D) Genomic location of SNPs. Most SNPs are located in intergenic regions.

Candidate gene mining reveals a highly species-specific immune response

Using defense-related genes as guides, the cageminer algorithm identified 188, 56, 11, 8, and 3 high-confidence genes for F. virguliforme, F. graminearum, C. gregata, M. phaseolina, and P. pachyrhizi, respectively (Fig. 2). Only three genes were shared between species, revealing a high specificity in plant-pathogen interactions for these species. The three genes are shared by F. virguliforme and F. graminearum, suggesting that some conservation can occur at the genus level, but not at other broader taxonomic levels.

Figure 2
figure2

Venn diagram of prioritized candidate resistance genes against each species. The diagram demonstrates a high species-specific response to each pathogen, as genes are mostly not shared. Only three genes are shared between F. graminearum and F. virguliforme, suggesting some conservation at the genus level.

The specificity of resistance genes to particular species has been widely reported26,27,28,29. This phenomenon imposes a challenge for biotechnological applications, as it requires pyramiding many different genes to render elite cultivars resistant to different pathogens. However, we cannot rule out that the species-specific trend we observed results from low diversity in the association panels in the GWAS we analyzed. Additionally, as SNP and transcriptome data are not available for multiple pathogen strains, we might overlook broad-spectrum resistance genes that confer resistance to multiple strains of the same species27.

Further, we manually curated the high-confidence candidate resistance genes to predict the putative role of their products in plant immunity (Supplementary Table S4). Most of the prioritized candidates (28%) encode proteins involved in immune signaling, although this does not apply to all fungi species (Fig. 3). The main discrepancy in the functional classification of candidates was observed for candidate resistance genes against P. pachyrhizi. However, this is likely due to sampling bias, as the number of SNPs associated with resistance to P. pachyrhizi is limited as compared to other species. Candidates also encode proteins that play a role in recognition, phytohormone metabolism, systemic acquired resistance, transport, transcriptional regulation, oxidative stress, apoptosis, physical defense, and direct function against fungi (Fig. 3).

Figure 3
figure3

Prioritized candidate resistance genes and their putative role in plant immunity. Numbers in circles represent absolute frequencies of resistance genes against C. gregata (blue), F. graminearum (red), F. virguliforme (green), M. phaseolina (purple), and P pachyrhizi (turquoise). PRR, pattern recognition receptor. PAMP, pathogen-associated molecular pattern. MAPKKK, mitogen-activated protein kinase kinase kinase. MAPKK, mitogen-activated protein kinase kinase. MAPK, mitogen-activated protein kinase. SAR, systemic acquired resistance. RBOH, respiratory burst oxidase homolog. ROS, reactive oxygen species. RLK, receptor-like kinase. PR, pathogenesis-related. Figure designed with Biorender (biorender.com).

Interestingly, 21 candidate genes lack functional description and, hence, we could not infer their roles in plant immunity (n = 2, 4, 14, and 1 for C. gregata, F. virguliforme, and P. pachyrhizi, respectively). Nevertheless, as they were identified as high-confidence candidate genes, we hypothesize that they encode defense-related proteins. This finding reveals that besides the identification of high-confidence candidate genes, our algorithm can serve as a network-based approach to predict functions of unannotated genes, similar to previous approaches30,31.

We also developed a scheme that was used to rank high-confidence candidate genes (Table 2). Ranking candidates is particularly useful to prioritize genes when there are several candidates, such as for F. virguliforme and F. graminearum. Here, we suggest using the top 10 candidate resistance genes against each pathogen for experimental validation in future studies. Experimental tests with transgenic or edited soybeans using our set of target genes will likely reveal which genes are more suitable to develop soybean lines with increased resistance to each fungal disease.

Table 2 Top 10 candidate resistance genes against each fungal species and their putative roles in plant immunity.

Pangenome presence/absence variation analysis demonstrates that most prioritized genes are core genes

We analyzed PAV patterns for our prioritized candidate genes in the recently published pangenome of cultivated soybeans to unveil which soybean genotypes contain prioritized candidate genes and explore gene presence/absence variation patterns across genomes18. We found that most candidates are present in all 204 accessions (Supplementary Fig. 1A). This trend is not surprising, as the gene content in this pangenome is highly conserved, with ~ 91% of the genes being shared by > 99% of the genomes. Although the variable genome is enriched in genes associated with defense, signaling, and plant development, this trend was not found in our gene set.

Further, we investigated if gene PAV patterns could be explained by the geographical origins of the accessions (Supplementary Fig. 1B). We observed no clustering by geographical origin, suggesting that gene PAV is not affected by population structure. As this pangenome is comprised of improved soybean accessions18, the lack of population structure effect can be due to breeding programs targeting optimal adaptation to different environmental conditions (e.g., latitude and climate), even if they are in the same country.

Screening of the USDA germplasm reveals a room for genetic improvement

We inspected the USDA germplasm to find the top 5 most resistant genotypes against each fungal pathogen (see Materials and Methods for details). Strikingly, the most resistant genotypes do not contain all resistance alleles, revealing that, theoretically, they could be further improved to increase resistance (Table 3). All resistance-associated SNPs against P. pachyrhizi are present in some accessions, but this is because only two SNPs have been reported for this species. Additionally, none of the reported SNPs for F. graminearum have been identified in the SoySNP50k collection. Hence, we could not predict the most resistant accessions to this fungal species in the USDA germplasm.

Table 3 Top 5 most resistant soybean accessions against each fungal pathogen.

Although some individual genes can confer full race-specific resistance to some pathogens, their durability in the field is often short because of pathogen evolution27. Thus, pyramiding quantitative trait loci (QTL) that confer partial resistance has been proposed as a strategy to confer long-term resistance28. To accomplish this, the most resistant genotypes identified here can be targets of allele pyramiding in breeding programs using marker-assisted selection. Alternatively, these genotypes might have their genomes edited with CRISPR/Cas systems to introduce beneficial alleles or remove deleterious alleles, ultimately boosting resistance.

Development of a user-friendly web application for network exploration

To facilitate network exploration and data reuse, we developed a user-friendly web application named SoyFungiGCN (https://soyfungigcn.venanciogroup.uenf.br/). Users can input a soybean gene of interest (Wm82.a2.v1 assembly) and visualize the gene’s module, scaled intramodular degree, and hub status (Fig. 4A). Additionally, users can explore enriched GO terms, Mapman bins and/or Interpro domains associated with the input gene’s module (Fig. 4A). Users can also visualize a network plot with the input gene and its coexpression neighbors (Fig. 4B). This resource can be particularly useful for researchers studying soybean response to other fungal species, as they can check if their genes of interest are located in defense-related coexpression modules. Also, researchers studying other species can verify if the soybean ortholog of their genes of interest is located in a defense-related module. The application is also available as an R package named SoyFungiGCN (https://github.com/almeidasilvaf/SoyFungiGCN). This package lets users run the application locally as a Shiny app, ensuring the application will always be available, even in case of server downtime.

Figure 4
figure4

Functionalities in the SoyFungiGCN web application. A. Screenshot of the page users see when they access the application. In the sidebar, users can specify the ID of a gene of interest (Wm82.a2.v1 assembly). For each gene, users can see the gene’s module (orange box), scaled degree (red box), hub gene status (green box), and an interactive table with enrichment results for MapMan bins, Interpro domains and Gene Ontology terms associated the gene’s module. P values from enrichment results are adjusted for multiple testing with Benjamini–Hochberg correction. B. Network visualization plot. Users can optionally visualize the input gene and its position in the module by clicking the plus (+) icon in the “Network visualization” tab below the enrichment table. As the plot can take a few seconds to render (~ 2–5 s), it is hidden by default.

Source link