1
Billows N, Phelan J, Xia D, Peng Y, Clark TG, Chang YM. Large-scale statistical analysis of Mycobacterium tuberculosis genome sequences identifies compensatory mutations associated with multi-drug resistance. Sci Rep 2024; 14:12312. [PMID: 38811658 PMCID: PMC11137121 DOI: 10.1038/s41598-024-62946-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Accepted: 05/22/2024] [Indexed: 05/31/2024]
Abstract
Tuberculosis (TB), caused by Mycobacterium tuberculosis, has a significant impact on global health. The development of multi-drug resistant strains that are resistant to the first-line drugs isoniazid and rifampicin threatens public health security. Rifampicin and isoniazid resistance are largely underpinned by mutations in rpoB and katG respectively and are associated with fitness costs. Compensatory mutations are considered to alleviate these fitness costs and have been observed in rpoC/rpoA (rifampicin) and oxyR'-ahpC (isoniazid). We developed a framework (CompMut-TB) to detect compensatory mutations from whole genome sequences in a large dataset comprising 18,396 M. tuberculosis samples. We performed association analysis (Fisher's exact tests) to identify pairs of mutations that are associated with drug-resistance, followed by mediation analysis to identify complementary or full mediators of drug-resistance. The analyses revealed several potential mutations in rpoC (N = 47), rpoA (N = 4), and oxyR'-ahpC (N = 7) that were considered either 'highly likely' or 'likely' to confer compensatory effects on drug-resistance, including mutations that have previously been reported and validated. Overall, we have developed the CompMut-TB framework, which can assist with identifying compensatory mutations. This is important for more precise genome-based profiling of drug-resistant TB strains and for furthering understanding of the evolutionary mechanisms that underpin drug-resistance.
Affiliation(s)
- Nina Billows
  - Royal Veterinary College, University of London, London, UK
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
- Jody Phelan
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
- Dong Xia
  - Royal Veterinary College, University of London, London, UK
- Taane G Clark
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
  - Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, UK
- Yu-Mei Chang
  - Royal Veterinary College, University of London, London, UK
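The pairwise association step named in the abstract above (Fisher's exact tests on pairs of mutations) can be illustrated with a minimal Python sketch. This is not the CompMut-TB code: the function name `fisher_exact_two_sided` and the counts in the example table are hypothetical, chosen only to show how a 2x2 table of isolates (putative compensatory mutation present or absent against resistance mutation present or absent) yields a p-value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts (not from the paper):
    # rows = candidate compensatory mutation present / absent,
    # columns = resistance mutation present / absent.
    print(fisher_exact_two_sided(40, 10, 60, 890))
```

In a genome-wide screen, such a test would be repeated over many mutation pairs, so a multiple-testing correction would normally follow; the abstract does not say which correction, if any, was used.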
2
Takou M, Bellis ES, Lasky JR. Predicting gene expression responses to environment in Arabidopsis thaliana using natural variation in DNA sequence. bioRxiv 2024:2024.04.25.591174. [PMID: 38712066 PMCID: PMC11071634 DOI: 10.1101/2024.04.25.591174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
The evolution of gene expression responses is a critical component of adaptation to variable environments. Predicting how DNA sequence influences expression is challenging because the genotype-to-phenotype map is not well resolved for cis-regulatory elements, transcription factor binding, regulatory interactions, and epigenetic features, not to mention how these factors respond to environment. We tested whether flexible machine learning models could learn some of the underlying cis-regulatory genotype-to-phenotype map, using cold-responsive transcriptome profiles in 5 diverse Arabidopsis thaliana accessions. We first tested for evidence that cis regulation plays a role in environmental response, finding 14 and 15 motifs that were significantly enriched within the up- and down-stream regions of cold-responsive differentially regulated genes (DEGs). We next applied convolutional neural networks (CNNs), which learn de novo cis-regulatory motifs in DNA sequences, to predict expression response to environment. We found that CNNs predicted differential expression with moderate accuracy, with evidence that predictions were hindered by biological complexity of regulation and the large potential regulatory code. Overall, DEGs between specific environments can be predicted based on variation in cis-regulatory sequences, although more information needs to be incorporated and better models may be required.
3
Zhang Q, Yang Z, Yang J. Dissecting the colocalized GWAS and eQTLs with mediation analysis for high-dimensional exposures and confounders. Biometrics 2024; 80:ujae050. [PMID: 38801257 DOI: 10.1093/biomtc/ujae050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 03/14/2024] [Accepted: 05/14/2024] [Indexed: 05/29/2024]
Abstract
To leverage the advancements in genome-wide association studies (GWAS) and quantitative trait loci (QTL) mapping for traits and molecular phenotypes and gain a mechanistic understanding of genetic regulation, biological researchers often investigate the expression QTLs (eQTLs) that colocalize with QTL or GWAS peaks. Our research is inspired by 2 such studies. One aims to identify the causal single nucleotide polymorphisms that are responsible for the phenotypic variation and whose effects can be explained by their impacts at the transcriptomic level in maize. The other study, in mouse, focuses on uncovering the cis-driver genes that induce phenotypic changes by regulating trans-regulated genes. Both studies can be formulated as mediation problems with potentially high-dimensional exposures, confounders, and mediators that seek to estimate the overall indirect effect (IE) for each exposure. In this paper, we propose MedDiC, a novel procedure to estimate the overall IE based on a difference-in-coefficients approach. Our simulation studies find that MedDiC offers valid inference for the IE with higher power, shorter confidence intervals, and faster computing time than competing methods. We apply MedDiC to the 2 aforementioned motivating datasets and find that MedDiC yields reproducible outputs across the analysis of closely related traits, with results supported by external biological evidence. The code and additional information are available on our GitHub page (https://github.com/QiZhangStat/MedDiC).
Affiliation(s)
- Qi Zhang
  - Department of Mathematics and Statistics, University of New Hampshire, Durham, NH 03824, United States
- Zhikai Yang
  - Complex Biosystems Program and Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583, United States
- Jinliang Yang
  - Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583, United States
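The difference-in-coefficients idea named in the abstract above can be shown in its simplest textbook form: the indirect effect is the total effect of the exposure on the outcome minus its direct effect once the mediator is adjusted for. The sketch below is not MedDiC, which targets high-dimensional exposures, confounders, and mediators. It assumes one exposure, one mediator, no confounders, and ordinary least squares, and the function name `indirect_effect_dic` and the toy data are hypothetical.

```python
def centered(values):
    """Subtract the mean so that regressions implicitly include an intercept."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def indirect_effect_dic(x, m, y):
    """Return (total, direct, indirect) effects of x on y with one mediator m."""
    xc, mc, yc = centered(x), centered(m), centered(y)
    sxx, smm, sxm = dot(xc, xc), dot(mc, mc), dot(xc, mc)
    sxy, smy = dot(xc, yc), dot(mc, yc)
    total = sxy / sxx                         # slope of x in y ~ x
    det = sxx * smm - sxm ** 2                # zero if x and m are collinear
    direct = (smm * sxy - sxm * smy) / det    # slope of x in y ~ x + m
    return total, direct, total - direct      # indirect = total - direct


if __name__ == "__main__":
    # Toy data (hypothetical): m depends on x, and y = 0.5 * x + 3 * m exactly.
    x = [0, 1, 2, 3, 4]
    m = [1, 1, 4, 5, 9]
    y = [0.5 * xi + 3 * mi for xi, mi in zip(x, m)]
    total, direct, indirect = indirect_effect_dic(x, m, y)
    print(total, direct, indirect)  # 6.5 0.5 6.0
```

With a single mediator and least squares, this difference equals the product of the exposure-to-mediator slope (2 here) and the mediator-to-outcome slope (3 here), which is why the toy data give an indirect effect of 6.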
4
Yang Z, Zhao T, Cheng H, Yang J. Microbiome-enabled genomic selection improves prediction accuracy for nitrogen-related traits in maize. G3 (Bethesda) 2024; 14:jkad286. [PMID: 38113533 PMCID: PMC11090461 DOI: 10.1093/g3journal/jkad286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 05/19/2023] [Accepted: 12/05/2023] [Indexed: 12/21/2023]
Abstract
Root-associated microbiomes in the rhizosphere (rhizobiomes) are increasingly known to play an important role in nutrient acquisition, stress tolerance, and disease resistance of plants. However, it remains largely unclear to what extent these rhizobiomes contribute to trait variation for different genotypes and whether their inclusion in the genomic selection protocol can enhance prediction accuracy. To address these questions, we developed a microbiome-enabled genomic selection method that incorporated host SNPs and amplicon sequence variants from plant rhizobiomes in a maize diversity panel under high and low nitrogen (N) field conditions. Our cross-validation results showed that the microbiome-enabled genomic selection model significantly outperformed the conventional genomic selection model for nearly all time-series traits related to plant growth and N responses, with an average relative improvement of 3.7%. The improvement was more pronounced under low N conditions (relative improvement of 8.4-40.2%), consistent with the view that some beneficial microbes can enhance N nutrient uptake, particularly in low N fields. However, our study could not definitively rule out the possibility that the observed improvement is partially due to the amplicon sequence variants being influenced by microenvironments. Using a high-dimensional mediation analysis method, our study also identified microbial mediators that establish a link between plant genotype and phenotype. Some of the detected mediator microbes were previously reported to promote plant growth. The enhanced prediction accuracy of the microbiome-enabled genomic selection models, demonstrated in a single environment, serves as a proof-of-concept for the potential application of microbiome-enabled plant breeding for sustainable agriculture.
Affiliation(s)
- Zhikai Yang
  - Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
  - Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
- Tianjing Zhao
  - Department of Animal Science, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
  - Department of Animal Science, University of California Davis, Davis, CA 95616, USA
- Hao Cheng
  - Department of Animal Science, University of California Davis, Davis, CA 95616, USA
- Jinliang Yang
  - Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
  - Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
5
Gomez-Cano F, Rodriguez J, Zhou P, Chu YH, Magnusson E, Gomez-Cano L, Krishnan A, Springer NM, de Leon N, Grotewold E. Prioritizing Metabolic Gene Regulators through Multi-Omic Network Integration in Maize. bioRxiv 2024:2024.02.26.582075. [PMID: 38464086 PMCID: PMC10925184 DOI: 10.1101/2024.02.26.582075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Elucidating gene regulatory networks (GRNs) is a major area of study within plant systems biology. Phenotypic traits are intricately linked to specific gene expression profiles. These expression patterns arise primarily from regulatory connections between sets of transcription factors (TFs) and their target genes. In this study, we integrated publicly available co-expression networks derived from more than 6,000 RNA-seq samples, 283 protein-DNA interaction assays, and 16 million SNPs used to identify expression quantitative trait loci (eQTL), to construct TF-target networks. In total, we analyzed ~4.6M interactions to generate four distinct types of TF-target networks: co-expression, protein-DNA interaction (PDI), trans-expression quantitative trait loci (trans-eQTL), and cis-eQTL combined with PDIs. To improve the functional annotation of TFs based on their target genes, we implemented three different strategies to integrate these four types of networks. We subsequently evaluated the effectiveness of our method using loss-of-function mutants and random networks. The multi-network integration allowed us to identify transcriptional regulators of hormone-, metabolic- and development-related processes. Finally, using the topological properties of the fully integrated network, we identified potentially functionally redundant TF paralogs. Our findings retrieved functions previously documented for numerous TFs and revealed novel functions that are crucial for informing the design of future experiments. The approach described here lays the foundation for the integration of multi-omic datasets in maize and other plant systems.
Affiliation(s)
- Fabio Gomez-Cano
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824-6473, USA
  - Current address: Department of Molecular, Cellular, and Development Biology, University of Michigan, Ann Arbor, MI, 48109, USA
- Jonas Rodriguez
  - Department of Plant and Agroecosystem Sciences, University of Wisconsin Madison, Madison, WI 53706, USA
- Peng Zhou
  - Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, MN 55108
- Yi-Hsuan Chu
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824-6473, USA
- Erika Magnusson
  - Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, MN 55108
- Lina Gomez-Cano
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824-6473, USA
- Arjun Krishnan
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Nathan M Springer
  - Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, MN 55108
  - Current address: Global Breeding, Bayer Crop Sciences, Chesterfield MO 63017, USA
- Natalia de Leon
  - Department of Plant and Agroecosystem Sciences, University of Wisconsin Madison, Madison, WI 53706, USA
- Erich Grotewold
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824-6473, USA