1. Chakraborty S, Xu J. Biconvex Clustering. J Comput Graph Stat 2023. [DOI: 10.1080/10618600.2023.2197474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
2. NetREX-CF integrates incomplete transcription factor data with gene expression to reconstruct gene regulatory networks. Commun Biol 2022; 5:1282. [PMID: 36418514 PMCID: PMC9684490 DOI: 10.1038/s42003-022-04226-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Accepted: 11/04/2022] [Indexed: 11/25/2022] [Open Access]
Abstract
The inference of Gene Regulatory Networks (GRNs) is one of the key challenges in systems biology. Leading algorithms utilize, in addition to gene expression, prior knowledge such as Transcription Factor (TF) DNA binding motifs or results of TF binding experiments. However, such prior knowledge is typically incomplete; therefore, integrating it with gene expression to infer GRNs remains difficult. To address this challenge, we introduce NetREX-CF (Regulatory Network Reconstruction using EXpression and Collaborative Filtering), a GRN reconstruction approach that brings together Collaborative Filtering to address the incompleteness of the prior knowledge and a biologically justified model of gene expression (sparse Network Component Analysis based model). We validated NetREX-CF using Yeast data and then used it to construct the GRN for Drosophila Schneider 2 (S2) cells. To corroborate the GRN, we performed a large-scale RNA-Seq analysis followed by a high-throughput RNAi treatment against all 465 expressed TFs in the cell line. Our knockdown result has not only extensively validated the GRN we built, but also provides a benchmark that our community can use for evaluating GRNs. Finally, we demonstrate that NetREX-CF can infer GRNs using single-cell RNA-Seq, and outperforms other methods, by using previously published human data.
3. Hawe JS, Saha A, Waldenberger M, Kunze S, Wahl S, Müller-Nurasyid M, Prokisch H, Grallert H, Herder C, Peters A, Strauch K, Theis FJ, Gieger C, Chambers J, Battle A, Heinig M. Network reconstruction for trans acting genetic loci using multi-omics data and prior information. Genome Med 2022; 14:125. [PMID: 36344995 PMCID: PMC9641770 DOI: 10.1186/s13073-022-01124-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 10/11/2022] [Indexed: 11/09/2022] [Open Access]
Abstract
BACKGROUND: Molecular measurements of the genome, the transcriptome, and the epigenome, often termed multi-omics data, provide an in-depth view on biological systems and their integration is crucial for gaining insights into complex regulatory processes. These data can be used to explain disease-related genetic variants by linking them to intermediate molecular traits (quantitative trait loci, QTL). Molecular networks regulating cellular processes leave footprints in QTL results as so-called trans-QTL hotspots. Reconstructing these networks is a complex endeavor and use of biological prior information can improve network inference. However, previous efforts were limited in the types of priors used or have only been applied to model systems. In this study, we reconstruct the regulatory networks underlying trans-QTL hotspots using human cohort data and data-driven prior information.
METHODS: We devised a new strategy to integrate QTL with human population-scale multi-omics data. State-of-the-art network inference methods including BDgraph and glasso were applied to these data. Comprehensive prior information to guide network inference was manually curated from large-scale biological databases. The inference approach was extensively benchmarked using simulated data and cross-cohort replication analyses. Best performing methods were subsequently applied to real-world human cohort data.
RESULTS: Our benchmarks showed that prior-based strategies outperform methods without prior information in simulated data and show better replication across datasets. Application of our approach to human cohort data highlighted two novel regulatory networks related to schizophrenia and lean body mass for which we generated novel functional hypotheses.
CONCLUSIONS: We demonstrate that existing biological knowledge can improve the integrative analysis of networks underlying trans associations and generate novel hypotheses about regulatory mechanisms.
Affiliation(s)
- Johann S Hawe
  - Institute of Computational Biology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - German Heart Centre Munich, Department of Cardiology, Technical University Munich, Munich, Germany
  - Department of Informatics, Technical University of Munich, Garching, Germany
- Ashis Saha
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Melanie Waldenberger
  - Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
- Sonja Kunze
  - Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
- Simone Wahl
  - Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
- Martina Müller-Nurasyid
  - Institute of Genetic Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - IBE, Faculty of Medicine, LMU Munich, 81377, Munich, Germany
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany
  - Department of Internal Medicine I (Cardiology), Hospital of the Ludwig-Maximilians-University (LMU) Munich, Munich, Germany
- Holger Prokisch
  - Institute of Human Genetics, School of Medicine, Technische Universität München, Munich, Germany
- Harald Grallert
  - Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - Institute of Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
- Christian Herder
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
  - Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Düsseldorf, Germany
  - Division of Endocrinology and Diabetology, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Annette Peters
  - Institute of Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
- Konstantin Strauch
  - Institute of Genetic Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany
  - Chair of Genetic Epidemiology, IBE, Faculty of Medicine, LMU Munich, Munich, Germany
- Fabian J Theis
  - Department of Informatics, Technical University of Munich, Garching, Germany
  - Department of Mathematics, Technical University of Munich, Garching, Germany
- Christian Gieger
  - Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - Institute of Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
- John Chambers
  - Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London, UK
  - Lee Kong Chian School of Medicine, Nanyang Technological University, 308232, Singapore, Singapore
- Alexis Battle
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Matthias Heinig
  - Institute of Computational Biology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - Department of Informatics, Technical University of Munich, Garching, Germany
  - Munich Heart Association, Partner Site Munich, DZHK (German Centre for Cardiovascular Research), 10785, Berlin, Germany
4. Ray A. Machine learning in postgenomic biology and personalized medicine. Wiley Interdisciplinary Reviews. Data Mining and Knowledge Discovery 2022; 12:e1451. [PMID: 35966173 PMCID: PMC9371441 DOI: 10.1002/widm.1451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2020] [Accepted: 12/22/2021] [Indexed: 06/15/2023]
Abstract
In recent years Artificial Intelligence in the form of machine learning has been revolutionizing biology, biomedical sciences, and gene-based agricultural technology capabilities. Massive data generated in biological sciences by rapid and deep gene sequencing and protein or other molecular structure determination, on the one hand, requires data analysis capabilities using machine learning that are distinctly different from classical statistical methods; on the other, these large datasets are enabling the adoption of novel data-intensive machine learning algorithms for the solution of biological problems that until recently had relied on mechanistic model-based approaches that are computationally expensive. This review provides a bird's eye view of the applications of machine learning in post-genomic biology. Attempt is also made to indicate as far as possible the areas of research that are poised to make further impacts in these areas, including the importance of explainable artificial intelligence (XAI) in human health. Further contributions of machine learning are expected to transform medicine, public health, agricultural technology, as well as to provide invaluable gene-based guidance for the management of complex environments in this age of global warming.
Affiliation(s)
- Animesh Ray
  - Riggs School of Applied Life Sciences, Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, USA
5. Kashima M, Shida Y, Yamashiro T, Hirata H, Kurosaka H. Intracellular and Intercellular Gene Regulatory Network Inference From Time-Course Individual RNA-Seq. Frontiers in Bioinformatics 2021; 1:777299. [PMID: 36303726 PMCID: PMC9580923 DOI: 10.3389/fbinf.2021.777299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2021] [Accepted: 10/26/2021] [Indexed: 11/13/2022] [Open Access]
Abstract
Gene regulatory network (GRN) inference is an effective approach to understand the molecular mechanisms underlying biological events. Generally, GRN inference mainly targets intracellular regulatory relationships such as transcription factors and their associated targets. In multicellular organisms, there are both intracellular and intercellular regulatory mechanisms. Thus, we hypothesize that GRNs inferred from time-course individual (whole embryo) RNA-Seq during development can reveal intercellular regulatory relationships (signaling pathways) underlying the development. Here, we conducted time-course bulk RNA-Seq of individual mouse embryos during early development, followed by pseudo-time analysis and GRN inference. The results demonstrated that GRN inference from RNA-Seq with pseudo-time can be applied for individual bulk RNA-Seq similar to scRNA-Seq. Validation using an experimental-source-based database showed that our approach could significantly infer GRN for all transcription factors in the database. Furthermore, the inferred ligand-related and receptor-related downstream genes were significantly overlapped. Thus, the inferred GRN based on whole organism could include intercellular regulatory relationships, which cannot be inferred from scRNA-Seq based only on gene expression data. Overall, inferring GRN from time-course bulk RNA-Seq is an effective approach to understand the regulatory relationships underlying biological events in multicellular organisms.
Affiliation(s)
- Makoto Kashima
  - College of Science and Engineering, Aoyama Gakuin University, Sagamihara, Japan
  - *Correspondence: Makoto Kashima
- Yuki Shida
  - Department of Orthodontics and Dentofacial Orthopedics, Osaka University, Suita, Japan
- Takashi Yamashiro
  - Department of Orthodontics and Dentofacial Orthopedics, Osaka University, Suita, Japan
- Hiromi Hirata
  - College of Science and Engineering, Aoyama Gakuin University, Sagamihara, Japan
- Hiroshi Kurosaka
  - Department of Orthodontics and Dentofacial Orthopedics, Osaka University, Suita, Japan
6. Wei P, Sagarna R, Ke Y, Ong YS. Practical Multisource Transfer Regression With Source-Target Similarity Captures. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:3498-3509. [PMID: 32784144 DOI: 10.1109/tnnls.2020.3012457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
A key challenge in many applications of multisource transfer learning is to explicitly capture the diverse source-target similarities. In this article, we are concerned with stretching the set of practical approaches based on Gaussian process (GP) models to solve multisource transfer regression problems. Precisely, we first investigate the feasibility and performance of a family of transfer covariance functions that represent the pairwise similarity of each source and the target domain. We theoretically show that using such a transfer covariance function for general GP modeling can only capture the same similarity coefficient for all the sources, and thus may result in unsatisfactory transfer performance. This outcome, together with the scalability issues of a single GP based approach, leads us to propose TCMSStack, an integrated framework incorporating a separate transfer covariance function for each source and stacking. Contrary to typical stacking approaches, TCMSStack learns the source-target similarity in each base GP model by considering the dependencies of the other sources along the process. We introduce two instances of the proposed TCMSStack. Extensive experiments on one synthetic and two real-world data sets, with learning settings up to 11 sources for the latter, demonstrate the effectiveness of our approach.
7. Ma CZ, Brent MR. Inferring TF activities and activity regulators from gene expression data with constraints from TF perturbation data. Bioinformatics 2021; 37:1234-1245. [PMID: 33135076 PMCID: PMC8189679 DOI: 10.1093/bioinformatics/btaa947] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/03/2020] [Revised: 09/26/2020] [Accepted: 10/27/2020] [Indexed: 12/20/2022] [Open Access]
Abstract
Motivation: The activity of a transcription factor (TF) in a sample of cells is the extent to which it is exerting its regulatory potential. Many methods of inferring TF activity from gene expression data have been described, but due to the lack of appropriate large-scale datasets, systematic and objective validation has not been possible until now.
Results: We systematically evaluate and optimize the approach to TF activity inference in which a gene expression matrix is factored into a condition-independent matrix of control strengths and a condition-dependent matrix of TF activity levels. We find that expression data in which the activities of individual TFs have been perturbed are both necessary and sufficient for obtaining good performance. To a considerable extent, control strengths inferred using expression data from one growth condition carry over to other conditions, so the control strength matrices derived here can be used by others. Finally, we apply these methods to gain insight into the upstream factors that regulate the activities of yeast TFs Gcr2, Gln3, Gcn4 and Msn2.
Availability and implementation: Evaluation code and data are available at https://doi.org/10.5281/zenodo.4050573.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Cynthia Z Ma
  - Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
- Michael R Brent
  - Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
8. Lenz AR, Galán-Vásquez E, Balbinot E, de Abreu FP, Souza de Oliveira N, da Rosa LO, de Avila e Silva S, Camassola M, Dillon AJP, Perez-Rueda E. Gene Regulatory Networks of Penicillium echinulatum 2HH and Penicillium oxalicum 114-2 Inferred by a Computational Biology Approach. Front Microbiol 2020; 11:588263. [PMID: 33193246 PMCID: PMC7652724 DOI: 10.3389/fmicb.2020.588263] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/28/2020] [Accepted: 09/23/2020] [Indexed: 11/29/2022] [Open Access]
Abstract
Penicillium echinulatum 2HH and Penicillium oxalicum 114-2 are well-known cellulase fungal producers. However, few studies addressing global mechanisms for gene regulation of these two important organisms are available so far. A recent finding that the 2HH wild-type is closely related to P. oxalicum leads to a combined study of these two species. Firstly, we provide a global gene regulatory network for P. echinulatum 2HH and P. oxalicum 114-2, based on TF-TG orthology relationships, considering three related species with well-known regulatory interactions combined with TFBSs prediction. The network was then analyzed in terms of topology, identifying TFs as hubs, and modules. Based on this approach, we explore numerous identified modules, such as the expression of cellulolytic and xylanolytic systems, where XlnR plays a key role in positive regulation of the xylanolytic system. It also regulates positively the cellulolytic system by acting indirectly through the cellodextrin induction system. This remarkable finding suggests that the XlnR-dependent cellulolytic and xylanolytic regulatory systems are probably conserved in both P. echinulatum and P. oxalicum. Finally, we explore the functional congruency on the genes clustered in terms of communities, where the genes related to cellular nitrogen, compound metabolic process and macromolecule metabolic process were the most abundant. Therefore, our approach allows us to confer a degree of accuracy regarding the existence of each inferred interaction.
Affiliation(s)
- Alexandre Rafael Lenz
  - Unidad Académica Yucatán, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de Mexico, Mérida, Mexico
  - Laboratório de Bioinformática e Biologia Computacional, Instituto de Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, Brazil
  - Departamento de Ciências Exatas e da Terra, Universidade do Estado da Bahia, Salvador, Brazil
- Edgardo Galán-Vásquez
  - Departamento de Ingeniería de Sistemas Computacionales y Automatización, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de Mexico, Ciudad Universitaria, Mexico
- Eduardo Balbinot
  - Laboratório de Bioinformática e Biologia Computacional, Instituto de Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, Brazil
- Fernanda Pessi de Abreu
  - Laboratório de Bioinformática e Biologia Computacional, Instituto de Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, Brazil
- Nikael Souza de Oliveira
  - Laboratório de Bioinformática e Biologia Computacional, Instituto de Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, Brazil
  - Laboratório de Enzimas e Biomassas, Instituto de Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, Brazil
- Letícia Osório da Rosa
  - Laboratório de Enzimas e Biomassas, Instituto de Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, Brazil
- Scheila de Avila e Silva
  - Laboratório de Bioinformática e Biologia Computacional, Instituto de Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, Brazil
- Marli Camassola
  - Laboratório de Enzimas e Biomassas, Instituto de Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, Brazil
- Aldo José Pinheiro Dillon
  - Laboratório de Enzimas e Biomassas, Instituto de Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, Brazil
- Ernesto Perez-Rueda
  - Unidad Académica Yucatán, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de Mexico, Mérida, Mexico
  - Facultad de Ciencias, Centro de Genómica y Bioinformática, Universidad Mayor, Santiago, Chile
9. Clemente-Moreno MJ, Omranian N, Sáez PL, Figueroa CM, Del-Saz N, Elso M, Poblete L, Orf I, Cuadros-Inostroza A, Cavieres LA, Bravo L, Fernie AR, Ribas-Carbó M, Flexas J, Nikoloski Z, Brotman Y, Gago J. Low-temperature tolerance of the Antarctic species Deschampsia antarctica: A complex metabolic response associated with nutrient remobilization. Plant, Cell & Environment 2020; 43:1376-1393. [PMID: 32012308 DOI: 10.1111/pce.13737] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 07/30/2019] [Revised: 01/19/2020] [Accepted: 01/21/2020] [Indexed: 06/10/2023]
Abstract
The species Deschampsia antarctica (DA) is one of the only two native vascular species that live in Antarctica. We performed ecophysiological, biochemical, and metabolomic studies to investigate the responses of DA to low temperature. In parallel, we assessed the responses in a non-Antarctic reference species (Triticum aestivum [TA]) from the same family (Poaceae). At low temperature (4°C), both species showed lower photosynthetic rates (reductions were 70% and 80% for DA and TA, respectively) and symptoms of oxidative stress but opposite responses of antioxidant enzymes (peroxidases and catalase). We employed fused least absolute shrinkage and selection operator statistical modelling to associate the species-dependent physiological and antioxidant responses to primary metabolism. Model results for DA indicated associations with osmoprotection, cell wall remodelling, membrane stabilization, and antioxidant secondary metabolism (synthesis of flavonols and phenylpropanoids), coordinated with nutrient mobilization from source to sink tissues (confirmed by elemental analysis), which were not observed in TA. The metabolic behaviour of DA, with significant changes in particular metabolites, was compared with a newly compiled multispecies dataset showing a general accumulation of metabolites in response to low temperatures. Altogether, the responses displayed by DA suggest a compromise between catabolism and maintenance of leaf functionality.
Affiliation(s)
- María José Clemente-Moreno
  - Research Group on Plant Biology under Mediterranean Conditions, Universitat de les Illes Balears (UIB)-Instituto de Agroecología y Economía del Agua (INAGEA), Palma de Mallorca, Spain
- Nooshin Omranian
  - Systems Biology and Mathematical Modeling Group, Max-Planck-Institut für Molekulare Pflanzenphysiologie, 14476 Potsdam, Germany
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, 14476 Potsdam, Germany
- Patricia L Sáez
  - Laboratorio Cultivo de Tejidos Vegetales, Centro de Biotecnología, Departamento de Silvicultura, Facultad de Ciencias Forestales, Universidad de Concepción, Concepción, Chile
- Néstor Del-Saz
  - Laboratorio de Fisiología Vegetal, Departamento de Botánica, Facultad de Ciencias Naturales y Oceanográficas, Universidad de Concepción, Concepción, Chile
- Mhartyn Elso
  - Laboratorio Cultivo de Tejidos Vegetales, Centro de Biotecnología, Departamento de Silvicultura, Facultad de Ciencias Forestales, Universidad de Concepción, Concepción, Chile
- Leticia Poblete
  - Laboratorio Cultivo de Tejidos Vegetales, Centro de Biotecnología, Departamento de Silvicultura, Facultad de Ciencias Forestales, Universidad de Concepción, Concepción, Chile
- Isabel Orf
  - Department of Life Sciences, Ben Gurion University of the Negev, Beersheva, Israel
- Lohengrin A Cavieres
  - ECOBIOSIS, Departamento de Botánica, Facultad de Ciencias Naturales y Oceanográficas, Universidad de Concepción and Instituto de Ecología y Biodiversidad-IEB, Concepción, Chile
- León Bravo
  - Lab. de Fisiología y Biología Molecular Vegetal, Dpt. de Cs. Agronómicas y Recursos Naturales, Facultad de Cs. Agropecuarias y Forestales, Instituto de Agroindustria, & Center of Plant, Soil Interaction and Natural Resources Biotechnology, Scientific and Technological Bioresource Nucleus, Universidad de La Frontera, Temuco, Chile
- Alisdair R Fernie
  - Central Metabolism Group, Molecular Physiology Department, Max-Planck-Institut für Molekulare Pflanzenphysiologie, Potsdam, Germany
- Miquel Ribas-Carbó
  - Research Group on Plant Biology under Mediterranean Conditions, Universitat de les Illes Balears (UIB)-Instituto de Agroecología y Economía del Agua (INAGEA), Palma de Mallorca, Spain
- Jaume Flexas
  - Research Group on Plant Biology under Mediterranean Conditions, Universitat de les Illes Balears (UIB)-Instituto de Agroecología y Economía del Agua (INAGEA), Palma de Mallorca, Spain
- Zoran Nikoloski
  - Systems Biology and Mathematical Modeling Group, Max-Planck-Institut für Molekulare Pflanzenphysiologie, 14476 Potsdam, Germany
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, 14476 Potsdam, Germany
  - Center of Plant System Biology and Biotechnology (CPSBB), Plovdiv, Bulgaria
- Yariv Brotman
  - Department of Life Sciences, Ben Gurion University of the Negev, Beersheva, Israel
- Jorge Gago
  - Research Group on Plant Biology under Mediterranean Conditions, Universitat de les Illes Balears (UIB)-Instituto de Agroecología y Economía del Agua (INAGEA), Palma de Mallorca, Spain
10. Saint-Antoine MM, Singh A. Network inference in systems biology: recent developments, challenges, and applications. Curr Opin Biotechnol 2020; 63:89-98. [PMID: 31927423 PMCID: PMC7308210 DOI: 10.1016/j.copbio.2019.12.002] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 10/31/2019] [Accepted: 12/03/2019] [Indexed: 12/12/2022]
Abstract
One of the most interesting, difficult, and potentially useful topics in computational biology is the inference of gene regulatory networks (GRNs) from expression data. Although researchers have been working on this topic for more than a decade and much progress has been made, it remains an unsolved problem and even the most sophisticated inference algorithms are far from perfect. In this paper, we review the latest developments in network inference, including state-of-the-art algorithms like PIDC, Phixer, and more. We also discuss unsolved computational challenges, including the optimal combination of algorithms, integration of multiple data sources, and pseudo-temporal ordering of static expression data. Lastly, we discuss some exciting applications of network inference in cancer research, and provide a list of useful software tools for researchers hoping to conduct their own network inference analyses.
Affiliation(s)
- Michael M Saint-Antoine
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, Delaware 19716, USA
- Abhyudai Singh
  - Electrical and Computer Engineering, University of Delaware, Newark, Delaware 19716, USA
11. Jackson CA, Castro DM, Saldi GA, Bonneau R, Gresham D. Gene regulatory network reconstruction using single-cell RNA sequencing of barcoded genotypes in diverse environments. eLife 2020; 9:e51254. [PMID: 31985403 PMCID: PMC7004572 DOI: 10.7554/elife.51254] [Citation(s) in RCA: 87] [Impact Index Per Article: 21.8] [Received: 08/21/2019] [Accepted: 01/10/2020] [Indexed: 11/13/2022] [Open Access]
Abstract
Understanding how gene expression programs are controlled requires identifying regulatory relationships between transcription factors and target genes. Gene regulatory networks are typically constructed from gene expression data acquired following genetic perturbation or environmental stimulus. Single-cell RNA sequencing (scRNAseq) captures the gene expression state of thousands of individual cells in a single experiment, offering advantages in combinatorial experimental design, large numbers of independent measurements, and accessing the interaction between the cell cycle and environmental responses that is hidden by population-level analysis of gene expression. To leverage these advantages, we developed a method for scRNAseq in budding yeast (Saccharomyces cerevisiae). We pooled diverse transcriptionally barcoded gene deletion mutants in 11 different environmental conditions and determined their expression state by sequencing 38,285 individual cells. We benchmarked a framework for learning gene regulatory networks from scRNAseq data that incorporates multitask learning and constructed a global gene regulatory network comprising 12,228 interactions.
Affiliation(s)
- Christopher A Jackson
  - Center For Genomics and Systems Biology, New York University, New York, United States
  - Department of Biology, New York University, New York, United States
- Richard Bonneau
  - Center For Genomics and Systems Biology, New York University, New York, United States
  - Department of Biology, New York University, New York, United States
  - Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, United States
  - Center For Data Science, New York University, New York, United States
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, United States
- David Gresham
  - Center For Genomics and Systems Biology, New York University, New York, United States
  - Department of Biology, New York University, New York, United States
12. Clemente-Moreno MJ, Omranian N, Sáez P, Figueroa CM, Del-Saz N, Elso M, Poblete L, Orf I, Cuadros-Inostroza A, Cavieres L, Bravo L, Fernie A, Ribas-Carbó M, Flexas J, Nikoloski Z, Brotman Y, Gago J. Cytochrome respiration pathway and sulphur metabolism sustain stress tolerance to low temperature in the Antarctic species Colobanthus quitensis. The New Phytologist 2020; 225:754-768. [PMID: 31489634 DOI: 10.1111/nph.16167] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 05/10/2019] [Accepted: 08/22/2019] [Indexed: 05/28/2023]
Abstract
Understanding the strategies employed by plant species that live in extreme environments offers the possibility to discover stress tolerance mechanisms. We studied the physiological, antioxidant and metabolic responses to three temperature conditions (4, 15, and 23°C) of Colobanthus quitensis (CQ), one of only two native vascular species in Antarctica. We also employed Dianthus chinensis (DC) to assess the effects of the treatments in a non-Antarctic species from the same family. Using fused LASSO modelling, we associated physiological and biochemical antioxidant responses with primary metabolism. This approach allowed us to highlight the metabolic pathways driving the response specific to CQ. Low temperature imposed dramatic reductions in photosynthesis (up to 88%) but not in respiration (sustaining rates of 3.0-4.2 μmol CO₂ m⁻² s⁻¹) in CQ, and no change in the physiological stress parameters was found. Its notable antioxidant capacity and mitochondrial cytochrome respiratory activity (20 and two times higher than DC, respectively), which ensure ATP production even at low temperature, were significantly associated with sulphur-containing metabolites and polyamines. Our findings potentially open new biotechnological opportunities regarding the role of antioxidant compounds and respiratory mechanisms associated with sulphur metabolism in stress tolerance strategies to low temperature.
Affiliation(s)
- María José Clemente-Moreno
- Research Group on Plant Biology under Mediterranean Conditions, Instituto de Agroecología y Economía del Agua (INAGEA), Universitat de les Illes Balears (UIB), cta. Valldemossa km 7,5, 07122, Palma de Mallorca, Spain
- Nooshin Omranian
- Systems Biology and Mathematical Modeling Group, Max-Planck-Institut für Molekulare Pflanzenphysiologie, 14476, Potsdam-Golm, Germany
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany
- Patricia Sáez
- Laboratorio Cultivo de Tejidos Vegetales, Centro de Biotecnología, Departamento de Silvicultura, Facultad de Ciencias Forestales, Universidad de Concepción, 4030000, Concepción, Chile
- Carlos María Figueroa
- Instituto de Agrobiotecnología del Litoral, UNL, CONICET, FBCB, 3000, Santa Fe, Argentina
- Néstor Del-Saz
- Laboratorio de Fisiología Vegetal, Departamento de Botánica, Facultad de Ciencias Naturales y Oceanográficas, Universidad de Concepción, 4030000, Concepción, Chile
- Mhartyn Elso
- Laboratorio Cultivo de Tejidos Vegetales, Centro de Biotecnología, Departamento de Silvicultura, Facultad de Ciencias Forestales, Universidad de Concepción, 4030000, Concepción, Chile
- Leticia Poblete
- Laboratorio Cultivo de Tejidos Vegetales, Centro de Biotecnología, Departamento de Silvicultura, Facultad de Ciencias Forestales, Universidad de Concepción, 4030000, Concepción, Chile
- Isabel Orf
- Department of Life Sciences, Ben Gurion University of the Negev, 8410501, Beer Sheva, Israel
- Lohengrin Cavieres
- ECOBIOSIS, Departamento de Botánica, Facultad de Ciencias Naturales y Oceanográficas, Universidad de Concepción, 4030000, Concepción, Chile
- León Bravo
- Laboratorio de Fisiología y Biología Molecular Vegetal, Departamento de Cs. Agronómicas y Recursos Naturales, Facultad de Ciencias Agropecuarias y Forestales, Instituto de Agroindustria, Universidad de La Frontera, Temuco, Chile
- Center of Plant, Soil Interaction and Natural Resources Biotechnology, Scientific and Technological Bioresource Nucleus, Universidad de La Frontera, 4811230, Temuco, Chile
- Alisdair Fernie
- Central Metabolism Group, Molecular Physiology Department, Max-Planck-Institut für Molekulare Pflanzenphysiologie, 14476, Golm, Germany
- Miquel Ribas-Carbó
- Research Group on Plant Biology under Mediterranean Conditions, Instituto de Agroecología y Economía del Agua (INAGEA), Universitat de les Illes Balears (UIB), cta. Valldemossa km 7,5, 07122, Palma de Mallorca, Spain
- Jaume Flexas
- Research Group on Plant Biology under Mediterranean Conditions, Instituto de Agroecología y Economía del Agua (INAGEA), Universitat de les Illes Balears (UIB), cta. Valldemossa km 7,5, 07122, Palma de Mallorca, Spain
- Zoran Nikoloski
- Systems Biology and Mathematical Modeling Group, Max-Planck-Institut für Molekulare Pflanzenphysiologie, 14476, Potsdam-Golm, Germany
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany
- Center of Plant System Biology and Biotechnology (CPSBB), 4000, Plovdiv, Bulgaria
- Yariv Brotman
- Department of Life Sciences, Ben Gurion University of the Negev, 8410501, Beer Sheva, Israel
- Jorge Gago
- Research Group on Plant Biology under Mediterranean Conditions, Instituto de Agroecología y Economía del Agua (INAGEA), Universitat de les Illes Balears (UIB), cta. Valldemossa km 7,5, 07122, Palma de Mallorca, Spain
|
13
|
Zhang W, Li W, Zhang J, Wang N. Data Integration of Hybrid Microarray and Single Cell Expression Data to Enhance Gene Network Inference. Curr Bioinform 2019. [DOI: 10.2174/1574893614666190104142228] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
Abstract
Background:
Gene Regulatory Network (GRN) inference algorithms aim to explore causal interactions between genes and transcription factors. High-throughput transcriptomics data, including DNA microarray and single cell expression data, contain complementary information for network inference.
Objective:
To enhance GRN inference, data integration across various types of expression data becomes an economical and efficient solution.
Method:
In this paper, a novel E-alpha integration rule-based ensemble inference algorithm is proposed to merge complementary information from microarray and single cell expression data. This paper implements a Gradient Boosting Tree (GBT) inference algorithm to compute importance scores for candidate gene-gene pairs. The proposed E-alpha rule quantitatively evaluates the credibility levels of each information source and determines the final ranked list.
Results:
Two groups of in silico gene networks are applied to illustrate the effectiveness of the proposed E-alpha integration. Experimental outcomes with size50 and size100 in silico gene networks suggest that the proposed E-alpha rule significantly improves performance metrics compared with a single information source.
Conclusion:
In GRN inference, the integration of hybrid expression data using the E-alpha rule provides a more feasible and efficient way to enhance performance metrics than solely increasing sample sizes.
Affiliation(s)
- Wei Zhang
- Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, 310013, China
- Wenchao Li
- Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, 310013, China
- Jianming Zhang
- Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, 310013, China
- Ning Wang
- Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, 310013, China
|
14
|
Bhola A, Singh S. Visualisation and Modelling of High-Dimensional Cancerous Gene Expression Dataset. JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT 2019. [DOI: 10.1142/s0219649219500011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
The increase in the number of dimensions of cancerous gene expression datasets increases their complexity and the risk of misinterpretation, and makes them harder to visualise for further analysis. Therefore, dimensionality reduction, visualisation and modelling of these datasets become challenging. In this paper, a framework is developed which helps to understand, visualise and model high-dimensional cancerous gene expression datasets in lower dimensions, which may be helpful in revealing cancer mechanisms and in diagnosis. Initially, the cancerous gene expression datasets are preprocessed to make them complete, precise and efficient, and principal component analysis is applied for dimensionality reduction and visualisation. Regression is used to model the datasets so that the type of association (linear or nonlinear) and the direction between gene profiles may be estimated. To assess the performance of the developed framework, three publicly available cancerous gene expression datasets are used: breast (GEO Acc. No. GDS5076), lung (GEO Acc. No. GDS5040) and prostate (GEO Acc. No. GDS5072). Cross-validation is used to validate the regression results. The results revealed that a linear approach should be used for the prostate cancer dataset and a nonlinear approach for the breast and lung cancer datasets when finding associations between gene pairs.
Affiliation(s)
- Abhishek Bhola
- Department of Computer Science and Engineering, Punjab Engineering College (Deemed to be University), Sector 12, Chandigarh 160012, India
- Shailendra Singh
- Department of Computer Science and Engineering, Punjab Engineering College (Deemed to be University), Sector 12, Chandigarh 160012, India
|
15
|
Castro DM, de Veaux NR, Miraldi ER, Bonneau R. Multi-study inference of regulatory networks for more accurate models of gene regulation. PLoS Comput Biol 2019; 15:e1006591. [PMID: 30677040 PMCID: PMC6363223 DOI: 10.1371/journal.pcbi.1006591] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 03/06/2018] [Revised: 02/05/2019] [Accepted: 10/23/2018] [Indexed: 12/16/2022] Open
Abstract
Gene regulatory networks are composed of sub-networks that are often shared across biological processes, cell-types, and organisms. Leveraging multiple sources of information, such as publicly available gene expression datasets, could therefore be helpful when learning a network of interest. Integrating data across different studies, however, raises numerous technical concerns. Hence, a common approach in network inference, and broadly in genomics research, is to separately learn models from each dataset and combine the results. Individual models, however, often suffer from under-sampling, poor generalization and limited network recovery. In this study, we explore previous integration strategies, such as batch-correction and model ensembles, and introduce a new multitask learning approach for joint network inference across several datasets. Our method initially estimates the activities of transcription factors, and subsequently, infers the relevant network topology. As regulatory interactions are context-dependent, we estimate model coefficients as a combination of both dataset-specific and conserved components. In addition, adaptive penalties may be used to favor models that include interactions derived from multiple sources of prior knowledge including orthogonal genomics experiments. We evaluate generalization and network recovery using examples from Bacillus subtilis and Saccharomyces cerevisiae, and show that sharing information across models improves network reconstruction. Finally, we demonstrate robustness to both false positives in the prior information and heterogeneity among datasets. Due to increasing availability of biological data, methods to properly integrate data generated across the globe become essential for extracting reproducible insights into relevant research questions. 
In this work, we developed a framework to reconstruct gene regulatory networks from expression datasets generated in separate studies—and thus, because of technical variation (different dates, handlers, laboratories, protocols etc…), challenging to integrate. Since regulatory mechanisms are often shared across conditions, we hypothesized that drawing conclusions from various data sources would improve performance of gene regulatory network inference. By transferring knowledge among regulatory models, our method is able to detect weaker patterns that are conserved across datasets, while also being able to detect dataset-unique interactions. We also allow incorporation of prior knowledge on network structure to favor models that are somewhat similar to the prior itself. Using two model organisms, we show that joint network inference outperforms inference from a single dataset. We also demonstrate that our method is robust to false edges in the prior and to low condition overlap across datasets, and that it can outperform current data integration strategies.
Affiliation(s)
- Nicholas R de Veaux
- Center for Computational Biology, Flatiron Institute, New York, NY 10010, USA
- Emily R Miraldi
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA; Divisions of Immunobiology & Biomedical Informatics, Cincinnati Children's Hospital, Cincinnati, OH 45229, USA
- Richard Bonneau
- New York University, New York, NY 10003, USA; Center for Computational Biology, Flatiron Institute, New York, NY 10010, USA
|
16
|
Zheng R, Li M, Chen X, Wu FX, Pan Y, Wang J. BiXGBoost: a scalable, flexible boosting-based method for reconstructing gene regulatory networks. Bioinformatics 2018; 35:1893-1900. [DOI: 10.1093/bioinformatics/bty908] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 10/27/2018] [Revised: 10/28/2018] [Accepted: 11/04/2018] [Indexed: 12/11/2022] Open
Affiliation(s)
- Ruiqing Zheng
- School of Information Science and Engineering, Central South University, Changsha, China
- Min Li
- School of Information Science and Engineering, Central South University, Changsha, China
- Xiang Chen
- School of Information Science and Engineering, Central South University, Changsha, China
- Fang-Xiang Wu
- School of Information Science and Engineering, Central South University, Changsha, China
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Canada
- Yi Pan
- School of Information Science and Engineering, Central South University, Changsha, China
- Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Jianxin Wang
- School of Information Science and Engineering, Central South University, Changsha, China
|
17
|
Liu ZP. Towards precise reconstruction of gene regulatory networks by data integration. QUANTITATIVE BIOLOGY 2018. [DOI: 10.1007/s40484-018-0139-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2022]
|
18
|
Kang Y, Liow HH, Maier EJ, Brent MR. NetProphet 2.0: mapping transcription factor networks by exploiting scalable data resources. Bioinformatics 2017; 34:249-257. [PMID: 28968736 PMCID: PMC5860202 DOI: 10.1093/bioinformatics/btx563] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 03/14/2017] [Revised: 03/14/2017] [Accepted: 09/11/2017] [Indexed: 11/15/2022] Open
Abstract
Motivation: Cells process information, in part, through transcription factor (TF) networks, which control the rates at which individual genes produce their products. A TF network map is a graph that indicates which TFs bind and directly regulate each gene. Previous work has described network mapping algorithms that rely exclusively on gene expression data and ‘integrative’ algorithms that exploit a wide range of data sources including chromatin immunoprecipitation sequencing (ChIP-seq) of many TFs, genome-wide chromatin marks, and binding specificities for many TFs determined in vitro. However, such resources are available only for a few major model systems and cannot be easily replicated for new organisms or cell types. Results: We present NetProphet 2.0, a ‘data light’ algorithm for TF network mapping, and show that it is more accurate at identifying direct targets of TFs than other, similarly data light algorithms. In particular, it improves on the accuracy of NetProphet 1.0, which used only gene expression data, by exploiting three principles. First, combining multiple approaches to network mapping from expression data can improve accuracy relative to the constituent approaches. Second, TFs with similar DNA binding domains bind similar sets of target genes. Third, even a noisy, preliminary network map can be used to infer DNA binding specificities from promoter sequences and these inferred specificities can be used to further improve the accuracy of the network map. Availability and implementation: Source code and comprehensive documentation are freely available at https://github.com/yiming-kang/NetProphet_2.0. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yiming Kang
- Department of Computer Science and Engineering and Center for Genome Sciences and Systems Biology, Washington University, Saint Louis, MO, USA
- Hien-Haw Liow
- Department of Mathematics, Washington University, Saint Louis, MO, USA
- Ezekiel J Maier
- Department of Computer Science and Engineering and Center for Genome Sciences and Systems Biology, Washington University, Saint Louis, MO, USA
- Michael R Brent
- Department of Computer Science and Engineering and Center for Genome Sciences and Systems Biology, Washington University, Saint Louis, MO, USA
|