1
Cahill R, Wang Y, Xian RP, Lee AJ, Zeng H, Yu B, Tasic B, Abbasi-Asl R. Unsupervised pattern identification in spatial gene expression atlas reveals mouse brain regions beyond established ontology. Proc Natl Acad Sci U S A 2024; 121:e2319804121. [PMID: 39226356 DOI: 10.1073/pnas.2319804121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2023] [Accepted: 07/24/2024] [Indexed: 09/05/2024]
Abstract
The rapid growth of large-scale spatial gene expression data demands efficient and reliable computational tools to extract major trends of gene expression in their native spatial context. Here, we used stability-driven unsupervised learning (i.e., staNMF) to identify principal patterns (PPs) of 3D gene expression profiles and understand spatial gene distribution and anatomical localization at the whole mouse brain level. Our subsequent spatial correlation analysis systematically compared the PPs to known anatomical regions and ontology from the Allen Mouse Brain Atlas using spatial neighborhoods. We demonstrate that our stable and spatially coherent PPs, whose linear combinations accurately approximate the spatial gene data, are highly correlated with combinations of expert-annotated brain regions. These PPs yield a brain ontology based purely on spatial gene expression. Our PP identification approach outperforms principal component analysis and typical clustering algorithms on the same task. Moreover, we show that the stable PPs reveal marked regional imbalance of brainwide genetic architecture, leading to region-specific marker genes and gene coexpression networks. Our findings highlight the advantages of stability-driven machine learning for plausible biological discovery from dense spatial gene expression data, streamlining tasks that are infeasible by conventional manual approaches.
Affiliation(s)
- Robert Cahill
  - Department of Neurology, University of California, San Francisco, CA 94143
  - UCSF Weill Institute for Neurosciences, San Francisco, CA 94143
- Yu Wang
  - Department of Statistics, University of California, Berkeley, CA 94720
- R Patrick Xian
  - Department of Neurology, University of California, San Francisco, CA 94143
  - UCSF Weill Institute for Neurosciences, San Francisco, CA 94143
- Alex J Lee
  - Department of Neurology, University of California, San Francisco, CA 94143
  - UCSF Weill Institute for Neurosciences, San Francisco, CA 94143
- Hongkui Zeng
  - Allen Institute for Brain Science, Seattle, WA 98109
- Bin Yu
  - Department of Statistics, University of California, Berkeley, CA 94720
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720
- Reza Abbasi-Asl
  - Department of Neurology, University of California, San Francisco, CA 94143
  - UCSF Weill Institute for Neurosciences, San Francisco, CA 94143
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143
2
Cui W, Long Q, Xiao M, Wang X, Feng G, Li X, Wang P, Zhou Y. Refining computational inference of gene regulatory networks: integrating knockout data within a multi-task framework. Brief Bioinform 2024; 25:bbae361. [PMID: 39082651 PMCID: PMC11289685 DOI: 10.1093/bib/bbae361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 05/09/2024] [Accepted: 07/16/2024] [Indexed: 08/03/2024]
Abstract
Constructing accurate gene regulatory networks (GRNs), which reflect the dynamic governing process between genes, is critical to understanding diverse cellular processes and unveiling the complexities of biological systems. With the development of computer science, computational approaches have been applied to the GRN inference task. However, current methodologies face challenges in effectively utilizing existing topological information and prior knowledge of gene regulatory relationships, hindering the comprehensive understanding and accurate reconstruction of GRNs. In response, we propose a novel graph neural network (GNN)-based multi-task learning framework for GRN reconstruction, namely MTLGRN. Specifically, we first encode the gene promoter sequences and the gene biological features and concatenate the corresponding feature representations. Then, we construct a multi-task learning framework comprising GRN reconstruction, gene knockout prediction, and gene expression matrix reconstruction. With joint training, MTLGRN can optimize the gene latent representations by integrating gene knockout information, promoter characteristics, and other biological attributes. Extensive experimental results demonstrate superior performance compared with state-of-the-art baselines on the GRN reconstruction task, efficiently leveraging biological knowledge and comprehensively capturing gene regulatory relationships. MTLGRN is also a pioneering attempt to simulate gene knockouts on bulk data by incorporating gene knockout information.
Affiliation(s)
- Wentao Cui
  - Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
- Qingqing Long
  - Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
- Meng Xiao
  - Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
- Xuezhi Wang
  - Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
- Guihai Feng
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
  - State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing, 100101, China
- Xin Li
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
  - State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing, 100101, China
- Pengfei Wang
  - Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
- Yuanchun Zhou
  - Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
3
Wu S, Jin K, Tang M, Xia Y, Gao W. Inference of Gene Regulatory Networks Based on Multi-view Hierarchical Hypergraphs. Interdiscip Sci 2024; 16:318-332. [PMID: 38342857 DOI: 10.1007/s12539-024-00604-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 11/26/2023] [Accepted: 01/03/2024] [Indexed: 02/13/2024]
Abstract
Since gene regulation is a complex process in which multiple genes act simultaneously, accurately inferring gene regulatory networks (GRNs) is a long-standing challenge in systems biology. Although graph neural networks can formally describe intricate gene expression mechanisms, current GRN inference methods based on graph learning regard only transcription factor (TF)-target gene interactions as pairwise relationships, and cannot model the many-to-many high-order regulatory patterns that prevail among genes. Moreover, these methods often rely on limited prior regulatory knowledge, ignoring the structural information of GRNs in gene expression profiles. Therefore, we propose a multi-view hierarchical hypergraphs GRN (MHHGRN) inference model. Specifically, multiple heterogeneous biological information is integrated to construct multi-view hierarchical hypergraphs of TFs and target genes, using hypergraph convolution networks to model higher order complex regulatory relationships. Meanwhile, the coupled information diffusion mechanism and the cross-domain messaging mechanism facilitate the information sharing between genes to optimise gene embedding representations. Finally, a unique channel attention mechanism is used to adaptively learn feature representations from multiple views for GRN inference. Experimental results show that MHHGRN achieves better results than the baseline methods on the E. coli and S. cerevisiae benchmark datasets of the DREAM5 challenge, and it has excellent cross-species generalization, achieving comparable or better performance on scRNA-seq datasets from five mouse and two human cell lines.
Affiliation(s)
- Songyang Wu
  - School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
- Kui Jin
  - School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
- Mingjing Tang
  - School of Life Science, Yunnan Normal University, Kunming, 650500, China
  - Engineering Research Center of Sustainable Development and Utilization of Biomass Energy, Ministry of Education, Yunnan Normal University, Kunming, 650500, China
- Yuelong Xia
  - School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
- Wei Gao
  - School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
4
Wang Y, Chen X, Zheng Z, Huang L, Xie W, Wang F, Zhang Z, Wong KC. scGREAT: Transformer-based deep-language model for gene regulatory network inference from single-cell transcriptomics. iScience 2024; 27:109352. [PMID: 38510148 PMCID: PMC10951644 DOI: 10.1016/j.isci.2024.109352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 12/29/2023] [Accepted: 02/23/2024] [Indexed: 03/22/2024]
Abstract
Gene regulatory networks (GRNs) involve complex and multi-layer regulatory interactions between regulators and their target genes. Precise knowledge of GRNs is important in understanding cellular processes and molecular functions. Recent breakthroughs in single-cell sequencing technology made it possible to infer GRNs at single-cell level. Existing methods, however, are limited by expensive computations, and sometimes simplistic assumptions. To overcome these obstacles, we propose scGREAT, a framework to infer GRN using gene embeddings and transformer from single-cell transcriptomics. scGREAT starts by constructing gene expression and gene biotext dictionaries from scRNA-seq data and gene text information. The representation of TF gene pairs is learned through optimizing embedding space by transformer-based engine. Results illustrated scGREAT outperformed other contemporary methods on benchmarks. Besides, gene representations from scGREAT provide valuable gene regulation insights, and external validation on spatial transcriptomics illuminated the mechanism behind scGREAT annotation. Moreover, scGREAT identified several TF target regulations corroborated in studies.
Affiliation(s)
- Yuchen Wang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Xingjian Chen
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
  - Cutaneous Biology Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Zetian Zheng
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Lei Huang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Weidun Xie
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Fuzhou Wang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Zhaolei Zhang
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
  - Shenzhen Research Institute, City University of Hong Kong, Shenzhen, China
5
Mousavi R, Lobo D. Automatic design of gene regulatory mechanisms for spatial pattern formation. NPJ Syst Biol Appl 2024; 10:35. [PMID: 38565850 PMCID: PMC10987498 DOI: 10.1038/s41540-024-00361-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 03/19/2024] [Indexed: 04/04/2024]
Abstract
Gene regulatory mechanisms (GRMs) control the formation of spatial and temporal expression patterns that can serve as regulatory signals for the development of complex shapes. Synthetic developmental biology aims to engineer such genetic circuits for understanding and producing desired multicellular spatial patterns. However, designing synthetic GRMs for complex, multi-dimensional spatial patterns is a current challenge due to the nonlinear interactions and feedback loops in genetic circuits. Here we present a methodology to automatically design GRMs that can produce any given two-dimensional spatial pattern. The proposed approach uses two orthogonal morphogen gradients acting as positional information signals in a multicellular tissue area or culture, which constitutes a continuous field of engineered cells implementing the same designed GRM. To efficiently design both the circuit network and the interaction mechanisms (including the number of genes necessary for the formation of the target spatial pattern), we developed an automated algorithm based on high-performance evolutionary computation. The tolerance of the algorithm can be configured to design GRMs that are either simple to produce approximate patterns or complex to produce precise patterns. We demonstrate the approach by automatically designing GRMs that can produce a diverse set of synthetic spatial expression patterns by interpreting just two orthogonal morphogen gradients. The proposed framework offers a versatile approach to systematically design and discover complex genetic circuits producing spatial patterns.
Affiliation(s)
- Reza Mousavi
  - Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, MD, USA
- Daniel Lobo
  - Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, MD, USA
  - Greenebaum Comprehensive Cancer Center and Center for Stem Cell Biology & Regenerative Medicine, University of Maryland, Baltimore, Baltimore, MD, USA
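The idea of cells reading two orthogonal morphogen gradients as positional information can be illustrated with a toy example. The sketch below is not the evolved circuit or the evolutionary algorithm from the entry above; it is a hand-written, static rule with assumed Hill-function thresholds (0.3, 0.7, 0.5) and a hypothetical single target gene, meant only to show how two gradients can be combined into a 2D pattern.

```python
def hill_activation(c, k, n=8):
    """Fraction of maximal expression when morphogen level c activates with threshold k."""
    return c**n / (k**n + c**n)


def hill_repression(c, k, n=8):
    """Fraction of maximal expression when morphogen level c represses with threshold k."""
    return k**n / (k**n + c**n)


def target_gene(u, v):
    # Hypothetical rule (an assumption, not taken from the paper):
    # on for intermediate U (activated above 0.3, repressed above 0.7)
    # and for high V (activated above 0.5).
    return hill_activation(u, 0.3) * hill_repression(u, 0.7) * hill_activation(v, 0.5)


def pattern(size=9, cutoff=0.5):
    """Boolean expression map on a size x size field of identical cells.

    U increases linearly from 0 to 1 along columns, V along rows.
    """
    grid = []
    for row in range(size):
        v = row / (size - 1)
        grid.append([target_gene(col / (size - 1), v) > cutoff for col in range(size)])
    return grid


if __name__ == "__main__":
    for line in pattern():
        print("".join("#" if on else "." for on in line))
```

Every cell runs the same rule and differs only in the gradient levels it senses, which is the sense in which a single designed mechanism can yield a spatial pattern.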
6
Huang Y, Yu G, Yang Y. MIGGRI: A multi-instance graph neural network model for inferring gene regulatory networks for Drosophila from spatial expression images. PLoS Comput Biol 2023; 19:e1011623. [PMID: 37939200 PMCID: PMC10659162 DOI: 10.1371/journal.pcbi.1011623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Revised: 11/20/2023] [Accepted: 10/22/2023] [Indexed: 11/10/2023]
Abstract
Recent breakthroughs in spatial transcriptomics have brought great opportunities for exploring gene regulatory networks (GRNs) from a brand-new perspective. In particular, the local expression patterns and spatio-temporal regulation mechanisms captured by spatial expression images allow more delicate delineation of the interplay between transcription factors and their target genes. However, the complexity and size of spatial image collections pose significant challenges to GRN inference using image-based methods. Extracting regulatory information from expression images is difficult due to the lack of supervision and the multi-instance nature of the problem, where a gene often corresponds to multiple images captured from different views. While graph models, particularly graph neural networks, have emerged as a promising method for leveraging underlying structure information from known GRNs, incorporating expression images into graphs is not straightforward. To address these challenges, we propose a two-stage approach, MIGGRI, for capturing comprehensive regulatory patterns from image collections for each gene and known interactions. Our approach involves a multi-instance graph neural network (GNN) model for GRN inference, which first extracts gene regulatory features from spatial expression images via contrastive learning, and then feeds them to a multi-instance GNN for semi-supervised learning. We apply our approach to a large set of Drosophila embryonic spatial gene expression images. MIGGRI achieves outstanding performance in the inference of GRNs for early eye development and mesoderm development of Drosophila, and shows robustness in scenarios of missing image information. Additionally, we perform interpretable analysis on image reconstruction and functional subgraphs that may reveal potential pathways or coordinated regulation. By leveraging the power of graph neural networks and the information contained in spatial expression images, our approach has the potential to advance our understanding of gene regulation in complex biological systems.
Affiliation(s)
- Yuyang Huang
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
- Gufeng Yu
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
- Yang Yang
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
7
Wu Y, Qian B, Wang A, Dong H, Zhu E, Ma B. iLSGRN: inference of large-scale gene regulatory networks based on multi-model fusion. Bioinformatics 2023; 39:btad619. [PMID: 37851379 PMCID: PMC10589915 DOI: 10.1093/bioinformatics/btad619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 10/04/2023] [Accepted: 10/17/2023] [Indexed: 10/19/2023]
Abstract
MOTIVATION: Gene regulatory networks (GRNs) are a way of describing the interaction between genes, which contribute to revealing the different biological mechanisms in the cell. Reconstructing GRNs based on gene expression data has been a central computational problem in systems biology. However, due to the high dimensionality and non-linearity of large-scale GRNs, accurately and efficiently inferring GRNs is still a challenging task. RESULTS: In this article, we propose a new approach, iLSGRN, to reconstruct large-scale GRNs from steady-state and time-series gene expression data based on non-linear ordinary differential equations. Firstly, the regulatory gene recognition algorithm calculates the Maximal Information Coefficient between genes and excludes redundant regulatory relationships to achieve dimensionality reduction. Then, the feature fusion algorithm constructs a model leveraging the feature importance derived from XGBoost (eXtreme Gradient Boosting) and RF (Random Forest) models, which can effectively train the non-linear ordinary differential equations model of GRNs and improve the accuracy and stability of the inference algorithm. The extensive experiments on different scale datasets show that our method makes a sensible improvement compared with the state-of-the-art methods. Furthermore, we perform cross-validation experiments on the real gene datasets to validate the robustness and effectiveness of the proposed method. AVAILABILITY AND IMPLEMENTATION: The proposed method is written in the Python language, and is available at: https://github.com/lab319/iLSGRN.
Affiliation(s)
- Yiming Wu
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Bing Qian
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Anqi Wang
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong 999077, China
- Heng Dong
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Enqiang Zhu
  - Institution of Computing Science and Technology, Guangzhou University, Guangzhou 510006, China
- Baoshan Ma
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
8
Mousavi R, Lobo D. Automatic design of gene regulatory mechanisms for spatial pattern formation. bioRxiv 2023:2023.07.26.550573. [PMID: 37546866 PMCID: PMC10402059 DOI: 10.1101/2023.07.26.550573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
Synthetic developmental biology aims to engineer gene regulatory mechanisms (GRMs) for understanding and producing desired multicellular patterns and shapes. However, designing GRMs for spatial patterns is a current challenge due to the nonlinear interactions and feedback loops in genetic circuits. Here we present a methodology to automatically design GRMs that can produce any given spatial pattern. The proposed approach uses two orthogonal morphogen gradients acting as positional information signals in a multicellular tissue area or culture, which constitutes a continuous field of engineered cells implementing the same designed GRM. To efficiently design both the circuit network and the interaction mechanisms (including the number of genes necessary for the formation of the target pattern), we developed an automated algorithm based on high-performance evolutionary computation. The tolerance of the algorithm can be configured to design GRMs that are either simple to produce approximate patterns or complex to produce precise patterns. We demonstrate the approach by automatically designing GRMs that can produce a diverse set of synthetic spatial expression patterns by interpreting just two orthogonal morphogen gradients. The proposed framework offers a versatile approach to systematically design and discover pattern-producing genetic circuits.
Affiliation(s)
- Reza Mousavi
  - Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Daniel Lobo
  - Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
  - Greenebaum Comprehensive Cancer Center and Center for Stem Cell Biology & Regenerative Medicine, University of Maryland, School of Medicine, 22 S. Greene Street, Baltimore, MD 21201, USA
9
Wu YH, Huang YA, Li JQ, You ZH, Hu PW, Hu L, Leung VCM, Du ZH. Knowledge graph embedding for profiling the interaction between transcription factors and their target genes. PLoS Comput Biol 2023; 19:e1011207. [PMID: 37339154 PMCID: PMC10313080 DOI: 10.1371/journal.pcbi.1011207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Revised: 06/30/2023] [Accepted: 05/23/2023] [Indexed: 06/22/2023]
Abstract
Interactions between transcription factors and target genes form the main part of the gene regulation network in humans and remain a complicating factor in biological research. Specifically, for nearly half of the interactions recorded in established databases, the interaction types are yet to be confirmed. Although several computational methods exist to predict gene interactions and their types, there is still no method available to predict them solely based on topology information. To this end, we propose a graph-based prediction model called KGE-TGI, trained in a multi-task learning manner on a knowledge graph that we specially constructed for this problem. The KGE-TGI model relies on topology information rather than being driven by gene expression data. In this paper, we formulate the task of predicting interaction types of transcription factors and target genes as a multi-label classification problem for link types on a heterogeneous graph, coupled with solving another link prediction problem that is inherently related. We constructed a ground truth dataset as a benchmark and evaluated the proposed method on it. In 5-fold cross-validation experiments, the proposed method achieved average AUC values of 0.9654 and 0.9339 in the tasks of link prediction and link type classification, respectively. In addition, the results of a series of comparison experiments show that the introduction of knowledge information significantly benefits the prediction and that our methodology achieves state-of-the-art performance on this problem.
Affiliation(s)
- Yang-Han Wu
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China
- Yu-An Huang
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, Shaanxi, China
- Jian-Qiang Li
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, Shaanxi, China
- Peng-Wei Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Lun Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Victor C. M. Leung
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China
- Zhi-Hua Du
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China
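The entry above reports AUC values averaged over 5 folds. As a minimal sketch of how such a figure is commonly computed, the code below evaluates AUC as the probability that a positive link scores higher than a negative one, then averages it over folds. The fold assignment (`k_fold_indices`) and the assumption that each score is a held-out prediction for that sample are illustrative choices, not details taken from the paper.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=5):
    """Hypothetical fold assignment: sample i goes to fold i mod k."""
    return [list(range(i, n, k)) for i in range(k)]


def mean_cv_auc(labels, scores, k=5):
    """Average per-fold AUC, assuming scores[i] was predicted with sample i held out."""
    folds = k_fold_indices(len(labels), k)
    per_fold = [auc([labels[i] for i in f], [scores[i] for i in f]) for f in folds]
    return sum(per_fold) / k


if __name__ == "__main__":
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Averaging per-fold AUCs, rather than pooling all predictions into one AUC, is one common convention; the source does not say which was used.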
10
Fang Z, Ford AJ, Hu T, Zhang N, Mantalaris A, Coskun AF. Subcellular spatially resolved gene neighborhood networks in single cells. Cell Rep Methods 2023; 3:100476. [PMID: 37323566 PMCID: PMC10261906 DOI: 10.1016/j.crmeth.2023.100476] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/30/2022] [Revised: 02/18/2023] [Accepted: 04/18/2023] [Indexed: 06/17/2023]
Abstract
Image-based spatial omics methods such as fluorescence in situ hybridization (FISH) generate molecular profiles of single cells at single-molecule resolution. Current spatial transcriptomics methods focus on the distribution of single genes. However, the spatial proximity of RNA transcripts can play an important role in cellular function. We demonstrate a spatially resolved gene neighborhood network (spaGNN) pipeline for the analysis of subcellular gene proximity relationships. In spaGNN, machine-learning-based clustering of subcellular spatial transcriptomics data yields subcellular density classes of multiplexed transcript features. The nearest-neighbor analysis produces heterogeneous gene proximity maps in distinct subcellular regions. We illustrate the cell-type-distinguishing capability of spaGNN using multiplexed error-robust FISH data of fibroblast and U2-OS cells and sequential FISH data of mesenchymal stem cells (MSCs), revealing tissue-source-specific MSC transcriptomics and spatial distribution characteristics. Overall, the spaGNN approach expands the spatial features that can be used for cell-type classification tasks.
Affiliation(s)
- Zhou Fang
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
  - Machine Learning Graduate Program, Georgia Institute of Technology, Atlanta, GA, USA
- Adam J. Ford
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Thomas Hu
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
  - School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Nicholas Zhang
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
  - Interdisciplinary Bioengineering Graduate Program, Georgia Institute of Technology, Atlanta, GA, USA
- Athanasios Mantalaris
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Ahmet F. Coskun
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
  - Interdisciplinary Bioengineering Graduate Program, Georgia Institute of Technology, Atlanta, GA, USA
  - Parker H. Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, GA 30332, USA
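The nearest-neighbor step described in the entry above can be sketched as follows. This is only an illustration of the general idea of a gene proximity map, not the spaGNN pipeline itself: it skips the clustering into subcellular density classes, uses a brute-force search, and the transcript tuples and gene names in the usage example are hypothetical.

```python
import math
from collections import Counter


def nearest_neighbor_pairs(transcripts):
    """Count gene pairs formed by each transcript and its nearest neighbor.

    transcripts: list of (gene, x, y) tuples for detected RNA spots in one cell.
    Returns a Counter keyed by alphabetically sorted (gene_a, gene_b) tuples.
    """
    pairs = Counter()
    for i, (gene, x, y) in enumerate(transcripts):
        best_j, best_d = None, math.inf
        for j, (_, x2, y2) in enumerate(transcripts):
            if i == j:
                continue
            d = math.hypot(x - x2, y - y2)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            neighbor_gene = transcripts[best_j][0]
            pairs[tuple(sorted((gene, neighbor_gene)))] += 1
    return pairs


if __name__ == "__main__":
    # Hypothetical spots: two tight clusters far apart from each other.
    spots = [("ACTB", 0.0, 0.0), ("GAPDH", 0.1, 0.0),
             ("ACTB", 5.0, 5.0), ("MYC", 5.2, 5.0)]
    for pair, count in nearest_neighbor_pairs(spots).items():
        print(pair, count)
```

The brute-force loop is quadratic in the number of spots; for real data a spatial index would be the usual replacement, but that is outside what the source describes.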
11
A survey on gene expression data analysis using deep learning methods for cancer diagnosis. Prog Biophys Mol Biol 2023; 177:1-13. [PMID: 35988771 DOI: 10.1016/j.pbiomolbio.2022.08.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/27/2022] [Revised: 08/09/2022] [Accepted: 08/12/2022] [Indexed: 02/07/2023]
Abstract
Gene expression data are biological data from which meaningful hidden information can be extracted. This gene information is used for disease diagnosis, especially in cancer treatment, based on the variations in gene expression levels. DNA microarray is an efficient method for gene expression classification and prediction of specific types of cancer. Due to the abundance of computing power, deep learning (DL) has become a widespread technique in the healthcare sector. Gene expression datasets have a limited number of samples but a large number of features. Data augmentation is needed for gene expression datasets to overcome the dimensionality problem in gene data. It is a technique for generating synthetic samples to increase the diversity of data. Deep learning methods are designed to learn and extract features from raw input data in the form of multidimensional arrays. This paper reviews the existing research on deep learning techniques such as Feed Forward Neural Network (FFN), Convolutional Neural Network (CNN), Autoencoder (AE) and Recurrent Neural Network (RNN) for the classification and prediction of cancer and its types through gene expression data analysis.
|
12
|
Inference of gene regulatory networks based on the Light Gradient Boosting Machine. Comput Biol Chem 2022; 101:107769. [DOI: 10.1016/j.compbiolchem.2022.107769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Revised: 08/12/2022] [Accepted: 09/06/2022] [Indexed: 11/23/2022]
|
13
|
Lei J, Cai Z, He X, Zheng W, Liu J. An approach of gene regulatory network construction using mixed entropy optimizing context-related likelihood mutual information. Bioinformatics 2022; 39:6808612. [PMID: 36342190 PMCID: PMC9805593 DOI: 10.1093/bioinformatics/btac717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Revised: 09/18/2022] [Accepted: 11/04/2022] [Indexed: 11/09/2022] Open
Abstract
MOTIVATION The question of how to construct gene regulatory networks has long been a focus of biological research. Mutual information can be used to measure nonlinear relationships, and it has been widely used in the construction of gene regulatory networks. However, this method cannot measure indirect regulatory relationships under the influence of multiple genes, which reduces the accuracy of inferring gene regulatory networks. APPROACH This work proposes a method for constructing gene regulatory networks based on mixed entropy optimizing context-related likelihood mutual information (MEOMI). First, two entropy estimators were combined to calculate the mutual information between genes. Then, distribution optimization was performed using a context-related likelihood algorithm to eliminate some indirect regulatory relationships and obtain the initial gene regulatory network. To obtain the complex interaction between genes and eliminate redundant edges in the network, the initial gene regulatory network was further optimized by calculating the conditional mutual inclusive information (CMI2) between gene pairs under the influence of multiple genes. The network was iteratively updated to reduce the impact of mutual information on the overestimation of the direct regulatory intensity. RESULTS The experimental results show that the MEOMI method performed better than several other kinds of gene network construction methods on DREAM challenge simulated datasets (DREAM3 and DREAM5), three real Escherichia coli datasets (E.coli SOS pathway network, E.coli SOS DNA repair network and E.coli community network) and two human datasets. AVAILABILITY AND IMPLEMENTATION Source code and dataset are available at https://github.com/Dalei-Dalei/MEOMI/ and http://122.205.95.139/MEOMI/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
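The mutual-information step that this abstract builds on can be illustrated with a simple plug-in estimator. The sketch below is not the MEOMI estimator (the abstract does not say which two entropy estimators are combined); it assumes expression values are binned into equal-width bins, and the names `discretize` and `mutual_information` are hypothetical.

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """Map continuous expression values to equal-width bin indices."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant profile falls entirely into one bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp the maximum value into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=3):
    """Plug-in estimate of I(X;Y) in bits from paired samples (assumed binning)."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi


gene_a = [0.0, 0.1, 0.5, 0.6, 0.9, 1.0]
gene_b = [2 * v for v in gene_a]   # perfectly dependent on gene_a
gene_c = [1.0] * len(gene_a)       # constant, carries no information
mi_dependent = mutual_information(gene_a, gene_b)
mi_constant = mutual_information(gene_a, gene_c)
```

A pairwise score of this kind cannot by itself separate direct from indirect regulation, which is the limitation the abstract says its later optimization steps address.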
Affiliation(s)
- Jimeng Lei
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Zongheng Cai
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Xinyi He
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Wanting Zheng
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | | |
|
14
|
Tan H, Qiu S, Wang J, Yu G, Guo W, Guo M. Weighted deep factorizing heterogeneous molecular network for genome-phenome association prediction. Methods 2022; 205:18-28. [PMID: 35690250 DOI: 10.1016/j.ymeth.2022.05.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/20/2022] [Revised: 05/14/2022] [Accepted: 05/26/2022] [Indexed: 11/18/2022] Open
Abstract
Genome-phenome association (GPA) prediction can promote the understanding of the biological mechanisms behind the complex pathology of phenotypes (i.e., traits and diseases). Traditional heterogeneous network-based GPA approaches overwhelmingly need to project heterogeneous data onto a homogeneous network for data fusion and prediction; such projections result in the loss of heterogeneous network structure information. Matrix factorization-based data fusion methods can avoid such projection by integrating multi-type data in a coherent way, but they typically perform linear factorization and cannot mine the nonlinear relationships between molecules, which compromises the accuracy of GPA analysis. Furthermore, most of them cannot selectively combine network topology and node attribute information in a principled way. In this paper, we propose a weighted deep matrix factorization based solution (WDGPA) to predict GPAs by selectively and differentially fusing a heterogeneous molecular network and diverse attributes of nodes. WDGPA first assigns weights to inter/intra-relational data matrices and attribute data matrices, and performs deep matrix factorization on these matrices of the heterogeneous network in a cooperative manner to obtain the nonlinear representations of different nodes. In addition, it performs low-rank representation learning on the attribute data with the shared nonlinear representations. In this way, both the network topology and node attributes are jointly mined to explore the representations of molecules and the complex interplays between molecules and phenotypes. WDGPA then uses the representational vectors of gene and phenotype nodes to predict GPAs. Experimental results on maize and human datasets confirm that WDGPA outperforms competitive methods by a large margin under different evaluation protocols.
Affiliation(s)
- Haojiang Tan
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre For AI Research (C-FAIR), Shandong University, Jinan, China.
| | - Sichao Qiu
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre For AI Research (C-FAIR), Shandong University, Jinan, China.
| | - Jun Wang
- Joint SDU-NTU Centre For AI Research (C-FAIR), Shandong University, Jinan, China.
| | - Guoxian Yu
- Joint SDU-NTU Centre For AI Research (C-FAIR), Shandong University, Jinan, China.
| | - Wei Guo
- Joint SDU-NTU Centre For AI Research (C-FAIR), Shandong University, Jinan, China.
| | - Maozu Guo
- College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China.
| |
|
15
|
Du ZH, Wu YH, Huang YA, Chen J, Pan GQ, Hu L, You ZH, Li JQ. GraphTGI: an attention-based graph embedding model for predicting TF-target gene interactions. Brief Bioinform 2022; 23:6576453. [PMID: 35511108 DOI: 10.1093/bib/bbac148] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/28/2021] [Revised: 03/25/2022] [Accepted: 03/31/2022] [Indexed: 12/26/2022] Open
Abstract
MOTIVATION Interactions between transcription factors (TFs) and their target genes establish the knowledge foundation for biological research in transcriptional regulation; the number of known interactions is, however, still limited by biological techniques. Existing computational methods relevant to the prediction of TF-target interactions are mostly proposed for predicting binding sites, rather than directly predicting the interactions. To this end, we propose here a graph attention-based autoencoder model to predict TF-target gene interactions using the information of the known TF-target gene interaction network combined with two sequential and chemical gene characters, considering that the unobserved interactions between transcription factors and target genes can be predicted by learning the pattern of the known ones. To the best of our knowledge, the proposed model is the first attempt to solve this problem by learning patterns from the known TF-target gene interaction network. RESULTS In this paper, we formulate the prediction task of TF-target gene interactions as a link prediction problem on a complex knowledge graph and propose a deep learning model called GraphTGI, which is composed of a graph attention-based encoder and a bilinear decoder. We evaluated the prediction performance of the proposed method on a real dataset, and the experimental results show that the proposed model yields outstanding performance with an average AUC value of 0.8864 +/- 0.0057 in the 5-fold cross-validation. It is anticipated that the GraphTGI model can effectively and efficiently predict TF-target gene interactions on a large scale. AVAILABILITY Python code and the datasets used in our studies are made available at https://github.com/YanghanWu/GraphTGI.
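The decoder half of an encoder/bilinear-decoder link predictor can be sketched in a few lines. This is a generic bilinear scorer, not GraphTGI's code: the sigmoid output, the two-dimensional embeddings, and the name `bilinear_score` are assumptions made for illustration, and the embeddings would normally come from the trained encoder.

```python
import math


def bilinear_score(z_tf, z_gene, weight):
    """Return sigmoid(z_tf^T W z_gene) for one TF-gene pair (illustrative)."""
    # Matrix-vector product W @ z_gene
    wz = [sum(w_ij * g for w_ij, g in zip(row, z_gene)) for row in weight]
    # Dot product z_tf . (W z_gene)
    logit = sum(t * v for t, v in zip(z_tf, wz))
    return 1.0 / (1.0 + math.exp(-logit))


identity = [[1.0, 0.0], [0.0, 1.0]]          # hypothetical learned weight matrix
aligned = bilinear_score([1.0, 0.0], [1.0, 0.0], identity)
orthogonal = bilinear_score([1.0, 0.0], [0.0, 1.0], identity)
```

With an identity weight matrix the score reduces to a sigmoid of the dot product, so aligned embeddings score above 0.5 and orthogonal ones score exactly 0.5; a learned `weight` lets the model score asymmetric TF-to-gene relations.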
Affiliation(s)
- Zhi-Hua Du
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
| | - Yang-Han Wu
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
| | - Yu-An Huang
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
| | - Jie Chen
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
| | - Gui-Qing Pan
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
| | - Lun Hu
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
| | - Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Jian-Qiang Li
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
| |
|
16
|
Huang YA, Pan GQ, Wang J, Li JQ, Chen J, Wu YH. Heterogeneous graph embedding model for predicting interactions between TF and target gene. Bioinformatics 2022; 38:2554-2560. [PMID: 35266510 DOI: 10.1093/bioinformatics/btac148] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/09/2021] [Revised: 02/13/2022] [Accepted: 03/09/2022] [Indexed: 11/15/2022] Open
Abstract
MOTIVATION Identifying the target genes of transcription factors (TFs) is of great significance for biomedical research. However, using biological experiments to identify TF-target gene interactions is still time consuming, expensive and limited to small scale. Existing computational methods for predicting the target genes of TFs are mainly proposed for their binding sites rather than the direct interaction. To bridge this gap, in this work we propose a deep learning prediction model, named HGETGI, to identify new TF-target gene interactions. Specifically, the proposed HGETGI model learns the patterns of the known interactions between TFs and target genes, complemented with their involvement in different human disease mechanisms. It performs prediction based on random walks for meta-path sampling and node embedding in a skip-gram manner. RESULTS We evaluated the prediction performance of the proposed method on a real dataset and the experimental results show that it can achieve an average area under the curve of 0.8519 ± 0.0731 in 5-fold cross validation. In addition, we conducted case studies on the prediction of two important TFs, NFKB1 and TP53. As a result, 33 and 32 of the top-40 predictions for NFKB1 and TP53, respectively, were confirmed by looking them up in another public database (hTftarget). It is envisioned that the proposed HGETGI method is feasible and effective for predicting TF-target gene interactions on a large scale. AVAILABILITY AND IMPLEMENTATION The source code and dataset are available at https://github.com/PGTSING/HGETGI. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yu-An Huang
- College of Computer Science and Software Engineering, Shenzhen University, 3688 Nanhai Avenue, Shenzhen, China
| | - Gui-Qing Pan
- College of Computer Science and Software Engineering, Shenzhen University, 3688 Nanhai Avenue, Shenzhen, China
| | - Jia Wang
- College of Computer Science and Software Engineering, Shenzhen University, 3688 Nanhai Avenue, Shenzhen, China
| | - Jian-Qiang Li
- College of Computer Science and Software Engineering, Shenzhen University, 3688 Nanhai Avenue, Shenzhen, China
| | - Jie Chen
- College of Computer Science and Software Engineering, Shenzhen University, 3688 Nanhai Avenue, Shenzhen, China
| | - Yang-Han Wu
- College of Computer Science and Software Engineering, Shenzhen University, 3688 Nanhai Avenue, Shenzhen, China
| |
|
17
|
Li X, Ma S, Liu J, Tang J, Guo F. Inferring gene regulatory network via fusing gene expression image and RNA-seq data. Bioinformatics 2022; 38:1716-1723. [PMID: 34999771 DOI: 10.1093/bioinformatics/btac008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/19/2021] [Revised: 12/09/2021] [Accepted: 01/04/2022] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION Recently, with the development of high-throughput experimental technology, reconstruction of gene regulatory networks (GRNs) has ushered in new opportunities and challenges. Some previous methods mainly extract gene expression information based on RNA-seq data, but the associated information is very limited. With the establishment of gene expression image databases, it is possible to infer GRNs from image data with rich spatial information. RESULTS First, we propose a new convolutional neural network (called SDINet), which can extract gene expression information from images and identify the interaction between genes. SDINet captures both detailed information and high-level semantic information from the images, and it achieves satisfying performance on image data (Acc: 0.7196, F1: 0.7374). Second, we apply the idea of our SDINet to build an RNA-model, which also achieves good results on RNA-seq data (Acc: 0.8962, F1: 0.8950). Finally, we combine image data and RNA-seq data, and design a new fusion network to explore the potential relationship between them. Experiments show that our proposed network fusing the two modalities obtains better performance (Acc: 0.9116, F1: 0.9118) than either data type alone. AVAILABILITY AND IMPLEMENTATION Data and code are available from https://github.com/guofei-tju/Combine-Gene-Expression-images-and-RNA-seq-data-For-infering-GRN.
Affiliation(s)
- Xuejian Li
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
| | - Shiqiang Ma
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
| | - Jin Liu
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Jijun Tang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518005, China; School of Computational Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
| | - Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| |
|
18
|
Kazempour A, Kazempoor R. The effect of Lacticaseibacillus casei on inflammatory cytokine (IL-8) gene expression induced by exposure to Shigella sonnei in Zebrafish (Danio rerio). Arq Bras Med Vet Zoo 2022. [DOI: 10.1590/1678-4162-12513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022] Open
Abstract
This study aimed to evaluate the protective function of probiotics against Shigella sonnei pathogenicity. For this purpose, 400 zebrafish were divided into four groups with two replications: (T1) receiving Lacticaseibacillus casei for 27 days; (T2) receiving L. casei for 27 days followed by 72 hr exposure to S. sonnei; (T3) receiving basal diet for 27 days followed by 72 hr exposure to S. sonnei; and a control group (C) receiving basal diet without exposure to the pathogen. According to the results, feeding with L. casei for 27 days reduced interleukin-8 (IL-8) expression significantly (P<0.05). The results showed a decrease in IL-8 expression in the group exposed to the pathogen and fed with the probiotic compared to the group only fed with the basal diet (P<0.05). Considering the role of IL-8 as a pro-inflammatory cytokine, our results indicated that feeding with L. casei could modulate inflammatory responses.
|
19
|
Inference on the structure of gene regulatory networks. J Theor Biol 2022; 539:111055. [PMID: 35150721 DOI: 10.1016/j.jtbi.2022.111055] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/07/2021] [Revised: 01/29/2022] [Accepted: 02/03/2022] [Indexed: 11/20/2022]
Abstract
In this paper, we conduct theoretical analyses on inferring the structure of gene regulatory networks. Depending on the experimental method and data type, the inference problem is classified into 20 different scenarios. For each scenario, we discuss what can be inferred about the structure, and under what assumptions, given enough data. For scenarios that have been covered in the literature, we provide a brief review. For scenarios that have not been covered in the literature, if the structure can be inferred, we propose new mathematical inference methods and evaluate them on simulated data. Otherwise, we prove that the structure cannot be inferred.
|
20
|
Zhao M, He W, Tang J, Zou Q, Guo F. A hybrid deep learning framework for gene regulatory network inference from single-cell transcriptomic data. Brief Bioinform 2022; 23:6513730. [DOI: 10.1093/bib/bbab568] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/28/2021] [Revised: 12/09/2021] [Accepted: 12/11/2021] [Indexed: 12/21/2022] Open
Abstract
Inferring gene regulatory networks (GRNs) from gene expression profiles can provide insight into a number of cellular phenotypes at the genomic level and reveal the essential laws underlying various life phenomena. Unlike bulk expression data, single-cell transcriptomic data embody cell-to-cell variance and diverse biological information, such as tissue characteristics and transformation of cell types. Inferring GRNs based on such data offers unprecedented advantages for studying cell phenotypes in depth, revealing gene functions and exploring potential interactions. However, the high sparsity, noise and dropout events of single-cell transcriptomic data pose new challenges for regulation identification. We develop a hybrid deep learning framework for GRN inference from single-cell transcriptomic data, DGRNS, which encodes the raw data and fuses a recurrent neural network and a convolutional neural network (CNN) to train a model capable of distinguishing related gene pairs from unrelated gene pairs. To overcome the limitations of such datasets, it applies sliding windows to extract valuable features while preserving the direction of regulation. DGRNS is constructed as a deep learning model containing a gated recurrent unit network for exploring time-dependent information and a CNN for learning spatially related information. Our comprehensive and detailed comparative analysis on the dataset of mouse hematopoietic stem cells illustrates that DGRNS outperforms state-of-the-art methods. For the networks inferred by DGRNS, the area under the receiver operating characteristic curve is about 16% higher than that of other unsupervised methods, and the area under the precision-recall curve is about 10% higher than that of other supervised methods. Experiments on human datasets show the strong robustness and excellent generalization of DGRNS.
By comparing the predictions with standard network, we discover a series of novel interactions which are proved to be true in some specific cell types. Importantly, DGRNS identifies a series of regulatory relationships with high confidence and functional consistency, which have not yet been experimentally confirmed and merit further research.
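The sliding-window idea mentioned in this abstract can be sketched as follows. The abstract does not give the window size, the stride, or how direction is encoded, so everything here is an assumption: the hypothetical helper `sliding_windows` simply keeps the candidate regulator first and the candidate target second in each window pair, so that swapping the arguments yields a different training sample.

```python
def sliding_windows(regulator, target, size, step=1):
    """Cut two aligned expression series into ordered window pairs.

    Assumption for illustration: direction is preserved only by argument
    order (regulator window first, target window second).
    """
    if len(regulator) != len(target):
        raise ValueError("series must have equal length")
    windows = []
    for start in range(0, len(regulator) - size + 1, step):
        windows.append((regulator[start:start + size],
                        target[start:start + size]))
    return windows


reg = [1, 2, 3, 4]        # toy expression values, ordered along pseudotime
tgt = [5, 6, 7, 8]
forward = sliding_windows(reg, tgt, size=3)
reverse = sliding_windows(tgt, reg, size=3)   # opposite candidate direction
```

Each window pair could then be fed to a sequence model such as the recurrent and convolutional layers the abstract describes.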
|
21
|
Grisanti Canozo FJ, Zuo Z, Martin JF, Samee MAH. Cell-type modeling in spatial transcriptomics data elucidates spatially variable colocalization and communication between cell-types in mouse brain. Cell Syst 2022; 13:58-70.e5. [PMID: 34626538 PMCID: PMC8776574 DOI: 10.1016/j.cels.2021.09.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/31/2021] [Revised: 08/06/2021] [Accepted: 09/10/2021] [Indexed: 01/21/2023]
Abstract
Single-cell spatial transcriptomics (sc-ST) holds the promise to elucidate architectural aspects of complex tissues. Such analyses require modeling cell types in sc-ST datasets through their integration with single-cell RNA-seq datasets. However, this integration is nontrivial since the two technologies differ widely in the number of profiled genes, and the datasets often do not share many marker genes for given cell types. We developed a neural network model, spatial transcriptomics cell-types assignment using neural networks (STANN), to overcome these challenges. Analysis of STANN's predicted cell types in mouse olfactory bulb (MOB) sc-ST data delineated MOB architecture beyond its morphological layer-based conventional description. We find that cell-type proportions remain consistent within individual morphological layers but vary significantly between layers. Notably, even within a layer, cellular colocalization patterns and intercellular communication mechanisms show high spatial variations. These observations imply a refinement of major cell types into subtypes characterized by spatially localized gene regulatory networks and receptor-ligand usage.
Affiliation(s)
| | - Zhen Zuo
- Baylor College of Medicine, Houston, TX 77030, USA
| | - James F Martin
- Baylor College of Medicine, Houston, TX 77030, USA; Texas Heart Institute, Houston, TX 77030, USA
| | | |
|
22
|
Monti M, Fiorentino J, Milanetti E, Gosti G, Tartaglia GG. Prediction of Time Series Gene Expression and Structural Analysis of Gene Regulatory Networks Using Recurrent Neural Networks. Entropy (Basel, Switzerland) 2022; 24:141. [PMID: 35205437 PMCID: PMC8871363 DOI: 10.3390/e24020141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2021] [Revised: 01/14/2022] [Accepted: 01/15/2022] [Indexed: 11/17/2022]
Abstract
Methods for time series prediction and classification of gene regulatory networks (GRNs) from gene expression data have been treated separately so far. The recent emergence of attention-based recurrent neural network (RNN) models boosted the interpretability of RNN parameters, making them appealing for the understanding of gene interactions. In this work, we generated synthetic time series gene expression data from a range of archetypal GRNs and we relied on a dual attention RNN to predict the gene temporal dynamics. We show that the prediction is extremely accurate for GRNs with different architectures. Next, we focused on the attention mechanism of the RNN and, using tools from graph theory, we found that its graph properties allow one to hierarchically distinguish different architectures of the GRN. We show that the GRN responded differently to the addition of noise in the prediction by the RNN and we related the noise response to the analysis of the attention mechanism. In conclusion, this work provides a way to understand and exploit the attention mechanism of RNNs and it paves the way to RNN-based methods for time series prediction and inference of GRNs from gene expression data.
Affiliation(s)
- Michele Monti
- RNA System Biology Lab, Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genoa, Italy
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
| | - Jonathan Fiorentino
- Center for Life Nano- & Neuro-Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161 Rome, Italy
| | - Edoardo Milanetti
- Center for Life Nano- & Neuro-Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161 Rome, Italy
- Department of Physics, Sapienza University of Rome, 00185 Rome, Italy
| | - Giorgio Gosti
- Center for Life Nano- & Neuro-Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161 Rome, Italy
- Department of Physics, Sapienza University of Rome, 00185 Rome, Italy
| | - Gian Gaetano Tartaglia
- RNA System Biology Lab, Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genoa, Italy
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Center for Life Nano- & Neuro-Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161 Rome, Italy
- Department of Biology and Biotechnology Charles Darwin, Sapienza University of Rome, 00185 Rome, Italy
| |
|
23
|
Zheng L, Liu Z, Yang Y, Shen HB. Accurate inference of gene regulatory interactions from spatial gene expression with deep contrastive learning. Bioinformatics 2022; 38:746-753. [PMID: 34664632 DOI: 10.1093/bioinformatics/btab718] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/18/2021] [Revised: 09/19/2021] [Accepted: 10/15/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Reverse engineering of gene regulatory networks (GRNs) has long been an attractive research topic in systems biology. Computational prediction of gene regulatory interactions has remained a challenging problem due to the complexity of gene expression and scarce information resources. High-throughput spatial gene expression data, such as in situ hybridization images that exhibit temporal and spatial expression patterns, provide abundant and reliable information for the inference of GRNs. However, computational tools for analyzing spatial gene expression data are highly underdeveloped. RESULTS In this study, we develop a new method for identifying gene regulatory interactions from gene expression images, called ConGRI. The method features a contrastive learning scheme and a deep Siamese convolutional neural network architecture, which automatically learns high-level feature embeddings for the expression images and then feeds the embeddings to an artificial neural network to determine whether or not the interaction exists. We apply the method to a Drosophila embryogenesis dataset and identify GRNs of eye development and mesoderm development. Experimental results show that ConGRI outperforms previous traditional and deep learning methods by a large margin, achieving accuracies of 76.7% and 68.7% for the GRNs of early eye development and mesoderm development, respectively. It also reveals some master regulators for Drosophila eye development. AVAILABILITY AND IMPLEMENTATION https://github.com/lugimzheng/ConGRI. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lujing Zheng
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- SJTU Paris Elite Institute of Technology (SPEIT), Shanghai Jiao Tong University, Shanghai 200240, China
| | - Zhenhuan Liu
- Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Yang Yang
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai 200240, China
| | - Hong-Bin Shen
- Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China
- Institute of Image Processing and Pattern Recognition and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai Jiao Tong University, Shanghai 200240, China
| |
|
24
|
Krishnakumar R, Ruffing AM. OperonSEQer: A set of machine-learning algorithms with threshold voting for detection of operon pairs using short-read RNA-sequencing data. PLoS Comput Biol 2022; 18:e1009731. [PMID: 34986143 PMCID: PMC8765615 DOI: 10.1371/journal.pcbi.1009731] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/29/2021] [Revised: 01/18/2022] [Accepted: 12/07/2021] [Indexed: 11/19/2022] Open
Abstract
Operon prediction in prokaryotes is critical not only for understanding the regulation of endogenous gene expression, but also for exogenous targeting of genes using newly developed tools such as CRISPR-based gene modulation. A number of methods have used transcriptomics data to predict operons, based on the premise that contiguous genes in an operon will be expressed at similar levels. While promising results have been observed using these methods, most of them do not address uncertainty caused by technical variability between experiments, which is especially relevant when the amount of data available is small. In addition, many existing methods do not provide the flexibility to determine the stringency with which genes should be evaluated for being in an operon pair. We present OperonSEQer, a set of machine learning algorithms that uses the statistic and p-value from a non-parametric analysis of variance test (Kruskal-Wallis) to determine the likelihood that two adjacent genes are expressed from the same RNA molecule. We implement a voting system that lets users choose the stringency of operon calls depending on whether their priority is high recall or high specificity. In addition, we provide the code so that users can retrain the algorithm and re-establish hyperparameters based on any data they choose, allowing this method to be expanded as additional data are generated. We show that our approach detects operon pairs that are missed by current methods by comparing our predictions to publicly available long-read sequencing data. OperonSEQer therefore improves on existing methods in terms of accuracy, flexibility, and adaptability.

Bacteria and archaea, single-cell organisms collectively known as prokaryotes, live in all imaginable environments and comprise the majority of living organisms on this planet. Prokaryotes play a critical role in the homeostasis of multicellular organisms (such as animals and plants) and ecosystems. In addition, bacteria can be pathogenic and cause a variety of diseases in these same hosts and ecosystems. In short, understanding the biology and molecular functions of bacteria and archaea and devising mechanisms to engineer and optimize their properties are critical scientific endeavors with significant implications in healthcare, agriculture, manufacturing, and climate science, among others. One major molecular difference between unicellular and multicellular organisms is the way they express genes: multicellular organisms make individual RNA molecules for each gene, while prokaryotes express operons (i.e., groups of genes coding for functionally related proteins) in contiguous polycistronic RNA molecules. Understanding which genes exist within operons is critical for elucidating basic biology and for engineering organisms. In this work, we combine statistical and machine learning-based methods to predict operon structure across a range of prokaryotes from next-generation sequencing data. Our method provides an easily implemented, robust, accurate, and flexible way to determine operon structure in an organism-agnostic manner using readily available data.
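The threshold-voting idea described in the abstract can be sketched as follows. This is a minimal illustration, not the OperonSEQer implementation: the model names, the number of models, and the `min_votes` parameter are all hypothetical, and the per-model calls are made up rather than derived from a Kruskal-Wallis statistic.

```python
from typing import Dict


def vote_operon_pair(model_calls: Dict[str, bool], min_votes: int) -> bool:
    """Return True when at least `min_votes` models call the pair an operon.

    `model_calls` maps a (hypothetical) model name to its binary call for
    one pair of adjacent genes.
    """
    if not 1 <= min_votes <= len(model_calls):
        raise ValueError("min_votes must be between 1 and the number of models")
    # Booleans sum as 0/1, so this counts the models voting "operon".
    return sum(model_calls.values()) >= min_votes


# Made-up calls from six hypothetical models for one gene pair (3 of 6 say yes).
calls = {
    "model_a": True,
    "model_b": True,
    "model_c": False,
    "model_d": True,
    "model_e": False,
    "model_f": False,
}

lenient = vote_operon_pair(calls, min_votes=2)  # low threshold favors recall
strict = vote_operon_pair(calls, min_votes=5)   # high threshold favors specificity
print(lenient, strict)
```

A low threshold accepts a pair as soon as a few models agree, which trades specificity for recall; a high threshold does the reverse.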
Affiliation(s)
- Raga Krishnakumar
  - Systems Biology Department, Sandia National Laboratories, Livermore, California, United States of America
- Anne M. Ruffing
  - Molecular and Microbiology Department, Sandia National Laboratories, Albuquerque, New Mexico, United States of America
|
25
|
Farahmand S, Fernandez AI, Ahmed FS, Rimm DL, Chuang JH, Reisenbichler E, Zarringhalam K. Deep learning trained on hematoxylin and eosin tumor region of Interest predicts HER2 status and trastuzumab treatment response in HER2+ breast cancer. Mod Pathol 2022; 35:44-51. [PMID: 34493825 DOI: 10.1038/s41379-021-00911-w] [Citation(s) in RCA: 55] [Impact Index Per Article: 27.5] [Received: 06/12/2021] [Revised: 08/13/2021] [Accepted: 08/13/2021] [Indexed: 12/19/2022]
Abstract
The current standard of care for many patients with HER2-positive breast cancer is neoadjuvant chemotherapy in combination with anti-HER2 agents, based on HER2 amplification as detected by in situ hybridization (ISH) or protein immunohistochemistry (IHC). However, hematoxylin & eosin (H&E) tumor stains are more commonly available, and accurate prediction of HER2 status and anti-HER2 treatment response from H&E would reduce costs and increase the speed of treatment selection. Computational algorithms for H&E have been effective in predicting a variety of cancer features and clinical outcomes, including moderate success in predicting HER2 status. In this work, we present a novel convolutional neural network (CNN) approach able to predict HER2 status with increased accuracy over prior methods. We trained a CNN classifier on 188 H&E whole slide images (WSIs) manually annotated for tumor regions of interest (ROIs) by our pathology team. Our classifier achieved an area under the curve (AUC) of 0.90 in cross-validation of slide-level HER2 status and 0.81 on an independent TCGA test set. Within slides, we observed strong agreement between pathologist-annotated ROIs and blinded computational predictions of tumor regions/HER2 status. Moreover, we trained our classifier on pre-treatment samples from 187 HER2+ patients who subsequently received trastuzumab therapy. Our classifier achieved an AUC of 0.80 in five-fold cross-validation. Our work provides an H&E-based algorithm that can predict HER2 status and trastuzumab response in breast cancer at an accuracy that may benefit clinical evaluations.
Affiliation(s)
- Saman Farahmand
  - University of Massachusetts-Boston, Department of Mathematics, Boston, MA, USA
  - University of Massachusetts-Boston, Computational Sciences PhD Program, Boston, MA, USA
- Aileen I Fernandez
  - Yale University, Yale School of Medicine, Department of Pathology, New Haven, CT, USA
- Fahad Shabbir Ahmed
  - Yale University, Yale School of Medicine, Department of Pathology, New Haven, CT, USA
- David L Rimm
  - Yale University, Yale School of Medicine, Department of Pathology, New Haven, CT, USA
- Jeffrey H Chuang
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - UCONN Health, Department of Genetics and Genome Sciences, Farmington, CT, USA
- Emily Reisenbichler
  - Yale University, Yale School of Medicine, Department of Pathology, New Haven, CT, USA
- Kourosh Zarringhalam
  - University of Massachusetts-Boston, Department of Mathematics, Boston, MA, USA
  - University of Massachusetts-Boston, Computational Sciences PhD Program, Boston, MA, USA
|
26
|
Huminiecki Ł. Virtual Gene Concept and a Corresponding Pragmatic Research Program in Genetical Data Science. Entropy (Basel) 2021; 24:17. [PMID: 35052043 PMCID: PMC8774939 DOI: 10.3390/e24010017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Revised: 12/02/2021] [Accepted: 12/14/2021] [Indexed: 06/14/2023]
Abstract
Mendel proposed an experimentally verifiable paradigm of particle-based heredity that has been influential for over 150 years. The historical arguments have been reflected in the near past as Mendel's concept has been diversified by new types of omics data. As an effect of the accumulation of omics data, a virtual gene concept forms, giving rise to genetical data science. The concept integrates genetical, functional, and molecular features of the Mendelian paradigm. I argue that the virtual gene concept should be deployed pragmatically. Indeed, the concept has already inspired a practical research program related to systems genetics. The program includes questions about functionality of structural and categorical gene variants, about regulation of gene expression, and about roles of epigenetic modifications. The methodology of the program includes bioinformatics, machine learning, and deep learning. Education, funding, careers, standards, benchmarks, and tools to monitor research progress should be provided to support the research program.
Affiliation(s)
- Łukasz Huminiecki
  - Evolutionary, Computational, and Statistical Genetics, Department of Molecular Biology, Institute of Genetics and Animal Biotechnology, Polish Academy of Sciences, Postępu 36A, Jastrzębiec, 05-552 Warsaw, Poland
|
27
|
Novakovsky G, Saraswat M, Fornes O, Mostafavi S, Wasserman WW. Biologically relevant transfer learning improves transcription factor binding prediction. Genome Biol 2021; 22:280. [PMID: 34579793 PMCID: PMC8474956 DOI: 10.1186/s13059-021-02499-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 12/21/2020] [Accepted: 09/15/2021] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND: Deep learning has proven to be a powerful technique for transcription factor (TF) binding prediction but requires large training datasets. Transfer learning can reduce the amount of data required for deep learning, while improving overall model performance, compared to training a separate model for each new task.

RESULTS: We assess a transfer learning strategy for TF binding prediction consisting of a pre-training step, wherein we train a multi-task model with multiple TFs, and a fine-tuning step, wherein we initialize single-task models for individual TFs with the weights learned by the multi-task model, after which the single-task models are trained at a lower learning rate. We corroborate that transfer learning improves model performance, especially if in the pre-training step the multi-task model is trained with biologically relevant TFs. We show the effectiveness of transfer learning for TFs with ~500 ChIP-seq peak regions. Using model interpretation techniques, we demonstrate that the features learned in the pre-training step are refined in the fine-tuning step to resemble the binding motif of the target TF (i.e., the recipient of transfer learning in the fine-tuning step). Moreover, pre-training with biologically relevant TFs allows single-task models in the fine-tuning step to learn useful features other than the motif of the target TF.

CONCLUSIONS: Our results confirm that transfer learning is a powerful technique for TF binding prediction.
Affiliation(s)
- Gherman Novakovsky
  - Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
- Manu Saraswat
  - Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
- Oriol Fornes
  - Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
- Sara Mostafavi
  - Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
  - Department of Statistics, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
  - Canadian Institute for Advanced Research, CIFAR AI Chair, and Child and Brain Development, Toronto, ON, M5G 1M1, Canada
- Wyeth W Wasserman
  - Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
|
28
|
Lee JY, Nguyen B, Orosco C, Styczynski MP. SCOUR: a stepwise machine learning framework for predicting metabolite-dependent regulatory interactions. BMC Bioinformatics 2021; 22:365. [PMID: 34238207 PMCID: PMC8268592 DOI: 10.1186/s12859-021-04281-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/08/2021] [Accepted: 06/30/2021] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND: The topology of metabolic networks is both well-studied and remarkably well-conserved across many species. The regulation of these networks, however, is much more poorly characterized, though it is known to be divergent across organisms, two characteristics that make it difficult to model metabolic networks accurately. While many computational methods have been built to unravel transcriptional regulation, there have been few approaches developed for systems-scale analysis and study of metabolic regulation. Here, we present a stepwise machine learning framework that applies established algorithms to identify regulatory interactions in metabolic systems based on metabolic data: stepwise classification of unknown regulation, or SCOUR.

RESULTS: We evaluated our framework on both noiseless and noisy data, using several models of varying sizes and topologies to show that our approach is generalizable. We found that, when testing on data under the most realistic conditions (low sampling frequency and high noise), SCOUR could identify reaction fluxes controlled only by the concentration of a single metabolite (its primary substrate) with high accuracy. The positive predictive value (PPV) for identifying reactions controlled by the concentration of two metabolites ranged from 32 to 88% for noiseless data, 9.2 to 49% for either low sampling frequency/low noise or high sampling frequency/high noise data, and 6.6 to 27% for low sampling frequency/high noise data, with results typically sufficiently high for lab validation to be a practical endeavor. While the PPVs for reactions controlled by three metabolites were lower, they were still in most cases significantly better than random classification.

CONCLUSIONS: SCOUR uses a novel approach to synthetically generate the training data needed to identify regulators of reaction fluxes in a given metabolic system, enabling metabolomics and fluxomics data to be leveraged for regulatory structure inference. By identifying and triaging the most likely candidate regulatory interactions, SCOUR can drastically reduce the amount of time needed to identify and experimentally validate metabolic regulatory interactions. As high-throughput experimental methods for testing these interactions are further developed, SCOUR will provide critical impact in the development of predictive metabolic models in new organisms and pathways.
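For readers less familiar with the metric reported above, positive predictive value is the fraction of predicted interactions that are real: PPV = TP / (TP + FP). The short sketch below computes it from counts; the function name and the counts are illustrative assumptions and are not taken from the SCOUR paper.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP): the share of positive calls that are correct."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        raise ValueError("PPV is undefined when nothing is predicted positive")
    return true_positives / predicted_positives


# Made-up example: 25 predicted regulatory interactions, 8 of them real.
ppv = positive_predictive_value(true_positives=8, false_positives=17)
print(f"PPV = {ppv:.2f}")
```

Read this way, a PPV near 0.3 means roughly one in three candidates sent for lab validation would be expected to hold up.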
Affiliation(s)
- Justin Y Lee
  - School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Britney Nguyen
  - School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Carlos Orosco
  - School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Mark P Styczynski
  - School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta, GA, USA
|
29
|
Westerman EL, Bowman SEJ, Davidson B, Davis MC, Larson ER, Sanford CPJ. Deploying Big Data to Crack the Genotype to Phenotype Code. Integr Comp Biol 2021; 60:385-396. [PMID: 32492136 DOI: 10.1093/icb/icaa055] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/25/2022] Open
Abstract
Mechanistically connecting genotypes to phenotypes is a longstanding and central mission of biology. Deciphering these connections will unite questions and datasets across all scales from molecules to ecosystems. Although high-throughput sequencing has provided a rich platform on which to launch this effort, tools for deciphering mechanisms further along the genome to phenome pipeline remain limited. Machine learning approaches and other emerging computational tools hold the promise of augmenting human efforts to overcome these obstacles. This vision paper is the result of a Reintegrating Biology Workshop, bringing together the perspectives of integrative and comparative biologists to survey challenges and opportunities in cracking the genotype to phenotype code and thereby generating predictive frameworks across biological scales. Key recommendations include promoting the development of minimum "best practices" for the experimental design and collection of data; fostering sustained and long-term data repositories; promoting programs that recruit, train, and retain a diversity of talent; and providing funding to effectively support these highly cross-disciplinary efforts. We follow this discussion by highlighting a few specific transformative research opportunities that will be advanced by these efforts.
Affiliation(s)
- Erica L Westerman
  - Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA
- Sarah E J Bowman
  - High-Throughput Crystallization Screening Center, Hauptman-Woodward Medical Research Institute, Buffalo, NY 14203, USA
  - Department of Biochemistry, Jacobs School of Medicine & Biomedical Sciences at the University at Buffalo, Buffalo, NY 14203, USA
- Bradley Davidson
  - Department of Biology, Swarthmore College, Swarthmore, PA 19081, USA
- Marcus C Davis
  - Department of Biology, James Madison University, Harrisonburg, VA 22807, USA
- Eric R Larson
  - Department of Natural Resources and Environmental Sciences, University of Illinois, Urbana, IL 61801, USA
- Christopher P J Sanford
  - Department of Ecology, Evolution and Organismal Biology, Kennesaw State University, Kennesaw, GA 30144, USA
|
30
|
Zhang M, Sheffield T, Zhan X, Li Q, Yang DM, Wang Y, Wang S, Xie Y, Wang T, Xiao G. Spatial molecular profiling: platforms, applications and analysis tools. Brief Bioinform 2021; 22:bbaa145. [PMID: 32770205 PMCID: PMC8138878 DOI: 10.1093/bib/bbaa145] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 03/23/2020] [Revised: 05/26/2020] [Accepted: 06/09/2020] [Indexed: 12/24/2022] Open
Abstract
Molecular profiling technologies, such as genome sequencing and proteomics, have transformed biomedical research, but most such technologies require tissue dissociation, which leads to loss of tissue morphology and spatial information. Recent developments in spatial molecular profiling technologies have enabled the comprehensive molecular characterization of cells while keeping their spatial and morphological contexts intact. Molecular profiling data generate deep characterizations of the genetic, transcriptional and proteomic events of cells, while tissue images capture the spatial locations, organizations and interactions of the cells together with their morphology features. These data, together with cell and tissue imaging data, provide unprecedented opportunities to study tissue heterogeneity and cell spatial organization. This review aims to provide an overview of these recent developments in spatial molecular profiling technologies and the corresponding computational methods developed for analyzing such data.
Affiliation(s)
- Minzhe Zhang
  - Department of Population and Data Sciences at University of Texas Southwestern Medical Center
- Thomas Sheffield
  - Department of Population and Data Sciences at University of Texas Southwestern Medical Center
- Xiaowei Zhan
  - Department of Population and Data Sciences at University of Texas Southwestern Medical Center
- Qiwei Li
  - Department of Mathematical Sciences at University of Texas at Dallas
- Donghan M Yang
  - Department of Population and Data Sciences at University of Texas Southwestern Medical Center
- Yunguan Wang
  - Department of Population and Data Sciences at University of Texas Southwestern Medical Center
- Shidan Wang
  - Department of Population and Data Sciences at University of Texas Southwestern Medical Center
- Yang Xie
  - Quantitative Biomedical Research Center at the University of Texas Southwestern Medical Center
- Tao Wang
  - Department of Population and Data Sciences at University of Texas Southwestern Medical Center
- Guanghua Xiao
  - Department of Population and Data Sciences at University of Texas Southwestern Medical Center
|
31
|
Sajid M, Channakesavula CN, Stone SR, Kaur P. Synthetic Biology towards Improved Flavonoid Pharmacokinetics. Biomolecules 2021; 11:biom11050754. [PMID: 34069975 PMCID: PMC8157843 DOI: 10.3390/biom11050754] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 04/16/2021] [Revised: 05/13/2021] [Accepted: 05/17/2021] [Indexed: 12/14/2022] Open
Abstract
Flavonoids are a structurally diverse class of natural products that have been found to have a range of beneficial activities in humans. However, the clinical utilisation of these molecules has been limited due to their low solubility, chemical stability, bioavailability and extensive intestinal metabolism in vivo. Recently, the view has been formed that site-specific modification of flavonoids by methylation and/or glycosylation, processes that occur in plants endogenously, can be used to improve and adapt their biophysical and pharmacokinetic properties. The traditional source of flavonoids and their modified forms is from plants and is limited due to the low amounts present in biomass, intrinsic to the nature of secondary metabolite biosynthesis. Access to greater amounts of flavonoids, and understanding of the impact of modifications, requires a rethink in terms of production, more specifically towards the adoption of plant biosynthetic pathways into ex planta synthesis approaches. Advances in synthetic biology and metabolic engineering, aided by protein engineering and machine learning methods, offer attractive and exciting avenues for ex planta flavonoid synthesis. This review seeks to explore the applications of synthetic biology towards the ex planta biosynthesis of flavonoids, and how the natural plant methylation and glycosylation pathways can be harnessed to produce modified flavonoids with more favourable biophysical and pharmacokinetic properties for clinical use. It is envisaged that the development of viable alternative production systems for the synthesis of flavonoids and their methylated and glycosylated forms will help facilitate their greater clinical application.
|
32
|
Mousavi R, Konuru SH, Lobo D. Inference of dynamic spatial GRN models with multi-GPU evolutionary computation. Brief Bioinform 2021; 22:6217729. [PMID: 33834216 DOI: 10.1093/bib/bbab104] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/10/2020] [Revised: 02/15/2021] [Accepted: 03/09/2021] [Indexed: 02/06/2023] Open
Abstract
Reverse engineering mechanistic gene regulatory network (GRN) models with a specific dynamic spatial behavior is an inverse problem without analytical solutions in general. Instead, heuristic machine learning algorithms have been proposed to infer the structure and parameters of a system of equations able to recapitulate a given gene expression pattern. However, these algorithms are computationally intensive as they need to simulate millions of candidate models, which limits their applicability and requires high computational resources. Graphics processing unit (GPU) computing is an affordable alternative for accelerating large-scale scientific computation, yet no method is currently available to exploit GPU technology for the reverse engineering of mechanistic GRNs from spatial phenotypes. Here we present an efficient methodology to parallelize evolutionary algorithms using GPU computing for the inference of mechanistic GRNs that can develop a given gene expression pattern in a multicellular tissue area or cell culture. The proposed approach is based on multiple CPU threads running the lightweight crossover, mutation and selection operators and launching GPU kernels asynchronously. Kernels can run in parallel on a single GPU or on multiple GPUs, and each kernel simulates and scores the error of a model using the thread parallelism of the GPU. We tested this methodology for the inference of spatiotemporal mechanistic GRNs, including topology and parameters, that can develop a given 2D gene expression pattern. The results show a 700-fold speedup with respect to a single CPU implementation. This approach can streamline the extraction of knowledge from biological and medical datasets and accelerate the automatic design of GRNs for synthetic biology applications.
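The evolutionary loop the abstract refers to (crossover, mutation, selection, and a per-model error score) can be sketched on the CPU as below. This is a toy illustration under stated assumptions, not the authors' method: a "model" here is just a list of numbers, the error is a squared distance to a made-up target pattern, and every function name and parameter is hypothetical. In the paper's setting, the scoring step is the expensive simulation that would be offloaded to GPU kernels.

```python
import random
from typing import List, Tuple

Model = List[float]


def error(model: Model, target: Model) -> float:
    """Score a candidate: squared distance to the target pattern (toy stand-in)."""
    return sum((m - t) ** 2 for m, t in zip(model, target))


def crossover(a: Model, b: Model, rng: random.Random) -> Model:
    """One-point crossover: head of one parent joined to the tail of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(model: Model, rng: random.Random, scale: float = 0.1) -> Model:
    """Add small Gaussian noise to every parameter."""
    return [g + rng.gauss(0.0, scale) for g in model]


def evolve(target: Model, pop_size: int = 20, generations: int = 50,
           seed: int = 0) -> Tuple[Model, List[float]]:
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in target] for _ in range(pop_size)]
    history = []  # best error seen at the start of each generation
    for _ in range(generations):
        # Scoring and ranking: the step that dominates runtime for real models.
        population.sort(key=lambda m: error(m, target))
        history.append(error(population[0], target))
        # Selection: keep the better half unchanged (elitism).
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = survivors + children
    population.sort(key=lambda m: error(m, target))
    history.append(error(population[0], target))
    return population[0], history


target_pattern = [0.2, -0.5, 0.9, 0.0]  # made-up target "expression pattern"
best, history = evolve(target_pattern)
print(f"initial best error {history[0]:.4f}, final best error {history[-1]:.4f}")
```

Because the better half of each generation survives unchanged, the best error never increases from one generation to the next; the operators themselves are cheap, which is consistent with the abstract's description of them as lightweight.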
Affiliation(s)
- Reza Mousavi
  - Department of Biological Sciences at the University of Maryland, Baltimore, MD 21250, USA
- Sri Harsha Konuru
  - Department of Biological Sciences at the University of Maryland, Baltimore, MD 21250, USA
- Daniel Lobo
  - Department of Biological Sciences at the University of Maryland, Baltimore, MD 21250, USA
|
33
|
Zhao M, He W, Tang J, Zou Q, Guo F. A comprehensive overview and critical evaluation of gene regulatory network inference technologies. Brief Bioinform 2021; 22:6128842. [PMID: 33539514 DOI: 10.1093/bib/bbab009] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 09/28/2020] [Revised: 12/11/2020] [Accepted: 01/06/2021] [Indexed: 12/12/2022] Open
Abstract
Gene regulatory networks (GRNs) are an important mechanism for maintaining life processes, controlling biochemical reactions and regulating compound levels, and they play an important role in various organisms and systems. Reconstructing GRNs can help us understand the molecular mechanisms of organisms and reveal the essential rules of a large number of biological processes and reactions. To deal with the uncertainty of processing, network reconstruction algorithms rely on specific assumptions that affect prediction accuracy. To study why a certain method is more suitable for a specific research problem or type of experimental data, we review methods in three classes: model-based, information-based and machine learning-based. Clearly, different types of computational tools can be built to distinguish GRNs. Furthermore, for each category we discuss several classical, representative and recent methods and analyze their core ideas, general steps and characteristics. We compare the performance of state-of-the-art GRN reconstruction technologies on simulated networks and real networks under different scaling conditions. Using standardized performance metrics and common benchmarks, we quantitatively evaluate the stability of the various methods and the sensitivity of each algorithm when applied to networks of different scales. The aim of this study is to identify the most appropriate method for a specific GRN, helping biologists and medical scientists discover potential drug targets and identify cancer biomarkers.
Affiliation(s)
- Mengyuan Zhao
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Wenying He
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jijun Tang
  - University of South Carolina, Tianjin, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Fei Guo
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
|
34
|
Kwon MS, Lee BT, Lee SY, Kim HU. Modeling regulatory networks using machine learning for systems metabolic engineering. Curr Opin Biotechnol 2020; 65:163-170. [DOI: 10.1016/j.copbio.2020.02.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/20/2020] [Revised: 02/23/2020] [Accepted: 02/26/2020] [Indexed: 12/18/2022]
|
35
|
Buono L, Martinez-Morales JR. Retina Development in Vertebrates: Systems Biology Approaches to Understanding Genetic Programs: On the Contribution of Next-Generation Sequencing Methods to the Characterization of the Regulatory Networks Controlling Vertebrate Eye Development. Bioessays 2020; 42:e1900187. [PMID: 31997389 DOI: 10.1002/bies.201900187] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/11/2019] [Revised: 01/16/2020] [Indexed: 12/18/2022]
Abstract
The ontogeny of the vertebrate retina has been a topic of interest to developmental biologists and human geneticists for many decades. Understanding the unfolding of the genetic program that transforms a field of progenitor cells into a functionally complex and multi-layered sensory organ is a formidable challenge. Although classical genetic studies succeeded in identifying the key regulators of retina specification, understanding the architecture of their gene network and predicting their behavior are still a distant hope. The emergence of next-generation sequencing platforms revolutionized the field, unlocking access to genome-wide datasets. Emerging techniques such as RNA-seq, ChIP-seq, ATAC-seq, or single cell RNA-seq are used to characterize eye developmental programs. These studies provide valuable information on the transcriptional and cis-regulatory profiles of precursors and differentiated cells, outlining the trajectories that connect each intermediate state. Here, recent systems biology efforts to understand the genetic programs shaping the vertebrate retina are reviewed.
Affiliation(s)
- Lorena Buono
  - Centro Andaluz de Biología del Desarrollo (CSIC/UPO/JA), Seville, 41013, Spain
|