1. Eshun J, Lamar NC, Aksoy SG, Akers S, Garcia B, Cunningham H, Chin G, Bilbrey JA. Identifying Sample Provenance From SEM/EDS Automated Particle Analysis via Few-Shot Learning Coupled With Similarity Graph Clustering. Microscopy and Microanalysis 2024; 30:741-750. [PMID: 39083424 DOI: 10.1093/mam/ozae068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Revised: 05/06/2024] [Accepted: 07/04/2024] [Indexed: 08/02/2024]
Abstract
Automated particle analysis (APA) provides a vast amount of compositional data via energy-dispersive X-ray spectroscopy along with size and shape data via scanning electron microscopy for individual particles in a sample. In many instances, APA data are leveraged to support identification of the source of a sample based on the detection of particles of a specific composition. Often, the particles that provide context make up a minuscule portion of the sample. Additionally, the interpretation of complex samples can be difficult due to the diversity of compositions both in the mixture and within a particle. In this work, we demonstrate a method to compute and cluster similarity graphs that describe inter-particle relationships within a sample using a multi-modal few-shot learning neural network. As a proof-of-concept, we show that samples known to have been exposed to gunshot residue can be distinguished from samples occasionally mistaken for gunshot residue. Our workflow builds upon standard APA techniques and data processing methods to unveil additional information in a readily interpretable and quantitatively comparable format.
Affiliation(s)
- Jasmine Eshun
  - National Security Directorate, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, WA 99352, USA
- Natalie C Lamar
  - National Security Directorate, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, WA 99352, USA
- Sinan G Aksoy
  - National Security Directorate, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, WA 99352, USA
- Sarah Akers
  - National Security Directorate, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, WA 99352, USA
- Benjamin Garcia
  - National Security Directorate, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, WA 99352, USA
- Heather Cunningham
  - National Security Directorate, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, WA 99352, USA
- George Chin
  - National Security Directorate, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, WA 99352, USA
- Jenna A Bilbrey
  - National Security Directorate, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, WA 99352, USA
2. Li H, Han Z, Sun Y, Wang F, Hu P, Gao Y, Bai X, Peng S, Ren C, Xu X, Liu Z, Chen H, Yang Y, Bo X. CGMega: explainable graph neural network framework with attention mechanisms for cancer gene module dissection. Nat Commun 2024; 15:5997. [PMID: 39013885 PMCID: PMC11252405 DOI: 10.1038/s41467-024-50426-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 07/09/2024] [Indexed: 07/18/2024] Open
Abstract
Cancer is rarely the straightforward consequence of an abnormality in a single gene, but rather reflects a complex interplay of many genes, represented as gene modules. Here, we leverage recent advances in model-agnostic interpretation approaches and develop CGMega, an explainable, graph attention-based deep learning framework for cancer gene module dissection. CGMega outperforms current approaches in cancer gene prediction, and it provides a promising approach to integrate multi-omics information. We apply CGMega to a breast cancer cell line and to acute myeloid leukemia (AML) patients, and we uncover a high-order gene module formed by the ErbB family and the tumor factors NRG1, PPM1A and DLG2. We identify 396 candidate AML genes, and observe the enrichment of either known AML genes or candidate AML genes in a single gene module. We also identify patient-specific AML genes and associated gene modules. Together, these results indicate that CGMega can be used to dissect cancer gene modules and to provide high-order mechanistic insights into cancer development and heterogeneity.
Affiliation(s)
- Hao Li
  - Academy of Military Medical Sciences, Beijing, China
- Zebei Han
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
- Yu Sun
  - Academy of Military Medical Sciences, Beijing, China
- Fu Wang
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
- Pengzhen Hu
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Yuang Gao
  - Department of Hematology, PLA General Hospital, the Fifth Medical Center, Beijing, China
- Xuemei Bai
  - Academy of Military Medical Sciences, Beijing, China
- Shiyu Peng
  - Academy of Military Medical Sciences, Beijing, China
- Chao Ren
  - Academy of Military Medical Sciences, Beijing, China
- Xiang Xu
  - Academy of Military Medical Sciences, Beijing, China
- Zeyu Liu
  - Academy of Military Medical Sciences, Beijing, China
- Hebing Chen
  - Academy of Military Medical Sciences, Beijing, China
- Yang Yang
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
- Xiaochen Bo
  - Academy of Military Medical Sciences, Beijing, China
3. Liu Y, Li D, Zhang X, Xia S, Qu Y, Ling X, Li Y, Kong X, Zhang L, Cui CP, Li D. A protein sequence-based deep transfer learning framework for identifying human proteome-wide deubiquitinase-substrate interactions. Nat Commun 2024; 15:4519. [PMID: 38806474 PMCID: PMC11133436 DOI: 10.1038/s41467-024-48446-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 04/26/2024] [Indexed: 05/30/2024] Open
Abstract
Protein ubiquitination regulates a wide range of cellular processes. The degree of protein ubiquitination is determined by the delicate balance between ubiquitin ligase (E3)-mediated ubiquitination and deubiquitinase (DUB)-mediated deubiquitination. In comparison to E3-substrate interactions, DUB-substrate interactions (DSIs) remain insufficiently investigated. To address this challenge, we introduce a protein sequence-based ab initio method, TransDSI, which transfers proteome-scale evolutionary information to predict unknown DSIs despite inadequate training datasets. An explainable module is integrated to suggest the critical protein regions for DSIs while predicting DSIs. TransDSI outperforms multiple machine learning strategies in both cross-validation and independent testing. Two predicted DUBs (USP11 and USP20) for FOXP3 are validated by "wet lab" experiments, along with two predicted substrates (AR and p53) for USP22. TransDSI provides a new functional perspective on proteins by identifying regulatory DSIs, and offers clues for potential tumor drug target discovery and precision drug application.
Affiliation(s)
- Yuan Liu
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Dianke Li
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
  - State Key Laboratory of Farm Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Xin Zhang
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Simin Xia
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
  - School of Basic Medical Sciences, Anhui Medical University, Hefei, 230032, China
- Yingjie Qu
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Xinping Ling
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
  - College of Life Sciences, Hebei University, Baoding, 071002, China
- Yang Li
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Xiangren Kong
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Lingqiang Zhang
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Chun-Ping Cui
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Dong Li
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
4. Hirani R, Noruzi K, Khuram H, Hussaini AS, Aifuwa EI, Ely KE, Lewis JM, Gabr AE, Smiley A, Tiwari RK, Etienne M. Artificial Intelligence and Healthcare: A Journey through History, Present Innovations, and Future Possibilities. Life (Basel) 2024; 14:557. [PMID: 38792579 PMCID: PMC11122160 DOI: 10.3390/life14050557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 04/22/2024] [Accepted: 04/24/2024] [Indexed: 05/26/2024] Open
Abstract
Artificial intelligence (AI) has emerged as a powerful tool in healthcare, significantly impacting practices from diagnostics to treatment delivery and patient management. This article examines the progress of AI in healthcare, starting from the field's inception in the 1960s to present-day innovative applications in areas such as precision medicine, robotic surgery, and drug development. In addition, it explores the impact of the COVID-19 pandemic on accelerating the use of AI in technologies such as telemedicine and chatbots to enhance accessibility and improve medical education. Looking forward, the paper speculates on the promising future of AI in healthcare while critically addressing the ethical and societal considerations that accompany the integration of AI technologies. Furthermore, the potential to mitigate health disparities and the ethical implications surrounding data usage and patient privacy are discussed, emphasizing the need for evolving guidelines to govern AI's application in healthcare.
Affiliation(s)
- Rahim Hirani
  - School of Medicine, New York Medical College, 40 Sunshine Cottage Road, Valhalla, NY 10595, USA
  - Graduate School of Biomedical Sciences, New York Medical College, Valhalla, NY 10595, USA
- Kaleb Noruzi
  - School of Medicine, New York Medical College, 40 Sunshine Cottage Road, Valhalla, NY 10595, USA
- Hassan Khuram
  - College of Medicine, Drexel University, Philadelphia, PA 19129, USA
- Anum S. Hussaini
  - Department of Global Health and Population, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Esewi Iyobosa Aifuwa
  - School of Medicine, New York Medical College, 40 Sunshine Cottage Road, Valhalla, NY 10595, USA
- Kencie E. Ely
  - Kirk Kerkorian School of Medicine, University of Nevada Las Vegas, Las Vegas, NV 89106, USA
- Joshua M. Lewis
  - School of Medicine, New York Medical College, 40 Sunshine Cottage Road, Valhalla, NY 10595, USA
- Ahmed E. Gabr
  - School of Medicine, New York Medical College, 40 Sunshine Cottage Road, Valhalla, NY 10595, USA
- Abbas Smiley
  - School of Medicine and Dentistry, University of Rochester, Rochester, NY 14642, USA
- Raj K. Tiwari
  - School of Medicine, New York Medical College, 40 Sunshine Cottage Road, Valhalla, NY 10595, USA
  - Graduate School of Biomedical Sciences, New York Medical College, Valhalla, NY 10595, USA
- Mill Etienne
  - School of Medicine, New York Medical College, 40 Sunshine Cottage Road, Valhalla, NY 10595, USA
  - Department of Neurology, New York Medical College, Valhalla, NY 10595, USA
5. Williams A. Multiomics data integration, limitations, and prospects to reveal the metabolic activity of the coral holobiont. FEMS Microbiol Ecol 2024; 100:fiae058. [PMID: 38653719 PMCID: PMC11067971 DOI: 10.1093/femsec/fiae058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 03/25/2024] [Accepted: 04/22/2024] [Indexed: 04/25/2024] Open
Abstract
Since their radiation in the Middle Triassic period ∼240 million years ago, stony corals have survived past climate fluctuations and five mass extinctions. Their long-term survival underscores the inherent resilience of corals, particularly when considering the nutrient-poor marine environments in which they have thrived. However, coral bleaching has emerged as a global threat to coral survival, requiring rapid advancements in coral research to understand holobiont stress responses and allow for interventions before extensive bleaching occurs. This review encompasses the potential, as well as the limits, of multiomics data applications when applied to the coral holobiont. Synopses for how different omics tools have been applied to date and their current restrictions are discussed, in addition to ways these restrictions may be overcome, such as recruiting new technology to studies, utilizing novel bioinformatics approaches, and generally integrating omics data. Lastly, this review presents considerations for the design of holobiont multiomics studies to support lab-to-field advancements of coral stress marker monitoring systems. Although much of the bleaching mechanism has eluded investigation to date, multiomic studies have already produced key findings regarding the holobiont's stress response, and have the potential to advance the field further.
Affiliation(s)
- Amanda Williams
  - Microbial Biology Graduate Program, Rutgers University, 76 Lipman Drive, New Brunswick, NJ 08901, United States
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Drive, New Brunswick, NJ 08901, United States
6. Quatela AS, Cangren P, Jafari F, Michel T, de Boer HJ, Oxelman B. Retrieval of long DNA reads from herbarium specimens. AoB Plants 2023; 15:plad074. [PMID: 38130422 PMCID: PMC10735254 DOI: 10.1093/aobpla/plad074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Accepted: 11/06/2023] [Indexed: 12/23/2023]
Abstract
High-throughput sequencing of herbarium specimens' DNA with short-read platforms has helped explore many biological questions. Here, for the first time, we investigate the potential of using herbarium specimens as a resource for long-read DNA sequencing technologies. We use target capture of 48 low-copy nuclear loci in 12 herbarium specimens of Silene as a basis for long-read sequencing using SMRT PacBio Sequel. The samples were collected between 1932 and 2019. A simple optimization of the size selection protocol enabled the retrieval of both long DNA fragments (>1 kb) and long on-target reads for nine of them. The limited sample size does not enable statistical evaluation of the influence of specimen age on DNA fragmentation, but our results confirm that younger samples, that is, those collected after 1990, are less fragmented and have better sequencing success than specimens collected before this date. Specimens collected between 1990 and 2019 yielded between 167 and 3403 on-target reads >1 kb and enabled the recovery of between 34 and 48 loci (i.e. all loci recovered). Three samples from specimens collected before 1990 did not yield on-target reads >1 kb. The four other samples collected before this date yielded up to 144 reads and recovered up to 25 loci. Young herbarium specimens seem promising for long-read sequencing. However, older ones have partly failed. Further exploration would be necessary to statistically test and understand the potential of older material in the quest for long reads. We would encourage greatly expanding the sample size and comparing different taxonomic groups.
Affiliation(s)
- Anne-Sophie Quatela
  - Department of Biological and Environmental Sciences, University of Gothenburg, Box 463, 405 30, Gothenburg, Sweden
  - Gothenburg Global Biodiversity Center, Gothenburg, Box 463, 405 30, Sweden
- Patrik Cangren
  - Department of Biological and Environmental Sciences, University of Gothenburg, Box 463, 405 30, Gothenburg, Sweden
- Farzaneh Jafari
  - Department of Biology, Faculty of Basic Sciences, Lorestan University, P.O. Box 6815144316, Khorramabad, Iran
  - Department of Plant Science, Center of Excellence in Phylogeny of Living Organisms, School of Biology, College of Science, University of Tehran, P.O. Box 14155-6455, Tehran, Iran
- Thibauld Michel
  - Tropical Diversity Research Department, Royal Botanic Garden of Edinburgh, 20A Inverleith Row, Edinburgh, EH3 5LR, UK
- Hugo J de Boer
  - Natural History Museum, University of Oslo, P.O. Box 1172 Blindern, 0318 Oslo, Norway
- Bengt Oxelman
  - Department of Biological and Environmental Sciences, University of Gothenburg, Box 463, 405 30, Gothenburg, Sweden
  - Gothenburg Global Biodiversity Center, Gothenburg, Box 463, 405 30, Sweden
7. Georgouli K, Yeom JS, Blake RC, Navid A. Multi-scale models of whole cells: progress and challenges. Front Cell Dev Biol 2023; 11:1260507. [PMID: 38020904 PMCID: PMC10661945 DOI: 10.3389/fcell.2023.1260507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 10/19/2023] [Indexed: 12/01/2023] Open
Abstract
Whole-cell modeling is "the ultimate goal" of computational systems biology and "a grand challenge for 21st century" (Tomita, Trends in Biotechnology, 2001, 19(6), 205-10). These complex, highly detailed models account for the activity of every molecule in a cell and serve as comprehensive knowledgebases for the modeled system. Their scope and utility far surpass those of other systems models. In fact, whole-cell models (WCMs) are an amalgam of several types of "system" models. The models are simulated using a hybrid modeling method in which the appropriate mathematical method is used to simulate the behavior of each biological process. Given the complexity of the models, the process of developing and curating them is labor-intensive, and to date only a handful have been developed. While whole-cell models provide valuable biological insights, and to date have identified some novel biological phenomena, their most important contribution has been to highlight the discrepancy between available data and observations that are used for the parametrization and validation of complex biological models. Another realization has been that current whole-cell modeling simulators are slow; to run models that mimic more complex (e.g., multi-cellular) biosystems, they need to be executed in an accelerated fashion on high-performance computing platforms. In this manuscript, we review the progress of whole-cell modeling to date and discuss some of the ways that these models can be improved.
Affiliation(s)
- Konstantia Georgouli
  - Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, CA, United States
- Jae-Seung Yeom
  - Center for Applied Scientific Computing, Computing Directorate, Lawrence Livermore National Laboratory, Livermore, CA, United States
- Robert C. Blake
  - Center for Applied Scientific Computing, Computing Directorate, Lawrence Livermore National Laboratory, Livermore, CA, United States
- Ali Navid
  - Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, CA, United States
8. Mattei G, Gan Z, Ramazzotti M, Palsson BO, Zielinski DC. Differential Expression Analysis Utilizing Condition-Specific Metabolic Pathways. Metabolites 2023; 13:1127. [PMID: 37999223 PMCID: PMC10672963 DOI: 10.3390/metabo13111127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Revised: 10/19/2023] [Accepted: 11/01/2023] [Indexed: 11/25/2023] Open
Abstract
Pathway analysis is ubiquitous in biological data analysis due to the ability to integrate small simultaneous changes in functionally related components. While pathways are often defined based on either manual curation or network topological properties, an attractive alternative is to generate pathways around specific functions, in which metabolism can be defined as the production and consumption of specific metabolites. In this work, we present an algorithm, termed MetPath, that calculates pathways for condition-specific production and consumption of specific metabolites. We demonstrate that these pathways have several useful properties. Pathways calculated in this manner (1) take into account the condition-specific metabolic role of a gene product, (2) are localized around defined metabolic functions, and (3) quantitatively weigh the importance of expression to a function based on the flux contribution of the gene product. We demonstrate how these pathways elucidate network interactions between genes across different growth conditions and between cell types. Furthermore, the calculated pathways compare favorably to manually curated pathways in predicting the expression correlation between genes. To facilitate the use of these pathways, we have generated a large compendium of pathways under different growth conditions for E. coli. The MetPath algorithm provides a useful tool for metabolic network-based statistical analyses of high-throughput data.
Affiliation(s)
- Gianluca Mattei
  - Department of Experimental and Clinical Biomedical Sciences, University of Florence, 50121 Florence, Italy
- Zhuohui Gan
  - School of Basic Medical Sciences, Wenzhou Medical University, Wenzhou 325035, China
- Matteo Ramazzotti
  - Department of Experimental and Clinical Biomedical Sciences, University of Florence, 50121 Florence, Italy
- Bernhard O. Palsson
  - Department of Bioengineering, University of California San Diego, La Jolla, CA 92093-0412, USA
- Daniel C. Zielinski
  - Department of Bioengineering, University of California San Diego, La Jolla, CA 92093-0412, USA
9. Hwang YH, Lee EY, Lim HT, Joo ST. Multi-Omics Approaches to Improve Meat Quality and Taste Characteristics. Food Sci Anim Resour 2023; 43:1067-1086. [PMID: 37969318 PMCID: PMC10636221 DOI: 10.5851/kosfa.2023.e63] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/17/2023] [Revised: 09/19/2023] [Accepted: 09/27/2023] [Indexed: 11/17/2023] Open
Abstract
With rapid advances in meat science in recent decades, changes in meat quality during the pre-slaughter phase of muscle growth and the post-slaughter process from muscle to meat have been investigated. Commonly used techniques have evolved from early physicochemical indicators such as meat color, tenderness, water holding capacity, flavor, and pH to various omic tools such as genomics, transcriptomics, proteomics, and metabolomics to explore fundamental molecular mechanisms and screen biomarkers related to meat quality and taste characteristics. This review highlights the application of omics and integrated multi-omics in studies of meat quality and taste characteristics. It also discusses challenges and future perspectives of multi-omics technology to improve meat quality and taste. Consequently, multi-omics techniques can elucidate the molecular mechanisms responsible for changes in meat quality at the transcriptome, proteome, and metabolome levels. In addition, the application of multi-omics technology has great potential for exploring and identifying biomarkers for meat quality and quality control that can make it easier to optimize production processes in the meat industry.
Affiliation(s)
- Young-Hwa Hwang
  - Institute of Agriculture & Life Science, Gyeongsang National University, Jinju 52828, Korea
- Eun-Yeong Lee
  - Division of Applied Life Science (BK21 Four), Gyeongsang National University, Jinju 52828, Korea
- Hyen-Tae Lim
  - Institute of Agriculture & Life Science, Gyeongsang National University, Jinju 52828, Korea
  - Division of Animal Science, Gyeongsang National University, Jinju 52828, Korea
- Seon-Tea Joo
  - Institute of Agriculture & Life Science, Gyeongsang National University, Jinju 52828, Korea
  - Division of Applied Life Science (BK21 Four), Gyeongsang National University, Jinju 52828, Korea
  - Division of Animal Science, Gyeongsang National University, Jinju 52828, Korea
10. Wang W, Meng X, Xiang J, Shuai Y, Bedru HD, Li M. CACO: A Core-Attachment Method With Cross-Species Functional Ortholog Information to Detect Human Protein Complexes. IEEE J Biomed Health Inform 2023; 27:4569-4578. [PMID: 37399160 DOI: 10.1109/jbhi.2023.3289490] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/05/2023]
Abstract
Protein complexes play an essential role in living cells. Detecting protein complexes is crucial for understanding protein functions and treating complex diseases. Because experimental approaches consume considerable time and resources, many computational approaches have been proposed to detect protein complexes. However, most of them are based only on protein-protein interaction (PPI) networks and suffer heavily from the noise in those networks. Therefore, we propose a novel core-attachment method, named CACO, to detect human protein complexes by integrating functional information from other species via protein ortholog relations. First, CACO constructs a cross-species ortholog relation matrix and transfers GO terms from other species as a reference to evaluate the confidence of PPIs. Then, a PPI filter strategy is adopted to clean the PPI network, and thus a weighted clean PPI network is constructed. Finally, a new effective core-attachment algorithm is proposed to detect protein complexes from the weighted PPI network. Compared with thirteen other state-of-the-art methods, CACO outperforms all of them in terms of F-measure and Composite Score, showing that integrating ortholog information and the proposed core-attachment algorithm are effective in detecting protein complexes.
11. Yin F, Zhao H, Lu S, Shen J, Li M, Mao X, Li F, Shi J, Li J, Dong B, Xue W, Zuo X, Yang X, Fan C. DNA-framework-based multidimensional molecular classifiers for cancer diagnosis. Nat Nanotechnol 2023; 18:677-686. [PMID: 36973399 DOI: 10.1038/s41565-023-01348-9] [Citation(s) in RCA: 33] [Impact Index Per Article: 33.0] [Received: 05/13/2022] [Accepted: 02/10/2023] [Indexed: 06/18/2023]
Abstract
A molecular classification of diseases that accurately reflects clinical behaviour lays the foundation of precision medicine. The development of in silico classifiers coupled with molecular implementation based on DNA reactions marks a key advance in more powerful molecular classification, but it nevertheless remains a challenge to process multiple molecular datatypes. Here we introduce a DNA-encoded molecular classifier that can physically implement the computational classification of multidimensional molecular clinical data. To produce unified electrochemical sensing signals across heterogeneous molecular binding events, we exploit DNA-framework-based programmable atom-like nanoparticles with n valence to develop valence-encoded signal reporters that enable linearity in translating virtually any biomolecular binding events to signal gains. Multidimensional molecular information in computational classification is thus precisely assigned weights for bioanalysis. We demonstrate the implementation of a molecular classifier based on programmable atom-like nanoparticles to perform biomarker panel screening and analyse a panel of six biomarkers across three-dimensional datatypes for a near-deterministic molecular taxonomy of prostate cancer patients.
Affiliation(s)
- Fangfei Yin
  - Institute of Molecular Medicine, Department of Urology, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Haipei Zhao
  - Frontiers Science Center for Transformative Molecules, School of Chemistry and Chemical Engineering, Zhangjiang Institute for Advanced Study, and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
- Shasha Lu
  - Frontiers Science Center for Transformative Molecules, School of Chemistry and Chemical Engineering, Zhangjiang Institute for Advanced Study, and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
  - School of Materials Science and Engineering, Suzhou University of Science and Technology, Suzhou, China
- Juwen Shen
  - Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China
- Min Li
  - Institute of Molecular Medicine, Department of Urology, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Xiuhai Mao
  - Institute of Molecular Medicine, Department of Urology, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Fan Li
  - Institute of Molecular Medicine, Department of Urology, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Jiye Shi
  - Division of Physical Biology, CAS Key Laboratory of Interfacial Physics and Technology, Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai, China
- Jiang Li
  - Division of Physical Biology, CAS Key Laboratory of Interfacial Physics and Technology, Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai, China
  - The Interdisciplinary Research Center, Shanghai Synchrotron Radiation Facility, Zhangjiang Laboratory, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai, China
- Baijun Dong
  - Institute of Molecular Medicine, Department of Urology, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Wei Xue
  - Institute of Molecular Medicine, Department of Urology, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Xiaolei Zuo
  - Institute of Molecular Medicine, Department of Urology, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
  - Frontiers Science Center for Transformative Molecules, School of Chemistry and Chemical Engineering, Zhangjiang Institute for Advanced Study, and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
- Xiurong Yang
  - Frontiers Science Center for Transformative Molecules, School of Chemistry and Chemical Engineering, Zhangjiang Institute for Advanced Study, and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
  - State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, China
| | - Chunhai Fan
- Institute of Molecular Medicine, Department of Urology, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Frontiers Science Center for Transformative Molecules, School of Chemistry and Chemical Engineering, Zhangjiang Institute for Advanced Study, and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
|
12
|
Berger B, Yu YW. Navigating bottlenecks and trade-offs in genomic data analysis. Nat Rev Genet 2023; 24:235-250. [PMID: 36476810 DOI: 10.1038/s41576-022-00551-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/27/2022] [Indexed: 12/12/2022]
Abstract
Genome sequencing and analysis allow researchers to decode the functional information hidden in DNA sequences as well as to study cell to cell variation within a cell population. Traditionally, the primary bottleneck in genomic analysis pipelines has been the sequencing itself, which has been much more expensive than the computational analyses that follow. However, an important consequence of the continued drive to expand the throughput of sequencing platforms at lower cost is that often the analytical pipelines are struggling to keep up with the sheer amount of raw data produced. Computational cost and efficiency have thus become of ever increasing importance. Recent methodological advances, such as data sketching, accelerators and domain-specific libraries/languages, promise to address these modern computational challenges. However, despite being more efficient, these innovations come with a new set of trade-offs, both expected, such as accuracy versus memory and expense versus time, and more subtle, including the human expertise needed to use non-standard programming interfaces and set up complex infrastructure. In this Review, we discuss how to navigate these new methodological advances and their trade-offs.
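Data sketching, one of the advances named in this abstract, can be illustrated with MinHash, a widely used sketching technique for estimating the Jaccard similarity of two k-mer sets from short fixed-size signatures. The following is a minimal sketch under stated assumptions: the function names, the choice of `k`, the number of hash functions and the toy sequences are all hypothetical and are not taken from the Review.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """64-bit hash of a k-mer; the seed selects one hash function of a family."""
    digest = hashlib.blake2b(
        kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(seq, k=4, num_hashes=64):
    """Keep only the minimum hash value per hash function (the sketch)."""
    ks = kmers(seq, k)
    return [min(_hash(km, seed) for km in ks) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots estimates the Jaccard index."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(seq_a, seq_b, k=4):
    """Exact Jaccard index of the two k-mer sets, for comparison."""
    ka, kb = kmers(seq_a, k), kmers(seq_b, k)
    return len(ka & kb) / len(ka | kb)


if __name__ == "__main__":
    a = "ACGTACGTTGCAACGTAGCTAGCTAGGATCCA"  # hypothetical toy sequences
    b = "ACGTACGTTGCAACGTAGCTAGCTAGGTTCCA"
    print("estimated:", estimate_jaccard(minhash_signature(a), minhash_signature(b)))
    print("exact:    ", exact_jaccard(a, b))
```

The sketch shows the accuracy-versus-memory trade-off the abstract mentions: the signature has a fixed size regardless of sequence length, and the estimate becomes more accurate only as `num_hashes` (and therefore memory) grows.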
Affiliation(s)
- Bonnie Berger
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.
| | - Yun William Yu
- Department of Computer and Mathematical Sciences, University of Toronto Scarborough, Toronto, Ontario, Canada
- Tri-Campus Department of Mathematics, University of Toronto, Toronto, Ontario, Canada
|
13
|
Chen X, Huang L. Computational model for disease research. Brief Bioinform 2023; 24:6987819. [PMID: 36642407 DOI: 10.1093/bib/bbac615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/17/2023] Open
Affiliation(s)
- Xing Chen
- Artificial Intelligence Research Institute, China University of Mining and Technology, Xuzhou 221116, China
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
| | - Li Huang
- The Future Laboratory, Tsinghua University, Beijing 100084, China
|
14
|
Fischer S, Gillis J. Defining the extent of gene function using ROC curvature. Bioinformatics 2022; 38:5390-5397. [PMID: 36271855 PMCID: PMC9750128 DOI: 10.1093/bioinformatics/btac692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2022] [Revised: 09/19/2022] [Accepted: 10/20/2022] [Indexed: 12/25/2022] Open
Abstract
Motivation
Interactions between proteins help us understand how genes are functionally related and how they contribute to phenotypes. Experiments provide imperfect 'ground truth' information about a small subset of potential interactions in a specific biological context, which can then be extended to the whole genome across different contexts, such as conditions, tissues or species, through machine learning methods. However, evaluating the performance of these methods remains a critical challenge. Here, we propose to evaluate the generalizability of gene characterizations through the shape of performance curves.
Results
We identify Functional Equivalence Classes (FECs), subsets of annotated and unannotated genes that jointly drive performance, by assessing the presence of straight lines in ROC curves built from gene-centric prediction tasks, such as function or interaction predictions. FECs are widespread across data types and methods; they can be used to evaluate the extent and context-specificity of functional annotations in a data-driven manner. For example, FECs suggest that B cell markers can be decomposed into shared primary markers (10-50 genes), and tissue-specific secondary markers (100-500 genes). In addition, FECs suggest the existence of functional modules that span a wide range of the genome, with marker sets spanning at most 5% of the genome and data-driven extensions of Gene Ontology sets spanning up to 40% of the genome. Simple to assess visually and statistically, the identification of FECs in performance curves paves the way for novel functional characterization and increased robustness in the definition of functional gene sets.
Availability and implementation
Code for analyses and figures is available at https://github.com/yexilein/pyroc.
Supplementary information
Supplementary data are available at Bioinformatics online.
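To make the idea of reading the shape of a ROC curve concrete, the following minimal sketch builds the ROC points from a ranked list of gene labels and computes the area under the curve. It does not implement the authors' FEC detection and is not taken from the pyroc repository; the function names and the toy ranking are hypothetical. It only shows the raw curve in which a long straight segment would indicate a range of ranks where annotated and unannotated genes are mixed at a constant rate.

```python
def roc_points(ranked_labels):
    """Build ROC points from labels (1 = annotated, 0 = not) sorted best-first."""
    n_pos = sum(ranked_labels)
    n_neg = len(ranked_labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    tp = fp = 0
    points = [(0.0, 0.0)]
    for label in ranked_labels:
        if label:
            tp += 1  # step up: one more true positive
        else:
            fp += 1  # step right: one more false positive
        points.append((fp / n_neg, tp / n_pos))
    return points


def auroc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    # Hypothetical ranking: strong signal at the top, mixed labels in the middle.
    ranking = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
    pts = roc_points(ranking)
    print(pts)
    print("AUROC:", auroc(pts))
```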
Affiliation(s)
- Stephan Fischer
- Cold Spring Harbor Laboratory, Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY 11724, USA
- Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, Paris F-75015, France
| | - Jesse Gillis
- Cold Spring Harbor Laboratory, Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY 11724, USA
- Department of Physiology, University of Toronto, Toronto, ON, Canada
|
15
|
Lin CMA, Cooles FAH, Isaacs JD. Precision medicine: the precision gap in rheumatic disease. Nat Rev Rheumatol 2022; 18:725-733. [PMID: 36216923 DOI: 10.1038/s41584-022-00845-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Accepted: 09/07/2022] [Indexed: 11/09/2022]
Abstract
For many oncological conditions, the application of timely and patient-tailored targeted therapies, or precision medicine, is a major therapeutic development that has provided considerable clinical benefit. However, despite the application of increasingly sophisticated technologies, alongside advanced bioinformatic and machine-learning algorithms, this success is yet to be replicated for the rheumatic diseases. In rheumatoid arthritis, for example, despite an array of targeted biologic and conventional therapeutics, treatment choice remains largely based on trial and error. The concept of the 'precision gap' for rheumatic disease can help us to identify factors that underpin the slow progress towards the discovery and adoption of precision-medicine approaches for rheumatic disease. In a rheumatic disease such as rheumatoid arthritis, it is possible to identify four themes that have slowed progress, solutions to which should help to close the precision gap. These themes relate to our fundamental understanding of disease pathogenesis, how we determine treatment response, confounders of treatment outcomes and trial design.
Affiliation(s)
- Chung M A Lin
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, UK
| | - Faye A H Cooles
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, UK
| | - John D Isaacs
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, UK
- Musculoskeletal Unit, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
|
16
|
Maghsoudi Z, Nguyen H, Tavakkoli A, Nguyen T. A comprehensive survey of the approaches for pathway analysis using multi-omics data integration. Brief Bioinform 2022; 23:6761962. [PMID: 36252928 PMCID: PMC9677478 DOI: 10.1093/bib/bbac435] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/28/2022] [Revised: 08/26/2022] [Accepted: 09/08/2022] [Indexed: 02/07/2023] Open
Abstract
Pathway analysis has been widely used to detect pathways and functions associated with complex disease phenotypes. The proliferation of this approach is due to better interpretability of its results and its higher statistical power compared with the gene-level statistics. A plethora of pathway analysis methods that utilize multi-omics setup, rather than just transcriptomics or proteomics, have recently been developed to discover novel pathways and biomarkers. Since multi-omics gives multiple views into the same problem, different approaches are employed in aggregating these views into a comprehensive biological context. As a result, a variety of novel hypotheses regarding disease ideation and treatment targets can be formulated. In this article, we review 32 such pathway analysis methods developed for multi-omics and multi-cohort data. We discuss their availability and implementation, assumptions, supported omics types and databases, pathway analysis techniques and integration strategies. A comprehensive assessment of each method's practicality, and a thorough discussion of the strengths and drawbacks of each technique will be provided. The main objective of this survey is to provide a thorough examination of existing methods to assist potential users and researchers in selecting suitable tools for their data and analysis purposes, while highlighting outstanding challenges in the field that remain to be addressed for future development.
Affiliation(s)
- Zeynab Maghsoudi
- Department of Computer Science and Engineering, University of Nevada, Reno, 89557, Nevada, USA
| | - Ha Nguyen
- Department of Computer Science and Engineering, University of Nevada, Reno, 89557, Nevada, USA
| | - Alireza Tavakkoli
- Department of Computer Science and Engineering, University of Nevada, Reno, 89557, Nevada, USA
| | - Tin Nguyen
- Corresponding author: Tin Nguyen, Department of Computer Science and Engineering, University of Nevada, Reno, NV, USA. Tel.: +1-775-784-6619
|
17
|
Wang R, Wang C, Ma H. Detecting protein complexes with multiple properties by an adaptive harmony search algorithm. BMC Bioinformatics 2022; 23:414. [PMID: 36207692 PMCID: PMC9541083 DOI: 10.1186/s12859-022-04923-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Accepted: 09/12/2022] [Indexed: 11/27/2022] Open
Abstract
Background
Accurate identification of protein complexes in protein-protein interaction (PPI) networks is crucial for understanding the principles of cellular organization. Most computational methods ignore the fact that proteins in a protein complex have a functional similarity and are co-localized and co-expressed at the same place and time, respectively. Meanwhile, the parameters of the current methods are specified by users, so these methods cannot effectively deal with different input PPI networks.
Results
To address these issues, this study proposes a new method called MP-AHSA to detect protein complexes with Multiple Properties (MP), and an Adaptation Harmony Search Algorithm is developed to optimize the parameters of the MP algorithm. First, a weighted PPI network is constructed using functional annotations, and multiple biological properties and the Markov cluster algorithm (MCL) are used to mine protein complex cores. Then, a fitness function is defined, and a protein complex forming strategy is designed to detect attachment proteins and form protein complexes. Next, a protein complex filtering strategy is formulated to filter out the protein complexes. Finally, an adaptation harmony search algorithm is developed to determine the MP algorithm's parameters automatically.
Conclusions
Experimental results show that the proposed MP-AHSA method outperforms 14 state-of-the-art methods for identifying protein complexes. Also, the functional enrichment analyses reveal that the protein complexes identified by the MP-AHSA algorithm have significant biological relevance.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12859-022-04923-4.
Affiliation(s)
- Rongquan Wang
- School of Computer and Communication Engineering, University of Science and Technology Beijing, No. 30 Xueyuan Road, Haidian District, Beijing, 100083, China
| | - Caixia Wang
- School of International Economics, China Foreign Affairs University, 24 Zhanlanguan Road, Xicheng District, Beijing, 100037, China
| | - Huimin Ma
- School of Computer and Communication Engineering, University of Science and Technology Beijing, No. 30 Xueyuan Road, Haidian District, Beijing, 100083, China.
|
18
|
Colombelli F, Kowalski TW, Recamonde-Mendoza M. A hybrid ensemble feature selection design for candidate biomarkers discovery from transcriptome profiles. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
|
19
|
Dursun C, Kwitek AE, Bozdag S. PhenoGeneRanker: Gene and Phenotype Prioritization Using Multiplex Heterogeneous Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2950-2962. [PMID: 34283720 PMCID: PMC9704494 DOI: 10.1109/tcbb.2021.3098278] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]
Abstract
Uncovering genotype-phenotype relationships is a fundamental challenge in genomics. Gene prioritization is an important step in this endeavor: it reduces a list of thousands of genes from high-throughput studies to a short, manageable list. Network propagation methods are promising, state-of-the-art methods for gene prioritization, based on the premise that functionally related genes tend to be close to each other in biological networks. Recently, we introduced PhenoGeneRanker, a network-propagation algorithm for multiplex heterogeneous networks. PhenoGeneRanker allows multi-layer gene and phenotype networks. It also calculates empirical p values of gene and phenotype ranks using random stratified sampling of seeds of genes and phenotypes based on their connectivity degree in the network. In this study, we introduce the PhenoGeneRanker Bioconductor package and its application to multi-omics rat genome datasets to rank hypertension disease-related genes and strains. We showed that PhenoGeneRanker ranked hypertension disease-related genes better using multiplex gene networks than using aggregated gene networks. We also showed that PhenoGeneRanker ranked hypertension disease-related strains better using a multiplex phenotype network than using single or aggregated phenotype networks. We performed a rigorous hyperparameter analysis and, finally, showed that Gene Ontology (GO) enrichment of statistically significant top-ranked genes resulted in hypertension disease-related GO terms.
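Network propagation of the kind this abstract describes is commonly implemented as a random walk with restart (RWR). The following minimal sketch runs RWR on a single small undirected network; it is a generic illustration, not PhenoGeneRanker's multiplex heterogeneous algorithm or its Bioconductor API. The function name, the restart probability and the four-gene toy network are assumptions made for the example.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-12, max_iter=10_000):
    """Iterate p <- (1 - r) * W p + r * p0 until convergence.

    adjacency: square list-of-lists of non-negative edge weights.
    seeds: indices of seed nodes (e.g. known disease genes).
    W is the column-normalised adjacency matrix, so each step spreads a
    node's score evenly (by weight) over its neighbours.
    """
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)  # restart mass is split evenly over seeds
    p = p0[:]
    for _ in range(max_iter):
        nxt = []
        for i in range(n):
            flow = sum(
                adjacency[i][j] / col_sums[j] * p[j]
                for j in range(n)
                if col_sums[j] > 0
            )
            nxt.append((1 - restart) * flow + restart * p0[i])
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


if __name__ == "__main__":
    # Hypothetical path network: gene0 - gene1 - gene2 - gene3, seeded at gene0.
    path = [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
    scores = random_walk_with_restart(path, seeds=[0])
    for gene, score in enumerate(scores):
        print(f"gene{gene}: {score:.4f}")
```

Genes closer to the seed receive higher steady-state scores, which is the premise behind ranking candidates by proximity to known disease genes.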
|
20
|
Li W, Zhang H, Li M, Han M, Yin Y. MGEGFP: a multi-view graph embedding method for gene function prediction based on adaptive estimation with GCN. Brief Bioinform 2022; 23:6659744. [PMID: 35947989 DOI: 10.1093/bib/bbac333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Revised: 07/02/2022] [Accepted: 07/21/2022] [Indexed: 11/14/2022] Open
Abstract
In recent years, a number of computational approaches have been proposed to effectively integrate multiple heterogeneous biological networks, and have shown impressive performance for inferring gene function. However, the previous methods do not fully represent the critical neighborhood relationship between genes during the feature learning process. Furthermore, it is difficult to accurately estimate the contributions of different views for multi-view integration. In this paper, we propose MGEGFP, a multi-view graph embedding method based on adaptive estimation with Graph Convolutional Network (GCN), to learn high-quality gene representations among multiple interaction networks for function prediction. First, we design a dual-channel GCN encoder to disentangle the view-specific information and the consensus pattern across diverse networks. With the aid of disentangled representations, we develop a multi-gate module to adaptively estimate the contributions of different views during each reconstruction process and make full use of the multiplexity advantages, where a diversity preservation constraint is designed to prevent the over-fitting problem. To validate the effectiveness of our model, we conduct experiments on networks from the STRING database for both yeast and human datasets, and compare the performance with seven state-of-the-art methods in five evaluation metrics. Moreover, the ablation study demonstrates the important contribution of the designed dual-channel encoder, multi-gate module and the diversity preservation constraint in MGEGFP. The experimental results confirm the superiority of our proposed method and suggest that MGEGFP can be a useful tool for gene function prediction.
Affiliation(s)
- Wei Li
- College of Artificial Intelligence, Nankai University, Tongyan Road, 300350, Tianjin, China
| | - Han Zhang
- College of Artificial Intelligence, Nankai University, Tongyan Road, 300350, Tianjin, China
| | - Minghe Li
- College of Artificial Intelligence, Nankai University, Tongyan Road, 300350, Tianjin, China
| | - Mingjing Han
- College of Artificial Intelligence, Nankai University, Tongyan Road, 300350, Tianjin, China
| | - Yanbin Yin
- Department of Food Science and Technology, University of Nebraska - Lincoln, 1400 R Street, 68588, Nebraska, USA
|
21
|
Du L, Liu C, Wei R, Chen J. Uncertainty-aware dynamic integration for multi-omics classification of tumors. J Cancer Res Clin Oncol 2022:10.1007/s00432-022-04219-3. [PMID: 35925427 DOI: 10.1007/s00432-022-04219-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/16/2022] [Accepted: 07/18/2022] [Indexed: 12/12/2022]
Abstract
Purpose
Omics data are crucial for medical diagnosis as they contain intrinsic biomedical information. Multi-omics integrated analysis has become a new direction for scientists to explore life mechanisms. Nevertheless, the quality of complex omics data often varies greatly across samples and even across omics types, so it is challenging to dynamically capture the uncertainty for different kinds of omics data.
Methods
This paper proposes an uncertainty-aware dynamic integration framework for multi-omics classification. The framework consists of three modules: deep embedding, confidence prediction, and downstream tasks. The deep embedding module extracts key information from multi-omics data to obtain a low-dimensional feature representation, which is used to train downstream tasks. Combined with the deep embedding module, the confidence prediction module is used to dynamically capture the uncertainty of the data. We introduce "confidNet" to assign a confidence value to each type of omics data, which is used for dynamic integration between multi-omics.
Results
Compared with other integration methods, the proposed method retains more crucial biomedical information in the obtained low-dimensional representation. Our framework realizes reliable integration among multiple omics, and it can still achieve high accuracy on small sample data sets. We have verified the effectiveness of the model in a large number of experiments.
Conclusion
Our framework can be widely applied to high-dimensional omics data and has great potential to facilitate medical decision-making and biological analysis.
Affiliation(s)
- Ling Du
- School of Software, TianGong University, Tianjin, China.
| | - Chaoyi Liu
- School of Software, TianGong University, Tianjin, China
| | - Ran Wei
- School of Life Sciences, TianGong University, Tianjin, China
| | - Jinmiao Chen
- Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), 1386481, Singapore, Singapore
- Immunology Translational Research Program, Yong Loo Lin School of Medicine, Department of Microbiology and Immunology, National University of Singapore (NUS), 117545, Singapore, Singapore
|
22
|
Liang B, Sun G, Zhang X, Nie Q, Zhao Y, Yang J. Recent Advances, Challenges and Metabolic Engineering Strategies in the Biosynthesis of 3-Hydroxypropionic Acid. Biotechnol Bioeng 2022; 119:2639-2668. [PMID: 35781640 DOI: 10.1002/bit.28170] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 03/31/2021] [Revised: 04/26/2022] [Accepted: 06/29/2022] [Indexed: 11/07/2022]
Abstract
As an attractive and valuable platform chemical, 3-hydroxypropionic acid (3-HP) can be used to produce a variety of industrially important commodity chemicals and biodegradable polymers. Moreover, the biosynthesis of 3-HP has drawn much attention in recent years due to its sustainability and environmental friendliness. Here, we focus on recent advances, challenges and metabolic engineering strategies in the biosynthesis of 3-HP. While glucose and glycerol are the major carbon sources for the production of 3-HP via microbial fermentation, other carbon sources have also been explored. To increase yield and titer, synthetic biology and metabolic engineering strategies have been explored, including modifying pathway enzymes, eliminating flux blockages due to byproduct synthesis, eliminating toxic byproducts, and optimizing via genome-scale models. This review also provides insights on future directions for 3-HP biosynthesis.
Affiliation(s)
- Bo Liang
- Energy-rich Compounds Production by Photosynthetic Carbon Fixation Research Center, Qingdao Agricultural University, Qingdao, China
- Shandong Key Lab of Applied Mycology, College of Life Sciences, Qingdao Agricultural University, Qingdao, China
| | - Guannan Sun
- Energy-rich Compounds Production by Photosynthetic Carbon Fixation Research Center, Qingdao Agricultural University, Qingdao, China
- Shandong Key Lab of Applied Mycology, College of Life Sciences, Qingdao Agricultural University, Qingdao, China
| | - Xinping Zhang
- Energy-rich Compounds Production by Photosynthetic Carbon Fixation Research Center, Qingdao Agricultural University, Qingdao, China
- Shandong Key Lab of Applied Mycology, College of Life Sciences, Qingdao Agricultural University, Qingdao, China
| | - Qingjuan Nie
- Foreign Languages School, Qingdao Agricultural University, Qingdao, China
| | - Yukun Zhao
- Pony Testing International Group, Qingdao, China
| | - Jianming Yang
- Energy-rich Compounds Production by Photosynthetic Carbon Fixation Research Center, Qingdao Agricultural University, Qingdao, China
- Shandong Key Lab of Applied Mycology, College of Life Sciences, Qingdao Agricultural University, Qingdao, China
|
23
|
A novel liver cancer diagnosis method based on patient similarity network and DenseGCN. Sci Rep 2022; 12:6797. [PMID: 35474072 PMCID: PMC9043215 DOI: 10.1038/s41598-022-10441-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/14/2021] [Accepted: 04/05/2022] [Indexed: 11/17/2022] Open
Abstract
Liver cancer is the main malignancy in terms of mortality rate, and accurate diagnosis can improve its treatment outcome. A patient similarity network provides important information for cancer diagnosis. However, recent works rarely take patient similarity into consideration. To address this issue, we constructed a patient similarity network using three types of liver cancer omics data, and proposed a novel liver cancer diagnosis method consisting of similarity network fusion, a denoising autoencoder and a dense graph convolutional neural network to capitalize on the patient similarity network and multi-omics data. We compared our proposed method with other state-of-the-art methods and machine learning methods on the TCGA-LIHC dataset to evaluate its performance. The results confirmed that our proposed method surpasses these comparison methods on all metrics. In particular, our proposed method attained an accuracy of up to 0.9857.
|
24
|
Rahman MA, Tutul AA, Abdullah SM, Bayzid MS. CHAPAO: Likelihood and hierarchical reference-based representation of biomolecular sequences and applications to compressing multiple sequence alignments. PLoS One 2022; 17:e0265360. [PMID: 35436292 PMCID: PMC9015123 DOI: 10.1371/journal.pone.0265360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2021] [Accepted: 02/28/2022] [Indexed: 11/18/2022] Open
Abstract
Background
High-throughput experimental technologies are generating tremendous amounts of genomic data, offering valuable resources to answer important questions and extract biological insights. Storing this sheer amount of genomic data has become a major concern in bioinformatics. General purpose compression techniques (e.g. gzip, bzip2, 7-zip) are being widely used due to their pervasiveness and relatively good speed. However, they are not customized for genomic data and may fail to leverage special characteristics and redundancy of the biomolecular sequences.
Results
We present a new lossless compression method CHAPAO (COmpressing Alignments using Hierarchical and Probabilistic Approach), which is especially designed for multiple sequence alignments (MSAs) of biomolecular data and offers very good compression gain. We have introduced a novel hierarchical referencing technique to represent biomolecular sequences which combines likelihood based analyses of the sequence similarities and graph theoretic algorithms. We performed an extensive evaluation study using a collection of real biological data from the avian phylogenomics project, 1000 plants project (1KP), and 16S and 23S rRNA datasets. We report the performance of CHAPAO in comparison with general purpose compression techniques as well as with MFCompress and Nucleotide Archival Format (NAF)—two of the best known methods especially designed for FASTA files. Experimental results suggest that CHAPAO offers significant improvements in compression gain over most other alternative methods. CHAPAO is freely available as an open source software at https://github.com/ashiq24/CHAPAO.
Conclusion
CHAPAO advances the state-of-the-art in compression algorithms and represents a potential alternative to the general purpose compression techniques as well as to the existing specialized compression techniques for biomolecular sequences.
Affiliation(s)
- Md Ashiqur Rahman
- Department of Computer Science and Engineering/Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
| | - Abdullah Aman Tutul
- Department of Computer Science and Engineering/Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
| | - Sifat Muhammad Abdullah
- Department of Computer Science and Engineering/Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
| | - Md. Shamsuzzoha Bayzid
- Department of Computer Science and Engineering/Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
|
25
|
Zenere A, Rundquist O, Gustafsson M, Altafini C. Multi-omics protein-coding units as massively parallel Bayesian networks: empirical validation of causality structure. iScience 2022; 25:104048. [PMID: 35355520 PMCID: PMC8958332 DOI: 10.1016/j.isci.2022.104048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2021] [Revised: 01/17/2022] [Accepted: 03/08/2022] [Indexed: 11/29/2022] Open
Abstract
In this article we use high-throughput epigenomics, transcriptomics, and proteomics data to construct fine-graded models of the "protein-coding units" gathering all transcript isoforms and chromatin accessibility peaks associated with more than 4000 genes in humans. Each protein-coding unit has the structure of a directed acyclic graph (DAG) and can be represented as a Bayesian network. The factorization of the joint probability distribution induced by the DAGs imposes a number of conditional independence relationships among the variables forming a protein-coding unit, corresponding to the missing edges in the DAGs. We show that a large fraction of these conditional independencies are indeed verified by the data. Factors driving this verification appear to be the structural and functional annotation of the transcript isoforms, as well as a notion of structural balance (or frustration-free) of the corresponding sample correlation graph, which naturally leads to reduction of correlation (and hence to independence) upon conditioning.
- Protein coding unit: DAG associated with epigenetic and gene information of a protein
- DAGs correspond to Bayesian networks
- Edge absence on a DAG corresponds to conditional independence
- Multi-omics data (ATAC-seq, RNA-seq and mass-spec) are used for DAG validation
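The link between DAG factorization and conditional independence can be made explicit with a small worked example. The general factorization is the standard one for Bayesian networks; the three-node chain (accessibility peak $A$, transcript $T$, protein $Y$) is a hypothetical illustration and not one of the paper's protein-coding units.

```latex
% Standard Bayesian-network factorization over a DAG with nodes X_1, ..., X_n:
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)

% Hypothetical chain A \to T \to Y (no edge from A to Y):
P(A, T, Y) = P(A)\, P(T \mid A)\, P(Y \mid T)

% Conditioning on T, the missing edge A \to Y gives conditional independence:
P(A, Y \mid T) = \frac{P(A)\, P(T \mid A)\, P(Y \mid T)}{P(T)}
              = P(A \mid T)\, P(Y \mid T)
\quad\Longrightarrow\quad A \perp Y \mid T
```

Testing whether such an implied independence holds in measured data (for instance, whether the partial correlation of $A$ and $Y$ given $T$ is close to zero) is one way a missing edge can be checked empirically.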
|
26
|
Martínez-García M, Hernández-Lemus E. Data Integration Challenges for Machine Learning in Precision Medicine. Front Med (Lausanne) 2022; 8:784455. [PMID: 35145977 PMCID: PMC8821900 DOI: 10.3389/fmed.2021.784455] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 09/27/2021] [Accepted: 12/28/2021] [Indexed: 12/19/2022] Open
Abstract
A main goal of Precision Medicine is to incorporate and integrate the vast corpora held in different databases about the molecular and environmental origins of disease into analytic frameworks, allowing the development of individualized, context-dependent diagnostics and therapeutic approaches. In this regard, artificial intelligence and machine learning approaches can be used to build analytical models of complex disease aimed at prediction of personalized health conditions and outcomes. Such models must handle the wide heterogeneity of individuals in both their genetic predisposition and their social and environmental determinants. Computational approaches to medicine need to be able to efficiently manage, visualize, and integrate large datasets combining structured and unstructured formats. This needs to be done while constrained by different levels of confidentiality, ideally within a unified analytical architecture. Efficient data integration and management is key to the successful application of computational intelligence approaches to medicine. A number of challenges arise in designing successful approaches to medical data analytics under the currently demanding performance conditions of personalized medicine, while also subject to time, computational power, and bioethical constraints. Here, we will review some of these constraints and discuss possible avenues to overcome current challenges.
Affiliation(s)
- Mireya Martínez-García
- Clinical Research Division, National Institute of Cardiology ‘Ignacio Chávez’, Mexico City, Mexico
| | - Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine (INMEGEN), Mexico City, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
| |
|
27
|
Transcription Factor Activation Profiles (TFAP) identify compounds promoting differentiation of Acute Myeloid Leukemia cell lines. Cell Death Dis 2022; 8:16. [PMID: 35013135 PMCID: PMC8748454 DOI: 10.1038/s41420-021-00811-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2021] [Revised: 11/22/2021] [Accepted: 12/13/2021] [Indexed: 11/26/2022]
Abstract
Repurposing of drugs for new therapeutic use has received considerable attention for its potential to limit time and cost of drug development. Here we present a new strategy to identify chemicals that are likely to promote a desired phenotype. We used data from the Connectivity Map (CMap) to produce a ranked list of drugs according to their potential to activate transcription factors that mediate myeloid differentiation of leukemic progenitor cells. To validate our strategy, we tested the in vitro differentiation potential of candidate compounds using the HL-60 human cell line as a myeloid differentiation model. Ten out of 22 compounds, which were ranked high in the inferred list, were confirmed to promote significant differentiation of HL-60. These compounds may be considered candidates for differentiation therapy. The method that we have developed is versatile and can be adapted to different drug repurposing projects.
|
28
|
Begum N, Harzandi A, Lee S, Uhlen M, Moyes DL, Shoaie S. Host-mycobiome metabolic interactions in health and disease. Gut Microbes 2022; 14:2121576. [PMID: 36151873 PMCID: PMC9519009 DOI: 10.1080/19490976.2022.2121576] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 07/04/2022] [Revised: 08/31/2022] [Accepted: 08/31/2022] [Indexed: 02/04/2023] Open
Abstract
Fungal communities (mycobiome) have an important role in sustaining the resilience of complex microbial communities and maintenance of homeostasis. The mycobiome remains relatively unexplored compared to the bacteriome despite increasing evidence highlighting its contribution to host-microbiome interactions in health and disease. Despite being a small proportion of the total species, fungi constitute a large proportion of the biomass within the human microbiome and thus serve as a potential target for metabolic reprogramming in pathogenesis and disease mechanism. Metabolites produced by fungi shape host niches and induce immune tolerance, and changes in their levels precede changes associated with metabolic diseases and cancer. Given the complexity of microbial interactions, studying the metabolic interplay of the mycobiome with both host and microbiome is a demanding but crucial task. However, genome-scale modelling and synthetic biology can provide an integrative platform that allows elucidation of the multifaceted interactions between mycobiome, microbiome and host. The inferences gained from understanding mycobiome interplay with other organisms can delineate the key role of the mycobiome in pathophysiology and reveal its role in human disease.
Affiliation(s)
- Neelu Begum
- Centre for Host-Microbiome Interactions, Faculty of Dentistry, Oral & Craniofacial Sciences, King’s College London, London, UK
| | - Azadeh Harzandi
- Centre for Host-Microbiome Interactions, Faculty of Dentistry, Oral & Craniofacial Sciences, King’s College London, London, UK
| | - Sunjae Lee
- Centre for Host-Microbiome Interactions, Faculty of Dentistry, Oral & Craniofacial Sciences, King’s College London, London, UK
| | - Mathias Uhlen
- Science for Life Laboratory, KTH–Royal Institute of Technology, Stockholm, Sweden
| | - David L. Moyes
- Centre for Host-Microbiome Interactions, Faculty of Dentistry, Oral & Craniofacial Sciences, King’s College London, London, UK
| | - Saeed Shoaie
- Centre for Host-Microbiome Interactions, Faculty of Dentistry, Oral & Craniofacial Sciences, King’s College London, London, UK
- Science for Life Laboratory, KTH–Royal Institute of Technology, Stockholm, Sweden
| |
|
29
|
Ekim B, Berger B, Chikhi R. Minimizer-space de Bruijn graphs: Whole-genome assembly of long reads in minutes on a personal computer. Cell Syst 2021; 12:958-968.e6. [PMID: 34525345 PMCID: PMC8562525 DOI: 10.1016/j.cels.2021.08.009] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 05/21/2021] [Revised: 08/01/2021] [Accepted: 08/19/2021] [Indexed: 10/20/2022]
Abstract
DNA sequencing data continue to progress toward longer reads with increasingly lower sequencing error rates. Here, we define an algorithmic approach, mdBG, that makes use of minimizer-space de Bruijn graphs to enable long-read genome assembly. mdBG achieves orders-of-magnitude improvement in both speed and memory usage over existing methods without compromising accuracy. A human genome is assembled in under 10 min using 8 cores and 10 GB RAM, and 60 Gbp of metagenome reads are assembled in 4 min using 1 GB RAM. In addition, we constructed a minimizer-space de Bruijn graph-based representation of 661,405 bacterial genomes, comprising 16 million nodes and 45 million edges, and successfully searched it for anti-microbial resistance (AMR) genes in 12 min. We expect our advances to be essential to sequence analysis, given the rise of long-read sequencing in genomics, metagenomics, and pangenomics. Code for constructing mdBGs is freely available for download at https://github.com/ekimb/rust-mdbg/.
Affiliation(s)
- Barış Ekim
- Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA; Department of Mathematics, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA; Department of Mathematics, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA.
| | - Rayan Chikhi
- Department of Computational Biology, Institut Pasteur, Paris 75015, France.
| |
|
30
|
Artiles O, Saeed F. TurboBC: A Memory Efficient and Scalable GPU Based Betweenness Centrality Algorithm in the Language of Linear Algebra. PROCEEDINGS OF THE ... ICPP WORKSHOPS ON. INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS 2021; 2021:10. [PMID: 35440894 PMCID: PMC9015014 DOI: 10.1145/3458744.3474047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Betweenness centrality (BC) is a shortest path centrality metric used to measure the influence of individual vertices or edges on huge graphs that are used for modeling and analysis of human brain, omics data, or social networks. The application of the BC algorithm to modern graphs must deal with the size of the graphs, as well as with highly irregular data-access patterns. These challenges are particularly important when the BC algorithm is implemented on Graphics Processing Units (GPU), due to the limited global memory of these processors, as well as the decrease in performance due to the load imbalance resulting from processing irregular data structures. In this paper, we present the first GPU based linear-algebraic formulation and implementation of BC, called TurboBC, a set of memory efficient BC algorithms that exhibits good performance and high scalability on unweighted, undirected or directed sparse graphs of arbitrary structure. Our experiments demonstrate that our TurboBC algorithms obtain more than 18 GTEPs and an average speedup of 31.9x over the sequential version of the BC algorithm, and are on average 1.7x and 2.2x faster than the state-of-the-art algorithms implemented on the high performance, GPU-based, gunrock, and CPU-based, ligra libraries, respectively. These experiments also show that by minimizing their memory footprint, the TurboBC algorithms are able to compute the BC of relatively big graphs, for which the gunrock algorithms ran out of memory.
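For readers unfamiliar with the metric, the sketch below shows a sequential betweenness centrality computation for unweighted graphs in the style of Brandes' algorithm. It is not TurboBC and not the authors' code: the function name `betweenness` and the adjacency-dict input format are assumptions made for illustration, and scores are left unnormalized (each unordered pair in an undirected graph is counted twice).

```python
from collections import deque


def betweenness(adj: dict[int, list[int]]) -> dict[int, float]:
    """Unnormalized vertex betweenness for an unweighted graph (hypothetical helper)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Forward phase: BFS from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []  # vertices in non-decreasing distance from s
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate dependencies from the farthest vertices inward.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1]}
    print(betweenness(path))
```

The irregular, data-dependent traversal in the forward phase is the kind of access pattern the abstract identifies as difficult to map onto GPUs.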
Affiliation(s)
- Oswaldo Artiles
- School of Computing and Information Sciences, Florida International University, Miami, Florida, USA
| | - Fahad Saeed
- School of Computing and Information Sciences, Florida International University, Miami, Florida, USA
| |
|
31
|
Seiler E, Mehringer S, Darvish M, Turc E, Reinert K. Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences. iScience 2021; 24:102782. [PMID: 34337360 PMCID: PMC8313605 DOI: 10.1016/j.isci.2021.102782] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/01/2021] [Revised: 06/07/2021] [Accepted: 06/21/2021] [Indexed: 12/20/2022] Open
Abstract
We present Raptor, a system for approximately searching many queries such as next-generation sequencing reads or transcripts in large collections of nucleotide sequences. Raptor uses winnowing minimizers to define a set of representative k-mers, an extension of the interleaved Bloom filters (IBFs) as a set membership data structure and probabilistic thresholding for minimizers. Our approach allows compression and partitioning of the IBF to enable the effective use of secondary memory. We test and show the performance and limitations of the new features using simulated and real datasets. Our data structure can be used to accelerate various core bioinformatics applications. We show this by re-implementing the distributed read mapping tool DREAM-Yara.
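The idea of winnowing minimizers mentioned in this abstract can be sketched in a few lines: slide a window over consecutive k-mers and keep the smallest k-mer of each window as its representative. The sketch below is illustrative only and is not Raptor's implementation; the function name `minimizers` is hypothetical, and plain lexicographic order stands in for the hash-based ordering that production tools typically use.

```python
def minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) minimizers, one per window of w consecutive k-mers.

    Ordering is lexicographic (an assumption for illustration); ties are
    broken by taking the leftmost k-mer in the window.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked: list[tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: (window[j], j))
        hit = (start + offset, window[offset])
        # Adjacent windows often share a minimizer; record it only once.
        if not picked or picked[-1] != hit:
            picked.append(hit)
    return picked


if __name__ == "__main__":
    print(minimizers("ACGTACGT", k=3, w=2))
```

Because neighbouring windows frequently select the same k-mer, the set of minimizers is much smaller than the set of all k-mers, which is what makes it attractive as input to a set-membership structure such as a Bloom filter.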
Affiliation(s)
- Enrico Seiler
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Efficient Algorithms for Omics Data, Max Planck Institute for Molecular Genetics, Berlin, Germany
| | - Svenja Mehringer
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
| | - Mitra Darvish
- Efficient Algorithms for Omics Data, Max Planck Institute for Molecular Genetics, Berlin, Germany
| | | | - Knut Reinert
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
| |
|
32
|
Cedeño DL, Kelley CA, Chakravarthy K, Vallejo R. Modulation of Glia-Mediated Processes by Spinal Cord Stimulation in Animal Models of Neuropathic Pain. FRONTIERS IN PAIN RESEARCH 2021; 2:702906. [PMID: 35295479 PMCID: PMC8915735 DOI: 10.3389/fpain.2021.702906] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/30/2021] [Accepted: 05/31/2021] [Indexed: 12/23/2022] Open
Abstract
Glial cells play an essential role in maintaining the proper functioning of the nervous system. They are more abundant than neurons in most neural tissues and provide metabolic and catabolic regulation, maintaining the homeostatic balance at the synapse. Chronic pain is generated and sustained by the disruption of glia-mediated processes in the central nervous system resulting in unbalanced neuron–glial interactions. Animal models of neuropathic pain have been used to demonstrate that changes in immune and neuroinflammatory processes occur in the course of pain chronification. Spinal cord stimulation (SCS) is an electrical neuromodulation therapy proven safe and effective for treating intractable chronic pain. Traditional SCS therapies were developed based on the gate control theory of pain and rely on stimulating large Aβ neurons to induce paresthesia in the painful dermatome intended to mask nociceptive input carried out by small sensory neurons. A paradigm shift was introduced with SCS treatments that do not require paresthesia to provide effective pain relief. Efforts to understand the mechanism of action of SCS have considered the role of glial cells and the effect of electrical parameters on neuron–glial interactions. Recent work has provided evidence that SCS affects expression levels of glia-related genes and proteins. This inspired the development of a differential target multiplexed programming (DTMP) approach using electrical signals that can rebalance neuroglial interactions by targeting neurons and glial cells differentially. Our group pioneered the utilization of transcriptomic and proteomic analyses to identify the mechanism of action by which SCS works, emphasizing the DTMP approach. This is an account of evidence demonstrating the effect of SCS on glia-mediated processes using neuropathic pain models, emphasizing studies that rely on the evaluation of large sets of genes and proteins. 
We show that SCS using a DTMP approach strongly affects the expression of neuron and glia-specific transcriptomes while modulating them toward expression levels of healthy animals. The ability of DTMP to modulate key genes and proteins involved in glia-mediated processes affected by pain toward levels found in uninjured animals demonstrates a shift in the neuron–glial environment promoting analgesia.
Affiliation(s)
- David L. Cedeño
- Research and Development, Lumbrera LLC, Bloomington, IL, United States
- Department of Psychology, Illinois Wesleyan University, Bloomington, IL, United States
- *Correspondence: David L. Cedeño
| | - Courtney A. Kelley
- Department of Psychology, Illinois Wesleyan University, Bloomington, IL, United States
| | - Krishnan Chakravarthy
- Department of Anesthesiology and Pain Medicine, University of California, San Diego, La Jolla, CA, United States
| | - Ricardo Vallejo
- Research and Development, Lumbrera LLC, Bloomington, IL, United States
- Department of Psychology, Illinois Wesleyan University, Bloomington, IL, United States
- Research Department, National Spine and Pain Center, Bloomington, IL, United States
| |
|
33
|
Zhang X, Xing Y, Sun K, Guo Y. OmiEmbed: A Unified Multi-Task Deep Learning Framework for Multi-Omics Data. Cancers (Basel) 2021; 13:3047. [PMID: 34207255 PMCID: PMC8235477 DOI: 10.3390/cancers13123047] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 05/20/2021] [Revised: 06/12/2021] [Accepted: 06/16/2021] [Indexed: 02/06/2023] Open
Abstract
High-dimensional omics data contain intrinsic biomedical information that is crucial for personalised medicine. Nevertheless, it is challenging to capture them from the genome-wide data, due to the large number of molecular features and small number of available samples, which is also called "the curse of dimensionality" in machine learning. To tackle this problem and pave the way for machine learning-aided precision medicine, we proposed a unified multi-task deep learning framework named OmiEmbed to capture biomedical information from high-dimensional omics data with the deep embedding and downstream task modules. The deep embedding module learnt an omics embedding that mapped multiple omics data types into a latent space with lower dimensionality. Based on the new representation of multi-omics data, different downstream task modules were trained simultaneously and efficiently with the multi-task strategy to predict the comprehensive phenotype profile of each sample. OmiEmbed supports multiple tasks for omics data including dimensionality reduction, tumour type classification, multi-omics integration, demographic and clinical feature reconstruction, and survival prediction. The framework outperformed other methods on all three types of downstream tasks and achieved better performance with the multi-task strategy compared to training them individually. OmiEmbed is a powerful and unified framework that can be widely adapted to various applications of high-dimensional omics data and has great potential to facilitate more accurate and personalised clinical decision making.
Affiliation(s)
- Xiaoyu Zhang
- Data Science Institute, Imperial College London, London SW7 2AZ, UK
| | - Yuting Xing
- Data Science Institute, Imperial College London, London SW7 2AZ, UK
| | - Kai Sun
- Data Science Institute, Imperial College London, London SW7 2AZ, UK
| | - Yike Guo
- Data Science Institute, Imperial College London, London SW7 2AZ, UK
- Department of Computer Science, Hong Kong Baptist University, Hong Kong 999077, China
| |
|
34
|
Berger B, Waterman MS, Yu YW. Levenshtein Distance, Sequence Comparison and Biological Database Search. IEEE TRANSACTIONS ON INFORMATION THEORY 2021; 67:3287-3294. [PMID: 34257466 PMCID: PMC8274556 DOI: 10.1109/tit.2020.2996543] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 06/12/2023]
Abstract
Levenshtein edit distance has played a central role, both past and present, in sequence alignment in particular and biological database similarity search in general. We start our review with a history of dynamic programming algorithms for computing Levenshtein distance and sequence alignments. Next, we describe how those algorithms led to heuristics employed in the most widely used software in bioinformatics, BLAST, a program to search DNA and protein databases for evolutionarily relevant similarities. More recently, the advent of modern genomic sequencing and the volume of data it generates has resulted in a return to the problem of local alignment. We conclude with how the mathematical formulation of Levenshtein distance as a metric made possible additional optimizations to similarity search in biological contexts. These modern optimizations are built around the low metric entropy and fractional dimensionality of biological databases, enabling orders of magnitude acceleration of biological similarity search.
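The dynamic programming approach to Levenshtein distance that this review starts from can be illustrated with a minimal sketch. This is the textbook recurrence with unit costs for insertion, deletion, and substitution, written here for illustration only; the function name `levenshtein` is an assumption, and the code is not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs, keeping only one DP row in memory."""
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string so each row is as small as possible
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

Because this distance satisfies the metric axioms, in particular the triangle inequality, a search can rule out candidates using distances to a small set of reference sequences, which is the kind of optimization the abstract alludes to.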
Affiliation(s)
- Bonnie Berger
- Department of Mathematics and Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139 USA, and also with the Department of Computer Science and AI Lab, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
| | - Michael S Waterman
- Quantitative and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089 USA
| | - Yun William Yu
- Department of Mathematics, University of Toronto, Toronto, ON M5S 2E4, Canada, and also with the Department of Computer and Mathematical Sciences, University of Toronto at Scarborough, Toronto, ON M1C 1A4, Canada
| |
|
35
|
Tarazona S, Arzalluz-Luque A, Conesa A. Undisclosed, unmet and neglected challenges in multi-omics studies. NATURE COMPUTATIONAL SCIENCE 2021; 1:395-402. [PMID: 38217236 DOI: 10.1038/s43588-021-00086-z] [Citation(s) in RCA: 54] [Impact Index Per Article: 18.0] [Received: 03/18/2021] [Accepted: 05/17/2021] [Indexed: 01/15/2024]
Abstract
Multi-omics approaches have become a reality in both large genomics projects and small laboratories. However, the multi-omics research community still faces a number of issues that have either not been sufficiently discussed or for which current solutions are still limited. In this Perspective, we elaborate on these limitations and suggest points of attention for future research. We finally discuss new opportunities and challenges brought to the field by the rapid development of single-cell high-throughput molecular technologies.
Affiliation(s)
- Sonia Tarazona
- Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain
| | - Angeles Arzalluz-Luque
- Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain
| | - Ana Conesa
- Microbiology and Cell Science Department, Institute for Food and Agricultural Research, University of Florida, Gainesville, FL, USA.
- Genetics Institute, University of Florida, Gainesville, FL, USA.
- Institute for Integrative Systems Biology, Spanish National Research Council, Valencia, Spain.
| |
|
36
|
Ghafarpour V, Khansari M, Banaei-Moghaddam AM, Najafi A, Masoudi-Nejad A. DNA methylation association with stage progression of head and neck squamous cell carcinoma. Comput Biol Med 2021; 134:104473. [PMID: 34034219 DOI: 10.1016/j.compbiomed.2021.104473] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/28/2021] [Revised: 04/27/2021] [Accepted: 05/02/2021] [Indexed: 01/13/2023]
Abstract
Head and Neck Squamous Cell Carcinoma (HNSCC) is the sixth most common cancer worldwide, which accounts for approximately 6% of all cases and is responsible for an estimated 2% of all cancer deaths. Despite progress in the treatment of squamous cell carcinomas, survival rates remain low. It is a fact that epigenetic modifications have numerous associations with biological processes and complex diseases such as cancer. Hence, a more systematic approach is needed to provide potential screening targets and have an effective therapy method. This study developed a workflow to analyze HM450 methylation arrays with mRNA expression profiles that identified novel signatures of epigenetic regulators for tumor progression. We identified differentially expressed genes and differentially methylated regions and the correlation between associated genes to identify epigenetic modifications underlying regulation roles. We have taken the differentiation direction of expressions into account during the integration of gene expression and DNA methylation modification to detect epigenetic regulators of core genes of tumor-stage progression. Enrichment analysis of selected key genes provides better insight into their functionality. Thus, we have investigated gene copy number alteration and mutations to filter differentially expressed genes, including some members of the fibroblast growth factor family and cyclin-dependent kinase inhibitor family with other potential known regulators. Our analysis has revealed the list of 61 commercial methylation probes positively correlated with 31 differentially expressed genes, which can be associated with HNSC metastasis stages. Most of these genes have already reported potential epigenetic regulators, and their role in cancer progression was studied. 
We suggest these selected probes of DNA methylation as potential targets of the epigenetic regulators in revealed genes that have displayed significant genetic and epigenetic modification behavior during cancer stage progression and tumor metastasis.
Affiliation(s)
- Vahid Ghafarpour
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
| | - Mohammad Khansari
- Faculty of New Sciences and Technologies, University of Tehran, Tehran, Iran
| | - Ali M Banaei-Moghaddam
- Laboratory of Genomics and Epigenomics (LGE), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
| | - Ali Najafi
- Molecular Biology Research Center, Systems Biology and Poisonings Institute, Tehran, Iran
| | - Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran. http://lbb.ut.ac.ir/
| |
|
37
|
Joshi A, Rienks M, Theofilatos K, Mayr M. Systems biology in cardiovascular disease: a multiomics approach. Nat Rev Cardiol 2021; 18:313-330. [PMID: 33340009 DOI: 10.1038/s41569-020-00477-1] [Citation(s) in RCA: 112] [Impact Index Per Article: 37.3] [Accepted: 11/02/2020] [Indexed: 12/13/2022]
Abstract
Omics techniques generate large, multidimensional data that are amenable to analysis by new informatics approaches alongside conventional statistical methods. Systems theories, including network analysis and machine learning, are well placed for analysing these data but must be applied with an understanding of the relevant biological and computational theories. Through applying these techniques to omics data, systems biology addresses the problems posed by the complex organization of biological processes. In this Review, we describe the techniques and sources of omics data, outline network theory, and highlight exemplars of novel approaches that combine gene regulatory and co-expression networks, proteomics, metabolomics, lipidomics and phenomics with informatics techniques to provide new insights into cardiovascular disease. The use of systems approaches will become necessary to integrate data from more than one omic technique. Although understanding the interactions between different omics data requires increasingly complex concepts and methods, we argue that hypothesis-driven investigations and independent validation must still accompany these novel systems biology approaches to realize their full potential.
Affiliation(s)
- Abhishek Joshi
- King's British Heart Foundation Centre, King's College London, London, UK
- Bart's Heart Centre, St. Bartholomew's Hospital, London, UK
| | - Marieke Rienks
- King's British Heart Foundation Centre, King's College London, London, UK
| | | | - Manuel Mayr
- King's British Heart Foundation Centre, King's College London, London, UK.
| |
|
38
|
Rehder C, Bean LJH, Bick D, Chao E, Chung W, Das S, O'Daniel J, Rehm H, Shashi V, Vincent LM. Next-generation sequencing for constitutional variants in the clinical laboratory, 2021 revision: a technical standard of the American College of Medical Genetics and Genomics (ACMG). Genet Med 2021; 23:1399-1415. [PMID: 33927380 DOI: 10.1038/s41436-021-01139-4] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Received: 02/25/2021] [Revised: 02/25/2021] [Accepted: 02/26/2021] [Indexed: 12/17/2022] Open
Abstract
Next-generation sequencing (NGS) technologies are now established in clinical laboratories as a primary testing modality in genomic medicine. These technologies have reduced the cost of large-scale sequencing by several orders of magnitude. It is now cost-effective to analyze an individual with disease-targeted gene panels, exome sequencing, or genome sequencing to assist in the diagnosis of a wide array of clinical scenarios. While clinical validation and use of NGS in many settings is established, there are continuing challenges as technologies and the associated informatics evolve. To assist clinical laboratories with the validation of NGS methods and platforms, the ongoing monitoring of NGS testing to ensure quality results, and the interpretation and reporting of variants found using these technologies, the American College of Medical Genetics and Genomics (ACMG) has developed the following technical standards.
Affiliation(s)
| | - Lora J H Bean
- Department of Human Genetics, Emory University, Atlanta, GA, USA
| | - David Bick
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - Elizabeth Chao
- Division of Genetics and Genomics, Department of Pediatrics, University of California, Irvine, CA, USA
| | - Wendy Chung
- Departments of Pediatrics and Medicine, Columbia University, New York, NY, USA
| | - Soma Das
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Julianne O'Daniel
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
| | - Heidi Rehm
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Vandana Shashi
- Department of Pediatrics, Duke University, Durham, NC, USA
| | - Lisa M Vincent
- Division of Pathology & Laboratory Medicine, Children's National Health System, Washington, DC, USA; Departments of Pathology and Pediatrics, George Washington University, Washington, DC, USA
| | | |
|
39
|
Constantino CS, Carvalho AM, Vinga S. Coupling sparse Cox models with clustering of longitudinal transcriptomics data for trauma prognosis. BioData Min 2021; 14:25. [PMID: 33853663 PMCID: PMC8048345 DOI: 10.1186/s13040-021-00257-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2020] [Accepted: 03/29/2021] [Indexed: 11/18/2022] Open
Abstract
Background: Longitudinal gene expression analysis and survival modeling have proven to add valuable biological and clinical knowledge. This study proposes a novel framework to discover gene signatures and patterns in high-dimensional time series transcriptomics data and to assess their association with hospital length of stay.
Methods: We investigated a longitudinal and high-dimensional gene expression dataset from 168 blunt-force trauma patients followed during the first 28 days after injury. To model the length of stay, an initial dimensionality reduction step was performed by applying Cox regression with elastic net regularization using gene expression data from the first hospitalization days. Also, a novel methodology to impute missing values for the previously selected genes was proposed. We then applied multivariate time series (MTS) clustering to analyse gene expression over time and to stratify patients with similar trajectories. The validation of the patients’ partitions obtained by MTS clustering was performed using Kaplan-Meier curves and log-rank tests.
Results: We were able to unravel 22 genes strongly associated with hospital discharge. Their expression values in the first days after trauma proved to be good predictors of the length of stay. The proposed mixed imputation method yielded a complete dataset of short time series with a minimum loss of information for the 28 days of follow-up. MTS clustering grouped patients with similar gene trajectories and, notably, with similar discharge days from the hospital. Patients within each cluster have comparable gene trajectories and may have an analogous response to injury.
Conclusion: The proposed framework was able to tackle the joint analysis of time-to-event information with longitudinal multivariate high-dimensional data. The application to length of stay and transcriptomics data revealed a strong relationship between gene expression trajectory and patients’ recovery, which may improve trauma patient management by healthcare systems. The proposed methodology can be easily adapted to other medical data, towards more effective clinical decision support systems for health applications.
Affiliation(s)
- Cláudia S Constantino
- INESC-ID, Instituto Superior Técnico, ULisboa, R. Alves Redol 9, Lisbon, 1000-029, Portugal
| | - Alexandra M Carvalho
- Instituto de Telecomunicações, Instituto Superior Técnico, ULisboa, Av. Rovisco Pais 1, Lisbon, 1049-001, Portugal
| | - Susana Vinga
- INESC-ID, Instituto Superior Técnico, ULisboa, R. Alves Redol 9, Lisbon, 1000-029, Portugal
- IDMEC, Instituto Superior Técnico, ULisboa, Av. Rovisco Pais 1, Lisbon, 1049-001, Portugal
|
40
|
Reyna MA, Chitra U, Elyanow R, Raphael BJ. NetMix: A Network-Structured Mixture Model for Reduced-Bias Estimation of Altered Subnetworks. J Comput Biol 2021; 28:469-484. [PMID: 33400606 DOI: 10.1089/cmb.2020.0435] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/30/2022]
Abstract
A classic problem in computational biology is the identification of altered subnetworks: subnetworks of an interaction network that contain genes/proteins that are differentially expressed, highly mutated, or otherwise aberrant compared with other genes/proteins. Numerous methods have been developed to solve this problem under various assumptions, but the statistical properties of these methods are often unknown. For example, some widely used methods are reported to output very large subnetworks that are difficult to interpret biologically. In this work, we formulate the identification of altered subnetworks as the problem of estimating the parameters of a class of probability distributions that we call the Altered Subset Distribution (ASD). We derive a connection between a popular method, jActiveModules, and the maximum likelihood estimator (MLE) of the ASD. We show that the MLE is statistically biased, explaining the large subnetworks output by jActiveModules. Based on these insights, we introduce NetMix, an algorithm that uses Gaussian mixture models to obtain less biased estimates of the parameters of the ASD. We demonstrate that NetMix outperforms existing methods in identifying altered subnetworks on both simulated and real data, including the identification of differentially expressed genes from both microarray and RNA-seq experiments and the identification of cancer driver genes in somatic mutation data.
Affiliation(s)
- Matthew A Reyna
- Department of Biomedical Informatics, Emory University, Atlanta, Georgia, USA
| | - Uthsav Chitra
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
| | - Rebecca Elyanow
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
- Department of Computer Science, Brown University, Providence, Rhode Island, USA
| | - Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
|
41
|
Pal S, Mondal S, Das G, Khatua S, Ghosh Z. Big data in biology: The hope and present-day challenges in it. GENE REPORTS 2020. [DOI: 10.1016/j.genrep.2020.100869] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/23/2022]
|
42
|
Abstract
In this chapter, we discuss the past, present and future of clinical biomarker development. We explore the advent of new technologies that are changing the way in which health, medicine and disease are understood. This review covers the identification of physicochemical assays, current regulations, and the development and reproducibility of clinical trials, as well as the revolution of omics technologies and state-of-the-art integration and analysis approaches.
|
43
|
Sen P, Lamichhane S, Mathema VB, McGlinchey A, Dickens AM, Khoomrung S, Orešič M. Deep learning meets metabolomics: a methodological perspective. Brief Bioinform 2020; 22:1531-1542. [PMID: 32940335 DOI: 10.1093/bib/bbaa204] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 05/04/2020] [Revised: 08/08/2020] [Accepted: 08/10/2020] [Indexed: 12/15/2022]
Abstract
Deep learning (DL), an emerging area of investigation in the fields of machine learning and artificial intelligence, has markedly advanced over the past years. DL techniques are being applied to assist medical professionals and researchers in improving clinical diagnosis, disease prediction and drug discovery. It is expected that DL will help to provide actionable knowledge from a variety of 'big data', including metabolomics data. In this review, we discuss the applicability of DL to metabolomics, while presenting and discussing several examples from recent research. We emphasize the use of DL in tackling bottlenecks in metabolomics data acquisition, processing, metabolite identification, as well as in metabolic phenotyping and biomarker discovery. Finally, we discuss how DL is used in genome-scale metabolic modelling and in interpretation of metabolomics data. The DL-based approaches discussed here may assist computational biologists with the integration, prediction and drawing of statistical inference about biological outcomes, based on metabolomics data.
Affiliation(s)
- Partho Sen
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
- School of Medical Sciences, Örebro University, 702 81 Örebro, Sweden
| | - Santosh Lamichhane
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
| | - Vivek B Mathema
- Metabolomics and Systems Biology, Department of Biochemistry, and Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
| | - Aidan McGlinchey
- School of Medical Sciences, Örebro University, 702 81 Örebro, Sweden
| | - Alex M Dickens
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
| | - Sakda Khoomrung
- Metabolomics and Systems Biology, Department of Biochemistry, and Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Center for Innovation in Chemistry (PERCH), Faculty of Science, Mahidol University, Rama 6 Road, Bangkok 10400, Thailand
| | - Matej Orešič
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
- School of Medical Sciences, Örebro University, 702 81 Örebro, Sweden
|
44
|
Vrahatis AG, Kotsireas IS, Vlamos P. Detecting Common Pathways and Key Molecules of Neurodegenerative Diseases from the Topology of Molecular Networks. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2020; 1194:409-421. [PMID: 32468556 DOI: 10.1007/978-3-030-32622-7_38] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]
Abstract
Motivation: Neurodegenerative diseases (NDs), including amyotrophic lateral sclerosis, Parkinson's disease, Alzheimer's disease, and Huntington's disease, occur as a result of neurodegenerative processes. It has been increasingly appreciated that many neurodegenerative conditions overlap at multiple levels. However, traditional clinicopathological correlation approaches to better classify a disease have met with limited success. Discovering this overlap offers hope for therapeutic advances that could ameliorate many NDs simultaneously. In parallel, over the last decade, systems biology approaches have become a reliable choice in complex disease analysis for gaining finer biological insights and have enabled the comprehension of the higher-order functions of biological systems. Results: To this end, we developed a systems biology approach for the identification of common links and pathways of NDs, based on well-established and novel topological and functional measures. For this purpose, a molecular pathway network was constructed using the molecular interactions and relations of four main neurodegenerative diseases (Alzheimer's disease, Parkinson's disease, amyotrophic lateral sclerosis, and Huntington's disease). Our analysis captured the overlapping subregions forming molecular subpathways fully enriched in these four NDs. It also identified molecules that act as bridges, hubs, and key players in neurodegeneration with respect to either their topology or their functional role. Conclusion: Understanding these common links and central topologies from the perspective of systems biology and network theory provides greater insight into the complex processes of neurodegeneration.
Affiliation(s)
| | - Ilias S Kotsireas
- Department of Physics and Computer Science, Wilfrid Laurier University, Waterloo, Canada
|
45
|
Omics biomarkers for frailty in older adults. Clin Chim Acta 2020; 510:363-372. [PMID: 32745578 DOI: 10.1016/j.cca.2020.07.057] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/09/2020] [Revised: 07/29/2020] [Accepted: 07/29/2020] [Indexed: 12/14/2022]
Abstract
Frailty is a clinical state characterized by an age-related unsteady state of the body, a decline in physiological function, and an increased vulnerability to adverse outcomes. Early diagnosis of frailty is important for improving the quality of life in older adults and promoting healthy aging. The biological mechanisms underlying frailty have been extensively studied in recent years. Combining assessment tools and biomarkers can facilitate the early diagnosis of frailty. However, there is a lack of stable and reliable frailty-related biomarkers for use in clinical practice. Advances in the multi-omics platforms have provided new information on the molecular mechanisms underlying frailty. Thus, identifying biomarkers using omics-based approaches helps explore the physiological mechanisms underlying frailty, and aids the evaluation of the risk of frailty development and progression. This article reviews the current status of frailty biomarkers from the genomics, transcriptomics, proteomics, and metabolomics perspectives.
|
46
|
Akdemir D, Knox R, Isidro y Sánchez J. Combining Partially Overlapping Multi-Omics Data in Databases Using Relationship Matrices. FRONTIERS IN PLANT SCIENCE 2020; 11:947. [PMID: 32765543 PMCID: PMC7381228 DOI: 10.3389/fpls.2020.00947] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/16/2020] [Accepted: 06/10/2020] [Indexed: 05/08/2023]
Abstract
Private and public breeding programs, as well as companies and universities, have developed different genomics technologies that have generated unprecedented amounts of sequence data, which bring new challenges in terms of data management, query, and analysis. The magnitude and complexity of these datasets also offer an opportunity to use the available data as a whole. Detailed phenotype data, combined with increasing amounts of genomic data, have enormous potential to accelerate the identification of key traits and to improve our understanding of quantitative genetics. Data harmonization enables cross-national and international comparative research, facilitating the extraction of new scientific knowledge. In this paper, we address the complex issue of combining high-dimensional and unbalanced omics data. More specifically, we propose a covariance-based method for combining partial datasets in the genotype-to-phenotype spectrum. This method can be used to combine partially overlapping relationship/covariance matrices. Here, we show with applications that our approach may have advantages over feature-imputation-based approaches; we demonstrate how this method can be used in genomic prediction with heterogeneous marker data, and how to combine the data from multiple phenotypic experiments to make inferences about previously unobserved trait relationships. Our results demonstrate that it is possible to harmonize datasets to improve available information across gene banks, data repositories, or other data resources.
Affiliation(s)
- Deniz Akdemir
- Agriculture & Food Science Centre, Animal and Crop Science Division, University College Dublin, Dublin, Ireland
| | - Ron Knox
- SCRDC-CRDSW, Swift Current Research and Developmental Centre, Swift Current, SK, Canada
| | - Julio Isidro y Sánchez
- Agriculture & Food Science Centre, Animal and Crop Science Division, University College Dublin, Dublin, Ireland
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM – INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, Madrid, Spain
|
47
|
Common problems associated with the microbial productions of aromatic compounds and corresponding metabolic engineering strategies. Biotechnol Adv 2020; 41:107548. [DOI: 10.1016/j.biotechadv.2020.107548] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 08/06/2019] [Revised: 04/06/2020] [Accepted: 04/08/2020] [Indexed: 01/06/2023]
|
48
|
Abstract
Shotgun metagenomic sequencing has revolutionized our ability to detect and characterize the diversity and function of complex microbial communities. In this review, we highlight the benefits of using metagenomics as well as the breadth of conclusions that can be made using currently available analytical tools, such as greater resolution of species and strains across phyla and functional content, while highlighting challenges of metagenomic data analysis. Major challenges remain in annotating function, given the dearth of functional databases for environmental bacteria compared to model organisms, and the technical difficulties of metagenome assembly and phasing in heterogeneous environmental samples. In the future, improvements and innovation in technology and methodology will lead to lowered costs. Data integration using multiple technological platforms will lead to a better understanding of how to harness metagenomes. Subsequently, we will be able not only to characterize complex microbiomes but also to manipulate communities to achieve prosperous outcomes for health, agriculture, and environmental sustainability.
Affiliation(s)
- Felicia N New
- Meinig School of Biomedical Engineering, Cornell University, Ithaca, New York 14853, USA
| | - Ilana L Brito
- Meinig School of Biomedical Engineering, Cornell University, Ithaca, New York 14853, USA
|
49
|
Elworth RAL, Wang Q, Kota PK, Barberan CJ, Coleman B, Balaji A, Gupta G, Baraniuk RG, Shrivastava A, Treangen T. To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics. Nucleic Acids Res 2020; 48:5217-5234. [PMID: 32338745 PMCID: PMC7261164 DOI: 10.1093/nar/gkaa265] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/18/2019] [Revised: 03/20/2020] [Accepted: 04/04/2020] [Indexed: 02/01/2023]
Abstract
As computational biologists continue to be inundated by ever increasing amounts of metagenomic data, the need for data analysis approaches that keep up with the pace of sequence archives has remained a challenge. In recent years, the accelerated pace of genomic data availability has been accompanied by the application of a wide array of highly efficient approaches from other fields to the field of metagenomics. For instance, sketching algorithms such as MinHash have seen a rapid and widespread adoption. These techniques handle increasingly large datasets with minimal sacrifices in quality for tasks such as sequence similarity calculations. Here, we briefly review the fundamentals of the most impactful probabilistic and signal processing algorithms. We also highlight more recent advances to augment previous reviews in these areas that have taken a broader approach. We then explore the application of these techniques to metagenomics, discuss their pros and cons, and speculate on their future directions.
Affiliation(s)
| | - Qi Wang
- Systems, Synthetic, and Physical Biology (SSPB) Graduate Program, Houston, TX 77005, USA
| | - Pavan K Kota
- Department of Bioengineering, Houston, TX 77005, USA
| | - C J Barberan
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
| | - Benjamin Coleman
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
| | - Advait Balaji
- Department of Computer Science, Houston, TX 77005, USA
| | - Gaurav Gupta
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
| | - Richard G Baraniuk
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
| | - Anshumali Shrivastava
- Department of Computer Science, Houston, TX 77005, USA
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
| | - Todd J Treangen
- Department of Computer Science, Houston, TX 77005, USA
- Systems, Synthetic, and Physical Biology (SSPB) Graduate Program, Houston, TX 77005, USA
|
50
|
Höllbacher B, Balázs K, Heinig M, Uhlenhaut NH. Seq-ing answers: Current data integration approaches to uncover mechanisms of transcriptional regulation. Comput Struct Biotechnol J 2020; 18:1330-1341. [PMID: 32612756 PMCID: PMC7306512 DOI: 10.1016/j.csbj.2020.05.018] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/27/2020] [Revised: 05/21/2020] [Accepted: 05/23/2020] [Indexed: 02/06/2023]
Abstract
Advancements in the field of next generation sequencing lead to the generation of ever-more data, with the challenge often being how to combine and reconcile results from different OMICs studies such as genome, epigenome and transcriptome. Here we provide an overview of the standard processing pipelines for ChIP-seq and RNA-seq as well as common downstream analyses. We describe popular multi-omics data integration approaches used to identify target genes and co-factors, and we discuss how machine learning techniques may predict transcriptional regulators and gene expression.
Affiliation(s)
- Barbara Höllbacher
- Institute for Diabetes and Cancer IDC, Helmholtz Zentrum Muenchen (HMGU) and German Center for Diabetes Research (DZD), Munich 85764, Neuherberg, Germany
- Institute of Computational Biology ICB, Helmholtz Zentrum Muenchen (HMGU) and German Center for Diabetes Research (DZD), Munich 85764, Neuherberg, Germany
- Department of Informatics, TUM, Munich 85748, Garching, Germany
| | - Kinga Balázs
- Institute for Diabetes and Cancer IDC, Helmholtz Zentrum Muenchen (HMGU) and German Center for Diabetes Research (DZD), Munich 85764, Neuherberg, Germany
| | - Matthias Heinig
- Institute of Computational Biology ICB, Helmholtz Zentrum Muenchen (HMGU) and German Center for Diabetes Research (DZD), Munich 85764, Neuherberg, Germany
- Department of Informatics, TUM, Munich 85748, Garching, Germany
| | - N Henriette Uhlenhaut
- Institute for Diabetes and Cancer IDC, Helmholtz Zentrum Muenchen (HMGU) and German Center for Diabetes Research (DZD), Munich 85764, Neuherberg, Germany
- Metabolic Programming, TUM School of Life Sciences Weihenstephan, Munich 85354, Freising, Germany
|