1
Dutta S, Mudaranthakam DP, Li Y, Sardiu ME. PerSEveML: a web-based tool to identify persistent biomarker structure for rare events using an integrative machine learning approach. Mol Omics 2024; 20:348-358. [PMID: 38690925 DOI: 10.1039/d4mo00008k] [Indexed: 05/03/2024]
Abstract
Omics data sets often pose a computational challenge due to their high dimensionality, large size, and non-linear structures. Analyzing these data sets becomes especially daunting in the presence of rare events. Machine learning (ML) methods have gained traction for analyzing rare events, yet there has been limited exploration of bioinformatics tools that integrate ML techniques to comprehend the underlying biology. Expanding upon our previously developed computational framework of an integrative machine learning approach, we introduce PerSEveML, an interactive web-based tool that uses crowd-sourced intelligence to predict rare events and determine feature selection structures. PerSEveML provides a comprehensive overview of the integrative approach through evaluation metrics that help users understand the contribution of individual ML methods to the prediction process. Additionally, PerSEveML calculates entropy and rank scores, which visually organize input features into a persistent structure of selected, unselected, and fluctuating categories that help researchers uncover meaningful hypotheses regarding the underlying biology. We have evaluated PerSEveML on three diverse biologically complex data sets with extremely rare events from small to large scale and have demonstrated its ability to generate valid hypotheses. PerSEveML is available at https://biostats-shinyr.kumc.edu/PerSEveML/ and https://github.com/sreejatadutta/PerSEveML.
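The abstract does not spell out how the entropy score is computed. As a purely illustrative sketch, one could assume that each ML method casts a 0/1 vote on whether a feature is selected, and that the binary Shannon entropy of the vote fraction separates stable features from fluctuating ones. The function names, the vote layout, and the `entropy_cutoff` value below are hypothetical and are not taken from PerSEveML.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def categorize(selection, entropy_cutoff=0.9):
    """Label each feature from its per-method selection votes.

    `selection` maps a feature name to a list of 0/1 votes, one per ML
    method (a hypothetical layout, assumed for this sketch).
    """
    result = {}
    for feature, votes in selection.items():
        p = sum(votes) / len(votes)      # fraction of methods selecting it
        h = binary_entropy(p)            # disagreement between methods
        if h >= entropy_cutoff:
            label = "fluctuating"        # methods disagree strongly
        elif p > 0.5:
            label = "selected"           # methods mostly agree: keep
        else:
            label = "unselected"         # methods mostly agree: drop
        result[feature] = (label, round(h, 3))
    return result


votes = {
    "gene_A": [1, 1, 1, 1, 1],
    "gene_B": [0, 0, 0, 0, 1],
    "gene_C": [1, 0, 1, 0, 1],
}
labels = categorize(votes)
```

Under these assumptions a feature chosen by every method has zero entropy and lands in the selected group, while a feature chosen by about half of the methods has entropy close to one bit and is flagged as fluctuating.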
Affiliation(s)
- Sreejata Dutta
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
- Dinesh Pal Mudaranthakam
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
  - University of Kansas Cancer Center, Kansas City, USA
- Yanming Li
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
  - University of Kansas Cancer Center, Kansas City, USA
- Mihaela E Sardiu
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
  - University of Kansas Cancer Center, Kansas City, USA
  - Kansas Institute for Precision Medicine, University of Kansas Medical Center, Kansas City, Kansas, USA
2
Sganzerla Martinez G, Garduno A, Toloue Ostadgavahi A, Hewins B, Dutt M, Kumar A, Martin-Loeches I, Kelvin DJ. Identification of Marker Genes in Infectious Diseases from ScRNA-seq Data Using Interpretable Machine Learning. Int J Mol Sci 2024; 25:5920. [PMID: 38892107 PMCID: PMC11172967 DOI: 10.3390/ijms25115920] [Received: 03/15/2024] [Revised: 05/24/2024] [Accepted: 05/25/2024] [Indexed: 06/21/2024] Open
Abstract
A common result of infection is an abnormal immune response, which may be detrimental to the host. To control the infection, the immune system might undergo regulation, therefore producing an excess of either pro-inflammatory or anti-inflammatory pathways that can lead to widespread inflammation, tissue damage, and organ failure. A dysregulated immune response can manifest as changes in differentiated immune cell populations and concentrations of circulating biomarkers. To propose an early diagnostic system that enables differentiation and identifies the severity of immune-dysregulated syndromes, we built an artificial intelligence tool that uses input data from single-cell RNA sequencing. In our results, single-cell transcriptomics successfully distinguished between mild and severe sepsis and COVID-19 infections. Moreover, by interpreting the decision patterns of our classification system, we identified that different immune cells upregulating or downregulating the expression of the genes CD3, CD14, CD16, FOSB, S100A12, and TCRγδ can accurately differentiate between different degrees of infection. Our research has identified genes of significance that effectively distinguish between infections, offering promising prospects as diagnostic markers and providing potential targets for therapeutic intervention.
Affiliation(s)
- Gustavo Sganzerla Martinez
  - Microbiology and Immunology, Dalhousie University, Halifax, NS B3H 4H7, Canada
  - Department of Pediatrics, Izaak Walton Killam (IWK) Health Center, Canadian Center for Vaccinology, Halifax, NS B3H 4H7, Canada
  - Department of Immunology, Shantou University Medical College, Shantou 512025, China
- Alexis Garduno
  - Department of Clinical Medicine, Trinity College Dublin, D08 NHY1 Dublin, Ireland
  - Department of Intensive Care Medicine, St. James’s Hospital, D08 NHY1 Dublin, Ireland
- Ali Toloue Ostadgavahi
  - Microbiology and Immunology, Dalhousie University, Halifax, NS B3H 4H7, Canada
  - Department of Pediatrics, Izaak Walton Killam (IWK) Health Center, Canadian Center for Vaccinology, Halifax, NS B3H 4H7, Canada
  - Department of Immunology, Shantou University Medical College, Shantou 512025, China
- Benjamin Hewins
  - Microbiology and Immunology, Dalhousie University, Halifax, NS B3H 4H7, Canada
  - Department of Pediatrics, Izaak Walton Killam (IWK) Health Center, Canadian Center for Vaccinology, Halifax, NS B3H 4H7, Canada
  - Department of Immunology, Shantou University Medical College, Shantou 512025, China
- Mansi Dutt
  - Microbiology and Immunology, Dalhousie University, Halifax, NS B3H 4H7, Canada
  - Department of Pediatrics, Izaak Walton Killam (IWK) Health Center, Canadian Center for Vaccinology, Halifax, NS B3H 4H7, Canada
  - Department of Immunology, Shantou University Medical College, Shantou 512025, China
- Anuj Kumar
  - Microbiology and Immunology, Dalhousie University, Halifax, NS B3H 4H7, Canada
  - Department of Pediatrics, Izaak Walton Killam (IWK) Health Center, Canadian Center for Vaccinology, Halifax, NS B3H 4H7, Canada
  - Department of Immunology, Shantou University Medical College, Shantou 512025, China
- Ignacio Martin-Loeches
  - Department of Clinical Medicine, Trinity College Dublin, D08 NHY1 Dublin, Ireland
  - Department of Intensive Care Medicine, St. James’s Hospital, D08 NHY1 Dublin, Ireland
  - Multidisciplinary Intensive Care Research Organization (MICRO), St. James’s Hospital, D08 NHY1 Dublin, Ireland
- David J. Kelvin
  - Microbiology and Immunology, Dalhousie University, Halifax, NS B3H 4H7, Canada
  - Department of Pediatrics, Izaak Walton Killam (IWK) Health Center, Canadian Center for Vaccinology, Halifax, NS B3H 4H7, Canada
  - Department of Immunology, Shantou University Medical College, Shantou 512025, China
3
Chu LX, Wang WJ, Gu XP, Wu P, Gao C, Zhang Q, Wu J, Jiang DW, Huang JQ, Ying XW, Shen JM, Jiang Y, Luo LH, Xu JP, Ying YB, Chen HM, Fang A, Feng ZY, An SH, Li XK, Wang ZG. Spatiotemporal multi-omics: exploring molecular landscapes in aging and regenerative medicine. Mil Med Res 2024; 11:31. [PMID: 38797843 PMCID: PMC11129507 DOI: 10.1186/s40779-024-00537-4] [Received: 12/15/2023] [Accepted: 05/07/2024] [Indexed: 05/29/2024] Open
Abstract
Aging and regeneration represent complex biological phenomena that have long captivated the scientific community. To fully comprehend these processes, it is essential to investigate molecular dynamics through a lens that encompasses both spatial and temporal dimensions. Conventional omics methodologies, such as genomics and transcriptomics, have been instrumental in identifying critical molecular facets of aging and regeneration. However, these methods are somewhat limited, constrained by their spatial resolution and their lack of capacity to dynamically represent tissue alterations. The advent of emerging spatiotemporal multi-omics approaches, encompassing transcriptomics, proteomics, metabolomics, and epigenomics, furnishes comprehensive insights into these intricate molecular dynamics. These sophisticated techniques facilitate accurate delineation of molecular patterns across an array of cells, tissues, and organs, thereby offering an in-depth understanding of the fundamental mechanisms at play. This review meticulously examines the significance of spatiotemporal multi-omics in the realms of aging and regeneration research. It underscores how these methodologies augment our comprehension of molecular dynamics, cellular interactions, and signaling pathways. Initially, the review delineates the foundational principles underpinning these methods, followed by an evaluation of their recent applications within the field. The review ultimately concludes by addressing the prevailing challenges and projecting future advancements in the field. Indubitably, spatiotemporal multi-omics are instrumental in deciphering the complexities inherent in aging and regeneration, thus charting a course toward potential therapeutic innovations.
Affiliation(s)
- Liu-Xi Chu
  - Affiliated Cixi Hospital, Wenzhou Medical University, Ningbo, 315300, Zhejiang, China
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
  - National Key Laboratory of Macromolecular Drug Development and Manufacturing, School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
- Wen-Jia Wang
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
- Xin-Pei Gu
  - School of Pharmaceutical Sciences, Guangdong Provincial Key Laboratory of New Drug Screening, Southern Medical University, Guangzhou, 510515, China
  - Department of Human Anatomy, Shandong First Medical University & Shandong Academy of Medical Sciences, Taian, 271000, Shandong, China
- Ping Wu
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
  - National Key Laboratory of Macromolecular Drug Development and Manufacturing, School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
- Chen Gao
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
- Quan Zhang
  - Integrative Muscle Biology Laboratory, Division of Regenerative and Rehabilitative Sciences, University of Tennessee Health Science Center, Memphis, TN, 38163, United States
- Jia Wu
  - Key Laboratory for Laboratory Medicine, Ministry of Education, Zhejiang Provincial Key Laboratory of Medical Genetics, School of Laboratory Medicine and Life Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
- Da-Wei Jiang
  - Affiliated Cixi Hospital, Wenzhou Medical University, Ningbo, 315300, Zhejiang, China
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
  - National Key Laboratory of Macromolecular Drug Development and Manufacturing, School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
- Jun-Qing Huang
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
  - National Key Laboratory of Macromolecular Drug Development and Manufacturing, School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
  - Key Laboratory of Imaging Diagnosis and Minimally Invasive Intervention Research, Institute of Imaging Diagnosis and Minimally Invasive Intervention Research, the Fifth Affiliated Hospital of Wenzhou Medical University, Lishui Hospital of Zhejiang University, Lishui, 323000, Zhejiang, China
- Xin-Wang Ying
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
  - National Key Laboratory of Macromolecular Drug Development and Manufacturing, School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
- Jia-Men Shen
  - National Key Laboratory of Macromolecular Drug Development and Manufacturing, School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
- Yi Jiang
  - National Key Laboratory of Macromolecular Drug Development and Manufacturing, School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
- Li-Hua Luo
  - School and Hospital of Stomatology, Wenzhou Medical University, Wenzhou, 324025, Zhejiang, China
- Jun-Peng Xu
  - Affiliated Cixi Hospital, Wenzhou Medical University, Ningbo, 315300, Zhejiang, China
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
  - National Key Laboratory of Macromolecular Drug Development and Manufacturing, School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
- Yi-Bo Ying
  - National Key Laboratory of Macromolecular Drug Development and Manufacturing, School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
- Hao-Man Chen
  - National Key Laboratory of Macromolecular Drug Development and Manufacturing, School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
- Ao Fang
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
  - National Key Laboratory of Macromolecular Drug Development and Manufacturing, School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
- Zun-Yong Feng
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
  - National Key Laboratory of Macromolecular Drug Development and Manufacturing, School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
  - Departments of Diagnostic Radiology, Surgery, Chemical and Biomolecular Engineering, and Biomedical Engineering, Yong Loo Lin School of Medicine and College of Design and Engineering, National University of Singapore, Singapore, 119074, Singapore
  - Clinical Imaging Research Centre, Centre for Translational Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117599, Singapore
  - Nanomedicine Translational Research Program, NUS Center for Nanomedicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117597, Singapore
  - Institute of Molecular and Cell Biology, Agency for Science, Technology, and Research (A*STAR), Singapore, 138673, Singapore
- Shu-Hong An
  - Department of Human Anatomy, Shandong First Medical University & Shandong Academy of Medical Sciences, Taian, 271000, Shandong, China
- Xiao-Kun Li
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
  - National Key Laboratory of Macromolecular Drug Development and Manufacturing, School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
- Zhou-Guang Wang
  - Affiliated Cixi Hospital, Wenzhou Medical University, Ningbo, 315300, Zhejiang, China
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
  - National Key Laboratory of Macromolecular Drug Development and Manufacturing, School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, 325035, Zhejiang, China
  - Key Laboratory of Imaging Diagnosis and Minimally Invasive Intervention Research, Institute of Imaging Diagnosis and Minimally Invasive Intervention Research, the Fifth Affiliated Hospital of Wenzhou Medical University, Lishui Hospital of Zhejiang University, Lishui, 323000, Zhejiang, China
4
Lam HYI, Ong XE, Mutwil M. Large language models in plant biology. Trends Plant Sci 2024:S1360-1385(24)00118-3. [PMID: 38797656 DOI: 10.1016/j.tplants.2024.04.013] [Received: 01/30/2024] [Revised: 04/29/2024] [Accepted: 04/30/2024] [Indexed: 05/29/2024]
Abstract
Large language models (LLMs), such as ChatGPT, have taken the world by storm. However, LLMs are not limited to human language and can be used to analyze sequential data, such as DNA, protein, and gene expression. The resulting foundation models can be repurposed to identify the complex patterns within the data, resulting in powerful, multipurpose prediction tools able to predict the state of cellular systems. This review outlines the different types of LLMs and showcases their recent uses in biology. Since LLMs have not yet been embraced by the plant community, we also cover how these models can be deployed for the plant kingdom.
Affiliation(s)
- Hilbert Yuen In Lam
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
- Xing Er Ong
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
- Marek Mutwil
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
5
Si Z, Li H, Shang W, Zhao Y, Kong L, Long C, Zuo Y, Feng Z. SpaNCMG: improving spatial domains identification of spatial transcriptomics using neighborhood-complementary mixed-view graph convolutional network. Brief Bioinform 2024; 25:bbae259. [PMID: 38811360 PMCID: PMC11136618 DOI: 10.1093/bib/bbae259] [Received: 02/15/2024] [Revised: 05/10/2024] [Accepted: 05/16/2024] [Indexed: 05/31/2024] Open
Abstract
The advancement of spatial transcriptomics (ST) technology contributes to a more profound comprehension of the spatial properties of gene expression within tissues. However, due to challenges of high dimensionality, pronounced noise and dynamic limitations in ST data, the integration of gene expression and spatial information to accurately identify spatial domains remains challenging. This paper proposes a SpaNCMG algorithm for the purpose of achieving precise spatial domain description and localization based on a neighborhood-complementary mixed-view graph convolutional network. The algorithm enables better adaptation to ST data at different resolutions by integrating the local information from KNN and the global structure from r-radius into a complementary neighborhood graph. It also introduces an attention mechanism to achieve adaptive fusion of different reconstructed expressions, and utilizes KPCA method for dimensionality reduction. The application of SpaNCMG on five datasets from four sequencing platforms demonstrates superior performance to eight existing advanced methods. Specifically, the algorithm achieved highest ARI accuracies of 0.63 and 0.52 on the datasets of the human dorsolateral prefrontal cortex and mouse somatosensory cortex, respectively. It accurately identified the spatial locations of marker genes in the mouse olfactory bulb tissue and inferred the biological functions of different regions. When handling larger datasets such as mouse embryos, the SpaNCMG not only identified the main tissue structures but also explored unlabeled domains. Overall, the good generalization ability and scalability of SpaNCMG make it an outstanding tool for understanding tissue structure and disease mechanisms. Our codes are available at https://github.com/ZhihaoSi/SpaNCMG.
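The idea of combining a KNN graph with an r-radius graph can be sketched in a few lines. The version below assumes the two neighborhoods are merged by taking the union of their undirected edges, which is one plausible reading of "complementary neighborhood graph" and not necessarily what SpaNCMG does; the function names and the toy spot coordinates are hypothetical.

```python
import math


def knn_edges(points, k):
    """Undirected edges linking each point to its k nearest neighbours."""
    edges = set()
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def radius_edges(points, r):
    """Undirected edges linking every pair of points at distance <= r."""
    edges = set()
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= r:
                edges.add((i, j))
    return edges


def complementary_graph(points, k, r):
    """Union of both edge sets (an assumption made for this sketch)."""
    return knn_edges(points, k) | radius_edges(points, r)


# Toy spatial coordinates: three nearby spots and one isolated spot.
spots = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 4.0)]
graph = complementary_graph(spots, k=1, r=1.5)
```

In this toy layout the radius graph alone leaves the isolated spot unconnected, while the KNN graph alone misses the link between spots 1 and 2, so each view contributes an edge that the other lacks.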
Affiliation(s)
- Zhihao Si
  - College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
- Hanshuang Li
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Wenjing Shang
  - College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
- Yanan Zhao
  - College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
- Lingjiao Kong
  - College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
- Chunshen Long
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Yongchun Zuo
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Zhenxing Feng
  - College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
6
Cuevas-Diaz Duran R, Wei H, Wu J. Data normalization for addressing the challenges in the analysis of single-cell transcriptomic datasets. BMC Genomics 2024; 25:444. [PMID: 38711017 PMCID: PMC11073985 DOI: 10.1186/s12864-024-10364-5] [Received: 09/02/2023] [Accepted: 04/29/2024] [Indexed: 05/08/2024] Open
Abstract
BACKGROUND: Normalization is a critical step in the analysis of single-cell RNA-sequencing (scRNA-seq) datasets. Its main goal is to make gene counts comparable within and between cells. To do so, normalization methods must account for technical and biological variability. Numerous normalization methods have been developed addressing different sources of dispersion and making specific assumptions about the count data. MAIN BODY: The selection of a normalization method has a direct impact on downstream analysis, for example differential gene expression and cluster identification. Thus, the objective of this review is to guide the reader in making an informed decision on the most appropriate normalization method to use. To this aim, we first give an overview of the different single cell sequencing platforms and methods commonly used including isolation and library preparation protocols. Next, we discuss the inherent sources of variability of scRNA-seq datasets. We describe the categories of normalization methods and include examples of each. We also delineate imputation and batch-effect correction methods. Furthermore, we describe data-driven metrics commonly used to evaluate the performance of normalization methods. We also discuss common scRNA-seq methods and toolkits used for integrated data analysis. CONCLUSIONS: According to the correction performed, normalization methods can be broadly classified as within and between-sample algorithms. Moreover, with respect to the mathematical model used, normalization methods can further be classified into: global scaling methods, generalized linear models, mixed methods, and machine learning-based methods. Each of these methods depict pros and cons and make different statistical assumptions. However, there is no better performing normalization method. Instead, metrics such as silhouette width, K-nearest neighbor batch-effect test, or Highly Variable Genes are recommended to assess the performance of normalization methods.
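Of the categories listed, global scaling is the simplest to illustrate. The sketch below shows one common form of it: library-size scaling to a fixed target followed by a log transform. It is a generic example, not a method taken from the review; the function name, the `target` of 10,000, and the toy count matrix are assumptions made for illustration.

```python
import math


def global_scale(counts, target=10_000):
    """Scale each cell to the same total count, then apply log(1 + x).

    `counts` is a list of cells, each a list of raw gene counts.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)                        # library size of this cell
        factor = target / total if total else 0.0
        normalized.append([math.log1p(c * factor) for c in cell])
    return normalized


# Two cells with the same expression profile but a 10-fold depth difference.
counts = [[10, 0, 90], [1, 0, 9]]
norm = global_scale(counts)
```

After scaling, the two cells have identical values, which is the sense in which normalization makes counts comparable between cells sequenced at different depths. Global scaling assumes that differences in total counts are purely technical, an assumption the other method categories relax in different ways.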
Affiliation(s)
- Raquel Cuevas-Diaz Duran
  - Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Monterrey, Nuevo Leon, 64710, Mexico
- Haichao Wei
  - The Vivian L. Smith Department of Neurosurgery, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
  - Center for Stem Cell and Regenerative Medicine, UT Brown Foundation Institute of Molecular Medicine, Houston, TX, 77030, USA
- Jiaqian Wu
  - The Vivian L. Smith Department of Neurosurgery, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
  - Center for Stem Cell and Regenerative Medicine, UT Brown Foundation Institute of Molecular Medicine, Houston, TX, 77030, USA
  - MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX, 77030, USA
7
Selvarajoo K, Maurer-Stroh S. Towards multi-omics synthetic data integration. Brief Bioinform 2024; 25:bbae213. [PMID: 38711370 DOI: 10.1093/bib/bbae213] [Received: 04/09/2024] [Accepted: 04/19/2024] [Indexed: 05/08/2024] Open
Abstract
Across many scientific disciplines, the development of computational models and algorithms for generating artificial or synthetic data is gaining momentum. In biology, there is a great opportunity to explore this further as more and more big data at multi-omics level are generated recently. In this opinion, we discuss the latest trends in biological applications based on process-driven and data-driven aspects. Moving ahead, we believe these methodologies can help shape novel multi-omics-scale cellular inferences.
Affiliation(s)
- Kumar Selvarajoo
  - Biomolecular Sequence to Function Division, BII, (A*STAR), Singapore, 138671, Republic of Singapore
  - Synthetic Biology Translational Research Program, Yong Loo Lin School of Medicine, NUS, Singapore, 117456, Republic of Singapore
  - School of Biological Sciences, Nanyang Technological University (NTU), Singapore 639798, Republic of Singapore
- Sebastian Maurer-Stroh
  - Biomolecular Sequence to Function Division, BII, (A*STAR), Singapore, 138671, Republic of Singapore
  - Synthetic Biology Translational Research Program, Yong Loo Lin School of Medicine, NUS, Singapore, 117456, Republic of Singapore
8
Ye F, Wang J, Li J, Mei Y, Guo G. Mapping Cell Atlases at the Single-Cell Level. Adv Sci (Weinh) 2024; 11:e2305449. [PMID: 38145338 PMCID: PMC10885669 DOI: 10.1002/advs.202305449] [Received: 08/07/2023] [Revised: 12/01/2023] [Indexed: 12/26/2023]
Abstract
Recent advancements in single-cell technologies have led to rapid developments in the construction of cell atlases. These atlases have the potential to provide detailed information about every cell type in different organisms, enabling the characterization of cellular diversity at the single-cell level. Global efforts in developing comprehensive cell atlases have profound implications for both basic research and clinical applications. This review provides a broad overview of the cellular diversity and dynamics across various biological systems. In addition, the incorporation of machine learning techniques into cell atlas analyses opens up exciting prospects for the field of integrative biology.
Affiliation(s)
- Fang Ye
  - Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang, 310000, China
  - Liangzhu Laboratory, Zhejiang University, Hangzhou, Zhejiang, 311121, China
- Jingjing Wang
  - Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang, 310000, China
  - Liangzhu Laboratory, Zhejiang University, Hangzhou, Zhejiang, 311121, China
- Jiaqi Li
  - Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang, 310000, China
- Yuqing Mei
  - Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang, 310000, China
- Guoji Guo
  - Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang, 310000, China
  - Liangzhu Laboratory, Zhejiang University, Hangzhou, Zhejiang, 311121, China
  - Zhejiang Provincial Key Lab for Tissue Engineering and Regenerative Medicine, Dr. Li Dak Sum & Yip Yio Chin Center for Stem Cell and Regenerative Medicine, Hangzhou, Zhejiang, 310058, China
  - Institute of Hematology, Zhejiang University, Hangzhou, Zhejiang, 310000, China
9
Mondello A, Dal Bo M, Toffoli G, Polano M. Machine learning in onco-pharmacogenomics: a path to precision medicine with many challenges. Front Pharmacol 2024; 14:1260276. [PMID: 38264526 PMCID: PMC10803549 DOI: 10.3389/fphar.2023.1260276] [Received: 07/17/2023] [Accepted: 12/26/2023] [Indexed: 01/25/2024] Open
Abstract
Over the past two decades, Next-Generation Sequencing (NGS) has revolutionized the approach to cancer research. Applications of NGS include the identification of tumor specific alterations that can influence tumor pathobiology and also impact diagnosis, prognosis and therapeutic options. Pharmacogenomics (PGx) studies the role of inheritance of individual genetic patterns in drug response and has taken advantage of NGS technology as it provides access to high-throughput data that can, however, be difficult to manage. Machine learning (ML) has recently been used in the life sciences to discover hidden patterns from complex NGS data and to solve various PGx problems. In this review, we provide a comprehensive overview of the NGS approaches that can be employed and the different PGx studies implicating the use of NGS data. We also provide an excursus of the ML algorithms that can exert a role as fundamental strategies in the PGx field to improve personalized medicine in cancer.
Affiliation(s)
- Maurizio Polano
  - Experimental and Clinical Pharmacology Unit, Centro di Riferimento Oncologico di Aviano (CRO), Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Aviano, Italy
| |
|
10
|
Dutta S, Mudaranthakam DP, Li Y, Sardiu ME. PerSEveML: A Web-Based Tool to Identify Persistent Biomarker Structure for Rare Events Using Integrative Machine Learning Approach. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.25.564000. [PMID: 38196661 PMCID: PMC10775315 DOI: 10.1101/2023.10.25.564000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2024]
Abstract
Omics datasets often pose a computational challenge due to their high dimensionality, large size, and non-linear structures. Analyzing these datasets becomes especially daunting in the presence of rare events. Machine learning (ML) methods have gained traction for analyzing rare events, yet there remains limited exploration of bioinformatics tools that integrate ML techniques to comprehend the underlying biology. Expanding upon our previously developed computational framework of an integrative machine learning approach [1], we introduce PerSEveML, an interactive web-based tool that uses crowd-sourced intelligence to predict rare events and determine feature selection structures. PerSEveML provides a comprehensive overview of the integrative approach through evaluation metrics that help users understand the contribution of individual ML methods to the prediction process. Additionally, PerSEveML calculates entropy and rank scores, which visually organize input features into a persistent structure of selected, unselected, and fluctuating categories that help researchers uncover meaningful hypotheses regarding the underlying biology. We have evaluated PerSEveML on three diverse biologically complex data sets with extremely rare events from small to large scale and have demonstrated its ability to generate valid hypotheses. PerSEveML is available at https://biostats-shinyr.kumc.edu/PerSEveML/ and https://github.com/sreejatadutta/PerSEveML.
Affiliation(s)
- Sreejata Dutta
- Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Dinesh Pal Mudaranthakam
- Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
- University of Kansas Cancer Center, Kansas City, USA
| | - Yanming Li
- Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
- University of Kansas Cancer Center, Kansas City, USA
| | - Mihaela E Sardiu
- Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
- University of Kansas Cancer Center, Kansas City, USA
- Kansas Institute for Precision Medicine, University of Kansas Medical Center, Kansas City, Kansas, USA
| |
|
11
|
Heuts BMH, Martens JHA. Understanding blood development and leukemia using sequencing-based technologies and human cell systems. Front Mol Biosci 2023; 10:1266697. [PMID: 37886034 PMCID: PMC10598665 DOI: 10.3389/fmolb.2023.1266697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 09/06/2023] [Indexed: 10/28/2023]
Abstract
Our current understanding of human hematopoiesis has undergone significant transformation throughout the years, challenging conventional views. The evolution of high-throughput technologies has enabled the accumulation of diverse data types, offering new avenues for investigating key regulatory processes in blood cell production and disease. In this review, we will explore the opportunities presented by these advancements for unraveling the molecular mechanisms underlying normal and abnormal hematopoiesis. Specifically, we will focus on the importance of enhancer-associated regulatory networks and highlight the crucial role of enhancer-derived transcription regulation. Additionally, we will discuss the unprecedented power of single-cell methods and the progression in using in vitro human blood differentiation system, in particular induced pluripotent stem cell models, in dissecting hematopoietic processes. Furthermore, we will explore the potential of ever more nuanced patient profiling to allow precision medicine approaches. Ultimately, we advocate for a multiparameter, regulatory network-based approach for providing a more holistic understanding of normal hematopoiesis and blood disorders.
Affiliation(s)
- Branco M H Heuts
- Department of Molecular Biology, Faculty of Science, Radboud University, Nijmegen, Netherlands
| | - Joost H A Martens
- Department of Molecular Biology, Faculty of Science, Radboud University, Nijmegen, Netherlands
| |
|