1
Pandey D, Onkara Perumal P. A scoping review on deep learning for next-generation RNA-Seq. data analysis. Funct Integr Genomics 2023; 23:134. [PMID: 37084004 DOI: 10.1007/s10142-023-01064-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/09/2022] [Revised: 03/24/2023] [Accepted: 04/17/2023] [Indexed: 04/22/2023]
Abstract
Over the last decade, transcriptome research using next-generation sequencing (NGS) technologies has gathered considerable momentum among functional genomics scientists, particularly in clinical and biomedical research groups. The progressive adoption of NGS technologies has produced an abundance of next-generation transcriptomic data in public databases, harbouring a wealth of new knowledge. Nevertheless, knowledge discovery from these next-generation RNA-Seq. data requires extensive bioinformatics know-how as well as elaborate data analysis software packages consistent with the type and context of the analysis. Several reliability and reproducibility concerns continue to impede RNA-Seq. data analysis. Characteristic challenges include data quality, hardware and networking provisions, selection and prioritisation of data analysis tools and, most significantly, implementation of robust machine learning algorithms to make full use of these experimental transcriptomic data. Over the years, numerous machine learning algorithms have been implemented for improved transcriptomic data analysis, predominantly using shallow learning approaches. More recently, deep learning algorithms have become more mainstream, and their application to next-generation RNA-Seq. data analysis could be revolutionary for the biomedical domain in the coming years. In this scoping review, we attempt to determine the size and nature of the existing literature on deep learning and NGS RNA-Seq. data analysis. Contemporary topics in next-generation RNA-Seq. data analysis based on deep learning algorithms are critically reviewed, with emphasis on open-source resources.
Affiliation(s)
- Diksha Pandey
  - Department of Biotechnology, National Institute of Technology, Warangal, Telangana, 506004, India
- P Onkara Perumal
  - Department of Biotechnology, National Institute of Technology, Warangal, Telangana, 506004, India
2
Wang L, Wen D, Yin Y, Zhang P, Wen W, Gao J, Jiang Z. Musculoskeletal Ultrasound Image-Based Radiomics for the Diagnosis of Achilles Tendinopathy in Skiers. J Ultrasound Med 2023; 42:363-371. [PMID: 35841273 PMCID: PMC10084008 DOI: 10.1002/jum.16059] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/27/2022] [Revised: 05/10/2022] [Accepted: 06/23/2022] [Indexed: 02/05/2023]
Abstract
OBJECTIVES Our study aimed to develop and validate an efficient ultrasound image-based radiomic model for determining the Achilles tendinopathy in skiers. METHODS A total of 88 feet of skiers clinically diagnosed with unilateral chronic Achilles tendinopathy and 51 healthy feet were included in our study. According to the time order of enrollment, the data were divided into a training set (n = 89) and a test set (n = 50). The regions of interest (ROIs) were segmented manually, and 833 radiomic features were extracted from red, green, blue color channels and grayscale of ROIs using Pyradiomics, respectively. Three feature selection and three machine learning modeling algorithms were implemented respectively, for determining the optimal radiomics pipeline. Finally, the area under the receiver operating characteristic curve (AUC), consistency analysis, and decision analysis were used to evaluate the diagnostic performance. RESULTS By comparing nine radiomics analysis strategies of three color channels and grayscale, the radiomic model under the green channel obtained the best diagnostic performance, using the Random Forest selection and Support Vector Machine modeling, which was selected as the final machine learning model. All the selected radiomic features were significantly associated with the Achilles tendinopathy (P < .05). The radiomic model had a training AUC of 0.98, a test AUC of 0.99, a sensitivity of 0.90, and a specificity of 1, which could bring sufficient clinical net benefits. CONCLUSIONS Ultrasound image-based radiomics achieved high diagnostic performance, which could be used as an intelligent auxiliary tool for the diagnosis of Achilles tendinopathy.
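The sensitivity and specificity quoted in this abstract are ratios of confusion-matrix counts on the test set. The sketch below illustrates that arithmetic only; the labels are made up for illustration and are not the study's data, and the function name `sensitivity_specificity` is hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = diseased."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diseased, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # diseased, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged
    return tp / (tp + fn), tn / (tn + fp)


# Illustrative labels only: 10 diseased feet and 5 healthy feet.
y_true = [1] * 10 + [0] * 5
y_pred = [1] * 9 + [0] + [0] * 5  # one diseased foot is missed
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```

A classifier that misses one diseased case in ten but never flags a healthy one gives the same pair of values as the abstract reports (0.90 and 1), which shows how a perfect specificity can coexist with imperfect sensitivity.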
Affiliation(s)
- Likun Wang
  - Department of Ultrasound, The First Affiliated Hospital of Hebei North University, Zhangjiakou, 075000, China
- Dehui Wen
  - Department of Ultrasound, The First Affiliated Hospital of Hebei North University, Zhangjiakou, 075000, China
- Yanlin Yin
  - Department of Orthopedics, The First Affiliated Hospital of Hebei North University, Zhangjiakou, 075000, China
- Peinan Zhang
  - Department of Orthopedics, The First Affiliated Hospital of Hebei North University, Zhangjiakou, 075000, China
- Wen Wen
  - Department of Ultrasound, West China Hospital, Sichuan University, Chengdu, 610000, China
- Jun Gao
  - College of Computer Science, Sichuan University, Chengdu, 610000, China
- Zekun Jiang
  - College of Computer Science, Sichuan University, Chengdu, 610000, China
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610000, China
3
Ferguson CA, Hwang JCM, Zhang Y, Cheng X. Single-Cell Classification Based on Population Nucleus Size Combining Microwave Impedance Spectroscopy and Machine Learning. Sensors (Basel) 2023; 23:1001. [PMID: 36679798 PMCID: PMC9860723 DOI: 10.3390/s23021001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 01/04/2023] [Accepted: 01/13/2023] [Indexed: 06/17/2023]
Abstract
Many recent efforts in the diagnostic field address the accessibility of cancer diagnosis. Typical histological staining methods identify cancer cells visually by a larger nucleus with more condensed chromatin. Machine learning (ML) has been incorporated into image analysis for improving this process. Recently, impedance spectrometers have been shown to generate all-inclusive lab-on-a-chip platforms to detect nucleus abnormities. In this paper, a wideband electrical sensor and data analysis paradigm that can identify nuclear changes shows the realization of a single-cell microfluidic device to detect nuclei of altered sizes. To model cells of altered nucleus, Jurkat cells were treated to enlarge or shrink their nucleus followed by broadband sensing to obtain the S-parameters of single cells. The ability to deduce important frequencies associated with nucleus size is demonstrated and used to improve classification models in both binary and multiclass scenarios, despite a heterogeneous and overlapping cell population. The important frequency features match those predicted in a double-shell circuit model published in prior work, demonstrating a coherent new analytical technique for electrical data analysis. The electrical sensing platform assisted by ML with impressive accuracy of cell classification looks forward to a label-free and flexible approach to cancer diagnosis.
Affiliation(s)
- James C. M. Hwang
  - Department of Materials Science and Engineering, Cornell University, Ithaca, NY 14853, USA
- Yu Zhang
  - Department of Bioengineering, Lehigh University, Bethlehem, PA 18015, USA
- Xuanhong Cheng
  - Department of Bioengineering, Lehigh University, Bethlehem, PA 18015, USA
  - Department of Materials Science and Engineering, Lehigh University, Bethlehem, PA 18015, USA
4
Ke W, Crist RM, Clogston JD, Stern ST, Dobrovolskaia MA, Grodzinski P, Jensen MA. Trends and patterns in cancer nanotechnology research: A survey of NCI's caNanoLab and nanotechnology characterization laboratory. Adv Drug Deliv Rev 2022; 191:114591. [PMID: 36332724 PMCID: PMC9712232 DOI: 10.1016/j.addr.2022.114591] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/17/2022] [Revised: 10/22/2022] [Accepted: 10/27/2022] [Indexed: 11/11/2022]
Abstract
Cancer nanotechnologies possess immense potential as therapeutic and diagnostic treatment modalities and have undergone significant and rapid advancement in recent years. With this emergence, the complexities of data standards in the field are on the rise. Data sharing and reanalysis is essential to more fully utilize this complex, interdisciplinary information to answer research questions, promote the technologies, optimize use of funding, and maximize the return on scientific investments. In order to support this, various data-sharing portals and repositories have been developed which not only provide searchable nanomaterial characterization data, but also provide access to standardized protocols for synthesis and characterization of nanomaterials as well as cutting-edge publications. The National Cancer Institute's (NCI) caNanoLab is a dedicated repository for all aspects pertaining to cancer-related nanotechnology data. The searchable database provides a unique opportunity for data mining and the use of artificial intelligence and machine learning, which aims to be an essential arm of future research studies, potentially speeding the design and optimization of next-generation therapies. It also provides an opportunity to track the latest trends and patterns in nanomedicine research. This manuscript provides the first look at such trends extracted from caNanoLab and compares these to similar metrics from the NCI's Nanotechnology Characterization Laboratory, a laboratory providing preclinical characterization of cancer nanotechnologies to researchers around the globe. Together, these analyses provide insight into the emerging interests of the research community and rise of promising nanoparticle technologies.
Affiliation(s)
- Weina Ke
  - Bioinformatics and Computational Science, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD, United States
- Rachael M Crist
  - Nanotechnology Characterization Laboratory, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD, United States
- Jeffrey D Clogston
  - Nanotechnology Characterization Laboratory, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD, United States
- Stephan T Stern
  - Nanotechnology Characterization Laboratory, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD, United States
- Marina A Dobrovolskaia
  - Nanotechnology Characterization Laboratory, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD, United States
- Piotr Grodzinski
  - Nanodelivery Systems and Devices Branch, Cancer Imaging Program, National Cancer Institute, Rockville, MD, United States
- Mark A Jensen
  - Bioinformatics and Computational Science, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD, United States
5
Single-cell multiomics reveals persistence of HIV-1 in expanded cytotoxic T cell clones. Immunity 2022; 55:1013-1031.e7. [PMID: 35320704 PMCID: PMC9203927 DOI: 10.1016/j.immuni.2022.03.004] [Citation(s) in RCA: 55] [Impact Index Per Article: 27.5] [Received: 10/25/2021] [Revised: 02/19/2022] [Accepted: 03/08/2022] [Indexed: 02/02/2023]
Abstract
Understanding the drivers and markers of clonally expanding HIV-1-infected CD4+ T cells is essential for HIV-1 eradication. We used single-cell ECCITE-seq, which captures surface protein expression, cellular transcriptome, HIV-1 RNA, and TCR sequences within the same single cell to track clonal expansion dynamics in longitudinally archived samples from six HIV-1-infected individuals (during viremia and after suppressive antiretroviral therapy) and two uninfected individuals, in unstimulated conditions and after CMV and HIV-1 antigen stimulation. Despite antiretroviral therapy, persistent antigen and TNF responses shaped T cell clonal expansion. HIV-1 resided in Th1-polarized, antigen-responding T cells expressing BCL2 and SERPINB9 that may resist cell death. HIV-1 RNA+ T cell clones were larger in clone size, established during viremia, persistent after viral suppression, and enriched in GZMB+ cytotoxic effector memory Th1 cells. Targeting HIV-1-infected cytotoxic CD4+ T cells and drivers of clonal expansion provides another direction for HIV-1 eradication.
6
Pane K, Zanfardino M, Grimaldi AM, Baldassarre G, Salvatore M, Incoronato M, Franzese M. Discovering Common miRNA Signatures Underlying Female-Specific Cancers via a Machine Learning Approach Driven by the Cancer Hallmark ERBB. Biomedicines 2022; 10:1306. [PMID: 35740327 PMCID: PMC9219956 DOI: 10.3390/biomedicines10061306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Revised: 05/25/2022] [Accepted: 05/29/2022] [Indexed: 11/29/2022]
Abstract
Big data processing, using omics data integration and machine learning (ML) methods, drives efforts to discover diagnostic and prognostic biomarkers for clinical decision making. Previously, we used the TCGA database for gene expression profiling of breast, ovary, and endometrial cancers, and identified a top-scoring network centered on the ERBB2 gene, which plays a crucial role in carcinogenesis in the three estrogen-dependent tumors. Here, we focused on microRNA expression signature similarity, asking whether they could target the ERBB family. We applied an ML approach on integrated TCGA miRNA profiling of breast, endometrium, and ovarian cancer to identify common miRNA signatures differentiating tumor and normal conditions. Using the ML-based algorithm and the miRTarBase database, we found 205 features and 158 miRNAs targeting ERBB isoforms, respectively. By merging the results of both databases and ranking each feature according to the weighted Support Vector Machine model, we prioritized 42 features, with accuracy (0.98), AUC (0.93; 95% CI 0.917–0.94), sensitivity (0.85), and specificity (0.99), indicating their diagnostic capability to discriminate between the two conditions. In vitro validations by qRT-PCR experiments, using model and parental cell lines for each tumor type, showed that five miRNAs (hsa-mir-323a-3p, hsa-mir-323b-3p, hsa-mir-331-3p, hsa-mir-381-3p, and hsa-mir-1301-3p) had concordant expression trends between breast, ovarian, and endometrium cancer cell lines compared with normal lines, confirming our in silico predictions. This shows that an integrated computational approach, combined with biological knowledge, could identify expression signatures as potential diagnostic biomarkers common to multiple tumors.
Affiliation(s)
- Katia Pane
  - IRCCS Synlab SDN, 80143 Naples, Italy
- Mario Zanfardino
  - IRCCS Synlab SDN, 80143 Naples, Italy
- Anna Maria Grimaldi
  - IRCCS Synlab SDN, 80143 Naples, Italy
- Gustavo Baldassarre
  - Molecular Oncology Unit, Centro di Riferimento Oncologico di Aviano (CRO), IRCCS, National Cancer Institute, 33081 Aviano, Italy
- Marco Salvatore
  - IRCCS Synlab SDN, 80143 Naples, Italy
- Monica Franzese
  - IRCCS Synlab SDN, 80143 Naples, Italy
7
Yan H, Lee J, Song Q, Li Q, Schiefelbein J, Zhao B, Li S. Identification of new marker genes from plant single-cell RNA-seq data using interpretable machine learning methods. New Phytol 2022; 234:1507-1520. [PMID: 35211979 PMCID: PMC9314150 DOI: 10.1111/nph.18053] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/09/2021] [Accepted: 02/06/2022] [Indexed: 05/16/2023]
Abstract
An essential step in the analysis of single-cell RNA sequencing data is to classify cells into specific cell types using marker genes. In this study, we have developed a machine learning pipeline called single-cell predictive marker (SPmarker) to identify novel cell-type marker genes in the Arabidopsis root. Unlike traditional approaches, our method uses interpretable machine learning models to select marker genes. We have demonstrated that our method can: assign cell types based on cells that were labelled using published methods; project cell types identified by trajectory analysis from one data set to other data sets; and assign cell types based on internal GFP markers. Using SPmarker, we have identified hundreds of new marker genes that were not identified before. As compared to known marker genes, the new marker genes have more orthologous genes identifiable in the corresponding rice single-cell clusters. The new root hair marker genes also include 172 genes with orthologs expressed in root hair cells in five non-Arabidopsis species, which expands the number of marker genes for this cell type by 35-154%. Our results represent a new approach to identifying cell-type marker genes from scRNA-seq data and pave the way for cross-species mapping of scRNA-seq data in plants.
Affiliation(s)
- Haidong Yan
  - School of Plant and Environmental Sciences (SPES), Virginia Tech, Blacksburg, VA 24060, USA
- Jiyoung Lee
  - School of Plant and Environmental Sciences (SPES), Virginia Tech, Blacksburg, VA 24060, USA
  - Graduate Program in Genetics, Bioinformatics and Computational Biology (GBCB), Virginia Tech, Blacksburg, VA 24060, USA
- Qi Song
  - School of Plant and Environmental Sciences (SPES), Virginia Tech, Blacksburg, VA 24060, USA
  - Graduate Program in Genetics, Bioinformatics and Computational Biology (GBCB), Virginia Tech, Blacksburg, VA 24060, USA
- Qi Li
  - School of Plant and Environmental Sciences (SPES), Virginia Tech, Blacksburg, VA 24060, USA
- John Schiefelbein
  - Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI 48109, USA
- Bingyu Zhao
  - School of Plant and Environmental Sciences (SPES), Virginia Tech, Blacksburg, VA 24060, USA
- Song Li
  - School of Plant and Environmental Sciences (SPES), Virginia Tech, Blacksburg, VA 24060, USA
  - Graduate Program in Genetics, Bioinformatics and Computational Biology (GBCB), Virginia Tech, Blacksburg, VA 24060, USA
8
Xing X, Yang F, Li H, Zhang J, Zhao Y, Gao M, Huang J, Yao J. Multi-level attention graph neural network based on co-expression gene modules for disease diagnosis and prognosis. Bioinformatics 2022; 38:2178-2186. [PMID: 35157021 DOI: 10.1093/bioinformatics/btac088] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/14/2021] [Revised: 01/29/2022] [Accepted: 02/09/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION Advanced deep learning techniques have been widely applied in disease diagnosis and prognosis with clinical omics, especially gene expression data. In the regulation of biological processes and disease progression, genes often work interactively rather than individually. Therefore, investigating gene association information and co-functional gene modules can facilitate disease state prediction. RESULTS To explore the gene modules and inter-gene relational information contained in the omics data, we propose a novel multi-level attention graph neural network (MLA-GNN) for disease diagnosis and prognosis. Specifically, we format omics data into co-expression graphs via weighted correlation network analysis, then construct multi-level graph features, and finally fuse them through a multi-level graph feature fully fusion module to make predictions. For model interpretation, a novel full-gradient graph saliency mechanism is developed to identify the disease-relevant genes. MLA-GNN achieves state-of-the-art performance on transcriptomic data from TCGA-LGG/TCGA-GBM and proteomic data from coronavirus disease 2019 (COVID-19)/non-COVID-19 patient sera. More importantly, the relevant genes selected by our model are interpretable and are consistent with the clinical understanding. AVAILABILITY AND IMPLEMENTATION The codes are available at https://github.com/TencentAILabHealthcare/MLA-GNN.
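The co-expression graph step mentioned in this abstract can be pictured with a small sketch. It assumes the common unsigned weighted-correlation form, in which the edge weight between two genes is the absolute Pearson correlation raised to a soft-threshold power. It is not the authors' code (their repository is linked above); the names `pearson` and `coexpression_adjacency`, the power `beta=6`, the zeroed self-edges, and the toy expression values are all illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_adjacency(expr, beta=6):
    """Edge weight |cor(i, j)| ** beta between genes; self-edges set to 0 here."""
    genes = list(expr)
    return {
        gi: {
            gj: 0.0 if gi == gj else abs(pearson(expr[gi], expr[gj])) ** beta
            for gj in genes
        }
        for gi in genes
    }


# Toy expression profiles (rows = genes, columns = samples); hypothetical values.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with geneA
    "geneC": [1.0, -1.0, 1.0, -1.0],  # weakly anti-correlated with geneA
}
adj = coexpression_adjacency(expr)
print(adj["geneA"]["geneB"], adj["geneA"]["geneC"])
```

Raising the correlation to a power keeps strongly co-expressed pairs close to 1 while pushing weak correlations towards 0, which is why the resulting graph emphasises gene modules.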
Affiliation(s)
- Xiaohan Xing
  - Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong 999077, China
  - AI Lab, Tencent, Shenzhen 518000, China
- Fan Yang
  - AI Lab, Tencent, Shenzhen 518000, China
- Hang Li
  - AI Lab, Tencent, Shenzhen 518000, China
  - School of Informatics, Xiamen University, Xiamen 361005, China
- Jun Zhang
  - AI Lab, Tencent, Shenzhen 518000, China
- Yu Zhao
  - AI Lab, Tencent, Shenzhen 518000, China
- Mingxuan Gao
  - AI Lab, Tencent, Shenzhen 518000, China
  - School of Informatics, Xiamen University, Xiamen 361005, China
9
Feng CH, Disis ML, Cheng C, Zhang L. Multimetric feature selection for analyzing multicategory outcomes of colorectal cancer: random forest and multinomial logistic regression models. Lab Invest 2022; 102:236-244. [PMID: 34537824 DOI: 10.1038/s41374-021-00662-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/05/2021] [Revised: 08/10/2021] [Accepted: 08/12/2021] [Indexed: 11/09/2022]
Abstract
Colorectal cancer (CRC) is one of the most common cancers worldwide, and a leading cause of cancer deaths. Better classifying multicategory outcomes of CRC with clinical and omic data may help adjust treatment regimens based on individual's risk. Here, we selected the features that were useful for classifying four-category survival outcome of CRC using the clinical and transcriptomic data, or clinical, transcriptomic, microsatellite instability and selected oncogenic-driver data (all data) of TCGA. We also optimized multimetric feature selection to develop the best multinomial logistic regression (MLR) and random forest (RF) models that had the highest accuracy, precision, recall and F1 score, respectively. We identified 2073 differentially expressed genes of the TCGA RNASeq dataset. MLR overall outperformed RF in the multimetric feature selection. In both RF and MLR models, precision, recall and F1 score increased as the feature number increased and peaked at the feature number of 600-1000, while the models' accuracy remained stable. The best model was the MLR one with 825 features based on sum of squared coefficients using all data, and attained the best accuracy of 0.855, F1 of 0.738 and precision of 0.832, which were higher than those using clinical and transcriptomic data. The top-ranked features in the MLR model of the best performance using clinical and transcriptomic data were different from those using all data. However, pathologic staging, HBS1L, TSPYL4, and TP53TG3B were the overlapping top-20 ranked features in the best models using clinical and transcriptomic, or all data. Thus, we developed a multimetric feature-selection based MLR model that outperformed RF models in classifying four-category outcome of CRC patients. Interestingly, adding microsatellite instability and oncogenic-driver data to clinical and transcriptomic data improved models' performances. 
Precision and recall of tuned algorithms may change significantly as the feature number changes, but accuracy appears not sensitive to these changes.
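The abstract ranks features of the multinomial logistic regression model by the sum of squared coefficients. One plausible reading is that each feature's coefficients are squared and summed across the outcome classes; the sketch below shows that reading only. The coefficient matrix, the feature names, and the function name `rank_by_sum_of_squared_coefficients` are hypothetical and do not come from the study.

```python
def rank_by_sum_of_squared_coefficients(coef, feature_names):
    """Rank features by the sum over classes of their squared coefficients.

    coef is a list of rows, one per outcome class, each holding one
    coefficient per feature (the layout a fitted multinomial model exposes).
    """
    scores = {
        name: sum(row[j] ** 2 for row in coef)
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical coefficients for a four-category outcome and three features.
coef = [
    [0.5, -0.1, 0.0],
    [-1.0, 0.2, 0.1],
    [0.3, 0.0, -0.4],
    [0.2, -0.1, 0.3],
]
features = ["pathologic_stage", "gene_1", "gene_2"]
ranking = rank_by_sum_of_squared_coefficients(coef, features)
for name, score in ranking:
    print(name, round(score, 2))
```

Squaring removes the sign, so a feature that pushes strongly towards one class and away from another still scores highly; keeping the top-k entries of the ranking gives a candidate feature subset of size k.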
Affiliation(s)
- Mary L Disis
  - UW Medicine Cancer Vaccine Institute, University of Washington, Seattle, WA, USA
- Chao Cheng
  - Department of Medicine, Section of Epidemiology and Population Sciences, Baylor College of Medicine, Houston, TX, USA
  - Department of Medicine, Baylor College of Medicine, Houston, TX, USA
  - Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
- Lanjing Zhang
  - Department of Biological Sciences, Rutgers University, Newark, NJ, USA
  - Department of Pathology, Princeton Medical Center, Plainsboro, NJ, USA
  - Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA
  - Department of Chemical Biology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ, USA
10
Qin G, Du L, Ma Y, Yin Y, Wang L. Gene biomarker prediction in glioma by integrating scRNA-seq data and gene regulatory network. BMC Med Genomics 2021; 14:287. [PMID: 34863158 PMCID: PMC8643020 DOI: 10.1186/s12920-021-01115-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/27/2020] [Accepted: 11/01/2021] [Indexed: 12/22/2022]
Abstract
Background Although great efforts have been made to study the occurrence and development of glioma, the molecular mechanisms of glioma are still unclear. Single-cell sequencing technology provides a new perspective for researchers to explore the pathogens of tumors to further help make treatment and prognosis decisions for patients with tumors. Methods In this study, we proposed an algorithm framework to explore the molecular mechanisms of glioma by integrating single-cell gene expression profiles and gene regulatory relations. First, since there were great differences among malignant cells from different glioma samples, we analyzed the expression status of malignant cells for each sample, and then tumor consensus genes were identified by constructing and analyzing cell-specific networks. Second, to comprehensively analyze the characteristics of glioma, we integrated transcriptional regulatory relationships and consensus genes to construct a tumor-specific regulatory network. Third, we performed a hybrid clustering analysis to identify glioma cell types. Finally, candidate tumor gene biomarkers were identified based on cell types and known glioma-related genes. Results We got six identified cell types using the method we proposed and for these cell types, we performed functional and biological pathway enrichment analyses. The candidate tumor gene biomarkers were analyzed through survival analysis and verified using literature from PubMed. Conclusions The results showed that these candidate tumor gene biomarkers were closely related to glioma and could provide clues for the diagnosis and prognosis of patients with glioma. In addition, we found that four of the candidate tumor gene biomarkers (NDUFS5, NDUFA1, NDUFA13, and NDUFB8) belong to the NADH ubiquinone oxidoreductase subunit gene family, so we inferred that this gene family may be strongly related to glioma.
Affiliation(s)
- Guimin Qin
  - School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
- Longting Du
  - School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
- Yuying Ma
  - School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
- Yu Yin
  - School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
- Liming Wang
  - School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
11
Musolf AM, Holzinger ER, Malley JD, Bailey-Wilson JE. What makes a good prediction? Feature importance and beginning to open the black box of machine learning in genetics. Hum Genet 2021; 141:1515-1528. [PMID: 34862561 PMCID: PMC9360120 DOI: 10.1007/s00439-021-02402-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/13/2021] [Accepted: 11/08/2021] [Indexed: 01/26/2023]
Abstract
Genetic data have become increasingly complex within the past decade, leading researchers to pursue increasingly complex questions, such as those involving epistatic interactions and protein prediction. Traditional methods are ill-suited to answer these questions, but machine learning (ML) techniques offer an alternative solution. ML algorithms are commonly used in genetics to predict or classify subjects, but some methods evaluate which features (variables) are responsible for creating a good prediction; this is called feature importance. This is critical in genetics, as researchers are often interested in which features (e.g., SNP genotype or environmental exposure) are responsible for a good prediction. This allows for the deeper analysis beyond simple prediction, including the determination of risk factors associated with a given phenotype. Feature importance further permits the researcher to peer inside the black box of many ML algorithms to see how they work and which features are critical in informing a good prediction. This review focuses on ML methods that provide feature importance metrics for the analysis of genetic data. Five major categories of ML algorithms: k nearest neighbors, artificial neural networks, deep learning, support vector machines, and random forests are described. The review ends with a discussion of how to choose the best machine for a data set. This review will be particularly useful for genetic researchers looking to use ML methods to answer questions beyond basic prediction and classification.
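Permutation importance is one widely used way to obtain the kind of feature importance this abstract describes: shuffle one feature at a time and measure how much prediction accuracy drops. The sketch below illustrates that general idea and is not taken from the review; the toy genotype data, the fixed rule in `predict`, and the function name `permutation_importance` are hypothetical.

```python
import random


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(shuffled))
        importances.append(sum(drops) / n_repeats)
    return importances


def predict(row):
    """Stand-in for a trained model: call a case when the genotype is 2."""
    return int(row[0] == 2)


# Toy data: column 0 is a SNP genotype (0/1/2), column 1 is unrelated noise.
X = [[0, 5], [0, 3], [1, 4], [2, 1], [2, 5], [2, 2], [0, 1], [1, 3]]
y = [0, 0, 0, 1, 1, 1, 0, 0]
imp = permutation_importance(predict, X, y)
print(imp)
```

Shuffling the noise column leaves accuracy unchanged, so its importance is zero, while shuffling the genotype column degrades accuracy and yields a positive score.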
Affiliation(s)
- Anthony M Musolf
  - Statistical Genetics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 333 Cassell Drive Suite 1200, Baltimore, MD, 21224, USA
- Emily R Holzinger
  - Target Sciences, Informatics and Predictive Sciences, Bristol Myers Squibb, Cambridge, MA, USA
- James D Malley
  - Statistical Genetics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 333 Cassell Drive Suite 1200, Baltimore, MD, 21224, USA
- Joan E Bailey-Wilson
  - Statistical Genetics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 333 Cassell Drive Suite 1200, Baltimore, MD, 21224, USA
12
Hammad A, Elshaer M, Tang X. Identification of potential biomarkers with colorectal cancer based on bioinformatics analysis and machine learning. Math Biosci Eng 2021; 18:8997-9015. [PMID: 34814332 DOI: 10.3934/mbe.2021443] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 06/13/2023]
Abstract
Colorectal cancer (CRC) is one of the most common malignancies worldwide. Biomarker discovery is critical to improving CRC diagnosis, and machine learning offers a new platform to study the etiology of CRC for this purpose. Therefore, the current study aimed to perform integrated bioinformatics and machine learning analyses to explore novel biomarkers for CRC prognosis. In this study, we acquired gene expression microarray data from the Gene Expression Omnibus (GEO) database. The GSE103512 microarray expression dataset was downloaded and integrated. Subsequently, differentially expressed genes (DEGs) were identified and functionally analyzed via Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). Furthermore, protein-protein interaction (PPI) network analysis was conducted using the STRING database and Cytoscape software to identify hub genes; the hub genes were then subjected to Support Vector Machine (SVM), receiver operating characteristic (ROC) curve and survival analyses to explore their diagnostic values. Meanwhile, TCGA transcriptomics data in the Gene Expression Profiling Interactive Analysis (GEPIA) database and the pathology data presented in the Human Protein Atlas (HPA) database were used to verify our transcriptomic analyses. A total of 105 DEGs were identified in this study. Functional enrichment analysis showed that these genes were significantly enriched in biological processes related to cancer progression. Thereafter, PPI network analysis identified a total of 10 significant hub genes. The ROC curve was used to predict the potential application of biomarkers in CRC diagnosis, with an area under the ROC curve (AUC) of these genes exceeding 0.92, suggesting that this risk classifier can discriminate between CRC patients and normal controls. Moreover, the prognostic values of these hub genes were confirmed by survival analyses using different CRC patient cohorts.
Our results demonstrated that these 10 differentially expressed hub genes could be used as potential biomarkers for CRC diagnosis.
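The AUC quoted in this abstract can be read as the probability that a randomly chosen CRC sample receives a higher classifier score than a randomly chosen control. The following is a minimal sketch of that rank-based calculation; `roc_auc` is a hypothetical helper and the scores and labels are invented toy values, not data or code from the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy classifier scores (hypothetical): 1 = tumour, 0 = normal control.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

In this toy case 15 of the 16 tumour/control pairs are ordered correctly, so the AUC is 15/16. The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch but not for large cohorts.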
Affiliation(s)
- Ahmed Hammad
  - Department of Biochemistry and Department of Thoracic Surgery of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China
  - Radiation Biology Department, National Center for Radiation Research and Technology, Egyptian Atomic Energy Authority, Cairo 13759, Egypt
- Mohamed Elshaer
  - Department of Biochemistry and Department of Thoracic Surgery of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China
  - Labeled Compounds Department, Hot Labs Center, Egyptian Atomic Energy Authority, Cairo 13759, Egypt
- Xiuwen Tang
  - Department of Biochemistry and Department of Thoracic Surgery of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China
|
13
|
Javaid A, Shahab O, Adorno W, Fernandes P, May E, Syed S. Machine Learning Predictive Outcomes Modeling in Inflammatory Bowel Diseases. Inflamm Bowel Dis 2021; 28:819-829. [PMID: 34417815 PMCID: PMC9165557 DOI: 10.1093/ibd/izab187] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/19/2021] [Indexed: 12/14/2022]
Abstract
There is rising interest in the use of big data approaches to personalize treatment of inflammatory bowel diseases (IBDs) and to predict and prevent outcomes such as disease flares and therapeutic nonresponse. Machine learning (ML) provides an avenue to identify and quantify features across vast quantities of data to produce novel insights in disease management. In this review, we cover current approaches in ML-driven predictive outcomes modeling for IBD and relate how advances in other fields of medicine may be applied to improve future IBD predictive models. Numerous studies have incorporated clinical, laboratory, or omics data to predict significant outcomes in IBD, including hospitalizations, outpatient corticosteroid use, biologic response, and refractory disease after colectomy, among others, with considerable health care dollars saved as a result. Encouraging results in other fields of medicine support efforts to use ML image analysis (including analysis of histopathology, endoscopy, and radiology) to further advance outcome predictions in IBD. Though obstacles to clinical implementation include technical barriers, bias within data sets, and incongruence between limited data sets preventing model validation in larger cohorts, ML-predictive analytics have the potential to transform the clinical management of IBD. Future directions include the development of models that synthesize all aforementioned approaches to produce more robust predictive metrics.
Affiliation(s)
- Aamir Javaid
  - Division of Pediatric Gastroenterology and Hepatology, Department of Pediatrics, University of Virginia, Charlottesville, VA, USA
- Omer Shahab
  - Division of Gastroenterology and Hepatology, Department of Medicine, Virginia Commonwealth University, Richmond, VA, USA
- William Adorno
  - School of Data Science, University of Virginia, Charlottesville, VA, USA
- Philip Fernandes
  - Division of Pediatric Gastroenterology and Hepatology, Department of Pediatrics, University of Virginia, Charlottesville, VA, USA
- Eve May
  - Division of Gastroenterology and Hepatology, Department of Pediatrics, Children’s National Hospital, Washington, DC, USA
- Sana Syed
  - Division of Pediatric Gastroenterology and Hepatology, Department of Pediatrics, University of Virginia, Charlottesville, VA, USA
  - School of Data Science, University of Virginia, Charlottesville, VA, USA
  - Address correspondence to: Sana Syed, MD, MSCR, MSDS, Division of Pediatric Gastroenterology and Hepatology, Department of Pediatrics, University of Virginia, 409 Lane Rd, Room 2035B, Charlottesville, VA, 22908, USA
|
14
|
Xie J, Yin Y, Yang F, Sun J, Wang J. Differential Network Analysis Reveals Regulatory Patterns in Neural Stem Cell Fate Decision. Interdiscip Sci 2021; 13:91-102. [PMID: 33439459 DOI: 10.1007/s12539-020-00415-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2020] [Revised: 12/11/2020] [Accepted: 12/22/2020] [Indexed: 11/30/2022]
Abstract
Deciphering regulatory patterns of neural stem cell (NSC) differentiation with multiple stages is essential to understand NSC differentiation mechanisms. Single-cell transcriptome datasets have recently become available for individual differentiation stages. However, a systematic and integrative analysis of multiple datasets at multiple temporal stages of NSC differentiation is lacking. In this study, we propose a new method integrating prior information to construct three gene regulatory networks at pair-wise stages of transcriptome and apply this method to investigate five NSC differentiation paths on four different single-cell transcriptome datasets. By constructing gene regulatory networks for each path, we delineate their regulatory patterns via differential topology and network diffusion analyses. We find 12 common differentially expressed genes among the five NSC differentiation paths, with one common regulatory pattern (Gsk3b_App_Cdk5) shared by all paths. The identified regulatory pattern, partly supported by previous experimental evidence, is essential to all differentiation paths, but it plays a different role in each path when regulating other genes. Together, our integrative analysis provides both common and specific regulatory mechanisms for each of the five NSC differentiation paths.
Affiliation(s)
- Jiang Xie
  - School of Computer Engineering and Science, Shanghai University, Shanghai, China
- Yiting Yin
  - School of Computer Engineering and Science, Shanghai University, Shanghai, China
- Fuzhang Yang
  - School of Computer Engineering and Science, Shanghai University, Shanghai, China
- Jiamin Sun
  - School of Computer Engineering and Science, Shanghai University, Shanghai, China
- Jiao Wang
  - School of Life Sciences, Shanghai University, Shanghai, China
|
15
|
Xu D, Zhang J, Xu H, Zhang Y, Chen W, Gao R, Dehmer M. Multi-scale supervised clustering-based feature selection for tumor classification and identification of biomarkers and targets on genomic data. BMC Genomics 2020; 21:650. [PMID: 32962626 PMCID: PMC7510277 DOI: 10.1186/s12864-020-07038-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/18/2020] [Accepted: 08/30/2020] [Indexed: 12/19/2022] Open
Abstract
Background: The small number of samples and the curse of dimensionality hamper the application of deep learning techniques to disease classification. Additionally, the performance of clustering-based feature selection algorithms is still far from satisfactory because they are limited to unsupervised learning methods. To enhance interpretability and overcome this problem, we developed a novel feature selection algorithm. Meanwhile, complex genomic data pose great challenges for the identification of biomarkers and therapeutic targets. Some current feature selection methods suffer from low sensitivity and specificity in this field.
Results: In this article, we designed a multi-scale clustering-based feature selection algorithm named MCBFS which simultaneously performs feature selection and model learning for genomic data analysis. The experimental results demonstrated that MCBFS is robust and effective by comparing it with seven benchmark and six state-of-the-art supervised methods on eight data sets. The visualization results and the statistical test showed that MCBFS can capture the informative genes and improve the interpretability and visualization of tumor gene expression and single-cell sequencing data. Additionally, we developed a general framework named McbfsNW using gene expression data and protein interaction data to identify robust biomarkers and therapeutic targets for diagnosis and therapy of diseases. The framework incorporates the MCBFS algorithm, network recognition ensemble algorithm and feature selection wrapper. McbfsNW has been applied to the lung adenocarcinoma (LUAD) data sets. The preliminary results demonstrated that better prediction results can be attained with the identified biomarkers on the independent LUAD data set, and we also constructed a drug-target network which may be useful for LUAD therapy.
Conclusions: The proposed novel feature selection method is robust and effective for gene selection, classification, and visualization. The framework McbfsNW is practical and helpful for the identification of biomarkers and targets on genomic data. It is believed that the same methods and principles are extensible and applicable to other kinds of data sets.
Affiliation(s)
- Da Xu
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
- Jialin Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
- Hanxiao Xu
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
- Yusen Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
- Wei Chen
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
- Rui Gao
  - School of Control Science and Engineering, Shandong University, Jinan, 250061, China
- Matthias Dehmer
  - Institute for Intelligent Production, Faculty for Management, University of Applied Sciences Upper Austria, Steyr Campus, Steyr, Austria
  - College of Computer and Control Engineering, Nankai University, Tianjin, 300071, China
  - Department of Mechatronics and Biomedical Computer Science, UMIT, Hall in Tyrol, Austria
|
16
|
Xie YR, Castro DC, Bell SE, Rubakhin SS, Sweedler JV. Single-Cell Classification Using Mass Spectrometry through Interpretable Machine Learning. Anal Chem 2020; 92:9338-9347. [PMID: 32519839 PMCID: PMC7374983 DOI: 10.1021/acs.analchem.0c01660] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 12/14/2022]
Abstract
The brain consists of organized ensembles of cells that exhibit distinct morphologies, cellular connectivity, and dynamic biochemistries that control the executive functions of an organism. However, the relationships between chemical heterogeneity, cell function, and phenotype are not always understood. Recent advancements in matrix-assisted laser desorption/ionization mass spectrometry have enabled the high-throughput, multiplexed chemical analysis of single cells, capable of resolving hundreds of molecules in each mass spectrum. We developed a machine learning workflow to classify single cells according to their mass spectra based on cell groups of interest (GOI), e.g., neurons vs astrocytes. Three data sets from various cell groups were acquired on three different mass spectrometer platforms representing thousands of individual cell spectra that were collected and used to validate the single cell classification workflow. The trained models achieved >80% classification accuracy and were subjected to the recently developed instance-based model interpretation framework, SHapley Additive exPlanations (SHAP), which locally assigns feature importance for each single-cell spectrum. SHAP values were used for both local and global interpretations of our data sets, preserving the chemical heterogeneity uncovered by the single-cell analysis while offering the ability to perform supervised analysis. The top contributing mass features to each of the GOI were ranked and selected using mean absolute SHAP values, highlighting the features that are specific to the defined GOI. Our approach provides insight into discriminating the chemical profiles of the single cells through interpretable machine learning, facilitating downstream analysis and validation.
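The global ranking step described in this abstract (ordering mass features by mean absolute SHAP value) reduces to a column-wise average over per-spectrum attributions. The sketch below assumes the SHAP values have already been computed by some explainer; `rank_by_mean_abs_shap`, the m/z labels and the numbers are all invented for illustration and are not taken from the paper.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean |SHAP| across samples, largest first."""
    if not shap_rows:
        raise ValueError("need at least one sample")
    n_features = len(feature_names)
    totals = [0.0] * n_features
    for row in shap_rows:
        if len(row) != n_features:
            raise ValueError("each row needs one SHAP value per feature")
        for j, value in enumerate(row):
            totals[j] += abs(value)
    means = [t / len(shap_rows) for t in totals]
    return sorted(zip(feature_names, means), key=lambda fm: fm[1], reverse=True)


# Hypothetical per-spectrum SHAP values: rows are cells, columns are mass features.
toy_features = ["m/z 760.6", "m/z 788.6", "m/z 810.6"]
toy_shap = [
    [0.10, -0.40, 0.02],
    [-0.20, 0.30, 0.00],
    [0.30, -0.50, -0.04],
]
toy_ranking = rank_by_mean_abs_shap(toy_shap, toy_features)
```

Taking absolute values before averaging matters: a feature that pushes some cells towards one group and other cells away from it would otherwise cancel out and look unimportant.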
Affiliation(s)
- Yuxuan Richard Xie
  - Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
  - Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
- Daniel C. Castro
  - Department of Molecular and Integrative Physiology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
  - Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
- Sara E. Bell
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
  - Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
- Stanislav S. Rubakhin
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
  - Neuroscience Program, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
  - Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
- Jonathan V. Sweedler
  - Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
  - Department of Molecular and Integrative Physiology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
  - Neuroscience Program, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
  - Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
|
17
|
Lin Y, Qian F, Shen L, Chen F, Chen J, Shen B. Computer-aided biomarker discovery for precision medicine: data resources, models and applications. Brief Bioinform 2020; 20:952-975. [PMID: 29194464 DOI: 10.1093/bib/bbx158] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 09/02/2017] [Revised: 10/17/2017] [Indexed: 12/21/2022] Open
Abstract
Biomarkers are a class of measurable and evaluable indicators with the potential to predict disease initiation and progression. In contrast to disease-associated factors, biomarkers hold the promise to capture the changeable signatures of biological states. With methodological advances, computer-aided biomarker discovery has now become a burgeoning paradigm in the field of biomedical science. In recent years, 'big data' have accumulated for the systematic investigation of complex biological phenomena and have promoted the flourishing of computational methods for systems-level biomarker screening. Compared with routine wet-lab experiments, bioinformatics approaches decode disease pathogenesis more efficiently under a holistic framework, which favours the identification of biomarkers ranging from single molecules to molecular networks for disease diagnosis, prognosis and therapy. In this review, the concept and characteristics of typical biomarker types, e.g. single molecular biomarkers, module/network biomarkers, cross-level biomarkers, etc., are explicated under the guidance of systems biology. Then, publicly available data resources together with some well-constructed biomarker databases and knowledge bases are introduced. Biomarker identification models using mathematical, network and machine learning theories are discussed in turn. Based on network substructural and functional evidence, a novel bioinformatics model is particularly highlighted for microRNA biomarker discovery. This article aims to give deep insights into the advantages and challenges of current computational approaches for biomarker detection, and to point the way toward precision medicine and nation-wide healthcare.
Affiliation(s)
- Yuxin Lin
  - Center for Systems Biology, Soochow University, Suzhou, Jiangsu, China
- Fuliang Qian
  - Center for Systems Biology, Soochow University, Suzhou, Jiangsu, China
- Li Shen
  - Center for Systems Biology, Soochow University, Suzhou, Jiangsu, China
- Feifei Chen
  - Center for Systems Biology, Soochow University, Suzhou, Jiangsu, China
- Jiajia Chen
  - School of Chemistry, Biology and Material Engineering, Suzhou University of Science and Technology, China
- Bairong Shen
  - Center for Systems Biology, Soochow University, Suzhou, Jiangsu, China
|
18
|
Lin E, Lin CH, Lane HY. Precision Psychiatry Applications with Pharmacogenomics: Artificial Intelligence and Machine Learning Approaches. Int J Mol Sci 2020; 21:ijms21030969. [PMID: 32024055 PMCID: PMC7037937 DOI: 10.3390/ijms21030969] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 01/06/2020] [Revised: 01/25/2020] [Accepted: 01/30/2020] [Indexed: 12/22/2022] Open
Abstract
A growing body of evidence now suggests that precision psychiatry, an interdisciplinary field of psychiatry, precision medicine, and pharmacogenomics, serves as an indispensable foundation of medical practices by offering the accurate medication with the accurate dose at the accurate time to patients with psychiatric disorders. In light of the latest advancements in artificial intelligence and machine learning techniques, numerous biomarkers and genetic loci associated with psychiatric diseases and relevant treatments are being discovered in precision psychiatry research by employing neuroimaging and multi-omics. In this review, we focus on the latest developments for precision psychiatry research using artificial intelligence and machine learning approaches, such as deep learning and neural network algorithms, together with multi-omics and neuroimaging data. Firstly, we review precision psychiatry and pharmacogenomics studies that leverage various artificial intelligence and machine learning techniques to assess treatment prediction, prognosis prediction, diagnosis prediction, and the detection of potential biomarkers. In addition, we describe potential biomarkers and genetic loci that have been discovered to be associated with psychiatric diseases and relevant treatments. Moreover, we outline the limitations in regard to the previous precision psychiatry and pharmacogenomics studies. Finally, we present a discussion of directions and challenges for future research.
Affiliation(s)
- Eugene Lin
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
  - Department of Electrical & Computer Engineering, University of Washington, Seattle, WA 98195, USA
  - Graduate Institute of Biomedical Sciences, China Medical University, Taichung 40402, Taiwan
- Chieh-Hsin Lin
  - Graduate Institute of Biomedical Sciences, China Medical University, Taichung 40402, Taiwan
  - Department of Psychiatry, Kaohsiung Chang Gung Memorial Hospital, Chang Gung University College of Medicine, Kaohsiung 83301, Taiwan
  - School of Medicine, Chang Gung University, Taoyuan 33302, Taiwan
  - Correspondence: (C.-H.L.); (H.-Y.L.)
- Hsien-Yuan Lane
  - Graduate Institute of Biomedical Sciences, China Medical University, Taichung 40402, Taiwan
  - Department of Psychiatry, China Medical University Hospital, Taichung 40402, Taiwan
  - Brain Disease Research Center, China Medical University Hospital, Taichung 40402, Taiwan
  - Department of Psychology, College of Medical and Health Sciences, Asia University, Taichung 41354, Taiwan
  - Correspondence: (C.-H.L.); (H.-Y.L.)
|
19
|
Gonzalez-Dias P, Lee EK, Sorgi S, de Lima DS, Urbanski AH, Silveira EL, Nakaya HI. Methods for predicting vaccine immunogenicity and reactogenicity. Hum Vaccin Immunother 2019; 16:269-276. [PMID: 31869262 PMCID: PMC7062420 DOI: 10.1080/21645515.2019.1697110] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/28/2022] Open
Abstract
Subjects receiving the same vaccine often show different levels of immune response, and some may even present adverse side effects to the vaccine. Systems vaccinology can combine omics data and machine learning techniques to obtain highly predictive signatures of vaccine immunogenicity and reactogenicity. Currently, several machine learning methods are already available to researchers with no background in bioinformatics. Here we describe the four main steps to discover markers of vaccine immunogenicity and reactogenicity: (1) Preparing the data; (2) Selecting the vaccinees and relevant genes; (3) Choosing the algorithm; (4) Blind testing your model. With the increasing number of Systems Vaccinology datasets being generated, we expect that the accuracy and robustness of signatures of vaccine reactogenicity and immunogenicity will significantly improve.
Affiliation(s)
- Patrícia Gonzalez-Dias
  - Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
- Eva K Lee
  - The Center for Operations Research in Medicine and HealthCare, Georgia Institute of Technology, Atlanta, GA, USA
- Sara Sorgi
  - Department of Medical Biotechnologies, University of Siena, Siena, Italy
- Diógenes S de Lima
  - Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
- Alysson H Urbanski
  - Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
- Eduardo Lv Silveira
  - Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
- Helder I Nakaya
  - Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
  - Scientific Platform Pasteur, University of São Paulo, São Paulo, Brazil
|
20
|
Maj C, Azevedo T, Giansanti V, Borisov O, Dimitri GM, Spasov S, Lió P, Merelli I. Integration of Machine Learning Methods to Dissect Genetically Imputed Transcriptomic Profiles in Alzheimer's Disease. Front Genet 2019; 10:726. [PMID: 31552082 PMCID: PMC6735530 DOI: 10.3389/fgene.2019.00726] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/18/2019] [Accepted: 07/10/2019] [Indexed: 12/12/2022] Open
Abstract
The genetic component of many common traits is associated with gene expression, and several variants act as expression quantitative trait loci (eQTLs), regulating gene expression in a tissue-specific manner. In this work, we applied tissue-specific cis-eQTL gene expression prediction models to the genotypes of 808 samples including controls, subjects with mild cognitive impairment, and patients with Alzheimer's Disease. We then dissected the imputed transcriptomic profiles by means of different unsupervised and supervised machine learning approaches to identify potential biological associations. Our analysis suggests that unsupervised and supervised methods can provide complementary information, which can be integrated for a better characterization of the underlying biological system. In particular, a variational autoencoder representation of the transcriptomic profiles, followed by a support vector machine classification, has been used for tissue-specific gene prioritizations. Interestingly, the achieved gene prioritizations can be efficiently integrated as a feature selection step for improving the accuracy of deep learning classifier networks. The identified gene-tissue information suggests a potential role for inflammatory and regulatory processes in gut-brain axis related tissues. In line with the expected low heritability that can be apportioned to eQTL variants, we were able to achieve only relatively low prediction capability with deep learning classification models. However, our analysis revealed that the classification power strongly depends on the network structure, with recurrent neural networks being the best performing network class. Interestingly, cross-tissue analysis suggests a potentially greater role of models trained in brain tissues also by considering dementia-related endophenotypes.
Overall, the present analysis suggests that the combination of supervised and unsupervised machine learning techniques can be used for the evaluation of high dimensional omics data.
Affiliation(s)
- Carlo Maj
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Bonn, Germany
- Tiago Azevedo
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- Valentina Giansanti
  - National Research Council, Institute for Biomedical Technologies, Milan, Italy
- Oleg Borisov
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Bonn, Germany
- Giovanna Maria Dimitri
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- Simeon Spasov
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- Pietro Lió
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- Ivan Merelli
  - National Research Council, Institute for Biomedical Technologies, Milan, Italy
|
21
|
An elastic-net logistic regression approach to generate classifiers and gene signatures for types of immune cells and T helper cell subsets. BMC Bioinformatics 2019; 20:433. [PMID: 31438843 PMCID: PMC6704630 DOI: 10.1186/s12859-019-2994-z] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 05/09/2019] [Accepted: 07/15/2019] [Indexed: 02/08/2023] Open
Abstract
Background: Host immune response is coordinated by a variety of different specialized cell types that vary in time and location. While host immune response can be studied using conventional low-dimensional approaches, advances in transcriptomics analysis may provide a less biased view. Yet, leveraging transcriptomics data to identify immune cell subtypes presents challenges for extracting informative gene signatures hidden within a high dimensional transcriptomics space characterized by low sample numbers with noisy and missing values. To address these challenges, we explore using machine learning methods to select gene subsets and estimate gene coefficients simultaneously.
Results: Elastic-net logistic regression, a type of machine learning, was used to construct separate classifiers for ten different types of immune cell and for five T helper cell subsets. The resulting classifiers were then used to develop gene signatures that best discriminate among immune cell types and T helper cell subsets using RNA-seq datasets. We validated the approach using single-cell RNA-seq (scRNA-seq) datasets, which gave consistent results. In addition, we classified cell types that were previously unannotated. Finally, we benchmarked the proposed gene signatures against other existing gene signatures.
Conclusions: The developed classifiers can be used as priors in predicting the extent and functional orientation of the host immune response in diseases, such as cancer, where transcriptomic profiling of bulk tissue samples and single cells is routinely employed. Such information can provide insight into the mechanistic basis of disease and therapeutic response. The source code and documentation are available through GitHub: https://github.com/KlinkeLab/ImmClass2019.
Electronic supplementary material: The online version of this article (10.1186/s12859-019-2994-z) contains supplementary material, which is available to authorized users.
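To illustrate how an elastic-net penalty can select features and estimate coefficients in one fit, here is a minimal sketch using proximal gradient descent on a logistic loss. It is not the authors' implementation (their code is at the GitHub link above); the function names, hyperparameters and the two-feature toy data are assumptions made purely for illustration.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink towards zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net_logistic(X, y, lam=0.1, l1_ratio=0.5, lr=0.1, n_iter=500):
    """Proximal gradient descent for elastic-net penalised logistic regression.

    Penalty: lam * (l1_ratio * |w|_1 + 0.5 * (1 - l1_ratio) * |w|_2^2).
    The intercept b is not penalised.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iter):
        # Residuals p_i - y_i under the current model.
        resid = []
        for xi, yi in zip(X, y):
            z = b + sum(wj * xij for wj, xij in zip(w, xi))
            resid.append(1.0 / (1.0 + math.exp(-z)) - yi)
        b -= lr * sum(resid) / n
        for j in range(d):
            grad = sum(r * xi[j] for r, xi in zip(resid, X)) / n
            grad += lam * (1.0 - l1_ratio) * w[j]  # smooth (ridge) part
            # Gradient step on the smooth part, then L1 shrinkage.
            w[j] = soft_threshold(w[j] - lr * grad, lr * lam * l1_ratio)
    return w, b


# Toy data (hypothetical): column 0 separates the classes, column 1 is weak noise.
toy_X = [[1.0, 0.10], [2.0, -0.05], [-1.0, 0.0], [-2.0, 0.0]]
toy_y = [1, 1, 0, 0]
toy_w, toy_b = fit_elastic_net_logistic(toy_X, toy_y)
```

On this toy data the weight of the informative column ends up positive, while the weight of the noise column is held at exactly zero by the L1 shrinkage step, which is the sense in which the penalty selects features. In practice an established library implementation would normally be preferred over a hand-written loop.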
|
22
|
Competing Endogenous RNA and Coexpression Network Analysis for Identification of Potential Biomarkers and Therapeutics in association with Metastasis Risk and Progression of Prostate Cancer. Oxidative Medicine and Cellular Longevity 2019; 2019:8265958. [PMID: 31467637 PMCID: PMC6701351 DOI: 10.1155/2019/8265958] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/27/2019] [Revised: 04/11/2019] [Accepted: 06/19/2019] [Indexed: 02/06/2023]
Abstract
Prostate cancer (PCa) is the most frequently diagnosed malignant neoplasm in men. Despite the high incidence, the underlying pathogenic mechanisms of PCa are still largely unknown, which limits the therapeutic options and leads to poor prognosis. Herein, based on the expression profiles from The Cancer Genome Atlas (TCGA) database, we investigated the interactions between long noncoding RNA (lncRNA) and mRNA by constructing a competing endogenous RNA network. Several competing endogenous RNAs could participate in the tumorigenesis of PCa. Six lncRNA signatures were identified as potential candidates associated with stage progression by the Kolmogorov-Smirnov test. In addition, 32 signatures from the coexpression network had potential diagnostic value for PCa lymphatic metastasis using machine learning algorithms. By targeting the coexpression network, the antifungal compound econazole was screened out for PCa treatment. Econazole could induce growth restraint, arrest the cell cycle, lead to apoptosis, inhibit migration, invasion, and adhesion in PC3 and DU145 cell lines, and inhibit the growth of prostate xenografts in nude mice. This systematic characterization of lncRNAs, microRNAs, and mRNAs in the risk of metastasis and progression of PCa will aid in the identification of candidate prognostic biomarkers and potential therapeutic drugs.
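The Kolmogorov-Smirnov test mentioned in this abstract compares two samples through the largest gap between their empirical cumulative distribution functions. The sketch below computes only that statistic (no p-value); `ks_statistic` is a hypothetical helper, and the expression values and their split into early and late stage groups are invented, since the abstract does not specify how the test was applied.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the ECDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The ECDFs only change at observed values, so checking those is enough.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical expression values of one lncRNA in two stage groups.
early_stage = [1.2, 1.5, 1.7, 2.0]
late_stage = [2.4, 2.9, 3.1, 3.3]
toy_d = ks_statistic(early_stage, late_stage)
```

The statistic ranges from 0 (identical empirical distributions) to 1 (no overlap, as in the toy groups here); turning it into a significance call requires the sample sizes and the KS distribution, which a statistics library would normally provide.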
|
23
|
Way GP, Greene CS. Discovering Pathway and Cell Type Signatures in Transcriptomic Compendia with Machine Learning. Annu Rev Biomed Data Sci 2019. [DOI: 10.1146/annurev-biodatasci-072018-021348] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/09/2022]
Abstract
Pathway and cell type signatures are patterns present in transcriptome data that are associated with biological processes or phenotypic consequences. These signatures result from specific cell type and pathway expression but can require large transcriptomic compendia to detect. Machine learning techniques can be powerful tools for signature discovery through their ability to provide accurate and interpretable results. In this review, we discuss various machine learning applications to extract pathway and cell type signatures from transcriptomic compendia. We focus on the biological motivations and interpretation for both supervised and unsupervised learning approaches in this setting. We consider recent advances, including deep learning, and their applications to expanding bulk and single-cell RNA data. As data and computational resources increase, there will be more opportunities for machine learning to aid in revealing biological signatures.
Affiliation(s)
- Gregory P. Way
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Casey S. Greene
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
24
Machine Learning in Neural Networks. Advances in Experimental Medicine and Biology 2019; 1192:127-137. [DOI: 10.1007/978-981-32-9721-0_7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/25/2023]
25
Hon CC, Shin JW, Carninci P, Stubbington MJT. The Human Cell Atlas: Technical approaches and challenges. Brief Funct Genomics 2018; 17:283-294. [PMID: 29092000 PMCID: PMC6063304 DOI: 10.1093/bfgp/elx029] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 12/23/2022]
Abstract
The Human Cell Atlas is a large, international consortium that aims to identify and describe every cell type in the human body. The comprehensive cellular maps that arise from this ambitious effort have the potential to transform many aspects of fundamental biology and clinical practice. Here, we discuss the technical approaches that could be used today to generate such a resource and also the technical challenges that will be encountered.
Affiliation(s)
- Chung-Chau Hon
  - RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan
- Jay W Shin
  - RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan
- Piero Carninci
  - RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan
26
Huang S, Cai N, Pacheco PP, Narrandes S, Wang Y, Xu W. Applications of Support Vector Machine (SVM) Learning in Cancer Genomics. Cancer Genomics Proteomics 2018; 15:41-51. [PMID: 29275361 PMCID: PMC5822181 DOI: 10.21873/cgp.20063] [Citation(s) in RCA: 320] [Impact Index Per Article: 53.3] [Received: 09/07/2017] [Revised: 10/03/2017] [Accepted: 10/23/2017] [Indexed: 12/23/2022]
Abstract
Machine learning with maximization (support) of separating margin (vector), called support vector machine (SVM) learning, is a powerful classification tool that has been used for cancer genomic classification or subtyping. Today, as advancements in high-throughput technologies lead to the production of large amounts of genomic and epigenomic data, the classification feature of SVMs is expanding its use in cancer genomics, leading to the discovery of new biomarkers, new drug targets, and a better understanding of cancer driver genes. Herein, we review the recent progress of SVMs in cancer genomic studies. We aim to assess the strengths of SVM learning and its future prospects in cancer genomic applications.
Affiliation(s)
- Shujun Huang
  - College of Pharmacy, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Canada
  - Research Institute of Oncology and Hematology, CancerCare Manitoba, Winnipeg, Canada
- Nianguang Cai
  - Research Institute of Oncology and Hematology, CancerCare Manitoba, Winnipeg, Canada
- Pedro Penzuti Pacheco
  - Research Institute of Oncology and Hematology, CancerCare Manitoba, Winnipeg, Canada
- Shavira Narrandes
  - Research Institute of Oncology and Hematology, CancerCare Manitoba, Winnipeg, Canada
  - Departments of Biochemistry and Medical Genetics, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Canada
- Yang Wang
  - Department of Computer Science, Faculty of Sciences, University of Manitoba, Winnipeg, Canada
- Wayne Xu
  - Research Institute of Oncology and Hematology, CancerCare Manitoba, Winnipeg, Canada
  - Departments of Biochemistry and Medical Genetics, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Canada
  - College of Pharmacy, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Canada
27
Schönbach C, Verma C, Wee LJK, Bond PJ, Ranganathan S. 2016 update on APBioNet's annual international conference on bioinformatics (InCoB). BMC Genomics 2016; 17:1036. [PMID: 28155656 PMCID: PMC5259860 DOI: 10.1186/s12864-016-3362-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
Since its inception in 2002, InCoB has become one of the largest annual bioinformatics conferences in the Asia-Pacific region, with attendance ranging between 150 and 250 delegates depending on the venue location. InCoB 2016 in Singapore was attended by almost 220 delegates. This year, sessions on structural bioinformatics, sequence and sequencing, and next-generation sequencing fielded the highest number of oral presentations. Forty-four out of 96 oral presentations were associated with an accepted manuscript in supplemental issues of BMC Bioinformatics, BMC Genomics, BMC Medical Genomics or BMC Systems Biology. Articles with a genomics focus are reviewed in this editorial. Next year's InCoB will be held in Shenzhen, China from September 20 to 22, 2017.
Affiliation(s)
- Christian Schönbach
  - International Research Center for Medical Sciences, Graduate School of Medical Sciences, Kumamoto University, Kumamoto, 860-0811 Japan
- Chandra Verma
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore, 138671 Singapore
- Lawrence Jin Kiat Wee
  - Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore, 138632 Singapore
- Peter John Bond
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore, 138671 Singapore
- Shoba Ranganathan
  - Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109 Australia