1
Jiang S, Wang T, Zhang KH. Data-driven decision-making for precision diagnosis of digestive diseases. Biomed Eng Online 2023; 22:87. [PMID: 37658345; PMCID: PMC10472739; DOI: 10.1186/s12938-023-01148-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2023] [Accepted: 08/15/2023] [Indexed: 09/03/2023]
Abstract
Modern omics technologies can generate massive amounts of biomedical data, providing unprecedented opportunities for individualized precision medicine. However, traditional statistical methods cannot effectively process and utilize such big data. To meet this challenge, machine learning algorithms have been developed and applied rapidly in recent years; they are capable of reducing dimensionality, extracting features, organizing data, and forming automatable data-driven clinical decision systems. Data-driven clinical decision-making has promising applications in precision medicine and has been studied in digestive diseases, including early diagnosis and screening, molecular typing, staging and stratification of digestive malignancies, as well as precise diagnosis of Crohn's disease, auxiliary diagnosis of imaging and endoscopy, differential diagnosis of cystic lesions, etiology discrimination of acute abdominal pain, stratification of upper gastrointestinal bleeding (UGIB), and real-time diagnosis of esophageal motility function. Herein, after a brief introduction to methods for data-driven decision-making, we reviewed the recent progress of data-driven clinical decision-making in the precision diagnosis of digestive diseases and discussed its limitations.
Affiliation(s)
- Song Jiang
- Department of Gastroenterology, The First Affiliated Hospital of Nanchang University, No. 17, Yongwai Zheng Street, Nanchang, 330006 China
- Jiangxi Institute of Gastroenterology and Hepatology, Nanchang, 330006 China
- Ting Wang
- Department of Gastroenterology, The First Affiliated Hospital of Nanchang University, No. 17, Yongwai Zheng Street, Nanchang, 330006 China
- Jiangxi Institute of Gastroenterology and Hepatology, Nanchang, 330006 China
- Kun-He Zhang
- Department of Gastroenterology, The First Affiliated Hospital of Nanchang University, No. 17, Yongwai Zheng Street, Nanchang, 330006 China
- Jiangxi Institute of Gastroenterology and Hepatology, Nanchang, 330006 China
2
Tran QT, Alom MZ, Orr BA. Comprehensive study of semi-supervised learning for DNA methylation-based supervised classification of central nervous system tumors. BMC Bioinformatics 2022; 23:223. [PMID: 35676649; PMCID: PMC9178802; DOI: 10.1186/s12859-022-04764-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/21/2022] [Accepted: 05/31/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Precision medicine for cancer treatment relies on an accurate pathological diagnosis. The number of known tumor classes has increased rapidly, and reliance on traditional methods of histopathologic classification alone has become unfeasible. To help reduce variability and validation costs and to standardize the histopathological diagnostic process, supervised machine learning models using DNA-methylation data have been developed for tumor classification. These methods require large labeled training data sets to obtain clinically acceptable classification accuracy. While there is abundant unlabeled epigenetic data across multiple databases, labeling pathology data for machine learning models is time-consuming and resource-intensive, especially for rare tumor types. Semi-supervised learning (SSL) approaches have been used to maximize the utility of labeled and unlabeled data for classification tasks and are effectively applied in genomics. SSL methods have not yet been explored with epigenetic data, nor have they been shown to benefit central nervous system (CNS) tumor classification. RESULTS: This paper explores the application of semi-supervised machine learning on methylation data to improve the accuracy of supervised learning models in classifying CNS tumors. We comprehensively evaluated 11 SSL methods and developed a novel combination approach that included a self-training with editing using support vector machine (SETRED-SVM) model and an L2-penalized, multinomial logistic regression model to obtain high confidence labels from a few labeled instances. Results across eight random forest and neural net models show that the pseudo-labels derived from our SSL method can significantly increase prediction accuracy for 82 CNS tumors and 9 normal controls. CONCLUSIONS: The proposed combination of semi-supervised technique and multinomial logistic regression holds the potential to leverage the abundant publicly available unlabeled methylation data effectively. Such an approach is highly beneficial in providing additional training examples, especially for scarce tumor types, to boost the prediction accuracy of supervised models.
Affiliation(s)
- Quynh T Tran
- Department of Pathology, St. Jude Children's Research Hospital, 262 Danny Thomas Place, MS 250, Memphis, TN, 38105-3678, USA
- Md Zahangir Alom
- Department of Pathology, St. Jude Children's Research Hospital, 262 Danny Thomas Place, MS 250, Memphis, TN, 38105-3678, USA
- Brent A Orr
- Department of Pathology, St. Jude Children's Research Hospital, 262 Danny Thomas Place, MS 250, Memphis, TN, 38105-3678, USA
3
Lam C, Tso CF, Green-Saxena A, Pellegrini E, Iqbal Z, Evans D, Hoffman J, Calvert J, Mao Q, Das R. Semi-supervised deep learning from time series clinical data for acute respiratory distress syndrome prediction: model development and validation study. JMIR Form Res 2021; 5:e28028. [PMID: 34398784; PMCID: PMC8447921; DOI: 10.2196/28028] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/17/2021] [Revised: 06/18/2021] [Accepted: 08/01/2021] [Indexed: 11/23/2022]
Abstract
Background: A high number of patients who are hospitalized with COVID-19 develop acute respiratory distress syndrome (ARDS). Objective: In response to the need for clinical decision support tools to help manage the next pandemic during the early stages (ie, when limited labeled data are present), we developed machine learning algorithms that use semisupervised learning (SSL) techniques to predict ARDS development in general and COVID-19 populations based on limited labeled data. Methods: SSL techniques were applied to 29,127 encounters with patients who were admitted to 7 US hospitals from May 1, 2019, to May 1, 2021. A recurrent neural network that used a time series of electronic health record data was applied to data that were collected when a patient’s peripheral oxygen saturation level fell below the normal range (<97%) to predict the subsequent development of ARDS during the remaining duration of patients’ hospital stay. Model performance was assessed with the area under the receiver operating characteristic curve and area under the precision recall curve of an external hold-out test set. Results: For the whole data set, the median time between the first peripheral oxygen saturation measurement of <97% and subsequent respiratory failure was 21 hours. The area under the receiver operating characteristic curve for predicting subsequent ARDS development was 0.73 when the model was trained on a labeled data set of 6930 patients, 0.78 when the model was trained on the labeled data set that had been augmented with the unlabeled data set of 16,173 patients by using SSL techniques, and 0.84 when the model was trained on the entire training set of 23,103 labeled patients. Conclusions: In the context of using time-series inpatient data and a careful model training design, unlabeled data can be used to improve the performance of machine learning models when labeled data for predicting ARDS development are scarce or expensive.
Affiliation(s)
- Carson Lam
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, US
- Chak Foon Tso
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, US
- Zohora Iqbal
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, US
- Daniel Evans
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, US
- Jana Hoffman
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, US
- Jacob Calvert
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, US
- Qingqing Mao
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, US
- Ritankar Das
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, US
4
Cheng T, Shuang W, Ye D, Zhang W, Yang Z, Fang W, Xu H, Gu M, Xu W, Guan C. SNHG16 promotes cell proliferation and inhibits cell apoptosis via regulation of the miR-1303-p/STARD9 axis in clear cell renal cell carcinoma. Cell Signal 2021; 84:110013. [PMID: 33901578; DOI: 10.1016/j.cellsig.2021.110013] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/19/2021] [Revised: 04/20/2021] [Accepted: 04/20/2021] [Indexed: 02/06/2023]
Abstract
Clear cell renal cell carcinoma (ccRCC) is a common subtype of renal cell carcinoma (RCC) and causes many deaths. Numerous studies have suggested that long noncoding RNAs (lncRNAs) exert biological functions in ccRCC. Herein, the functions of lncRNA SNHG16 in ccRCC cells and the mechanism mediated by SNHG16 were investigated. The expression levels of SNHG16 and its downstream genes in ccRCC cells and RCC tissues were examined using reverse transcription quantitative polymerase chain reaction analyses. Cell counting kit-8 and 5-ethynyl-2'-deoxyuridine assays were performed to evaluate the proliferation of ccRCC cells, and flow cytometry analyses were employed to determine the apoptosis of ccRCC cells. Western blot analysis was applied to examine the levels of proteins associated with cell proliferation and apoptosis. The binding between SNHG16 and the miRNA, and between the miRNA and its target gene, was explored by luciferase reporter, RNA pull-down, and RNA immunoprecipitation assays. Significant upregulation of SNHG16 was observed in RCC tissues and ccRCC cells. SNHG16 downregulation inhibited the proliferation and promoted the apoptosis of ccRCC cells. In addition, SNHG16 served as a competing endogenous RNA for miR-1301-3p, and STARD9 was a target gene of miR-1301-3p in ccRCC cells. SNHG16 upregulated STARD9 expression by binding with miR-1301-3p in ccRCC cells. Rescue assays validated that SNHG16 promoted ccRCC cell proliferation and suppressed ccRCC cell apoptosis by upregulating STARD9 expression. In conclusion, SNHG16 promotes ccRCC cell proliferation and suppresses ccRCC cell apoptosis via interaction with miR-1301-3p to upregulate STARD9 expression in ccRCC cells.
Affiliation(s)
- Tao Cheng
- Department of Urology, The Second Affiliated Hospital of Bengbu Medical College, Bengbu 233000, Anhui, China
- Weibing Shuang
- Department of Urology, The First Hospital of Shanxi Medical University, Taiyuan 030001, Shanxi, China
- Dawen Ye
- Department of Urology, The Second Affiliated Hospital of Bengbu Medical College, Bengbu 233000, Anhui, China
- Wenzhi Zhang
- Innoscience Research Sdn Bhd, Subang Jaya, Malaysia
- Zhao Yang
- Core Facility for Protein Research, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Wenge Fang
- Department of Urology, The Second Affiliated Hospital of Bengbu Medical College, Bengbu 233000, Anhui, China
- Haibin Xu
- Department of Urology, The Second Affiliated Hospital of Bengbu Medical College, Bengbu 233000, Anhui, China
- Mingli Gu
- Department of Urology, The Second Affiliated Hospital of Bengbu Medical College, Bengbu 233000, Anhui, China
- Weiqiang Xu
- Department of Urology, The Second Affiliated Hospital of Bengbu Medical College, Bengbu 233000, Anhui, China
- Chao Guan
- Department of Urology, The Second Affiliated Hospital of Bengbu Medical College, Bengbu 233000, Anhui, China
5
Mrozek D. A review of Cloud computing technologies for comprehensive microRNA analyses. Comput Biol Chem 2020; 88:107365. [PMID: 32906056; DOI: 10.1016/j.compbiolchem.2020.107365] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/09/2020] [Revised: 08/05/2020] [Accepted: 08/18/2020] [Indexed: 01/08/2023]
Abstract
Cloud computing has revolutionized many fields that require ample computational power. Cloud platforms may also provide substantial support for microRNA analysis, mainly by offering scalable resources of different types. In Clouds, these resources are available as services, which simplifies their allocation and release. This feature is especially useful when analyzing large volumes of data, such as those produced by next-generation sequencing experiments, which require not only extended storage space but also a distributed computing environment. In this paper, we show which Cloud properties and service models can be especially beneficial for microRNA analysis. We also explain the most useful Cloud services (including storage space, computational power, web application hosting, machine learning models, and Big Data frameworks) that can be used for microRNA analysis. At the same time, we review several solutions for microRNA analysis and show that utilization of the Cloud in this field is still weak but may increase as awareness of its applicability grows.
Affiliation(s)
- Dariusz Mrozek
- Department of Applied Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland