1. Sun J, Chen Y, Bi R, Yuan Y, Yu H. Bioinformatic approaches of liquid-liquid phase separation in human disease. Chin Med J (Engl) 2024; 137:1912-1925. [PMID: 39033393 PMCID: PMC11332758 DOI: 10.1097/cm9.0000000000003249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2024] [Indexed: 07/23/2024]
Abstract
Biomolecular aggregation within cellular environments via liquid-liquid phase separation (LLPS) spontaneously forms droplet-like structures, which play pivotal roles in diverse biological processes. These structures are closely associated with a range of diseases, including neurodegenerative disorders, cancer, and infectious diseases, which makes understanding LLPS mechanisms important for elucidating disease pathogenesis and exploring potential therapeutic interventions. In this review, we delineate recent advances in LLPS research, emphasizing its pathological relevance, therapeutic considerations, and the pivotal role of bioinformatic tools and databases in facilitating LLPS investigations. We also analyze the bioinformatic resources dedicated to LLPS research to clarify their functionality and applicability. By surveying current LLPS-related bioinformatics resources, this review highlights the implications of LLPS for human health and disease.
Affiliation(s)
- Jun Sun
  - Department of Thoracic Surgery and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
  - Med-X Center for Informatics, Sichuan University, Chengdu, Sichuan 610041, China
- Yilong Chen
  - Department of Thoracic Surgery and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
  - Med-X Center for Informatics, Sichuan University, Chengdu, Sichuan 610041, China
- Ruiye Bi
  - Department of Orthognathic and TMJ Surgery, State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, West China Hospital of Stomatology, Sichuan University, Chengdu, Sichuan 610041, China
- Yong Yuan
  - Department of Thoracic Surgery and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
  - Med-X Center for Informatics, Sichuan University, Chengdu, Sichuan 610041, China
- Haopeng Yu
  - Department of Thoracic Surgery and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
  - Med-X Center for Informatics, Sichuan University, Chengdu, Sichuan 610041, China
2. Yin Q, Chen W, Zhang C, Wei Z. A convolutional neural network model for survival prediction based on prognosis-related cascaded Wx feature selection. Lab Invest 2022; 102:1064-1074. [PMID: 35810236 DOI: 10.1038/s41374-022-00801-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 02/17/2022] [Revised: 04/22/2022] [Accepted: 04/26/2022] [Indexed: 12/14/2022]
Abstract
Great advances in deep learning have provided effective solutions for prediction tasks in the biomedical field. However, accurate prognosis prediction from cancer genomics data remains challenging because of the severe overfitting caused by the curse of dimensionality inherent to high-throughput sequencing data. Survival analysis poses a further challenge: it is difficult to make use of censored samples, whose events of interest are not observed. Convolutional neural network (CNN) models offer the opportunity to extract meaningful hierarchical features that characterize cancer subtypes and prognosis outcomes. Feature selection, in turn, can mitigate overfitting and reduce the computational burden of subsequent model training by separating significant genes from redundant ones. To simplify the model, we developed a concise and efficient survival analysis model, named the CNN-Cox model, which combines a special CNN framework with prognosis-related cascaded Wx feature selection and requires less computation because it has few trainable parameters. Experimental results show that the CNN-Cox model achieved consistently higher C-index values and better survival prediction performance than existing state-of-the-art survival analysis methods across seven cancer type datasets in The Cancer Genome Atlas cohort: bladder carcinoma, head and neck squamous cell carcinoma, kidney renal cell carcinoma, brain low-grade glioma, lung adenocarcinoma (LUAD), lung squamous cell carcinoma, and skin cutaneous melanoma. As an illustration of model interpretation, we examined potential prognostic gene signatures of the LUAD dataset using the proposed CNN-Cox model. We conducted protein-protein interaction network analysis to identify potential prognostic genes and further analyzed the biological function of 13 hub genes (ANLN, RACGAP1, KIF4A, KIF20A, KIF14, ASPM, CDK1, SPC25, NCAPG, MKI67, HJURP, EXO1, and HMMR), whose high expression is significantly associated with poor survival of LUAD patients. These findings confirm that the CNN-Cox model is effective in extracting not only prognostic factors but also biologically meaningful gene features. The code is available on GitHub: https://github.com/wangwangCCChen/CNN-Cox .
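The abstract above reports performance as C-index values on data that include censored samples. As a rough illustration of what that metric measures, the following is a minimal sketch of Harrell's concordance index in plain Python. It is not taken from the CNN-Cox repository: the function name, the argument names, and the simplification of skipping pairs with tied observed times are all assumptions made for this example.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times  -- observed follow-up time per sample
    events -- 1 if the event was observed, 0 if the sample is censored
    risks  -- predicted risk score; a higher score means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the sample with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Pairs with tied times are skipped here (a simplifying assumption).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # model ranks the earlier event as higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, and 0.5 corresponds to random ranking. Censored samples still contribute whenever they outlive a sample with an observed event, which is how the metric uses partially observed data.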
Affiliation(s)
- Qingyan Yin
  - School of Science, Xi'an University of Architecture and Technology, Xi'an, Shaanxi, 710055, China
- Wangwang Chen
  - School of Science, Xi'an University of Architecture and Technology, Xi'an, Shaanxi, 710055, China
- Chunxia Zhang
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
- Zhi Wei
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, 07102, USA
3. Rachid Zaim S, Kenost C, Berghout J, Chiu W, Wilson L, Zhang HH, Lussier YA. binomialRF: interpretable combinatoric efficiency of random forests to identify biomarker interactions. BMC Bioinformatics 2020; 21:374. [PMID: 32859146 PMCID: PMC7456085 DOI: 10.1186/s12859-020-03718-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/26/2020] [Accepted: 08/19/2020] [Indexed: 11/17/2022]
Abstract
BACKGROUND: In this era of data science-driven bioinformatics, machine learning research has focused on feature selection because users want more interpretation and post-hoc analyses for biomarker detection. However, when a study has more features (i.e., transcripts) than samples (i.e., mice or human samples), biomarker detection faces major statistical challenges, since traditional statistical techniques are underpowered in high dimensions. Second- and third-order interactions of these features pose a substantial combinatoric dimensional challenge. In computational biology, random forest (RF) classifiers are widely used because of their flexibility, strong performance, ability to rank features, and robustness to the "P >> N" high-dimensional limitation that many matrix regression algorithms face. We propose binomialRF, a feature selection technique in RFs that provides an alternative interpretation for features using a correlated binomial distribution and scales efficiently to analyze multiway interactions. RESULTS: In both simulations and validation studies using datasets from the TCGA and UCI repositories, binomialRF showed computational gains (5 to 300 times faster) while maintaining competitive variable precision and recall in identifying biomarkers' main effects and interactions. In two clinical studies, the binomialRF algorithm prioritized previously published, relevant pathological molecular mechanisms (features) with high classification precision and recall, both when using features alone and when using their statistical interactions alone. CONCLUSION: binomialRF extends previous methods for identifying interpretable features in RFs and brings them together under a correlated binomial distribution to create an efficient hypothesis testing algorithm that identifies biomarkers' main effects and interactions. Preliminary results in simulations demonstrate computational gains while retaining competitive model selection and classification accuracies. Future work will extend this framework to incorporate ontologies that provide pathway-level feature selection from gene expression input data.
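The abstract above frames feature selection in a random forest as a hypothesis test under a binomial distribution. The following is a deliberately simplified sketch of that idea, not the binomialRF implementation. It assumes, for illustration only, that the test statistic is the number of trees in which a feature is chosen for the root split, that the null probability is 1/P for P features, that trees are independent (an ordinary binomial rather than the correlated binomial the paper describes), and that a Bonferroni correction is applied. The function names and the example counts are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def select_features(root_counts, n_trees, alpha=0.05):
    """Return features whose root-split count exceeds what chance predicts.

    root_counts -- dict mapping feature name to the number of trees in which
                   that feature was used for the root split (assumed input)
    n_trees     -- total number of trees in the forest
    alpha       -- family-wise error level before Bonferroni correction
    """
    n_features = len(root_counts)
    p0 = 1.0 / n_features            # assumed null: every feature equally likely
    threshold = alpha / n_features   # Bonferroni correction (an assumption)
    selected = {}
    for name, count in root_counts.items():
        p_value = binomial_upper_tail(count, n_trees, p0)
        if p_value < threshold:
            selected[name] = p_value
    return selected


# Hypothetical counts from a 100-tree forest over four features.
example_counts = {"g1": 60, "g2": 25, "g3": 10, "g4": 5}
significant = select_features(example_counts, n_trees=100)
```

In this toy input only `g1` is selected, since 60 root splits out of 100 is far above the 25 expected by chance. Ignoring the dependence between trees makes the independent-binomial p-values optimistic, which is the issue the correlated binomial model in the paper is meant to address.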
Affiliation(s)
- Samir Rachid Zaim
  - Center for Biomedical Informatics and Biostatistics, University of Arizona Health Sciences, 1230 N. Cherry Ave, Tucson, AZ, 85721, USA
  - The Graduate Interdisciplinary Program in Statistics, The University of Arizona, 617 N. Santa Rita Ave., Tucson, AZ, 85721, USA
  - College of Medicine, Tucson, 1501 N. Campbell Ave, Tucson, AZ, 85721, USA
- Colleen Kenost
  - Center for Biomedical Informatics and Biostatistics, University of Arizona Health Sciences, 1230 N. Cherry Ave, Tucson, AZ, 85721, USA
  - College of Medicine, Tucson, 1501 N. Campbell Ave, Tucson, AZ, 85721, USA
- Joanne Berghout
  - Center for Biomedical Informatics and Biostatistics, University of Arizona Health Sciences, 1230 N. Cherry Ave, Tucson, AZ, 85721, USA
  - College of Medicine, Tucson, 1501 N. Campbell Ave, Tucson, AZ, 85721, USA
- Wesley Chiu
  - Center for Biomedical Informatics and Biostatistics, University of Arizona Health Sciences, 1230 N. Cherry Ave, Tucson, AZ, 85721, USA
  - College of Medicine, Tucson, 1501 N. Campbell Ave, Tucson, AZ, 85721, USA
- Liam Wilson
  - Center for Biomedical Informatics and Biostatistics, University of Arizona Health Sciences, 1230 N. Cherry Ave, Tucson, AZ, 85721, USA
  - College of Medicine, Tucson, 1501 N. Campbell Ave, Tucson, AZ, 85721, USA
- Hao Helen Zhang
  - Center for Biomedical Informatics and Biostatistics, University of Arizona Health Sciences, 1230 N. Cherry Ave, Tucson, AZ, 85721, USA
  - The Graduate Interdisciplinary Program in Statistics, The University of Arizona, 617 N. Santa Rita Ave., Tucson, AZ, 85721, USA
  - Department of Mathematics, College of Sciences, The University of Arizona, 617 N. Santa Rita Ave., Tucson, AZ, 85721, USA
- Yves A Lussier
  - Center for Biomedical Informatics and Biostatistics, University of Arizona Health Sciences, 1230 N. Cherry Ave, Tucson, AZ, 85721, USA
  - The Graduate Interdisciplinary Program in Statistics, The University of Arizona, 617 N. Santa Rita Ave., Tucson, AZ, 85721, USA
  - College of Medicine, Tucson, 1501 N. Campbell Ave, Tucson, AZ, 85721, USA
  - The Center for Applied Genetic and Genomic Medicine, 1295 N. Martin, Tucson, AZ, 85721, USA
  - The University of Arizona Cancer Center, 3838 N. Campbell Ave, Tucson, AZ, 85721, USA
  - The University of Arizona BIO5 Institute, 1657 E. Helen Street, Tucson, AZ, 85721, USA