1. Rinaldi S, Moroni E, Rozza R, Magistrato A. Frontiers and Challenges of Computing ncRNAs Biogenesis, Function and Modulation. J Chem Theory Comput 2024; 20:993-1018. [PMID: 38287883 DOI: 10.1021/acs.jctc.3c01239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2024]
Abstract
Non-coding RNAs (ncRNAs) are transcribed from non-protein-coding DNA sequences, which constitute 98-99% of the human genome. Non-coding RNAs encompass diverse functional classes, including microRNAs, small interfering RNAs, PIWI-interacting RNAs, small nuclear RNAs, small nucleolar RNAs, and long non-coding RNAs. With critical involvement in gene expression and regulation across various biological and physiopathological contexts, such as neuronal disorders, immune responses, cardiovascular diseases, and cancer, non-coding RNAs are emerging as disease biomarkers and therapeutic targets. In this review, after providing an overview of the role of non-coding RNAs in cell homeostasis, we illustrate the potential and the challenges of state-of-the-art computational methods used to study non-coding RNA biogenesis, function, and modulation. Modulation can be achieved by directly targeting ncRNAs with small molecules or by altering their expression through the cellular machinery underlying their biosynthesis. Drawing on applications, some taken from our own work, we showcase the significance and role of computer simulations in uncovering fundamental facets of ncRNA mechanisms and modulation. This information may set the basis for advancing gene modulation tools and therapeutic strategies to address unmet medical needs.
Affiliation(s)
- Silvia Rinaldi
  - National Research Council of Italy (CNR) - Institute of Chemistry of OrganoMetallic Compounds (ICCOM), c/o Area di Ricerca CNR di Firenze Via Madonna del Piano 10, 50019 Sesto Fiorentino, Florence, Italy
- Elisabetta Moroni
  - National Research Council of Italy (CNR) - Institute of Chemical Sciences and Technologies (SCITEC), via Mario Bianco 9, 20131 Milano, Italy
- Riccardo Rozza
  - National Research Council of Italy (CNR) - Institute of Material Foundry (IOM) c/o International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
- Alessandra Magistrato
  - National Research Council of Italy (CNR) - Institute of Material Foundry (IOM) c/o International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
2. Liang S, Liu S, Song J, Lin Q, Zhao S, Li S, Li J, Liang S, Wang J. HMCDA: a novel method based on the heterogeneous graph neural network and metapath for circRNA-disease associations prediction. BMC Bioinformatics 2023; 24:335. [PMID: 37697297 PMCID: PMC10494331 DOI: 10.1186/s12859-023-05441-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2023] [Accepted: 08/08/2023] [Indexed: 09/13/2023]
Abstract
Circular RNA (circRNA) is a type of non-coding RNA in which both ends are covalently linked. Researchers have demonstrated that many circRNAs can act as biomarkers of diseases. However, traditional experimental methods for identifying circRNA-disease associations are labor-intensive. In this work, we propose a novel method, termed HMCDA, based on a heterogeneous graph neural network and metapaths for circRNA-disease association prediction. First, a heterogeneous graph consisting of circRNA-disease, circRNA-miRNA, miRNA-disease and disease-disease associations is constructed. Then, six metapaths are defined and generated according to the biomedical pathways. Afterwards, entity content transformation and intra-metapath and inter-metapath aggregation are applied to learn the embeddings of circRNA and disease entities. Finally, the learned embeddings are used to predict novel circRNA-disease associations. Extensive experiments demonstrate that HMCDA outperforms four state-of-the-art models in fivefold cross-validation. In addition, our case study indicates that HMCDA can identify novel circRNA-disease associations.
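To make the notion of a metapath concrete, the following is a minimal sketch that enumerates instances of one metapath (circRNA, miRNA, disease) in a toy heterogeneous graph. The node names, the edges, and the function `metapath_instances` are hypothetical and chosen for illustration only; this is not the HMCDA implementation, which additionally learns embeddings by aggregating over such instances.

```python
from collections import defaultdict

# Toy heterogeneous graph: edges are grouped by (source type, target type).
# All node names and associations below are made up for illustration.
edges = {
    ("circRNA", "miRNA"): [("circ1", "mir1"), ("circ1", "mir2"), ("circ2", "mir2")],
    ("miRNA", "disease"): [("mir1", "d1"), ("mir2", "d1"), ("mir2", "d2")],
}


def build_adjacency(pairs):
    """Turn a list of (source, target) pairs into an adjacency mapping."""
    adj = defaultdict(list)
    for src, dst in pairs:
        adj[src].append(dst)
    return adj


def metapath_instances(start_nodes, metapath, edges):
    """Enumerate every path whose node types follow `metapath` in order."""
    paths = [[node] for node in start_nodes]
    for src_type, dst_type in zip(metapath, metapath[1:]):
        adj = build_adjacency(edges[(src_type, dst_type)])
        # Extend each partial path by one hop along the current edge type.
        paths = [p + [nxt] for p in paths for nxt in adj.get(p[-1], [])]
    return paths


instances = metapath_instances(
    ["circ1", "circ2"], ["circRNA", "miRNA", "disease"], edges
)
for path in instances:
    print(" -> ".join(path))
```

Each printed path links a circRNA to a disease through a shared miRNA; a metapath-based model would aggregate information along such instances rather than simply list them.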
Affiliation(s)
- Shiyang Liang
  - Department of Gastroenterology, Tangdu Hospital, Air Force Medical University, Xinsi Road, Xi'an, China
  - Department of Internal Medicine, The No. 944 Hospital of Joint Logistic Support Force of PLA, Xiongguan Road, Jiuquan, China
- Siwei Liu
  - Department of Machine Learning, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
- Junliang Song
  - Department of Gastroenterology, Tangdu Hospital, Air Force Medical University, Xinsi Road, Xi'an, China
- Qiang Lin
  - Department of Gastroenterology, Tangdu Hospital, Air Force Medical University, Xinsi Road, Xi'an, China
- Shihong Zhao
  - Department of Respiratory Medicine, Tangdu Hospital, Air Force Medical University, Xinsi Road, Xi'an, China
- Shuaixin Li
  - Department of Gastroenterology, Tangdu Hospital, Air Force Medical University, Xinsi Road, Xi'an, China
- Jiahui Li
  - Department of Gastroenterology, Tangdu Hospital, Air Force Medical University, Xinsi Road, Xi'an, China
- Shangsong Liang
  - Department of Machine Learning, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
- Jingjie Wang
  - Department of Gastroenterology, Tangdu Hospital, Air Force Medical University, Xinsi Road, Xi'an, China
3. Chen M, Deng Y, Li Z, Ye Y, He Z. KATZNCP: a miRNA-disease association prediction model integrating KATZ algorithm and network consistency projection. BMC Bioinformatics 2023; 24:229. [PMID: 37268893 DOI: 10.1186/s12859-023-05365-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2022] [Accepted: 05/26/2023] [Indexed: 06/04/2023]
Abstract
BACKGROUND Clinical studies have shown that miRNAs are closely related to human health. The study of potential associations between miRNAs and diseases will contribute to a deeper understanding of the mechanisms of disease development, as well as to human disease prevention and treatment. MiRNA-disease associations predicted by computational methods are a valuable complement to biological experiments. RESULTS In this research, a computational model, KATZNCP, was proposed on the basis of the KATZ algorithm and network consistency projection to infer potential miRNA-disease associations. In KATZNCP, a heterogeneous network was first constructed by integrating the known miRNA-disease associations, integrated miRNA similarities, and integrated disease similarities; then, the KATZ algorithm was applied to the heterogeneous network to obtain estimated miRNA-disease prediction scores. Finally, refined scores were obtained by the network consistency projection method as the final prediction results. KATZNCP achieved reliable predictive performance in leave-one-out cross-validation (LOOCV) with an AUC value of 0.9325, which was better than that of comparable state-of-the-art algorithms. Furthermore, case studies of lung neoplasms and esophageal neoplasms demonstrated the excellent predictive performance of KATZNCP. CONCLUSION A new computational model, KATZNCP, was proposed for predicting potential miRNA-disease associations based on KATZ and network consistency projection, and it can effectively predict potential miRNA-disease interactions. Therefore, KATZNCP can be used to provide guidance for future experiments.
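As an illustration of the KATZ step mentioned in the abstract, the sketch below computes a truncated Katz score, which sums walks of length 1 to `max_len` between two nodes, each weighted by `beta` raised to the walk length. The toy network, the values of `beta` and `max_len`, and the function names are assumptions for illustration; the actual KATZNCP network also includes similarity blocks and is followed by network consistency projection, neither of which is shown here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def katz_scores(adj, beta=0.1, max_len=3):
    """Truncated Katz measure: sum over k = 1..max_len of beta**k * A**k."""
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]  # holds A**k, starting at k = 1
    for k in range(1, max_len + 1):
        for i in range(n):
            for j in range(n):
                scores[i][j] += (beta ** k) * power[i][j]
        power = matmul(power, adj)
    return scores


# Hypothetical toy network. Index 0 = miRNA m0, 1 = miRNA m1,
# 2 = disease d0, 3 = disease d1. Known associations: m0-d0, m1-d0, m1-d1.
adjacency = [
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]

scores = katz_scores(adjacency, beta=0.1, max_len=3)
print("score(m0, d0) =", round(scores[0][2], 6))
print("score(m0, d1) =", round(scores[0][3], 6))
```

The pair (m0, d1) has no known association, yet it receives a non-zero score through the length-3 walk m0, d0, m1, d1, which is how walk-based measures propose candidate associations.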
Affiliation(s)
- Min Chen
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, 421002, China
- Yingwei Deng
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, 421002, China
- Zejun Li
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, 421002, China
- Yifan Ye
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, 421002, China
- Ziyi He
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, 421002, China
4. Xie X, Wang Y, He K, Sheng N. Predicting miRNA-disease associations based on PPMI and attention network. BMC Bioinformatics 2023; 24:113. [PMID: 36959547 PMCID: PMC10037801 DOI: 10.1186/s12859-023-05152-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/31/2022] [Accepted: 01/17/2023] [Indexed: 03/25/2023]
Abstract
BACKGROUND With the development of biotechnology and the accumulation of theories, many studies have found that microRNAs (miRNAs) play an important role in various diseases. Uncovering the potential associations between miRNAs and diseases helps to better understand the pathogenesis of complex diseases. However, traditional biological experiments are expensive and time-consuming. Therefore, it is necessary to develop more efficient computational methods for exploring underlying disease-related miRNAs. RESULTS In this paper, we present a new computational method, called PATMDA, based on positive point-wise mutual information (PPMI) and an attention network to predict miRNA-disease associations (MDAs). First, we construct the heterogeneous MDA network and multiple similarity networks of miRNAs and diseases. Second, we perform random walk with restart and PPMI on the different similarity network views to get multi-order proximity features, and then obtain high-order proximity representations of miRNAs and diseases by applying a convolutional neural network to fuse the learned proximity features. Then, we design an attention network with neural aggregation to integrate the representations of a node and its heterogeneous neighbor nodes according to the MDA network. Finally, an inner product decoder is adopted to calculate the relationship scores between miRNAs and diseases. CONCLUSIONS PATMDA achieves superior performance over six state-of-the-art methods, with areas under the receiver operating characteristic curve of 0.933 and 0.946 on the HMDD v2.0 and HMDD v3.2 datasets, respectively. The case studies further demonstrate the validity of PATMDA for discovering novel disease-associated miRNAs.
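The PPMI transform named in the abstract can be sketched as follows: each entry of a non-negative co-occurrence matrix is replaced by the logarithm of its observed-to-expected ratio, and negative values are clipped to zero. The input matrix here is a made-up toy example; in PATMDA the input would come from random walk with restart on the similarity networks, which is not reproduced in this sketch.

```python
import math


def ppmi(counts):
    """Positive point-wise mutual information of a non-negative count matrix.

    PPMI[i][j] = max(0, log(c_ij * total / (row_sum_i * col_sum_j))).
    Zero counts map to 0 by convention.
    """
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    result = []
    for i, row in enumerate(counts):
        out_row = []
        for j, c in enumerate(row):
            if c == 0:
                out_row.append(0.0)
            else:
                pmi = math.log(c * total / (row_sums[i] * col_sums[j]))
                out_row.append(max(0.0, pmi))
        result.append(out_row)
    return result


# Hypothetical co-occurrence counts between two nodes and two contexts.
toy_counts = [
    [2, 0],
    [0, 2],
]
toy_ppmi = ppmi(toy_counts)
print(toy_ppmi)
```

Pairs that co-occur more often than chance would predict receive positive weights, while pairs at or below chance are set to zero, which keeps the resulting proximity matrix sparse.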
Affiliation(s)
- Xuping Xie
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Yan Wang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
  - School of Artificial Intelligence, Jilin University, Changchun, China
- Kai He
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Nan Sheng
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
5. Ren J, Jin H, Zhu Y. The Role of Placental Non-Coding RNAs in Adverse Pregnancy Outcomes. Int J Mol Sci 2023; 24:ijms24055030. [PMID: 36902459 PMCID: PMC10003511 DOI: 10.3390/ijms24055030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2022] [Revised: 02/16/2023] [Accepted: 02/23/2023] [Indexed: 03/08/2023]
Abstract
Non-coding RNAs (ncRNAs) are transcribed from the genome and do not encode proteins. In recent years, ncRNAs have attracted increasing attention as critical participants in gene regulation and disease pathogenesis. Different categories of ncRNAs, which mainly include microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and circular RNAs (circRNAs), are involved in the progression of pregnancy, while abnormal expression of placental ncRNAs impacts the onset and development of adverse pregnancy outcomes (APOs). Therefore, we reviewed the current status of research on placental ncRNAs and APOs to further understand the regulatory mechanisms of placental ncRNAs, which provides a new perspective for treating and preventing related diseases.
Affiliation(s)
- Jiawen Ren
  - Department of Maternal, Child and Adolescent Health, School of Public Health, Anhui Medical University, No 81 Meishan Road, Hefei 230032, China
  - MOE Key Laboratory of Population Health Across Life Cycle, School of Public Health, Anhui Medical University, No 81 Meishan Road, Hefei 230032, China
  - Anhui Provincial Key Laboratory of Population Health and Aristogenics, Anhui Medical University, No 81 Meishan Road, Hefei 230032, China
  - NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei 230032, China
- Heyue Jin
  - Department of Maternal, Child and Adolescent Health, School of Public Health, Anhui Medical University, No 81 Meishan Road, Hefei 230032, China
  - MOE Key Laboratory of Population Health Across Life Cycle, School of Public Health, Anhui Medical University, No 81 Meishan Road, Hefei 230032, China
  - Anhui Provincial Key Laboratory of Population Health and Aristogenics, Anhui Medical University, No 81 Meishan Road, Hefei 230032, China
  - NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei 230032, China
- Yumin Zhu (corresponding author)
  - Department of Maternal, Child and Adolescent Health, School of Public Health, Anhui Medical University, No 81 Meishan Road, Hefei 230032, China
  - MOE Key Laboratory of Population Health Across Life Cycle, School of Public Health, Anhui Medical University, No 81 Meishan Road, Hefei 230032, China
  - Anhui Provincial Key Laboratory of Population Health and Aristogenics, Anhui Medical University, No 81 Meishan Road, Hefei 230032, China
  - NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei 230032, China
6. Ershadi MM, Rise ZR. Fusing clinical and image data for detecting the severity level of hospitalized symptomatic COVID-19 patients using hierarchical model. Research on Biomedical Engineering 2023; 39:209-232. [PMCID: PMC9957693 DOI: 10.1007/s42600-023-00268-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/22/2022] [Accepted: 02/08/2023] [Indexed: 02/05/2024]
Abstract
Purpose: Based on medical reports, it is hard to find levels of different hospitalized symptomatic COVID-19 patients according to their features in a short time. Besides, there are common and special features for COVID-19 patients at different levels based on physicians’ knowledge that make diagnosis difficult. For this purpose, a hierarchical model is proposed in this paper based on experts’ knowledge, fuzzy C-mean (FCM) clustering, and adaptive neuro-fuzzy inference system (ANFIS) classifier.
Methods: Experts considered a special set of features for different groups of COVID-19 patients to find their treatment plans. Accordingly, the structure of the proposed hierarchical model is designed based on experts’ knowledge. In the proposed model, we applied clustering methods to patients’ data to determine some clusters. Then, we learn classifiers for each cluster in a hierarchical model. Regarding different common and special features of patients, FCM is considered for the clustering method. Besides, ANFIS had better performances than other classification methods. Therefore, FCM and ANFIS were considered to design the proposed hierarchical model. FCM finds the membership degree of each patient’s data based on common and special features of different clusters to reinforce the ANFIS classifier. Next, ANFIS identifies the need of hospitalized symptomatic COVID-19 patients to ICU and to find whether or not they are in the end-stage (mortality target class). Two real datasets about COVID-19 patients are analyzed in this paper using the proposed model. One of these datasets had only clinical features and another dataset had both clinical and image features. Therefore, some appropriate features are extracted using some image processing and deep learning methods.
Results: According to the results and statistical test, the proposed model has the best performance among other utilized classifiers. Its accuracies based on clinical features of the first and second datasets are 92% and 90% to find the ICU target class. Extracted features of image data increase the accuracy by 94%.
Conclusion: The accuracy of this model is even better for detecting the mortality target class among different classifiers in this paper and the literature review. Besides, this model is compatible with utilized datasets about COVID-19 patients based on clinical data and both clinical and image data, as well.
Highlights:
- A new hierarchical model is proposed using ANFIS classifiers and FCM clustering method in this paper. Its structure is designed based on experts’ knowledge and real medical process. FCM reinforces the ANFIS classification learning phase based on the features of COVID-19 patients.
- Two real datasets about COVID-19 patients are studied in this paper. One of these datasets has both clinical and image data. Therefore, appropriate features are extracted based on its image data and considered with available meaningful clinical data. Different levels of hospitalized symptomatic COVID-19 patients are considered in this paper including the need of patients to ICU and whether or not they are in end-stage.
- Well-known classification methods including case-based reasoning (CBR), decision tree, convolutional neural networks (CNN), K-nearest neighbors (KNN), learning vector quantization (LVQ), multi-layer perceptron (MLP), Naive Bayes (NB), radial basis function network (RBF), support vector machine (SVM), recurrent neural networks (RNN), fuzzy type-I inference system (FIS), and adaptive neuro-fuzzy inference system (ANFIS) are designed for these datasets and their results are analyzed for different random groups of the train and test data.
- According to unbalanced utilized datasets, different performances of classifiers including accuracy, sensitivity, specificity, precision, F-score, and G-mean are compared to find the best classifier. ANFIS classifiers have the best results for both datasets.
- To reduce the computational time, the effects of the Principal Component Analysis (PCA) feature reduction method are studied on the performances of the proposed model and classifiers. According to the results and statistical test, the proposed hierarchical model has the best performances among other utilized classifiers.
Supplementary Information: The online version contains supplementary material available at 10.1007/s42600-023-00268-w.
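The membership degrees that FCM passes on to the classifier can be illustrated with the standard fuzzy C-means membership update, sketched below for one-dimensional data with fixed cluster centres. The data points, the centres, the fuzzifier `m = 2`, and the function name are assumptions for illustration; the paper's feature set and the ANFIS stage are not reproduced.

```python
def fcm_memberships(points, centers, m=2.0):
    """Standard fuzzy C-means membership update for 1-D points.

    u[i][k] = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)),
    where d_ik is the distance from point i to centre k.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # The point coincides with at least one centre: split full
            # membership among the coinciding centres.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [r / total for r in row]
        else:
            row = [
                1.0 / sum((d_k / d_j) ** exponent for d_j in dists)
                for d_k in dists
            ]
        memberships.append(row)
    return memberships


# Hypothetical severity scores for three patients and two cluster centres.
toy_points = [0.0, 1.0, 4.0]
toy_centers = [0.0, 4.0]
u = fcm_memberships(toy_points, toy_centers)
for x, row in zip(toy_points, u):
    print(x, [round(v, 3) for v in row])
```

Unlike hard clustering, each row sums to one but may spread across clusters, so a patient lying between two groups contributes to both; this soft assignment is the kind of information that can be supplied to a downstream classifier as extra input.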
Affiliation(s)
- Mohammad Mahdi Ershadi
  - Department of Industrial Engineering and Management Systems, Amirkabir University of Technology, No. 350, Hafez Ave, Valiasr Square, Tehran, 1591634311 Iran
- Zeinab Rahimi Rise
  - Department of Industrial Engineering and Management Systems, Amirkabir University of Technology, No. 350, Hafez Ave, Valiasr Square, Tehran, 1591634311 Iran
7. Ouyang D, Liang Y, Wang J, Liu X, Xie S, Miao R, Ai N, Li L, Dang Q. Predicting multiple types of miRNA-disease associations using adaptive weighted nonnegative tensor factorization with self-paced learning and hypergraph regularization. Brief Bioinform 2022; 23:6720405. [PMID: 36168938 DOI: 10.1093/bib/bbac390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Revised: 08/09/2022] [Accepted: 08/11/2022] [Indexed: 12/14/2022]
Abstract
A growing body of evidence indicates that the dysregulation of microRNAs (miRNAs) leads to diseases through various underlying mechanisms. Identifying the multiple types of disease-related miRNAs plays an important role in studying the molecular mechanisms of miRNAs in diseases. Moreover, compared with traditional biological experiments, computational models save time and cost. However, most tensor-based computational models still face three main challenges: (i) a tendency to fall into bad local minima; (ii) the preservation of high-order relations; (iii) false-negative samples. To this end, we propose a novel tensor completion framework, called SPLDHyperAWNTF, that integrates self-paced learning, hypergraph regularization and an adaptive weight tensor into nonnegative tensor factorization for the discovery of potential multiple types of miRNA-disease associations. We first combine self-paced learning with nonnegative tensor factorization to keep the model from falling into bad local minima. Then, hypergraphs for miRNAs and diseases are constructed, and hypergraph regularization is used to preserve the high-order complex relations of these hypergraphs. Finally, we introduce an adaptive weight tensor, which can effectively alleviate the impact of false-negative samples on the prediction performance. The average results of 5-fold and 10-fold cross-validation on four datasets show that SPLDHyperAWNTF can achieve better prediction performance than baseline models in terms of Top-1 precision, Top-1 recall and Top-1 F1. Furthermore, we implement case studies to further evaluate the accuracy of SPLDHyperAWNTF. As a result, 98 (MDAv2.0) and 98 (MDAv2.0-2) of the top-100 predictions are confirmed by the HMDDv3.2 dataset. Moreover, the results of enrichment analysis illustrate that unconfirmed potential associations have biological significance.
Affiliation(s)
- Dong Ouyang
  - Peng Cheng Laboratory, Shenzhen 518055, China
  - School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau 999078, China
- Yong Liang
  - Peng Cheng Laboratory, Shenzhen 518055, China
- Jianjun Wang
  - School of Mathematics and Statistics, Southwest University, Chongqing 400715, China
- Xiaoying Liu
  - Computer Engineering Technical College, Guangdong Polytechnic of Science and Technology, Zhuhai 519090, China
- Shengli Xie
  - Guangdong-HongKong-Macao Joint Laboratory for Smart Discrete Manufacturing, Guangzhou 510000, China
- Rui Miao
  - Basic Teaching Department, ZhuHai Campus of ZunYi Medical University, Zhuhai 519090, China
- Ning Ai
  - School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau 999078, China
- Le Li
  - School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau 999078, China
- Qi Dang
  - School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau 999078, China
8. Abdel-Hafiz M, Najafi M, Helmi S, Pratte KA, Zhuang Y, Liu W, Kechris KJ, Bowler RP, Lange L, Banaei-Kashani F. Significant Subgraph Detection in Multi-omics Networks for Disease Pathway Identification. Front Big Data 2022; 5:894632. [PMID: 35811829 PMCID: PMC9256965 DOI: 10.3389/fdata.2022.894632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2022] [Accepted: 05/27/2022] [Indexed: 01/21/2023]
Abstract
Chronic obstructive pulmonary disease (COPD) is one of the leading causes of death in the United States. COPD represents one of many areas of research where identifying complex pathways and networks of interacting biomarkers is an important avenue toward studying disease progression and potentially discovering cures. Recently, sparse multiple canonical correlation network analysis (SmCCNet) was developed to identify complex relationships between omics associated with a disease phenotype, such as lung function. SmCCNet uses two omics datasets and an associated output phenotype to generate a multi-omics graph, which can then be used to explore relationships between omics in the context of a disease. Detecting significant subgraphs within this multi-omics network, i.e., subgraphs that exhibit high correlation to a disease phenotype and high inter-connectivity, can help clinicians identify complex biological relationships involved in disease progression. The current approach to identifying significant subgraphs relies on hierarchical clustering, which can be used to inform clinicians about important pathways involved in the disease or phenotype of interest. The reliance on hierarchical clustering can hinder subgraph quality by biasing the search toward more compact subgraphs and removing larger significant subgraphs. This study aims to introduce new significant subgraph detection techniques. In particular, we introduce two subgraph detection methods, dubbed Correlated PageRank and Correlated Louvain, by extending the Personalized PageRank Clustering and Louvain algorithms, as well as a hybrid approach combining the two proposed methods, and compare them to the hierarchical method currently in use. The proposed methods show significant improvement in the quality of the subgraphs produced when compared to the current state of the art.
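For readers unfamiliar with the Personalized PageRank building block named in the abstract, the sketch below runs a plain power iteration with restart to a single seed node. It is not the Correlated PageRank method of the paper, which additionally accounts for correlation with the phenotype; the toy graph, the restart probability `alpha`, and the function name are assumptions for illustration.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Power iteration for Personalized PageRank with a single seed node.

    At each step the walker restarts at `seed` with probability `alpha`,
    otherwise it moves to a neighbour chosen proportionally to edge weight.
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    rank = [1.0 if i == seed else 0.0 for i in range(n)]
    for _ in range(iters):
        new_rank = [alpha if i == seed else 0.0 for i in range(n)]
        for i in range(n):
            if degrees[i] == 0:
                # Dangling node: send its mass back to the seed.
                new_rank[seed] += (1 - alpha) * rank[i]
                continue
            for j in range(n):
                if adj[i][j]:
                    new_rank[j] += (1 - alpha) * rank[i] * adj[i][j] / degrees[i]
        rank = new_rank
    return rank


# Hypothetical path graph 0 - 1 - 2 - 3 standing in for a small omics network.
toy_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
ppr = personalized_pagerank(toy_graph, seed=0)
print([round(v, 4) for v in ppr])
```

Nodes with high scores relative to the seed form a natural candidate subgraph; clustering approaches built on Personalized PageRank typically rank nodes by such scores and then choose a cut, a step that is omitted here.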
Affiliation(s)
- Mohamed Abdel-Hafiz (corresponding author)
  - Big Data Management and Mining Laboratory, Department of Computer Science and Engineering, College of Engineering, Design and Computing, University of Colorado Denver, Denver, CO, United States
- Mesbah Najafi
  - Department of Mathematics, College of Liberal Arts and Sciences, University of Colorado Denver, Denver, CO, United States
- Shahab Helmi
  - Big Data Management and Mining Laboratory, Department of Computer Science and Engineering, College of Engineering, Design and Computing, University of Colorado Denver, Denver, CO, United States
- Yonghua Zhuang
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Weixuan Liu
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Katerina J. Kechris
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Russell P. Bowler
  - National Jewish Health, Denver, CO, United States
  - School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Leslie Lange
  - Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Farnoush Banaei-Kashani
  - Big Data Management and Mining Laboratory, Department of Computer Science and Engineering, College of Engineering, Design and Computing, University of Colorado Denver, Denver, CO, United States
9. BROCCOLI: overlapping and outlier-robust biclustering through proximal stochastic gradient descent. Data Min Knowl Discov 2021. [DOI: 10.1007/s10618-021-00787-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
Abstract
Matrix tri-factorization subject to binary constraints is a versatile and powerful framework for the simultaneous clustering of observations and features, also known as biclustering. Applications for biclustering encompass the clustering of high-dimensional data and explorative data mining, where the selection of the most important features is relevant. Unfortunately, due to the lack of suitable methods for optimization subject to binary constraints, the powerful framework of biclustering is typically constrained to clusterings which partition the set of observations or features. As a result, overlap between clusters cannot be modelled and every item, even outliers in the data, has to be assigned to exactly one cluster. In this paper we propose Broccoli, an optimization scheme for matrix factorization subject to binary constraints, which is based on the theoretically well-founded optimization scheme of proximal stochastic gradient descent. Thereby, we do not impose any restrictions on the obtained clusters. Our experimental evaluation, performed on both synthetic and real-world data and against 6 competitor algorithms, shows reliable and competitive performance, even in the presence of a high amount of noise in the data. Moreover, a qualitative analysis of the identified clusters shows that Broccoli may provide meaningful and interpretable clustering structures.
10. Deshmukh PR, Phalnikar R. Information extraction for prognostic stage prediction from breast cancer medical records using NLP and ML. Med Biol Eng Comput 2021; 59:1751-1772. [PMID: 34297300 DOI: 10.1007/s11517-021-02399-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/01/2020] [Accepted: 07/01/2021] [Indexed: 11/24/2022]
Abstract
For cancer prediction, the prognostic stage is the main factor that helps medical experts decide the optimal treatment for a patient. Specialists study prognostic stage information from medical reports, which are often in an unstructured form and therefore take longer to review. The main objective of this study is to suggest a generic clinical decision-unifying staging method to extract the most reliable prognostic stage information of breast cancer from the medical records of various health institutions. Additional prognostic elements should be extracted from medical reports to identify the cancer stage, in order to get an exact measure of the cancer and improve care quality. This study collected 465 pathological and clinical reports of breast cancer patients from India's reputed medical institutions. The format of the unstructured records differed from institute to institute. Anatomic and biologic factors are extracted from the medical records using natural language processing, machine learning and rule-based methods for prognostic stage detection. This study extracted anatomic stage, grade, estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) from medical reports with high accuracy and predicted the prognostic stage for both regions. The average accuracy of prognostic stage prediction is 92% and 82% in rural and urban areas, respectively. It was essential to combine biological and anatomical elements under a single prognostic staging method. A generic clinical decision-unifying staging method that detects the prognostic stage with great accuracy across various institutions in different regions suggests that the proposed research improves the prognosis of breast cancer.
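The rule-based part of such an extraction pipeline can be pictured with a small regular-expression sketch that pulls ER, PR and HER2 status out of free text. The pattern, the sample sentence, and the function name are hypothetical and far simpler than what real pathology reports require; the study's actual rules and its NLP and machine-learning components are not known from the abstract and are not reproduced here.

```python
import re

# Hypothetical pattern: a receptor name, an optional ':' or '-' separator,
# then the word "positive" or "negative" (case-insensitive).
RECEPTOR_PATTERN = re.compile(
    r"\b(ER|PR|HER2)\b\s*[:\-]?\s*(positive|negative)",
    re.IGNORECASE,
)


def extract_receptor_status(text):
    """Return a mapping such as {'ER': 'positive'} for each receptor found."""
    return {
        match.group(1).upper(): match.group(2).lower()
        for match in RECEPTOR_PATTERN.finditer(text)
    }


# Made-up report fragment for illustration.
sample_report = "Immunohistochemistry: ER: Positive, PR - negative; HER2 negative."
status = extract_receptor_status(sample_report)
print(status)
```

Rules like this are brittle when institutions phrase results differently (for example "ER+" or "estrogen receptor strongly positive"), which is one reason a pipeline might combine rules with learned components.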
Affiliation(s)
- Pratiksha R Deshmukh
  - School of Computer Engineering and Technology, MIT World Peace University, Pune, India, 411029
  - Department of Computer Engineering and Information Technology, College of Engineering, Pune, 411005, India
- Rashmi Phalnikar
  - School of Computer Engineering and Technology, MIT World Peace University, Pune, India, 411029
11
|
Yan H, Chai H, Zhao H. Detecting lncRNA-Cancer Associations by Combining miRNAs, Genes, and Prognosis With Matrix Factorization. Front Genet 2021; 12:639872. [PMID: 34262591 PMCID: PMC8273282 DOI: 10.3389/fgene.2021.639872] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/10/2020] [Accepted: 04/15/2021] [Indexed: 11/13/2022] Open
Abstract
Motivation: Long non-coding RNAs (lncRNAs) play important roles in cancer development. Prediction of lncRNA–cancer association is necessary for efficiently discovering biomarkers and designing treatment for cancers. Currently, several methods have been developed to predict lncRNA–cancer associations. However, most of them do not consider the relationships between lncRNA with other molecules and with cancer prognosis, which has limited the accuracy of the prediction. Method: Here, we constructed relationship matrices between 1,679 lncRNAs, 2,759 miRNAs, and 16,410 genes and cancer prognosis on three types of cancers (breast, lung, and colorectal cancers) to predict lncRNA–cancer associations. The matrices were iteratively reconstructed by matrix factorization to optimize low-rank size. This method is called detecting lncRNA cancer association (DRACA). Results: Application of this method in the prediction of lncRNAs–breast cancer, lncRNA–lung cancer, and lncRNA–colorectal cancer associations achieved an area under curve (AUC) of 0.810, 0.796, and 0.795, respectively, by 10-fold cross-validations. The performances of DRACA in predicting associations between lncRNAs with three kinds of cancers were at least 6.6, 7.2, and 6.9% better than other methods, respectively. To our knowledge, this is the first method employing cancer prognosis in the prediction of lncRNA–cancer associations. When removing the relationships between cancer prognosis and genes, the AUCs were decreased 7.2, 0.6, and 5% for breast, lung, and colorectal cancers, respectively. Moreover, the predicted lncRNAs were found with greater numbers of somatic mutations than the lncRNAs not predicted as cancer-associated for three types of cancers. DRACA predicted many novel lncRNAs, whose expressions were found to be related to survival rates of patients. The method is available at https://github.com/Yanh35/DRACA.
Affiliation(s)
- Huan Yan
  - Department of Medical Research Center, Sun Yat-sen Memorial Hospital, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Guangzhou, China
- Hua Chai
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
- Huiying Zhao
  - Department of Medical Research Center, Sun Yat-sen Memorial Hospital, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Guangzhou, China
|
12
|
Kanimozhi N, Singaravel G. Hybrid artificial fish particle swarm optimizer and kernel extreme learning machine for type-II diabetes predictive model. Med Biol Eng Comput 2021; 59:841-867. [PMID: 33738640 DOI: 10.1007/s11517-021-02333-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/15/2020] [Accepted: 02/03/2021] [Indexed: 10/21/2022]
Abstract
The World Health Organization (WHO) estimated that 1.6 million deaths in 2016 were caused by diabetes. Precise and timely diagnosis of type-II diabetes is crucial to reduce the risk of various diseases such as heart disease, stroke, kidney disease, diabetic retinopathy, diabetic neuropathy, and macrovascular problems. Non-invasive methods such as machine learning are reliable and efficient in classifying people at risk of type-II diabetes and healthy people into two categories. The present study aims to develop a stacking-based integrated kernel extreme learning machine (KELM) model for identifying the risk of type-II diabetes in patients based on the follow-up time in the diabetes research center dataset. The Pima Indian Diabetic Dataset (PIDD) and a Diabetic Research Center dataset are used in this study. Min-max normalization is used to preprocess the noisy datasets. The Hybrid Particle Swarm Optimization-Artificial Fish Swarm Optimization (HAFPSO) algorithm addresses the multi-objective problem by increasing the Classification Accuracy (CA) and decreasing the kernel complexity of the optimal learners (NBC) selected. Finally, the model is integrated by using the KELM as a meta-classifier that combines the predictions of the twenty Base Learners. The proposed classification method helps clinicians predict which patients are at high risk of type-II diabetes in the future, with a highest accuracy of 98.5%. The proposed method is evaluated with several measures: accuracy, sensitivity, specificity, Matthews Correlation Coefficient, and Kappa Statistics. The results show that the KELM-HAFPSO approach is a promising new tool for identifying type-II diabetes.
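The min-max normalization named in this abstract can be illustrated with a minimal sketch. This is a generic version of the technique, not the authors' preprocessing code; the function name `min_max_normalize` and the sample glucose values are made up for illustration.

```python
def min_max_normalize(values):
    """Rescale numbers linearly to the range [0, 1].

    A constant column has no spread, so it is mapped to 0.0 throughout
    (an assumption of this sketch, chosen to avoid division by zero).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature column (illustrative values, not taken from PIDD).
glucose = [85, 120, 155, 190]
scaled = min_max_normalize(glucose)
print(scaled)
```

Each feature is rescaled independently, so features measured in different units end up on a comparable scale before they reach the classifier.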
Affiliation(s)
- N Kanimozhi
  - Department of Computer Science and Engineering, GKM College of Engineering and Technology, Chennai, India
- G Singaravel
  - Department of Information Technology, K S Rangasamy College of Engineering, Tiruchengode, India
|
13
|
Dogu E, Albayrak YE, Tuncay E. Length of hospital stay prediction with an integrated approach of statistical-based fuzzy cognitive maps and artificial neural networks. Med Biol Eng Comput 2021; 59:483-496. [PMID: 33544271 DOI: 10.1007/s11517-021-02327-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/22/2020] [Accepted: 01/24/2021] [Indexed: 10/22/2022]
Abstract
Chronic obstructive pulmonary disease (COPD) is a global burden, which is estimated to be the third leading cause of death worldwide by 2030. The economic burden of COPD grows continuously because it is not a curable disease. These conditions make COPD an important research field for artificial intelligence (AI) techniques in medicine. In this study, an integrated approach of statistical-based fuzzy cognitive maps (SBFCM) and artificial neural networks (ANN) is proposed for predicting the length of hospital stay of patients with COPD who are admitted to the hospital with an acute exacerbation. The SBFCM method is developed to determine the input variables of the ANN model. The SBFCM conducts statistical analysis to prepare preliminary information for the experts and then collects expert opinions accordingly, to define a conceptual map of the system. The integration of the SBFCM and ANN methods brings both statistical data and expert opinion into the prediction model. In the numerical application, the proposed approach outperformed the conventional approach and other machine learning algorithms with 79.95% accuracy, revealing the power of expert opinion involvement in medical decisions. A medical decision support framework is constructed for better prediction of length of hospital stay and more effective hospital management.
Affiliation(s)
- Elif Dogu
  - Industrial Engineering Dept., Galatasaray University, Ciragan Cad. No.: 36, Ortakoy, 34349, Istanbul, Turkey
- Y Esra Albayrak
  - Industrial Engineering Dept., Galatasaray University, Ciragan Cad. No.: 36, Ortakoy, 34349, Istanbul, Turkey
- Esin Tuncay
  - Yedikule Chest Diseases & Thoracic Surgery Training & Research Hospital, Belgrad Kapi Yolu Cad. No.: 1 34020 Zeytinburnu, Istanbul, Turkey
|
14
|
Das J, Barman Mandal S. Classification of Homo sapiens gene behavior using linear discriminant analysis fused with minimum entropy mapping. Med Biol Eng Comput 2021; 59:673-691. [PMID: 33595791 DOI: 10.1007/s11517-021-02324-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2020] [Accepted: 01/18/2021] [Indexed: 11/25/2022]
Abstract
Classification of Homo sapiens gene behavior using computational biology is a recent research trend. However, monitoring gene activity profiles and genetic behavior from the alphabetic DNA sequence using a non-invasive method is a tremendous challenge in functional genomics. The present paper addresses this issue and attempts to differentiate Homo sapiens genes using the linear discriminant analysis (LDA) method. Annotated protein-coding sequences of Homo sapiens genes, collected from NCBI, are taken as test samples. The minimum entropy-based mapping (MEM) technique helps to extract the highest information from the numerical DNA sequences. The proposed LDA technique successfully classified Homo sapiens genes based on the following features: composition of hydrophilic amino acids, dominance of the amino acid arginine, and magnitude and size of individual amino acids. The proposed algorithm was successfully tested on 84 healthy and cancer Homo sapiens genes of prostate and breast cells. Classification performance of the proposed LDA technique is judged by sensitivity (89.12%), specificity (91.9%), accuracy (90.87%), F1 score (92.03%), Matthews correlation coefficient (81.04%), and miss rate (9.12%), and it outperforms four other existing classifiers. The results are cross-validated through the Rayleigh PDF and the mutual information technique. The Fisher test, 2-sample t-test, and relative entropy test are used to verify the efficacy of the present classifier.
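Several of the performance measures listed in this abstract follow from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the counts passed to it are hypothetical and are not the paper's data.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient; defined as 0.0 here when the
    # denominator vanishes (a convention assumed by this sketch).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=8, fp=2, tn=9, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The Matthews correlation coefficient uses all four cells, which is why it is often reported alongside accuracy when the two classes are unbalanced.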
Affiliation(s)
- Joyshri Das
  - Institute of Radio Physics & Electronics, University of Calcutta, Kolkata, India
- Soma Barman Mandal
  - Institute of Radio Physics & Electronics, University of Calcutta, Kolkata, India
|
15
|
Tonkovic P, Kalajdziski S, Zdravevski E, Lameski P, Corizzo R, Pires IM, Garcia NM, Loncar-Turukalo T, Trajkovik V. Literature on Applied Machine Learning in Metagenomic Classification: A Scoping Review. BIOLOGY 2020; 9:E453. [PMID: 33316921 PMCID: PMC7763105 DOI: 10.3390/biology9120453] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/25/2020] [Revised: 11/30/2020] [Accepted: 12/03/2020] [Indexed: 12/12/2022]
Abstract
Applied machine learning in bioinformatics is growing as computer science slowly invades all research spheres. With the arrival of modern next-generation DNA sequencing algorithms, metagenomics is becoming an increasingly interesting research field as it finds countless practical applications exploiting the vast amounts of generated data. This study aims to scope the scientific literature in the field of metagenomic classification in the time interval 2008-2019 and provide an evolutionary timeline of data processing and machine learning in this field. This study follows the scoping review methodology and PRISMA guidelines to identify and process the available literature. Natural Language Processing (NLP) is deployed to ensure efficient and exhaustive search of the literary corpus of three large digital libraries: IEEE, PubMed, and Springer. The search is based on keywords and properties looked up using the digital libraries' search engines. The scoping review results reveal an increasing number of research papers related to metagenomic classification over the past decade. The research is mainly focused on metagenomic classifiers, identifying scope specific metrics for model evaluation, data set sanitization, and dimensionality reduction. Out of all of these subproblems, data preprocessing is the least researched with considerable potential for improvement.
Affiliation(s)
- Petar Tonkovic
  - Faculty of Computer Science and Engineering, Saints Cyril and Methodius University, 1000 Skopje, Macedonia
- Slobodan Kalajdziski
  - Faculty of Computer Science and Engineering, Saints Cyril and Methodius University, 1000 Skopje, Macedonia
- Eftim Zdravevski
  - Faculty of Computer Science and Engineering, Saints Cyril and Methodius University, 1000 Skopje, Macedonia
- Petre Lameski
  - Faculty of Computer Science and Engineering, Saints Cyril and Methodius University, 1000 Skopje, Macedonia
- Roberto Corizzo
  - Department of Computer Science, American University, Washington, DC 20016, USA
- Ivan Miguel Pires
  - Instituto de Telecomunicações, Universidade da Beira Interior, 6200-001 Covilhã, Portugal
  - Computer Science Department, Polytechnic Institute of Viseu, 3504-510 Viseu, Portugal
  - Health Sciences Research Unit: Nursing, School of Health, Polytechnic Institute of Viseu, 3504-510 Viseu, Portugal
- Nuno M. Garcia
  - Instituto de Telecomunicações, Universidade da Beira Interior, 6200-001 Covilhã, Portugal
- Vladimir Trajkovik
  - Faculty of Computer Science and Engineering, Saints Cyril and Methodius University, 1000 Skopje, Macedonia
|
16
|
Li J, Zhang X, Liu C. The computational approaches of lncRNA identification based on coding potential: Status quo and challenges. Comput Struct Biotechnol J 2020; 18:3666-3677. [PMID: 33304463 PMCID: PMC7710504 DOI: 10.1016/j.csbj.2020.11.030] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 04/30/2020] [Revised: 11/15/2020] [Accepted: 11/16/2020] [Indexed: 12/13/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) make up a large proportion of transcriptome in eukaryotes, and have been revealed with many regulatory functions in various biological processes. When studying lncRNAs, the first step is to accurately and specifically distinguish them from the colossal transcriptome data with complicated composition, which contains mRNAs, lncRNAs, small RNAs and their primary transcripts. In the face of such a huge and progressively expanding transcriptome data, the in-silico approaches provide a practicable scheme for effectively and rapidly filtering out lncRNA targets, using machine learning and probability statistics. In this review, we mainly discussed the characteristics of algorithms and features on currently developed approaches. We also outlined the traits of some state-of-the-art tools for ease of operation. Finally, we pointed out the underlying challenges in lncRNA identification with the advent of new experimental data.
Affiliation(s)
- Jing Li
  - CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
  - Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
- Xuan Zhang
  - CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
- Changning Liu
  - CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
  - Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
  - The Innovative Academy of Seed Design, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
|
17
|
Cong H, Liu H, Chen Y, Cao Y. Self-evoluting framework of deep convolutional neural network for multilocus protein subcellular localization. Med Biol Eng Comput 2020; 58:3017-3038. [PMID: 33078303 DOI: 10.1007/s11517-020-02275-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/02/2019] [Accepted: 10/14/2020] [Indexed: 12/12/2022]
Abstract
In the present paper, a deep convolutional neural network (DCNN) is applied to multilocus protein subcellular localization, as it is more suitable for multi-class classification. There are two main problems with this application. First, appropriate features for the correlation between multiple sites are hard to find. Second, the classifier structure is difficult to determine, as it is greatly affected by the distribution of the classified data. To solve these problems, a self-evoluting framework using DCNNs for multilocus protein subcellular localization is proposed. It has three characteristics that previous algorithms do not. The first is that it combines the ant colony algorithm with the DCNN to form a self-evoluting algorithm for multilocus protein subcellular localization. The second is that it randomly groups subcellular sites using a limited random k-labelsets multi-label classification method. It also solves complex problems with a divide-and-conquer approach and proposes a flexible expansion model. The third is that it applies random selection of feature extraction methods in the positioning process and so avoids the defects of individual feature extraction methods. The algorithm is tested on the human database, and the overall correct rate is 67.17%, which is higher than that of the stacked self-encoder (SAE), support vector machine (SVM), random forest classifier (RF), or a single deep convolutional neural network.
Graphical abstract: The algorithm in the present paper mainly includes four parts: protein sequence data preprocessing, integrated DCNN model construction, finding the optimal DCNN combination by ant colony optimization, and protein subcellular localization for sequences. These parts run in sequence, and the data obtained in each part is the basis for the next. In the data preprocessing part, the limited RAkEL multi-label classification method is used to randomly group subcellular sites. At the same time, feature fusion of protein sequences is carried out using multiple feature extraction methods. Each combination of features and site information corresponds to a DCNN model. In the part that finds the optimal DCNN combination by ant colony optimization, the main purpose is to find the best combination of DCNN models through the global optimization ability of the ant colony algorithm. The positioning of sequences mainly obtains the multilocus subcellular localization from the optimal model combination.
Affiliation(s)
- Hanhan Cong
  - School of Information Science and Engineering, Shandong Normal University, No. 88, Wenhua East Road, Jinan City, China
  - Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Shandong Normal University, Jinan, China
- Hong Liu
  - School of Information Science and Engineering, Shandong Normal University, No. 88, Wenhua East Road, Jinan City, China
  - Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Shandong Normal University, Jinan, China
- Yuehui Chen
  - School of Information Science and Engineering, University of Jinan, Jinan, China
  - Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
- Yi Cao
  - School of Information Science and Engineering, University of Jinan, Jinan, China
  - Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
|
18
|
Blumenberg L, Ruggles KV. Hypercluster: a flexible tool for parallelized unsupervised clustering optimization. BMC Bioinformatics 2020; 21:428. [PMID: 32993491 PMCID: PMC7525959 DOI: 10.1186/s12859-020-03774-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/19/2020] [Accepted: 09/22/2020] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Unsupervised clustering is a common and exceptionally useful tool for large biological datasets. However, clustering requires upfront algorithm and hyperparameter selection, which can introduce bias into the final clustering labels. It is therefore advisable to obtain a range of clustering results from multiple models and hyperparameters, which can be cumbersome and slow. RESULTS We present hypercluster, a python package and SnakeMake pipeline for flexible and parallelized clustering evaluation and selection. Users can efficiently evaluate a huge range of clustering results from multiple models and hyperparameters to identify an optimal model. CONCLUSIONS Hypercluster improves ease of use, robustness and reproducibility for unsupervised clustering application for high throughput biology. Hypercluster is available on pip and bioconda; installation, documentation and example workflows can be found at: https://github.com/ruggleslab/hypercluster .
Affiliation(s)
- Lili Blumenberg
  - Institute of Systems Genetics, New York University Grossman School of Medicine, New York, NY 10016 USA
  - Department of Medicine, New York University Grossman School of Medicine, New York, NY 10016 USA
- Kelly V. Ruggles
  - Institute of Systems Genetics, New York University Grossman School of Medicine, New York, NY 10016 USA
  - Department of Medicine, New York University Grossman School of Medicine, New York, NY 10016 USA
|
19
|
Avuçlu E, Elen A. Evaluation of train and test performance of machine learning algorithms and Parkinson diagnosis with statistical measurements. Med Biol Eng Comput 2020; 58:2775-2788. [PMID: 32920727 DOI: 10.1007/s11517-020-02260-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/05/2020] [Accepted: 08/29/2020] [Indexed: 01/23/2023]
Abstract
Parkinson's disease is a neurological disorder that causes partial or complete loss of motor reflexes and speech and affects thinking, behavior, and other vital functions of the nervous system. Parkinson's disease impairs speech and motor abilities (writing, balance, etc.) in about 90% of patients and is often seen in older people. Some signs (deterioration of the vocal cords) in medical voice recordings from Parkinson's patients are used to diagnose this disease. The database used in this study contains biomedical voice recordings, related to this disease, from 31 people of different ages and sexes. The performance of the k-Nearest Neighbors (k-NN), Random Forest, Naive Bayes, and Support Vector Machine classifiers was compared on this database, and the best classifier for the diagnosis of Parkinson's disease was determined. Eleven different training and test data splits (45 × 55, 50 × 50, 55 × 45, 60 × 40, 65 × 35, 70 × 30, 75 × 25, 80 × 20, 85 × 15, 90 × 10, 95 × 5) were processed separately. The results obtained from training and testing were compared using statistical measurements. The training results of the k-NN classification algorithm were generally 100% successful. The best test result, 85.81%, was obtained from the Random Forest classifier. All statistical results and measured values are given in detail in the experimental studies section.
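The eleven training/test proportions listed in this abstract can be generated programmatically. The following is a minimal sketch under stated assumptions: the abstract does not describe how samples were assigned to each split, so a seeded random shuffle is assumed here, and the sample count of 200 and the helper name `split_indices` are hypothetical.

```python
import random

# Training percentages from the abstract; the test share is the remainder.
TRAIN_PERCENTAGES = [45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]


def split_indices(n_samples, train_pct, seed=0):
    """Shuffle sample indices reproducibly and cut them at the training share."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_pct / 100)
    return indices[:n_train], indices[n_train:]


# Hypothetical dataset size, for illustration only.
N_SAMPLES = 200
splits = {pct: split_indices(N_SAMPLES, pct) for pct in TRAIN_PERCENTAGES}

for pct, (train, test) in splits.items():
    print(f"{pct} x {100 - pct}: {len(train)} training, {len(test)} test samples")
```

Using the same seed for every proportion keeps the shuffled order fixed, so each larger training set contains the smaller ones and the comparison across proportions varies only the cut point.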
Affiliation(s)
- Emre Avuçlu
  - Department of Computer Technology, Aksaray University, Aksaray, Turkey
- Abdullah Elen
  - Department of Computer Technology, Karabuk University, Karabuk, Turkey
|
20
|
Roy T, Bhattacharjee P. Performance analysis of melanoma classifier using electrical modeling technique. Med Biol Eng Comput 2020; 58:2443-2454. [PMID: 32770290 DOI: 10.1007/s11517-020-02241-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/19/2020] [Accepted: 07/27/2020] [Indexed: 11/25/2022]
Abstract
An efficient and novel modeling approach is proposed in this paper for identifying proteins or genes involved in melanoma skin cancer. Two types of classifiers are modeled, based on the chemical structure and the hydropathy property of amino acids. These classifiers are further implemented using an NI LabVIEW-based hardware kit to observe the real-time response for proper diagnosis. The phase responses, pole-zero diagrams, and transient responses are examined to separate melanoma-related genes from healthy genes. The performance of the proposed classifier is measured using various metrics, including accuracy, sensitivity, and specificity. The classifier is tested, together with a color code scheme, on skin genes and shows its superiority over traditional methods by achieving 94% classification accuracy with 96% sensitivity.
Graphical abstract: An equivalent electrical model is developed for designing the melanoma classifier. Initially, each amino acid is modeled using an RC passive circuit, depending on its physicochemical structure and hydropathy nature, to form a gene structure model. The melanoma-related genes are detected by phase, transient, and color code analysis.
Affiliation(s)
- Tanusree Roy
  - Department of Electrical and Electronics Engineering, University of Engineering and Management, Kolkata, 700135, India
- Pranabesh Bhattacharjee
  - Department of Electrical and Electronics Engineering, University of Engineering and Management, Kolkata, 700135, India
|
21
|
Appice A, Tsoumakas G, Manolopoulos Y, Matwin S. Generating Explainable and Effective Data Descriptors Using Relational Learning: Application to Cancer Biology. DISCOVERY SCIENCE 2020. [PMCID: PMC7556385 DOI: 10.1007/978-3-030-61527-7_25] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/29/2022]
Abstract
The key to success in machine learning is the use of effective data representations. The success of deep neural networks (DNNs) is based on their ability to utilize multiple neural network layers, and big data, to learn how to convert simple input representations into richer internal representations that are effective for learning. However, these internal representations are sub-symbolic and difficult to explain. In many scientific problems explainable models are required, and the input data is semantically complex and unsuitable for DNNs. This is true in the fundamental problem of understanding the mechanism of cancer drugs, which requires complex background knowledge about the functions of genes/proteins, their cells, and the molecular structure of the drugs. This background knowledge cannot be compactly expressed propositionally, and requires at least the expressive power of Datalog. Here we demonstrate the use of relational learning to generate new data descriptors in such semantically complex background knowledge. These new descriptors are effective: adding them to standard propositional learning methods significantly improves prediction accuracy. They are also explainable, and add to our understanding of cancer. Our approach can readily be expanded to include other complex forms of background knowledge, and combines the generality of relational learning with the efficiency of standard propositional learning.
|