1. Kolbinger FR, Veldhuizen GP, Zhu J, Truhn D, Kather JN. Reporting guidelines in medical artificial intelligence: a systematic review and meta-analysis. Communications Medicine 2024; 4:71. [PMID: 38605106 PMCID: PMC11009315 DOI: 10.1038/s43856-024-00492-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 03/27/2024] [Indexed: 04/13/2024] Open
Abstract
BACKGROUND: The field of Artificial Intelligence (AI) holds transformative potential in medicine. However, the lack of universal reporting guidelines poses challenges in ensuring the validity and reproducibility of published research studies in this field.
METHODS: Based on a systematic review of academic publications and reporting standards demanded by both international consortia and regulatory stakeholders as well as leading journals in the fields of medicine and medical informatics, 26 reporting guidelines published between 2009 and 2023 were included in this analysis. Guidelines were stratified by breadth (general or specific to medical fields), underlying consensus quality, and target research phase (preclinical, translational, clinical) and subsequently analyzed regarding the overlap and variations in guideline items.
RESULTS: AI reporting guidelines for medical research vary with respect to the quality of the underlying consensus process, breadth, and target research phase. Some guideline items such as reporting of study design and model performance recur across guidelines, whereas other items are specific to particular fields and research stages.
CONCLUSIONS: Our analysis highlights the importance of reporting guidelines in clinical AI research and underscores the need for common standards that address the identified variations and gaps in current guidelines. Overall, this comprehensive overview could help researchers and public stakeholders reinforce quality standards for increased reliability, reproducibility, clinical validity, and public trust in AI research in healthcare. This could facilitate the safe, effective, and ethical translation of AI methods into clinical applications that will ultimately improve patient outcomes.
Grants
- UM1 TR004402 NCATS NIH HHS
- JNK is supported by the German Federal Ministry of Health (DEEP LIVER, ZMVI1-2520DAT111) and the Max-Eder-Programme of the German Cancer Aid (grant #70113864), the German Federal Ministry of Education and Research (PEARL, 01KD2104C; CAMINO, 01EO2101; SWAG, 01KD2215A; TRANSFORM LIVER, 031L0312A), the German Academic Exchange Service (SECAI, 57616814), the German Federal Joint Committee (Transplant.KI, 01VSF21048), the European Union (ODELIA, 101057091; GENIAL, 101096312) and the National Institute for Health and Care Research (NIHR, NIHR213331) Leeds Biomedical Research Centre.
Affiliation(s)
- Fiona R Kolbinger
  - Else Kroener Fresenius Center for Digital Health, TUD Dresden University of Technology, Dresden, Germany
  - Department of Visceral, Thoracic and Vascular Surgery, University Hospital and Faculty of Medicine Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
  - Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN, USA
  - Regenstrief Center for Healthcare Engineering, Purdue University, West Lafayette, IN, USA
  - Department of Biostatistics and Health Data Science, Richard M. Fairbanks School of Public Health, Indiana University, Indianapolis, IN, USA
  - Indiana University Simon Comprehensive Cancer Center, Indiana University School of Medicine, Indianapolis, IN, USA
- Gregory P Veldhuizen
  - Else Kroener Fresenius Center for Digital Health, TUD Dresden University of Technology, Dresden, Germany
- Jiefu Zhu
  - Else Kroener Fresenius Center for Digital Health, TUD Dresden University of Technology, Dresden, Germany
- Daniel Truhn
  - Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany
- Jakob Nikolas Kather
  - Else Kroener Fresenius Center for Digital Health, TUD Dresden University of Technology, Dresden, Germany
  - Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany
  - Department of Medicine I, University Hospital Dresden, Dresden, Germany
  - Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany
2. Pirompud P, Sivapirunthep P, Punyapornwithaya V, Chaosap C. Application of machine learning algorithms to predict dead on arrival of broiler chickens raised without antibiotic program. Poult Sci 2024; 103:103504. [PMID: 38335671 PMCID: PMC10864801 DOI: 10.1016/j.psj.2024.103504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Revised: 01/20/2024] [Accepted: 01/23/2024] [Indexed: 02/12/2024] Open
Abstract
Understanding the factors of dead-on-arrival (DOA) incidents during pre-slaughter handling is crucial for informed decision-making, improving broiler welfare, and optimizing farm profitability. In this study, 3 different machine learning (ML) algorithms - least absolute shrinkage and selection operator (LASSO), classification tree (CT), and random forest (RF) - were used together with 4 sampling techniques to optimize imbalanced data. The dataset comes from 22,115 broiler truckloads from a large producer in Thailand (2021-2022) and includes 14 independent variables covering the rearing, catching, and transportation stages. The study focuses on DOA% in the range of 0.10 to 1.20%, with a threshold for high DOA% above 0.3%, and records DOA% per truckload during pre-slaughter ante-mortem inspection. With a high DOA rate of 25.2%, the imbalanced dataset prompts the implementation of 4 methods to tune the imbalance parameters: random over sampling (ROS), random under sampling (RUS), both sampling (BOTH), and synthetic sampling or random over sampling example (ROSE). The aim is to improve the performance of the prediction model in classifying and predicting high DOA%. The comparative analysis of the different error metrics shows that RF outperforms the other models in a balanced dataset. In particular, RUS shows a significant improvement in prediction performance across all models compared to the original unbalanced dataset. The identification of the 4 most important variables for predicting high DOA percentages - mortality and culling rate, rearing stocking density, season, and mean body weight - emphasizes their importance for broiler production. This study provides valuable insights into the prediction of DOA status using an ML approach and contributes to the development of more effective strategies to mitigate high DOA percentages in commercial broiler production.
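As a rough illustration of the random under sampling (RUS) idea named in the abstract above, the following minimal Python sketch drops majority-class rows at random until all classes are the same size. It is not the authors' code: the function name `random_under_sample`, the toy labels, and the 25/75 class split are assumptions made only for this example.

```python
import random
from collections import Counter


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows so every class keeps as many rows as the smallest class."""
    rng = random.Random(seed)
    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    # The smallest class sets the number of rows kept per class.
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        kept = rng.sample(members, target)  # sampling without replacement
        out_rows.extend(kept)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Toy data (hypothetical): 100 truckload IDs, 25 of them labelled "high" DOA%.
toy_rows = list(range(100))
toy_labels = ["high"] * 25 + ["normal"] * 75

balanced_rows, balanced_labels = random_under_sample(toy_rows, toy_labels)
class_counts = Counter(balanced_labels)
```

After the call, `class_counts` holds one equal count per class. Under-sampling discards data, which is one reason studies such as this one compare it against over-sampling and synthetic alternatives.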
Affiliation(s)
- Pranee Pirompud
  - Doctoral Program in Innovative Tropical Agriculture, Department of Agricultural Education, Faculty of Industrial Education and Technology, King Mongkut's Institute of Technology Ladkrabang, Bangkok 10520, Thailand
- Panneepa Sivapirunthep
  - Department of Agricultural Education, Faculty of Industrial Education and Technology, King Mongkut's Institute of Technology Ladkrabang, Bangkok 10520, Thailand
- Veerasak Punyapornwithaya
  - Research Center for Veterinary Biosciences and Veterinary Public Health, Faculty of Veterinary Medicine, Chiang Mai University, Chiang Mai 50100, Thailand
- Chanporn Chaosap
  - Department of Agricultural Education, Faculty of Industrial Education and Technology, King Mongkut's Institute of Technology Ladkrabang, Bangkok 10520, Thailand
3. Reggiani F, El Rashed Z, Petito M, Pfeffer M, Morabito A, Tanda ET, Spagnolo F, Croce M, Pfeffer U, Amaro A. Machine Learning Methods for Gene Selection in Uveal Melanoma. Int J Mol Sci 2024; 25:1796. [PMID: 38339073 PMCID: PMC10855534 DOI: 10.3390/ijms25031796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Revised: 01/25/2024] [Accepted: 01/30/2024] [Indexed: 02/12/2024] Open
Abstract
Uveal melanoma (UM) is the most common primary intraocular malignancy with a limited five-year survival for metastatic patients. Limited therapeutic treatments are currently available for metastatic disease, even if the genomics of this tumor has been deeply studied using next-generation sequencing (NGS) and functional experiments. The profound knowledge of the molecular features that characterize this tumor has not led to the development of efficacious therapies, and the survival of metastatic patients has not changed for decades. Several bioinformatics methods have been applied to mine NGS tumor data in order to unveil tumor biology and detect possible molecular targets for new therapies. Each application can be single domain based while others are more focused on data integration from multiple genomics domains (as gene expression and methylation data). Examples of single domain approaches include differentially expressed gene (DEG) analysis on gene expression data with statistical methods such as SAM (significance analysis of microarray) or gene prioritization with complex algorithms such as deep learning. Data fusion or integration methods merge multiple domains of information to define new clusters of patients or to detect relevant genes, according to multiple NGS data. In this work, we compare different strategies to detect relevant genes for metastatic disease prediction in the TCGA uveal melanoma (UVM) dataset. Detected targets are validated with multi-gene score analysis on a larger UM microarray dataset.
Affiliation(s)
- Francesco Reggiani
  - Laboratory of Gene Expression Regulation, IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy
- Zeinab El Rashed
  - Laboratory of Gene Expression Regulation, IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy
- Mariangela Petito
  - Laboratory of Gene Expression Regulation, IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy
  - Department of Experimental Medicine (DIMES), University of Genova, Via Leon Battista Alberti, 16132 Genova, Italy
- Max Pfeffer
  - Institute of Numerical and Applied Mathematics, University of Göttingen, 37083 Göttingen, Germany
- Anna Morabito
  - Laboratory of Gene Expression Regulation, IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy
- Enrica Teresa Tanda
  - Skin Cancer Unit, IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy
  - Department of Internal Medicine and Medical Specialties, University of Genova, Viale Benedetto XV, 16132 Genova, Italy
- Francesco Spagnolo
  - Skin Cancer Unit, IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy
  - Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genova, 16132 Genova, Italy
- Michela Croce
  - Biotherapies, IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy
- Ulrich Pfeffer
  - Laboratory of Gene Expression Regulation, IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy
- Adriana Amaro
  - Laboratory of Gene Expression Regulation, IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy
4. Khalilzad Z, Tadj C. Use of psychoacoustic spectrum warping, decision template fusion, and neighborhood component analysis in newborn cry diagnostic systems. The Journal of the Acoustical Society of America 2024; 155:901-914. [PMID: 38310608 DOI: 10.1121/10.0024618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 01/10/2024] [Indexed: 02/06/2024]
Abstract
Dealing with newborns' health is a delicate matter since they cannot express needs, and crying does not reflect their condition. Although newborn cries have been studied for various purposes, there is no prior research on distinguishing a certain pathology from other pathologies so far. Here, an unsophisticated framework is proposed for the study of septic newborns amid a collective of other pathologies. The cry was analyzed with music inspired and speech processing inspired features. Furthermore, neighborhood component analysis (NCA) feature selection was employed with two goals: (i) Exploring how the elements of each feature set contributed to classification outcome; (ii) investigating to what extent the feature space could be compacted. The attained results showed success of both experiments introduced in this study, with 88.66% for the decision template fusion (DTF) technique and a consistent enhancement in comparison to all feature sets in terms of accuracy and 86.22% for the NCA feature selection method by drastically downsizing the feature space from 86 elements to only 6 elements. The achieved results showed great potential for identifying a certain pathology from other pathologies that may have similar effects on the cry patterns as well as proving the success of the proposed framework.
Affiliation(s)
- Zahra Khalilzad
  - Department of Electrical Engineering, École de Technologie Supérieur, Université du Québec, Montréal, Québec H3C 1K3, Canada
- Chakib Tadj
  - Department of Electrical Engineering, École de Technologie Supérieur, Université du Québec, Montréal, Québec H3C 1K3, Canada
5. Porras LM, Padilla N, Moles-Fernández A, Feliubadaló L, Santamariña-Pena M, Sánchez AT, López-Novo A, Blanco A, de la Hoya M, Molina IJ, Osorio A, Pineda M, Rueda D, Ruiz-Ponte C, Vega A, Lázaro C, Díez O, Gutiérrez-Enríquez S, de la Cruz X. A New Set of in Silico Tools to Support the Interpretation of ATM Missense Variants Using Graphical Analysis. J Mol Diagn 2024; 26:17-28. [PMID: 37865290 DOI: 10.1016/j.jmoldx.2023.09.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Revised: 06/30/2023] [Accepted: 09/20/2023] [Indexed: 10/23/2023] Open
Abstract
Establishing the pathogenic nature of variants in ATM, a gene associated with breast cancer and other hereditary cancers, is crucial for providing patients with adequate care. Unfortunately, achieving good variant classification is still difficult. To address this challenge, we extended the range of in silico tools with a series of graphical tools devised for the analysis of computational evidence by health care professionals. We propose a family of fast and easy-to-use graphical representations in which the impact of a variant is considered relative to other pathogenic and benign variants. To illustrate their value, the representations are applied to three problems in variant interpretation. The assessment of computational pathogenicity predictions showed that the graphics provide an intuitive view of prediction reliability, complementing and extending conventional numerical reliability indexes. When applied to variant of unknown significance populations, the representations shed light on the nature of these variants and can be used to prioritize variants of unknown significance for further studies. In a third application, the graphics were used to compare the two versions of the ATM-adapted American College of Medical Genetics and Genomics and Association for Molecular Pathology guidelines, obtaining valuable information on their relative virtues and weaknesses. Finally, a server [ATMision (ATM missense in silico interpretation online)] was generated for users to apply these representations in their variant interpretation problems, to check the ATM-adapted guidelines' criteria for computational evidence on their variant(s) and access different sources of information.
Affiliation(s)
- Luz-Marina Porras
  - Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research, Universitat Autònoma de Barcelona, Barcelona, Spain
- Natàlia Padilla
  - Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research, Universitat Autònoma de Barcelona, Barcelona, Spain
- Alejandro Moles-Fernández
  - Hereditary Cancer Genetics Group, Vall d'Hebron Institute of Oncology, Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain
- Lidia Feliubadaló
  - Hereditary Cancer Program, Catalan Institute of Oncology, Hospitalet de Llobregat, Barcelona, Spain; Program in Molecular Mechanisms and Experimental Therapy in Oncology, Instituto de Investigación Biomédica de Bellvitge (IDIBELL), Hospitalet de Llobregat, Barcelona, Spain; Centro de Investigación Biomédica en Red de Cáncer, Madrid, Spain
- Marta Santamariña-Pena
  - Fundación Pública Galega Medicina Xenómica, Santiago de Compostela, Spain; Instituto de Investigación Sanitaria de Santiago de Compostela, Santiago de Compostela, Spain; Centro de Investigación Biomédica en Red de enfermedades Raras, Madrid, Spain
- Alysson T Sánchez
  - Hereditary Cancer Program, Oncobell Program, Catalan Institute of Oncology, Instituto de Investigación Biomédica de Bellvitge (IDIBELL), Hospitalet de Llobregat, Barcelona, Spain
- Anael López-Novo
  - Fundación Pública Galega Medicina Xenómica, Santiago de Compostela, Spain; Instituto de Investigación Sanitaria de Santiago de Compostela, Santiago de Compostela, Spain
- Ana Blanco
  - Fundación Pública Galega Medicina Xenómica, Santiago de Compostela, Spain; Instituto de Investigación Sanitaria de Santiago de Compostela, Santiago de Compostela, Spain; Centro de Investigación Biomédica en Red de enfermedades Raras, Madrid, Spain
- Miguel de la Hoya
  - Molecular Oncology Laboratory, Hospital Clínico San Carlos, IdISSC (Instituto de Investigación Sanitaria del Hospital Clínico San Carlos), Madrid, Spain
- Ignacio J Molina
  - Instituto de Biopatología y Medicina Regenerativa, Universidad de Granada and Instituto de Investigación Biosanitaria ibs.GRANADA, Granada, Spain
- Ana Osorio
  - Familial Cancer Clinical Unit, Human Cancer Genetics Programme, Spanish National Cancer Research Centre, Madrid, Spain; Spanish Network on Rare Diseases, Madrid, Spain
- Marta Pineda
  - Hereditary Cancer Program, Catalan Institute of Oncology, Hospitalet de Llobregat, Barcelona, Spain; Program in Molecular Mechanisms and Experimental Therapy in Oncology, Instituto de Investigación Biomédica de Bellvitge (IDIBELL), Hospitalet de Llobregat, Barcelona, Spain; Centro de Investigación Biomédica en Red de Cáncer, Madrid, Spain
- Daniel Rueda
  - Hereditary Cancer Laboratory, 12 de Octubre University Hospital, i+12 Research Institute, Madrid, Spain
- Clara Ruiz-Ponte
  - Fundación Pública Galega Medicina Xenómica, Santiago de Compostela, Spain; Instituto de Investigación Sanitaria de Santiago de Compostela, Santiago de Compostela, Spain; Centro de Investigación Biomédica en Red de enfermedades Raras, Madrid, Spain
- Ana Vega
  - Fundación Pública Galega Medicina Xenómica, Santiago de Compostela, Spain; Instituto de Investigación Sanitaria de Santiago de Compostela, Santiago de Compostela, Spain; Centro de Investigación Biomédica en Red de enfermedades Raras, Madrid, Spain
- Conxi Lázaro
  - Hereditary Cancer Program, Catalan Institute of Oncology, Hospitalet de Llobregat, Barcelona, Spain; Program in Molecular Mechanisms and Experimental Therapy in Oncology, Instituto de Investigación Biomédica de Bellvitge (IDIBELL), Hospitalet de Llobregat, Barcelona, Spain; Centro de Investigación Biomédica en Red de Cáncer, Madrid, Spain
- Orland Díez
  - Hereditary Cancer Genetics Group, Vall d'Hebron Institute of Oncology, Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Area of Clinical and Molecular Genetics, Vall d'Hebron Hospital Universitari, Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain
- Sara Gutiérrez-Enríquez
  - Hereditary Cancer Genetics Group, Vall d'Hebron Institute of Oncology, Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain
- Xavier de la Cruz
  - Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research, Universitat Autònoma de Barcelona, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
6. Guéniche N, Lakehal Z, Habauzit D, Bruyère A, Fardel O, Le Hégarat L, Huguet A. Combined in silico and in vitro approaches to identify P-glycoprotein-inhibiting pesticides. J Biochem Mol Toxicol 2024; 38:e23588. [PMID: 37985955 DOI: 10.1002/jbt.23588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 10/04/2023] [Accepted: 11/10/2023] [Indexed: 11/22/2023]
Abstract
The P-glycoprotein (P-gp) efflux pump plays a major role in xenobiotic detoxification. The inhibition of its activity by environmental contaminants remains however rather little characterised. The present study was designed to develop a combination of different approaches to identify P-gp inhibitors among a large number of pesticides using in silico and in vitro models. First, the prediction performance of four web tools was evaluated alone or in combination using a set of recently marketed drugs. The best combination of web tools (AdmetSAR2.0/PgpRules/pkCSM) was next used to predict P-gp activity inhibition by 762 pesticides. Among the 187 pesticides predicted to be P-gp inhibitors, 11 were tested in vitro for their ability to inhibit the efflux of reference substrates (rhodamine 123 and Hoechst 33342) in P-gp overexpressing MCF7R cells and to inhibit the efflux of the reference substrate rhodamine 123 in the Caco-2 cell monolayer. In MCF7R cell assays, ivermectin B1a, emamectin B1 benzoate, spinosad, dimethomorph and tralkoxydim inhibited P-gp activity; ivermectin B1a, emamectin B1 benzoate and spinosad were determined to be stronger inhibitors (half-maximal inhibitory concentration [IC50] of 3 ± 1, 5 ± 1 and 7 ± 1 µM, respectively) than dimethomorph and tralkoxydim (IC50 of 102 ± 7 and 88 ± 7 µM, respectively). Ivermectin B1a, emamectin B1 benzoate, spinosad and dimethomorph also inhibited P-gp activity in Caco-2 cell monolayer assays, with dimethomorph being a weaker P-gp inhibitor. These combined approaches could be used to identify P-gp inhibitors among food contaminants, but need to be optimised and adapted for high-throughput screening.
Affiliation(s)
- Nelly Guéniche
  - Xenobiotics and Barriers team, Research Institut for Environmental and Occupational Health (IRSET), Rennes, France
  - Fougères Laboratory, Toxicology of Contaminants Unit, French Agency for Food, Environmental and Occupational Health & Safety (ANSES), Fougères Cedex, France
- Zeineb Lakehal
  - Fougères Laboratory, Toxicology of Contaminants Unit, French Agency for Food, Environmental and Occupational Health & Safety (ANSES), Fougères Cedex, France
- Denis Habauzit
  - Fougères Laboratory, Toxicology of Contaminants Unit, French Agency for Food, Environmental and Occupational Health & Safety (ANSES), Fougères Cedex, France
- Arnaud Bruyère
  - Xenobiotics and Barriers team, Research Institut for Environmental and Occupational Health (IRSET), Rennes, France
- Olivier Fardel
  - University hospital center of Rennes, Xenobiotics and Barriers team, Research Institut for Environmental and Occupational Health (IRSET), Rennes, France
- Ludovic Le Hégarat
  - Fougères Laboratory, Toxicology of Contaminants Unit, French Agency for Food, Environmental and Occupational Health & Safety (ANSES), Fougères Cedex, France
- Antoine Huguet
  - Fougères Laboratory, Toxicology of Contaminants Unit, French Agency for Food, Environmental and Occupational Health & Safety (ANSES), Fougères Cedex, France
7. Shirvanizadeh N, Vihinen M. VariBench, new variation benchmark categories and data sets. Frontiers in Bioinformatics 2023; 3:1248732. [PMID: 37795169 PMCID: PMC10546188 DOI: 10.3389/fbinf.2023.1248732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 09/08/2023] [Indexed: 10/06/2023] Open
Affiliation(s)
- Mauno Vihinen
  - Department of Experimental Medical Science, Lund University, Lund, Sweden
8. Bhandari N, Walambe R, Kotecha K, Kaliya M. Integrative gene expression analysis for the diagnosis of Parkinson's disease using machine learning and explainable AI. Comput Biol Med 2023; 163:107140. [PMID: 37315380 DOI: 10.1016/j.compbiomed.2023.107140] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/21/2022] [Revised: 05/29/2023] [Accepted: 06/04/2023] [Indexed: 06/16/2023]
Abstract
Parkinson's disease (PD) is a progressive neurodegenerative disorder. Various symptoms and diagnostic tests are used in combination for the diagnosis of PD; however, accurate diagnosis at early stages is difficult. Blood-based markers can support physicians in the early diagnosis and treatment of PD. In this study, we used Machine Learning (ML) based methods for the diagnosis of PD by integrating gene expression data from different sources and applying explainable artificial intelligence (XAI) techniques to find the significant set of gene features contributing to diagnosis. We utilized the Least Absolute Shrinkage and Selection Operator (LASSO), and Ridge regression for the feature selection process. We utilized state-of-the-art ML techniques for the classification of PD cases and healthy controls. Logistic regression and Support Vector Machine showed the highest diagnostic accuracy. SHapley Additive exPlanations (SHAP) based global interpretable model-agnostic XAI method was utilized for the interpretation of the Support Vector Machine model. A set of significant biomarkers that contributed to the diagnosis of PD were identified. Some of these genes are associated with other neurodegenerative diseases. Our results suggest that the utilization of XAI can be useful in making early therapeutic decisions for the treatment of PD. The integration of datasets from different sources made this model robust. We believe that this research article will be of interest to clinicians as well as computational biologists in translational research.
Affiliation(s)
- Nikita Bhandari
  - Computer Science Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, MH, India; Symbiosis Center for Applied Artificial Intelligence (SCAAI), Symbiosis International Deemed University, Pune, Maharashtra, India
- Rahee Walambe
  - Electronics and Telecommunication Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, Maharashtra, India; Symbiosis Center for Applied Artificial Intelligence (SCAAI), Symbiosis International Deemed University, Pune, Maharashtra, India
- Ketan Kotecha
  - Computer Science Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, MH, India; Electronics and Telecommunication Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, Maharashtra, India
- Mehul Kaliya
  - Department of General Medicine, AIIMS, Rajkot, Gujrat, India
9. Yang Y, Chong Z, Vihinen M. PON-Fold: Prediction of Substitutions Affecting Protein Folding Rate. Int J Mol Sci 2023; 24:13023. [PMID: 37629203 PMCID: PMC10455311 DOI: 10.3390/ijms241613023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 08/08/2023] [Accepted: 08/09/2023] [Indexed: 08/27/2023] Open
Abstract
Most proteins fold into characteristic three-dimensional structures. The rate of folding and unfolding varies widely and can be affected by variations in proteins. We developed a novel machine-learning-based method for the prediction of the folding rate effects of amino acid substitutions in two-state folding proteins. We collected a data set of experimentally defined folding rates for variants and used them to train a gradient boosting algorithm starting with 1161 features. Two predictors were designed. The three-class classifier had, in blind tests, specificity and sensitivity ranging from 0.324 to 0.419 and from 0.256 to 0.451, respectively. The other tool was a regression predictor that showed a Pearson correlation coefficient of 0.525. The error measures, mean absolute error and mean squared error, were 0.581 and 0.603, respectively. One of the previously presented tools could be used for comparison with the blind test data set; our method, called PON-Fold, showed superior performance on all measures used. The applicability of the tool was tested by predicting all possible substitutions in a protein domain. Predictions for different conformations of proteins, open and closed forms of a protein kinase, and apo and holo forms of an enzyme indicated that the choice of the structure had a large impact on the outcome. PON-Fold is freely available.
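For readers unfamiliar with the regression measures quoted in the abstract above (Pearson correlation coefficient, mean absolute error, mean squared error), the following minimal Python sketch shows how each is computed from paired observed and predicted values. The function names and the four toy value pairs are assumptions for illustration only; they are not data or code from PON-Fold.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Toy values (hypothetical), not taken from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

mae = mean_absolute_error(observed, predicted)
mse = mean_squared_error(observed, predicted)
r = pearson_r(observed, predicted)
```

Lower error values and a correlation closer to 1 indicate predictions that track the observations more closely; the three measures are complementary because correlation ignores systematic offsets that the error measures penalise.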
Affiliation(s)
- Yang Yang
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
- Zhang Chong
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Mauno Vihinen
  - Department of Experimental Medical Science, Lund University, BMC B13, SE-221 84 Lund, Sweden
10. Aspromonte MC, Conte AD, Zhu S, Tan W, Shen Y, Zhang Y, Li Q, Wang MH, Babbi G, Bovo S, Martelli PL, Casadio R, Althagafi A, Toonsi S, Kulmanov M, Hoehndorf R, Katsonis P, Williams A, Lichtarge O, Xian S, Surento W, Pejaver V, Mooney SD, Sunderam U, Srinivasan R, Murgia A, Piovesan D, Tosatto SCE, Leonardi E. CAGI6 ID-Challenge: Assessment of phenotype and variant predictions in 415 children with Neurodevelopmental Disorders (NDDs). Research Square 2023:rs.3.rs-3209168. [PMID: 37577579 PMCID: PMC10418555 DOI: 10.21203/rs.3.rs-3209168/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
In the context of the Critical Assessment of the Genome Interpretation, 6th edition (CAGI6), the Genetics of Neurodevelopmental Disorders Lab in Padua proposed a new ID-challenge to give the opportunity of developing computational methods for predicting patient's phenotype and the causal variants. Eight research teams and 30 models had access to the phenotype details and real genetic data, based on the sequences of 74 genes (VCF format) in 415 pediatric patients affected by Neurodevelopmental Disorders (NDDs). NDDs are clinically and genetically heterogeneous conditions, with onset in infant age. In this study we evaluate the ability and accuracy of computational methods to predict comorbid phenotypes based on clinical features described in each patient and causal variants. Finally, we asked to develop a method to find new possible genetic causes for patients without a genetic diagnosis. As already done for the CAGI5, seven clinical features (ID, ASD, ataxia, epilepsy, microcephaly, macrocephaly, hypotonia), and variants (causative, putative pathogenic and contributing factors) were provided. Considering the overall clinical manifestation of our cohort, we give out the variant data and phenotypic traits of the 150 patients from CAGI5 ID-Challenge as training and validation for the prediction methods development.
Affiliation(s)
- Shaowen Zhu
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843
- Wuwei Tan
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843
- Qi Li
  - CUHK Shenzhen Research Institute, Shenzhen
- Giulia Babbi
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna
- Samuele Bovo
  - Department of Agricultural and Food Sciences, University of Bologna
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna
- Azza Althagafi
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23
- Sumyyah Toonsi
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23
- Maxat Kulmanov
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23
- Robert Hoehndorf
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
- Amanda Williams
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
- Su Xian
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195
- Wesley Surento
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195
- Vikas Pejaver
  - Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029
- Sean D Mooney
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195
- Uma Sunderam
  - Innovation Labs, Tata Consultancy Services, Hyderabad
11
Ahn SY, Jung EH, Ahn H, Lee JS, Bak JH, Kim ED, Song JH, Shin HS, Jamiyansharav M, Seo KY. Automatic measurement of mouse visual acuity based on optomotor response: SKY optomotry. Lab Anim 2023; 57:412-423. [PMID: 36708198 DOI: 10.1177/00236772221148576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/29/2023]
Abstract
In visual science research using rodents, several assessment methods have been developed for measuring visual function. However, methods such as electroretinogram tests, visual evoked potential tests and maze tests have limitations: they measure the function of only a specific type of cell, are difficult to quantify, or require considerable training time. The method based on the optokinetic reflex and optomotor response, a compensatory eye and head movement in response to changes in the visual scene, has become the most widely used. However, this method requires highly trained experimenters and is time consuming. We showed that measured visual acuity values differ significantly between a beginner and an expert. Here we propose an automated optometry program, 'SKY optomotry', which automatically tracks rodents' optomotor response to overcome the subjectivity and lengthy scoring procedure of the existing method. To evaluate the performance of SKY optomotry using 8-12-week-old C57BL/6 mice, we compared the binary decisions of SKY optomotry with those of a skilled expert; the area under the curve of SKY optomotry was 0.845. For the final visual acuity, the intraclass correlation coefficient between SKY optomotry and an expert was 0.860 (95% confidence interval (CI) 0.709-0.928), whereas that between an expert and a beginner was 0.642 (95% CI 0.292-0.811). SKY optomotry showed an excellent level of performance, with good inter-rater agreement relative to the visual acuity measured by an expert. With our application, researchers will be able to test an experimental animal's eyesight more accurately while saving time on specialized training.
Affiliation(s)
- So Yeon Ahn
  - Department of Medicine, Yonsei University College of Medicine, Republic of Korea
- Eun Hye Jung
  - Department of Medicine, Yonsei University College of Medicine, Republic of Korea
- Hyunmin Ahn
  - Department of Medicine, Yonsei University College of Medicine, Republic of Korea
  - Department of Ophthalmology, The Institute of Vision Research, Yonsei University College of Medicine, Republic of Korea
- Jihei Sara Lee
  - Department of Medicine, Yonsei University College of Medicine, Republic of Korea
  - Department of Ophthalmology, The Institute of Vision Research, Yonsei University College of Medicine, Republic of Korea
- Jeong Hyeon Bak
  - Department of Mechanical Engineering, Hanyang University, Republic of Korea
- Eun-do Kim
  - Laboratory of Molecular Immunology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, USA
- Ja-Hyun Song
  - Korea Mouse Sensory Phenotyping Center (KMSPC), Yonsei University College of Medicine, Republic of Korea
- Hae-Sol Shin
  - Department of Ophthalmology, The Institute of Vision Research, Yonsei University College of Medicine, Republic of Korea
  - Korea Mouse Sensory Phenotyping Center (KMSPC), Yonsei University College of Medicine, Republic of Korea
- Kyoung Yul Seo
  - Department of Medicine, Yonsei University College of Medicine, Republic of Korea
  - Department of Ophthalmology, The Institute of Vision Research, Yonsei University College of Medicine, Republic of Korea
12
Aguirre J, Padilla N, Özkan S, Riera C, Feliubadaló L, de la Cruz X. Choosing Variant Interpretation Tools for Clinical Applications: Context Matters. Int J Mol Sci 2023; 24:11872. [PMID: 37511631 PMCID: PMC10380979 DOI: 10.3390/ijms241411872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 07/10/2023] [Accepted: 07/20/2023] [Indexed: 07/30/2023] Open
Abstract
Pathogenicity predictors are computational tools that classify genetic variants as benign or pathogenic; this is currently a major challenge in genomic medicine. With more than fifty such predictors available, selecting the most suitable tool for clinical applications like genetic screening, molecular diagnostics, and companion diagnostics has become increasingly challenging. To address this issue, we have developed a cost-based framework that naturally considers the various components of the problem. This framework encodes clinical scenarios using a minimal set of parameters and treats pathogenicity predictors as rejection classifiers, a common practice in clinical applications where low-confidence predictions are routinely rejected. We illustrate our approach in four examples where we compare different numbers of pathogenicity predictors for missense variants. Our results show that no single predictor is optimal for all clinical scenarios and that considering rejection yields a different perspective on classifiers.
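The idea of scoring a predictor as a rejection classifier under scenario-specific costs can be sketched as follows. This is only an illustration, not the authors' framework: the two-threshold rejection rule, the function names, and the three cost parameters (`cost_fp`, `cost_fn`, `cost_reject`) are all assumptions introduced here.

```python
def classify(score, low, high):
    """Hypothetical rejection rule: scores between low and high give no call."""
    if score >= high:
        return "pathogenic"
    if score <= low:
        return "benign"
    return "rejected"


def average_cost(scores, labels, low, high, cost_fp, cost_fn, cost_reject):
    """Mean cost per variant on a labelled set (all costs are assumed inputs).

    Correct calls cost nothing; false positives, false negatives and
    rejected (low-confidence) predictions each carry their own cost.
    """
    total = 0.0
    for score, label in zip(scores, labels):
        call = classify(score, low, high)
        if call == "rejected":
            total += cost_reject
        elif call == "pathogenic" and label == "benign":
            total += cost_fp
        elif call == "benign" and label == "pathogenic":
            total += cost_fn
    return total / len(scores)


# Toy data, invented for illustration only.
scores = [0.9, 0.5, 0.1, 0.8]
labels = ["pathogenic", "benign", "benign", "benign"]
print(average_cost(scores, labels, low=0.3, high=0.7,
                   cost_fp=1.0, cost_fn=5.0, cost_reject=0.5))
```

Under such a scheme, changing the cost values to reflect a different clinical scenario can change which predictor (and which rejection band) gives the lowest average cost, which is consistent with the abstract's point that no single predictor is optimal everywhere.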
Affiliation(s)
- Josu Aguirre
  - Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, P/Vall d'Hebron, 119-129, 08035 Barcelona, Spain
- Natàlia Padilla
  - Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, P/Vall d'Hebron, 119-129, 08035 Barcelona, Spain
- Selen Özkan
  - Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, P/Vall d'Hebron, 119-129, 08035 Barcelona, Spain
- Casandra Riera
  - Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, P/Vall d'Hebron, 119-129, 08035 Barcelona, Spain
- Lídia Feliubadaló
  - Hereditary Cancer Program, Program in Molecular Mechanisms and Experimental Therapy in Oncology (Oncobell), IDIBELL, Catalan Institute of Oncology, 08908 L'Hospitalet de Llobregat, Spain
  - Centro de Investigación Biomédica en Red de Cáncer (CIBERONC), 28929 Madrid, Spain
- Xavier de la Cruz
  - Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, P/Vall d'Hebron, 119-129, 08035 Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain
13
Rani D, Krishan K, Kanchan T. A methodological comparison of discriminant function analysis and binary logistic regression for estimating sex in forensic research and case-work. MEDICINE, SCIENCE, AND THE LAW 2023; 63:227-236. [PMID: 36366800 DOI: 10.1177/00258024221136687] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/09/2023]
Abstract
The purpose of this study is to assess the accuracy of two multivariate statistical approaches for estimating sex from human external ear anthropometry, namely, discriminant function analysis (DFA) and binary logistic regression (BLR). A cross-sectional sample of 497 participants (233 males and 264 females) aged 18-35 years (24.42 ± 5.17) was obtained from the state of Himachal Pradesh in North India. Both ears of each participant (994 ears in total) were examined. A total of 12 anthropometric measurements were taken independently on the left and right ear of each individual with a pair of sliding calipers using a standard method. Sex was discriminated using binary logistic regression and discriminant function analysis. The percentage of correct sex estimation was substantially the same for both models, that is, 76.3% from DFA and 76.2% from BLR (0.1% higher in DFA), with nearly comparable (∼0.02) sensitivity, specificity, positive predictive value, and negative predictive value. Moreover, the other comparison metrics, such as classification error, B-index, and Matthews correlation coefficient, indicated that both models performed equally well. The study highlighted that if the assumptions of the statistical methods are met, both methods are equally capable of discriminating the population by sex. The study recommends that discriminant function analysis and binary logistic regression may be used interchangeably in forensic research and case-work pertaining to the estimation of sex and in various other forensic situations.
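The comparison metrics named in this abstract (sensitivity, specificity, positive and negative predictive value, accuracy, Matthews correlation coefficient) all follow from a 2×2 confusion matrix. A minimal sketch, assuming one class is arbitrarily treated as "positive"; the function name and the counts in the example are invented for illustration and are not taken from the study:

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (illustrative helper)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # MCC is conventionally set to 0 when any marginal total is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Made-up counts, not the study's data.
for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
    print(f"{name}: {value:.3f}")
```

Computing the same dictionary for two models on the same test set gives a like-for-like comparison of the kind the abstract describes.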
Affiliation(s)
- Deepika Rani
  - Department of Anthropology, Panjab University, Chandigarh, India
- Kewal Krishan
  - Department of Anthropology, Panjab University, Chandigarh, India
- Tanuj Kanchan
  - Department of Forensic Medicine, All India Institute of Medical Sciences, Jodhpur, India
14
Wang M, Zhao C, Barr A, Fan H, Yu S, Kapellusch J, Harris Adamson C. Hand Posture and Force Estimation Using Surface Electromyography and an Artificial Neural Network. HUMAN FACTORS 2023; 65:382-402. [PMID: 34006135 DOI: 10.1177/00187208211016695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
OBJECTIVE The purpose of this study was to develop an approach to predict hand posture (pinch versus grip) and grasp force using forearm surface electromyography (sEMG) and artificial neural networks (ANNs) during tasks that varied repetition rate and duty cycle. BACKGROUND Prior studies have used electromyography with machine learning models to predict grip force but relatively few studies have assessed whether both hand posture and force can be predicted, particularly at varying levels of duty cycle and repetition rate. METHOD Fourteen individuals participated in this experiment. sEMG data for five forearm muscles and force output data were collected. Calibration data (25, 50, 75, 100% of maximum voluntary contraction (MVC)) were used to train ANN models to predict hand posture (pinch versus grip) and force magnitude while performing tasks that varied load, repetition rate, and duty cycle. RESULTS Across all participants, overall hand posture prediction accuracy was 79% (0.79 ± .08), whereas overall hand force prediction accuracy was 73% (0.73 ± .09). Accuracy ranged between 0.65 and 0.93 based on varying repetition rate and duty cycle. CONCLUSION Hand posture and force prediction were possible using sEMG and ANNs, though there were important differences in the accuracy of predictions based on task characteristics including duty cycle and repetition rate. APPLICATION The results of this study could be applied to the development of a dosimeter used for distal upper extremity biomechanical exposure measurement, risk assessment, job (re)design, and return to work programs.
Affiliation(s)
- Mengcheng Wang
  - Northwestern Polytechnical University, Xi'an, China
  - University of California, Berkeley, USA
- Alan Barr
  - University of California, San Francisco, USA
- Hao Fan
  - Northwestern Polytechnical University, Xi'an, China
- Suihuai Yu
  - Northwestern Polytechnical University, Xi'an, China
15
Khalilzad Z, Tadj C. Using CCA-Fused Cepstral Features in a Deep Learning-Based Cry Diagnostic System for Detecting an Ensemble of Pathologies in Newborns. Diagnostics (Basel) 2023; 13:879. [PMID: 36900023 PMCID: PMC10000938 DOI: 10.3390/diagnostics13050879] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/27/2022] [Revised: 02/14/2023] [Accepted: 02/21/2023] [Indexed: 03/02/2023] Open
Abstract
Crying is one of the means of communication for a newborn. Newborn cry signals convey precious information about the newborn's health condition and emotions. In this study, cry signals of healthy and pathologic newborns were analyzed for the purpose of developing an automatic, non-invasive, and comprehensive Newborn Cry Diagnostic System (NCDS) that distinguishes pathologic newborns from healthy infants. For this purpose, Mel-frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC) were extracted as features. These feature sets were also combined and fused through Canonical Correlation Analysis (CCA), which provides a novel manipulation of the features that, to the best of our knowledge, has not yet been explored in the literature on NCDS designs. All the mentioned feature sets were fed to Support Vector Machine (SVM) and Long Short-Term Memory (LSTM) classifiers. Furthermore, two hyperparameter optimization methods, Bayesian and grid search, were examined to enhance the system's performance. The performance of our proposed NCDS was evaluated with two different datasets of inspiratory and expiratory cries. The CCA fusion feature set using the LSTM classifier achieved the best F-score in the study, 99.86%, for the inspiratory cry dataset. The best F-score for the expiratory cry dataset, 99.44%, was obtained with the GFCC feature set and the LSTM classifier. These experiments suggest the high potential and value of using newborn cry signals in the detection of pathologies. The framework proposed in this study can be implemented as an early diagnostic tool for clinical studies and help in the identification of pathologic newborns.
16
A new blood based epigenetic age predictor for adolescents and young adults. Sci Rep 2023; 13:2303. [PMID: 36759656 PMCID: PMC9911637 DOI: 10.1038/s41598-023-29381-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/28/2022] [Accepted: 02/03/2023] [Indexed: 02/11/2023] Open
Abstract
Children have special rights for protection compared to adults in our society. However, more than 1/4 of children globally have no documentation of their date of birth. Hence, there is a pressing need to develop biological methods for chronological age prediction that are robust to differences in genetics, psychosocial events and physical living conditions. At present, DNA methylation is the most promising biomarker applied for age assessment. The human genome contains around 28 million DNA methylation sites, many of which change with age. Several epigenetic clocks accurately predict chronological age using methylation levels at age-associated CpG sites. However, variation in DNA methylation increases with age, and there is no epigenetic clock specifically designed for adolescents and young adults. Here we present a novel age Predictor for Adolescents and Young Adults (PAYA), using 267 CpG methylation sites to assess the chronological age of adolescents and young adults. We compared different preprocessing approaches and investigated their effect on the prediction performance of the epigenetic clock. We evaluated performance using an independent validation data set consisting of 18-year-old individuals, where we obtained a median absolute deviation of just below 0.7 years. This tool may be helpful in age assessment of adolescents and young adults. However, there is a need to investigate the robustness of the age predictor across geographical and disease populations as well as environmental effects.
17
Deyneko IV. Guidelines on the performance evaluation of motif recognition methods in bioinformatics. Front Genet 2023; 14:1135320. [PMID: 36824436 PMCID: PMC9941176 DOI: 10.3389/fgene.2023.1135320] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/31/2022] [Accepted: 01/19/2023] [Indexed: 02/09/2023] Open
18
C L, S P, Kashyap AH, Rahaman A, Niranjan S, Niranjan V. Novel Biomarker Prediction for Lung Cancer Using Random Forest Classifiers. Cancer Inform 2023; 22:11769351231167992. [PMID: 37113644 PMCID: PMC10126698 DOI: 10.1177/11769351231167992] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/04/2023] [Accepted: 03/17/2023] [Indexed: 04/29/2023] Open
Abstract
Lung cancer is considered the most common and the deadliest cancer type. It is mainly of 2 types: small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). NSCLC accounts for about 85% of cases, while SCLC accounts for only about 14%. Over the last decade, functional genomics has emerged as a revolutionary tool for studying genetics and uncovering changes in gene expression. RNA-Seq has been applied to investigate the rare and novel transcripts that aid in discovering genetic changes that occur in tumours due to different lung cancers. Although RNA-Seq helps to understand and characterise the gene expression involved in lung cancer diagnostics, discovering the biomarkers remains a challenge. Classification models help uncover and classify the biomarkers based on gene expression levels across the different lung cancers. The current research concentrates on computing transcript statistics from gene transcript files with a normalised fold change of genes and identifying quantifiable differences in gene expression levels between the reference genome and lung cancer samples. The collected data were analysed, and machine learning models were developed to classify genes as causing NSCLC, causing SCLC, causing both or neither. An exploratory data analysis was performed to identify the probability distribution and principal features. Due to the limited number of features available, all of them were used in predicting the class. To address the imbalance in the dataset, the under-sampling algorithm Near Miss was applied. For classification, the research primarily focused on 4 supervised machine learning algorithms (Logistic Regression, KNN classifier, SVM classifier and Random Forest classifier); additionally, 2 ensemble algorithms were considered: XGBoost and AdaBoost. Out of these, based on the weighted metrics considered, the Random Forest classifier, with 87% accuracy, was considered the best-performing algorithm and was therefore used to predict the biomarkers causing NSCLC and SCLC. The imbalance and limited features in the dataset restrict any further improvement in the model's accuracy or precision. In the present study, using the gene expression values (LogFC, P Value) as the feature set in the Random Forest classifier, BRAF, KRAS, NRAS and EGFR are predicted to be possible biomarkers causing NSCLC, and ATF6, ATF3, PGDFA, PGDFD, PGDFC and PIP5K1C are predicted to be possible biomarkers causing SCLC, based on the transcriptome analysis. The model gave a precision of 91.3% and a recall of 91% after fine tuning. Some of the common biomarkers predicted for NSCLC and SCLC were CDK4, CDK6, BAK1, CDKN1A and DDB2.
Affiliation(s)
- Lavanya C
  - Department of Biotechnology, RV College of Engineering, Bengaluru, Karnataka, India
- Pooja S
  - Department of Biotechnology, RV College of Engineering, Bengaluru, Karnataka, India
- Abhay H Kashyap
  - Department of Computer Science and Engineering, RV College of Engineering, Bengaluru, Karnataka, India
- Abdur Rahaman
  - Department of Computer Science and Engineering, RV College of Engineering, Bengaluru, Karnataka, India
- Swarna Niranjan
  - Department of AIML, RV College of Engineering, Bengaluru, Karnataka, India
- Vidya Niranjan
  - Department of Biotechnology, RV College of Engineering, Bengaluru, Karnataka, India
  - Vidya Niranjan, Department of Biotechnology, RV College of Engineering, Mysore Road, RV Vidyaniketan Post, Bangalore, Karnataka 560059, India.
19
Ma J, Qin T, Xiang J. Disease-gene prediction based on preserving structure network embedding. Front Aging Neurosci 2023; 15:1061892. [PMID: 36896421 PMCID: PMC9990751 DOI: 10.3389/fnagi.2023.1061892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Accepted: 01/30/2023] [Indexed: 02/23/2023] Open
Abstract
Many diseases, such as Alzheimer's disease (AD) and Parkinson's disease (PD), are caused by abnormalities or mutations of related genes. Many computational methods based on the network relationship between diseases and genes have been proposed to predict potential pathogenic genes. However, how to mine the disease-gene relationship network effectively to better predict disease genes is still an open problem. In this paper, a disease-gene prediction method based on preserving structure network embedding (PSNE) is introduced. To predict pathogenic genes more effectively, a heterogeneous network with multiple types of bio-entities was constructed by integrating disease-gene associations, the human protein network, and disease-disease associations. Furthermore, the low-dimensional features of nodes extracted from the network were used to reconstruct a new disease-gene heterogeneous network. Compared with other advanced methods, PSNE was shown to be more effective in disease-gene prediction. Finally, we applied the PSNE method to predict potential pathogenic genes for age-associated diseases such as AD and PD, and validated these predicted genes against the literature. Overall, this work provides an effective method for disease-gene prediction, and a series of high-confidence potential pathogenic genes of AD and PD which may be helpful for the experimental discovery of disease genes.
Affiliation(s)
- Jinlong Ma
  - School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, China
- Tian Qin
  - School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, China
- Ju Xiang
  - School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, China
  - Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
20
Wang H, Li H, Gao W, Xie J. PrUb-EL: A hybrid framework based on deep learning for identifying ubiquitination sites in Arabidopsis thaliana using ensemble learning strategy. Anal Biochem 2022; 658:114935. [PMID: 36206844 DOI: 10.1016/j.ab.2022.114935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 09/25/2022] [Accepted: 09/26/2022] [Indexed: 12/30/2022]
Abstract
Identification of ubiquitination sites is central to many biological experiments. Ubiquitination is a kind of post-translational protein modification (PTM). It is a key mechanism for increasing protein diversity and plays a vital role in regulating cell function. In recent years, many models have been developed to predict ubiquitination sites in humans, mice and yeast. However, few studies have predicted ubiquitination sites in Arabidopsis thaliana. In view of this, a deep network model named PrUb-EL is proposed to predict ubiquitination sites in Arabidopsis thaliana. Firstly, six features based on the protein sequence are extracted using the amino acid index database (AAindex), dipeptide deviation from expected mean (DDE), dipeptide composition (DPC), blocks substitution matrix (BLOSUM62), enhanced amino acid composition (EAAC) and binary encoding. Secondly, the synthetic minority over-sampling technique (SMOTE) is used to process the imbalanced data set. Then a new classifier named DG is presented, which includes a Dense block, a Residual block and a Gated recurrent unit (GRU) block. Finally, each of the six feature extraction methods is integrated into the DG model, and an ensemble learning strategy is used to obtain the final prediction result. Experimental results show that PrUb-EL has good predictive ability, with accuracy (ACC) and area under the ROC curve (auROC) values of 91.00% and 97.70%, respectively, using 5-fold cross-validation. In the independent test, the values of ACC and auROC are 88.58% and 96.09%, respectively. Compared with previous studies, our model has significantly improved performance; thus it is an excellent method for identifying ubiquitination sites in Arabidopsis thaliana. The datasets and code used for the article are available at https://github.com/Tom-Wangy/PreUb-EL.git.
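Of the six sequence encodings listed above, dipeptide composition (DPC) is the simplest to illustrate: it maps a peptide to a 400-dimensional vector of dipeptide frequencies. The sketch below is a generic version of that encoding, not the PrUb-EL code; restricting to the 20 standard amino acids and skipping windows that contain any other character are assumptions made here, and the example sequence is invented.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(sequence):
    """Return the 400-dimensional dipeptide frequency vector of a sequence.

    Overlapping windows of length 2 are counted; windows containing a
    non-standard character (e.g. a padding symbol) are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


vector = dipeptide_composition("AKAKA")  # toy peptide, for illustration only
print(len(vector), vector[DIPEPTIDES.index("AK")])
```

A fixed-length vector like this can be fed to any classifier regardless of the length of the input window, which is one reason composition-style encodings are commonly combined with position-specific ones such as binary encoding.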
Affiliation(s)
- Houqiang Wang
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
- Hong Li
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
- Weifeng Gao
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
- Jin Xie
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
21
He B, Wang K, Xiang J, Bing P, Tang M, Tian G, Guo C, Xu M, Yang J. DGHNE: network enhancement-based method in identifying disease-causing genes through a heterogeneous biomedical network. Brief Bioinform 2022; 23:6712302. [PMID: 36151744 DOI: 10.1093/bib/bbac405] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/16/2022] [Revised: 08/01/2022] [Accepted: 08/21/2022] [Indexed: 12/14/2022] Open
Abstract
The identification of disease-causing genes is critical for a mechanistic understanding of disease etiology and for clinical intervention in disease prevention and treatment. Yet the existing approaches to this question are inadequate in accuracy and efficiency, demanding computational methods with higher identification power. Here, we propose a new method called DGHNE to identify disease-causing genes through a heterogeneous biomedical network empowered by network enhancement. First, a disease-disease association network was constructed from the cosine similarity scores between phenotype annotation vectors of diseases, and a new heterogeneous biomedical network was constructed by using disease-gene associations to connect the disease-disease network and gene-gene network. Then, the heterogeneous biomedical network was further enhanced by using network embedding based on Gaussian random projection. Finally, network propagation was used to identify candidate genes in the enhanced network. We applied DGHNE together with five other methods to the most up-to-date disease-gene association database, DisGeNET. Compared with all other methods, DGHNE displayed the highest area under the receiver operating characteristic curve and the precision-recall curve, as well as the highest precision and recall, in both the global 5-fold cross-validation and the prediction of new disease-gene associations. We further applied DGHNE to identify the candidate causal genes of Parkinson's disease and diabetes mellitus, and the genes connecting hyperglycemia and diabetes mellitus. In all cases, the predicted causal genes were enriched in disease-associated gene ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways, and the gene-disease associations were well supported by independent experimental studies.
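The first step described above, building a disease-disease network from cosine similarity between phenotype annotation vectors, can be illustrated with a short sketch. It is not the DGHNE implementation: the binary annotation vectors, the disease names, and the helper `disease_similarity` are hypothetical, and the later embedding and propagation steps are not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def disease_similarity(annotations):
    """Pairwise similarity for a dict of disease -> phenotype annotation vector."""
    names = sorted(annotations)
    return {
        (a, b): cosine_similarity(annotations[a], annotations[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Invented annotations: each position marks presence (1) or absence (0) of a phenotype term.
annotations = {
    "disease_A": [1, 1, 0, 0],
    "disease_B": [1, 0, 0, 0],
    "disease_C": [0, 0, 1, 1],
}
for pair, score in disease_similarity(annotations).items():
    print(pair, round(score, 3))
```

In a network built this way, the similarity scores would serve as edge weights between diseases, typically after thresholding or keeping only the strongest neighbours; which of those choices the authors made is not stated in the abstract.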
Affiliation(s)
- Binsheng He
  - Academician Workstation, Changsha Medical University, Changsha 410219, China
  - Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, P. R. China
  - School of pharmacy, Changsha Medical University, Changsha 410219, P. R. China
- Kun Wang
  - School of Mathematical Sciences, Ocean University of China, Qingdao 266100, China
- Ju Xiang
  - Academician Workstation, Changsha Medical University, Changsha 410219, China
- Pingping Bing
  - Academician Workstation, Changsha Medical University, Changsha 410219, China
  - Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, P. R. China
  - School of pharmacy, Changsha Medical University, Changsha 410219, P. R. China
- Min Tang
  - School of Life Sciences, Jiangsu University, Zhenjiang 212001, Jiangsu, China
- Geng Tian
  - Geneis (Beijing) Co., Ltd., Beijing 100102, China
- Cheng Guo
  - Center for Infection and Immunity, Mailman School of Public Health, Columbia University, New York, NY, 10032, USA
- Miao Xu
  - Broad institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Jialiang Yang
  - Academician Workstation, Changsha Medical University, Changsha 410219, China
  - Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, P. R. China
  - School of pharmacy, Changsha Medical University, Changsha 410219, P. R. China
  - Geneis (Beijing) Co., Ltd., Beijing 100102, China
| |
Collapse
|
22
|
Bhandari N, Walambe R, Kotecha K, Khare SP. A comprehensive survey on computational learning methods for analysis of gene expression data. Front Mol Biosci 2022; 9:907150. [PMID: 36458095 PMCID: PMC9706412 DOI: 10.3389/fmolb.2022.907150] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/29/2022] [Accepted: 09/28/2022] [Indexed: 09/19/2023] Open
Abstract
Computational analysis methods including machine learning have a significant impact in the fields of genomics and medicine. High-throughput gene expression analysis methods such as microarray technology and RNA sequencing produce enormous amounts of data. Traditionally, statistical methods are used for comparative analysis of gene expression data. However, more complex analysis for classification of sample observations, or discovery of feature genes requires sophisticated computational approaches. In this review, we compile various statistical and computational tools used in analysis of expression microarray data. Even though the methods are discussed in the context of expression microarrays, they can also be applied for the analysis of RNA sequencing and quantitative proteomics datasets. We discuss the types of missing values, and the methods and approaches usually employed in their imputation. We also discuss methods of data normalization, feature selection, and feature extraction. Lastly, methods of classification and class discovery along with their evaluation parameters are described in detail. We believe that this detailed review will help the users to select appropriate methods for preprocessing and analysis of their data based on the expected outcome.
Affiliation(s)
- Nikita Bhandari
  - Computer Science Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
- Rahee Walambe
  - Electronics and Telecommunication Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
  - Symbiosis Center for Applied AI (SCAAI), Symbiosis International (Deemed University), Pune, India
- Ketan Kotecha
  - Computer Science Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
  - Symbiosis Center for Applied AI (SCAAI), Symbiosis International (Deemed University), Pune, India
- Satyajeet P. Khare
  - Symbiosis School of Biological Sciences, Symbiosis International (Deemed University), Pune, India
|
23
|
Parallel functional annotation of cancer-associated missense mutations in histone methyltransferases. Sci Rep 2022; 12:18487. [PMID: 36323913 PMCID: PMC9630446 DOI: 10.1038/s41598-022-23229-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/10/2022] [Accepted: 10/27/2022] [Indexed: 12/03/2022] Open
Abstract
Using exome sequencing for biomarker discovery and precision medicine requires connecting nucleotide-level variation with functional changes in encoded proteins. However, for functionally annotating the thousands of cancer-associated missense mutations, or variants of uncertain significance (VUS), purifying variant proteins for biochemical and functional analysis is cost-prohibitive and inefficient. We describe parallel functional annotation (PFA) of large numbers of VUS using small cultures and crude extracts in 96-well plates. Using members of a histone methyltransferase family, we demonstrate high-throughput structural and functional annotation of cancer-associated mutations. By combining functional annotation of paralogs, we discovered two phylogenetic and clustering parameters that improve the accuracy of sequence-based functional predictions to over 90%. Our results demonstrate the value of PFA for defining oncogenic/tumor suppressor functions of histone methyltransferases as well as enhancing the accuracy of sequence-based algorithms in predicting the effects of cancer-associated mutations.
|
24
|
Comprehensive In Silico Functional Prediction Analysis of CDKL5 by Single Amino Acid Substitution in the Catalytic Domain. Int J Mol Sci 2022; 23:ijms232012281. [PMID: 36293137 PMCID: PMC9603577 DOI: 10.3390/ijms232012281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Revised: 10/07/2022] [Accepted: 10/11/2022] [Indexed: 11/05/2022] Open
Abstract
Cyclin-dependent kinase-like 5 (CDKL5) is a serine/threonine protein kinase whose pathological mutations cause CDKL5 deficiency disorder. Most missense mutations are concentrated in the catalytic domain. Therefore, anticipating whether mutations in this region affect CDKL5 function is informative for clinical diagnosis. This study comprehensively predicted the pathogenicity of all 5700 missense substitutions in the catalytic domain of CDKL5 using in silico analysis and evaluated the accuracy of those predictions. Each missense substitution was classified as “pathogenic” or “benign”. The in silico tools PolyPhen-2 (HumDiv and HumVar modes), PROVEAN, and SIFT were applied individually or in combination with one another, and their performance was assessed using 36 previously reported mutations as a reference. Substitutions predicted as pathogenic were over 88.0% accurate using each of the three tools. The best performance (accuracy, 97.2%; sensitivity, 100%; specificity, 66.7%; and Matthews correlation coefficient (MCC), 0.804) was achieved by combining PolyPhen-2 HumDiv, PolyPhen-2 HumVar, and PROVEAN. This provides comprehensive information for predicting the pathogenicity of CDKL5 substitutions, which might be used as an aid for clinical diagnosis.
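The four scores reported in this abstract can all be derived from one confusion matrix. The sketch below is a worked check, not the authors' code. The abstract gives only the total of 36 reference mutations, so the counts used (33 true positives, 0 false negatives, 2 true negatives, 1 false positive) are an assumption: they are one split that reproduces the reported figures, and the function name is hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""

    def safe_div(numerator, denominator):
        # Return 0.0 instead of raising when a class is absent.
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "mcc": safe_div(tp * tn - fp * fn, mcc_denominator),
    }


# Assumed counts (not stated in the abstract): 33 pathogenic and 3 benign
# reference mutations, with a single benign variant misclassified.
metrics = binary_metrics(tp=33, fp=1, tn=2, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts the accuracy is 35/36, the specificity is 2/3, and the MCC is 66/√6732, which round to the 97.2%, 66.7% and 0.804 quoted above. The low specificity alongside high accuracy shows why MCC is reported for such an imbalanced reference set.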
|
25
|
Prado MJ, Ligabue-Braun R, Zaha A, Rossetti MLR, Pandey AV. Variant predictions in congenital adrenal hyperplasia caused by mutations in CYP21A2. Front Pharmacol 2022; 13:931089. [PMID: 36278220 PMCID: PMC9579345 DOI: 10.3389/fphar.2022.931089] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/02/2022] [Accepted: 09/14/2022] [Indexed: 11/13/2022] Open
Abstract
CYP21A2 deficiency accounts for 95% of cases of congenital adrenal hyperplasia (CAH), a group of genetic disorders that affect steroid biosynthesis. Genetic and functional analyses provide critical tools to elucidate complex CAH cases. One of the most accessible tools to infer the pathogenicity of new variants is in silico prediction. Here, we analyzed the performance of in silico prediction tools in categorizing missense single nucleotide variants (SNVs) of CYP21A2. SNVs of CYP21A2 characterized in vitro by functional assays were selected to assess the performance of online single and meta predictors. SNVs were tested separately or in combination with the related phenotype (severe or mild CAH form). In total, 103 SNVs of CYP21A2 (90 pathogenic and 13 neutral) were used to test the performance of 13 single predictors and four meta-predictors. All SNVs associated with the severe phenotypes were well categorized by all tools, with accuracy between 0.69 (PredictSNP2) and 0.97 (CADD), and Matthews correlation coefficient (MCC) between 0.49 (PredictSNP2) and 0.90 (CADD). However, SNVs related to the mild phenotype showed more variation, with accuracy between 0.47 (S3Ds&GO and MAPP) and 0.88 (CADD), and MCC between 0.18 (MAPP) and 0.71 (CADD). From our analysis, we identified four predictors of CYP21A2 variant pathogenicity with good performance: CADD, ConSurf, DANN, and PolyPhen2. These results can be used in future analyses to infer the impact of uncharacterized SNVs in CYP21A2.
Affiliation(s)
- Mayara J. Prado
  - Graduate Program in Cell and Molecular Biology, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
  - Center for Biotechnology, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
  - Translational Hormone Research, Department of Biomedical Research, University of Bern, Bern, Switzerland
  - Pediatric Endocrinology Unit, Department of Pediatrics, University Children’s Hospital Bern, Bern, Switzerland
  - *Correspondence: Mayara J. Prado; Amit V. Pandey
- Rodrigo Ligabue-Braun
  - Department of Pharmacosciences, Universidade Federal de Ciências da Saúde de Porto Alegre (UFCSPA), Porto Alegre, Brazil
- Arnaldo Zaha
  - Graduate Program in Cell and Molecular Biology, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
  - Center for Biotechnology, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
- Maria Lucia Rosa Rossetti
  - Graduate Program in Cell and Molecular Biology, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
  - Graduate Program in Molecular Biology Applied to Health, Universidade Luterana do Brasil (ULBRA), Canoas, Brazil
- Amit V. Pandey
  - Translational Hormone Research, Department of Biomedical Research, University of Bern, Bern, Switzerland
  - Pediatric Endocrinology Unit, Department of Pediatrics, University Children’s Hospital Bern, Bern, Switzerland
  - *Correspondence: Mayara J. Prado; Amit V. Pandey
|
26
|
Liu Y, Yeung WSB, Chiu PCN, Cao D. Computational approaches for predicting variant impact: An overview from resources, principles to applications. Front Genet 2022; 13:981005. [PMID: 36246661 PMCID: PMC9559863 DOI: 10.3389/fgene.2022.981005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Accepted: 08/08/2022] [Indexed: 11/13/2022] Open
Abstract
One objective of human genetics is to unveil the variants that contribute to human diseases. With the rapid development and wide use of next-generation sequencing (NGS), massive amounts of genomic sequence data have been created, making personal genetic information available. Conventional experimental evidence is critical in establishing the relationship between sequence variants and phenotype, but it is obtained with low efficiency. Because comprehensive databases and resources presenting clinical and experimental evidence on genotype-phenotype relationships are lacking, and because variants found by NGS keep accumulating, many computational tools that predict the impact of variants on phenotype have been developed to bridge the gap. In this review, we present a brief introduction to and discussion of the computational approaches for variant impact prediction. We focus mainly on approaches for predicting the impact of non-synonymous variants (nsSNVs) and categorize them into six classes. Their underlying rationale and constraints, together with the concerns and remedies raised by comparative studies, are discussed. We also present how the predictive approaches are employed in different areas of research. Although diverse constraints exist, computational predictive approaches are indispensable in exploring genotype-phenotype relationships.
Affiliation(s)
- Ye Liu
  - Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- William S. B. Yeung
  - Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
  - Department of Obstetrics and Gynaecology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Philip C. N. Chiu
  - Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
  - Department of Obstetrics and Gynaecology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
  - *Correspondence: Philip C. N. Chiu; Dandan Cao
- Dandan Cao
  - Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
  - *Correspondence: Philip C. N. Chiu; Dandan Cao
|
27
|
Yang Y, Zhao J, Zeng L, Vihinen M. ProTstab2 for Prediction of Protein Thermal Stabilities. Int J Mol Sci 2022; 23:ijms231810798. [PMID: 36142711 PMCID: PMC9505338 DOI: 10.3390/ijms231810798] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/19/2022] [Revised: 09/12/2022] [Accepted: 09/13/2022] [Indexed: 11/16/2022] Open
Abstract
The stability of proteins is an essential property that has several biological implications. Knowledge about protein stability is important in many ways, ranging from protein purification and structure determination to stability in cells and biotechnological applications. Experimental determination of thermal stabilities has been tedious and available data have been limited. The introduction of limited proteolysis and mass spectrometry approaches has facilitated more extensive cellular protein stability data production. We collected melting temperature information for 34,913 proteins and developed a machine learning predictor, ProTstab2, by utilizing a gradient boosting algorithm after testing seven algorithms. The method performance was assessed on a blind test data set and showed a Pearson correlation coefficient of 0.753 and root mean square error of 7.005. Comparison to previous methods indicated that ProTstab2 had superior performance. The method is fast, so it was applied to predict and compare the stabilities of all proteins in human, mouse, and zebrafish proteomes for which experimental data were not determined. The tool is freely available.
Affiliation(s)
- Yang Yang
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
- Jianjun Zhao
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Lianjie Zeng
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Mauno Vihinen
  - Department of Experimental Medical Science, BMC B13, Lund University, SE-22184 Lund, Sweden
  - Correspondence: Mauno Vihinen
|
28
|
Behrendt A, Golchin P, König F, Mulnaes D, Stalke A, Dröge C, Keitel V, Gohlke H. Vasor: Accurate prediction of variant effects for amino acid substitutions in multidrug resistance protein 3. Hepatol Commun 2022; 6:3098-3111. [PMID: 36111625 PMCID: PMC9592774 DOI: 10.1002/hep4.2088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2022] [Revised: 07/26/2022] [Accepted: 08/16/2022] [Indexed: 12/14/2022] Open
Abstract
The phosphatidylcholine floppase multidrug resistance protein 3 (MDR3) is an essential hepatobiliary transport protein. MDR3 dysfunction is associated with various liver diseases, ranging from severe progressive familial intrahepatic cholestasis to transient forms of intrahepatic cholestasis of pregnancy and familial gallstone disease. Single amino acid substitutions are often found to cause dysfunction, but identifying the substitution effect in in vitro studies is time- and cost-intensive. We developed variant assessor of MDR3 (Vasor), a machine learning-based model to classify novel MDR3 missense variants as benign or pathogenic. Vasor was trained on the largest data set to date that is specific for benign and pathogenic variants of MDR3. As input, it uses general predictors, namely Evolutionary Models of Variant Effects (EVE), EVmutation, PolyPhen-2, I-Mutant2.0, MUpro, MAESTRO, and PON-P2, along with other variant properties, such as half-sphere exposure and posttranslational modification site. Vasor consistently outperformed the integrated general predictors and the external prediction tool MutPred2, leading to the current best prediction performance for MDR3 single-site missense variants (on an external test set: F1-score, 0.90; Matthews correlation coefficient, 0.80). Furthermore, Vasor predictions cover the entire sequence space of MDR3. Vasor is accessible as a webserver at https://cpclab.uni-duesseldorf.de/mdr3_predictor/ for users to rapidly obtain prediction results and a visualization of the substitution site within the MDR3 structure. The MDR3-specific prediction tool Vasor can provide reliable predictions of single-site amino acid substitutions, giving users a fast way to initially assess whether a variant is benign or pathogenic.
Affiliation(s)
- Annika Behrendt
  - Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Pegah Golchin
  - Department of Electrical Engineering and Information Technology, Technische Universität Darmstadt, Darmstadt, Germany
- Filip König
  - Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Daniel Mulnaes
  - Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Amelie Stalke
  - Department of Human Genetics, Hannover Medical School, Hannover, Germany
  - Division of Kidney, Department of Pediatric Gastroenterology and Hepatology, Liver, and Metabolic Diseases, Hannover Medical School, Hannover, Germany
- Carola Dröge
  - Department for Gastroenterology, Hepatology, and Infectious Diseases, Medical Faculty, Otto von Guericke University, Magdeburg, Germany
  - Department for Gastroenterology, Hepatology, and Infectious Diseases, University Hospital, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Verena Keitel
  - Department for Gastroenterology, Hepatology, and Infectious Diseases, Medical Faculty, Otto von Guericke University, Magdeburg, Germany
  - Department for Gastroenterology, Hepatology, and Infectious Diseases, University Hospital, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Holger Gohlke
  - Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - John‐von‐Neumann‐Institute for Computing, Jülich Supercomputing Center, Institute of Biological Information Processing (IBI‐7: Structural Biochemistry), and Institute of Bio‐ and Geosciences (IBG‐4: Bioinformatics), Forschungszentrum Jülich GmbH, Jülich, Germany
|
29
|
Khalilzad Z, Kheddache Y, Tadj C. An Entropy-Based Architecture for Detection of Sepsis in Newborn Cry Diagnostic Systems. ENTROPY (BASEL, SWITZERLAND) 2022; 24:1194. [PMID: 36141080 PMCID: PMC9498202 DOI: 10.3390/e24091194] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/22/2022] [Revised: 08/18/2022] [Accepted: 08/22/2022] [Indexed: 06/16/2023]
Abstract
The acoustic characteristics of cries reflect an infant's health condition, and these characteristics have been acknowledged as indicators of various pathologies. This study focused on the detection of infants suffering from sepsis by developing a simplified design using acoustic features and conventional classifiers. The features for the proposed framework were Mel-frequency Cepstral Coefficients (MFCC), Spectral Entropy Cepstral Coefficients (SENCC) and Spectral Centroid Cepstral Coefficients (SCCC), which were classified with K-Nearest Neighbors (KNN) and Support Vector Machine (SVM) classifiers. The performance of different combinations of the feature sets was also evaluated using several measures, such as accuracy, F1-score and Matthews Correlation Coefficient (MCC). Bayesian Hyperparameter Optimization (BHPO) was employed to tune the classifiers for each experiment. The proposed methodology was tested on two datasets, of expiratory cries (EXP) and voiced inspiratory cries (INSV). The highest accuracy and F-score were 89.99% and 89.70%, respectively. As a final experiment, the framework also implemented a novel feature selection method based on Fuzzy Entropy (FE). By employing FE, the number of features was reduced by more than 40%, while the evaluation measures were not degraded for the EXP dataset and were even improved for the INSV dataset. Therefore, it was deduced through these experiments that an entropy-based framework is successful for identifying sepsis in neonates and has the advantage of achieving high performance with conventional machine learning (ML) approaches, which makes it a reliable means for the early diagnosis of sepsis in deprived areas of the world.
|
30
|
Exploring the predictive capability of machine learning models in identifying foot and mouth disease outbreak occurrences in cattle farms in an endemic setting of Thailand. Prev Vet Med 2022; 207:105706. [DOI: 10.1016/j.prevetmed.2022.105706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 06/09/2022] [Accepted: 07/01/2022] [Indexed: 11/20/2022]
|
31
|
Yang Y, Shao A, Vihinen M. PON-All: Amino Acid Substitution Tolerance Predictor for All Organisms. Front Mol Biosci 2022; 9:867572. [PMID: 35782867 PMCID: PMC9245922 DOI: 10.3389/fmolb.2022.867572] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/01/2022] [Accepted: 05/02/2022] [Indexed: 01/08/2023] Open
Abstract
Genetic variations are investigated in humans and many other organisms for many purposes (e.g., to aid in clinical diagnosis). Interpretation of the identified variations can be challenging. Although some dedicated prediction methods have been developed and some tools for human variants can also be used for other organisms, their performance and species range have been limited. We developed a novel variant pathogenicity/tolerance predictor for amino acid substitutions in any organism. The method, PON-All, is a machine learning tool trained on human, animal, and plant variants. Two versions are provided, one with Gene Ontology (GO) annotations and another without them, because GO annotations are unavailable or only partial for many organisms of interest. The methods provide predictions for three classes: pathogenic, benign, and variants of unknown significance. On the blind test, when using GO annotations, accuracy was 0.913 and MCC 0.827. When GO features were not used, accuracy was 0.856 and MCC 0.712. Performance is best for human and plant variants and somewhat lower for animal variants because the number of known disease-causing variants in animals is rather small. The method was compared to several other tools and was found to have superior performance. PON-All is freely available at http://structure.bmc.lu.se/PON-All and http://8.133.174.28:8999/.
Affiliation(s)
- Yang Yang
  - School of Computer Science and Technology, Soochow University, Suzhou, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, China
- Aibin Shao
  - School of Computer Science and Technology, Soochow University, Suzhou, China
- Mauno Vihinen
  - Department of Experimental Medical Science, Lund University, Lund, Sweden
  - *Correspondence: Mauno Vihinen
|
32
|
Predicting Parkinson disease related genes based on PyFeat and gradient boosted decision tree. Sci Rep 2022; 12:10004. [PMID: 35705654 PMCID: PMC9200794 DOI: 10.1038/s41598-022-14127-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2022] [Accepted: 06/01/2022] [Indexed: 11/10/2022] Open
Abstract
Identifying genes related to Parkinson’s disease (PD) is an active research topic in biomedical analysis and plays a critical role in diagnosis and treatment. Recently, many studies have proposed different techniques for predicting disease-related genes. However, few of these techniques are designed or developed for PD gene prediction. Most of the PD techniques identify only protein-coding genes and discard long noncoding RNA (lncRNA) genes, which play an essential role in biological processes and in the transformation and development of diseases. This paper proposes a novel prediction system to identify protein and lncRNA genes related to PD that can aid in early diagnosis. First, we preprocessed the genes into DNA FASTA sequences from the University of California Santa Cruz (UCSC) genome browser and removed the redundancies. Second, we extracted significant features from the DNA FASTA sequences using the PyFeat method, with AdaBoost for feature selection. These selected features achieved promising results compared with features extracted by some state-of-the-art feature extraction techniques. Finally, the features were fed to a gradient-boosted decision tree (GBDT) to classify the tested cases. Seven performance metrics were used to evaluate the proposed system. It achieved an average accuracy of 78.6%, an area under the curve of 84.5%, an area under the precision-recall curve (AUPR) of 85.3%, an F1-score of 78.3%, a Matthews correlation coefficient (MCC) of 0.575, a sensitivity (SEN) of 77.1%, and a specificity (SPC) of 80.2%. The experiments demonstrate promising results compared with other systems. The predicted top-ranked protein and lncRNA genes are verified based on a literature review.
|
33
|
Yazar M, Ozbek P. Assessment of 13 in silico pathogenicity methods on cancer-related variants. Comput Biol Med 2022; 145:105434. [DOI: 10.1016/j.compbiomed.2022.105434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Revised: 03/04/2022] [Accepted: 03/20/2022] [Indexed: 11/03/2022]
|
34
|
Kebabci N, Timucin AC, Timucin E. Toward Compilation of Balanced Protein Stability Data Sets: Flattening the ΔΔG Curve through Systematic Enrichment. J Chem Inf Model 2022; 62:1345-1355. [PMID: 35201762 DOI: 10.1021/acs.jcim.2c00054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Often studies analyzing stability data sets and/or predictors ignore neutral mutations and use a binary classification scheme labeling only destabilizing and stabilizing mutations. Recognizing that highly concentrated neutral mutations interfere with data set quality, we have explored three protein stability data sets: S2648, PON-tstab, and the symmetric Ssym that differ in size and quality. A characteristic leptokurtic shape in the ΔΔG distributions of all three data sets including the curated and symmetric ones was reported due to concentrated neutral mutations. To further investigate the impact of neutral mutations on ΔΔG predictions, we have comprehensively assessed the performance of 11 predictors on the PON-tstab data set. Correlation and error analyses showed that all of the predictors performed the best on the neutral mutations, while their performance became gradually worse as the ΔΔG of the mutations departed further from the neutral zone regardless of the direction, implying a bias toward dense mutations. To this end, after unraveling the role of concentrated neutral mutations in biases of stability data sets, we described a systematic enrichment approach to balance the ΔΔG distributions. Before enrichment, mutations were clustered based on their biochemical and/or structural features, and then three mutations were selected from every 2 kcal/mol of each cluster. Upon implementation of this approach by distinct clustering schemes, we generated five subsets varying in size and ΔΔG distributions. All subsets showed improved ΔΔG and frequency distributions. We ultimately reported that the errors toward enriched subsets were higher than those toward the parent data sets, confirming the enrichment of difficult-to-predict mutations in the subsets. In summary, we elaborated the prediction bias toward a concentrated neutral zone and also implemented a rational strategy to tackle this and other forms of biases. 
Ultimately, by giving an extended view of the shortcomings of stability data sets, this study is a step toward the development of an unbiased predictor.
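The enrichment step described in this abstract, selecting three mutations from every 2 kcal/mol interval of each cluster, can be sketched roughly as below. This is an illustration under stated assumptions rather than the authors' procedure: the abstract does not say how the three mutations are chosen within an interval or where interval boundaries fall, so seeded random sampling and bins aligned at 0 kcal/mol are assumed here, and the field names (`cluster`, `ddg`) and toy data are hypothetical.

```python
import math
import random
from collections import defaultdict


def enrich(mutations, bin_width=2.0, per_bin=3, seed=0):
    """Keep at most `per_bin` mutations per (cluster, ddG interval) group."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    groups = defaultdict(list)
    for mutation in mutations:
        # Interval index: floor(ddG / width), so [0, 2) -> 0, [-2, 0) -> -1.
        bin_index = math.floor(mutation["ddg"] / bin_width)
        groups[(mutation["cluster"], bin_index)].append(mutation)

    selected = []
    for key in sorted(groups):
        members = groups[key]
        # Assumed selection rule: uniform random choice within the group.
        selected.extend(rng.sample(members, min(per_bin, len(members))))
    return selected


# Hypothetical toy data: many near-neutral mutations and two extreme ones.
toy = [{"id": i, "cluster": "c1", "ddg": 0.1 * (i % 10)} for i in range(20)]
toy.append({"id": 100, "cluster": "c1", "ddg": 3.5})
toy.append({"id": 101, "cluster": "c2", "ddg": -4.2})
subset = enrich(toy)
```

In the toy data, the 20 near-neutral mutations collapse to three representatives while both extreme mutations are retained, which is the flattening effect on the ΔΔG distribution that the abstract describes.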
Affiliation(s)
- Narod Kebabci
  - Department of Biostatistics and Bioinformatics, Institute of Health Sciences, Acibadem University, Istanbul 34752, Turkey
- Ahmet Can Timucin
  - Department of Molecular Biology and Genetics, Faculty of Arts and Sciences, Acibadem University, Istanbul 34752, Turkey
- Emel Timucin
  - Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem University, Istanbul 34752, Turkey
|
35
|
Khan MNA, Miah MSU, Shahjalal M, Sarwar TB, Rokon MS. Predicting Young Imposter Syndrome Using Ensemble Learning. COMPLEXITY 2022; 2022:1-10. [DOI: 10.1155/2022/8306473] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 09/01/2023]
Abstract
Background. Imposter syndrome (IS), associated with self-doubt and fear despite clear accomplishments and competencies, is frequently detected in medical students and has a negative impact on their well-being. This study aimed to predict students’ IS using a machine learning ensemble approach. Methods. This was a cross-sectional study among medical students in Bangladesh. Data were collected from February to July 2020 through snowball sampling across medical colleges in Bangladesh. We employed three machine learning techniques (neural network, random forest, and ensemble learning) and compared their accuracy in predicting IS. Results. In total, 500 students completed the questionnaire. We used the YIS scale to determine the presence of IS among medical students. The ensemble model had the highest accuracy, 96.4%, while the individual accuracies of the random forest and the neural network were 93.5% and 96.3%, respectively. We used different performance metrics to compare the results of the models. Finally, we compared feature importance scores between the neural network and random forest models. The top feature of the neural network model is Y7, and the top feature of the random forest model is Y2, which is second among the top features of the neural network model. Conclusions. Imposter syndrome is an emerging mental illness in Bangladesh and requires the immediate attention of researchers. For instance, in order to reduce the impact of IS, identifying the key factors responsible for IS is an important step. Machine learning methods can be employed to identify the potential sources responsible for IS. Similarly, determining how each factor contributes to the IS condition among medical students could be a potential future direction.
Affiliation(s)
- Md. Nafiul Alam Khan
- Institute of Mathematical Sciences, Faculty of Science, University of Malaya (UM), Kuala Lumpur, Malaysia
| | - M. Saef Ullah Miah
- Faculty of Computing, College of Computing and Applied Sciences, Universiti Malaysia Pahang (UMP), Pekan 26600, Malaysia
| | - Md. Shahjalal
- Department of Public Health, North South University (NSU), Dhaka, Bangladesh
| | - Talha Bin Sarwar
- Department of Computer Science, Faculty of Science and Technology, American International University Bangladesh (AIUB), Dhaka, Bangladesh
| | - Md. Shahariar Rokon
- Applied Statistics and Data Science, Department of Statistics, Jahangirnagar University (JU), Savar, Bangladesh
| |
|
36
|
Houskeeper HF, Rosenthal IS, Cavanaugh KC, Pawlak C, Trouille L, Byrnes JEK, Bell TW, Cavanaugh KC. Automated satellite remote sensing of giant kelp at the Falkland Islands (Islas Malvinas). PLoS One 2022; 17:e0257933. [PMID: 34990455 PMCID: PMC8735600 DOI: 10.1371/journal.pone.0257933] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/09/2021] [Accepted: 12/20/2021] [Indexed: 11/30/2022] Open
Abstract
Giant kelp populations that support productive and diverse coastal ecosystems at temperate and subpolar latitudes of both hemispheres are vulnerable to changing climate conditions as well as direct human impacts. Observations of giant kelp forests are spatially and temporally uneven, with disproportionate coverage in the northern hemisphere, despite the size and comparable density of southern hemisphere kelp forests. Satellite imagery enables the mapping of existing and historical giant kelp populations in understudied regions, but automating the detection of giant kelp using satellite imagery requires approaches that are robust to the optical complexity of the shallow, nearshore environment. We present and compare two approaches for automating the detection of giant kelp in satellite datasets: one based on crowd sourcing of satellite imagery classifications and another based on a decision tree paired with a spectral unmixing algorithm (automated using Google Earth Engine). Both approaches are applied to satellite imagery (Landsat) of the Falkland Islands or Islas Malvinas (FLK), an archipelago in the southern Atlantic Ocean that supports expansive giant kelp ecosystems. The performance of each method is evaluated by comparing the automated classifications with a subset of expert-annotated imagery (8 images spanning the majority of our continuous timeseries, cumulatively covering over 2,700 km of coastline, and including all relevant sensors). Using the remote sensing approaches evaluated herein, we present the first continuous timeseries of giant kelp observations in the FLK region using Landsat imagery spanning over three decades. We do not detect evidence of long-term change in the FLK region, although we observe a recent decline in total canopy area from 2017-2021. 
Using a nitrate model based on nearby ocean state measurements obtained from ships and incorporating satellite sea surface temperature products, we find that the area of giant kelp forests in the FLK region is positively correlated with the nitrate content observed during the prior year. Our results indicate that giant kelp classifications using citizen science are approximately consistent with classifications based on a state-of-the-art automated spectral approach. Despite differences in accuracy and sensitivity, both approaches find high interannual variability that impedes the detection of potential long-term changes in giant kelp canopy area, although recent canopy area declines are notable and should continue to be monitored carefully.
Affiliation(s)
- Henry F. Houskeeper
- Department of Geography, University of California Los Angeles, Los Angeles, California, United States of America
| | - Isaac S. Rosenthal
- School for the Environment, University of Massachusetts Boston, Boston, Massachusetts, United States of America
| | - Katherine C. Cavanaugh
- Department of Geography, University of California Los Angeles, Los Angeles, California, United States of America
| | - Camille Pawlak
- Department of Geography, University of California Los Angeles, Los Angeles, California, United States of America
| | - Laura Trouille
- The Adler Planetarium, Chicago, Illinois, United States of America
| | - Jarrett E. K. Byrnes
- Department of Biology, University of Massachusetts Boston, Boston, Massachusetts, United States of America
| | - Tom W. Bell
- Applied Ocean Physics and Engineering, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, United States of America
| | - Kyle C. Cavanaugh
- Department of Geography, University of California Los Angeles, Los Angeles, California, United States of America
| |
|
37
|
Wafula EK, Zhang H, Von Kuster G, Leebens-Mack JH, Honaas LA, dePamphilis CW. PlantTribes2: Tools for comparative gene family analysis in plant genomics. FRONTIERS IN PLANT SCIENCE 2022; 13:1011199. [PMID: 36798801 PMCID: PMC9928214 DOI: 10.3389/fpls.2022.1011199] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/03/2022] [Accepted: 12/02/2022] [Indexed: 05/12/2023]
Abstract
Plant genome-scale resources are being generated at an increasing rate as sequencing technologies continue to improve and raw data costs continue to fall; however, the cost of downstream analyses remains large. This has resulted in a considerable range of genome assembly and annotation qualities across plant genomes due to their varying sizes, complexity, and the technology used for the assembly and annotation. To effectively work across genomes, researchers increasingly rely on comparative genomic approaches that integrate across plant community resources and data types. Such efforts have aided the genome annotation process and yielded novel insights into the evolutionary history of genomes and gene families, including complex non-model organisms. The essential tools to achieve these insights rely on gene family analysis at a genome-scale, but they are not well integrated for rapid analysis of new data, and the learning curve can be steep. Here we present PlantTribes2, a scalable, easily accessible, highly customizable, and broadly applicable gene family analysis framework with multiple entry points including user provided data. It uses objective classifications of annotated protein sequences from existing, high-quality plant genomes for comparative and evolutionary studies. PlantTribes2 can improve transcript models and then sort them, either genome-scale annotations or individual gene coding sequences, into pre-computed orthologous gene family clusters with rich functional annotation information. Then, for gene families of interest, PlantTribes2 performs downstream analyses and customizable visualizations including, (1) multiple sequence alignment, (2) gene family phylogeny, (3) estimation of synonymous and non-synonymous substitution rates among homologous sequences, and (4) inference of large-scale duplication events. 
We give examples of PlantTribes2 applications in functional genomic studies of economically important plant families, namely transcriptomics in the weedy Orobanchaceae and a core orthogroup analysis (CROG) in Rosaceae. PlantTribes2 is freely available for use within the main public Galaxy instance and can be downloaded from GitHub or Bioconda. Importantly, PlantTribes2 can be readily adapted for use with genomic and transcriptomic data from any kind of organism.
Affiliation(s)
- Eric K Wafula
- Department of Biology, The Pennsylvania State University, University Park, PA, United States
| | - Huiting Zhang
- Tree Fruit Research Laboratory, United States Department of Agriculture (USDA), Agricultural Research Service (ARS), Wenatchee, WA, United States
- Department of Horticulture, Washington State University, Pullman, WA, United States
| | - Gregory Von Kuster
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, United States
| | | | - Loren A Honaas
- Tree Fruit Research Laboratory, United States Department of Agriculture (USDA), Agricultural Research Service (ARS), Wenatchee, WA, United States
| | - Claude W dePamphilis
- Department of Biology, The Pennsylvania State University, University Park, PA, United States
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, United States
| |
|
38
|
Kanda K, Blythe S, Grace R, Elcombe E, Kemp L. Variations in sustained home visiting care for mothers and children experiencing adversity. Public Health Nurs 2021; 39:71-81. [PMID: 34862813 PMCID: PMC9299687 DOI: 10.1111/phn.13014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/27/2021] [Revised: 11/01/2021] [Accepted: 11/03/2021] [Indexed: 11/30/2022]
Abstract
Objective This study aimed to examine the variations in care received by mothers and families within a sustained home visiting program. We sought to identify the extent to which there were variations in home visiting care in response to the program schedule and families’ risk factors. Design and sample Data collected within the right@home program, a randomized controlled trial (RCT) for a sustained nurse home visiting intervention in Australia, were analyzed. A total of 352 women comprised the intervention arm of the trial. Measurements Visit content in the home visiting program, sociodemographic data, and families’ risk factors were used for analysis. Results Our results confirmed that the majority of women received scheduled content on time or within an acceptable timeframe, except for the sleeping program. Women with identified risks were significantly more likely to receive content related to those risks than women without those risks (smoking: Odds Ratio [OR] = 15.39 [95%CI 3.7–64.7], mental health: OR = 15.04 [1.8–124.0], domestic violence: OR = 4.07 [2.0–8.3], and drugs and alcohol: OR = 1.81 [1.1–3.0]). Conclusions The right@home program had high compliance with the scheduled content. Capacity development in responding to mothers with the risk of domestic violence and drugs and alcohol is recommended. Further research is required to explore the relationship between variations in care and critical outcomes.
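The odds ratios and 95% confidence intervals quoted above follow the usual reporting pattern for a binary exposure and a binary outcome. The abstract does not state how they were estimated (a regression model is plausible), so the Python below is only a sketch of the simplest case: an unadjusted odds ratio from a 2x2 table with a Wald interval on the log scale. The function name `odds_ratio_ci` and the counts are invented for illustration and do not come from the right@home data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: risk present, content received     b: risk present, content not received
    c: risk absent,  content received     d: risk absent,  content not received
    All four cells must be non-zero. z=1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts, for illustration only
or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {or_:.2f} [95% CI {lo:.2f}-{hi:.2f}]")
```

Small cell counts widen the interval sharply, which is one reason an estimate such as the mental health odds ratio above can carry a very wide confidence interval.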
Affiliation(s)
- Kie Kanda
- School of Nursing and Midwifery, Western Sydney University, Translational Research and Social Innovation group, Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia
| | - Stacy Blythe
- School of Nursing and Midwifery, Western Sydney University, Translational Research and Social Innovation group, Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia
| | - Rebekah Grace
- Transforming early Education and Child Health, Translational Health Research Institute, Western Sydney University, Campbelltown, NSW, Australia
| | - Emma Elcombe
- School of Nursing and Midwifery, Western Sydney University, Translational Research and Social Innovation group, Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia
| | - Lynn Kemp
- School of Nursing and Midwifery, Western Sydney University, Translational Research and Social Innovation group, Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia
| |
|
39
|
Kolbert Z, Lindermayr C. Computational prediction of NO-dependent posttranslational modifications in plants: Current status and perspectives. PLANT PHYSIOLOGY AND BIOCHEMISTRY : PPB 2021; 167:851-861. [PMID: 34536898 DOI: 10.1016/j.plaphy.2021.09.011] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/09/2021] [Revised: 09/04/2021] [Accepted: 09/08/2021] [Indexed: 05/11/2023]
Abstract
The perception and transduction of the nitric oxide (NO) signal is achieved by NO-dependent posttranslational modifications (PTMs), among which S-nitrosation and tyrosine nitration have biological significance. In plants, 100-1000 S-nitrosated and tyrosine nitrated proteins have been identified so far by mass spectrometry. The determination of NO-modified protein targets/amino acid residues is often methodologically challenging. In the past decade, the growing demand for knowledge of S-nitrosated or tyrosine nitrated sites has motivated the introduction of bioinformatics tools. For predicting S-nitrosation, seven computational tools have been developed (GPS-SNO, SNOSite, iSNO-PseACC, iSNO-AAPAir, PSNO, PreSNO, RecSNO). Four predictors have been developed for indicating tyrosine nitration sites (GPS-YNO2, iNitro-Tyr, PredNTS, iNitroY-Deep), and one tool (DeepNitro) predicts both NO-dependent PTMs. The advantage of these computational tools is the fast provision of a large amount of information. In this review, the available software tools have been tested on plant proteins in which S-nitrosated or tyrosine nitrated sites have been experimentally identified. The predictors showed distinct performance, and there were differences from the experimental results, partly because the three-dimensional protein structure is not taken into account by the computational tools. Nevertheless, the predictors excellently establish experiments, and it is suggested to apply all available tools on target proteins and compare their results. In the future, computational prediction must be developed further to improve the precision with which S-nitrosation/tyrosine nitration sites are identified.
Affiliation(s)
- Zsuzsanna Kolbert
- Department of Plant Biology, University of Szeged, Közép fasor 52, 6726, Szeged, Hungary.
| | - Christian Lindermayr
- Institute of Biochemical Plant Pathology, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstaedter Landstr. 1, D-85764, Oberschleißheim, München, Germany.
| |
|
40
|
Khoruddin NA, Noorizhab MN, Teh LK, Mohd Yusof FZ, Salleh MZ. Pathogenic nsSNPs that increase the risks of cancers among the Orang Asli and Malays. Sci Rep 2021; 11:16158. [PMID: 34373545 PMCID: PMC8352870 DOI: 10.1038/s41598-021-95618-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/25/2021] [Accepted: 07/26/2021] [Indexed: 02/07/2023] Open
Abstract
Single-nucleotide polymorphisms (SNPs) are the most common genetic variations for various complex human diseases, including cancers. Genome-wide association studies (GWAS) have identified numerous SNPs that increase the risks of cancers such as breast cancer, colorectal cancer, and leukemia. These SNPs were cataloged for scientific use. However, GWAS are often conducted on certain populations in which the Orang Asli and Malays were not included. Therefore, we have developed a bioinformatic pipeline to mine the whole-genome sequence databases of the Orang Asli and Malays to determine the presence of pathogenic SNPs that might increase the risks of cancers among them. Five different in silico tools, SIFT, PROVEAN, PolyPhen-2, Condel, and PANTHER, were used to predict and assess the functional impacts of the SNPs. Out of the 80 cancer-related nsSNPs from the GWAS dataset, 52 nsSNPs were found among the Orang Asli and Malays. They were further analyzed using the bioinformatic pipeline to identify the pathogenic variants. Three nsSNPs, rs1126809 (TYR), rs10936600 (LRRC34), and rs757978 (FARP2), were found to be the most damaging cancer pathogenic variants. These mutations alter the protein interface and change the allosteric sites of the respective proteins. Because the TYR, LRRC34, and FARP2 genes play important roles in numerous cellular processes such as cell proliferation, differentiation, growth, and cell survival, any impairment of protein function could be involved in the development of cancer. rs1126809, rs10936600, and rs757978 are the important pathogenic variants that increase the risks of cancers among the Orang Asli and Malays. The roles and impacts of these variants in cancers will require further investigations using in vitro cancer models.
Affiliation(s)
- Nurul Ain Khoruddin
- Integrative Pharmacogenomics Institute (iPROMISE), Universiti Teknologi MARA (UiTM), Selangor Branch, Puncak Alam Campus, 42300, Puncak Alam, Selangor, Malaysia
- Faculty of Applied Sciences, Universiti Teknologi MARA (UiTM), Shah Alam Campus, Selangor, Malaysia
| | - Mohd NurFakhruzzaman Noorizhab
- Integrative Pharmacogenomics Institute (iPROMISE), Universiti Teknologi MARA (UiTM), Selangor Branch, Puncak Alam Campus, 42300, Puncak Alam, Selangor, Malaysia
- Faculty of Pharmacy, Universiti Teknologi MARA (UiTM), Selangor Branch, Puncak Alam Campus, 42300, Puncak Alam, Selangor, Malaysia
| | - Lay Kek Teh
- Integrative Pharmacogenomics Institute (iPROMISE), Universiti Teknologi MARA (UiTM), Selangor Branch, Puncak Alam Campus, 42300, Puncak Alam, Selangor, Malaysia
- Faculty of Pharmacy, Universiti Teknologi MARA (UiTM), Selangor Branch, Puncak Alam Campus, 42300, Puncak Alam, Selangor, Malaysia
| | - Farida Zuraina Mohd Yusof
- Integrative Pharmacogenomics Institute (iPROMISE), Universiti Teknologi MARA (UiTM), Selangor Branch, Puncak Alam Campus, 42300, Puncak Alam, Selangor, Malaysia
- Faculty of Applied Sciences, Universiti Teknologi MARA (UiTM), Shah Alam Campus, Selangor, Malaysia
| | - Mohd Zaki Salleh
- Integrative Pharmacogenomics Institute (iPROMISE), Universiti Teknologi MARA (UiTM), Selangor Branch, Puncak Alam Campus, 42300, Puncak Alam, Selangor, Malaysia.
- Faculty of Pharmacy, Universiti Teknologi MARA (UiTM), Selangor Branch, Puncak Alam Campus, 42300, Puncak Alam, Selangor, Malaysia.
| |
|
41
|
Prabhu BN, Kanchamreddy SH, Sharma AR, Bhat SK, Bhat PV, Kabekkodu SP, Satyamoorthy K, Rai PS. Conceptualization of functional single nucleotide polymorphisms of polycystic ovarian syndrome genes: an in silico approach. J Endocrinol Invest 2021; 44:1783-1793. [PMID: 33506367 PMCID: PMC8285346 DOI: 10.1007/s40618-021-01498-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/23/2020] [Accepted: 01/02/2021] [Indexed: 12/24/2022]
Abstract
PURPOSE Polycystic ovarian syndrome (PCOS) is a multi-faceted endocrinopathy frequently observed in reproductive-aged females, causing infertility. Cumulative evidence revealed that genetic and epigenetic variations, along with environmental factors, were linked with PCOS. Deciphering the molecular pathways of PCOS is quite complicated because only limited molecular information is available. Hence, to explore the influence of genetic variations in PCOS, we mapped the GWAS genes and performed a computational analysis to identify the SNPs and their impact on the coding and non-coding sequences. METHODS The causative genes of PCOS were searched using the GWAS catalog, and pathway analysis was performed using ClueGO. SNPs were extracted using the Ensembl genome browser, and missense variants were shortlisted. Further, the native and mutant forms of the deleterious SNPs were modeled using I-TASSER, Swiss-PdbViewer, and PyMOL. The MirSNP, PolymiRTS, and miRNASNP3 databases and the SNP2TFBS and SNPInspector databases were used to find SNPs in miRNA binding sites and transcription factor binding sites (TFBS), respectively. EnhancerDB and HaploReg were used to characterize enhancer SNPs. Linkage Disequilibrium (LD) analysis was performed using LDlink. RESULTS 25 PCOS genes showed interaction with 18 pathways. 7 SNPs were predicted to be deleterious using different pathogenicity predictions. 4 SNPs were found in the miRNA target site, TFBS, and enhancer sites and were in LD with reported PCOS GWAS SNPs. CONCLUSION Computational analysis of SNPs residing in PCOS genes may provide insight into complex molecular interactions among genes involved in PCOS pathophysiology. It may also aid in determining the causal variants and consequently contributing to predicting disease strategies.
Affiliation(s)
- B N Prabhu
- Department of Biotechnology, Manipal School of Life Sciences, MAHE, Manipal, Karnataka, India
| | - S H Kanchamreddy
- Department of Biotechnology, Manipal School of Life Sciences, MAHE, Manipal, Karnataka, India
| | - A R Sharma
- Department of Biotechnology, Manipal School of Life Sciences, MAHE, Manipal, Karnataka, India
| | - S K Bhat
- Department of Obstetrics and Gynaecology, Dr. T.M.A Pai Hospital, MMMC, MAHE, Manipal, Karnataka, India
| | - P V Bhat
- Department of Obstetrics and Gynaecology, Dr. T.M.A Pai Hospital, MMMC, MAHE, Manipal, Karnataka, India
| | - S P Kabekkodu
- Department of Cell and Molecular Biology, Manipal School of Life Sciences, MAHE, Manipal, Karnataka, India
| | - K Satyamoorthy
- Department of Cell and Molecular Biology, Manipal School of Life Sciences, MAHE, Manipal, Karnataka, India
| | - P S Rai
- Department of Biotechnology, Manipal School of Life Sciences, MAHE, Manipal, Karnataka, India.
| |
|
42
|
Yang Y, Zeng L, Vihinen M. PON-Sol2: Prediction of Effects of Variants on Protein Solubility. Int J Mol Sci 2021; 22:8027. [PMID: 34360790 PMCID: PMC8348231 DOI: 10.3390/ijms22158027] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 06/11/2021] [Revised: 07/19/2021] [Accepted: 07/22/2021] [Indexed: 01/13/2023] Open
Abstract
Genetic variations have a multitude of effects on proteins. A substantial number of variations affect protein-solvent interactions, either aggregation or solubility. Aggregation is often related to structural alterations, whereas solubilizable proteins in the solid phase can be made soluble again by dilution. Solubility is a central protein property and when reduced can lead to diseases. We developed a prediction method, PON-Sol2, to identify amino acid substitutions that increase, decrease, or have no effect on protein solubility. The method is a machine learning tool utilizing a gradient boosting algorithm and was trained on a large dataset of variants with different outcomes after the selection of features among a large number of tested properties. The method is fast and has high performance. The normalized correct prediction rate for three states is 0.656, and the normalized GC2 score is 0.312 in 10-fold cross-validation. The corresponding numbers in the blind test were 0.545 and 0.157. The performance was superior in comparison to previous methods. The PON-Sol2 predictor is freely available. It can be used to predict the solubility effects of variants for any organism, even in large-scale projects.
Affiliation(s)
- Yang Yang
- School of Computer Science and Technology, Soochow University, Suzhou 215006, China; (Y.Y.); (L.Z.)
| | - Lianjie Zeng
- School of Computer Science and Technology, Soochow University, Suzhou 215006, China; (Y.Y.); (L.Z.)
- Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
| | - Mauno Vihinen
- Department of Experimental Medical Science, Lund University, BMC B13, SE-221 84 Lund, Sweden
| |
|
43
|
Guéniche N, Huguet A, Bruyere A, Habauzit D, Le Hégarat L, Fardel O. Comparative in silico prediction of P-glycoprotein-mediated transport for 2010-2020 US FDA-approved drugs using six Web-tools. Biopharm Drug Dispos 2021; 42:393-398. [PMID: 34272891 DOI: 10.1002/bdd.2299] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/28/2021] [Revised: 06/28/2021] [Accepted: 07/08/2021] [Indexed: 01/08/2023]
Abstract
P-glycoprotein (P-gp) is an efflux pump implicated in pharmacokinetics and drug-drug interactions. The identification of its substrates is consequently an important issue, notably for drugs under development. For this purpose, various in silico methods have been developed, but their relevance remains to be fully established. The present study was designed to gain insight into this point by determining the performance of six freely accessible Web-tools (ADMETlab, AdmetSAR2.0, PgpRules, pkCSM, SwissADME and vNN-ADMET) that computationally predict P-gp-mediated transport. Using an external test set of 231 marketed drugs, approved over the 2010-2020 period by the US Food and Drug Administration and fully characterized in vitro for their P-gp substrate status, various performance parameters (including sensitivity, specificity, accuracy, Matthews correlation coefficient and area under the receiver operating characteristic curve) were determined. They were found to meet the criteria commonly required for acceptable prediction rather poorly, whether the Web-tools were used alone or in combination. Predictions of P-gp substrate or non-substrate status by these online in silico methods should therefore be considered with caution.
Affiliation(s)
- Nelly Guéniche
- Inserm, EHESP, IRSET (Institut de Recherche en Santé, Environnement et Travail), Université de Rennes, Rennes, France; Fougères Laboratory, Toxicology of Contaminants Unit, ANSES (French Agency for Food, Environmental and Occupational Health and Safety), Fougères, France
| | - Antoine Huguet
- Fougères Laboratory, Toxicology of Contaminants Unit, ANSES (French Agency for Food, Environmental and Occupational Health and Safety), Fougères, France
| | - Arnaud Bruyere
- Inserm, EHESP, IRSET (Institut de Recherche en Santé, Environnement et Travail), Université de Rennes, Rennes, France
| | - Denis Habauzit
- Fougères Laboratory, Toxicology of Contaminants Unit, ANSES (French Agency for Food, Environmental and Occupational Health and Safety), Fougères, France
| | - Ludovic Le Hégarat
- Fougères Laboratory, Toxicology of Contaminants Unit, ANSES (French Agency for Food, Environmental and Occupational Health and Safety), Fougères, France
| | - Olivier Fardel
- CHU Rennes, Inserm, EHESP, IRSET (Institut de Recherche en Santé, Environnement et Travail), Université de Rennes, Rennes, France
| |
|
44
|
Özkan S, Padilla N, de la Cruz X. Towards a New, Endophenotype-Based Strategy for Pathogenicity Prediction in BRCA1 and BRCA2: In Silico Modeling of the Outcome of HDR/SGE Assays for Missense Variants. Int J Mol Sci 2021; 22:6226. [PMID: 34207612 PMCID: PMC8229251 DOI: 10.3390/ijms22126226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2021] [Revised: 05/27/2021] [Accepted: 06/04/2021] [Indexed: 11/28/2022] Open
Abstract
The present limitations in the pathogenicity prediction of BRCA1 and BRCA2 (BRCA1/2) missense variants constitute an important problem with negative consequences for the diagnosis of hereditary breast and ovarian cancer. However, it has been proposed that the use of endophenotype predictions, i.e., computational estimates of the outcomes of functional assays, can be a good option to address this bottleneck. The application of this idea to the BRCA1/2 variants in the CAGI 5-ENIGMA international challenge has shown promising results. Here, we developed this approach, exploring the predictive performances of the regression models applied to the BRCA1/2 variants for which the values of the homology-directed DNA repair and saturation genome editing assays are available. Our results first showed that we can generate endophenotype estimates using a few molecular-level properties. Second, we show that the accuracy of these estimates is enough to obtain pathogenicity predictions comparable to those of many standard tools. Third, endophenotype-based predictions are complementary to, but do not outperform, those of a Random Forest model trained using variant pathogenicity annotations instead of endophenotype values. In summary, our results confirmed the usefulness of the endophenotype approach for the pathogenicity prediction of the BRCA1/2 missense variants, suggesting different options for future improvements.
Affiliation(s)
- Selen Özkan
- Research Unit in Clinical and Translational Bioinformatics, Vall d’Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, 08035 Barcelona, Spain; (S.Ö.); (N.P.)
| | - Natàlia Padilla
- Research Unit in Clinical and Translational Bioinformatics, Vall d’Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, 08035 Barcelona, Spain; (S.Ö.); (N.P.)
| | - Xavier de la Cruz
- Research Unit in Clinical and Translational Bioinformatics, Vall d’Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, 08035 Barcelona, Spain; (S.Ö.); (N.P.)
- Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain
| |
|
45
|
Adelaja A, Taylor B, Sheu KM, Liu Y, Luecke S, Hoffmann A. Six distinct NFκB signaling codons convey discrete information to distinguish stimuli and enable appropriate macrophage responses. Immunity 2021; 54:916-930.e7. [PMID: 33979588 PMCID: PMC8184127 DOI: 10.1016/j.immuni.2021.04.011] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 06/01/2020] [Revised: 12/21/2020] [Accepted: 04/13/2021] [Indexed: 12/12/2022]
Abstract
Macrophages initiate inflammatory responses via the transcription factor NFκB. The temporal pattern of NFκB activity determines which genes are expressed and thus, the type of response that ensues. Here, we examined how information about the stimulus is encoded in the dynamics of NFκB activity. We generated an mVenus-RelA reporter mouse line to enable high-throughput live-cell analysis of primary macrophages responding to host- and pathogen-derived stimuli. An information-theoretic workflow identified six dynamical features, termed signaling codons, that convey stimulus information to the nucleus. In particular, oscillatory trajectories were a hallmark of responses to cytokine but not pathogen-derived stimuli. Single-cell imaging and RNA sequencing of macrophages from a mouse model of Sjögren's syndrome revealed inappropriate responses to stimuli, suggestive of confusion of two NFκB signaling codons. Thus, the dynamics of NFκB signaling classify immune threats through six signaling codons, and signal confusion based on defective codon deployment may underlie the etiology of some inflammatory diseases.
Affiliation(s)
- Adewunmi Adelaja
  - Institute for Quantitative and Computational Biosciences (QCBio), Molecular Biology Institute (MBI), and Department of Microbiology, Immunology, and Molecular Genetics (MIMG), University of California, Los Angeles (UCLA), 611 Charles E. Young Dr S, Los Angeles, CA 90093
- Brooks Taylor
  - Institute for Quantitative and Computational Biosciences (QCBio), Molecular Biology Institute (MBI), and Department of Microbiology, Immunology, and Molecular Genetics (MIMG), University of California, Los Angeles (UCLA), 611 Charles E. Young Dr S, Los Angeles, CA 90093
- Katherine M Sheu
  - Institute for Quantitative and Computational Biosciences (QCBio), Molecular Biology Institute (MBI), and Department of Microbiology, Immunology, and Molecular Genetics (MIMG), University of California, Los Angeles (UCLA), 611 Charles E. Young Dr S, Los Angeles, CA 90093
- Yi Liu
  - Institute for Quantitative and Computational Biosciences (QCBio), Molecular Biology Institute (MBI), and Department of Microbiology, Immunology, and Molecular Genetics (MIMG), University of California, Los Angeles (UCLA), 611 Charles E. Young Dr S, Los Angeles, CA 90093
- Stefanie Luecke
  - Institute for Quantitative and Computational Biosciences (QCBio), Molecular Biology Institute (MBI), and Department of Microbiology, Immunology, and Molecular Genetics (MIMG), University of California, Los Angeles (UCLA), 611 Charles E. Young Dr S, Los Angeles, CA 90093
- Alexander Hoffmann
  - Institute for Quantitative and Computational Biosciences (QCBio), Molecular Biology Institute (MBI), and Department of Microbiology, Immunology, and Molecular Genetics (MIMG), University of California, Los Angeles (UCLA), 611 Charles E. Young Dr S, Los Angeles, CA 90093
|
46
|
Xiang J, Zhang J, Zheng R, Li X, Li M. NIDM: network impulsive dynamics on multiplex biological network for disease-gene prediction. Brief Bioinform 2021; 22:6236070. [PMID: 33866352 DOI: 10.1093/bib/bbab080] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/12/2021] [Revised: 02/11/2021] [Accepted: 02/21/2021] [Indexed: 12/12/2022] Open
Abstract
Predicting disease-related genes is important for the study of diseases because biological experiments are costly and time-consuming. Network propagation is a popular strategy for disease-gene prediction. However, existing methods focus on the stable solution of the dynamics while ignoring the useful information hidden in the dynamical process, and it remains a challenge to use multiple types of physical and functional relationships between proteins and genes to predict disease-related genes effectively. Therefore, we proposed a framework of network impulsive dynamics on multiplex biological networks (NIDM) to predict disease-related genes, along with four variants of NIDM models and four kinds of impulsive dynamical signatures (IDSs). NIDM identifies disease-related genes by mining the dynamical responses of nodes to impulsive signals exerted at specific nodes. Through a series of experimental evaluations on various types of biological networks, we confirmed the advantage of the multiplex network and the important role of functional associations in disease-gene prediction, demonstrated superior performance of NIDM compared with four types of network-based algorithms, and gave recommendations on effective NIDM models and IDSs. To facilitate the prioritization and analysis of (candidate) genes associated with specific diseases, we developed a user-friendly web server, which provides three kinds of filtering patterns for genes, network visualization, enrichment analysis, and a wealth of external links (http://bioinformatics.csu.edu.cn/DGP/NID.jsp). NIDM is a protocol for disease-gene prediction that integrates different types of biological networks and may become a useful computational tool for the study of disease-related genes.
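The idea of scoring genes from the dynamical response to an impulse, rather than from a steady state, can be sketched as follows. This is a generic damped-diffusion stand-in on a single-layer toy graph, not the NIDM equations from the paper: the update rule, the `damping` parameter, the peak-response signature, and the node names are all assumptions made for illustration.

```python
def impulse_response(adjacency, seeds, damping=0.5, steps=5):
    """Record per-node trajectories after a unit impulse split across the seed nodes.

    adjacency: dict mapping each node to a list of its neighbours (undirected graph).
    At every step, each node passes a damped, equal share of its signal to its neighbours.
    """
    nodes = sorted(adjacency)
    state = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    trajectory = {v: [state[v]] for v in nodes}
    for _ in range(steps):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            if adjacency[u]:
                share = damping * state[u] / len(adjacency[u])
                for v in adjacency[u]:
                    nxt[v] += share
        state = nxt
        for v in nodes:
            trajectory[v].append(state[v])
    return trajectory


def rank_candidates(trajectory, seeds):
    """Rank non-seed nodes by peak response (one possible dynamical signature)."""
    scores = {v: max(series) for v, series in trajectory.items() if v not in seeds}
    return sorted(scores, key=lambda v: (-scores[v], v))


# Hypothetical path graph A - B - C - D with a known disease gene at A.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
seeds = {"A"}
trajectory = impulse_response(graph, seeds)
print(rank_candidates(trajectory, seeds))
```

Keeping the whole trajectory, rather than only the final state, is what allows different signatures (peak height, time to peak, cumulative response) to be compared as ranking criteria.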
Affiliation(s)
- Ju Xiang
  - School of Computer Science and Engineering, Central South University, Hunan, China
- Jiashuai Zhang
  - School of Computer Science and Engineering, Central South University, Hunan, China
- Ruiqing Zheng
  - School of Computer Science and Engineering, Central South University, China
- Xingyi Li
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha, China
|
47
|
Zhao X, Yao H, Li X. Unearthing of Key Genes Driving the Pathogenesis of Alzheimer's Disease via Bioinformatics. Front Genet 2021; 12:641100. [PMID: 33936168 PMCID: PMC8085575 DOI: 10.3389/fgene.2021.641100] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/13/2020] [Accepted: 03/15/2021] [Indexed: 01/23/2023] Open
Abstract
Alzheimer’s disease (AD) is a neurodegenerative disease with unelucidated molecular pathogenesis. Herein, we aimed to identify potential hub genes governing the pathogenesis of AD. The AD datasets GSE118553 and GSE131617 were collected from the NCBI GEO database. Weighted gene coexpression network analysis (WGCNA), differential gene expression analysis, and functional enrichment analysis were performed to reveal the hub genes and verify their role in AD. Hub genes were validated by machine learning algorithms. We identified modules and their corresponding hub genes from the temporal cortex (TC), frontal cortex (FC), entorhinal cortex (EC), and cerebellum (CE). We obtained 33, 42, 42, and 41 hub genes in modules associated with AD in TC, FC, EC, and CE tissues, respectively. The expression levels of hub genes differed significantly between the AD and control groups in the TC and EC tissues (P < 0.05). The expression of FCGRT, SLC1A3, PTN, PTPRZ1, and PON2 differed significantly between the AD and control groups in the FC and CE tissues (P < 0.05). The expression levels of PLXNB1, GRAMD3, and GJA1 differed significantly between the Braak NFT stages of AD. Overall, our study uncovered genes that may be involved in AD pathogenesis and revealed their potential for the development of AD biomarkers and appropriate therapeutic targets for AD.
Affiliation(s)
- Xingxing Zhao
  - Department of Neurology, Bethune Hospital Affiliated to Shanxi Medical University, Taiyuan, China
  - Department of Cardiology, First Hospital of Shanxi Medical University, Taiyuan, China
- Hongmei Yao
  - Department of Cardiology, First Hospital of Shanxi Medical University, Taiyuan, China
- Xinyi Li
  - Department of Neurology, Bethune Hospital Affiliated to Shanxi Medical University, Taiyuan, China
|
48
|
Podlewska S, Kurczab R. Mutual Support of Ligand- and Structure-Based Approaches-To What Extent We Can Optimize the Power of Predictive Model? Case Study of Opioid Receptors. Molecules 2021; 26:1607. [PMID: 33799356 PMCID: PMC7998793 DOI: 10.3390/molecules26061607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2021] [Revised: 03/10/2021] [Accepted: 03/11/2021] [Indexed: 11/16/2022] Open
Abstract
The process of modern drug design would not exist in its current form without computational methods. They are part of every stage of the drug design pipeline, supporting the search for and optimization of new bioactive substances. Nevertheless, despite the great help offered by in silico strategies, the power of computational methods strongly depends on the input data supplied when the predictive model is constructed. Studies on the efficiency of computational protocols most often focus on global efficiency, using general parameters that refer to the whole dataset, such as accuracy, precision, and mean squared error. In this study, we examined machine learning predictions obtained for opioid receptors (mu, kappa, delta) and focused on the cases for which the predictions were the most and the least accurate. Moreover, we used docking to try to explain prediction errors. We attempted to develop a rule of thumb that can help predict, via docking, the activity of compounds towards opioid receptors, especially compounds that were incorrectly predicted by machine learning. We found that although combining ligand- and structure-based approaches can benefit prediction accuracy, there remain cases that cannot be reliably predicted by any available modeling method. In addition to challenging ligand- and structure-based predictions, we also examined the role of machine learning methods in comparison to simple statistical methods for both standard ligand-based representations (molecular fingerprints) and interaction fingerprints. All approaches were compared in both classification experiments (where compounds were assigned to active and inactive groups constructed on the basis of Ki values) and regression experiments (where the exact Ki value was predicted).
Affiliation(s)
- Sabina Podlewska
  - Department of Technology and Biotechnology of Drugs, Jagiellonian University, Medical College, 9 Medyczna Street, 30-688 Cracow, Poland
  - Maj Institute of Pharmacology, Polish Academy of Sciences, 12 Smętna Street, 31-343 Cracow, Poland
- Rafał Kurczab
  - Maj Institute of Pharmacology, Polish Academy of Sciences, 12 Smętna Street, 31-343 Cracow, Poland
  - Correspondence: Tel.: +48-1266-23-301
|
49
|
Hyperspectral Image Spectral–Spatial Classification Method Based on Deep Adaptive Feature Fusion. REMOTE SENSING 2021. [DOI: 10.3390/rs13040746] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/16/2022]
Abstract
Convolutional neural networks (CNNs) have been widely used in hyperspectral image (HSI) classification. Many algorithms focus on the deep extraction of a single kind of feature to improve classification. There have been few studies on the deep extraction of two or more kinds of fusion features and the combination of spatial and spectral features for classification. The authors of this paper propose an HSI spectral–spatial classification method based on deep adaptive feature fusion (SSDF). This method first implements the deep adaptive fusion of two hyperspectral features, and then it performs spectral–spatial classification on the fused features. In SSDF, a U-shaped deep network model with the principal component features as the model input and the edge features as the model label is designed to adaptively fuse two kinds of different features. One comprises the edge features of the HSIs extracted by the guided filter, and the other comprises the principal component features obtained by dimensionality reduction of HSIs using principal component analysis. The fused new features are input into a multi-scale and multi-level feature extraction model for further extraction of deep features, which are then combined with the spectral features extracted by the long short-term memory (LSTM) model for classification. The experimental results on three datasets demonstrated that the performance of the proposed SSDF was superior to several state-of-the-art methods. Additionally, SSDF still performed best when the number of training samples was sharply reduced, and it also achieved high classification accuracy for categories with few samples.
|
50
|
Prasanna A, Niranjan V. Clin-mNGS: Automated Pipeline for Pathogen Detection from Clinical Metagenomic Data. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200608130029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
Abstract
Background: Since bacteria are the earliest known organisms, there has been significant interest in their variety and biology, most certainly concerning human health. Recent advances in metagenomic sequencing (mNGS), a culture-independent sequencing technology, have facilitated an accelerated development in clinical microbiology and our understanding of pathogens.
Objective: For the implementation of mNGS in routine clinical practice to become feasible, a practical and scalable strategy for the study of mNGS data is essential. This study presents a robust automated pipeline to analyze clinical metagenomic data for pathogen identification and classification.
Method: The proposed Clin-mNGS pipeline is an integrated, open-source, scalable, reproducible, and user-friendly framework scripted using the Snakemake workflow management software. The implementation avoids the hassle of manual installation and configuration of the multiple command-line tools and dependencies. The approach directly screens pathogens from clinical raw reads and generates consolidated reports for each sample.
Results: The pipeline is demonstrated using publicly available data and is tested on a desktop Linux system and a high-performance cluster. The study compares variability in results from different tools and versions. The versions of the tools are made user modifiable. The pipeline outputs quality checks, filtered reads, host subtraction, assembled contigs, assembly metrics, relative abundances of bacterial species, antimicrobial resistance genes, plasmid finding, and virulence factor identification. The results obtained from the pipeline are evaluated based on sensitivity and positive predictive value.
Conclusion: Clin-mNGS is an automated Snakemake pipeline validated for the analysis of microbial clinical metagenomics reads to perform taxonomic classification and antimicrobial resistance prediction.
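The two evaluation measures named in the Results can be made concrete with a short sketch. It assumes a per-sample comparison between the species a pipeline reports and a known reference list; the function name `sensitivity_and_ppv` and the species in the example are hypothetical and are not taken from the Clin-mNGS study.

```python
def sensitivity_and_ppv(expected, detected):
    """Compare detected taxa against a reference list for one sample.

    sensitivity = TP / (TP + FN): fraction of truly present taxa that were reported.
    PPV         = TP / (TP + FP): fraction of reported taxa that are truly present.
    """
    expected, detected = set(expected), set(detected)
    tp = len(expected & detected)   # present and reported
    fn = len(expected - detected)   # present but missed
    fp = len(detected - expected)   # reported but not present
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv


# Hypothetical sample: three species truly present, four reported by the pipeline.
truth = ["Escherichia coli", "Klebsiella pneumoniae", "Staphylococcus aureus"]
calls = ["Escherichia coli", "Staphylococcus aureus",
         "Pseudomonas aeruginosa", "Enterococcus faecalis"]

sens, ppv = sensitivity_and_ppv(truth, calls)
print(f"sensitivity={sens:.3f}, PPV={ppv:.3f}")
```

Specificity is not computed here because the set of true negatives (every taxon that is absent and not reported) is not well defined for an open-ended reference database.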
Affiliation(s)
- Akshatha Prasanna
  - Department of Biotechnology, Rashtreeya Vidyalaya College of Engineering, Bengaluru, India
- Vidya Niranjan
  - Department of Biotechnology, Rashtreeya Vidyalaya College of Engineering, Bengaluru, India
|