1
Li B, Liu L, Xu Z, Li K. Optimizing carbon source addition to control surplus sludge yield via machine learning-based interpretable ensemble model. Environmental Research 2024; 267:120653. [PMID: 39701344 DOI: 10.1016/j.envres.2024.120653] [Received: 09/28/2024] [Revised: 12/09/2024] [Accepted: 12/16/2024] [Indexed: 12/21/2024]
Abstract
Appropriate carbon source addition can save operational costs and reduce surplus sludge yield in the wastewater treatment plant (WWTP). However, the link between carbon source and surplus sludge yield remains neglected although machine learning (ML) has become a powerful tool for WWTP, and is a challenge due to more complex multidimensional pattern recognition. Herein, weighted average ensemble strategy was conducted to assemble multiple diverse basic models to obtain better prediction capability to optimize carbon source addition (Model-1) and further control surplus sludge yield (Model-2). The ensemble models significantly outperformed all single models with MAE of 5.82 g/m³, MSE of 60.59 and R² value of 0.98 in Model-1 and MAE of 15.09 g/m³, MSE of 449.01 and R² value of 0.93 in Model-2. The optimal input feature subset was explored to reduce model complexity, indicating that the final ensemble models can predict with high precision using relatively few features with MAE of 6.41 g/m³, MSE of 78.49 and R² value of 0.97 in Model-1 and MAE of 12.82 g/m³, MSE of 232.71 and R² value of 0.95 in Model-2. Furthermore, the final models were deployed into an offline web application to facilitate their utility in real-world settings, demonstrating 47.25% savings in carbon source addition and 15.89% reductions in surplus sludge yield for an extra month of running. This work offers an efficient approach for the WWTP to optimize carbon source addition and provides new insights into controlling surplus sludge yield.
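As a rough illustration of the weighted average ensemble strategy and the three metrics quoted in this abstract (MAE, MSE, R²), the following is a minimal sketch in plain Python. The base-model predictions, the equal weights, and the function names are hypothetical placeholders chosen for the example; the abstract does not state which base models or weights the authors used.

```python
from typing import List, Sequence


def weighted_average_ensemble(predictions: Sequence[Sequence[float]],
                              weights: Sequence[float]) -> List[float]:
    """Combine per-model predictions into one prediction per sample."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(predictions[0])
    # Normalise by the weight total so the weights need not sum to 1.
    return [
        sum(w * model[i] for w, model in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical targets and base-model outputs (illustrative values only).
y_true = [100.0, 120.0, 140.0, 160.0]
model_a = [98.0, 123.0, 138.0, 163.0]
model_b = [104.0, 118.0, 144.0, 158.0]

# Equal weights are an assumption; in practice they would be tuned on
# validation data.
ensemble = weighted_average_ensemble([model_a, model_b], [0.5, 0.5])

print("MAE:", mae(y_true, ensemble))
print("MSE:", mse(y_true, ensemble))
print("R2:", r2(y_true, ensemble))
```

In this toy data the two base models err in opposite directions, so the averaged prediction has a lower MAE than either model alone, which is the general motivation for ensembling diverse models.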
Affiliation(s)
- Bowen Li
  - College of Environmental Science and Engineering, Nankai University, Tianjin, 300350, China; MOE Key Laboratory of Pollution Processes and Environmental Criteria, Tianjin Key Laboratory of Environmental Remediation and Pollution Control, Tianjin Key Laboratory of Environmental Technology for Complex Trans-Media Pollution, Nankai University, Tianjin, 300350, China
- Li Liu
  - Tianjin Medical University General Hospital, Tianjin, 300052, China
- Zikang Xu
  - College of Environmental Science and Engineering, Nankai University, Tianjin, 300350, China; MOE Key Laboratory of Pollution Processes and Environmental Criteria, Tianjin Key Laboratory of Environmental Remediation and Pollution Control, Tianjin Key Laboratory of Environmental Technology for Complex Trans-Media Pollution, Nankai University, Tianjin, 300350, China
- Kexun Li
  - College of Environmental Science and Engineering, Nankai University, Tianjin, 300350, China; MOE Key Laboratory of Pollution Processes and Environmental Criteria, Tianjin Key Laboratory of Environmental Remediation and Pollution Control, Tianjin Key Laboratory of Environmental Technology for Complex Trans-Media Pollution, Nankai University, Tianjin, 300350, China
2
Moth CW, Sheehan JH, Mamun AA, Sivley RM, Gulsevin A, Rinker D, Capra JA, Meiler J. VUStruct: a compute pipeline for high throughput and personalized structural biology. bioRxiv 2024:2024.08.06.606224. [PMID: 39149406 PMCID: PMC11326201 DOI: 10.1101/2024.08.06.606224] [Indexed: 08/17/2024]
Abstract
Effective diagnosis and treatment of rare genetic disorders requires the interpretation of a patient's genetic variants of unknown significance (VUSs). Today, clinical decision-making is primarily guided by gene-phenotype association databases and DNA-based scoring methods. Our web-accessible variant analysis pipeline, VUStruct, supplements these established approaches by deeply analyzing the downstream molecular impact of variation in context of 3D protein structure. VUStruct's growing impact is fueled by the co-proliferation of protein 3D structural models, gene sequencing, compute power, and artificial intelligence. Contextualizing VUSs in protein 3D structural models also illuminates longitudinal genomics studies and biochemical bench research focused on VUS, and we created VUStruct for clinicians and researchers alike. We now introduce VUStruct to the broad scientific community as a mature, web-facing, extensible, High Performance Computing (HPC) software pipeline. VUStruct maps missense variants onto automatically selected protein structures and launches a broad range of analyses. These include energy-based assessments of protein folding and stability, pathogenicity prediction through spatial clustering analysis, and machine learning (ML) predictors of binding surface disruptions and nearby post-translational modification sites. The pipeline also considers the entire input set of VUS and identifies genes potentially involved in digenic disease. VUStruct's utility in clinical rare disease genome interpretation has been demonstrated through its analysis of over 175 Undiagnosed Disease Network (UDN) Patient cases. VUStruct-leveraged hypotheses have often informed clinicians in their consideration of additional patient testing, and we report here details from two cases where VUStruct was key to their solution. 
We also note successes with academic research collaborators, for whom VUStruct has informed research directions in both computational genomics and wet lab studies.
Affiliation(s)
- Christopher W. Moth
  - Departments of Chemistry, Pharmacology, and Biomedical Informatics; Center for Structural Biology and Institute of Chemical Biology; Vanderbilt Univ., Nashville, TN 37232, USA
- Jonathan H. Sheehan
  - Division of Infectious Diseases, Milliken Dept. of Internal Medicine, Washington Univ. School of Medicine in St. Louis, MO 63110, USA
- Abdullah Al Mamun
  - Departments of Chemistry, Pharmacology, and Biomedical Informatics; Center for Structural Biology and Institute of Chemical Biology; Vanderbilt Univ., Nashville, TN 37232, USA
- Alican Gulsevin
  - Department of Pharmaceutical Sciences, College of Pharmacy and Health Sciences, Butler University, Indianapolis, IN 46208, USA
- David Rinker
  - Department of Biological Sciences, Evolutionary Studies Initiative; Vanderbilt Univ., Nashville, TN 37232, USA
- John A. Capra
  - Bakar Computational Health Science Institute and Department of Epidemiology and Biostatistics, Univ. of California San Francisco, CA 94143, USA
- Jens Meiler
  - Departments of Chemistry, Pharmacology, and Biomedical Informatics; Center for Structural Biology and Institute of Chemical Biology; Vanderbilt Univ., Nashville, TN 37232, USA
  - Leipzig University Medical School, Institute for Drug Discovery, Brüderstraße 34, 04103 Leipzig, Germany
3
Furuta Y, Tinker RJ, Gulsevin A, Neumann SM, Hamid R, Cogan JD, Rives L, Liu Q, Chen HC, Joos KM, Phillips JA. Probable digenic inheritance of Diamond-Blackfan anemia. Am J Med Genet A 2024; 194:e63454. [PMID: 37897121 DOI: 10.1002/ajmg.a.63454] [Received: 08/14/2023] [Revised: 10/07/2023] [Accepted: 10/12/2023] [Indexed: 10/29/2023]
Abstract
A 26-year-old female proband with a clinical diagnosis and consistent phenotype of Diamond-Blackfan anemia (DBA, OMIM 105650) without an identified genotype was referred to the Undiagnosed Diseases Network. DBA is classically associated with monoallelic variants that have an autosomal-dominant or -recessive mode of inheritance. Intriguingly, her case was solved by a detection of a digenic interaction between non-allelic RPS19 and RPL27 variants. This was confirmed with a machine learning structural model, co-segregation analysis, and RNA sequencing. This is the first report of DBA caused by a digenic effect of two non-allelic variants demonstrated by machine learning structural model. This case suggests that atypical phenotypic presentations of DBA may be caused by digenic inheritance in some individuals. We also conclude that a machine learning structural model can be useful in detecting digenic models of possible interactions between products encoded by alleles of different genes inherited from non-affected carrier parents that can result in DBA with an unrealized 25% recurrence risk.
Affiliation(s)
- Yutaka Furuta
  - Department of Pediatrics, Division of Medical Genetics and Genomic Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Rory J Tinker
  - Department of Pediatrics, Division of Medical Genetics and Genomic Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Alican Gulsevin
  - Department of Chemistry, Center for Structural Biology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Pharmaceutical Sciences, College of Pharmacy and Health Sciences, Butler University, Indianapolis, Indiana, USA
- Serena M Neumann
  - Department of Pediatrics, Division of Medical Genetics and Genomic Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Rizwan Hamid
  - Department of Pediatrics, Division of Medical Genetics and Genomic Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Joy D Cogan
  - Department of Pediatrics, Division of Medical Genetics and Genomic Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Lynette Rives
  - Department of Pediatrics, Division of Medical Genetics and Genomic Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Qi Liu
  - Department of Biostatistics and Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Hua-Chang Chen
  - Department of Biostatistics and Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Karen M Joos
  - Vanderbilt Eye Institute, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- John A Phillips
  - Department of Pediatrics, Division of Medical Genetics and Genomic Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
4
Nagy N, Pal M, Kun J, Galik B, Urban P, Medvecz M, Fabos B, Neller A, Abdolreza A, Danis J, Szabo V, Yang Z, Fenske S, Biel M, Gyenesei A, Adam E, Szell M. Missing Heritability in Albinism: Deep Characterization of a Hungarian Albinism Cohort Raises the Possibility of the Digenic Genetic Background of the Disease. Int J Mol Sci 2024; 25:1271. [PMID: 38279271 PMCID: PMC10817051 DOI: 10.3390/ijms25021271] [Received: 11/10/2023] [Revised: 01/17/2024] [Accepted: 01/18/2024] [Indexed: 01/28/2024]
Abstract
Albinism is characterized by a variable degree of hypopigmentation affecting the skin and the hair, and causing ophthalmologic abnormalities. Its oculocutaneous, ocular and syndromic forms follow an autosomal or X-linked recessive mode of inheritance, and 22 disease-causing genes are implicated in their development. Our aim was to clarify the genetic background of a Hungarian albinism cohort. Using a 22-gene albinism panel, the genetic background of 11 of the 17 Hungarian patients was elucidated. In patients with unidentified genetic backgrounds (n = 6), whole exome sequencing was performed. Our investigations revealed a novel, previously unreported rare variant (N687S) of the two-pore channel two gene (TPCN2). The N687S variant of the encoded TPC2 protein is carried by a 15-year-old Hungarian male albinism patient and his clinically unaffected mother. Our segregational analysis and in vitro functional experiments suggest that the detected novel rare TPCN2 variant alone is not a disease-causing variant in albinism. Deep genetic analyses of the family revealed that the patient also carries a phenotype-modifying R305W variant of the OCA2 protein, and he is the only family member harboring this genotype. Our results raise the possibility that this digenic combination might contribute to the observed differences between the patient and the mother, and found the genetic background of the disease in his case.
Affiliation(s)
- Nikoletta Nagy
  - Department of Medical Genetics, University of Szeged, 6720 Szeged, Hungary
  - HUN-REN-SZTE Functional Clinical Genetic Research Group, 6720 Szeged, Hungary
- Margit Pal
  - Department of Medical Genetics, University of Szeged, 6720 Szeged, Hungary
  - HUN-REN-SZTE Functional Clinical Genetic Research Group, 6720 Szeged, Hungary
- Jozsef Kun
  - Hungarian Centre for Genomics and Bioinformatics, Szentagothai Research Centre, University of Pecs, 7624 Pecs, Hungary
- Bence Galik
  - Hungarian Centre for Genomics and Bioinformatics, Szentagothai Research Centre, University of Pecs, 7624 Pecs, Hungary
- Peter Urban
  - Hungarian Centre for Genomics and Bioinformatics, Szentagothai Research Centre, University of Pecs, 7624 Pecs, Hungary
- Marta Medvecz
  - Department of Dermatology, Venereology and Dermatooncology, Semmelweis University, 1095 Budapest, Hungary
  - ERN-Skin Reference Centre, Semmelweis University, 1095 Budapest, Hungary
- Beata Fabos
  - Mor Kaposi Teaching Hospital of Somogy County, 7400 Kaposvar, Hungary
- Alexandra Neller
  - Department of Medical Genetics, University of Szeged, 6720 Szeged, Hungary
- Aliasgari Abdolreza
  - Department of Medical Genetics, University of Szeged, 6720 Szeged, Hungary
- Judit Danis
  - HUN-REN-SZTE Dermatological Research Group, 6720 Szeged, Hungary
  - Department of Immunology, University of Szeged, 6720 Szeged, Hungary
- Viktoria Szabo
  - Department of Ophthalmology, Semmelweis University, 1085 Budapest, Hungary
- Zhuo Yang
  - Department of Pharmacy, Center for Drug Research, Ludwig-Maximilians-Universität München, 81377 Munich, Germany
- Stefanie Fenske
  - Department of Pharmacy, Center for Drug Research, Ludwig-Maximilians-Universität München, 81377 Munich, Germany
- Martin Biel
  - Department of Pharmacy, Center for Drug Research, Ludwig-Maximilians-Universität München, 81377 Munich, Germany
- Attila Gyenesei
  - Hungarian Centre for Genomics and Bioinformatics, Szentagothai Research Centre, University of Pecs, 7624 Pecs, Hungary
- Eva Adam
  - Department of Medical Genetics, University of Szeged, 6720 Szeged, Hungary
  - HUN-REN-SZTE Functional Clinical Genetic Research Group, 6720 Szeged, Hungary
- Marta Szell
  - Department of Medical Genetics, University of Szeged, 6720 Szeged, Hungary
  - HUN-REN-SZTE Functional Clinical Genetic Research Group, 6720 Szeged, Hungary
5
Versbraegen N, Gravel B, Nachtegael C, Renaux A, Verkinderen E, Nowé A, Lenaerts T, Papadimitriou S. Faster and more accurate pathogenic combination predictions with VarCoPP2.0. BMC Bioinformatics 2023; 24:179. [PMID: 37127601 PMCID: PMC10152795 DOI: 10.1186/s12859-023-05291-3] [Received: 12/06/2022] [Accepted: 04/14/2023] [Indexed: 05/03/2023]
Abstract
BACKGROUND The prediction of potentially pathogenic variant combinations in patients remains a key task in the field of medical genetics for the understanding and detection of oligogenic/multilocus diseases. Models tailored towards such cases can help shorten the gap of missing diagnoses and can aid researchers in dealing with the high complexity of the derived data. The predictor VarCoPP (Variant Combinations Pathogenicity Predictor) that was published in 2019 and identified potentially pathogenic variant combinations in gene pairs (bilocus variant combinations), was the first important step in this direction. Despite its usefulness and applicability, several issues still remained that hindered a better performance, such as its False Positive (FP) rate, the quality of its training set and its complex architecture.
RESULTS We present VarCoPP2.0: the successor of VarCoPP that is a simplified, faster and more accurate predictive model identifying potentially pathogenic bilocus variant combinations. Results from cross-validation and on independent data sets reveal that VarCoPP2.0 has improved in terms of both sensitivity (95% in cross-validation and 98% during testing) and specificity (5% FP rate). At the same time, its running time shows a significant 150-fold decrease due to the selection of a simpler Balanced Random Forest model. Its positive training set now consists of variant combinations that are more confidently linked with evidence of pathogenicity, based on the confidence scores present in OLIDA, the Oligogenic Diseases Database (https://olida.ibsquare.be). The improvement of its performance is also attributed to a more careful selection of up-to-date features identified via an original wrapper method. We show that the combination of different variant and gene pair features together is important for predictions, highlighting the usefulness of integrating biological information at different levels.
CONCLUSIONS Through its improved performance and faster execution time, VarCoPP2.0 enables a more accurate analysis of larger data sets linked to oligogenic diseases. Users can access the ORVAL platform (https://orval.ibsquare.be) to apply VarCoPP2.0 on their data.
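To make the sensitivity and false positive (FP) rate figures in this abstract concrete, the sketch below shows how the two quantities are typically derived from binary labels and predictions. It is plain Python with hypothetical toy labels and function names; it does not reproduce VarCoPP2.0, its Balanced Random Forest, or its data.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int],
                     y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where 1 = pathogenic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly pathogenic combinations that were flagged."""
    return tp / (tp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of neutral combinations wrongly flagged as pathogenic."""
    return fp / (fp + tn)


# Hypothetical labels: 4 pathogenic and 4 neutral variant combinations.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
sens = sensitivity(tp, fn)
fpr = false_positive_rate(fp, tn)
spec = 1.0 - fpr  # specificity is the complement of the FP rate

print("sensitivity:", sens, "FP rate:", fpr, "specificity:", spec)
```

Because specificity equals one minus the FP rate, the 5% FP rate quoted in the abstract corresponds to a specificity of 95%.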
Affiliation(s)
- Nassim Versbraegen
  - Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Barbara Gravel
  - Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
  - Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Charlotte Nachtegael
  - Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Alexandre Renaux
  - Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
  - Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Emma Verkinderen
  - Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Ann Nowé
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
  - Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Tom Lenaerts
  - Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
  - Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Sofia Papadimitriou
  - Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
  - Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050, Brussels, Belgium