Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Capriotti E, Nehrt NL, Kann MG, Bromberg Y. Bioinformatics for personal genome interpretation. Brief Bioinform 2012;13:495-512. [PMID: 22247263 DOI: 10.1093/bib/bbr070] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023] Open

Cited by Other Article(s)

Álvarez-Machancoses Ó, Faraggi E, deAndrés-Galiana EJ, Fernández-Martínez JL, Kloczkowski A. Prediction of Deleterious Single Amino Acid Polymorphisms with a Consensus Holdout Sampler. Curr Genomics 2024;25:171-184. [PMID: 39086995 PMCID: PMC11288160 DOI: 10.2174/0113892029236347240308054538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2023] [Revised: 08/03/2023] [Accepted: 09/22/2023] [Indexed: 08/02/2024] Open

Zhang M, Gong C, Ge F, Yu DJ. FCMSTrans: Accurate Prediction of Disease-Associated nsSNPs by Utilizing Multiscale Convolution and Deep Feature Combination within a Transformer Framework. J Chem Inf Model 2024;64:1394-1406. [PMID: 38349747 DOI: 10.1021/acs.jcim.3c02025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/27/2024]

Abstract

Nonsynonymous single-nucleotide polymorphisms (nsSNPs), implicated in over 6000 diseases, necessitate accurate prediction for expedited drug discovery and improved disease diagnosis. In this study, we propose FCMSTrans, a novel nsSNP predictor that innovatively combines the transformer framework and multiscale modules for comprehensive feature extraction. The distinctive attribute of FCMSTrans resides in a deep feature combination strategy. This strategy amalgamates evolutionary-scale modeling (ESM) and ProtTrans (PT) features, providing an understanding of protein biochemical properties, and position-specific scoring matrix, secondary structure, predicted relative solvent accessibility, and predicted disorder (PSPP) features, which are derived from four protein sequences and structure-oriented characteristics. This feature combination offers a comprehensive view of the molecular dynamics involving nsSNPs. Our model employs the transformer's self-attention mechanisms across multiple layers, extracting higher-level and abstract representations. Simultaneously, varied-level features are captured by multiscale convolutions, enriching feature abstraction at multiple echelons. Our comparative analyses with existing methodologies highlight significant improvements made possible by the integrated feature fusion approach adopted in FCMSTrans. This is further substantiated by performance assessments based on diverse data sets, such as PredictSNP, MMP, and PMD, with areas under the curve (AUCs) of 0.869, 0.819, and 0.693, respectively. Furthermore, FCMSTrans shows robustness and superiority by outperforming the current best predictor, PROVEAN, in a blind test conducted on a third-party data set, achieving an impressive AUC score of 0.7838. The Python code of FCMSTrans is available at https://github.com/gc212/FCMSTrans for academic usage.

Shahjahan, Dey JK, Dey SK. Translational bioinformatics approach to combat cardiovascular disease and cancers. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2024;139:221-261. [PMID: 38448136 DOI: 10.1016/bs.apcsb.2023.11.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/08/2024]

Abstract

Bioinformatics is an interdisciplinary science that draws on biology, chemistry, physics, statistics, mathematics, and computer science to answer complicated physiological problems. Its key aim is to store, analyze, organize, and retrieve essential information about the genome, proteome, transcriptome, and metabolome, as well as about organisms, in order to investigate biological systems and their dynamics. The outcome of a bioinformatics analysis depends on the type, quantity, and quality of the raw data provided and on the algorithm used to analyze them. Despite the availability of several approved medicines, cardiovascular disorders (CVDs) and cancers are the two leading causes of human death. Understanding the unknown aspects of both of these non-communicable disorders is essential to discovering new pathways, finding new drug targets, and eventually developing newer drugs to combat them successfully. Since all these goals involve complex investigation and handling of various types of macromolecules and small molecules of the human body, bioinformatics plays a key role in such processes. Results from such investigations have direct human application, and this field is therefore called translational bioinformatics. This book chapter deals with the diverse scope and applications of translational bioinformatics in the cure, diagnosis, and mechanistic understanding of CVDs and cancers. Developing complex algorithms, whether short or long, to address such problems is very common in translational bioinformatics. Structure-based drug discovery and AI-guided invention of novel antibodies, with very high accuracy and speed and at considerably low cost, are some of the astonishing features of translational bioinformatics and its applications in the fields of CVDs and cancers.

Capriotti E, Fariselli P. Evaluating the relevance of sequence conservation in the prediction of pathogenic missense variants. Hum Genet 2022;141:1649-1658. [DOI: 10.1007/s00439-021-02419-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2021] [Accepted: 12/12/2021] [Indexed: 12/28/2022]

MutTMPredictor: Robust and accurate cascade XGBoost classifier for prediction of mutations in transmembrane proteins. Comput Struct Biotechnol J 2021;19:6400-6416. [PMID: 34938415 PMCID: PMC8649221 DOI: 10.1016/j.csbj.2021.11.024] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2021] [Revised: 11/05/2021] [Accepted: 11/15/2021] [Indexed: 12/11/2022] Open

Abstract

• Prediction of mutations in transmembrane proteins is of significance for disease diagnosis.
• Building on evolutionary information, the Gaussian WAPSSM algorithm is proposed.
• Based on WAPSSM and sequence- and structure-based features, the cascade XGBoost algorithm is proposed.
• The webserver is freely available at http://csbio.njust.edu.cn/bioinf/ffmsresmutp/.
• MutTMPredictor is implemented to predict mutations in transmembrane proteins.

Transmembrane proteins have critical biological functions and play a role in a multitude of cellular processes, including cell signaling and the transport of molecules and ions across membranes. Approximately 60% of transmembrane proteins are considered drug targets. Missense mutations in such proteins can lead to many diverse diseases and disorders, such as neurodegenerative diseases and cystic fibrosis. However, there are limited studies on mutations in transmembrane proteins. In this work, we first design a new feature encoding method, termed weight attenuation position-specific scoring matrix (WAPSSM), which builds upon protein evolutionary information. Then, we propose a new mutation prediction algorithm (cascade XGBoost) by leveraging ideas learned from consensus predictors and gcForest. Multi-level experiments illustrate the effectiveness of the WAPSSM and cascade XGBoost algorithms. Finally, based on WAPSSM and three other types of features, in combination with the cascade XGBoost algorithm, we develop a new transmembrane protein mutation predictor, named MutTMPredictor. We benchmark the performance of MutTMPredictor against several existing predictors on seven datasets. On the 546 mutations dataset, MutTMPredictor achieves an accuracy (ACC) of 0.9661 and a Matthews correlation coefficient (MCC) of 0.8950. On the 67,584 dataset, MutTMPredictor achieves an MCC of 0.7523 and an area under the curve (AUC) of 0.8746, which are 0.1625 and 0.0801 higher, respectively, than those of the existing best predictor (fathmm). In addition, MutTMPredictor outperforms two specific predictors on the Pred-MutHTP datasets. The results suggest that MutTMPredictor can be used as an effective method for predicting and prioritizing missense mutations in transmembrane proteins. The MutTMPredictor webserver and datasets are freely accessible at http://csbio.njust.edu.cn/bioinf/muttmpredictor/ for academic use.

Tang YY, Wei PJ, Zhao JP, Xia J, Cao RF, Zheng CH. Identification of driver genes based on gene mutational effects and network centrality. BMC Bioinformatics 2021;22:457. [PMID: 34560840 PMCID: PMC8461858 DOI: 10.1186/s12859-021-04377-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2021] [Accepted: 08/23/2021] [Indexed: 11/10/2022] Open

Periwal N, Rathod SB, Pal R, Sharma P, Nebhnani L, Barnwal RP, Arora P, Srivastava KR, Sood V. In silico characterization of mutations circulating in SARS-CoV-2 structural proteins. J Biomol Struct Dyn 2021;40:8216-8231. [PMID: 33797336 PMCID: PMC8043164 DOI: 10.1080/07391102.2021.1908170] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]

Benevenuta S, Capriotti E, Fariselli P. Calibrating variant-scoring methods for clinical decision making. Bioinformatics 2021;36:5709-5711. [PMID: 33492342 PMCID: PMC8023678 DOI: 10.1093/bioinformatics/btaa943] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2020] [Revised: 09/27/2020] [Accepted: 10/28/2020] [Indexed: 12/22/2022] Open

Ge F, Hu J, Zhu YH, Arif M, Yu DJ. TargetMM: Accurate Missense Mutation Prediction by Utilizing Local and Global Sequence Information with Classifier Ensemble. Comb Chem High Throughput Screen 2021;25:38-52. [DOI: 10.2174/1386207323666201204140438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2020] [Revised: 10/22/2020] [Accepted: 10/26/2020] [Indexed: 11/22/2022]

Sanavia T, Birolo G, Montanucci L, Turina P, Capriotti E, Fariselli P. Limitations and challenges in protein stability prediction upon genome variations: towards future applications in precision medicine. Comput Struct Biotechnol J 2020;18:1968-1979. [PMID: 32774791 PMCID: PMC7397395 DOI: 10.1016/j.csbj.2020.07.011] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2020] [Revised: 07/10/2020] [Accepted: 07/14/2020] [Indexed: 12/13/2022] Open

Capriotti E, Montanucci L, Profiti G, Rossi I, Giannuzzi D, Aresu L, Fariselli P. Fido-SNP: the first webserver for scoring the impact of single nucleotide variants in the dog genome. Nucleic Acids Res 2020;47:W136-W141. [PMID: 31114899 PMCID: PMC6602425 DOI: 10.1093/nar/gkz420] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2019] [Revised: 04/19/2019] [Accepted: 05/06/2019] [Indexed: 12/22/2022] Open

Abolhassani H, Marcotte H, Fang M, Hammarström L. Clinical implications of experimental analyses of AID function on predictive computational tools: Challenge of missense variants. Clin Genet 2020;97:844-856. [PMID: 32162335 DOI: 10.1111/cge.13737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2020] [Revised: 02/27/2020] [Accepted: 03/03/2020] [Indexed: 11/30/2022]

Pharmacogenes (PGx-genes): Current understanding and future directions. Gene 2019;718:144050. [DOI: 10.1016/j.gene.2019.144050] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2019] [Revised: 08/13/2019] [Accepted: 08/14/2019] [Indexed: 12/14/2022]

Voskanian A, Katsonis P, Lichtarge O, Pejaver V, Radivojac P, Mooney SD, Capriotti E, Bromberg Y, Wang Y, Miller M, Martelli PL, Savojardo C, Babbi G, Casadio R, Cao Y, Sun Y, Shen Y, Garg A, Pal D, Yu Y, Huff CD, Tavtigian SV, Young E, Neuhausen SL, Ziv E, Pal LR, Andreoletti G, Brenner S, Kann MG. Assessing the performance of in silico methods for predicting the pathogenicity of variants in the gene CHEK2, among Hispanic females with breast cancer. Hum Mutat 2019;40:1612-1622. [PMID: 31241222 PMCID: PMC6744287 DOI: 10.1002/humu.23849] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2019] [Revised: 05/23/2019] [Accepted: 06/21/2019] [Indexed: 01/22/2023]

Affiliation(s)

Alin Voskanian Department of Biological Sciences, University of Maryland, Baltimore County, MD, U.S.A
Panagiotis Katsonis Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, U.S.A
Olivier Lichtarge Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, U.S.A Department of Pharmacology, Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas 77030, USA
Vikas Pejaver Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, U.S.A The eScience Institute, University of Washington, Seattle, Washington, U.S.A
Predrag Radivojac Khoury College of Computer and Information Sciences, Northeastern University, Boston, Massachusetts, U.S.A
Sean D. Mooney Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, U.S.A
Emidio Capriotti BioFolD Unit, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Via Selmi 3, 40126 Bologna, Italy
Yana Bromberg Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey, U.S.A Department of Genetics, Rutgers University, New Brunswick, New Jersey, U.S.A Technical University of Munich Institute for Advanced Study, (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching/Munich, Germany
Yanran Wang Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey, U.S.A
Max Miller Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey, U.S.A
Pier Luigi Martelli Biocomputing Group, BiGeA/Giorgio Prodi Interdepartmental Center for Cancer Research, University of Bologna, Via F. Selmi 3, Bologna, 40126, Italy
Castrense Savojardo Biocomputing Group, BiGeA/Giorgio Prodi Interdepartmental Center for Cancer Research, University of Bologna, Via F. Selmi 3, Bologna, 40126, Italy
Giulia Babbi Biocomputing Group, BiGeA/Giorgio Prodi Interdepartmental Center for Cancer Research, University of Bologna, Via F. Selmi 3, Bologna, 40126, Italy
Rita Casadio Biocomputing Group, BiGeA/Giorgio Prodi Interdepartmental Center for Cancer Research, University of Bologna, Via F. Selmi 3, Bologna, 40126, Italy
Yue Cao Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843-3128, U.S.A
Yuanfei Sun Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843-3128, U.S.A
Yang Shen Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843-3128, U.S.A
Aditi Garg Department of Computational and Data Sciences Indian Institute of Science, Bengaluru 560 012, India
Debnath Pal Department of Computational and Data Sciences Indian Institute of Science, Bengaluru 560 012, India
Yao Yu Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, U.S.A
Chad D. Huff Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, U.S.A
Sean V. Tavtigian Huntsman Cancer Institute, University of Utah School of Medicine, Salt Lake City, UT 84132, U.S.A
Erin Young Huntsman Cancer Institute, University of Utah School of Medicine, Salt Lake City, UT 84132, U.S.A
Susan L. Neuhausen Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, CA, 91010 U.S.A
Elad Ziv Division of General Internal Medicine, Department of Medicine, Institute of Human Genetics, Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA,U.S.A
Lipika R. Pal Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA
Gaia Andreoletti Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
Steven Brenner Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
Maricel G. Kann Department of Biological Sciences, University of Maryland, Baltimore County, MD, U.S.A

Bromberg Y, Capriotti E, Carter H. VarI-COSI 2018: a forum for research advances in variant interpretation and diagnostics. BMC Genomics 2019;20:550. [PMID: 31307380 PMCID: PMC6631439 DOI: 10.1186/s12864-019-5862-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open

Capriotti E, Fariselli P. PhD-SNPg: a webserver and lightweight tool for scoring single nucleotide variants. Nucleic Acids Res 2019;45:W247-W252. [PMID: 28482034 PMCID: PMC5570245 DOI: 10.1093/nar/gkx369] [Citation(s) in RCA: 112] [Impact Index Per Article: 22.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2017] [Accepted: 04/24/2017] [Indexed: 12/15/2022] Open

Bhyan SB, Wee Y, Liu Y, Cummins S, Zhao M. Integrative analysis of common genes and driver mutations implicated in hormone stimulation for four cancers in women. PeerJ 2019;7:e6872. [PMID: 31205821 PMCID: PMC6556371 DOI: 10.7717/peerj.6872] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2019] [Accepted: 03/28/2019] [Indexed: 12/11/2022] Open

Abstract

Cancer is one of the leading causes of death of women worldwide, and breast, ovarian, endometrial and cervical cancers contribute significantly to this every year. Developing early genetic-based diagnostic tools may be an effective approach to increase the chances of survival and provide more treatment opportunities. However, current cancer genetic studies are mainly conducted independently and, hence, have not established common driver genes involved in cancers in women. To explore the potential common molecular mechanism, we integrated four comprehensive literature-based databases to explore the shared implicated genetic effects. Using a total of 460 endometrial, 2,068 ovarian, 2,308 breast and 537 cervical cancer-implicated genes, we identified 52 genes which are common to all four types of cancers in women. Furthermore, we defined their potential functional role in endogenous hormonal regulation pathways within the context of the four cancers in women. For example, these genes are strongly associated with hormonal stimulation, which may facilitate rapid diagnosis and treatment management decision making. Additional mutational analyses on combined The Cancer Genome Atlas datasets consisting of 5,919 gynaecological and breast tumor samples were conducted to identify the frequently mutated genes across cancer types. For the common implicated genes for hormonal stimulants, we found that three quarters of the 5,919 samples had genomic alterations, with the highest frequency in MYC (22%), followed by NDRG1 (19%), ERBB2 (14%), PTEN (13%), PTGS2 (13%) and CDH1 (11%). We also identified 38 hormone-related genes, eight of which are associated with the ovulation cycle. A further systems biology analysis of the shared genes identified 20 novel genes, of which 12 were involved in hormone regulation in these four cancers in women. Identification of common driver genes for hormone stimulation provided a unique angle on the potential of hormone stimulant-related genes for cancer diagnosis and prognosis.

Capriotti E, Ozturk K, Carter H. Integrating molecular networks with genetic variant interpretation for precision medicine. WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE 2018;11:e1443. [PMID: 30548534 PMCID: PMC6450710 DOI: 10.1002/wsbm.1443] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/15/2018] [Revised: 10/23/2018] [Accepted: 10/30/2018] [Indexed: 02/01/2023]

Abstract

More reliable and cheaper sequencing technologies have revealed the vast mutational landscapes characteristic of many phenotypes. The analysis of such genetic variants has led to successful identification of altered proteins underlying many Mendelian disorders. Nevertheless the simple one‐variant one‐phenotype model valid for many monogenic diseases does not capture the complexity of polygenic traits and disorders. Although experimental and computational approaches have improved detection of functionally deleterious variants and important interactions between gene products, the development of comprehensive models relating genotype and phenotypes remains a challenge in the field of genomic medicine. In this context, a new view of the pathologic state as significant perturbation of the network of interactions between biomolecules is crucial for the identification of biochemical pathways associated with complex phenotypes. Seminal studies in systems biology combined the analysis of genetic variation with protein–protein interaction networks to demonstrate that even as biological systems evolve to be robust to genetic variation, their topologies create disease vulnerabilities. More recent analyses model the impact of genetic variants as changes to the “wiring” of the interactome to better capture heterogeneity in genotype–phenotype relationships. These studies lay the foundation for using networks to predict variant effects at scale using machine‐learning or algorithmic approaches. A wealth of databases and resources for the annotation of genotype–phenotype relationships have been developed to support developments in this area. This overview describes how study of the molecular interactome has generated insights linking the organization of biological systems to disease mechanism, and how this information can enable precision medicine.

This article is categorized under:

Translational, Genomic, and Systems Medicine > Translational Medicine

Biological Mechanisms > Cell Signaling

Models of Systems Properties and Processes > Mechanistic Models

Analytical and Computational Methods > Computational Methods

Capriotti E, Martelli PL, Fariselli P, Casadio R. Blind prediction of deleterious amino acid variations with SNPs&GO. Hum Mutat 2017;38:1064-1071. [PMID: 28102005 PMCID: PMC5522651 DOI: 10.1002/humu.23179] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2016] [Revised: 11/08/2016] [Accepted: 01/10/2017] [Indexed: 01/09/2023]

Soualmia LF, Lecroq T. Bioinformatics Methods and Tools to Advance Clinical Care. Findings from the Yearbook 2015 Section on Bioinformatics and Translational Informatics. Yearb Med Inform 2017;10:170-3. [PMID: 26293864 DOI: 10.15265/iy-2015-026] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open

Abstract

OBJECTIVES

To summarize excellent current research in the field of Bioinformatics and Translational Informatics with application in the health domain and clinical care.

METHOD

We provide a synopsis of the articles selected for the IMIA Yearbook 2015, from which we attempt to derive a synthetic overview of current and future activities in the field. As last year, a first selection step was performed by querying MEDLINE with a list of MeSH descriptors completed by a list of terms adapted to the section. Each section editor separately evaluated the set of 1,594 articles, and the evaluation results were merged to retain 15 articles for peer review.

RESULTS

The selection and evaluation process of this Yearbook's section on Bioinformatics and Translational Informatics yielded four excellent articles regarding data management and genome medicine that are mainly tool-based papers. In the first article, the authors present PPISURV, a tool for uncovering the role of specific genes in cancer survival outcome. The second article describes the classifier PredictSNP, which combines six performing tools for predicting disease-related mutations. In the third article, by presenting a high-coverage map of the human proteome using high-resolution mass spectrometry, the authors highlight the need for using mass spectrometry to complement genome annotation. The fourth article is also related to patient survival and decision support. The authors present data-mining methods for large-scale datasets of past transplants, with the objective of identifying chances of survival.

CONCLUSIONS

Current research activities still attest to the continuous convergence of Bioinformatics and Medical Informatics, with a focus this year on dedicated tools and methods to advance clinical care. Indeed, there is a need for powerful tools for managing and interpreting complex, large-scale genomic and biological datasets, but also a need for user-friendly tools developed for clinicians in their daily practice. All the recent research and development efforts contribute to the challenge of translating the obtained results into clinical impact, towards personalized medicine.

Rost B, Radivojac P, Bromberg Y. Protein function in precision medicine: deep understanding with machine learning. FEBS Lett 2016;590:2327-41. [PMID: 27423136 PMCID: PMC5937700 DOI: 10.1002/1873-3468.12307] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2016] [Revised: 07/12/2016] [Accepted: 07/12/2016] [Indexed: 12/21/2022]

Bromberg Y, Capriotti E, Carter H. VarI-SIG 2015: methods for personalized medicine - the role of variant interpretation in research and diagnostics. BMC Genomics 2016;17 Suppl 2:425. [PMID: 27357578 PMCID: PMC4928159 DOI: 10.1186/s12864-016-2721-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Tang H, Thomas PD. Tools for Predicting the Functional Impact of Nonsynonymous Genetic Variation. Genetics 2016;203:635-47. [PMID: 27270698 PMCID: PMC4896183 DOI: 10.1534/genetics.116.190033] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2015] [Accepted: 04/01/2016] [Indexed: 01/09/2023] Open

Bendl J, Musil M, Štourač J, Zendulka J, Damborský J, Brezovský J. PredictSNP2: A Unified Platform for Accurately Evaluating SNP Effects by Exploiting the Different Characteristics of Variants in Distinct Genomic Regions. PLoS Comput Biol 2016;12:e1004962. [PMID: 27224906 PMCID: PMC4880439 DOI: 10.1371/journal.pcbi.1004962] [Citation(s) in RCA: 133] [Impact Index Per Article: 16.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2015] [Accepted: 05/05/2016] [Indexed: 12/20/2022] Open

Abstract

An important message taken from human genome sequencing projects is that the human population exhibits approximately 99.9% genetic similarity. Variations in the remaining parts of the genome determine our identity, trace our history and reveal our heritage. The precise delineation of phenotypically causal variants plays a key role in providing accurate personalized diagnosis, prognosis, and treatment of inherited diseases. Several computational methods for achieving such delineation have been reported recently. However, their ability to pinpoint potentially deleterious variants is limited by the fact that their mechanisms of prediction do not account for the existence of different categories of variants. Consequently, their output is biased towards the variant categories that are most strongly represented in the variant databases. Moreover, most such methods provide numeric scores but not binary predictions of the deleteriousness of variants or confidence scores that would be more easily understood by users. We have constructed three datasets covering different types of disease-related variants, which were divided across five categories: (i) regulatory, (ii) splicing, (iii) missense, (iv) synonymous, and (v) nonsense variants. These datasets were used to develop category-optimal decision thresholds and to evaluate six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA. This evaluation revealed some important advantages of the category-based approach. The results obtained with the five best-performing tools were then combined into a consensus score. Additional comparative analyses showed that in the case of missense variations, protein-based predictors perform better than DNA sequence-based predictors. A user-friendly web interface was developed that provides easy access to the five tools’ predictions, and their consensus scores, in a user-understandable format tailored to the specific features of different categories of variations. To enable comprehensive evaluation of variants, the predictions are complemented with annotations from eight databases. The web server is freely available to the community at http://loschmidt.chemi.muni.cz/predictsnp2.


Niroula A, Vihinen M. Variation Interpretation Predictors: Principles, Types, Performance, and Choice. Hum Mutat 2016;37:579-97. [DOI: 10.1002/humu.22987] [Citation(s) in RCA: 90] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2015] [Accepted: 03/07/2016] [Indexed: 12/18/2022]

Mahmood ASMA, Wu TJ, Mazumder R, Vijay-Shanker K. DiMeX: A Text Mining System for Mutation-Disease Association Extraction. PLoS One 2016;11:e0152725. [PMID: 27073839 PMCID: PMC4830514 DOI: 10.1371/journal.pone.0152725] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2015] [Accepted: 03/19/2016] [Indexed: 11/22/2022] Open

Douville C, Masica DL, Stenson PD, Cooper DN, Gygax DM, Kim R, Ryan M, Karchin R. Assessing the Pathogenicity of Insertion and Deletion Variants with the Variant Effect Scoring Tool (VEST-Indel). Hum Mutat 2016;37:28-35. [PMID: 26442818 PMCID: PMC5057310 DOI: 10.1002/humu.22911] [Citation(s) in RCA: 89] [Impact Index Per Article: 11.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2015] [Accepted: 09/14/2015] [Indexed: 12/11/2022]

Cheng R, Leung RKK, Chen Y, Pan Y, Tong Y, Li Z, Ning L, Ling XB, He J. Virtual Pharmacist: A Platform for Pharmacogenomics. PLoS One 2015;10:e0141105. [PMID: 26496198 PMCID: PMC4619711 DOI: 10.1371/journal.pone.0141105] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2015] [Accepted: 10/03/2015] [Indexed: 01/15/2023] Open

Regan K, Payne PRO. From Molecules to Patients: The Clinical Applications of Translational Bioinformatics. Yearb Med Inform 2015;10:164-9. [PMID: 26293863 PMCID: PMC4587059 DOI: 10.15265/iy-2015-005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open

Bromberg Y, Capriotti E. VarI-SIG 2014--From SNPs to variants: interpreting different types of genetic variants. BMC Genomics 2015;16 Suppl 8:I1. [PMID: 26110281 PMCID: PMC4480323 DOI: 10.1186/1471-2164-16-s8-i1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023] Open

Tian R, Basu MK, Capriotti E. Computational methods and resources for the interpretation of genomic variants in cancer. BMC Genomics 2015;16 Suppl 8:S7. [PMID: 26111056 PMCID: PMC4480958 DOI: 10.1186/1471-2164-16-s8-s7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open

Fariselli P, Martelli PL, Savojardo C, Casadio R. INPS: predicting the impact of non-synonymous variations on protein stability from sequence. Bioinformatics 2015;31:2816-21. [DOI: 10.1093/bioinformatics/btv291] [Citation(s) in RCA: 77] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2015] [Accepted: 05/02/2015] [Indexed: 12/22/2022] Open

Limongelli I, Marini S, Bellazzi R. PaPI: pseudo amino acid composition to score human protein-coding variants. BMC Bioinformatics 2015;16:123. [PMID: 25928477 PMCID: PMC4411653 DOI: 10.1186/s12859-015-0554-8] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2014] [Accepted: 01/15/2015] [Indexed: 12/31/2022] Open

Luxembourg B, D'Souza M, Körber S, Seifried E. Prediction of the pathogenicity of antithrombin sequence variations by in silico methods. Thromb Res 2015;135:404-9. [DOI: 10.1016/j.thromres.2014.11.022] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2014] [Revised: 10/30/2014] [Accepted: 11/30/2014] [Indexed: 10/24/2022]

Tian R, Basu MK, Capriotti E. ContrastRank: a new method for ranking putative cancer driver genes and classification of tumor samples. Bioinformatics 2015;30:i572-8. [PMID: 25161249 PMCID: PMC4147919 DOI: 10.1093/bioinformatics/btu466] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023]

Abstract

Motivation: Recent advances in high-throughput sequencing technologies are generating huge amounts of data that are becoming an important resource for deciphering the genotype underlying a given phenotype. Genome sequencing has been extensively applied to the study of cancer genomes. Although a few methods have already been proposed for the detection of cancer-related genes, their automatic identification is still a challenging task. Using the genomic data made available by The Cancer Genome Atlas Consortium (TCGA), we propose a new prioritization approach based on the analysis of the distribution of putative deleterious variants in a large cohort of cancer samples.

Results: In this paper, we present ContrastRank, a new method for the prioritization of putative impaired genes in cancer. The method is based on the comparison of the putative defective rate of each gene in tumor versus normal and 1000 Genomes samples. We show that the method is able to provide a ranked list of putative impaired genes for colon, lung and prostate adenocarcinomas. The list significantly overlaps with the list of known cancer driver genes previously published. More importantly, by using our scoring approach, we can successfully discriminate between TCGA normal and tumor samples. A binary classifier based on the ContrastRank score reaches an overall accuracy >90% and an area under the curve (AUC) of the receiver operating characteristic (ROC) >0.95 for all three types of adenocarcinoma analyzed in this paper. In addition, using the ContrastRank score, we are able to discriminate the three tumor types with a minimum overall accuracy of 77% and AUC of 0.83.

Conclusions: We describe ContrastRank, a method for prioritizing putative impaired genes in cancer. The method is based on the comparison of exome sequencing data from different cohorts and can detect putative cancer driver genes.

ContrastRank can also be used to estimate a global adenocarcinoma risk score for an individual genome from the genetic variants in a whole-exome VCF (Variant Call Format) file. We believe that the application of ContrastRank can be an important step in genomic medicine to enable genome-based diagnosis.

Availability and implementation: The lists of ContrastRank scores of all genes in each tumor type are available as supplementary materials. A webserver for evaluating the risk of the three studied adenocarcinomas from a whole-exome VCF file is under development.

Contact: emidio@uab.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
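
The AUC values quoted in this abstract measure how well a score separates two groups of samples. As a minimal sketch (not the authors' code; the function name and the toy scores are hypothetical), the AUC can be computed as the fraction of tumor/normal pairs in which the tumor sample receives the higher score, with ties counted as one half:

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outranks a random negative.

    Equivalent to the Mann-Whitney U statistic divided by the number of pairs.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical per-sample scores (not taken from the paper).
tumor = [0.9, 0.8, 0.6]
normal = [0.7, 0.3, 0.2]
auc = roc_auc(tumor, normal)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of samples; it is used here only because it makes the definition explicit.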


Doncheva NT, Klein K, Morris JH, Wybrow M, Domingues FS, Albrecht M. Integrative visual analysis of protein sequence mutations. BMC Proc 2014;8:S2. [PMID: 25237389 PMCID: PMC4155609 DOI: 10.1186/1753-6561-8-s2-s2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023] Open

Li B, Seligman C, Thusberg J, Miller JL, Auer J, Whirl-Carrillo M, Capriotti E, Klein TE, Mooney SD. In silico comparative characterization of pharmacogenomic missense variants. BMC Genomics 2014;15 Suppl 4:S4. [PMID: 25057096 PMCID: PMC4092878 DOI: 10.1186/1471-2164-15-s4-s4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022] Open

SNP-SIG 2013: from coding to non-coding--new approaches for genomic variant interpretation. BMC Genomics 2014;15 Suppl 4:S1. [PMID: 25056427 PMCID: PMC4083406 DOI: 10.1186/1471-2164-15-s4-s1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open

Pires DEV, Ascher DB, Blundell TL. DUET: a server for predicting effects of mutations on protein stability using an integrated computational approach. Nucleic Acids Res 2014;42:W314-9. [PMID: 24829462 PMCID: PMC4086143 DOI: 10.1093/nar/gku411] [Citation(s) in RCA: 588] [Impact Index Per Article: 58.8] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023] Open

Wu TJ, Shamsaddini A, Pan Y, Smith K, Crichton DJ, Simonyan V, Mazumder R. A framework for organizing cancer-related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE). Database (Oxford) 2014;2014:bau022. [PMID: 24667251 PMCID: PMC3965850 DOI: 10.1093/database/bau022] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]

Abstract

Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq and other database biocurators have led to a rich repository of information on functional sites of genes and proteins. This information, along with variation-related annotation, can be used to scan human short sequence reads from next-generation sequencing (NGS) pipelines for the presence of non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. This and similar workflows are becoming more important because thousands of NGS data sets are being made available through projects such as The Cancer Genome Atlas (TCGA), and researchers want to evaluate their biomarkers in genomic data. BioMuta, an integrated sequence feature database, provides a framework for automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines. Sequence feature information in BioMuta is collected from the Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar, UniProtKB and through biocuration of information available from publications. Additionally, nsSNVs identified through automated analysis of NGS data from TCGA are included in the database. Because of the petabytes of data and information present in NGS primary repositories, a platform, HIVE (High-performance Integrated Virtual Environment), has been developed for storing, analyzing, computing and curating NGS data and associated metadata. Using HIVE, 31 979 nsSNVs were identified in TCGA-derived NGS data from breast cancer patients. All variations identified through this process are stored in a Curated Short Read archive, and the nsSNVs from the tumor samples are included in BioMuta. Currently, BioMuta has 26 cancer types with 13 896 small-scale and 308 986 large-scale study-derived variations. Integration of variation data allows identification of novel or common nsSNVs that can be prioritized in validation studies.

Database URL: BioMuta: http://hive.biochemistry.gwu.edu/tools/biomuta/index.php; CSR: http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr; HIVE: http://hive.biochemistry.gwu.edu


Cole C, Krampis K, Karagiannis K, Almeida JS, Faison WJ, Motwani M, Wan Q, Golikov A, Pan Y, Simonyan V, Mazumder R. Non-synonymous variations in cancer and their effects on the human proteome: workflow for NGS data biocuration and proteome-wide analysis of TCGA data. BMC Bioinformatics 2014;15:28. [PMID: 24467687 PMCID: PMC3916084 DOI: 10.1186/1471-2105-15-28] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2013] [Accepted: 01/22/2014] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Next-generation sequencing (NGS) technologies have resulted in petabytes of scattered data, decentralized in archives, databases and sometimes in isolated hard disks that are inaccessible for browsing and analysis. It is expected that curated secondary databases will help organize some of this Big Data, thereby allowing users to better navigate, search and compute on it.

RESULTS

To address the above challenge, we have implemented an NGS biocuration workflow and are analyzing short read sequences and associated metadata from cancer patients to better understand the human variome. Curation of variation and other related information from control (normal tissue) and case (tumor) samples will provide comprehensive background information that can be used in genomic medicine research and application studies. Our approach includes a CloudBioLinux Virtual Machine, which is used upstream of a High-performance Integrated Virtual Environment (HIVE) that encapsulates the Curated Short Read archive (CSR) and a proteome-wide variation effect analysis tool (SNVDis). As a proof-of-concept, we have curated and analyzed control and case breast cancer datasets from the NCI cancer genomics program, The Cancer Genome Atlas (TCGA). Our efforts include reviewing and recording in CSR the available clinical information on patients, mapping the reads to the reference, identifying non-synonymous Single Nucleotide Variations (nsSNVs), and integrating the data with tools that allow analysis of the effect of nsSNVs on the human proteome. Furthermore, we have also developed a novel phylogenetic analysis algorithm that uses SNV positions and can be used to classify the patient population. The workflow described here lays the foundation for analysis of short read sequence data to identify rare and novel SNVs that are not present in dbSNP and therefore provides a more comprehensive understanding of the human variome. Variation results for single genes as well as the entire study are available from the CSR website (http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr).

CONCLUSIONS

Availability of thousands of sequenced samples from patients provides a rich repository of sequence information that can be utilized to identify individual level SNVs and their effect on the human proteome beyond what the dbSNP database provides.
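
The step of identifying non-synonymous SNVs mentioned in this abstract comes down to asking whether a single-base change alters the encoded amino acid. The sketch below illustrates only that check and is not the workflow's actual code: the function name, the 0-based coordinate convention, and the example sequence are assumptions, and it expects an in-frame coding sequence on the forward strand.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_snv(cds, pos, alt):
    """Return (ref_aa, alt_aa, is_nonsynonymous) for one substitution.

    cds: in-frame coding DNA sequence (hypothetical input format).
    pos: 0-based position of the changed base within cds.
    alt: the alternative base at that position.
    """
    start = pos - pos % 3          # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position falls in an incomplete codon")
    offset = pos - start
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return ref_aa, alt_aa, ref_aa != alt_aa


cds = "ATGGAAGTT"                      # encodes Met-Glu-Val
print(classify_snv(cds, 4, "T"))       # GAA -> GTA changes the residue
print(classify_snv(cds, 5, "G"))       # GAA -> GAG keeps the residue
```

A real pipeline would also need transcript annotation, strand handling, and splice-aware coordinates, none of which are covered here.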


Bendl J, Stourac J, Salanda O, Pavelka A, Wieben ED, Zendulka J, Brezovsky J, Damborsky J. PredictSNP: robust and accurate consensus classifier for prediction of disease-related mutations. PLoS Comput Biol 2014;10:e1003440. [PMID: 24453961 PMCID: PMC3894168 DOI: 10.1371/journal.pcbi.1003440] [Citation(s) in RCA: 529] [Impact Index Per Article: 52.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2013] [Accepted: 12/03/2013] [Indexed: 02/07/2023] Open

Affiliation(s)

Jaroslav Bendl: Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment, Faculty of Science, Masaryk University, Brno, Czech Republic; Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic; Center of Biomolecular and Cellular Engineering, International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
Jan Stourac: Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment, Faculty of Science, Masaryk University, Brno, Czech Republic; Center of Biomolecular and Cellular Engineering, International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
Ondrej Salanda: Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
Antonin Pavelka: Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment, Faculty of Science, Masaryk University, Brno, Czech Republic
Eric D. Wieben: Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, Minnesota, United States of America
Jaroslav Zendulka: Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
Jan Brezovsky: Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment, Faculty of Science, Masaryk University, Brno, Czech Republic
Jiri Damborsky: Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment, Faculty of Science, Masaryk University, Brno, Czech Republic; Center of Biomolecular and Cellular Engineering, International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic


Compiani M, Capriotti E. Computational and theoretical methods for protein folding. Biochemistry 2013;52:8601-24. [PMID: 24187909 DOI: 10.1021/bi4001529] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

SIFT Indel: predictions for the functional effects of amino acid insertions/deletions in proteins. PLoS One 2013;8:e77940. [PMID: 24194902 PMCID: PMC3806772 DOI: 10.1371/journal.pone.0077940] [Citation(s) in RCA: 94] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2013] [Accepted: 09/05/2013] [Indexed: 12/02/2022] Open

Bromberg Y. Building a genome analysis pipeline to predict disease risk and prevent disease. J Mol Biol 2013;425:3993-4005. [PMID: 23928561 DOI: 10.1016/j.jmb.2013.07.038] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2013] [Revised: 07/26/2013] [Accepted: 07/28/2013] [Indexed: 12/24/2022]

Capriotti E, Calabrese R, Fariselli P, Martelli PL, Altman RB, Casadio R. WS-SNPs&GO: a web server for predicting the deleterious effect of human protein variants using functional annotation. BMC Genomics 2013;14 Suppl 3:S6. [PMID: 23819482 PMCID: PMC3665478 DOI: 10.1186/1471-2164-14-s3-s6] [Citation(s) in RCA: 216] [Impact Index Per Article: 19.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/31/2023] Open

Capriotti E, Altman RB, Bromberg Y. Collective judgment predicts disease-associated single nucleotide variants. BMC Genomics 2013;14 Suppl 3:S2. [PMID: 23819846 PMCID: PMC3839641 DOI: 10.1186/1471-2164-14-s3-s2] [Citation(s) in RCA: 176] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open

Abstract

Background

In recent years the number of human genetic variants deposited into the publicly available databases has been increasing exponentially. The latest version of dbSNP, for example, contains ~50 million validated Single Nucleotide Variants (SNVs). SNVs make up most of human variation and are often the primary causes of disease. The non-synonymous SNVs (nsSNVs) result in single amino acid substitutions and may affect protein function, often causing disease. Although several methods for the detection of nsSNV effects have already been developed, the consistent increase in annotated data is offering the opportunity to improve prediction accuracy.

Results

Here we present a new approach for the detection of disease-associated nsSNVs (Meta-SNP) that integrates four existing methods: PANTHER, PhD-SNP, SIFT and SNAP. We first tested the accuracy of each method using a dataset of 35,766 disease-annotated mutations from 8,667 proteins extracted from the SwissVar database. The four methods reached overall accuracies of 64%-76% with a Matthews correlation coefficient (MCC) of 0.38-0.53. We then used the outputs of these methods to develop a machine learning based approach that discriminates between disease-associated and polymorphic variants (Meta-SNP). In testing, the combined method reached 79% overall accuracy and 0.59 MCC, ~3% higher accuracy and ~0.05 higher correlation with respect to the best-performing method. Moreover, for the hardest-to-define subset of nsSNVs, i.e. variants for which half of the predictors disagreed with the other half, Meta-SNP attained 8% higher accuracy than the best predictor.

Conclusions

Here we find that the Meta-SNP algorithm achieves better performance than the best single predictor. This result suggests that the methods used for the prediction of variant-disease associations are orthogonal, encoding different biologically relevant relationships. Careful combination of predictions from various resources is therefore a good strategy for the selection of high reliability predictions. Indeed, for the subset of nsSNVs where all predictors were in agreement (46% of all nsSNVs in the set), our method reached 87% overall accuracy and 0.73 MCC. Meta-SNP server is freely accessible at http://snps.biofold.org/meta-snp.
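
The two metrics reported in this abstract, overall accuracy and the Matthews correlation coefficient (MCC), are both derived from the confusion matrix of a binary predictor. A minimal sketch follows; the function name and the counts are hypothetical and are not taken from the Meta-SNP evaluation.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Compute overall accuracy and MCC from confusion-matrix counts.

    tp/fn: disease variants predicted as disease / as neutral.
    tn/fp: neutral variants predicted as neutral / as disease.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Made-up counts for a balanced test set of 100 variants.
acc, mcc = accuracy_and_mcc(tp=40, tn=39, fp=11, fn=10)
print(f"accuracy={acc:.2f} MCC={mcc:.2f}")
```

MCC ranges from -1 to 1 and stays informative on unbalanced data sets, which is one reason it is commonly reported next to accuracy.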


Bromberg Y, Capriotti E. Thoughts from SNP-SIG 2012: future challenges in the annotation of genetic variations. BMC Genomics 2013;14 Suppl 3:S1. [PMID: 23819751 PMCID: PMC3665538 DOI: 10.1186/1471-2164-14-s3-s1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open

Wei CH, Harris BR, Kao HY, Lu Z. tmVar: a text mining approach for extracting sequence variants in biomedical literature. Bioinformatics 2013;29:1433-9. [PMID: 23564842 DOI: 10.1093/bioinformatics/btt156] [Citation(s) in RCA: 98] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Hegyi H. GABBR1 has a HERV-W LTR in its regulatory region--a possible implication for schizophrenia. Biol Direct 2013;8:5. [PMID: 23391219 PMCID: PMC3574838 DOI: 10.1186/1745-6150-8-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2012] [Accepted: 02/04/2013] [Indexed: 11/25/2022] Open