1
|
Tosco-Herrera E, Muñoz-Barrera A, Jáspez D, Rubio-Rodríguez LA, Mendoza-Alvarez A, Rodriguez-Perez H, Jou J, Iñigo-Campos A, Corrales A, Ciuffreda L, Martinez-Bugallo F, Prieto-Morin C, García-Olivares V, González-Montelongo R, Lorenzo-Salazar JM, Marcelino-Rodriguez I, Flores C. Evaluation of a whole-exome sequencing pipeline and benchmarking of causal germline variant prioritizers. Hum Mutat 2022; 43:2010-2020. [PMID: 36054330 DOI: 10.1002/humu.24459] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2022] [Revised: 08/20/2022] [Accepted: 08/30/2022] [Indexed: 01/25/2023]
Abstract
Most causal variants of Mendelian diseases are exonic. Whole-exome sequencing (WES) has become the diagnostic gold standard, but causative variant prioritization constitutes a bottleneck. Here we assessed an in-house sample-to-sequence pipeline and benchmarked free prioritization tools for germline causal variants from WES data. WES of 61 unselected patients with a known genetic disease cause was obtained. Variant prioritizations were performed by diverse tools and recorded to obtain a diagnostic yield when the causal variant was present in the first, fifth, and 10th top rankings. A fraction of causal variants was not captured by WES (8.2%) or did not pass the quality control criteria (13.1%). Most of the applications inspected were unavailable or had technical limitations, leaving nine tools for complete evaluation. Exomiser performed best in the top first rankings, while LIRICAL led in the top fifth rankings. Based on the more conservative top 10th rankings, Xrare had the highest diagnostic yield, followed by a three-way tie among Exomiser, LIRICAL, and PhenIX, then followed by AMELIE, TAPES, Phen-Gen, AIVar, and VarNote-PAT. Xrare, Exomiser, LIRICAL, and PhenIX are the most efficient options for variant prioritization in real patient WES data.
Collapse
Affiliation(s)
- Eva Tosco-Herrera
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria (HUNSC), Santa Cruz de Tenerife, Spain.,Escuela de Doctorado y Estudios de Posgrado de la Universidad de La Laguna (EDEPULL), Universidad de La Laguna (ULL), San Cristóbal de La Laguna, Spain
| | - Adrián Muñoz-Barrera
- Escuela de Doctorado y Estudios de Posgrado de la Universidad de La Laguna (EDEPULL), Universidad de La Laguna (ULL), San Cristóbal de La Laguna, Spain.,Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
| | - David Jáspez
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
| | - Luis A Rubio-Rodríguez
- Escuela de Doctorado y Estudios de Posgrado de la Universidad de La Laguna (EDEPULL), Universidad de La Laguna (ULL), San Cristóbal de La Laguna, Spain.,Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
| | - Alejandro Mendoza-Alvarez
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria (HUNSC), Santa Cruz de Tenerife, Spain.,Escuela de Doctorado y Estudios de Posgrado de la Universidad de La Laguna (EDEPULL), Universidad de La Laguna (ULL), San Cristóbal de La Laguna, Spain
| | - Hector Rodriguez-Perez
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria (HUNSC), Santa Cruz de Tenerife, Spain.,Escuela de Doctorado y Estudios de Posgrado de la Universidad de La Laguna (EDEPULL), Universidad de La Laguna (ULL), San Cristóbal de La Laguna, Spain
| | - Jonathan Jou
- Department of Surgery, University of Illinois College of Medicine, Peoria, Illinois, USA
| | - Antonio Iñigo-Campos
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
| | - Almudena Corrales
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria (HUNSC), Santa Cruz de Tenerife, Spain.,CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain
| | - Laura Ciuffreda
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria (HUNSC), Santa Cruz de Tenerife, Spain
| | - Francisco Martinez-Bugallo
- Clinical Analysis Service, Hospital Universitario Nuestra Señora de Candelaria (HUNSC), Santa Cruz de Tenerife, Spain
| | - Carol Prieto-Morin
- Clinical Analysis Service, Hospital Universitario Nuestra Señora de Candelaria (HUNSC), Santa Cruz de Tenerife, Spain
| | - Víctor García-Olivares
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
| | | | - Jose Miguel Lorenzo-Salazar
- Escuela de Doctorado y Estudios de Posgrado de la Universidad de La Laguna (EDEPULL), Universidad de La Laguna (ULL), San Cristóbal de La Laguna, Spain.,Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
| | | | - Carlos Flores
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria (HUNSC), Santa Cruz de Tenerife, Spain.,Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain.,CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain.,Facultad de Ciencias de la Salud, Universidad Fernando Pessoa Canarias, Las Palmas de Gran Canaria, Spain
| |
Collapse
|
2
|
Zeng Z, Aptekmann AA, Bromberg Y. Decoding the effects of synonymous variants. Nucleic Acids Res 2021; 49:12673-12691. [PMID: 34850938 PMCID: PMC8682775 DOI: 10.1093/nar/gkab1159] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2021] [Revised: 11/02/2021] [Accepted: 11/08/2021] [Indexed: 12/12/2022] Open
Abstract
Synonymous single nucleotide variants (sSNVs) are common in the human genome but are often overlooked. However, sSNVs can have significant biological impact and may lead to disease. Existing computational methods for evaluating the effect of sSNVs suffer from the lack of gold-standard training/evaluation data and exhibit over-reliance on sequence conservation signals. We developed synVep (synonymous Variant effect predictor), a machine learning-based method that overcomes both of these limitations. Our training data was a combination of variants reported by gnomAD (observed) and those unreported, but possible in the human genome (generated). We used positive-unlabeled learning to purify the generated variant set of any likely unobservable variants. We then trained two sequential extreme gradient boosting models to identify subsets of the remaining variants putatively enriched and depleted in effect. Our method attained 90% precision/recall on a previously unseen set of variants. Furthermore, although synVep does not explicitly use conservation, its scores correlated with evolutionary distances between orthologs in cross-species variation analysis. synVep was also able to differentiate pathogenic vs. benign variants, as well as splice-site disrupting variants (SDV) vs. non-SDVs. Thus, synVep provides an important improvement in annotation of sSNVs, allowing users to focus on variants that most likely harbor effects.
Collapse
Affiliation(s)
- Zishuo Zeng
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08873, USA
| | - Ariel A Aptekmann
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08873, USA
| | - Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08873, USA
- Department of Genetics, Rutgers University, Piscataway, NJ 08854, USA
| |
Collapse
|