Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Supernat A, Vidarsson OV, Steen VM, Stokowy T. Comparison of three variant callers for human whole genome sequencing. Sci Rep 2018;8:17851. [PMID: 30552369 PMCID: PMC6294778 DOI: 10.1038/s41598-018-36177-7] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 07/23/2018] [Accepted: 11/13/2018] [Indexed: 12/30/2022] Open

Cited by Other Article(s)

Mahmood K, Sarup P, Oertelt L, Jahoor A, Orabi J. Assessing myBaits Target Capture Sequencing Methodology Using Short-Read Sequencing for Variant Detection in Oat Genomics and Breeding. Genes (Basel) 2024;15:700. [PMID: 38927635 PMCID: PMC11203172 DOI: 10.3390/genes15060700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Revised: 05/18/2024] [Accepted: 05/22/2024] [Indexed: 06/28/2024] Open

Abstract

The integration of target capture systems with next-generation sequencing has emerged as an efficient tool for exploring specific genetic regions with a high resolution and facilitating the rapid discovery of novel alleles. Despite these advancements, the application of targeted sequencing methodologies, such as the myBaits technology, in polyploid oat species remains relatively unexplored. In this study, we utilized the myBaits target capture method offered by Daicel Arbor Biosciences to detect variants and assess their reliability for variant detection in oat genomics and breeding. Ten oat genotypes were carefully chosen for targeted sequencing, focusing on specific regions on chromosome 2A to detect variants. The selected region harbors 98 genes. Precisely designed baits targeting the genes within these regions were employed for the target capture sequencing. We employed various mappers and variant callers to identify variants. After the identification of variants, we focused on the variants identified via all variant callers to assess the applicability of the myBaits sequencing methodology in oat breeding. In our efforts to validate the identified variants, we focused on two SNPs, one deletion and one insertion identified via all variant callers in the genotypes KF-318 and NOS 819111-70 but absent in the remaining eight genotypes. The Sanger sequencing of targeted SNPs failed to reproduce target capture data obtained through the myBaits technology. Similarly, the validation of deletion and insertion variants via high-resolution melting (HRM) curve analysis also failed to reproduce target capture data, again suggesting limitations in the reliability of the myBaits target capture sequencing using short-read sequencing for variant detection in the oat genome. This study sheds light on the importance of exercising caution when employing the myBaits target capture strategy for variant detection in oats. This study provides valuable insights for breeders seeking to advance oat breeding efforts and marker development using myBaits target capture sequencing, emphasizing the significance of methodological sequencing considerations in oat genomics research.

Kalleberg J, Rissman J, Schnabel RD. Overcoming Limitations to Deep Learning in Domesticated Animals with TrioTrain. bioRxiv 2024:2024.04.15.589602. [PMID: 38659907 PMCID: PMC11042298 DOI: 10.1101/2024.04.15.589602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]

Kosugi S, Terao C. Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data. Hum Genome Var 2024;11:18. [PMID: 38632226 PMCID: PMC11024196 DOI: 10.1038/s41439-024-00276-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 03/12/2024] [Accepted: 03/20/2024] [Indexed: 04/19/2024] Open

de Jong TV, Pan Y, Rastas P, Munro D, Tutaj M, Akil H, Benner C, Chen D, Chitre AS, Chow W, Colonna V, Dalgard CL, Demos WM, Doris PA, Garrison E, Geurts AM, Gunturkun HM, Guryev V, Hourlier T, Howe K, Huang J, Kalbfleisch T, Kim P, Li L, Mahaffey S, Martin FJ, Mohammadi P, Ozel AB, Polesskaya O, Pravenec M, Prins P, Sebat J, Smith JR, Solberg Woods LC, Tabakoff B, Tracey A, Uliano-Silva M, Villani F, Wang H, Sharp BM, Telese F, Jiang Z, Saba L, Wang X, Murphy TD, Palmer AA, Kwitek AE, Dwinell MR, Williams RW, Li JZ, Chen H. A revamped rat reference genome improves the discovery of genetic diversity in laboratory rats. Cell Genomics 2024;4:100527. [PMID: 38537634 PMCID: PMC11019364 DOI: 10.1016/j.xgen.2024.100527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 12/26/2023] [Accepted: 02/29/2024] [Indexed: 04/09/2024]

Affiliation(s)

Tristan V de Jong Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
Yanchao Pan Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
Pasi Rastas Institute of Biotechnology, University of Helsinki, Helsinki, Finland
Daniel Munro Department of Psychiatry, University of California San Diego, San Diego, CA, USA; Department of Integrative Structural and Computational Biology, Scripps Research, San Diego, CA, USA
Monika Tutaj Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
Huda Akil Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
Chris Benner Department of Medicine, University of California San Diego, San Diego, CA, USA
Denghui Chen Department of Psychiatry, University of California San Diego, San Diego, CA, USA
Apurva S Chitre Department of Psychiatry, University of California San Diego, San Diego, CA, USA
William Chow Tree of Life, Wellcome Sanger Institute, Cambridge, UK
Vincenza Colonna Institute of Genetics and Biophysics, National Research Council, Naples, Italy; Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Clifton L Dalgard Department of Anatomy, Physiology & Genetics, The American Genome Center, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
Wendy M Demos Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
Peter A Doris The Brown Foundation Institute of Molecular Medicine, Center for Human Genetics, University of Texas Health Science Center, Houston, TX, USA
Erik Garrison Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Aron M Geurts Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
Hakan M Gunturkun Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
Victor Guryev Genome Structure and Ageing, University of Groningen, UMC, Groningen, the Netherlands
Thibaut Hourlier European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus in Hinxton, Cambridgeshire, UK
Kerstin Howe Tree of Life, Wellcome Sanger Institute, Cambridge, UK
Jun Huang Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
Ted Kalbfleisch Gluck Equine Research Center, Department of Veterinary Science, University of Kentucky, Louisville, KY, USA
Panjun Kim Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Ling Li Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA; Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN, USA
Spencer Mahaffey Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
Fergal J Martin European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus in Hinxton, Cambridgeshire, UK
Pejman Mohammadi Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA, USA; Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
Ayse Bilge Ozel Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
Oksana Polesskaya Department of Psychiatry, University of California San Diego, San Diego, CA, USA
Michal Pravenec Institute of Physiology, Czech Academy of Sciences, Prague, Czechia
Pjotr Prins Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Jonathan Sebat Department of Psychiatry, University of California San Diego, San Diego, CA, USA
Jennifer R Smith Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
Leah C Solberg Woods Department of Internal Medicine, Section on Molecular Medicine, Wake Forest University School of Medicine, Winston-Salem, NC, USA
Boris Tabakoff Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
Alan Tracey Tree of Life, Wellcome Sanger Institute, Cambridge, UK
Marcela Uliano-Silva Tree of Life, Wellcome Sanger Institute, Cambridge, UK
Flavia Villani Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Hongyang Wang Department of Animal Sciences, Washington State University, Pullman, WA, USA
Burt M Sharp Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Francesca Telese Department of Psychiatry, University of California San Diego, San Diego, CA, USA
Zhihua Jiang Department of Animal Sciences, Washington State University, Pullman, WA, USA
Laura Saba Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
Xusheng Wang Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA; Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN, USA
Terence D Murphy National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Abraham A Palmer Department of Psychiatry, University of California San Diego, San Diego, CA, USA; Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
Anne E Kwitek Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
Melinda R Dwinell Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
Robert W Williams Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Jun Z Li Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA.
Hao Chen Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA.

Olszewska M, Malcher A, Stokowy T, Pollock N, Berman AJ, Budkiewicz S, Kamieniczna M, Jackowiak H, Suszynska-Zajczyk J, Jedrzejczak P, Yatsenko AN, Kurpisz M. Effects of Tcte1 knockout on energy chain transportation and spermatogenesis: implications for male infertility. Hum Reprod Open 2024;2024:hoae020. [PMID: 38650655 PMCID: PMC11035007 DOI: 10.1093/hropen/hoae020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Revised: 03/08/2024] [Indexed: 04/25/2024] Open

Abstract

STUDY QUESTION

Is the Tcte1 mutation causative for male infertility?

SUMMARY ANSWER

Our collected data underline the complex and devastating effect of the single-gene mutation on the testicular molecular network, leading to male reproductive failure.

WHAT IS KNOWN ALREADY

Recent data have revealed mutations in genes related to axonemal dynein arms as causative for morphology and motility abnormalities in spermatozoa of infertile males, including dysplasia of fibrous sheath (DFS) and multiple morphological abnormalities in the sperm flagella (MMAF). The nexin-dynein regulatory complex (N-DRC) coordinates the dynein arm activity and is built from the DRC1-DRC7 proteins. DRC5 (TCTE1), one of the N-DRC elements, has already been reported as a candidate for abnormal sperm flagella beating; however, only in a restricted manner with no clear explanation of respective observations.

STUDY DESIGN, SIZE, DURATION

Using the CRISPR/Cas9 genome editing technique, a mouse Tcte1 gene knockout line was created on the basis of the C57Bl/6J strain. The mouse reproductive potential, semen characteristics, testicular gene expression levels, sperm ATP, and testis apoptosis level measurements were then assessed, followed by visualization of N-DRC proteins in sperm, and protein modeling in silico. Also, a pilot genomic sequencing study of samples from human infertile males (n = 248) was applied for screening of TCTE1 variants.

PARTICIPANTS/MATERIALS, SETTING, METHODS

To check the reproductive potential of KO mice, adult animals were crossed for delivery of three litters per caged pair, but for no longer than 6 months, in various combinations of zygosity. All experiments were performed for wild-type (WT, control group), heterozygous Tcte1+/- and homozygous Tcte1-/- male mice. Gross anatomy was performed on testis and epididymis samples, followed by semen analysis. Sequencing of RNA (RNAseq; Illumina) was done for mouse testis tissues. STRING interactions were checked for protein-protein interactions, based on changed expression levels of corresponding genes identified in the mouse testis RNAseq experiments. Immunofluorescence in situ staining was performed to detect the N-DRC complex proteins: Tcte1 (Drc5), Drc7, Fbxl13 (Drc6), and Eps8l1 (Drc3) in mouse spermatozoa. To determine the amount of ATP in spermatozoa, the luminescence level was measured. In addition, immunofluorescence in situ staining was performed to check the level of apoptosis via caspase 3 visualization on mouse testis samples. DNA from whole blood samples of infertile males (n = 137 with non-obstructive azoospermia or cryptozoospermia, n = 111 samples with a spectrum of oligoasthenoteratozoospermia, including n = 47 with asthenozoospermia) was extracted to perform genomic sequencing (WGS, WES, or Sanger). Protein prediction modeling of human-identified variants and the exon 3 structure deleted in the mouse knockout was also performed.

MAIN RESULTS AND THE ROLE OF CHANCE

No progeny at all was found for the homozygous males which were revealed to have oligoasthenoteratozoospermia, while heterozygous animals were fertile but manifested oligozoospermia, suggesting haploinsufficiency. RNA-sequencing of the testicular tissue showed the influence of Tcte1 mutations on the expression pattern of 21 genes responsible for mitochondrial ATP processing or linked with apoptosis or spermatogenesis. In Tcte1-/- males, the protein was revealed in only residual amounts in the sperm head nucleus and was not transported to the sperm flagella, as were other N-DRC components. Decreased ATP levels (2.4-fold lower) were found in the spermatozoa of homozygous mice, together with disturbed tail:midpiece ratios, leading to abnormal sperm tail beating. Casp3-positive signals (indicating apoptosis) were observed in spermatogonia only, at a similar level in all three mouse genotypes. Mutation screening of human infertile males revealed one novel and five ultra-rare heterogeneous variants (predicted as disease-causing) in 6.05% of the patients studied. Protein prediction modeling of identified variants revealed changes in the protein surface charge potential, leading to disruption in helix flexibility or its dynamics, thus suggesting disrupted interactions of TCTE1 with its binding partners located within the axoneme.

LARGE SCALE DATA

All data generated or analyzed during this study are included in this published article and its supplementary information files. RNAseq data are available in the GEO database (https://www.ncbi.nlm.nih.gov/geo/) under the accession number GSE207805. The results described in the publication are based on whole-genome or exome sequencing data, which include sensitive information in the form of patient-specific germline variants. Information regarding such variants must not be shared publicly under European Union legislation; therefore, the raw data that support the findings of this study are available from the corresponding author upon reasonable request.

LIMITATIONS, REASONS FOR CAUTION

In the study, the in vitro fertilization performance of sperm from homozygous male mice was not checked.

WIDER IMPLICATIONS OF THE FINDINGS

This study contains novel and comprehensive data concerning the role of TCTE1 in male infertility. The TCTE1 gene is the next one that should be added to the 'male infertility list' because of its crucial role in spermatogenesis and proper sperm functioning.

STUDY FUNDING/COMPETING INTERESTS

This work was supported by National Science Centre in Poland, grants no.: 2015/17/B/NZ2/01157 and 2020/37/B/NZ5/00549 (to M.K.), 2017/26/D/NZ5/00789 (to A.M.), and HD096723, GM127569-03, NIH SAP #4100085736 PA DoH (to A.N.Y.). The authors declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.

Chafai N, Bonizzi L, Botti S, Badaoui B. Emerging applications of machine learning in genomic medicine and healthcare. Crit Rev Clin Lab Sci 2024;61:140-163. [PMID: 37815417 DOI: 10.1080/10408363.2023.2259466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 09/12/2023] [Indexed: 10/11/2023]

Abdelwahab O, Belzile F, Torkamaneh D. Performance analysis of conventional and AI-based variant callers using short and long reads. BMC Bioinformatics 2023;24:472. [PMID: 38097928 PMCID: PMC10720095 DOI: 10.1186/s12859-023-05596-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 12/04/2023] [Indexed: 12/18/2023] Open

Park H, Gim J. A comparative investigation of single nucleotide variant calling for a personal non-Caucasian sequencing sample. Genes Genomics 2023;45:1527-1536. [PMID: 37651066 DOI: 10.1007/s13258-023-01439-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Accepted: 08/04/2023] [Indexed: 09/01/2023]

Abstract

BACKGROUND

The falling cost and increasing clinical application of whole genome sequencing (WGS) create a need for efficient (accurate and rapid) variant calling procedures for personal WGS data (n = 1). A number of variant calling pipelines have been introduced that use the human reference genome GRCh38 as a reference and a benchmark dataset called 'NA12878', both of which are 'standard' but of limited ethnic origin. Considering the nature of variant calling algorithms and recent updates in sequencing protocols, however, it is necessary to revisit the efficiency of the current best pipelines for personal WGS data from diverse ethnicities.

OBJECTIVE

We discuss the most efficient practices for variant calling of personal WGS reads, with a particular emphasis on whether (1) an ethnic match or mismatch between the reference genome and the WGS data produces a distinct result and, more importantly, (2) there is an ethnic-specific optimal workflow.

METHODS

Here, we generate appropriate WGS data, DNA array data, and a sufficient number of Sanger-validated variants from a single Korean subject to perform such a comprehensive comparison. We applied these WGS reads and the 'NA12878' reads to 8 different variant calling pipelines with 2 different reference genomes (GRCh38 and KOREF, a Korean reference genome) to which the WGS reads from different ethnic origins are aligned.

RESULTS

We evaluated the performance of the pipelines with the matched array genotype data and Sanger sequencing validation and demonstrated that, regardless of the ethnic match/mismatch, (1) Novoalign-GATK4 showed the most efficient performance with the exceptional calls in the MHC region; (2) the overall performance was better with GRCh38, while a significant difference in recall was observed. In addition, we found that removing the 'markduplication' step with PCR-free WGS data greatly reduced computing cost while maintaining performance.

CONCLUSION

For variant calling of personal PCR-free WGS data, regardless of ethnicity, we recommend the use of Novoalign + GATK4 with GRCh38 and without 'markduplication'.

Xiang X, Lu B, Song D, Li J, Shu K, Pu D. Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data. Sci Rep 2023;13:20444. [PMID: 37993475 PMCID: PMC10665316 DOI: 10.1038/s41598-023-47135-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 11/09/2023] [Indexed: 11/24/2023] Open

Abstract

Detection of low-frequency variants with high accuracy plays an important role in biomedical research and clinical practice. However, it is challenging to do so with next-generation sequencing (NGS) approaches due to the high error rates of NGS. To accurately distinguish low-level true variants from these errors, many statistical tools for calling low-frequency variants have been proposed, but a systematic performance comparison of these tools has not yet been performed. Here, we evaluated four raw-reads-based variant callers (SiNVICT, outLyzer, Pisces, and LoFreq) and four UMI-based variant callers (DeepSNVMiner, MAGERI, smCounter2, and UMI-VarCal) considering their capability to call single nucleotide variants (SNVs) with allelic frequency as low as 0.025% in deep sequencing data. We analyzed a total of 54 simulated datasets with various sequencing depths and variant allele frequencies (VAFs), two reference datasets, and Horizon Tru-Q sample data. The results showed that the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers regarding detection limit. Sequencing depth had almost no effect on the UMI-based callers but significantly influenced the raw-reads-based callers. Regardless of the sequencing depth, MAGERI showed the fastest analysis, while smCounter2 consistently took the longest to finish the variant calling process. Overall, DeepSNVMiner and UMI-VarCal performed the best, with good sensitivity and precision (88% and 100% for DeepSNVMiner; 84% and 100% for UMI-VarCal). In conclusion, the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers in terms of sensitivity and precision. We recommend using DeepSNVMiner and UMI-VarCal for low-frequency variant detection. The results provide important information regarding future directions for reliable low-frequency variant detection and algorithm development, which is critical in genetics-based medical research and clinical applications.

Guhlin J, Le Lec MF, Wold J, Koot E, Winter D, Biggs PJ, Galla SJ, Urban L, Foster Y, Cox MP, Digby A, Uddstrom LR, Eason D, Vercoe D, Davis T, Howard JT, Jarvis ED, Robertson FE, Robertson BC, Gemmell NJ, Steeves TE, Santure AW, Dearden PK. Species-wide genomics of kākāpō provides tools to accelerate recovery. Nat Ecol Evol 2023;7:1693-1705. [PMID: 37640765 DOI: 10.1038/s41559-023-02165-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/26/2022] [Accepted: 07/11/2023] [Indexed: 08/31/2023]

Affiliation(s)

Joseph Guhlin Genomics Aotearoa, Biochemistry Department, School of Biomedical Sciences, University of Otago, Dunedin, Aotearoa New Zealand
Marissa F Le Lec Genomics Aotearoa, Biochemistry Department, School of Biomedical Sciences, University of Otago, Dunedin, Aotearoa New Zealand
Jana Wold School of Biological Sciences, University of Canterbury, Christchurch, Aotearoa New Zealand
Emily Koot The New Zealand Institute for Plant and Food Research Ltd, Palmerston North, Aotearoa New Zealand
David Winter School of Natural Sciences, Massey University, Palmerston North, Aotearoa New Zealand
Patrick J Biggs School of Natural Sciences, Massey University, Palmerston North, Aotearoa New Zealand; School of Veterinary Science, Massey University, Palmerston North, Aotearoa New Zealand
Stephanie J Galla School of Biological Sciences, University of Canterbury, Christchurch, Aotearoa New Zealand; Department of Biological Sciences, Boise State University, Boise, ID, USA
Lara Urban Department of Anatomy, School of Biomedical Sciences, University of Otago, Dunedin, Aotearoa New Zealand; Helmholtz Pioneer Campus, Helmholtz Zentrum Muenchen, Neuherberg, Germany; Helmholtz AI, Helmholtz Zentrum Muenchen, Neuherberg, Germany; School of Life Sciences, Technical University of Munich, Freising, Germany
Yasmin Foster Department of Zoology, University of Otago, Dunedin, Aotearoa New Zealand
Murray P Cox School of Natural Sciences, Massey University, Palmerston North, Aotearoa New Zealand; Department of Statistics, University of Auckland, Auckland, Aotearoa New Zealand
Andrew Digby Kākāpō Recovery Programme, Department of Conservation, Invercargill, Aotearoa New Zealand
Lydia R Uddstrom Kākāpō Recovery Programme, Department of Conservation, Invercargill, Aotearoa New Zealand
Daryl Eason Kākāpō Recovery Programme, Department of Conservation, Invercargill, Aotearoa New Zealand
Deidre Vercoe Kākāpō Recovery Programme, Department of Conservation, Invercargill, Aotearoa New Zealand
Tāne Davis Rakiura Tītī Islands Administering Body, Invercargill, Aotearoa New Zealand
Jason T Howard Neurogenetics of Language Lab, The Rockefeller University, New York, NY, USA; Mirxes, Cambridge, MA, USA
Erich D Jarvis The Rockefeller University, New York, NY, USA; Howard Hughes Medical Institute, Chevy Chase, MD, USA
Fiona E Robertson Department of Zoology, University of Otago, Dunedin, Aotearoa New Zealand
Bruce C Robertson Department of Zoology, University of Otago, Dunedin, Aotearoa New Zealand
Neil J Gemmell Department of Anatomy, School of Biomedical Sciences, University of Otago, Dunedin, Aotearoa New Zealand
Tammy E Steeves School of Biological Sciences, University of Canterbury, Christchurch, Aotearoa New Zealand
Anna W Santure School of Biological Sciences, University of Auckland, Auckland, Aotearoa New Zealand
Peter K Dearden Genomics Aotearoa, Biochemistry Department, School of Biomedical Sciences, University of Otago, Dunedin, Aotearoa New Zealand.

de Jong TV, Pan Y, Rastas P, Munro D, Tutaj M, Akil H, Benner C, Chen D, Chitre AS, Chow W, Colonna V, Dalgard CL, Demos WM, Doris PA, Garrison E, Geurts AM, Gunturkun HM, Guryev V, Hourlier T, Howe K, Huang J, Kalbfleisch T, Kim P, Li L, Mahaffey S, Martin FJ, Mohammadi P, Ozel AB, Polesskaya O, Pravenec M, Prins P, Sebat J, Smith JR, Solberg Woods LC, Tabakoff B, Tracey A, Uliano-Silva M, Villani F, Wang H, Sharp BM, Telese F, Jiang Z, Saba L, Wang X, Murphy TD, Palmer AA, Kwitek AE, Dwinell MR, Williams RW, Li JZ, Chen H. A revamped rat reference genome improves the discovery of genetic diversity in laboratory rats. bioRxiv 2023:2023.04.13.536694. [PMID: 37214860 PMCID: PMC10197727 DOI: 10.1101/2023.04.13.536694] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/24/2023]

Affiliation(s)

Tristan V de Jong Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
Yanchao Pan Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
Pasi Rastas Institute of Biotechnology, University of Helsinki, Helsinki, Finland
Daniel Munro Department of Psychiatry, University of California San Diego, San Diego, CA, USA; Department of Integrative Structural and Computational Biology, Scripps Research, San Diego, CA, USA
Monika Tutaj Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
Huda Akil Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
Chris Benner Department of Medicine, University of California San Diego, San Diego, CA, USA
Denghui Chen Department of Psychiatry, University of California San Diego, San Diego, CA, USA
Apurva S Chitre Department of Psychiatry, University of California San Diego, San Diego, CA, USA
William Chow Tree of Life, Wellcome Sanger Institute, Cambridge, UK
Vincenza Colonna Institute of Genetics and Biophysics, National Research Council, Naples, Italy; Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Clifton L Dalgard Department of Anatomy, Physiology & Genetics; The American Genome Center, Uniformed Services University of the Health Sciences, Washington DC, USA
Wendy M Demos Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
Peter A Doris The Brown Foundation Institute of Molecular Medicine, Center For Human Genetics, University of Texas Health Science Center, Houston, TX, USA
Erik Garrison Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Aron M Geurts Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
Hakan M Gunturkun Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
Victor Guryev Genome Structure and Ageing, University of Groningen, UMC Groningen, The Netherlands
Thibaut Hourlier European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus in Hinxton, Cambridgeshire, UK
Kerstin Howe Tree of Life, Wellcome Sanger Institute, Cambridge, UK
Jun Huang Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
Ted Kalbfleisch Gluck Equine Research Center, Department of Veterinary Science, University of Kentucky, Louisville, KY, USA
Panjun Kim Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Ling Li Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA Center for Proteomics and Metabolomics, St. Jude Children’s Research Hospital, Memphis, TN, USA
Spencer Mahaffey Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
Fergal J Martin European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus in Hinxton, Cambridgeshire, UK
Pejman Mohammadi Center for Immunity and Immunotherapies, Seattle Children’s Research Institute, Seattle, WA, USA Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
Ayse Bilge Ozel Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
Oksana Polesskaya Department of Psychiatry, University of California San Diego, San Diego, CA, USA
Michal Pravenec Institute of Physiology, Czech Academy of Sciences, Prague, Czechia
Pjotr Prins Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Jonathan Sebat Department of Psychiatry, University of California San Diego, San Diego, CA, USA
Jennifer R Smith Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
Leah C Solberg Woods Department of Internal Medicine, Section on Molecular Medicine, Wake Forest University School of Medicine, Winston-Salem, NC, USA
Boris Tabakoff Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
Alan Tracey Tree of Life, Wellcome Sanger Institute, Cambridge, UK
Marcela Uliano-Silva Tree of Life, Wellcome Sanger Institute, Cambridge, UK
Flavia Villani Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Hongyang Wang Department of Animal Sciences, Washington State University, Pullman, WA, USA
Burt M Sharp Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Francesca Telese Department of Psychiatry, University of California San Diego, San Diego, CA, USA
Zhihua Jiang Department of Animal Sciences, Washington State University, Pullman, WA, USA
Laura Saba Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
Xusheng Wang Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA Center for Proteomics and Metabolomics, St. Jude Children’s Research Hospital, Memphis, TN, USA
Terence D Murphy National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Abraham A Palmer Department of Psychiatry, University of California San Diego, San Diego, CA, USA Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
Anne E Kwitek Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
Melinda R Dwinell Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
Robert W Williams Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Jun Z Li Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
Hao Chen Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA

Chen NC, Kolesnikov A, Goel S, Yun T, Chang PC, Carroll A. Improving variant calling using population data and deep learning. BMC Bioinformatics 2023;24:197. [PMID: 37173615 PMCID: PMC10182612 DOI: 10.1186/s12859-023-05294-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2022] [Accepted: 04/17/2023] [Indexed: 05/15/2023] Open

Park H, Gim J. A comparative investigation of variant calling and genotyping for a single non-Caucasian whole genome. RESEARCH SQUARE 2023:rs.3.rs-2580940. [PMID: 36945432 PMCID: PMC10029055 DOI: 10.21203/rs.3.rs-2580940/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2023]

Lin J, Ngiam KY. How data science and AI-based technologies impact genomics. Singapore Med J 2023;64:59-66. [PMID: 36722518 PMCID: PMC9979798 DOI: 10.4103/singaporemedj.smj-2021-438] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 01/20/2023]

Cai Y, Chen R, Gao S, Li W, Liu Y, Su G, Song M, Jiang M, Jiang C, Zhang X. Artificial intelligence applied in neoantigen identification facilitates personalized cancer immunotherapy. Front Oncol 2023;12:1054231. [PMID: 36698417 PMCID: PMC9868469 DOI: 10.3389/fonc.2022.1054231] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/26/2022] [Accepted: 12/16/2022] [Indexed: 01/10/2023] Open

Betschart RO, Thiéry A, Aguilera-Garcia D, Zoche M, Moch H, Twerenbold R, Zeller T, Blankenberg S, Ziegler A. Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment. Sci Rep 2022;12:21502. [PMID: 36513709 PMCID: PMC9748128 DOI: 10.1038/s41598-022-26181-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Accepted: 12/12/2022] [Indexed: 12/14/2022] Open

Affiliation(s)

Raphael O. Betschart Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 1, 7265 Davos Wolfgang, Switzerland
Alexandre Thiéry Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 1, 7265 Davos Wolfgang, Switzerland
Domingo Aguilera-Garcia Institute of Pathology and Molecular Pathology, University Hospital Zurich, Schmelzbergstrasse 12, 8091 Zurich, Switzerland
Martin Zoche Institute of Pathology and Molecular Pathology, University Hospital Zurich, Schmelzbergstrasse 12, 8091 Zurich, Switzerland
Holger Moch Institute of Pathology and Molecular Pathology, University Hospital Zurich, Schmelzbergstrasse 12, 8091 Zurich, Switzerland
Raphael Twerenbold Department of Cardiology, University Heart & Vascular Center, University Medical Center Hamburg Eppendorf, Martinistr. 52, 20251 Hamburg, Germany University Center of Cardiovascular Research Hamburg, University Medical Center Hamburg Eppendorf, Martinistr. 52, 20251 Hamburg, Germany German Center for Cardiovascular Research (DZHK), Partner Site Hamburg/Kiel/Lübeck, Hamburg, Germany
Tanja Zeller Department of Cardiology, University Heart & Vascular Center, University Medical Center Hamburg Eppendorf, Martinistr. 52, 20251 Hamburg, Germany University Center of Cardiovascular Research Hamburg, University Medical Center Hamburg Eppendorf, Martinistr. 52, 20251 Hamburg, Germany German Center for Cardiovascular Research (DZHK), Partner Site Hamburg/Kiel/Lübeck, Hamburg, Germany
Stefan Blankenberg Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 1, 7265 Davos Wolfgang, Switzerland Department of Cardiology, University Heart & Vascular Center, University Medical Center Hamburg Eppendorf, Martinistr. 52, 20251 Hamburg, Germany University Center of Cardiovascular Research Hamburg, University Medical Center Hamburg Eppendorf, Martinistr. 52, 20251 Hamburg, Germany German Center for Cardiovascular Research (DZHK), Partner Site Hamburg/Kiel/Lübeck, Hamburg, Germany
Andreas Ziegler Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 1, 7265 Davos Wolfgang, Switzerland Department of Cardiology, University Heart & Vascular Center, University Medical Center Hamburg Eppendorf, Martinistr. 52, 20251 Hamburg, Germany School Mathematics, Statistics and Computer Science, Scottsville, Private Bag X01, Pietermaritzburg, 3209 South Africa

Woerner AE, Mandape S, Kapema KB, Duque TM, Smuts A, King JL, Crysup B, Wang X, Huang M, Ge J, Budowle B. Optimized variant calling for estimating kinship. Forensic Sci Int Genet 2022;61:102785. [DOI: 10.1016/j.fsigen.2022.102785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Revised: 08/07/2022] [Accepted: 09/29/2022] [Indexed: 11/16/2022]

Malcher A, Stokowy T, Berman A, Olszewska M, Jedrzejczak P, Sielski D, Nowakowski A, Rozwadowska N, Yatsenko AN, Kurpisz MK. Whole-genome sequencing identifies new candidate genes for nonobstructive azoospermia. Andrology 2022;10:1605-1624. [PMID: 36017582 PMCID: PMC9826517 DOI: 10.1111/andr.13269] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 02/04/2022] [Revised: 06/21/2022] [Accepted: 08/17/2022] [Indexed: 01/11/2023]

Evaluation of the Available Variant Calling Tools for Oxford Nanopore Sequencing in Breast Cancer. Genes (Basel) 2022;13:genes13091583. [PMID: 36140751 PMCID: PMC9498802 DOI: 10.3390/genes13091583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 08/30/2022] [Accepted: 08/31/2022] [Indexed: 11/23/2022] Open

Borden ES, Buetow KH, Wilson MA, Hastings KT. Cancer Neoantigens: Challenges and Future Directions for Prediction, Prioritization, and Validation. Front Oncol 2022;12:836821. [PMID: 35311072 PMCID: PMC8929516 DOI: 10.3389/fonc.2022.836821] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/15/2021] [Accepted: 02/07/2022] [Indexed: 12/16/2022] Open

Barbitoff YA, Abasov R, Tvorogova VE, Glotov AS, Predeus AV. Systematic benchmark of state-of-the-art variant calling pipelines identifies major factors affecting accuracy of coding sequence variant discovery. BMC Genomics 2022;23:155. [PMID: 35193511 PMCID: PMC8862519 DOI: 10.1186/s12864-022-08365-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 07/29/2021] [Accepted: 02/03/2022] [Indexed: 12/30/2022] Open

Abstract

BACKGROUND

Accurate variant detection in the coding regions of the human genome is a key requirement for molecular diagnostics of Mendelian disorders. Efficiency of variant discovery from next-generation sequencing (NGS) data depends on multiple factors, including reproducible coverage biases of NGS methods and the performance of read alignment and variant calling software. Although variant caller benchmarks are published constantly, no previous publications have leveraged the full extent of available gold standard whole-genome (WGS) and whole-exome (WES) sequencing datasets.

RESULTS

In this work, we systematically evaluated the performance of 4 popular short read aligners (Bowtie2, BWA, Isaac, and Novoalign) and 9 novel and well-established variant calling and filtering methods (Clair3, DeepVariant, Octopus, GATK, FreeBayes, and Strelka2) using a set of 14 "gold standard" WES and WGS datasets available from Genome In A Bottle (GIAB) consortium. Additionally, we have indirectly evaluated each pipeline's performance using a set of 6 non-GIAB samples of African and Russian ethnicity. In our benchmark, Bowtie2 performed significantly worse than other aligners, suggesting it should not be used for medical variant calling. When other aligners were considered, the accuracy of variant discovery mostly depended on the variant caller and not the read aligner. Among the tested variant callers, DeepVariant consistently showed the best performance and the highest robustness. Other actively developed tools, such as Clair3, Octopus, and Strelka2, also performed well, although their efficiency had greater dependence on the quality and type of the input data. We have also compared the consistency of variant calls in GIAB and non-GIAB samples. With few important caveats, best-performing tools have shown little evidence of overfitting.

CONCLUSIONS

The results show surprisingly large differences in the performance of cutting-edge tools even in high confidence regions of the coding genome. This highlights the importance of regular benchmarking of quickly evolving tools and pipelines. We also discuss the need for a more diverse set of gold standard genomes that would include samples of African, Hispanic, or mixed ancestry. Additionally, there is also a need for better variant caller assessment in the repetitive regions of the coding genome.

Comparison of GATK and DeepVariant by trio sequencing. Sci Rep 2022;12:1809. [PMID: 35110657 PMCID: PMC8810758 DOI: 10.1038/s41598-022-05833-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 10/18/2021] [Accepted: 01/12/2022] [Indexed: 12/03/2022] Open

Abstract

While next-generation sequencing (NGS) has transformed genetic testing, it generates large quantities of noisy data that require a significant amount of bioinformatics to generate useful interpretation. The accuracy of variant calling is therefore critical. Although GATK HaplotypeCaller is a widely used tool for this purpose, newer methods such as DeepVariant have shown higher accuracy in assessments of gold-standard samples for whole-genome sequencing (WGS) and whole-exome sequencing (WES), but a side-by-side comparison on clinical samples has not been performed. Trio WES was used to compare GATK (4.1.2.0) HaplotypeCaller and DeepVariant (v0.8.0). The performance of the two pipelines was evaluated according to the Mendelian error rate, transition-to-transversion (Ti/Tv) ratio, concordance rate, and pathological variant detection rate. Data from 80 trios were analyzed. The Mendelian error rate of the 77 biological trios calculated from the data by DeepVariant (3.09 ± 0.83%) was lower than that calculated from the data by GATK (5.25 ± 0.91%) (p < 0.001). DeepVariant also yielded a higher Ti/Tv ratio (2.38 ± 0.02) than GATK (2.04 ± 0.07) (p < 0.001), suggesting that DeepVariant proportionally called more true positives. The concordance rate between the 2 pipelines was 88.73%. Sixty-three disease-causing variants were detected in the 80 trios. Among them, DeepVariant detected 62 variants, and GATK detected 61 variants. The one variant called by DeepVariant but not GATK HaplotypeCaller might have been missed by GATK HaplotypeCaller due to low coverage. OTC exon 2 (139 bp) deletion was not detected by either method. Mendelian error rate calculation is an effective way to evaluate variant callers. By this method, DeepVariant outperformed GATK, while the two pipelines performed equally in other parameters.
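
As a rough illustration of two of the metrics named in this abstract, the Python sketch below computes a Ti/Tv ratio from a list of SNVs and a Mendelian error rate from trio genotypes. It is not the cited authors' pipeline: the function names and the tuple representation of genotypes are assumptions made for this example, and it considers only biallelic autosomal SNVs with no missing calls.

```python
# Illustrative only: names and data layout are assumptions, not from the cited study.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snvs):
    """Ratio of transitions to transversions for (ref, alt) pairs with ref != alt."""
    ti = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    tv = len(snvs) - ti
    return ti / tv if tv else float("inf")


def is_mendelian_consistent(child, father, mother):
    """Each genotype is a pair of alleles, e.g. ("A", "G").

    The child must receive one allele from each parent.
    """
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def mendelian_error_rate(trio_genotypes):
    """Fraction of sites whose child genotype cannot be explained by the parents."""
    sites = list(trio_genotypes)
    if not sites:
        return 0.0
    errors = sum(1 for c, f, m in sites if not is_mendelian_consistent(c, f, m))
    return errors / len(sites)
```

A lower Mendelian error rate and a Ti/Tv ratio closer to the value expected for the sequenced region are the directions the abstract treats as favourable; the sketch only shows how such numbers can be derived from calls.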

Brady SW, Gout AM, Zhang J. Therapeutic and prognostic insights from the analysis of cancer mutational signatures. Trends Genet 2022;38:194-208. [PMID: 34483003 PMCID: PMC8752466 DOI: 10.1016/j.tig.2021.08.007] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 06/15/2021] [Revised: 08/06/2021] [Accepted: 08/11/2021] [Indexed: 02/08/2023]

Kelly CJ, Brown APY, Taylor JA. Artificial Intelligence in Pediatrics. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]

Bathke J, Lühken G. OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow. BMC Bioinformatics 2021;22:402. [PMID: 34388963 PMCID: PMC8361789 DOI: 10.1186/s12859-021-04317-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 06/10/2021] [Accepted: 08/04/2021] [Indexed: 12/30/2022] Open

Abstract

Background

The advent of next generation sequencing has opened new avenues for basic and applied research. One application is the discovery of sequence variants causative of a phenotypic trait or a disease pathology. The computational task of detecting and annotating sequence differences between a target dataset and a reference genome is known as "variant calling". Typically, this task is computationally involved, often combining a complex chain of linked software tools. A major player in this field is the Genome Analysis Toolkit (GATK). The "GATK Best Practices" is a commonly referenced recipe for variant calling. However, current computational recommendations on variant calling predominantly focus on human sequencing data and ignore ever-changing demands of high-throughput sequencing developments. Furthermore, frequent updates to such recommendations are counterintuitive to the goal of offering a standard workflow and hamper reproducibility over time.

Results

A workflow for automated detection of single nucleotide polymorphisms and insertion-deletions offers a wide range of applications in sequence annotation of model and non-model organisms. The introduced workflow builds on the GATK Best Practices, while enabling reproducibility over time and offering an open, generalized computational architecture. The workflow achieves parallelized data evaluation and maximizes performance of individual computational tasks. Optimized Java garbage collection and heap size settings for the GATK applications SortSam, MarkDuplicates, HaplotypeCaller, and GatherVcfs effectively cut the overall analysis time in half.

Conclusions

The demand for variant calling, efficient computational processing, and standardized workflows is growing. The Open source Variant calling workFlow (OVarFlow) offers automation and reproducibility for a computationally optimized variant calling task. By reducing usage of computational resources, the workflow removes prior existing entry barriers to the variant calling field and enables standardized variant calling.

de Jong TV, Kim P, Guryev V, Mulligan MK, Williams RW, Redei EE, Chen H. Whole genome sequencing of nearly isogenic WMI and WLI inbred rats identifies genes potentially involved in depression and stress reactivity. Sci Rep 2021;11:14774. [PMID: 34285244 PMCID: PMC8292482 DOI: 10.1038/s41598-021-92993-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/22/2021] [Accepted: 06/17/2021] [Indexed: 02/06/2023] Open

Li H, Dawood M, Khayat MM, Farek JR, Jhangiani SN, Khan ZM, Mitani T, Coban-Akdemir Z, Lupski JR, Venner E, Posey JE, Sabo A, Gibbs RA. Exome variant discrepancies due to reference-genome differences. Am J Hum Genet 2021;108:1239-1250. [PMID: 34129815 PMCID: PMC8322936 DOI: 10.1016/j.ajhg.2021.05.011] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 01/29/2021] [Accepted: 05/19/2021] [Indexed: 12/15/2022] Open

Zhu N, Swietlik EM, Welch CL, Pauciulo MW, Hagen JJ, Zhou X, Guo Y, Karten J, Pandya D, Tilly T, Lutz KA, Martin JM, Treacy CM, Rosenzweig EB, Krishnan U, Coleman AW, Gonzaga-Jauregui C, Lawrie A, Trembath RC, Wilkins MR, Morrell NW, Shen Y, Gräf S, Nichols WC, Chung WK. Rare variant analysis of 4241 pulmonary arterial hypertension cases from an international consortium implicates FBLN2, PDGFD, and rare de novo variants in PAH. Genome Med 2021;13:80. [PMID: 33971972 PMCID: PMC8112021 DOI: 10.1186/s13073-021-00891-1] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 06/18/2020] [Accepted: 04/19/2021] [Indexed: 12/30/2022] Open

Abstract

BACKGROUND

Pulmonary arterial hypertension (PAH) is a lethal vasculopathy characterized by pathogenic remodeling of pulmonary arterioles leading to increased pulmonary pressures, right ventricular hypertrophy, and heart failure. PAH can be associated with other diseases (APAH: connective tissue diseases, congenital heart disease, and others) but often the etiology is idiopathic (IPAH). Mutations in bone morphogenetic protein receptor 2 (BMPR2) are the cause of most heritable cases but the vast majority of other cases are genetically undefined.

METHODS

To identify new risk genes, we utilized an international consortium of 4241 PAH cases with exome or genome sequencing data from the National Biological Sample and Data Repository for PAH, Columbia University Irving Medical Center, and the UK NIHR BioResource - Rare Diseases Study. The strength of this combined cohort is a doubling of the number of IPAH cases compared to either national cohort alone. We identified protein-coding variants and performed rare variant association analyses in unrelated participants of European ancestry, including 1647 IPAH cases and 18,819 controls. We also analyzed de novo variants in 124 pediatric trios enriched for IPAH and APAH-CHD.

RESULTS

Seven genes with rare deleterious variants were associated with IPAH with false discovery rate smaller than 0.1: three known genes (BMPR2, GDF2, and TBX4), two recently identified candidate genes (SOX17, KDR), and two new candidate genes (fibulin 2, FBLN2; platelet-derived growth factor D, PDGFD). The new genes were identified based solely on rare deleterious missense variants, a variant type that could not be adequately assessed in either cohort alone. The candidate genes exhibit expression patterns in lung and heart similar to that of known PAH risk genes, and most variants occur in conserved protein domains. For pediatric PAH, predicted deleterious de novo variants exhibited a significant burden compared to the background mutation rate (2.45×, p = 2.5e-5). At least eight novel pediatric candidate genes carrying de novo variants have plausible roles in lung/heart development.

CONCLUSIONS

Rare variant analysis of a large international consortium identified two new candidate genes, FBLN2 and PDGFD. The new genes have known functions in vasculogenesis and remodeling. Trio analysis predicted that ~15% of pediatric IPAH may be explained by de novo variants.

Affiliation(s)

Na Zhu Department of Pediatrics, Columbia University Irving Medical Center, 1150 St. Nicholas Avenue, Room 620, New York, NY, 10032, USA Department of Systems Biology, Columbia University, New York, NY, USA
Emilia M Swietlik Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
Carrie L Welch Department of Pediatrics, Columbia University Irving Medical Center, 1150 St. Nicholas Avenue, Room 620, New York, NY, 10032, USA
Michael W Pauciulo Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
Jacob J Hagen Department of Pediatrics, Columbia University Irving Medical Center, 1150 St. Nicholas Avenue, Room 620, New York, NY, 10032, USA Department of Systems Biology, Columbia University, New York, NY, USA
Xueya Zhou Department of Pediatrics, Columbia University Irving Medical Center, 1150 St. Nicholas Avenue, Room 620, New York, NY, 10032, USA Department of Systems Biology, Columbia University, New York, NY, USA
Yicheng Guo Department of Systems Biology, Columbia University, New York, NY, USA
Johannes Karten 42Genetics, Belfast, Ireland
Divya Pandya Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
Tobias Tilly Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
Katie A Lutz Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
Jennifer M Martin Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK NIHR BioResource for Translational Research, Cambridge Biomedical Campus, Cambridge, UK
Carmen M Treacy Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
Erika B Rosenzweig Department of Pediatrics, Columbia University Irving Medical Center, 1150 St. Nicholas Avenue, Room 620, New York, NY, 10032, USA
Usha Krishnan Department of Pediatrics, Columbia University Irving Medical Center, 1150 St. Nicholas Avenue, Room 620, New York, NY, 10032, USA
Anna W Coleman Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
Claudia Gonzaga-Jauregui Regeneron Pharmaceuticals, New York, NY, USA
Allan Lawrie Department of Infection, Immunity and Cardiovascular Disease, University of Sheffield, Sheffield, UK
Richard C Trembath Department of Medical and Molecular Genetics, King's College London, London, UK
Martin R Wilkins National Heart & Lung Institute, Imperial College London, London, UK
Nicholas W Morrell Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK NIHR BioResource for Translational Research, Cambridge Biomedical Campus, Cambridge, UK Addenbrooke's Hospital NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, UK Royal Papworth Hospital NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, UK
Yufeng Shen Department of Systems Biology, Columbia University, New York, NY, USA Department of Biomedical Informatics, Columbia University, New York, NY, USA
Stefan Gräf Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK NIHR BioResource for Translational Research, Cambridge Biomedical Campus, Cambridge, UK Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
William C Nichols Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
Wendy K Chung Department of Pediatrics, Columbia University Irving Medical Center, 1150 St. Nicholas Avenue, Room 620, New York, NY, 10032, USA. Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, USA. Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA.

Next Generation Sequencing Technology in the Clinic and Its Challenges. Cancers (Basel) 2021;13:cancers13081751. [PMID: 33916923 PMCID: PMC8067551 DOI: 10.3390/cancers13081751] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/09/2021] [Revised: 03/30/2021] [Accepted: 04/05/2021] [Indexed: 12/12/2022] Open

Fischer C, Koblmüller S, Börger C, Michelitsch G, Trajanoski S, Schlötterer C, Guelly C, Thallinger GG, Sturmbauer C. Genome sequences of Tropheus moorii and Petrochromis trewavasae, two eco-morphologically divergent cichlid fishes endemic to Lake Tanganyika. Sci Rep 2021;11:4309. [PMID: 33619328 PMCID: PMC7900123 DOI: 10.1038/s41598-021-81030-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/05/2020] [Accepted: 12/28/2020] [Indexed: 01/01/2023] Open

Zhou X, Zhang L, Weng Z, Dill DL, Sidow A. Aquila enables reference-assisted diploid personal genome assembly and comprehensive variant detection based on linked reads. Nat Commun 2021;12:1077. [PMID: 33597536 PMCID: PMC7889865 DOI: 10.1038/s41467-021-21395-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/09/2020] [Accepted: 01/20/2021] [Indexed: 01/19/2023] Open

Gorla A, Jew B, Zhang L, Sul JH. xGAP: A python based efficient, modular, extensible and fault tolerant genomic analysis pipeline for variant discovery. Bioinformatics 2021;37:9-16. [PMID: 33416856 PMCID: PMC8034531 DOI: 10.1093/bioinformatics/btaa1097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2020] [Revised: 12/22/2020] [Accepted: 01/04/2021] [Indexed: 11/14/2022] Open

Molina-Mora JA, Solano-Vargas M. Set-theory based benchmarking of three different variant callers for targeted sequencing. BMC Bioinformatics 2021;22:20. [PMID: 33413082 PMCID: PMC7791862 DOI: 10.1186/s12859-020-03926-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2019] [Accepted: 12/09/2020] [Indexed: 12/05/2022] Open

Abstract

Background

Next generation sequencing (NGS) technologies have improved the study of hereditary diseases. Since the evaluation of bioinformatics pipelines is not straightforward, NGS demands effective strategies to analyze data that is of paramount relevance for decision making under a clinical scenario. According to the benchmarking framework of the Global Alliance for Genomics and Health (GA4GH), we implemented a new simple and user-friendly set-theory based method to assess variant callers using a gold standard variant set and high confidence regions. As a model, we used TruSight Cardio kit sequencing data of the reference genome NA12878. This targeted sequencing kit is used to identify variants in key genes related to Inherited Cardiac Conditions (ICCs), a group of cardiovascular diseases with high rates of morbidity and mortality.

Results

We implemented and compared three variant calling pipelines (Isaac, Freebayes, and VarScan). Performance metrics computed with our set-theory approach showed that the pipelines had high resolution and revealed: (1) a perfect recall of 1.000 for all three pipelines, (2) very high precision values, i.e. 0.987 for Freebayes, 0.928 for VarScan, and 1.000 for Isaac, when compared with the reference material, and (3) a ROC curve analysis with AUC > 0.94 for all cases. Moreover, significant differences were obtained between the three pipelines. In general, results indicate that the three pipelines were able to recognize the expected variants in the gold standard data set.

Conclusions

Our set-theory approach to calculating metrics showed that the three selected pipelines identified the expected ICC-related variants, but results were completely dependent on the algorithms. We emphasize the importance of assessing pipelines using gold standard materials to achieve the most reliable results for clinical application.
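
The set-theory idea summarized in this abstract can be illustrated with a minimal sketch, assuming variants are represented as (chrom, pos, ref, alt) tuples and high confidence regions as 1-based inclusive intervals. The function names, the toy variants, and the region list below are hypothetical and are not taken from the cited study or from any GA4GH tool; the sketch only shows how true positives, false positives, and false negatives fall out of set intersection and difference.

```python
from typing import Dict, Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt), hypothetical representation
Region = Tuple[str, int, int]        # (chrom, start, end), 1-based inclusive


def in_regions(variant: Variant, regions: Iterable[Region]) -> bool:
    """Return True if the variant position lies inside any high confidence region."""
    chrom, pos, _, _ = variant
    return any(c == chrom and s <= pos <= e for c, s, e in regions)


def set_metrics(truth: Set[Variant], calls: Set[Variant],
                regions: Iterable[Region]) -> Dict[str, float]:
    """Compare a call set with a gold standard set inside high confidence regions."""
    regions = list(regions)
    truth_hc = {v for v in truth if in_regions(v, regions)}
    calls_hc = {v for v in calls if in_regions(v, regions)}
    tp = truth_hc & calls_hc   # called and present in the gold standard
    fp = calls_hc - truth_hc   # called but absent from the gold standard
    fn = truth_hc - calls_hc   # in the gold standard but not called
    precision = len(tp) / len(calls_hc) if calls_hc else 0.0
    recall = len(tp) / len(truth_hc) if truth_hc else 0.0
    return {"tp": len(tp), "fp": len(fp), "fn": len(fn),
            "precision": precision, "recall": recall}


# Toy data, invented for illustration only.
regions = [("chr1", 100, 200)]
truth = {("chr1", 120, "A", "G"), ("chr1", 150, "C", "T"), ("chr1", 500, "G", "A")}
calls = {("chr1", 120, "A", "G"), ("chr1", 150, "C", "T"), ("chr1", 180, "T", "C")}
metrics = set_metrics(truth, calls, regions)
print(metrics)
```

In this toy case the gold standard variant at position 500 is excluded because it lies outside the high confidence region, so recall is computed over two variants only. Real benchmarking would also need variant normalization and genotype matching, which this sketch ignores.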

Artificial Intelligence in Pediatrics. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_316-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]

Padmavathi P, Setlur AS, Chandrashekar K, Niranjan V. A comprehensive in-silico computational analysis of twenty cancer exome datasets and identification of associated somatic variants reveals potential molecular markers for detection of varied cancer types. Informatics in Medicine Unlocked 2021. [DOI: 10.1016/j.imu.2021.100762] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/25/2023] Open

Accuracy and efficiency of germline variant calling pipelines for human genome data. Sci Rep 2020. [PMID: 33214604 DOI: 10.1101/2020.03.27.011767v1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/23/2022] Open

Zhao S, Agafonov O, Azab A, Stokowy T, Hovig E. Accuracy and efficiency of germline variant calling pipelines for human genome data. Sci Rep 2020;10:20222. [PMID: 33214604 PMCID: PMC7678823 DOI: 10.1038/s41598-020-77218-4] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 03/30/2020] [Accepted: 11/02/2020] [Indexed: 12/30/2022] Open

DeepVariant-on-Spark: Small-Scale Genome Analysis Using a Cloud-Based Computing Framework. Computational and Mathematical Methods in Medicine 2020;2020:7231205. [PMID: 32952600 PMCID: PMC7481958 DOI: 10.1155/2020/7231205] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/30/2020] [Revised: 08/15/2020] [Accepted: 08/21/2020] [Indexed: 12/18/2022]

Comparison of commercially available whole-genome sequencing kits for variant detection in circulating cell-free DNA. Sci Rep 2020;10:6190. [PMID: 32277101 PMCID: PMC7148341 DOI: 10.1038/s41598-020-63102-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/18/2019] [Accepted: 03/19/2020] [Indexed: 12/13/2022] Open

Fjeld K, Masson E, Lin JH, Michl P, Stokowy T, Gravdal A, El Jellas K, Steine SJ, Hoem D, Johansson BB, Dalva M, Ruffert C, Zou WB, Li ZS, Njølstad PR, Chen JM, Liao Z, Johansson S, Rosendahl J, Férec C, Molven A. Characterization of CEL-DUP2: Complete duplication of the carboxyl ester lipase gene is unlikely to influence risk of chronic pancreatitis. Pancreatology 2020;20:377-384. [PMID: 32007358 DOI: 10.1016/j.pan.2020.01.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/06/2020] [Revised: 01/17/2020] [Accepted: 01/18/2020] [Indexed: 12/11/2022]

Affiliation(s)

Karianne Fjeld: The Gade Laboratory for Pathology, Department of Clinical Medicine, University of Bergen, Bergen, Norway; Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway; Center for Diabetes Research, Department of Clinical Science, University of Bergen, Bergen, Norway
Emmanuelle Masson: Univ Brest, Inserm, EFS, UMR 1078, GGB, F-29200, Brest, France; CHRU Brest, Service de Génétique, Brest, France
Jin-Huan Lin: Department of Gastroenterology, Changhai Hospital, Second Military Medical University, Shanghai, China; Shanghai Institute of Pancreatic Diseases, Shanghai, China
Patrick Michl: Department of Internal Medicine I, Martin Luther University, Halle, Germany
Tomasz Stokowy: Genomics Core Facility, Department of Clinical Science, University of Bergen, Bergen, Norway
Anny Gravdal: The Gade Laboratory for Pathology, Department of Clinical Medicine, University of Bergen, Bergen, Norway; Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway; Center for Diabetes Research, Department of Clinical Science, University of Bergen, Bergen, Norway
Khadija El Jellas: The Gade Laboratory for Pathology, Department of Clinical Medicine, University of Bergen, Bergen, Norway; Center for Diabetes Research, Department of Clinical Science, University of Bergen, Bergen, Norway
Solrun J Steine: The Gade Laboratory for Pathology, Department of Clinical Medicine, University of Bergen, Bergen, Norway
Dag Hoem: Department of Gastrointestinal Surgery, Haukeland University Hospital, Bergen, Norway
Bente B Johansson: Center for Diabetes Research, Department of Clinical Science, University of Bergen, Bergen, Norway
Monica Dalva: The Gade Laboratory for Pathology, Department of Clinical Medicine, University of Bergen, Bergen, Norway; Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway
Claudia Ruffert: Department of Internal Medicine I, Martin Luther University, Halle, Germany
Wen-Bin Zou: Department of Gastroenterology, Changhai Hospital, Second Military Medical University, Shanghai, China; Shanghai Institute of Pancreatic Diseases, Shanghai, China
Zhao-Shen Li: Department of Gastroenterology, Changhai Hospital, Second Military Medical University, Shanghai, China; Shanghai Institute of Pancreatic Diseases, Shanghai, China
Pål R Njølstad: Center for Diabetes Research, Department of Clinical Science, University of Bergen, Bergen, Norway; Department of Pediatrics and Adolescent Medicine, Haukeland University Hospital, Bergen, Norway
Jian-Min Chen: Univ Brest, Inserm, EFS, UMR 1078, GGB, F-29200, Brest, France
Zhuan Liao: Department of Gastroenterology, Changhai Hospital, Second Military Medical University, Shanghai, China; Shanghai Institute of Pancreatic Diseases, Shanghai, China
Stefan Johansson: Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway; Center for Diabetes Research, Department of Clinical Science, University of Bergen, Bergen, Norway
Jonas Rosendahl: Department of Internal Medicine I, Martin Luther University, Halle, Germany
Claude Férec: Univ Brest, Inserm, EFS, UMR 1078, GGB, F-29200, Brest, France; CHRU Brest, Service de Génétique, Brest, France
Anders Molven: The Gade Laboratory for Pathology, Department of Clinical Medicine, University of Bergen, Bergen, Norway; Center for Diabetes Research, Department of Clinical Science, University of Bergen, Bergen, Norway; Department of Pathology, Haukeland University Hospital, Bergen, Norway

Stenton SL, Kremer LS, Kopajtich R, Ludwig C, Prokisch H. The diagnosis of inborn errors of metabolism by an integrative "multi-omics" approach: A perspective encompassing genomics, transcriptomics, and proteomics. J Inherit Metab Dis 2020;43:25-35. [PMID: 31119744 DOI: 10.1002/jimd.12130] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 01/31/2019] [Revised: 05/21/2019] [Accepted: 05/21/2019] [Indexed: 12/12/2022]

Loka TP, Tausch SH, Renard BY. Reliable variant calling during runtime of Illumina sequencing. Sci Rep 2019;9:16502. [PMID: 31712740 PMCID: PMC6848508 DOI: 10.1038/s41598-019-52991-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/28/2018] [Accepted: 10/16/2019] [Indexed: 02/03/2023] Open

Battaglia S. Neoantigen prediction from genomic and transcriptomic data. Methods Enzymol 2019;635:267-281. [PMID: 32122550 DOI: 10.1016/bs.mie.2019.10.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/20/2022]

Svensson D, Sjögren R, Sundell D, Sjödin A, Trygg J. doepipeline: a systematic approach to optimizing multi-level and multi-step data processing workflows. BMC Bioinformatics 2019;20:498. [PMID: 31615395 PMCID: PMC6794737 DOI: 10.1186/s12859-019-3091-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/21/2018] [Accepted: 09/10/2019] [Indexed: 12/30/2022] Open

Abstract

BACKGROUND

Selecting the proper parameter settings for bioinformatic software tools is challenging. Not only will each parameter have an individual effect on the outcome, but there are also potential interaction effects between parameters. Both of these effects may be difficult to predict. To make the situation even more complex, multiple tools may be run in a sequential pipeline where the final output depends on the parameter configuration for each tool in the pipeline. Because of the complexity and difficulty of predicting outcomes, in practice parameters are often left at default settings or set based on personal or peer experience obtained in a trial-and-error fashion. To allow for the reliable and efficient selection of parameters for bioinformatic pipelines, a systematic approach is needed.

RESULTS

We present doepipeline, a novel approach to optimizing bioinformatic software parameters, based on core concepts of the Design of Experiments methodology and recent advances in subset designs. Optimal parameter settings are first approximated in a screening phase using a subset design that efficiently spans the entire search space, then optimized in the subsequent phase using response surface designs and OLS modeling. Doepipeline was used to optimize parameters in four use cases: 1) de novo assembly, 2) scaffolding of a fragmented genome assembly, 3) k-mer taxonomic classification of Oxford Nanopore Technologies MinION reads, and 4) genetic variant calling. In all four cases, doepipeline found parameter settings that produced a better outcome with respect to the characteristic measured when compared to using default values. Our approach is implemented and available in the Python package doepipeline.

CONCLUSIONS

Our proposed methodology provides a systematic and robust framework for optimizing software parameter settings, in contrast to labor- and time-intensive manual parameter tweaking. Implementation in doepipeline makes our methodology accessible and user-friendly, and allows for automatic optimization of tools in a wide range of cases. The source code of doepipeline is available at https://github.com/clicumu/doepipeline and it can be installed through conda-forge.
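
The two-phase idea described in this abstract (coarse screening followed by a response surface fitted by OLS) can be sketched for a single parameter as follows. This is an illustrative toy under stated assumptions, not the doepipeline API: the function names, the evenly spaced screening grid, the three-level design, and the quadratic toy response are all hypothetical, and the actual package handles multiple interacting factors with subset designs.

```python
from typing import Callable, List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """OLS fit of y = b0 + b1*x + b2*x^2 via the normal equations."""
    rows = [[1.0, x, x * x] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear(xtx, xty)


def optimize(response: Callable[[float], float], low: float, high: float,
             n_screen: int = 5) -> float:
    """Screen the whole range coarsely, then refine with a quadratic response surface."""
    # Screening phase: evenly spaced settings spanning the search space.
    step = (high - low) / (n_screen - 1)
    screen = [low + i * step for i in range(n_screen)]
    best = max(screen, key=response)
    # Optimization phase: three-level design centred on the best screening point.
    design = [best - step / 2, best, best + step / 2]
    b0, b1, b2 = fit_quadratic(design, [response(x) for x in design])
    if b2 >= 0:
        # The fitted surface has no interior maximum; keep the screening result.
        return best
    # Stationary point of the fitted quadratic, clamped to the search range.
    return min(max(-b1 / (2 * b2), low), high)


def toy_response(x: float) -> float:
    """Hypothetical pipeline quality score peaking at x = 3.2."""
    return -(x - 3.2) ** 2 + 10.0


best_setting = optimize(toy_response, 0.0, 10.0)
print(round(best_setting, 3))
```

In a real pipeline each call to the response function would be a full pipeline run, so the appeal of this kind of design is that it needs only a handful of evaluations (eight here) rather than a dense grid search.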

Variant calling and quality control of large-scale human genome sequencing data. Emerg Top Life Sci 2019;3:399-409. [DOI: 10.1042/etls20190007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/24/2019] [Revised: 06/28/2019] [Accepted: 07/16/2019] [Indexed: 12/12/2022]