1
Niazi SK. The Coming of Age of AI/ML in Drug Discovery, Development, Clinical Testing, and Manufacturing: The FDA Perspectives. Drug Des Devel Ther 2023; 17:2691-2725. [PMID: 37701048 PMCID: PMC10493153 DOI: 10.2147/dddt.s424991] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 06/28/2023] [Accepted: 08/24/2023] [Indexed: 09/14/2023]
Abstract
Artificial intelligence (AI) and machine learning (ML) represent significant advancements in computing, building on technologies that humanity has developed over millions of years, from the abacus to quantum computers. These tools have reached a pivotal moment in their development. In 2021 alone, the U.S. Food and Drug Administration (FDA) received over 100 product registration submissions that heavily relied on AI/ML for applications such as monitoring and improving human performance in compiling dossiers. To ensure the safe and effective use of AI/ML in drug discovery and manufacturing, the FDA and numerous other U.S. federal agencies have issued continuously updated, stringent guidelines. Intriguingly, these guidelines are often generated or updated with the aid of AI/ML tools themselves. The overarching goal is to expedite drug discovery, enhance the safety profiles of existing drugs, introduce novel treatment modalities, and improve manufacturing compliance and robustness. Recent FDA publications offer an encouraging outlook on the potential of these tools, emphasizing the need for their careful deployment. This has expanded market opportunities for retraining personnel handling these technologies and enabled innovative applications in emerging therapies such as gene editing, CRISPR-Cas9, CAR-T cells, mRNA-based treatments, and personalized medicine. In summary, the maturation of AI/ML technologies is a testament to human ingenuity. Far from being autonomous entities, these are tools created by and for humans, designed to solve complex problems now and in the future. This paper aims to present the status of these technologies, along with examples of their present and future applications.
2
Safonov A, Nomakuchi TT, Chao E, Horton C, Dolinsky JS, Yussuf A, Richardson M, Speare V, Li S, Bogus ZC, Bonanni M, Raper A, Kallish S, Ritchie MD, Nathanson KL, Drivas TG. A genotype-first approach identifies high incidence of NF1 pathogenic variants with distinct disease associations. medRxiv 2023:2023.08.08.23293676. [PMID: 37609227 PMCID: PMC10441497 DOI: 10.1101/2023.08.08.23293676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
Loss of function variants in the NF1 gene cause neurofibromatosis type 1 (NF1), a genetic disorder characterized by complete penetrance, prevalence of 1 in 3,000, characteristic physical exam findings, and a substantially increased risk for malignancy. However, our understanding of the disorder is entirely based on patients ascertained through phenotype-first approaches. Leveraging a genotype-first approach in two large patient cohorts, we demonstrate unexpectedly high prevalence (1 in 450-750) of NF1 pathogenic variants. Half were identified in individuals lacking clinical features of NF1, with many appearing to have post-zygotic mosaicism for the identified variant. Incidentally discovered variants were not associated with classic NF1 features but were associated with an increased incidence of malignancy compared to a control population. Our findings suggest that NF1 pathogenic variants are substantially more common than previously thought, often characterized by somatic mosaicism and reduced penetrance, and are important contributors to cancer risk in the general population.
3
Dasariraju S, Gragert L, Wager GL, McCullough K, Brown NK, Kamoun M, Urbanowicz RJ. HLA amino acid Mismatch-Based risk stratification of kidney allograft failure using a novel Machine learning algorithm. J Biomed Inform 2023; 142:104374. [PMID: 37120046 PMCID: PMC10286565 DOI: 10.1016/j.jbi.2023.104374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Revised: 04/02/2023] [Accepted: 04/23/2023] [Indexed: 05/01/2023]
Abstract
OBJECTIVE While associations between HLA antigen-level mismatches (Ag-MM) and kidney allograft failure are well established, HLA amino acid-level mismatches (AA-MM) have been less explored. Ag-MM fails to consider the substantial variability in the number of MMs at polymorphic amino acid (AA) sites within any given Ag-MM category, which may conceal variable impact on allorecognition. In this study, we aim to develop a novel Feature Inclusion Bin Evolver for Risk Stratification (FIBERS) and apply it to automatically discover bins of HLA amino acid mismatches that stratify donor-recipient pairs into low versus high graft survival risk groups. METHODS Using data from the Scientific Registry of Transplant Recipients, we applied FIBERS to a multiethnic population of 166,574 kidney transplants performed between 2000 and 2017. FIBERS was applied (1) across all HLA-A, B, C, DRB1, and DQB1 locus AA-MMs with comparison to 0-ABDR Ag-MM risk stratification, (2) on AA-MMs within each HLA locus individually, and (3) using cross validation to evaluate FIBERS generalizability. The predictive power of graft failure risk stratification was evaluated while adjusting for donor/recipient characteristics and HLA-A, B, C, DRB1, and DQB1 Ag-MMs as covariates. RESULTS FIBERS's best-performing bin (on AA-MMs across all loci) added significant predictive power (hazard ratio = 1.10, Bonferroni adj. p < 0.001) in stratifying graft failure risk (where low-risk is defined as zero AA-MMs and high-risk is one or more AA-MMs) even after adjusting for Ag-MMs and donor/recipient covariates. The best bin also assigned more than twice as many patients to the low-risk category as traditional 0-ABDR Ag mismatching (∼24.4% vs. ∼9.1%). When HLA loci were binned individually, the bin for DRB1 exhibited the strongest risk stratification; relative to zero AA-MM, one or more MMs in the bin yielded HR = 1.11, p < 0.005 in a fully adjusted Cox model.
AA-MMs at HLA-DRB1 peptide contact sites contributed most to incremental risk of graft failure. Additionally, FIBERS points to possible risk associated with HLA-DQB1 AA-MMs at positions that determine specificity of peptide anchor residues and HLA-DQ heterodimer stability. CONCLUSION FIBERS's performance suggests potential for discovery of HLA immunogenetics-based risk stratification of kidney graft failure that outperforms traditional assessment.
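The low/high split used to score a bin can be illustrated with a minimal Python sketch, assuming each donor-recipient pair is represented as a set of mismatched amino acid positions. The `locus_position` labels, the example pairs, and the function `stratify_by_bin` are hypothetical. The sketch only shows how a given bin assigns pairs to risk groups; the evolutionary search that FIBERS uses to discover bins and the Cox-model evaluation are not reproduced here.

```python
from typing import Dict, FrozenSet


def stratify_by_bin(
    pair_mismatches: Dict[str, FrozenSet[str]],
    bin_positions: FrozenSet[str],
) -> Dict[str, str]:
    """Label each donor-recipient pair 'low' or 'high' risk for one bin.

    A pair is 'low' risk when it has zero amino acid mismatches at the
    positions in the bin, and 'high' risk when it has one or more.
    """
    groups: Dict[str, str] = {}
    for pair_id, mismatches in pair_mismatches.items():
        # Count only the mismatched positions that belong to this bin.
        n_in_bin = len(mismatches & bin_positions)
        groups[pair_id] = "low" if n_in_bin == 0 else "high"
    return groups


# Hypothetical position labels ("locus_position") and pairs, for illustration only.
pairs = {
    "pair1": frozenset({"DRB1_11", "A_62"}),
    "pair2": frozenset({"A_62"}),
    "pair3": frozenset(),
}
candidate_bin = frozenset({"DRB1_11", "DRB1_13"})
print(stratify_by_bin(pairs, candidate_bin))
# {'pair1': 'high', 'pair2': 'low', 'pair3': 'low'}
```

Here only `pair1` carries a mismatch inside the candidate bin, so it is the only pair placed in the high-risk group; a mismatch outside the bin (`A_62`) does not count.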
Affiliation(s)
- Satvik Dasariraju
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, United States; The Lawrenceville School, Lawrenceville, NJ, United States
- Loren Gragert
  - Department of Pathology and Laboratory Medicine, Tulane University School of Medicine, New Orleans, LA, United States
- Grace L Wager
  - Department of Pathology and Laboratory Medicine, Tulane University School of Medicine, New Orleans, LA, United States
- Keith McCullough
  - Arbor Research Collaborative for Health, Ann Arbor, MI, United States
- Nicholas K Brown
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Malek Kamoun
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Ryan J Urbanowicz
  - Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, CA, United States
4
Hui D, Mehrabi S, Quimby AE, Chen T, Chen S, Park J, Li B, Ruckenstein MJ, Rader DJ, Ritchie MD, Brant JA, Epstein DJ, Mathieson I. Gene burden analysis identifies genes associated with increased risk and severity of adult-onset hearing loss in a diverse hospital-based cohort. PLoS Genet 2023; 19:e1010584. [PMID: 36656851 PMCID: PMC9888707 DOI: 10.1371/journal.pgen.1010584] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/01/2022] [Revised: 01/31/2023] [Accepted: 12/20/2022] [Indexed: 01/20/2023]
Abstract
Loss or absence of hearing is common at both extremes of the human lifespan, in the forms of congenital deafness and age-related hearing loss. While these are often studied separately, there is increasing evidence that their genetic bases at least partially overlap. In particular, both common and rare variants in genes associated with monogenic forms of hearing loss also contribute to the more polygenic basis of age-related hearing loss. Here, we directly test this model in the Penn Medicine BioBank, a healthcare system cohort of around 40,000 individuals with linked genetic and electronic health record data. We show that an increased burden of predicted deleterious variants in Mendelian hearing loss genes is associated with increased risk and severity of adult-onset hearing loss. As a specific example, we identify one gene, TCOF1 (responsible for a syndromic form of congenital hearing loss), in which deleterious variants are also associated with adult-onset hearing loss. We also identify four additional novel candidate genes (COL5A1, HMMR, RAPGEF3, and NNT) in which rare variant burden may be associated with hearing loss. Our results confirm that rare variants in Mendelian hearing loss genes contribute to polygenic risk of hearing loss, and emphasize the utility of healthcare system cohorts for studying common complex traits and diseases.
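A gene burden score of the kind described above can be sketched as a per-individual count of predicted deleterious variants within a chosen gene set. This minimal Python example assumes variant calls have already been annotated as predicted deleterious or not; the input format, the placeholder `GENE_X`, and the function `burden_scores` are hypothetical, and the study's actual burden definition and association model are not shown.

```python
from typing import Dict, List, Set, Tuple


def burden_scores(
    calls: Dict[str, List[Tuple[str, bool]]],
    gene_set: Set[str],
) -> Dict[str, int]:
    """Count predicted deleterious variants per individual within a gene set.

    `calls` maps an individual ID to (gene, is_predicted_deleterious) pairs.
    """
    return {
        individual: sum(
            1 for gene, deleterious in variants if deleterious and gene in gene_set
        )
        for individual, variants in calls.items()
    }


# Illustrative data only; "GENE_X" stands for any gene outside the set.
calls = {
    "ind1": [("TCOF1", True), ("GENE_X", True)],
    "ind2": [("TCOF1", False)],
    "ind3": [("COL5A1", True), ("TCOF1", True)],
}
scores = burden_scores(calls, {"TCOF1", "COL5A1"})
print(scores)  # {'ind1': 1, 'ind2': 0, 'ind3': 2}
```

In an association analysis, such scores would then be related to the phenotype (for example, hearing loss status or severity) in a model that also adjusts for covariates; that step is omitted here.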
Affiliation(s)
- Daniel Hui
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Shadi Mehrabi
  - Department of Otolaryngology, University of Michigan, Ann Arbor, Michigan, United States of America
- Alexandra E. Quimby
  - Department of Otolaryngology–Head and Neck Surgery, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Tingfang Chen
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Sixing Chen
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Joseph Park
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Binglan Li
  - Department of Biomedical Data Science, Stanford University, Stanford, California, United States of America
- Penn Medicine Biobank
  - Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Michael J. Ruckenstein
  - Department of Otolaryngology–Head and Neck Surgery, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Daniel J. Rader
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Marylyn D. Ritchie
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Jason A. Brant
  - Department of Otolaryngology–Head and Neck Surgery, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Department of Otolaryngology–Head and Neck Surgery, Corporal Michael J. Crescenz VAMC, Philadelphia, Pennsylvania, United States of America
- Douglas J. Epstein
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Corresponding author
- Iain Mathieson
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Corresponding author
5
Zhang C, Verma A, Feng Y, Melo MCR, McQuillan M, Hansen M, Lucas A, Park J, Ranciaro A, Thompson S, Rubel MA, Campbell MC, Beggs W, Hirbo J, Wata Mpoloka S, George Mokone G, Nyambo T, Wolde Meskel D, Belay G, Fokunang C, Njamnshi AK, Omar SA, Williams SM, Rader DJ, Ritchie MD, de la Fuente-Nunez C, Sirugo G, Tishkoff SA. Impact of natural selection on global patterns of genetic variation and association with clinical phenotypes at genes involved in SARS-CoV-2 infection. Proc Natl Acad Sci U S A 2022; 119:e2123000119. [PMID: 35580180 PMCID: PMC9173769 DOI: 10.1073/pnas.2123000119] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/22/2021] [Accepted: 03/29/2022] [Indexed: 01/09/2023]
Abstract
Human genomic diversity has been shaped by both ancient and ongoing challenges from viruses. The current coronavirus disease 2019 (COVID-19) pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has had a devastating impact on population health. However, genetic diversity and evolutionary forces impacting host genes related to SARS-CoV-2 infection are not well understood. We investigated global patterns of genetic variation and signatures of natural selection at host genes relevant to SARS-CoV-2 infection (angiotensin-converting enzyme 2 [ACE2], transmembrane protease serine 2 [TMPRSS2], dipeptidyl peptidase 4 [DPP4], and lymphocyte antigen 6 complex locus E [LY6E]). We analyzed data from 2,012 ethnically diverse Africans and 15,977 individuals of European and African ancestry with electronic health records and integrated these with global data from the 1000 Genomes Project. At ACE2, we identified 41 nonsynonymous variants that were rare in most populations, several of which impact protein function. However, three nonsynonymous variants (rs138390800, rs147311723, and rs145437639) were common among central African hunter-gatherers from Cameroon (minor allele frequency 0.083 to 0.164) and are on haplotypes that exhibit signatures of positive selection. We identify signatures of selection impacting variation at regulatory regions influencing ACE2 expression in multiple African populations. At TMPRSS2, we identified 13 amino acid changes that are adaptive and specific to the human lineage compared with the chimpanzee genome. Genetic variants that are targets of natural selection are associated with clinical phenotypes common in patients with COVID-19. Our study provides insights into global variation at host genes related to SARS-CoV-2 infection, which have been shaped by natural selection in some populations, possibly due to prior viral infections.
Affiliation(s)
- Chao Zhang
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Anurag Verma
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
  - Division of Translational Medicine and Human Genetics, Department of Medicine, University of Pennsylvania School of Medicine, Philadelphia, PA 19104
- Yuanqing Feng
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Marcelo C. R. Melo
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA 19104
- Michael McQuillan
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Matthew Hansen
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Anastasia Lucas
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Joseph Park
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Alessia Ranciaro
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Simon Thompson
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Meagan A. Rubel
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Michael C. Campbell
  - Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089
- William Beggs
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Jibril Hirbo
  - Department of Medicine, Vanderbilt University, Nashville, TN 37232
- Thomas Nyambo
  - Department of Biochemistry, Kampala International University in Tanzania, Dar es Salaam, Tanzania
- Dawit Wolde Meskel
  - Department of Microbial Cellular and Molecular Biology, Addis Ababa University, Addis Ababa, Ethiopia
- Gurja Belay
  - Department of Microbial Cellular and Molecular Biology, Addis Ababa University, Addis Ababa, Ethiopia
- Charles Fokunang
  - Department of Pharmacotoxicology and Pharmacokinetics, Faculty of Medicine and Biomedical Sciences, The University of Yaoundé I, Yaoundé, Cameroon
- Alfred K. Njamnshi
  - Department of Neurology, Central Hospital Yaoundé, Yaoundé, Cameroon
  - Brain Research Africa Initiative, Neuroscience Laboratory, Faculty of Medicine and Biomedical Sciences, The University of Yaoundé I, Yaoundé, Cameroon
- Sabah A. Omar
  - Center for Biotechnology Research and Development, Kenya Medical Research Institute, Nairobi, Kenya
- Scott M. Williams
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH 44106
- Daniel J. Rader
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Marylyn D. Ritchie
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Cesar de la Fuente-Nunez
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA 19104
- Giorgio Sirugo
  - Division of Translational Medicine and Human Genetics, Department of Medicine, University of Pennsylvania School of Medicine, Philadelphia, PA 19104
- Sarah A. Tishkoff
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
  - Department of Biology, University of Pennsylvania, Philadelphia, PA 19104
  - Center for Global Genomics and Health Equity, University of Pennsylvania, Philadelphia, PA 19104
6
Zhang C, Verma A, Feng Y, Dos Reis Melo MC, McQuillan M, Hansen M, Lucas A, Park J, Ranciaro A, Thompson S, Rubel M, Campbell M, Beggs W, Hirbo J, Mpoloka SW, Mokone GG, Jones M, Nyambo T, Meskel DW, Belay G, Fokunang C, Njamnshi A, Omar S, Williams S, Rader D, Ritchie M, de la Fuente C, Sirugo G, Tishkoff S. Impact of natural selection on global patterns of genetic variation, and association with clinical phenotypes, at genes involved in SARS-CoV-2 infection. Research Square 2021:rs.3.rs-673011. [PMID: 34341784 PMCID: PMC8328070 DOI: 10.21203/rs.3.rs-673011/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023]
Abstract
We investigated global patterns of genetic variation and signatures of natural selection at host genes relevant to SARS-CoV-2 infection (ACE2, TMPRSS2, DPP4, and LY6E). We analyzed novel data from 2,012 ethnically diverse Africans and 15,997 individuals of European and African ancestry with electronic health records, and integrated these with global data from the 1000 Genomes Project (1000GP). At ACE2, we identified 41 non-synonymous variants that were rare in most populations, several of which impact protein function. However, three non-synonymous variants were common among Central African hunter-gatherers from Cameroon and are on haplotypes that exhibit signatures of positive selection. We identify strong signatures of selection impacting variation at regulatory regions influencing ACE2 expression in multiple African populations. At TMPRSS2, we identified 13 amino acid changes that are adaptive and specific to the human lineage. Genetic variants that are targets of natural selection are associated with clinical phenotypes common in patients with COVID-19.
Affiliation(s)
- Anurag Verma
  - Perelman School of Medicine, University of Pennsylvania
- Joseph Park
  - Perelman School of Medicine, University of Pennsylvania
- Dawit Wolde Meskel
  - Addis Ababa University Department of Microbial Cellular and Molecular Biology
- Guija Belay
  - Addis Ababa University Department of Microbial Cellular and Molecular Biology
- Charles Fokunang
  - Department of Pharmacotoxicology and Pharmacokinetics, Faculty of Medicine and Biomedical Sciences, The University of Yaoundé I, Yaoundé, Cameroon
- Daniel Rader
  - Perelman School of Medicine at the University of Pennsylvania
7
Shivakumar M, Miller JE, Dasari VR, Zhang Y, Lee MTM, Carey DJ, Gogoi R, Kim D. Genetic Analysis of Functional Rare Germline Variants across Nine Cancer Types from an Electronic Health Record Linked Biobank. Cancer Epidemiol Biomarkers Prev 2021; 30:1681-1688. [PMID: 34244158 DOI: 10.1158/1055-9965.epi-21-0082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2021] [Revised: 02/15/2021] [Accepted: 06/17/2021] [Indexed: 11/16/2022]
Abstract
BACKGROUND Rare variants play an essential role in the etiology of cancer. In this study, we aim to characterize rare germline variants that impact the risk of cancer. METHODS We performed a genome-wide rare variant analysis using germline whole exome sequencing (WES) data derived from the Geisinger MyCode initiative to discover cancer predisposition variants. The case-control association analysis was conducted by binning variants in 5,538 patients with cancer and 7,286 matched controls in a discovery set and 1,991 patients with cancer and 2,504 matched controls in a validation set across nine cancer types. Further, The Cancer Genome Atlas (TCGA) germline data were used to replicate the findings. RESULTS We identified 133 significant pathway-cancer pairs (85 replicated) and 90 significant gene-cancer pairs (12 replicated). In addition, we identified 18 genes and 3 pathways that were associated with survival outcome across cancers (Bonferroni P < 0.05). CONCLUSIONS In this study, we identified potential predisposition genes and pathways based on rare variants in nine cancers. IMPACT This work adds to the knowledge base and progress being made in precision medicine.
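Binning variants for a case-control analysis like the one above can be sketched as collapsing rare variants into gene-level carrier sets. This minimal Python example assumes a 1% minor allele frequency cutoff and a simple (variant ID, gene, MAF) record format; both are hypothetical choices rather than the study's settings, as are the placeholder gene names and the function `bin_rare_variants`. The statistical comparison of carriers between cases and controls is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical variant record: (variant_id, gene, minor_allele_frequency).
Variant = Tuple[str, str, float]


def bin_rare_variants(
    variants: List[Variant],
    carriers: Dict[str, Set[str]],
    maf_threshold: float = 0.01,
) -> Dict[str, Set[str]]:
    """Collapse rare variants into gene bins.

    Returns, for each gene with at least one rare variant, the set of
    individuals carrying one or more of those rare variants.
    """
    gene_bins: Dict[str, Set[str]] = defaultdict(set)
    for variant_id, gene, maf in variants:
        if maf < maf_threshold:  # keep only rare variants
            gene_bins[gene] |= carriers.get(variant_id, set())
    return dict(gene_bins)


# Illustrative data: v2 is common (MAF 0.20) and is therefore excluded.
variants = [("v1", "GENE_A", 0.001), ("v2", "GENE_A", 0.20), ("v3", "GENE_B", 0.004)]
carriers = {"v1": {"s1"}, "v2": {"s2", "s3"}, "v3": {"s2"}}
print(bin_rare_variants(variants, carriers))
# {'GENE_A': {'s1'}, 'GENE_B': {'s2'}}
```

Counting the members of each gene's carrier set separately among cases and controls would give the 2×2 counts a simple collapsing test could use; pathway-level bins would follow the same pattern with genes grouped into pathways.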
Affiliation(s)
- Manu Shivakumar
  - Biomedical & Translational Informatics Institute, Geisinger, Danville, Pennsylvania
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Jason E Miller
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, Pennsylvania
- Yanfei Zhang
  - Genomic Medicine Institute, Geisinger, Danville, Pennsylvania
- David J Carey
  - Department of Molecular and Functional Genomics, Geisinger, Danville, Pennsylvania
- Radhika Gogoi
  - Weis Center for Research, Geisinger Clinic, Danville, Pennsylvania
8
Fore R, Boehme J, Li K, Westra J, Tintle N. Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants. Front Genet 2020; 11:591606. [PMID: 33240333 PMCID: PMC7680887 DOI: 10.3389/fgene.2020.591606] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/05/2020] [Accepted: 10/05/2020] [Indexed: 12/22/2022]
Abstract
Gene-based tests of association (e.g., variance components and burden tests) are now common practice in analyses that seek to elucidate the contribution of rare genetic variants to common disease. As sequencing datasets continue to grow in size, the number of variants within each set (e.g., gene) being tested is also growing. Pathway-based methods have been used to first aggregate statistical evidence within genes and then aggregate that evidence across the pathway. This “multi-set” approach (first gene-based tests, then a pathway-based test) has not been thoroughly explored for evaluating genotype–phenotype associations in the age of large sequencing datasets. In particular, we ask whether there are statistical and biological characteristics that make the multi-set approach preferable to simply performing all gene-based tests. In this paper, we provide an intuitive framework for evaluating these questions and use simulated data to confirm this intuition. A real data application demonstrates how our insights manifest in practice. Ultimately, we find that when initial subsets are biologically informative (e.g., tending to aggregate causal genetic variants within one or more subsets, often genes), multi-set strategies can improve statistical power, with particular gains when causal variants are concentrated in subsets with fewer variants overall (a high proportion of causal variants in the subset). However, we find little advantage when the sets are non-informative (a similar proportion of causal variants across subsets). Our application to real data further supports this intuition. In practice, we recommend wider use of pathway-based methods and further exploration of optimal ways of aggregating variants into subsets based on emerging biological evidence about the genetic architecture of complex disease.
Affiliation(s)
- Ruby Fore
  - Department of Biostatistics, Brown University, Providence, RI, United States
- Jaden Boehme
  - Department of Mathematics, Oregon State University, Corvallis, OR, United States
- Kevin Li
  - Department of Mathematics, School of Arts and Sciences, Columbia University, New York, NY, United States
- Jason Westra
  - Department of Mathematics and Statistics, Dordt University, Sioux Center, IA, United States
- Nathan Tintle
  - Department of Mathematics and Statistics, Dordt University, Sioux Center, IA, United States
9
Zolotovskaia M, Sorokin M, Garazha A, Borisov N, Buzdin A. Molecular Pathway Analysis of Mutation Data for Biomarkers Discovery and Scoring of Target Cancer Drugs. Methods Mol Biol 2020; 2063:207-234. [PMID: 31667773 DOI: 10.1007/978-1-0716-0138-9_16] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 06/10/2023]
Abstract
DNA mutations govern cancer development. Cancer mutation profiles vary dramatically among individuals. In some cases, they may serve as predictors of disease progression and response to therapies. However, the biomarker potential of cancer mutations can be dramatically (by several orders of magnitude) enhanced by applying a molecular pathway-based approach. We developed the Oncobox system for calculating pathway instability (PI) values, which are the aggregated mutation frequencies of a pathway's member genes, normalized by gene length and by the number of genes in the pathway. PI scores can be effective biomarkers in different types of comparisons, for example, as cancer type biomarkers and as predictors of tumor response to targeted therapies. The latter option is implemented using mutation drug score (MDS) values, which algorithmically rank drugs by their capacity to interfere with the mutated molecular pathways. Here, we describe the mathematical basis and algorithms for calculating, validating, and implementing PI and MDS values. An example analysis is provided encompassing 5,956 human tumor mutation profiles of 15 cancer types from The Cancer Genome Atlas (TCGA) project, comprising a total of 2,316,670 mutations in 19,872 genes and 1,748 molecular pathways, thus enabling the ranking of 128 clinically approved targeted drugs. Our results show that the Oncobox PI and MDS approaches are highly useful for basic and applied molecular oncology and pharmacology research.
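One simplified reading of the PI definition above (aggregated mutation frequencies of pathway member genes, normalized by gene length and by the number of genes in the pathway) can be written as the Python sketch below. It is an assumption-laden illustration, not the Oncobox formula: the exact frequency definition, the length units, and any pathway-topology weighting in the published method are not given in this abstract and are not modeled here. The function `pathway_instability` and the toy genes are hypothetical.

```python
from typing import Dict, List


def pathway_instability(
    mutation_counts: Dict[str, int],
    gene_lengths: Dict[str, int],
    pathway_genes: List[str],
    n_samples: int,
) -> float:
    """Simplified PI: mean length-normalized mutation frequency over a pathway.

    mutation_counts: total mutations observed per gene across n_samples.
    gene_lengths: gene length (assumed here to be in base pairs).
    """
    total = 0.0
    for gene in pathway_genes:
        frequency = mutation_counts.get(gene, 0) / n_samples  # mutations per sample
        total += frequency / gene_lengths[gene]  # normalize by gene length
    return total / len(pathway_genes)  # normalize by number of genes in the pathway


# Toy pathway with two hypothetical genes.
counts = {"GENE_A": 10, "GENE_B": 10}
lengths = {"GENE_A": 1000, "GENE_B": 2000}
pi = pathway_instability(counts, lengths, ["GENE_A", "GENE_B"], n_samples=5)
print(f"{pi:.4f}")  # 0.0015
```

Normalizing by gene length keeps long genes, which accumulate mutations by chance, from dominating the score, and dividing by the number of genes makes pathways of different sizes comparable.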
Affiliation(s)
- Marianna Zolotovskaia
  - Omicsway Corp., Walnut, CA, USA
  - Department of Oncology, Hematology and Radiotherapy of Pediatric Faculty, Pirogov Russian National Research Medical University, Moscow, Russia
- Maxim Sorokin
  - Omicsway Corp., Walnut, CA, USA
  - Laboratory of Clinical Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Nikolay Borisov
  - Omicsway Corp., Walnut, CA, USA
  - Laboratory of Clinical Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Anton Buzdin
  - Omicsway Corp., Walnut, CA, USA
  - Laboratory of Clinical Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
10
An exome-wide rare variant analysis of Korean men identifies three novel genes predisposing to prostate cancer. Sci Rep 2019; 9:17173. [PMID: 31748686 PMCID: PMC6868235 DOI: 10.1038/s41598-019-53445-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 11/15/2018] [Accepted: 10/25/2019] [Indexed: 01/26/2023]
Abstract
Because prostate cancer is highly heritable, common variants associated with prostate cancer have been studied in various populations, including those in Korea. However, rare and low-frequency variants also have a significant influence on the heritability of the disease. The contributions of rare variants to prostate cancer susceptibility have not yet been systematically evaluated in a Korean population. In this work, we present a large-scale exome-wide rare variant analysis of 7,258 individuals (985 cases with prostate cancer and 6,273 controls). In total, 19 rare variant loci spanning 7 genes were associated with prostate cancer susceptibility. In addition to replicating previously known susceptibility genes (e.g., CDYL2, MST1R, GPER1, and PARD3B), we identified 3 novel genes (FDR q < 0.05): the non-coding RNAs ENTPD3-AS1 and LOC102724438 and the protein-coding gene SPATA3. Additionally, 6 pathways were identified based on the identified variants and genes, including the estrogen signaling pathway, signaling by MST1, IL-15 production, the MSP-RON signaling pathway, and IL-12 signaling and production in macrophages, which are known to be associated with prostate cancer. In summary, we report novel genes and rare variants that potentially play a role in prostate cancer susceptibility in the Korean population. These observations demonstrate a path toward one of the fundamental goals of precision medicine: identifying biomarkers for a subset of the population with a greater risk of disease than others.
11
Shivakumar M, Miller JE, Dasari VR, Gogoi R, Kim D. Exome-Wide Rare Variant Analysis From the DiscovEHR Study Identifies Novel Candidate Predisposition Genes for Endometrial Cancer. Front Oncol 2019; 9:574. [PMID: 31338326 PMCID: PMC6626914 DOI: 10.3389/fonc.2019.00574] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 02/27/2019] [Accepted: 06/13/2019] [Indexed: 12/19/2022]
Abstract
Endometrial cancer is the fourth most commonly diagnosed cancer in women. Family history is a known risk factor for endometrial cancer: endometrial cancer in a first-degree relative raises the relative risk to between 1.3 and 2.8. It is unclear to what extent novel germline variants are at play in endometrial cancer, or which ones. We aim to address this question by using whole exome sequencing to identify novel, rare variant associations between exonic regions and endometrial cancer. The MyCode community health initiative is an excellent resource for this study, with germline whole exome data for 60,000 patients available in the first phase and a further 30,000 patients independently sequenced in the second phase as part of the DiscovEHR study. We conducted an exome-wide rare variant association analysis using 472 cases and 4,110 controls from the 60,000 patients (discovery cohort) and 261 cases and 1,531 controls from the 30,000 patients (replication cohort). After binning rare germline variants into genes, case-control association tests were performed using the optimal unified approach for rare-variant association (SKAT-O). Seven genes, including RBM12, NDUFB6, ATP6V1A, RECK, SLC35E1, RFX3 (Bonferroni-corrected P < 0.05) and ATP8A1 (suggestive P < 10^-5), and one long non-coding RNA, DLGAP4-AS1 (Bonferroni-corrected P < 0.05), were associated with endometrial cancer. Notably, RECK and ATP8A1 were replicated in the replication cohort (suggestive threshold P < 0.05). Additionally, a pathway-based rare variant analysis using pathogenic and likely pathogenic variants identified two significant pathways, pyrimidine metabolism and protein processing in the endoplasmic reticulum (Bonferroni-corrected P < 0.05).
In conclusion, our results using single-source electronic health records (EHR) linked to genomic data highlight candidate genes and pathways associated with endometrial cancer and indicate that rare variants are involved in endometrial cancer predisposition, which could help in personalized prognosis and further our understanding of its genetic etiology.
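The study tests gene bins with SKAT-O, which combines burden and variance-component tests. As a much simpler stand-in, the sketch below shows the "bin rare variants into a gene, then compare cases with controls" step. It uses a carrier-collapsing burden test scored with Fisher's exact test. It is not SKAT-O. The 0.01 MAF cutoff, the function names and the toy genotypes are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum every table that is no more likely than the observed one
    # (the small relative tolerance treats floating-point near-ties as ties)
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def carrier_status(genotypes, mafs, maf_threshold=0.01):
    """Collapse one gene: 1 if an individual carries any variant below the MAF cutoff."""
    rare = [j for j, f in enumerate(mafs) if f < maf_threshold]
    return [int(any(row[j] > 0 for j in rare)) for row in genotypes]


def collapsing_test(genotypes, is_case, mafs, maf_threshold=0.01):
    """Return the (case carriers, case non-carriers, control carriers,
    control non-carriers) table and its Fisher p-value for one gene."""
    carriers = carrier_status(genotypes, mafs, maf_threshold)
    case_carriers = sum(1 for g, y in zip(carriers, is_case) if g and y)
    ctrl_carriers = sum(1 for g, y in zip(carriers, is_case) if g and not y)
    n_cases = sum(1 for y in is_case if y)
    n_ctrls = len(is_case) - n_cases
    table = (case_carriers, n_cases - case_carriers,
             ctrl_carriers, n_ctrls - ctrl_carriers)
    return table, fisher_exact_two_sided(*table)


if __name__ == "__main__":
    # Toy data: 4 cases then 4 controls; columns are one rare and one common variant
    genotypes = [[1, 0], [1, 1], [2, 0], [0, 1],
                 [1, 0], [0, 1], [0, 1], [0, 0]]
    is_case = [True] * 4 + [False] * 4
    mafs = [0.005, 0.20]  # the second variant is too common and is ignored
    print(collapsing_test(genotypes, is_case, mafs))
```

A collapsing test like this gains power when most rare variants in a gene push risk in the same direction. SKAT-type tests are designed for the case where they do not, and SKAT-O interpolates between the two.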
Affiliation(s)
- Manu Shivakumar
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, United States
- Jason E Miller
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, United States
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Radhika Gogoi
- Weis Center for Research, Geisinger Clinic, Danville, PA, United States
- Dokyoon Kim
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, United States
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, United States
12
Zhang X, Basile AO, Pendergrass SA, Ritchie MD. Real world scenarios in rare variant association analysis: the impact of imbalance and sample size on the power in silico. BMC Bioinformatics 2019; 20:46. [PMID: 30669967 PMCID: PMC6343276 DOI: 10.1186/s12859-018-2591-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/29/2018] [Accepted: 12/26/2018] [Indexed: 11/11/2022]
Abstract
Background The development of sequencing techniques and statistical methods provides great opportunities for identifying the impact of rare genetic variation on complex traits. However, little is known about how sample size, case numbers, and the balance of cases vs. controls affect both burden- and dispersion-based rare variant association methods. For example, Phenome-Wide Association Studies may have a wide range of case and control sample sizes across hundreds of diagnoses and traits, and when statistical methods are applied to rare variants, it is important to understand the strengths and limitations of the analyses. Results We conducted a large-scale simulation of randomly selected low-frequency protein-coding regions using twelve different balanced samples with equal numbers of cases and controls as well as twenty-one unbalanced sample scenarios. We further explored the statistical performance of different minor allele frequency thresholds and a range of genetic effect sizes. Our simulation results demonstrate that an unbalanced study design has an overall higher type I error rate for both burden and dispersion tests than a balanced study design. Regression has an overall higher type I error with balanced cases and controls, while SKAT has a higher type I error in unbalanced case-control scenarios. We also found that both type I error and power were driven by the number of cases in addition to the case-to-control ratio under large control group scenarios. Based on our power simulations, we observed that a SKAT analysis with more than 200 cases in unbalanced case-control models yielded over 90% power with relatively well-controlled type I error. To achieve similar power with regression, over 500 cases are needed. Moreover, SKAT showed higher power than regression to detect associations in unbalanced case-control scenarios.
Conclusions Our results provide important insights into rare variant association study designs by providing a landscape of type I error and statistical power for a wide range of sample sizes. These results can serve as a benchmark for making decisions about study design for rare variant analyses. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2591-6) contains supplementary material, which is available to authorized users.
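The abstract estimates type I error for burden regression and SKAT by simulation. The sketch below implements neither test. It uses a pooled two-proportion z-test on carrier counts purely to show how an empirical type I error rate is measured: simulate many datasets with no true association, then count how often the test rejects at alpha. The sample sizes, carrier frequency, replicate count and function names are hypothetical.

```python
import random
from math import erf, sqrt


def two_proportion_pvalue(x1, n1, x2, n2):
    """Two-sided pooled z-test comparing carrier proportions x1/n1 and x2/n2."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # no carriers (or all carriers) in both groups: no evidence either way
    z = (x1 / n1 - x2 / n2) / se
    return 1 - erf(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))


def empirical_type1_error(n_cases, n_controls, carrier_freq,
                          alpha=0.05, reps=1000, seed=1):
    """Simulate under the null (same carrier frequency in cases and controls)
    and return the fraction of replicates with p < alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x1 = sum(rng.random() < carrier_freq for _ in range(n_cases))
        x2 = sum(rng.random() < carrier_freq for _ in range(n_controls))
        if two_proportion_pvalue(x1, n_cases, x2, n_controls) < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    # Hypothetical designs: balanced vs. strongly unbalanced, same carrier frequency
    print("balanced  ", empirical_type1_error(500, 500, carrier_freq=0.05))
    print("unbalanced", empirical_type1_error(50, 2000, carrier_freq=0.05))
```

Running two designs side by side gives a quick check of whether a test stays near its nominal alpha as the case:control ratio changes. No particular numeric result is implied here. Power is estimated the same way, except that data are simulated with a nonzero effect.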
Affiliation(s)
- Xinyuan Zhang
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Anna O Basile
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Sarah A Pendergrass
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, USA
- Marylyn D Ritchie
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
13
Zolotovskaia MA, Sorokin MI, Roumiantsev SA, Borisov NM, Buzdin AA. Pathway Instability Is an Effective New Mutation-Based Type of Cancer Biomarkers. Front Oncol 2019; 8:658. [PMID: 30662873 PMCID: PMC6328788 DOI: 10.3389/fonc.2018.00658] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 10/19/2018] [Accepted: 12/12/2018] [Indexed: 01/20/2023]
Abstract
DNA mutations play a crucial role in cancer development and progression. Mutation profiles vary dramatically among cancer types and between individual tumors. Mutations in several individual genes are known to be reliable cancer biomarkers, although the number of such genes is small and does not enable differential diagnosis for most cancers. We report a technique that dramatically increases the efficiency of developing cancer biomarkers from DNA mutation data. It includes a quantitative metric, termed Pathway instability (PI), based on the enrichment of mutations in intracellular molecular pathways. This method was tested on 5,956 tumor mutation profiles of 15 cancer types from The Cancer Genome Atlas (TCGA) project. In total, we screened 2,316,670 mutations in 19,872 genes and 1,748 molecular pathways. Our results demonstrated a considerable advantage of pathway-based mutation biomarkers over individual gene mutation profiles: pathway-based markers yielded more than two orders of magnitude more high-quality biomarkers [ROC area under the curve (AUC) > 0.75]. For example, the number of such high-quality mutational biomarkers distinguishing between different cancer types was only six for individual gene mutations but 660 for pathway-based biomarkers. These results show that PI values can be used as a new generation of complex cancer biomarkers that significantly outperform existing gene mutation biomarkers.
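The abstract screens biomarkers with an ROC AUC > 0.75 cutoff but does not give the PI formula. The sketch below therefore covers only the AUC screening step. AUC is computed as the Mann-Whitney probability that a randomly chosen sample from one class scores higher than one from the other, with ties counting one half. The marker names, scores and function names are hypothetical, not TCGA data.

```python
def roc_auc(class_a_scores, class_b_scores):
    """ROC AUC as the probability that a random class-A sample outscores a
    random class-B sample, counting ties as one half (Mann-Whitney form)."""
    wins = 0.0
    for s_a in class_a_scores:
        for s_b in class_b_scores:
            if s_a > s_b:
                wins += 1.0
            elif s_a == s_b:
                wins += 0.5
    return wins / (len(class_a_scores) * len(class_b_scores))


def high_quality_markers(markers, threshold=0.75):
    """Keep markers whose AUC exceeds the threshold.
    markers: dict name -> (scores in class A, scores in class B)."""
    aucs = {name: roc_auc(a, b) for name, (a, b) in markers.items()}
    return {name: auc for name, auc in aucs.items() if auc > threshold}


if __name__ == "__main__":
    # Hypothetical per-tumor scores for two cancer types
    markers = {
        "pathway_X": ([0.9, 0.8, 0.7, 0.6], [0.2, 0.3, 0.65, 0.1]),
        "gene_Y": ([1, 0, 0, 1], [0, 1, 1, 0]),
    }
    print(high_quality_markers(markers))  # {'pathway_X': 0.9375}
```

The pairwise loop is O(n·m). That is fine for a sketch, but a rank-based computation scales better to thousands of tumors.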
Affiliation(s)
- Marianna A Zolotovskaia
- Department of Oncology, Hematology and Radiotherapy of Pediatric Faculty, Pirogov Russian National Research Medical University, Moscow, Russia
- Oncobox Ltd., Moscow, Russia
- Maxim I Sorokin
- The Laboratory of Clinical Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Omicsway Corp., Walnut, CA, United States
- Sergey A Roumiantsev
- Department of Oncology, Hematology and Radiotherapy of Pediatric Faculty, Pirogov Russian National Research Medical University, Moscow, Russia
- Nikolay M Borisov
- Oncobox Ltd., Moscow, Russia
- The Laboratory of Clinical Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Anton A Buzdin
- The Laboratory of Clinical Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Omicsway Corp., Walnut, CA, United States
- The Laboratory of Systems Biology, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
14
Basile AO, Byrska-Bishop M, Wallace J, Frase AT, Ritchie MD. Novel features and enhancements in BioBin, a tool for the biologically inspired binning and association analysis of rare variants. Bioinformatics 2018; 34:527-529. [PMID: 28968757 PMCID: PMC5860358 DOI: 10.1093/bioinformatics/btx559] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 04/26/2017] [Accepted: 09/13/2017] [Indexed: 11/27/2022]
Abstract
Motivation BioBin is an automated bioinformatics tool for the multi-level biological binning of sequence variants. Herein, we present a significant update to BioBin that expands the software to facilitate a comprehensive rare variant analysis and incorporates novel features and analysis enhancements. Results In BioBin 2.3, we extend our software tool by implementing statistical association testing, updating the binning algorithm, and incorporating novel analysis features, providing a robust, highly customizable, and unified rare variant analysis tool. Availability and implementation The BioBin software package is open source and freely available to users at http://www.ritchielab.com/software/biobin-download Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anna O Basile
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Marta Byrska-Bishop
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17822 USA
- John Wallace
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17822 USA
- Alexander T Frase
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17822 USA
- Marylyn D Ritchie
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17822 USA
15
Miller JE, Shivakumar MK, Lee Y, Han S, Horgousluoglu E, Risacher SL, Saykin AJ, Nho K, Kim D. Rare variants in the splicing regulatory elements of EXOC3L4 are associated with brain glucose metabolism in Alzheimer's disease. BMC Med Genomics 2018; 11:76. [PMID: 30255815 PMCID: PMC6156983 DOI: 10.1186/s12920-018-0390-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 01/04/2023]
Abstract
BACKGROUND Alzheimer's disease (AD) is one of the most common neurodegenerative diseases and causes problems related to brain function. How AD arises is understood to some extent at the molecular level; however, there is a lack of biomarkers that can be used for early diagnosis. Two popular methods to identify AD-related biomarkers use genetics and neuroimaging. Genes and neuroimaging phenotypes have provided some insights into the potential for AD biomarkers. While the field of imaging genomics has identified genetic features associated with structural and functional neuroimaging phenotypes, it remains unclear how variants that affect splicing could be important for understanding the genetic etiology of AD. METHODS In this study, rare variants (minor allele frequency < 0.01) in splicing regulatory element (SRE) loci from whole genome sequencing (WGS) in the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort were used to identify genes associated with global brain cortical glucose metabolism in AD, measured by FDG PET scans. Gene-based association analyses of rare variants were performed using the program BioBin and the optimal Sequence Kernel Association Test (SKAT-O). RESULTS The gene EXOC3L4 was identified as significantly associated with global cortical glucose metabolism (false discovery rate (FDR) corrected p < 0.05) using SRE coding variants only. Three loci within EXOC3L4 that may affect splicing contribute to the association. CONCLUSION Based on sequence homology, EXOC3L4 is likely part of the exocyst complex. Our results suggest that variants that affect proper splicing of EXOC3L4 via SREs may impact vesicle transport, giving rise to AD-related phenotypes. Overall, by utilizing WGS and functional neuroimaging, we have identified a gene significantly associated with an AD-related endophenotype, potentially through a mechanism that involves splicing.
Affiliation(s)
- Jason E Miller
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- Present Address: Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Manu K Shivakumar
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- Younghee Lee
- Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT, 84106, USA
- Seonggyun Han
- Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT, 84106, USA
- Emrin Horgousluoglu
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Shannon L Risacher
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Andrew J Saykin
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Kwangsik Nho
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Dokyoon Kim
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
16
Verma SS, Josyula N, Verma A, Zhang X, Veturi Y, Dewey FE, Hartzel DN, Lavage DR, Leader J, Ritchie MD, Pendergrass SA. Rare variants in drug target genes contributing to complex diseases, phenome-wide. Sci Rep 2018; 8:4624. [PMID: 29545597 PMCID: PMC5854600 DOI: 10.1038/s41598-018-22834-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/22/2017] [Accepted: 03/01/2018] [Indexed: 12/30/2022]
Abstract
The DrugBank database contains ~800 genes that are well-characterized drug targets. This list of genes is a useful resource for association testing. For example, loss-of-function (LOF) genetic variation has the potential to mimic the effect of drugs, and high-impact variation in these genes can affect downstream traits. Identifying novel associations between genetic variation in these genes and a range of diseases can also uncover new uses for the drugs that target these genes. Phenome-Wide Association Studies (PheWAS) have been successful in identifying genetic associations across hundreds of thousands of diseases. We conducted a novel gene-based PheWAS to test the effect of rare variants in DrugBank genes, evaluating associations between these genes and more than 500 quantitative and dichotomous phenotypes. We used whole exome sequencing data from 38,568 samples in the Geisinger MyCode Community Health Initiative. We evaluated the results of this study when binning rare variants using various filters based on potential functional impact. We identified multiple novel associations, and the majority of the significant associations were driven by functionally annotated variation. Overall, this study provides a sweeping exploration of rare variant associations within functionally relevant genes across a wide range of diagnoses.
Affiliation(s)
- Shefali Setia Verma
- Perelman School of Medicine, Department of Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Navya Josyula
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17221, USA
- Anurag Verma
- Perelman School of Medicine, Department of Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Xinyuan Zhang
- Perelman School of Medicine, Department of Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Yogasudha Veturi
- Perelman School of Medicine, Department of Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Dustin N Hartzel
- Phenomic Analytics and Clinical Data Core, Geisinger, Danville, PA, USA
- Daniel R Lavage
- Phenomic Analytics and Clinical Data Core, Geisinger, Danville, PA, USA
- Joe Leader
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17221, USA
- Phenomic Analytics and Clinical Data Core, Geisinger, Danville, PA, USA
- Marylyn D Ritchie
- Perelman School of Medicine, Department of Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Sarah A Pendergrass
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17221, USA
17
Verma SS, Ritchie MD. Another Round of "Clue" to Uncover the Mystery of Complex Traits. Genes (Basel) 2018; 9:E61. [PMID: 29370075 PMCID: PMC5852557 DOI: 10.3390/genes9020061] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 11/01/2017] [Revised: 12/19/2017] [Accepted: 01/15/2018] [Indexed: 12/13/2022]
Abstract
A plethora of genetic association analyses have identified several genetic risk loci. Technological and statistical advancements have now led to the identification not only of common genetic variants but also of low-frequency variants, structural variants, environmental factors, and multi-omics variations that affect the phenotypic variance of complex traits in a population, collectively referred to as complex trait architecture. The concept of heritability, the proportion of phenotypic variance due to genetic inheritance, has been studied for several decades, but it has mainly been applied to the narrow-sense heritability (the additive genetic component) from Genome-Wide Association Studies (GWAS). In this commentary, we offer our perspective on the complexity of understanding heritability for human traits compared with model organisms, highlighting another round of clues beyond GWAS and an alternative approach that investigates these clues comprehensively to help elucidate the genetic architecture of complex traits.
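For reference, the textbook variance decomposition behind the broad- versus narrow-sense distinction mentioned above is written out below. It assumes no gene-environment covariance or interaction. The notation is the standard quantitative-genetics one, not notation defined in the commentary itself.

```latex
\begin{aligned}
V_P &= V_G + V_E, \qquad V_G = V_A + V_D + V_I \\
H^2 &= \frac{V_G}{V_P} \quad \text{(broad-sense heritability)} \\
h^2 &= \frac{V_A}{V_P} \quad \text{(narrow-sense heritability)}
\end{aligned}
```

Here V_A, V_D and V_I are the additive, dominance and epistatic (interaction) parts of the genetic variance V_G, and V_E is the environmental variance. Because V_A cannot exceed V_G, h^2 is never larger than H^2.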
Affiliation(s)
- Shefali Setia Verma
- The Huck Institute of Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Marylyn D Ritchie
- The Huck Institute of Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
18
Miller JE, Shivakumar MK, Risacher SL, Saykin AJ, Lee S, Nho K, Kim D. Codon bias among synonymous rare variants is associated with Alzheimer's disease imaging biomarker. Pacific Symposium on Biocomputing 2018; 23:365-376. [PMID: 29218897 PMCID: PMC5756629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Alzheimer's disease (AD) is a neurodegenerative disorder with few biomarkers, even though it affects a relatively large portion of the population and is predicted to affect significantly more individuals in the future. Neuroimaging has been used in concert with genetic information to improve our understanding of how AD arises and how it can potentially be diagnosed. Additionally, evidence suggests that synonymous variants can have a functional impact on gene regulatory mechanisms, including those related to AD. Some synonymous codons are preferred over others, leading to codon bias. The bias can arise with respect to codons that are more or less frequently used in the genome. A bias can also result from optimal and non-optimal codons, which have stronger and weaker codon-anticodon interactions, respectively. Although association tests have been used before to identify genes associated with AD, it remains unclear how codon bias plays a role and whether it can improve rare variant analysis. In this work, rare variants from whole-genome sequencing of the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort were binned into genes using BioBin. An association analysis of the genes with an AD-related neuroimaging biomarker was performed using SKAT-O. Using all synonymous variants, we did not identify any genome-wide significant associations; using only synonymous variants that affected codon frequency, however, we identified several genes as significantly associated with the imaging phenotype. Additionally, significant associations were found using only rare variants that contain an optimal codon in the minor allele and a non-optimal codon in the major allele. These results suggest that codon bias may play a role in AD and that it can be used to improve detection power in rare variant association analysis.
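The last variant filter described above can be sketched as code: keep rare synonymous variants whose minor allele yields an optimal codon while the major allele yields a non-optimal one. The genetic-code table below is the real standard code. The optimal-codon set is a hypothetical placeholder, because the abstract does not list which codons the study treated as optimal. Variants passing the filter would then be binned into genes before testing.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order; "*" = stop
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# HYPOTHETICAL placeholder: the study classifies codons as optimal or non-optimal
# by codon-anticodon interaction strength; this tiny set is for illustration only.
OPTIMAL_CODONS = {"CTG", "GCC", "GGC"}


def is_synonymous(major_codon, minor_codon):
    """True if the two codons differ but encode the same amino acid."""
    return (major_codon != minor_codon
            and CODON_TABLE[major_codon] == CODON_TABLE[minor_codon])


def non_optimal_to_optimal(major_codon, minor_codon, optimal=OPTIMAL_CODONS):
    """Keep synonymous variants whose minor allele gives an optimal codon
    and whose major allele gives a non-optimal one."""
    return (is_synonymous(major_codon, minor_codon)
            and minor_codon in optimal
            and major_codon not in optimal)


if __name__ == "__main__":
    # Hypothetical (major, minor) codon pairs at rare synonymous sites
    variants = [("CTA", "CTG"), ("CTG", "CTA"), ("CTG", "CCG"), ("GCA", "GCC")]
    print([v for v in variants if non_optimal_to_optimal(*v)])
    # [('CTA', 'CTG'), ('GCA', 'GCC')]
```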
Affiliation(s)
- Jason E Miller
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
19
Abstract
PURPOSE OF REVIEW Over many decades, researchers have been designing studies to investigate the relationship between genotypes and phenotypes in order to understand the effect of genetics on disease. Recently, a high-throughput approach called phenome-wide association studies (PheWAS) has been used extensively to identify associations between genetic variants and many diseases and traits simultaneously. In this review, we describe the value of PheWAS along with methodological issues and challenges in interpretation for current applications of PheWAS. RECENT FINDINGS PheWAS have provided a paradigm for identifying new associations for genetic loci across many diseases. PheWAS have been applied effectively to phenotype data from electronic health records, epidemiological studies, and clinical trials. SUMMARY The key strength of a PheWAS is its ability to identify the association of one or more genetic variants with multiple phenotypes, which can showcase interconnections among the phenotypes due to shared genetic associations. While the PheWAS approach appears promising, a number of challenges need to be addressed to provide additional robustness to PheWAS findings.
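The core PheWAS loop described in this review can be sketched as follows. One genotype is tested against many phenotypes, and significance is then judged against a threshold corrected for the number of tests. The choices here are illustrative assumptions, not methods taken from the review:

- a Wald test on the log odds ratio with a 0.5 cell correction;
- Bonferroni correction;
- hypothetical data and function names.

Published PheWAS commonly fit covariate-adjusted logistic regression per phenotype, which this sketch omits.

```python
from math import erf, log, sqrt


def log_odds_ratio_test(a, b, c, d):
    """Wald test of the log odds ratio for the table
    [[carrier cases a, carrier controls b],
     [non-carrier cases c, non-carrier controls d]].
    Adds 0.5 to every cell (Haldane-Anscombe correction) so zero counts stay finite."""
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    p = 1 - erf(abs(log_or / se) / sqrt(2))  # two-sided normal p-value
    return log_or, p


def phewas_scan(carrier, phenotypes, alpha=0.05):
    """Test one genotype (0/1 carrier status) against every binary phenotype.
    phenotypes: dict name -> list of 0/1 case status, same order as carrier."""
    results = []
    for name, status in phenotypes.items():
        a = sum(1 for g, y in zip(carrier, status) if g and y)
        b = sum(1 for g, y in zip(carrier, status) if g and not y)
        c = sum(1 for g, y in zip(carrier, status) if not g and y)
        d = sum(1 for g, y in zip(carrier, status) if not g and not y)
        log_or, p = log_odds_ratio_test(a, b, c, d)
        results.append((name, log_or, p))
    threshold = alpha / len(phenotypes)  # Bonferroni across all phenotypes tested
    return sorted(results, key=lambda r: r[2]), threshold


if __name__ == "__main__":
    carrier = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    phenotypes = {  # hypothetical EHR-derived case/control status
        "phenotype_A": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
        "phenotype_B": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    }
    hits, threshold = phewas_scan(carrier, phenotypes)
    for name, log_or, p in hits:
        print(name, round(log_or, 2), round(p, 4), "significant" if p < threshold else "")
```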
Affiliation(s)
- Anurag Verma
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA
- Marylyn D Ritchie
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA
20
Kim D, Basile AO, Bang L, Horgusluoglu E, Lee S, Ritchie MD, Saykin AJ, Nho K. Knowledge-driven binning approach for rare variant association analysis: application to neuroimaging biomarkers in Alzheimer's disease. BMC Med Inform Decis Mak 2017; 17:61. [PMID: 28539126 PMCID: PMC5444041 DOI: 10.1186/s12911-017-0454-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 12/21/2022]
Abstract
Background Rapid advancement of next generation sequencing technologies such as whole genome sequencing (WGS) has facilitated the search for genetic factors that influence disease risk in the field of human genetics. To identify rare variants associated with human diseases or traits, an efficient genome-wide binning approach is needed. In this study, we developed a novel biological knowledge-based binning approach for rare-variant association analysis and then applied the approach to structural neuroimaging endophenotypes related to late-onset Alzheimer’s disease (LOAD). Methods For rare-variant analysis, we used the knowledge-driven binning approach implemented in Bin-KAT, an automated tool that provides 1) binning/collapsing methods for multi-level variant aggregation with a flexible, biologically informed binning strategy and 2) an option of performing unified collapsing and statistical rare variant analyses in one tool. A total of 750 non-Hispanic Caucasian participants from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort who had both WGS data and magnetic resonance imaging (MRI) scans were used in this study. Mean bilateral cortical thickness of the entorhinal cortex extracted from MRI scans was used as an AD-related neuroimaging endophenotype. SKAT was used for a genome-wide gene- and region-based association analysis of rare variants (minor allele frequency (MAF) < 0.05). Potential confounding factors for entorhinal cortex thickness (age, gender, years of education, intracranial volume (ICV), and MRI field strength) were used as covariates. Significant associations were determined using FDR adjustment for multiple comparisons. Results Our knowledge-driven binning approach identified 16 functional exonic rare variants in FANCC significantly associated with entorhinal cortex thickness (FDR-corrected p-value < 0.05).
In addition, the approach identified 7 evolutionarily conserved regions, which were mapped to FAF1, RFX7, LYPLAL1 and GOLGA3, significantly associated with entorhinal cortex thickness (FDR-corrected p-value < 0.05). In further analysis, the functional exonic rare variants in FANCC were also significantly associated with hippocampal volume and cerebrospinal fluid (CSF) Aβ1–42 (p-value < 0.05). Conclusions Our novel binning approach identified rare variants in FANCC as well as 7 evolutionarily conserved regions significantly associated with a LOAD-related neuroimaging endophenotype. FANCC (Fanconi anemia complementation group C) has been shown to modulate TLR and p38 MAPK-dependent expression of IL-1β in macrophages. Our results warrant further investigation in a larger independent cohort and demonstrate that the biological knowledge-driven binning approach is a powerful strategy to identify rare variants associated with AD and other complex diseases.
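The association step above relies on SKAT. Below is a minimal sketch of a SKAT-style variance-component statistic, built under several simplifying assumptions:

- a continuous trait;
- no covariates, whereas the study adjusted for age, gender, education, ICV and field strength;
- Beta(1, 25) MAF weights, as in the original SKAT formulation;
- a permutation p-value in place of the analytic mixture-of-chi-squares p-value that SKAT computes.

The function names and data are hypothetical.

```python
import random


def skat_q(genotypes, y, mafs):
    """SKAT-style statistic for a continuous trait with no covariates:
    Q = sum_j (w_j * sum_i g_ij * (y_i - mean(y)))^2,
    with w_j = Beta(MAF_j; 1, 25) density = 25 * (1 - MAF_j)^24."""
    y_bar = sum(y) / len(y)
    resid = [yi - y_bar for yi in y]
    q = 0.0
    for j, maf in enumerate(mafs):
        w = 25.0 * (1.0 - maf) ** 24  # up-weights the rarest variants
        score = sum(row[j] * r for row, r in zip(genotypes, resid))
        q += (w * score) ** 2
    return q


def skat_permutation_pvalue(genotypes, y, mafs, n_perm=999, seed=0):
    """Permutation p-value: shuffling the trait breaks any genotype-trait link."""
    rng = random.Random(seed)
    q_obs = skat_q(genotypes, y, mafs)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if skat_q(genotypes, y_perm, mafs) >= q_obs:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # add-one correction keeps p > 0


if __name__ == "__main__":
    # Hypothetical data: 20 individuals, 2 rare variants in one bin
    genotypes = [[1, 0]] * 5 + [[0, 1]] * 2 + [[0, 0]] * 13
    thickness = [2.1, 2.0, 2.2, 1.9, 2.0] + [3.0] * 15  # made-up trait values
    mafs = [5 / 40, 2 / 40]
    print(skat_permutation_pvalue(genotypes, thickness, mafs))
```

Plain permutation of the trait is only valid without covariates. With covariates, the null model must be fitted first, and a residual-based or analytic null distribution is needed instead.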
Affiliation(s)
- Dokyoon Kim
- Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Anna O Basile
- The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Lisa Bang
- Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- Emrin Horgusluoglu
- Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Seunggeun Lee
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Marylyn D Ritchie
- Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Andrew J Saykin
- Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Kwangsik Nho
- Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
21
Moore CCB, Basile AO, Wallace JR, Frase AT, Ritchie MD. A biologically informed method for detecting rare variant associations. BioData Min 2016; 9:27. [PMID: 27582876 PMCID: PMC5006419 DOI: 10.1186/s13040-016-0107-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 01/04/2016] [Accepted: 06/18/2016] [Indexed: 11/29/2022]
Abstract
BACKGROUND BioBin is a bioinformatics software package developed to automate the process of binning rare variants into groups for statistical association analysis using a biological knowledge-driven framework. BioBin collapses variants into biological features such as genes, pathways, evolutionarily conserved regions (ECRs), protein families, regulatory regions, and others based on user-designated parameters. BioBin provides the infrastructure to create complex and interesting hypotheses in an automated fashion, thereby circumventing the need for advanced and time-consuming scripting. PURPOSE OF THE STUDY In this manuscript, we describe the BioBin software package, along with type I error and power simulations that demonstrate the strengths and the various customizable features and analysis options of this variant binning tool. RESULTS Simulation testing highlights the utility of BioBin as a fast, comprehensive and expandable tool for the biologically inspired binning and analysis of low-frequency variants in sequence data. CONCLUSIONS AND POTENTIAL IMPLICATIONS The BioBin software package has the capability to transform and streamline the analysis pipelines of researchers analyzing rare variants. This automated bioinformatics tool minimizes the manual effort of creating genomic regions for binning, so that time can be spent on the much more interesting task of statistical analysis. This software package is open source and freely available from http://ritchielab.com/software/biobin-download.
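The core binning idea described above can be sketched in a few lines: collapse low-frequency variants into biological features such as genes or regulatory regions. This is not BioBin's code or interface. The following are all hypothetical:

- the function names;
- the interval representation of features;
- the 0.05 MAF cutoff;
- the coordinates.

Real BioBin features are drawn from curated biological knowledge, and pathway- or family-level bins are not simple intervals. The sketch handles only the interval case.

```python
from collections import defaultdict


def bin_variants(variants, features, maf_threshold=0.05):
    """Assign low-frequency variants to every feature whose interval contains them.

    variants: list of (chrom, pos, maf)
    features: dict feature name -> (chrom, start, end), inclusive coordinates
    Returns dict feature name -> list of variant indices; a variant can land in
    several overlapping features (e.g. a gene and a regulatory region)."""
    bins = defaultdict(list)
    for idx, (chrom, pos, maf) in enumerate(variants):
        if maf >= maf_threshold:
            continue  # common variants are not binned
        for name, (f_chrom, start, end) in features.items():
            if chrom == f_chrom and start <= pos <= end:
                bins[name].append(idx)
    return dict(bins)


def bin_counts(genotypes, bins):
    """Per-individual count of minor alleles falling in each bin."""
    return {name: [sum(row[i] for i in idxs) for row in genotypes]
            for name, idxs in bins.items()}


if __name__ == "__main__":
    # HYPOTHETICAL features and coordinates, for illustration only
    features = {
        "GENE_A": ("1", 1000, 2000),
        "REGULATORY_R1": ("1", 1800, 2500),
        "GENE_B": ("2", 500, 900),
    }
    variants = [("1", 1500, 0.01), ("1", 1900, 0.002), ("1", 2400, 0.30), ("2", 600, 0.04)]
    genotypes = [[1, 0, 2, 0], [0, 1, 1, 1], [0, 0, 0, 0]]  # rows = individuals
    bins = bin_variants(variants, features)
    print(bins)
    print(bin_counts(genotypes, bins))
```

The per-bin counts are what a downstream burden test would compare between cases and controls. A kernel test such as SKAT would instead use the variant-level genotypes within each bin.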
Affiliation(s)
- Anna Okula Basile
  - Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, University Park, PA 16802 USA
- John Robert Wallace
  - Biomedical and Translational Informatics, Geisinger Health System, Danville, PA 17821 USA
- Alex Thomas Frase
  - Biomedical and Translational Informatics, Geisinger Health System, Danville, PA 17821 USA
- Marylyn DeRiggi Ritchie
  - Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, University Park, PA 16802 USA
  - Biomedical and Translational Informatics, Geisinger Health System, Danville, PA 17821 USA
22
Unravelling the human genome-phenome relationship using phenome-wide association studies. Nat Rev Genet 2016; 17:129-45. [PMID: 26875678 DOI: 10.1038/nrg.2015.36] [Citation(s) in RCA: 177] [Impact Index Per Article: 19.7] [Indexed: 12/24/2022]
Abstract
Advances in genotyping technology have, over the past decade, enabled the focused search for common genetic variation associated with human diseases and traits. With the recently increased availability of detailed phenotypic data from electronic health records and epidemiological studies, the impact of one or more genetic variants on the phenome is starting to be characterized both in clinical and population-based settings using phenome-wide association studies (PheWAS). These studies reveal a number of challenges that will need to be overcome to unlock the full potential of PheWAS for the characterization of the complex human genome-phenome relationship.
23
Basile AO, Wallace JR, Peissig P, McCarty CA, Brilliant M, Ritchie MD. Knowledge driven binning and PheWAS analysis in Marshfield Personalized Medicine Research Project using BioBin. Pac Symp Biocomput 2016; 21:249-260. [PMID: 26776191 PMCID: PMC4824557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Next-generation sequencing technology has presented an opportunity for rare variant discovery and association of these variants with disease. To address the challenges of rare variant analysis, multiple statistical methods have been developed for combining rare variants to increase statistical power for detecting associations. BioBin is an automated tool that expands on collapsing/binning methods by performing multi-level variant aggregation with a flexible, biologically informed binning strategy using an internal biorepository, the Library of Knowledge (LOKI). The databases within LOKI provide variant details, regional annotations and pathway interactions which can be used to generate bins of biologically-related variants, thereby increasing the power of any subsequent statistical test. In this study, we expand the framework of BioBin to incorporate statistical tests, including a dispersion-based test, SKAT, thereby providing the option of performing a unified collapsing and statistical rare variant analysis in one tool. Extensive simulation studies performed on gene-coding regions showed a Bin-KAT analysis to have greater power than BioBin-regression in all simulated conditions, including variants influencing the phenotype in the same direction, a scenario where burden tests often retain greater power. The use of Madsen-Browning variant weighting increased power in the burden analysis to that equitable with Bin-KAT; but overall Bin-KAT retained equivalent or higher power under all conditions. Bin-KAT was applied to a study of 82 pharmacogenes sequenced in the Marshfield Personalized Medicine Research Project (PMRP). We looked for association of these genes with 9 different phenotypes extracted from the electronic health record. This study demonstrates that Bin-KAT is a powerful tool for the identification of genes harboring low frequency variants for complex phenotypes.
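The Madsen-Browning weighting mentioned in this abstract up-weights variants that are rare among unaffected individuals before they are summed into a burden score. A minimal Python sketch of the weights and per-individual scores follows, written from a reading of the original Madsen-Browning method (control allele frequency with a +1/+2 pseudo-count). The list-based data layout, the function names, and the assumption of no missing genotypes are illustrative choices; this is not BioBin's implementation, and the subsequent association test is omitted.

```python
import math


def madsen_browning_weights(genotypes, is_control):
    """Return one Madsen-Browning weight per variant.

    genotypes:  list of samples, each a list of minor-allele counts (0, 1, 2) per variant
    is_control: list of booleans, True for unaffected individuals
    Weight for variant j: w_j = sqrt(n_j * q_j * (1 - q_j)), where
    q_j = (m_j + 1) / (2 * n_controls + 2) and m_j counts minor alleles in controls.
    n_j is taken as the total sample size, assuming no missing genotype calls.
    """
    n_total = len(genotypes)
    controls = [row for row, ctrl in zip(genotypes, is_control) if ctrl]
    n_controls = len(controls)
    weights = []
    for j in range(len(genotypes[0])):
        m_j = sum(row[j] for row in controls)    # minor alleles seen in controls
        q_j = (m_j + 1) / (2 * n_controls + 2)   # pseudo-count keeps 0 < q_j < 1
        weights.append(math.sqrt(n_total * q_j * (1 - q_j)))
    return weights


def weighted_burden(genotypes, weights):
    """Per-sample score: sum of g_ij / w_j, so variants rare in controls count more."""
    return [sum(g / w for g, w in zip(row, weights)) for row in genotypes]


if __name__ == "__main__":
    # Toy data: 4 samples x 2 variants; only the first sample is a case.
    geno = [[1, 0], [0, 0], [0, 1], [0, 0]]
    ctrl = [False, True, True, True]
    w = madsen_browning_weights(geno, ctrl)
    print(w, weighted_burden(geno, w))
```

In the toy data the first variant is never seen in controls, so it receives the smaller divisor and contributes more to the case's score than the second variant does to the control's.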
Affiliation(s)
- Anna O Basile
  - Department of Biochemistry, Microbiology and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
24
Tyler AL, Crawford DC, Pendergrass SA. The detection and characterization of pleiotropy: discovery, progress, and promise. Brief Bioinform 2015. [PMID: 26223525 DOI: 10.1093/bib/bbv050] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Indexed: 01/10/2023]
Abstract
The impact of a single genetic locus on multiple phenotypes, or pleiotropy, is an important area of research. Biological systems are dynamic complex networks, and these networks exist within and between cells. In humans, the consideration of multiple phenotypes such as physiological traits, clinical outcomes and drug response, in the context of genetic variation, can provide ways of developing a more complete understanding of the complex relationships between genetic architecture and how biological systems function in health and disease. In this article, we describe recent studies exploring the relationships between genetic loci and more than one phenotype. We also cover methodological developments incorporating pleiotropy applied to model organisms as well as humans, and discuss how stepping beyond the analysis of a single phenotype leads to a deeper understanding of complex genetic architecture.
25
Abu-Elmagd M, Assidi M, Schulten HJ, Dallol A, Pushparaj PN, Ahmed F, Scherer SW, Al-Qahtani M. Individualized medicine enabled by genomics in Saudi Arabia. BMC Med Genomics 2015; 8 Suppl 1:S3. [PMID: 25951871 PMCID: PMC4315314 DOI: 10.1186/1755-8794-8-s1-s3] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 12/16/2022]
Abstract
The biomedical research sector in Saudi Arabia has recently received special attention from the government, which is currently supporting research aimed at improving the understanding and treatment of common diseases afflicting Saudi Arabian society. To build capacity for research and training, a number of centres of excellence were established in different areas of the country. Among these is the Centre of Excellence in Genomic Medicine Research (CEGMR) at King Abdulaziz University, Jeddah, with its internationally ranked and highly productive team performing translational research in the area of individualized medicine. Here, we present a panorama of the recent trends in different areas of biomedical research in Saudi Arabia drawing from our vision of where genomics will have maximal impact in the Kingdom of Saudi Arabia. We describe advances in a number of research areas, including congenital malformations, infertility, consanguinity and pre-implantation genetic diagnosis, cancer and genomic classifications in Saudi Arabia, epigenetic explanations of idiopathic disease, and pharmacogenomics and personalized medicine. We conclude that CEGMR will continue to play a pivotal role in advances in the field of genomics, and that research in this area faces a number of challenges, including generating high-quality control data from the Saudi population and ensuring that policies for using these data comply with the international set-up.
Affiliation(s)
- Muhammad Abu-Elmagd
  - Centre of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, P.O. Box: 80216 Jeddah 21589, KSA
  - KACST Technology Innovation Centre in Personalized Medicine at King Abdulaziz University (CIPM), P.O. Box: 80216 Jeddah 21589, KSA
  - School of Biological Sciences, University of East Anglia, Norwich, Norfolk, NR4 7TJ, UK
  - Zoology Department, Faculty of Science, Minia University, Minia, P.O. Box 61519, Egypt
- Mourad Assidi
  - Centre of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, P.O. Box: 80216 Jeddah 21589, KSA
  - KACST Technology Innovation Centre in Personalized Medicine at King Abdulaziz University (CIPM), P.O. Box: 80216 Jeddah 21589, KSA
- Hans-Juergen Schulten
  - Centre of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, P.O. Box: 80216 Jeddah 21589, KSA
- Ashraf Dallol
  - Centre of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, P.O. Box: 80216 Jeddah 21589, KSA
  - KACST Technology Innovation Centre in Personalized Medicine at King Abdulaziz University (CIPM), P.O. Box: 80216 Jeddah 21589, KSA
- Peter Natesan Pushparaj
  - Centre of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, P.O. Box: 80216 Jeddah 21589, KSA
- Farid Ahmed
  - Centre of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, P.O. Box: 80216 Jeddah 21589, KSA
- Stephen W Scherer
  - Centre of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, P.O. Box: 80216 Jeddah 21589, KSA
  - The Centre for Applied Genomics and Program in Genetics and Genome Biology, the Hospital for Sick Children, Toronto, Ontario, Canada
  - McLaughlin Centre and Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Mohammed Al-Qahtani
  - Centre of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, P.O. Box: 80216 Jeddah 21589, KSA
26
Kim D, Li R, Dudek SM, Wallace JR, Ritchie MD. Binning somatic mutations based on biological knowledge for predicting survival: an application in renal cell carcinoma. Pac Symp Biocomput 2015:96-107. [PMID: 25592572 PMCID: PMC4299944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Enormous efforts of whole exome and genome sequencing from hundreds to thousands of patients have provided the landscape of somatic genomic alterations in many cancer types to distinguish between driver mutations and passenger mutations. Driver mutations show strong associations with cancer clinical outcomes such as survival. However, due to the heterogeneity of tumors, somatic mutation profiles are exceptionally sparse whereas other types of genomic data such as miRNA or gene expression contain much more complete data for all genomic features with quantitative values measured in each patient. To overcome the extreme sparseness of somatic mutation profiles and allow for the discovery of combinations of somatic mutations that may predict cancer clinical outcomes, here we propose a new approach for binning somatic mutations based on existing biological knowledge. Through analysis of the renal cell carcinoma dataset from The Cancer Genome Atlas (TCGA), we identified combinations of somatic mutation burden based on pathways, protein families, evolutionary conserved regions, and regulatory regions associated with survival. Due to the nature of heterogeneity in cancer, using a binning strategy for somatic mutation profiles based on biological knowledge will be valuable for improved prognostic biomarkers and potentially for tailoring therapeutic strategies by identifying combinations of driver mutations.
27
Chhibber A, Kroetz DL, Tantisira KG, McGeachie M, Cheng C, Plenge R, Stahl E, Sadee W, Ritchie MD, Pendergrass SA. Genomic architecture of pharmacological efficacy and adverse events. Pharmacogenomics 2014; 15:2025-48. [PMID: 25521360 PMCID: PMC4308414 DOI: 10.2217/pgs.14.144] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 12/22/2022]
Abstract
The pharmacokinetic and pharmacodynamic disciplines address pharmacological traits, including efficacy and adverse events. Pharmacogenomics studies have identified pervasive genetic effects on treatment outcomes, resulting in the development of genetic biomarkers for optimization of drug therapy. Pharmacogenomics-based tests are already being applied in clinical decision making. However, despite substantial progress in identifying the genetic etiology of pharmacological response, current biomarker panels still largely rely on single gene tests with a large portion of the genetic effects remaining to be discovered. Future research must account for the combined effects of multiple genetic variants, incorporate pathway-based approaches, explore gene-gene interactions and nonprotein coding functional genetic variants, extend studies across ancestral populations, and prioritize laboratory characterization of molecular mechanisms. Because genetic factors can play a key role in drug response, accurate biomarker tests capturing the main genetic factors determining treatment outcomes have substantial potential for improving individual clinical care.
Affiliation(s)
- Aparna Chhibber
  - Department of Bioengineering & Therapeutic Sciences, Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, USA
- Deanna L Kroetz
  - Department of Bioengineering & Therapeutic Sciences, Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, USA
- Kelan G Tantisira
  - Department of Medicine, Brigham & Women's Hospital, Harvard Medical School, Cambridge, MA, USA
- Michael McGeachie
  - Department of Medicine, Brigham & Women's Hospital, Harvard Medical School, Cambridge, MA, USA
- Cheng Cheng
  - Department of Biostatistics, St Jude Children's Research Hospital, Memphis, TN, USA
- Robert Plenge
  - Division of Rheumatology, Immunology & Allergy, Division of Genetics, Brigham & Women's Hospital, Harvard Medical School, Cambridge, MA, USA
- Eli Stahl
  - Department of Genetics & Genomic Sciences, Mount Sinai Hospital, New York, NY, USA
- Wolfgang Sadee
  - Center for Pharmacogenomics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Marylyn D Ritchie
  - Department of Biochemistry & Molecular Biology, Center for Systems Genomics, Eberly College of Science, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16801, USA
- Sarah A Pendergrass
  - Department of Biochemistry & Molecular Biology, Center for Systems Genomics, Eberly College of Science, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16801, USA
28
Couthouis J, Raphael AR, Daneshjou R, Gitler AD. Targeted exon capture and sequencing in sporadic amyotrophic lateral sclerosis. PLoS Genet 2014; 10:e1004704. [PMID: 25299611 PMCID: PMC4191946 DOI: 10.1371/journal.pgen.1004704] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Received: 05/01/2014] [Accepted: 08/25/2014] [Indexed: 12/11/2022]
Abstract
Amyotrophic lateral sclerosis (ALS) is a devastating neurodegenerative disease that results in progressive degeneration of motor neurons, ultimately leading to paralysis and death. Approximately 10% of ALS cases are familial, with the remaining 90% of cases being sporadic. Genetic studies in familial cases of ALS have been extremely informative in determining the causative mutations behind ALS, especially as the same mutations identified in familial ALS can also cause sporadic disease. However, the cause of ALS in approximately 30% of familial cases and in the majority of sporadic cases remains unknown. Sporadic ALS cases represent an underutilized resource for genetic information about ALS; therefore, we undertook a targeted sequencing approach of 169 known and candidate ALS disease genes in 242 sporadic ALS cases and 129 matched controls to try to identify novel variants linked to ALS. We found a significant enrichment in novel and rare variants in cases versus controls, indicating that we are likely identifying disease associated mutations. This study highlights the utility of next generation sequencing techniques combined with functional studies and rare variant analysis tools to provide insight into the genetic etiology of a heterogeneous sporadic disease.
Amyotrophic lateral sclerosis (ALS), also known as Charcot disease or Lou Gehrig's disease, is one of the most common neuromuscular diseases worldwide. This disease is characterized by a progressive degeneration of motor neurons, leading to patient death within a few years after onset. Despite the fact that most ALS cases are sporadic, most of the ALS genetic studies have focused on familial forms, leading to the genetic determination of cause for 70% of cases of familial ALS but for only 10% of sporadic ALS cases. This, coupled with the dearth of families available for study, suggests that researchers should begin tapping into the relatively untouched reservoir of available sporadic samples to identify novel genetic causes of sporadic ALS. Here we take advantage of high-throughput target sequencing techniques to test four different hypotheses about the genetic causes of ALS in sporadic ALS and uncover new candidate genes and pathways implicated in ALS.
Affiliation(s)
- Julien Couthouis
  - Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Alya R. Raphael
  - Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Roxana Daneshjou
  - Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Aaron D. Gitler
  - Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
29
Pongor LS, Vera R, Ligeti B. Fast and sensitive alignment of microbial whole genome sequencing reads to large sequence datasets on a desktop PC: application to metagenomic datasets and pathogen identification. PLoS One 2014; 9:e103441. [PMID: 25077800 PMCID: PMC4117525 DOI: 10.1371/journal.pone.0103441] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 05/29/2014] [Accepted: 07/02/2014] [Indexed: 01/23/2023]
Abstract
Next generation sequencing (NGS) of metagenomic samples is becoming a standard approach to detect individual species or pathogenic strains of microorganisms. Computer programs used in the NGS community have to balance between speed and sensitivity and as a result, species or strain level identification is often inaccurate and low abundance pathogens can sometimes be missed. We have developed Taxoner, an open source, taxon assignment pipeline that includes a fast aligner (e.g. Bowtie2) and a comprehensive DNA sequence database. We tested the program on simulated datasets as well as experimental data from Illumina, IonTorrent, and Roche 454 sequencing platforms. We found that Taxoner performs as well as, and often better than BLAST, but requires two orders of magnitude less running time meaning that it can be run on desktop or laptop computers. Taxoner is slower than the approaches that use small marker databases but is more sensitive due to the comprehensive reference database. In addition, it can be easily tuned to specific applications using small tailored databases. When applied to metagenomic datasets, Taxoner can provide a functional summary of the genes mapped and can provide strain level identification. Taxoner is written in C for Linux operating systems. The code and documentation are available for research applications at http://code.google.com/p/taxoner.
Affiliation(s)
- Lőrinc S. Pongor
  - Faculty of Information Technology, Pázmány Péter Catholic University, Budapest, Hungary
  - 2nd Department of Pediatrics, Semmelweis University, Budapest, Hungary
- Roberto Vera
  - Faculty of Information Technology, Pázmány Péter Catholic University, Budapest, Hungary
  - Protein Structure and Bioinformatics Group, International Centre for Genetic Engineering and Biotechnology, Trieste, Italy
- Balázs Ligeti
  - Faculty of Information Technology, Pázmány Péter Catholic University, Budapest, Hungary
30
Drögemöller BI, Wright GEB, Warnich L. Considerations for rare variants in drug metabolism genes and the clinical implications. Expert Opin Drug Metab Toxicol 2014; 10:873-84. [PMID: 24673405 DOI: 10.1517/17425255.2014.903239] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 12/26/2022]
Abstract
INTRODUCTION Large-scale whole genome and exome resequencing studies have revealed that humans have a high level of deleterious rare variation, which has important implications for the design of future pharmacogenetics studies. AREAS COVERED Current pharmacogenetic guidelines focus on the implementation of common variation into dosing guidelines. However, it is becoming apparent that rare variation may also play an important role in differential drug response. Current sequencing technologies offer the opportunity to examine rare variation, but there are many challenges associated with such analyses. Nonetheless, if a comprehensive picture of the role that genetic variants play in treatment outcomes is to be obtained, it will be necessary to include the entire spectrum of variation, including rare variants, into pharmacogenetic research. EXPERT OPINION In order to implement pharmacogenetics in the clinic, patients should be genotyped for clinically actionable pharmacogenetic variants and patients responding unfavourably to treatment after pharmacogenetics-based dosing should be identified and resequenced to identify additional functionally relevant variants, including rare variants. All derived information should be added to a central database to allow for the updating of existing dosing guidelines. By routinely implementing such strategies, pharmacogenetics-based treatment guidelines will continue to improve.
31