1
Guardado M, Perez C, Jackson S, Magaña J, Campana S, Samperio E, Rojas BC, Hernandez S, Syas K, Hernandez R, Zavala EI, Rohlfs R. py_ped_sim - A flexible forward genetic simulator for complex family pedigree analysis. bioRxiv 2024:2024.03.25.586501. [PMID: 38585824 PMCID: PMC10996500 DOI: 10.1101/2024.03.25.586501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Background: Large-scale family pedigrees are commonly used across medical, evolutionary, and forensic genetics. These pedigrees are tools for identifying genetic disorders, tracking evolutionary patterns, and establishing familial relationships via forensic genetic identification. However, there is a lack of software to accurately simulate different pedigree structures along with genomes corresponding to the individuals in a family pedigree. This limits simulation-based evaluations of methods that use pedigrees.
Results: We have developed a Python command-line tool called py_ped_sim that facilitates the simulation of pedigree structures and the genomes of individuals in a pedigree. py_ped_sim represents pedigrees as directed acyclic graphs, enabling conversion between standard pedigree formats and integration with the forward population genetic simulator, SLiM. Notably, py_ped_sim allows the simulation of varying numbers of offspring for a set of parents, with the capacity to shift the distribution of sibship sizes over generations. We additionally support simulation of misattributed paternity events, which offers a way to simulate half-sibling relationships. We validated the accuracy of our software by simulating genomes onto diverse family pedigree structures, showing that the estimated kinship coefficients closely approximated expected values.
Conclusions: py_ped_sim is a user-friendly and open-source solution for simulating pedigree structures and conducting pedigree genome simulations. It empowers medical, forensic, and evolutionary genetics researchers to gain deeper insights into the dynamics of genetic inheritance and relatedness within families.
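The abstract mentions two ideas that a short sketch can make concrete: a pedigree stored as a directed acyclic graph, and the expected kinship coefficients against which simulated genomes are checked. The Python below is a minimal illustration under stated assumptions, not py_ped_sim's own code or API. The `PEDIGREE` dictionary, the individual names, and the `depth` and `kinship` functions are all hypothetical, and the recursion is the standard textbook one for expected kinship.

```python
from functools import lru_cache

# Hypothetical pedigree stored as a DAG: each individual maps to its
# (father, mother); founders have (None, None). Names are illustrative only.
PEDIGREE = {
    "gf": (None, None),
    "gm": (None, None),
    "mom": (None, None),
    "other_mom": (None, None),
    "uncle_in_law": (None, None),
    "dad": ("gf", "gm"),
    "aunt": ("gf", "gm"),
    "kid": ("dad", "mom"),
    "half_sib": ("dad", "other_mom"),   # shares only the father with "kid"
    "cousin": ("uncle_in_law", "aunt"),
}


@lru_cache(maxsize=None)
def depth(ind):
    """Length of the longest path from any founder down to `ind`."""
    parents = [p for p in PEDIGREE[ind] if p is not None]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


@lru_cache(maxsize=None)
def kinship(a, b):
    """Expected kinship coefficient between a and b.

    Unknown parents (None) are treated as unrelated, non-inbred founders.
    """
    if a is None or b is None:
        return 0.0
    if a == b:
        father, mother = PEDIGREE[a]
        return 0.5 * (1.0 + kinship(father, mother))
    # Expand the deeper individual: it cannot be an ancestor of the other,
    # so averaging over its parents is valid.
    if depth(a) < depth(b):
        a, b = b, a
    father, mother = PEDIGREE[a]
    return 0.5 * (kinship(father, b) + kinship(mother, b))


if __name__ == "__main__":
    for pair in [("dad", "aunt"), ("kid", "dad"), ("kid", "half_sib"), ("kid", "cousin")]:
        print(pair, kinship(*pair))
```

In a validation like the one the abstract describes, values of this kind would be the expected targets that kinship estimates from simulated genomes are compared with.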
Affiliation(s)
- Miguel Guardado
  - San Francisco State University, Department of Mathematics, San Francisco CA, 94132, USA
  - University of California San Francisco, Biological and Medical Informatics Graduate Program. San Francisco CA, 94158
  - Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA; San Francisco, 94134, CA, USA
  - University of Oregon; Department of Data Science; Eugene, OR, 97403, USA
- Cynthia Perez
  - San Francisco State University, Department of Biology, San Francisco CA, 94132, USA
- Shalom Jackson
  - San Francisco State University, Department of Biology, San Francisco CA, 94132, USA
- Joaquín Magaña
  - San Francisco State University, Department of Biology, San Francisco CA, 94132, USA
- Sthen Campana
  - San Francisco State University, Department of Biology, San Francisco CA, 94132, USA
- Emily Samperio
  - San Francisco State University, Department of Biology, San Francisco CA, 94132, USA
- Selena Hernandez
  - San Francisco State University, Department of Biology, San Francisco CA, 94132, USA
- Kaela Syas
  - San Francisco State University, Department of Mathematics, San Francisco CA, 94132, USA
- Ryan Hernandez
  - Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA; San Francisco, 94134, CA, USA
- Elena I. Zavala
  - San Francisco State University, Department of Biology, San Francisco CA, 94132, USA
  - University of California, Berkeley, Department of Molecular and Cell Biology, Berkeley, CA, 94720, USA
- Rori Rohlfs
  - San Francisco State University, Department of Biology, San Francisco CA, 94132, USA
  - University of Oregon; Department of Data Science; Eugene, OR, 97403, USA
3
Lindstrand A, Eisfeldt J, Pettersson M, Carvalho CMB, Kvarnung M, Grigelioniene G, Anderlid BM, Bjerin O, Gustavsson P, Hammarsjö A, Georgii-Hemming P, Iwarsson E, Johansson-Soller M, Lagerstedt-Robinson K, Lieden A, Magnusson M, Martin M, Malmgren H, Nordenskjöld M, Norling A, Sahlin E, Stranneheim H, Tham E, Wincent J, Ygberg S, Wedell A, Wirta V, Nordgren A, Lundin J, Nilsson D. From cytogenetics to cytogenomics: whole-genome sequencing as a first-line test comprehensively captures the diverse spectrum of disease-causing genetic variation underlying intellectual disability. Genome Med 2019; 11:68. [PMID: 31694722 PMCID: PMC6836550 DOI: 10.1186/s13073-019-0675-1] [Citation(s) in RCA: 79] [Impact Index Per Article: 15.8] [Received: 05/07/2019] [Accepted: 10/09/2019] [Indexed: 12/30/2022]
Abstract
Background: Since different types of genetic variants, from single nucleotide variants (SNVs) to large chromosomal rearrangements, underlie intellectual disability, we evaluated the use of whole-genome sequencing (WGS) rather than chromosomal microarray analysis (CMA) as a first-line genetic diagnostic test.
Methods: We analyzed three cohorts with short-read WGS: (i) a retrospective cohort with validated copy number variants (CNVs) (cohort 1, n = 68), (ii) individuals referred for monogenic multi-gene panels (cohort 2, n = 156), and (iii) 100 prospective, consecutive cases referred to our center for CMA (cohort 3). Bioinformatic tools developed include FindSV, SVDB, Rhocall, Rhoviz, and vcf2cytosure.
Results: First, we validated our structural variant (SV)-calling pipeline on cohort 1, consisting of three trisomies and 79 deletions and duplications with a median size of 850 kb (min 500 bp, max 155 Mb). All variants were detected. Second, we utilized the same pipeline in cohort 2 and analyzed with monogenic WGS panels, increasing the diagnostic yield to 8%. Next, cohort 3 was analyzed by both CMA and WGS. The WGS data was processed for large (> 10 kb) SVs genome-wide and for exonic SVs and SNVs in a panel of 887 genes linked to intellectual disability as well as genes matched to patient-specific Human Phenotype Ontology (HPO) phenotypes. This yielded a total of 25 pathogenic variants (SNVs or SVs), of which 12 were detected by CMA as well. We also applied short tandem repeat (STR) expansion detection and discovered one pathologic expansion in ATXN7. Finally, a case of Prader-Willi syndrome with uniparental disomy (UPD) was validated in the WGS data. Important positional information was obtained in all cohorts. Remarkably, 7% of the analyzed cases harbored complex structural variants, as exemplified by a ring chromosome and two duplications found to be an insertional translocation and part of a cryptic unbalanced translocation, respectively.
Conclusion: The overall diagnostic rate of 27% was more than doubled compared to clinical microarray (12%). Using WGS, we detected a wide range of SVs with high accuracy. Since the WGS data also allowed for analysis of SNVs, UPD, and STRs, it represents a powerful comprehensive genetic test in a clinical diagnostic laboratory setting.
Affiliation(s)
- Anna Lindstrand
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Jesper Eisfeldt
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
  - Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
- Maria Pettersson
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Claudia M B Carvalho
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Malin Kvarnung
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Giedre Grigelioniene
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Britt-Marie Anderlid
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Olof Bjerin
  - The Department of Women's and Children's Health, Karolinska Institutet, Stockholm, Sweden
- Peter Gustavsson
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Anna Hammarsjö
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Erik Iwarsson
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Maria Johansson-Soller
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Kristina Lagerstedt-Robinson
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Agne Lieden
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Måns Magnusson
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
  - Centre for Inherited Metabolic Diseases, Karolinska University Hospital, Stockholm, Sweden
- Marcel Martin
  - Department of Biochemistry and Biophysics, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Stockholm University, Stockholm, Sweden
- Helena Malmgren
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Magnus Nordenskjöld
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Ameli Norling
  - The Department of Women's and Children's Health, Karolinska Institutet, Stockholm, Sweden
- Ellika Sahlin
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Henrik Stranneheim
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
  - Centre for Inherited Metabolic Diseases, Karolinska University Hospital, Stockholm, Sweden
- Emma Tham
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Josephine Wincent
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Sofia Ygberg
  - The Department of Women's and Children's Health, Karolinska Institutet, Stockholm, Sweden
  - Centre for Inherited Metabolic Diseases, Karolinska University Hospital, Stockholm, Sweden
- Anna Wedell
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Centre for Inherited Metabolic Diseases, Karolinska University Hospital, Stockholm, Sweden
- Valtteri Wirta
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm, Sweden
  - Science for Life Laboratory, Department of Microbiology, Tumor and Cell biology, Karolinska Institutet, Stockholm, Sweden
- Ann Nordgren
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Johanna Lundin
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Daniel Nilsson
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
  - Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
8
He Z, Zhang D, Renton AE, Li B, Zhao L, Wang GT, Goate AM, Mayeux R, Leal SM. The Rare-Variant Generalized Disequilibrium Test for Association Analysis of Nuclear and Extended Pedigrees with Application to Alzheimer Disease WGS Data. Am J Hum Genet 2017; 100:193-204. [PMID: 28065470 PMCID: PMC5294711 DOI: 10.1016/j.ajhg.2016.12.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 10/04/2016] [Accepted: 12/06/2016] [Indexed: 01/10/2023]
Abstract
Whole-genome and exome sequence data can be cost-effectively generated for the detection of rare-variant (RV) associations in families. Causal variants that aggregate in families usually have larger effect sizes than those found in sporadic cases, so family-based designs can be a more powerful approach than population-based designs. Moreover, some family-based designs are robust to confounding due to population admixture or substructure. We developed an RV extension of the generalized disequilibrium test (GDT) to analyze sequence data obtained from nuclear and extended families. The GDT utilizes genotype differences of all discordant relative pairs to assess associations within a family, and the RV extension combines the single-variant GDT statistic over a genomic region of interest. The RV-GDT has increased power by efficiently incorporating information beyond first-degree relatives and allows for the inclusion of covariates. Using simulated genetic data, we demonstrated that the RV-GDT method has well-controlled type I error rates, even when applied to admixed populations and populations with substructure. It is more powerful than existing family-based RV association methods, particularly for the analysis of extended pedigrees and pedigrees with missing data. We analyzed whole-genome sequence data from families affected by Alzheimer disease to illustrate the application of the RV-GDT. Given the capability of the RV-GDT to adequately control for population admixture or substructure and analyze pedigrees with missing genotype data and its superior power over other family-based methods, it is an effective tool for elucidating the involvement of RVs in the etiology of complex traits.
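To illustrate the idea of scoring genotype differences across discordant relative pairs within families, here is a deliberately simplified Python sketch. It is not the published RV-GDT statistic: it ignores covariates and relationship-specific weights. Three things are assumptions made for illustration only: the per-family normalization by number of pairs, the use of a region-wide rare-allele count as the genotype, and the toy data. The names `FAMILIES`, `family_score`, and `gdt_like_z` are hypothetical.

```python
import math

# Hypothetical toy data: one list per family, each member given as
# (is_affected, rare_allele_count), where the count is assumed to be
# summed over the variants in the region of interest.
FAMILIES = [
    [(True, 2), (True, 1), (False, 0), (False, 0)],
    [(True, 1), (False, 0), (False, 1)],
    [(True, 1), (True, 1), (False, 0)],
]


def family_score(members):
    """Mean genotype difference (affected minus unaffected) over all
    discordant pairs in one family; 0.0 if the family has no such pair."""
    affected = [g for is_aff, g in members if is_aff]
    unaffected = [g for is_aff, g in members if not is_aff]
    diffs = [ga - gu for ga in affected for gu in unaffected]
    return sum(diffs) / len(diffs) if diffs else 0.0


def gdt_like_z(families):
    """Sum of family scores divided by the root of their summed squares,
    so each family contributes one term to numerator and denominator."""
    scores = [family_score(fam) for fam in families]
    denom = math.sqrt(sum(s * s for s in scores))
    return sum(scores) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    print([family_score(fam) for fam in FAMILIES])
    print(gdt_like_z(FAMILIES))
```

Because every contrast is taken within a family, a score of this shape is driven by within-family differences rather than between-family allele-frequency differences. That is the intuition behind the robustness to admixture and substructure that the abstract reports.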
Affiliation(s)
- Zongxiao He
  - Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Di Zhang
  - Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Alan E. Renton
  - Department of Neuroscience and Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, NY 10029, USA
- Biao Li
  - Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Linhai Zhao
  - Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Gao T. Wang
  - Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Alison M. Goate
  - Department of Neuroscience and Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, NY 10029, USA
- Richard Mayeux
  - Department of Neurology, Taub Institute on Alzheimer’s Disease and the Aging Brain and Gertrude H. Sergievsky Center, Columbia University, New York, NY 10027, USA
- Suzanne M. Leal (corresponding author)
  - Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA