1
Tian P, Wang W, Luo S, Du X, Zhong Y, Sun F, Xu Z, Xiao J, Yu S, Niu W. Genomic Single Nucleotide Polymorphism (SNP) markers and mitochondrial haplotypes illuminate the origins of Crown-of-Thorns Starfish (Acanthaster solaris) outbreaks in the South China Sea. BMC Genomics 2024; 25:1094. [PMID: 39550602 PMCID: PMC11568665 DOI: 10.1186/s12864-024-11011-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2024] [Accepted: 11/07/2024] [Indexed: 11/18/2024] [Open Access]
Abstract
BACKGROUND Outbreaks of the coral predator Crown-of-Thorns Starfish (CoTS) pose a severe threat to coral reefs in the Indo-Pacific Ocean. In 2018, the South China Sea (SCS) experienced significant CoTS outbreaks, leading to extensive coral mortality across the Xisha (XS), Zhongsha (ZS), Dongsha, and Nansha (NS) Islands and severely impacting the coral reef ecosystem. RESULTS To explore the origins of these outbreaks, we conducted a comprehensive genomic analysis using genomic single nucleotide polymorphisms (SNPs) and mitochondrial haplotypes. Our analysis reveals that CoTS populations in the SCS exhibit moderate genetic diversity and may have undergone positive selection or population expansion. Genetic differentiation among CoTS populations from the XS, ZS, and NS groups was limited, and it was almost absent between the XS and ZS groups. The populations from the XS, ZS, and NS groups have strong genetic connections with populations in Vietnam and the Philippines. There was high gene flow from Vietnam to the Xisha Islands and from the Philippines to the Nansha Islands, suggesting that the CoTS populations in these regions primarily originate from these neighboring countries. CONCLUSION The comprehensive analyses of SNP and mitochondrial genomes have provided valuable insights into the population genetics of CoTS. This research has generated significant genomic resources and facilitated important studies on the genetics of the CoTS species. By identifying potential source populations and understanding the genetic basis of their spread, managers can develop more effective conservation strategies to protect vulnerable coral reef ecosystems in the SCS.
Affiliation(s)
- Peng Tian
- Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, 361005, China
- Nansha Islands Coral Reef Ecosystem National Observation and Research Station, Guangzhou, 510000, China
- Wei Wang
- Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, 361005, China
- Nansha Islands Coral Reef Ecosystem National Observation and Research Station, Guangzhou, 510000, China
- Site Luo
- School of Life Sciences, Xiamen University, Xiamen, 361102, China
- Xiao Du
- BGI Research, Shenzhen, 518083, China
- BGI Research, Qingdao, 266555, China
- Yinghui Zhong
- Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, 361005, China
- Fisheries College, Jimei University, Xiamen, 361000, China
- Fucheng Sun
- Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, 361005, China
- Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, Fujian Agriculture and Forestry University, Fuzhou, 350001, China
- Ziqing Xu
- Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, 361005, China
- Jiaguang Xiao
- Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, 361005, China.
- Nansha Islands Coral Reef Ecosystem National Observation and Research Station, Guangzhou, 510000, China.
- Shuangen Yu
- Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, 361005, China.
- Wentao Niu
- Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, 361005, China.
- Nansha Islands Coral Reef Ecosystem National Observation and Research Station, Guangzhou, 510000, China.
2
Mansueto L, McNally KL, Kretzschmar T, Mauleon R. CannSeek? Yes we Can! An open-source single nucleotide polymorphism database and analysis portal for Cannabis sativa. GigaByte 2024; 2024:gigabyte135. [PMID: 39416656 PMCID: PMC11480739 DOI: 10.46471/gigabyte.135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2024] [Accepted: 09/13/2024] [Indexed: 10/19/2024] [Open Access]
Abstract
A growing interest in Cannabis sativa uses for food, fiber, and medicine, and recent changes in regulations have spurred numerous genomic studies of this once-prohibited plant. Cannabis research uses Next Generation Sequencing technologies for genomics and transcriptomics. While other crops have genome portals enabling access and analysis of numerous genotyping data from diverse accessions, leading to the discovery of alleles for important traits, this is absent for cannabis. The CannSeek web portal aims to address this gap. Single nucleotide polymorphism datasets were generated by identifying genome variants from public resequencing data and genome assemblies. Results and accompanying trait data are hosted in the CannSeek web application, built using the Rice SNP-Seek infrastructure with improvements to allow multiple reference genomes and provide a web-service Application Programming Interface. The tools built into the portal allow phylogenetic analyses, varietal grouping and identifications, and favorable haplotype discovery for cannabis accessions using public sequencing data. AVAILABILITY AND IMPLEMENTATION The CannSeek portal is available at https://icgrc.info/cannseek and https://icgrc.info/genotype_viewer.
Affiliation(s)
- Locedie Mansueto
- Southern Cross University, Military Road, Lismore New South Wales 2480, Australia
- Kenneth L. McNally
- International Rice Research Institute, Pili Drive, Los Baños Laguna 4031, Philippines
- Tobias Kretzschmar
- Southern Cross University, Military Road, Lismore New South Wales 2480, Australia
- Ramil Mauleon
- Southern Cross University, Military Road, Lismore New South Wales 2480, Australia
- International Rice Research Institute, Pili Drive, Los Baños Laguna 4031, Philippines
3
Hanna MG, Olson NH, Zarella M, Dash RC, Herrmann MD, Furtado LV, Stram MN, Raciti PM, Hassell L, Mays A, Pantanowitz L, Sirintrapun JS, Krishnamurthy S, Parwani A, Lujan G, Evans A, Glassy EF, Bui MM, Singh R, Souers RJ, de Baca ME, Seheult JN. Recommendations for Performance Evaluation of Machine Learning in Pathology: A Concept Paper From the College of American Pathologists. Arch Pathol Lab Med 2024; 148:e335-e361. [PMID: 38041522 DOI: 10.5858/arpa.2023-0042-cp] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/11/2023] [Indexed: 12/03/2023]
Abstract
CONTEXT.— Machine learning applications in the pathology clinical domain are emerging rapidly. As decision support systems continue to mature, laboratories will increasingly need guidance to evaluate their performance in clinical practice. Currently there are no formal guidelines to assist pathology laboratories in verification and/or validation of such systems. These recommendations are being proposed for the evaluation of machine learning systems in the clinical practice of pathology. OBJECTIVE.— To propose recommendations for performance evaluation of in vitro diagnostic tests on patient samples that incorporate machine learning as part of the preanalytical, analytical, or postanalytical phases of the laboratory workflow. Topics described include considerations for machine learning model evaluation including risk assessment, predeployment requirements, data sourcing and curation, verification and validation, change control management, human-computer interaction, practitioner training, and competency evaluation. DATA SOURCES.— An expert panel performed a review of the literature, Clinical and Laboratory Standards Institute guidance, and laboratory and government regulatory frameworks. CONCLUSIONS.— Review of the literature and existing documents enabled the development of proposed recommendations. This white paper pertains to performance evaluation of machine learning systems intended to be implemented for clinical patient testing. Further studies with real-world clinical data are encouraged to support these proposed recommendations. Performance evaluation of machine learning models is critical to verification and/or validation of in vitro diagnostic tests using machine learning intended for clinical practice.
Affiliation(s)
- Matthew G Hanna
- Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York (Hanna, Sirintrapun)
- Niels H Olson
- The Defense Innovation Unit, Mountain View, California (Olson)
- The Department of Pathology, Uniformed Services University, Bethesda, Maryland (Olson)
- Mark Zarella
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota (Zarella, Seheult)
- Rajesh C Dash
- Department of Pathology, Duke University Health System, Durham, North Carolina (Dash)
- Markus D Herrmann
- Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston (Herrmann)
- Larissa V Furtado
- Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee (Furtado)
- Michelle N Stram
- The Department of Forensic Medicine, New York University, and Office of Chief Medical Examiner, New York (Stram)
- Lewis Hassell
- Department of Pathology, Oklahoma University Health Sciences Center, Oklahoma City (Hassell)
- Alex Mays
- The MITRE Corporation, McLean, Virginia (Mays)
- Liron Pantanowitz
- Department of Pathology & Clinical Labs, University of Michigan, Ann Arbor (Pantanowitz)
- Joseph S Sirintrapun
- Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York (Hanna, Sirintrapun)
- Anil Parwani
- Department of Pathology, The Ohio State University Wexner Medical Center, Columbus (Parwani, Lujan)
- Giovanni Lujan
- Department of Pathology, The Ohio State University Wexner Medical Center, Columbus (Parwani, Lujan)
- Andrew Evans
- Laboratory Medicine, Mackenzie Health, Toronto, Ontario, Canada (Evans)
- Eric F Glassy
- Affiliated Pathologists Medical Group, Rancho Dominguez, California (Glassy)
- Marilyn M Bui
- Departments of Pathology and Machine Learning, Moffitt Cancer Center, Tampa, Florida (Bui)
- Rajendra Singh
- Department of Dermatopathology, Summit Health, Summit Woodland Park, New Jersey (Singh)
- Rhona J Souers
- Department of Biostatistics, College of American Pathologists, Northfield, Illinois (Souers)
- Jansen N Seheult
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota (Zarella, Seheult)
4
Aizpurua O, Dunn RR, Hansen LH, Gilbert MTP, Alberdi A. Field and laboratory guidelines for reliable bioinformatic and statistical analysis of bacterial shotgun metagenomic data. Crit Rev Biotechnol 2024; 44:1164-1182. [PMID: 37731336 DOI: 10.1080/07388551.2023.2254933] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/24/2023] [Revised: 05/22/2023] [Accepted: 06/27/2023] [Indexed: 09/22/2023]
Abstract
Shotgun metagenomics is an increasingly cost-effective approach for profiling environmental and host-associated microbial communities. However, due to the complexity of both microbiomes and the molecular techniques required to analyze them, the reliability and representativeness of the results are contingent upon the field, laboratory, and bioinformatic procedures employed. Here, we consider 15 field and laboratory issues that critically impact downstream bioinformatic and statistical data processing, as well as result interpretation, in bacterial shotgun metagenomic studies. The issues we consider encompass intrinsic properties of samples, study design, and laboratory-processing strategies. We identify the links of field and laboratory steps with downstream analytical procedures, explain the means for detecting potential pitfalls, and propose mitigation measures to overcome or minimize their impact in metagenomic studies. We anticipate that our guidelines will assist data scientists in appropriately processing and interpreting their data, while aiding field and laboratory researchers to implement strategies for improving the quality of the generated results.
Affiliation(s)
- Ostaizka Aizpurua
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Robert R Dunn
- Department of Applied Ecology, North Carolina State University, Raleigh, NC, USA
- Lars H Hansen
- Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
- M T P Gilbert
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- University Museum, NTNU, Trondheim, Norway
- Antton Alberdi
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
5
Yang J, Wang DF, Huang JH, Zhu QH, Luo LY, Lu R, Xie XL, Salehian-Dehkordi H, Esmailizadeh A, Liu GE, Li MH. Structural variant landscapes reveal convergent signatures of evolution in sheep and goats. Genome Biol 2024; 25:148. [PMID: 38845023 PMCID: PMC11155191 DOI: 10.1186/s13059-024-03288-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Accepted: 05/21/2024] [Indexed: 06/10/2024] [Open Access]
Abstract
BACKGROUND Sheep and goats have undergone domestication and improvement to produce similar phenotypes, which have been greatly impacted by structural variants (SVs). Here, we report a high-quality chromosome-level reference genome of Asiatic mouflon, and implement a comprehensive analysis of SVs in 897 genomes of worldwide wild and domestic populations of sheep and goats to reveal genetic signatures underlying convergent evolution. RESULTS We characterize the SV landscapes in terms of genetic diversity, chromosomal distribution and their links with genes, QTLs and transposable elements, and examine their impacts on regulatory elements. We identify several novel SVs and annotate corresponding genes (e.g., BMPR1B, BMPR2, RALYL, COL21A1, and LRP1B) associated with important production traits such as fertility, meat and milk production, and wool/hair fineness. We detect signatures of selection involving the parallel evolution of orthologous SV-associated genes during domestication, local environmental adaptation, and improvement. In particular, we find that fecundity traits experienced convergent selection targeting the gene BMPR1B, with the DEL00067921 deletion explaining ~10.4% of the phenotypic variation observed in goats. CONCLUSIONS Our results provide new insights into the convergent evolution of SVs and serve as a rich resource for the future improvement of sheep, goats, and related livestock.
Affiliation(s)
- Ji Yang
- State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Dong-Feng Wang
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
- Jia-Hui Huang
- State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Qiang-Hui Zhu
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
- Ling-Yun Luo
- State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Ran Lu
- State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Xing-Long Xie
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
- Hosein Salehian-Dehkordi
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
- Ali Esmailizadeh
- Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, 76169-133, Iran
- George E Liu
- Animal Genomics and Improvement Laboratory, BARC, USDA-ARS, Beltsville, MD, 20705, USA
- Meng-Hua Li
- State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China.
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China.
6
Li R, Ernst J. Identifying associations of de novo noncoding variants with autism through integration of gene expression, sequence and sex information. bioRxiv: the preprint server for biology 2024:2024.03.20.585624. [PMID: 38562739 PMCID: PMC10983996 DOI: 10.1101/2024.03.20.585624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Whole-genome sequencing (WGS) data is facilitating genome-wide identification of rare noncoding variants, but elucidating their roles in disease remains challenging. Towards this end, we first revisit a reported significant brain-related association signal of autism spectrum disorder (ASD) detected from de novo noncoding variants and attributed to deep learning, and show that local GC content can capture similar association signals. We further show that the association signal appears driven by variants from male proband-female sibling pairs that are upstream of assigned genes. We then develop Expression Neighborhood Sequence Association Study (ENSAS), which utilizes gene expression correlations and sequence information, to more systematically identify phenotype-associated variant sets. Applying ENSAS to the same set of de novo variants, we identify gene expression-based neighborhoods showing significant ASD association signal, enriched for synapse-related gene ontology terms. For these top neighborhoods, we also identify chromatin state annotations of variants that are predictive of the proband-sibling local GC content differences. Our work provides new insights into associations of noncoding de novo mutations in ASD and presents an analytical framework applicable to other phenotypes.
Affiliation(s)
- Runjia Li
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA, USA
- Jason Ernst
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA, USA
- Department of Biological Chemistry, University of California, Los Angeles, CA, USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research at University of California, Los Angeles, CA, USA
- Computer Science Department, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Molecular Biology Institute, University of California, Los Angeles, CA, USA
- Department of Computational Medicine, University of California, Los Angeles, CA, USA
7
Sundby RT, Rhodes SD, Komlodi-Pasztor E, Sarnoff H, Grasso V, Upadhyaya M, Kim A, Evans DG, Blakeley JO, Hanemann CO, Bettegowda C. Recommendations for the collection and annotation of biosamples for analysis of biomarkers in neurofibromatosis and schwannomatosis clinical trials. Clin Trials 2024; 21:40-50. [PMID: 37904489 PMCID: PMC10922556 DOI: 10.1177/17407745231203330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2023]
Abstract
INTRODUCTION Neurofibromatosis 1 and schwannomatosis are characterized by potential lifelong morbidity and life-threatening complications. To date, however, diagnostic and predictive biomarkers are an unmet need in this patient population. The inclusion of biomarker discovery correlatives in neurofibromatosis 1/schwannomatosis clinical trials enables study of low-incidence disease. The implementation of a common data model would further enhance biomarker discovery by enabling effective concatenation of data from multiple studies. METHODS The Response Evaluation in Neurofibromatosis and Schwannomatosis biomarker working group reviewed published data on emerging trends in neurofibromatosis 1 and schwannomatosis biomarker research and developed recommendations in a series of consensus meetings. RESULTS Liquid biopsy has emerged as a promising assay for neurofibromatosis 1/schwannomatosis biomarker discovery and validation. In addition, we review recommendations for a range of biomarkers in clinical trials, neurofibromatosis 1/schwannomatosis-specific data annotations, and common data models for data integration. CONCLUSION These Response Evaluation in Neurofibromatosis and Schwannomatosis consensus guidelines are intended to provide best practices for the inclusion of biomarker studies in neurofibromatosis 1/schwannomatosis clinical trials and for data and sample annotation, and to lay a framework for data harmonization and concatenation between trials.
Affiliation(s)
- R Taylor Sundby
- Pediatric Oncology Branch, National Cancer Institute, Bethesda, MD, USA
- Steven D Rhodes
- Division of Hematology/Oncology/Stem Cell Transplant, Department of Pediatrics, Herman B Wells Center for Pediatric Research, School of Medicine, Indiana University, Indianapolis, IN, USA
- Edina Komlodi-Pasztor
- Department of Neurology, MedStar Georgetown University Hospital, Washington, DC, USA
- Herb Sarnoff
- Research and Development, Infixion Bioscience, Inc., San Diego, CA, USA
- Patient Representative, REiNS International Collaboration, San Diego, CA, USA
- Vito Grasso
- Neural Stem Cell Institute, Rensselaer, NY, USA
- Patient Representative, REiNS International Collaboration, Troy, NY, USA
- Meena Upadhyaya
- Division of Cancer and Genetics, Cardiff University, Wales, UK
- AeRang Kim
- Center for Cancer and Blood Disorders, Children’s National Hospital, Washington, DC, USA
- D Gareth Evans
- Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester Academic Health Sciences Centre (MAHSC), ERN GENTURIS, Division of Evolution, Infection and Genomics, The University of Manchester, Manchester, UK
- Jaishri O Blakeley
- Division of Neuro-Oncology, Department of Neurology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Chetan Bettegowda
- Department of Neurosurgery, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
8
Ren L, Duan X, Dong L, Zhang R, Yang J, Gao Y, Peng R, Hou W, Liu Y, Li J, Yu Y, Zhang N, Shang J, Liang F, Wang D, Chen H, Sun L, Hao L, Scherer A, Nordlund J, Xiao W, Xu J, Tong W, Hu X, Jia P, Ye K, Li J, Jin L, Hong H, Wang J, Fan S, Fang X, Zheng Y, Shi L. Quartet DNA reference materials and datasets for comprehensively evaluating germline variant calling performance. Genome Biol 2023; 24:270. [PMID: 38012772 PMCID: PMC10680274 DOI: 10.1186/s13059-023-03109-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/08/2022] [Accepted: 11/13/2023] [Indexed: 11/29/2023] [Open Access]
Abstract
BACKGROUND Genomic DNA reference materials are widely recognized as essential for ensuring data quality in omics research. However, relying solely on reference datasets to evaluate the accuracy of variant calling results is insufficient, as they are limited to benchmark regions. Therefore, it is important to develop DNA reference materials that enable the assessment of variant detection performance across the entire genome. RESULTS We established a DNA reference material suite from four immortalized cell lines derived from a family of parents and monozygotic twins. Comprehensive reference datasets of 4.2 million small variants and 15,000 structural variants were integrated and certified for evaluating the reliability of germline variant calls inside the benchmark regions. Importantly, the genetic built-in-truth of the Quartet family design enables estimation of the precision of variant calls outside the benchmark regions. Using the Quartet reference materials along with study samples, batch effects are objectively monitored and alleviated by training a machine learning model with the Quartet reference datasets to remove potential artifact calls. Moreover, the matched RNA and protein reference materials and datasets from the Quartet project enable cross-omics validation of variant calls from multiomics data. CONCLUSIONS The Quartet DNA reference materials and reference datasets provide a unique resource for objectively assessing the quality of germline variant calls across the whole genome and improving the reliability of large-scale genomic profiling.
Affiliation(s)
- Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Xiaoke Duan
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Rui Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
- Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Greater Bay Area Institute of Precision Medicine, Guangzhou, Guangdong, China
- Yuechen Gao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Rongxue Peng
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
- Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Yaqing Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Jingjing Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Nextomics Biosciences Institute, Wuhan, Hubei, China
- Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Naixin Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Fan Liang
- Nextomics Biosciences Institute, Wuhan, Hubei, China
- Depeng Wang
- Nextomics Biosciences Institute, Wuhan, Hubei, China
- Hui Chen
- OrigiMed Co., Ltd, Shanghai, China
- Lele Sun
- Sequanta Technologies Co., Ltd, Shanghai, China
- Andreas Scherer
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Jessica Nordlund
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Department of Medical Sciences, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Wenming Xiao
- Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Xin Hu
- Shanghai Cancer Center, Fudan University, Shanghai, China
- Peng Jia
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
- Li Jin
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Jing Wang
- National Institute of Metrology, Beijing, China.
- Shaohua Fan
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
- Xiang Fang
- National Institute of Metrology, Beijing, China.
- Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
- Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Shanghai Cancer Center, Fudan University, Shanghai, China
- International Human Phenome Institutes, Shanghai, China
9
Reynoso-García J, Santiago-Rodriguez TM, Narganes-Storde Y, Cano RJ, Toranzos GA. Edible flora in pre-Columbian Caribbean coprolites: Expected and unexpected data. PLoS One 2023; 18:e0292077. [PMID: 37819893 PMCID: PMC10566737 DOI: 10.1371/journal.pone.0292077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2023] [Accepted: 09/12/2023] [Indexed: 10/13/2023] [Open Access]
Abstract
Coprolites, or mummified feces, are valuable sources of information on ancient cultures as they contain ancient DNA (aDNA). In this study, we analyzed ancient plant DNA isolated from coprolites belonging to two pre-Columbian cultures (Huecoid and Saladoid) from Vieques, Puerto Rico, using shotgun metagenomic sequencing to reconstruct diet and lifestyles. We also analyzed DNA sequences of putative phytopathogenic fungi, likely ingested during food consumption, to further support dietary habits. Our findings show that pre-Columbian Caribbean cultures had a diverse diet consisting of maize (Zea mays), sweet potato (Ipomoea batatas), chili peppers (Capsicum annuum), peanuts (Arachis spp.), papaya (Carica papaya), tomato (Solanum lycopersicum) and, very surprisingly, cotton (Gossypium barbadense) and tobacco (Nicotiana sylvestris). Modelling of putative phytopathogenic fungi and plant interactions confirmed the potential consumption of these plants as well as edible fungi, particularly Ustilago spp., which suggests the consumption of maize and huitlacoche. These findings suggest that a variety of dietary, medicinal, and hallucinogenic plants likely played an important role in ancient human subsistence and societal customs. We compared our results with coprolites found in Mexico and the United States, as well as present-day feces from Mexico, Peru, and the United States. The results suggest that the diet of pre-Columbian cultures resembled that of present-day hunter-gatherers, while agriculturalists exhibited a transitional state in dietary lifestyles between the pre-Columbian cultures and larger scale farmers and United States individuals. Our study highlights differences in dietary patterns related to human lifestyles and provides insight into the flora present in the pre-Columbian Caribbean area. Importantly, data from ancient fecal specimens demonstrate the importance of ancient DNA studies to better understand pre-Columbian populations.
Affiliation(s)
- Jelissa Reynoso-García
- Environmental Microbiology Laboratory, Biology Department, University of Puerto Rico, San Juan, Puerto Rico
| | | | | | - Raul J. Cano
- Biological Sciences Department, California Polytechnic State University, San Luis Obispo, California, United States of America
| | - Gary A. Toranzos
- Environmental Microbiology Laboratory, Biology Department, University of Puerto Rico, San Juan, Puerto Rico
| |
|
10
|
Beichman AC, Robinson J, Lin M, Moreno-Estrada A, Nigenda-Morales S, Harris K. Evolution of the Mutation Spectrum Across a Mammalian Phylogeny. Mol Biol Evol 2023; 40:msad213. [PMID: 37770035 PMCID: PMC10566577 DOI: 10.1093/molbev/msad213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 08/21/2023] [Accepted: 09/19/2023] [Indexed: 10/03/2023] Open
Abstract
Although evolutionary biologists have long theorized that variation in DNA repair efficacy might explain some of the diversity of lifespan and cancer incidence across species, we have little data on the variability of normal germline mutagenesis outside of humans. Here, we shed light on the spectrum and etiology of mutagenesis across mammals by quantifying mutational sequence context biases using polymorphism data from thirteen species of mice, apes, bears, wolves, and cetaceans. After normalizing the mutation spectrum for reference genome accessibility and k-mer content, we use the Mantel test to deduce that mutation spectrum divergence is highly correlated with genetic divergence between species, whereas life history traits like reproductive age are weaker predictors of mutation spectrum divergence. Potential bioinformatic confounders are only weakly related to a small set of mutation spectrum features. We find that clock-like mutational signatures previously inferred from human cancers cannot explain the phylogenetic signal exhibited by the mammalian mutation spectrum, despite the ability of these signatures to fit each species' 3-mer spectrum with high cosine similarity. In contrast, parental aging signatures inferred from human de novo mutation data appear to explain much of the 1-mer spectrum's phylogenetic signal in combination with a novel mutational signature. We posit that future models purporting to explain the etiology of mammalian mutagenesis need to capture the fact that more closely related species have more similar mutation spectra; a model that fits each marginal spectrum with high cosine similarity is not guaranteed to capture this hierarchy of mutation spectrum variation among species.
Affiliation(s)
- Annabel C Beichman
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Jacqueline Robinson
- Institute for Human Genetics, University of California, San Francisco, CA, USA
| | - Meixi Lin
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
| | - Andrés Moreno-Estrada
- National Laboratory of Genomics for Biodiversity, Advanced Genomics Unit (UGA-LANGEBIO), CINVESTAV, Irapuato, Mexico
| | - Sergio Nigenda-Morales
- Department of Biological Sciences, California State University, San Marcos, San Marcos, CA, USA
| | - Kelley Harris
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Herbold Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA, USA
| |
|
11
|
Ye H, Zhang X, Wang C, Goode EL, Chen J. Batch-effect correction with sample remeasurement in highly confounded case-control studies. Nat Comput Sci 2023; 3:709-719. [PMID: 38177326 PMCID: PMC10993308 DOI: 10.1038/s43588-023-00500-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/06/2022] [Accepted: 07/11/2023] [Indexed: 01/06/2024]
Abstract
Batch effects are pervasive in biomedical studies. One approach to addressing batch effects is to repeatedly measure a subset of samples in each batch. These remeasured samples are used to estimate and correct the batch effects. However, rigorous statistical methods for batch-effect correction with remeasured samples are severely underdeveloped. Here we developed a framework for batch-effect correction using remeasured samples in highly confounded case-control studies. We provided theoretical analyses of the proposed procedure, evaluated its power characteristics and provided a power calculation tool to aid in the study design. We found that the number of samples that need to be remeasured depends strongly on the between-batch correlation. When the correlation is high, remeasuring a small subset of samples can rescue most of the power.
Affiliation(s)
- Hanxuan Ye
- Department of Statistics, Texas A&M University, College Station, TX, USA
| | - Xianyang Zhang
- Department of Statistics, Texas A&M University, College Station, TX, USA.
| | - Chen Wang
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
| | - Ellen L Goode
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
| | - Jun Chen
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA.
| |
|
12
|
Beichman AC, Robinson J, Lin M, Moreno-Estrada A, Nigenda-Morales S, Harris K. Evolution of the mutation spectrum across a mammalian phylogeny. bioRxiv 2023:2023.05.31.543114. [PMID: 37398383 PMCID: PMC10312511 DOI: 10.1101/2023.05.31.543114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Little is known about how the spectrum and etiology of germline mutagenesis might vary among mammalian species. To shed light on this mystery, we quantify variation in mutational sequence context biases using polymorphism data from thirteen species of mice, apes, bears, wolves, and cetaceans. After normalizing the mutation spectrum for reference genome accessibility and k-mer content, we use the Mantel test to deduce that mutation spectrum divergence is highly correlated with genetic divergence between species, whereas life history traits like reproductive age are weaker predictors of mutation spectrum divergence. Potential bioinformatic confounders are only weakly related to a small set of mutation spectrum features. We find that clocklike mutational signatures previously inferred from human cancers cannot explain the phylogenetic signal exhibited by the mammalian mutation spectrum, despite the ability of these clocklike signatures to fit each species' 3-mer spectrum with high cosine similarity. In contrast, parental aging signatures inferred from human de novo mutation data appear to explain much of the mutation spectrum's phylogenetic signal when fit to non-context-dependent mutation spectrum data in combination with a novel mutational signature. We posit that future models purporting to explain the etiology of mammalian mutagenesis need to capture the fact that more closely related species have more similar mutation spectra; a model that fits each marginal spectrum with high cosine similarity is not guaranteed to capture this hierarchy of mutation spectrum variation among species.
Affiliation(s)
| | - Jacqueline Robinson
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA
| | - Meixi Lin
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA
| | - Andrés Moreno-Estrada
- National Laboratory of Genomics for Biodiversity, Advanced Genomics Unit (UGA-LANGEBIO), CINVESTAV, Irapuato, Mexico
| | - Sergio Nigenda-Morales
- Department of Biological Sciences, California State University, San Marcos, San Marcos CA
| | - Kelley Harris
- Department of Genome Sciences, University of Washington, Seattle WA
| |
|
13
|
Grover CE, Arick MA, Thrash A, Sharbrough J, Hu G, Yuan D, Snodgrass S, Miller ER, Ramaraj T, Peterson DG, Udall JA, Wendel JF. Dual Domestication, Diversity, and Differential Introgression in Old World Cotton Diploids. Genome Biol Evol 2022; 14:evac170. [PMID: 36510772 PMCID: PMC9792962 DOI: 10.1093/gbe/evac170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 11/19/2022] [Accepted: 12/01/2022] [Indexed: 12/15/2022] Open
Abstract
Domestication in the cotton genus is remarkable in that it has occurred independently four different times at two different ploidy levels. Relatively little is known about genome evolution and domestication in the cultivated diploid species Gossypium herbaceum and Gossypium arboreum, due to the absence of wild representatives for the latter species, their ancient domestication, and their joint history of human-mediated dispersal and interspecific gene flow. Using in-depth resequencing of a broad sampling from both species, we provide support for their independent domestication, as opposed to a progenitor-derivative relationship, showing that diversity (mean π = 6 × 10⁻³) within species is similar, and that divergence between species is modest (FST = 0.413). Individual accessions were homozygous for ancestral single-nucleotide polymorphisms at over half of variable sites, while fixed, derived sites were at modest frequencies. Notably, two chromosomes with a paucity of fixed, derived sites (i.e., chromosomes 7 and 10) were also strongly implicated as having experienced high levels of introgression. Collectively, these data demonstrate variable permeability to introgression among chromosomes, which we propose is due to divergent selection under domestication and/or the phenomenon of F2 breakdown in interspecific crosses. Our analyses provide insight into the evolutionary forces that shape diversity and divergence in the diploid cultivated species and establish a foundation for understanding the contribution of introgression and/or strong parallel selection to the extensive morphological similarities shared between species.
Affiliation(s)
- Corrinne E Grover
- Ecology, Evolution, and Organismal Biology Department, Iowa State University, Ames, Iowa 5001, USA
| | - Mark A Arick
- Biocomputing & Biotechnology, Institute for Genomics, Mississippi State University, Mississippi, USA
| | - Adam Thrash
- Biocomputing & Biotechnology, Institute for Genomics, Mississippi State University, Mississippi, USA
| | - Joel Sharbrough
- Biology Department, New Mexico Institute of Mining and Technology, Socorro, New Mexico 87801, USA
| | - Guanjing Hu
- State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| | - Daojun Yuan
- College of Plant Science and Technology, Huazhong Agricultural University, Wuhan Hubei 430070, China
| | - Samantha Snodgrass
- Ecology, Evolution, and Organismal Biology Department, Iowa State University, Ames, Iowa 5001, USA
| | - Emma R Miller
- Ecology, Evolution, and Organismal Biology Department, Iowa State University, Ames, Iowa 5001, USA
| | - Thiruvarangan Ramaraj
- School of Computing, College of Computing and Digital Media, DePaul University, Chicago, Illinois 6060, USA
| | - Daniel G Peterson
- Biocomputing & Biotechnology, Institute for Genomics, Mississippi State University, Mississippi, USA
| | - Joshua A Udall
- Crop Germplasm Research Unit, USDA/Agricultural Research Service, 2881 F&B Road, College Station, Texas 77845, USA
| | - Jonathan F Wendel
- Ecology, Evolution, and Organismal Biology Department, Iowa State University, Ames, Iowa 5001, USA
| |
|
14
|
The Tibetan-Yi region is both a corridor and a barrier for human gene flow. Cell Rep 2022; 39:110720. [PMID: 35476999 DOI: 10.1016/j.celrep.2022.110720] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/01/2021] [Revised: 11/08/2021] [Accepted: 03/31/2022] [Indexed: 11/22/2022] Open
Abstract
The Tibetan-Yi Corridor (TYC) region between Tibet and the rest of east Asia has served as a crossroads for human migrations for thousands of years. The lack of whole-genome sequencing data specific to the TYC populations has hindered the understanding of the fundamental patterns of migration and divergence between humans in east Asia and southeast Asia. Here, we provide 248 individual whole genomes from 16 TYC and 3 outgroup populations to elucidate historical relationships. We find that the Tibetan plateau forms an important barrier to gene flow, with a more Tibetan-like ancestry in northern populations and a southern east Asian-related ancestry in southern populations. An isolated population, Achang, shows prolonged isolation and genetic drift compared to other TYC populations. We also note that previous claims regarding the history and structure of TYC populations inferred by linguistics are incompatible with the genetic evidence.
|
15
|
Cinelli C, LaPierre N, Hill BL, Sankararaman S, Eskin E. Robust Mendelian randomization in the presence of residual population stratification, batch effects and horizontal pleiotropy. Nat Commun 2022; 13:1093. [PMID: 35232963 DOI: 10.1101/2020.10.21.347773] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/29/2021] [Accepted: 01/14/2022] [Indexed: 05/25/2023] Open
Abstract
Mendelian Randomization (MR) studies are threatened by population stratification, batch effects, and horizontal pleiotropy. Although a variety of methods have been proposed to mitigate those problems, residual biases may still remain, leading to highly statistically significant false positives in large databases. Here we describe a suite of sensitivity analysis tools that enables investigators to quantify the robustness of their findings against such validity threats. Specifically, we propose the routine reporting of sensitivity statistics that reveal the minimal strength of violations necessary to explain away the MR results. We further provide intuitive displays of the robustness of the MR estimate to any degree of violation, and formal bounds on the worst-case bias caused by violations multiple times stronger than observed variables. We demonstrate how these tools can aid researchers in distinguishing robust from fragile findings by examining the effect of body mass index on diastolic blood pressure and Townsend deprivation index.
Affiliation(s)
- Carlos Cinelli
- Department of Statistics, University of Washington, Seattle, WA, USA.
| | - Nathan LaPierre
- Department of Computer Science, University of California, Los Angeles, CA, USA
| | - Brian L Hill
- Department of Computer Science, University of California, Los Angeles, CA, USA
| | - Sriram Sankararaman
- Department of Computer Science, University of California, Los Angeles, CA, USA
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Department of Computational Medicine, University of California, Los Angeles, CA, USA
| | - Eleazar Eskin
- Department of Computer Science, University of California, Los Angeles, CA, USA
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Department of Computational Medicine, University of California, Los Angeles, CA, USA
| |
|
16
|
Robust Mendelian randomization in the presence of residual population stratification, batch effects and horizontal pleiotropy. Nat Commun 2022; 13:1093. [PMID: 35232963 PMCID: PMC8888767 DOI: 10.1038/s41467-022-28553-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/29/2021] [Accepted: 01/14/2022] [Indexed: 01/07/2023] Open
Abstract
Mendelian Randomization (MR) studies are threatened by population stratification, batch effects, and horizontal pleiotropy. Although a variety of methods have been proposed to mitigate those problems, residual biases may still remain, leading to highly statistically significant false positives in large databases. Here we describe a suite of sensitivity analysis tools that enables investigators to quantify the robustness of their findings against such validity threats. Specifically, we propose the routine reporting of sensitivity statistics that reveal the minimal strength of violations necessary to explain away the MR results. We further provide intuitive displays of the robustness of the MR estimate to any degree of violation, and formal bounds on the worst-case bias caused by violations multiple times stronger than observed variables. We demonstrate how these tools can aid researchers in distinguishing robust from fragile findings by examining the effect of body mass index on diastolic blood pressure and Townsend deprivation index.
|
17
|
Kong M, Ma T, Xiang B. ANKRD1 and SPP1 as diagnostic markers and correlated with immune infiltration in biliary atresia. Medicine (Baltimore) 2021; 100:e28197. [PMID: 34918678 PMCID: PMC8678012 DOI: 10.1097/md.0000000000028197] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/18/2021] [Accepted: 11/19/2021] [Indexed: 02/05/2023] Open
Abstract
The diagnosis of biliary atresia (BA) remains a clinical challenge; reliable biomarkers that can easily distinguish BA from other forms of intrahepatic cholestasis (IC) are urgently needed. Differentially expressed genes were identified by R software. The least absolute shrinkage and selection operator regression and support vector machine algorithms were used to filter the diagnostic biomarkers of BA. The candidate biomarkers were further validated in another independent cohort of patients with BA and IC. Then CIBERSORT was used for estimating the fractions of immune cell types in BA. Gene set enrichment analyses were conducted and the correlation between diagnostic genes and immune cells was analyzed. A total of 419 differentially expressed genes in BA were detected and 2 genes (secreted phosphoprotein 1 [SPP1] and ankyrin repeat domain 1 [ANKRD1]) among them were selected as diagnostic biomarkers. SPP1 yielded an area under the curve (AUC) value of 0.798 (95% confidence interval [CI]: 0.742-0.854) in distinguishing patients with BA from those with IC, and ANKRD1 exhibited an AUC value of 0.686 (95% CI: 0.616-0.754) in discriminating BA patients from those with IC. Further integrating them into one variable resulted in a higher AUC of 0.830 (95% CI: 0.777-0.879). Regulatory T cells, M2 macrophages, CD4 memory T cells, and dendritic cells may be involved in the BA process. ANKRD1 and SPP1 were negatively correlated with regulatory T cells. In conclusion, ANKRD1 and SPP1 could potentially provide extra guidance in discriminating BA from IC. The immune cell infiltration of BA gives us new insight to explore its pathogenesis.
Affiliation(s)
- Meng Kong
- Department of Pediatric Surgery, Qilu Children's Hospital of Shandong University, Jinan, China
| | - Teng Ma
- Department of Internal Medicine, The Fifth People's Hospital of Jinan, Jinan, China
| | - Bo Xiang
- Department of Pediatric Surgery, West China Hospital of Sichuan University, Chengdu, China
| |
|
18
|
Qiao L, Xu L, Yu L, Wynn J, Hernan R, Zhou X, Farkouh-Karoleski C, Krishnan US, Khlevner J, De A, Zygmunt A, Crombleholme T, Lim FY, Needelman H, Cusick RA, Mychaliska GB, Warner BW, Wagner AJ, Danko ME, Chung D, Potoka D, Kosiński P, McCulley DJ, Elfiky M, Azarow K, Fialkowski E, Schindel D, Soffer SZ, Lyon JB, Zalieckas JM, Vardarajan BN, Aspelund G, Duron VP, High FA, Sun X, Donahoe PK, Shen Y, Chung WK. Rare and de novo variants in 827 congenital diaphragmatic hernia probands implicate LONP1 as candidate risk gene. Am J Hum Genet 2021; 108:1964-1980. [PMID: 34547244 PMCID: PMC8546037 DOI: 10.1016/j.ajhg.2021.08.011] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 05/14/2021] [Accepted: 08/25/2021] [Indexed: 12/21/2022] Open
Abstract
Congenital diaphragmatic hernia (CDH) is a severe congenital anomaly that is often accompanied by other anomalies. Although the role of genetics in the pathogenesis of CDH has been established, only a small number of disease-associated genes have been identified. To further investigate the genetics of CDH, we analyzed de novo coding variants in 827 proband-parent trios and confirmed an overall significant enrichment of damaging de novo variants, especially in constrained genes. We identified LONP1 (lon peptidase 1, mitochondrial) and ALYREF (Aly/REF export factor) as candidate CDH-associated genes on the basis of de novo variants at a false discovery rate below 0.05. We also performed ultra-rare variant association analyses in 748 affected individuals and 11,220 ancestry-matched population control individuals and identified LONP1 as a risk gene contributing to CDH through both de novo and ultra-rare inherited largely heterozygous variants clustered in the core of the domains and segregating with CDH in affected familial individuals. Approximately 3% of our CDH cohort who are heterozygous with ultra-rare predicted damaging variants in LONP1 have a range of clinical phenotypes, including other anomalies in some individuals and higher mortality and requirement for extracorporeal membrane oxygenation. Mice with lung epithelium-specific deletion of Lonp1 die immediately after birth, most likely because of the observed severe reduction of lung growth, a known contributor to the high mortality in humans. Our findings of both de novo and inherited rare variants in the same gene may have implications in the design and analysis for other genetic studies of congenital anomalies.
Affiliation(s)
- Lu Qiao
- Department of Pediatrics, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - Le Xu
- Department of Pediatrics, University of California, San Diego Medical School, San Diego, CA 92093, USA
| | - Lan Yu
- Department of Pediatrics, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - Julia Wynn
- Department of Pediatrics, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - Rebecca Hernan
- Department of Pediatrics, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - Xueya Zhou
- Department of Pediatrics, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
| | | | - Usha S Krishnan
- Department of Pediatrics, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - Julie Khlevner
- Department of Pediatrics, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - Aliva De
- Department of Pediatrics, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - Annette Zygmunt
- Department of Pediatrics, Columbia University Irving Medical Center, New York, NY 10032, USA
| | | | - Foong-Yen Lim
- Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
| | - Howard Needelman
- University of Nebraska Medical Center College of Medicine, Omaha, NE 68114, USA
| | - Robert A Cusick
- University of Nebraska Medical Center College of Medicine, Omaha, NE 68114, USA
| | | | - Brad W Warner
- Washington University School of Medicine, St. Louis, MO 63110, USA
| | - Amy J Wagner
- Children's Hospital of Wisconsin, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Melissa E Danko
- Monroe Carell Jr. Children's Hospital at Vanderbilt, Nashville, TN 37232, USA
| | - Dai Chung
- Monroe Carell Jr. Children's Hospital at Vanderbilt, Nashville, TN 37232, USA
| | | | | | - David J McCulley
- Department of Pediatrics, University of Wisconsin-Madison, Madison, WI 52726, USA
| | | | - Kenneth Azarow
- Oregon Health & Science University, Portland, OR 97239, USA
| | | | | | | | - Jane B Lyon
- Department of Radiology, University of Wisconsin-Madison, Madison, WI 53792, USA
| | - Jill M Zalieckas
- Department of Surgery, Boston Children's Hospital, Boston, MA 02115, USA
| | - Badri N Vardarajan
- Department of Neurology, Taub Institute for Research on Alzheimer Disease and the Aging Brain and the Gertrude H. Sergievsky Center, Columbia University, New York, NY 10032, USA
| | - Gudrun Aspelund
- Department of Pediatrics, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - Vincent P Duron
- Department of Pediatrics, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - Frances A High
- Department of Surgery, Boston Children's Hospital, Boston, MA 02115, USA; Pediatric Surgical Research Laboratories, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Pediatrics, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Xin Sun
- Department of Pediatrics, University of California, San Diego Medical School, San Diego, CA 92093, USA
| | - Patricia K Donahoe
- Pediatric Surgical Research Laboratories, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Surgery, Harvard Medical School, Boston, MA 02115, USA
| | - Yufeng Shen
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY 10032, USA; JP Sulzberger Columbia Genome Center, Columbia University Irving Medical Center, New York, NY 10032, USA.
| | - Wendy K Chung
- Department of Pediatrics, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Medicine, Columbia University Irving Medical Center, New York, NY 10032, USA.
| |
|
19
|
Howard FM, Dolezal J, Kochanny S, Schulte J, Chen H, Heij L, Huo D, Nanda R, Olopade OI, Kather JN, Cipriani N, Grossman RL, Pearson AT. The impact of site-specific digital histology signatures on deep learning model accuracy and bias. Nat Commun 2021; 12:4423. [PMID: 34285218 PMCID: PMC8292530 DOI: 10.1038/s41467-021-24698-1] [Citation(s) in RCA: 111] [Impact Index Per Article: 27.8] [Received: 12/09/2020] [Accepted: 07/01/2021] [Indexed: 12/20/2022] Open
Abstract
The Cancer Genome Atlas (TCGA) is one of the largest biorepositories of digital histology. Deep learning (DL) models have been trained on TCGA to predict numerous features directly from histology, including survival, gene expression patterns, and driver mutations. However, we demonstrate that these features vary substantially across tissue submitting sites in TCGA for over 3,000 patients with six cancer subtypes. Additionally, we show that histologic image differences between submitting sites can easily be identified with DL. Site detection remains possible despite commonly used color normalization and augmentation methods, and we quantify the image characteristics constituting this site-specific digital histology signature. We demonstrate that these site-specific signatures lead to biased accuracy for prediction of features including survival, genomic mutations, and tumor stage. Furthermore, ethnicity can also be inferred from site-specific signatures, which must be accounted for to ensure equitable application of DL. These site-specific signatures can lead to overoptimistic estimates of model performance, and we propose a quadratic programming method that abrogates this bias by ensuring models are not trained and validated on samples from the same site.
Affiliation(s)
- Frederick M Howard
- Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL, USA
| | - James Dolezal
- Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL, USA
| | - Sara Kochanny
- Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL, USA
| | - Jefree Schulte
- Department of Pathology, University of Chicago, Chicago, IL, USA
| | - Heather Chen
- Department of Pathology, University of Chicago, Chicago, IL, USA
| | - Lara Heij
- Department of Surgery and Transplantation, University Hospital RWTH Aachen, Aachen, Germany
- Institute of Pathology, University Hospital RWTH Aachen, Aachen, Germany
| | - Dezheng Huo
- Department of Public Health Sciences, University of Chicago, Chicago, IL, USA
- University of Chicago Comprehensive Cancer Center, Chicago, IL, USA
| | - Rita Nanda
- Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL, USA
- University of Chicago Comprehensive Cancer Center, Chicago, IL, USA
| | - Olufunmilayo I Olopade
- Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL, USA
- University of Chicago Comprehensive Cancer Center, Chicago, IL, USA
| | - Jakob N Kather
- Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany
- Pathology and Data Analytics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, UK
- Medical Oncology, National Center for Tumor Diseases, University Hospital Heidelberg, Heidelberg, Germany
| | - Nicole Cipriani
- Department of Pathology, University of Chicago, Chicago, IL, USA
- University of Chicago Comprehensive Cancer Center, Chicago, IL, USA
| | - Robert L Grossman
- Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL, USA.
- University of Chicago Comprehensive Cancer Center, Chicago, IL, USA.
| | - Alexander T Pearson
- Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL, USA.
- University of Chicago Comprehensive Cancer Center, Chicago, IL, USA.
| |
|
20
|
Zhu N, Swietlik EM, Welch CL, Pauciulo MW, Hagen JJ, Zhou X, Guo Y, Karten J, Pandya D, Tilly T, Lutz KA, Martin JM, Treacy CM, Rosenzweig EB, Krishnan U, Coleman AW, Gonzaga-Jauregui C, Lawrie A, Trembath RC, Wilkins MR, Morrell NW, Shen Y, Gräf S, Nichols WC, Chung WK. Rare variant analysis of 4241 pulmonary arterial hypertension cases from an international consortium implicates FBLN2, PDGFD, and rare de novo variants in PAH. Genome Med 2021; 13:80. [PMID: 33971972 PMCID: PMC8112021 DOI: 10.1186/s13073-021-00891-1] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 06/18/2020] [Accepted: 04/19/2021] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Pulmonary arterial hypertension (PAH) is a lethal vasculopathy characterized by pathogenic remodeling of pulmonary arterioles leading to increased pulmonary pressures, right ventricular hypertrophy, and heart failure. PAH can be associated with other diseases (APAH: connective tissue diseases, congenital heart disease, and others) but often the etiology is idiopathic (IPAH). Mutations in bone morphogenetic protein receptor 2 (BMPR2) are the cause of most heritable cases but the vast majority of other cases are genetically undefined. METHODS To identify new risk genes, we utilized an international consortium of 4241 PAH cases with exome or genome sequencing data from the National Biological Sample and Data Repository for PAH, Columbia University Irving Medical Center, and the UK NIHR BioResource - Rare Diseases Study. The strength of this combined cohort is a doubling of the number of IPAH cases compared to either national cohort alone. We identified protein-coding variants and performed rare variant association analyses in unrelated participants of European ancestry, including 1647 IPAH cases and 18,819 controls. We also analyzed de novo variants in 124 pediatric trios enriched for IPAH and APAH-CHD. RESULTS Seven genes with rare deleterious variants were associated with IPAH with false discovery rate smaller than 0.1: three known genes (BMPR2, GDF2, and TBX4), two recently identified candidate genes (SOX17, KDR), and two new candidate genes (fibulin 2, FBLN2; platelet-derived growth factor D, PDGFD). The new genes were identified based solely on rare deleterious missense variants, a variant type that could not be adequately assessed in either cohort alone. The candidate genes exhibit expression patterns in lung and heart similar to that of known PAH risk genes, and most variants occur in conserved protein domains. 
For pediatric PAH, predicted deleterious de novo variants exhibited a significant burden compared to the background mutation rate (2.45×, p = 2.5e-5). At least eight novel pediatric candidate genes carrying de novo variants have plausible roles in lung/heart development. CONCLUSIONS Rare variant analysis of a large international consortium identified two new candidate genes, FBLN2 and PDGFD. The new genes have known functions in vasculogenesis and remodeling. Trio analysis predicted that ~15% of pediatric IPAH may be explained by de novo variants.
Affiliation(s)
- Na Zhu
- Department of Pediatrics, Columbia University Irving Medical Center, 1150 St. Nicholas Avenue, Room 620, New York, NY, 10032, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
- Emilia M Swietlik
- Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- Carrie L Welch
- Department of Pediatrics, Columbia University Irving Medical Center, 1150 St. Nicholas Avenue, Room 620, New York, NY, 10032, USA
- Michael W Pauciulo
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Jacob J Hagen
- Department of Pediatrics, Columbia University Irving Medical Center, 1150 St. Nicholas Avenue, Room 620, New York, NY, 10032, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
- Xueya Zhou
- Department of Pediatrics, Columbia University Irving Medical Center, 1150 St. Nicholas Avenue, Room 620, New York, NY, 10032, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
- Yicheng Guo
- Department of Systems Biology, Columbia University, New York, NY, USA
- Divya Pandya
- Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- Tobias Tilly
- Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- Katie A Lutz
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Jennifer M Martin
- Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- NIHR BioResource for Translational Research, Cambridge Biomedical Campus, Cambridge, UK
- Carmen M Treacy
- Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- Erika B Rosenzweig
- Department of Pediatrics, Columbia University Irving Medical Center, 1150 St. Nicholas Avenue, Room 620, New York, NY, 10032, USA
- Usha Krishnan
- Department of Pediatrics, Columbia University Irving Medical Center, 1150 St. Nicholas Avenue, Room 620, New York, NY, 10032, USA
- Anna W Coleman
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Allan Lawrie
- Department of Infection, Immunity and Cardiovascular Disease, University of Sheffield, Sheffield, UK
- Richard C Trembath
- Department of Medical and Molecular Genetics, King's College London, London, UK
- Martin R Wilkins
- National Heart & Lung Institute, Imperial College London, London, UK
- Nicholas W Morrell
- Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- NIHR BioResource for Translational Research, Cambridge Biomedical Campus, Cambridge, UK
- Addenbrooke's Hospital NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, UK
- Royal Papworth Hospital NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, UK
- Yufeng Shen
- Department of Systems Biology, Columbia University, New York, NY, USA
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Stefan Gräf
- Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- NIHR BioResource for Translational Research, Cambridge Biomedical Campus, Cambridge, UK
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- William C Nichols
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Wendy K Chung
- Department of Pediatrics, Columbia University Irving Medical Center, 1150 St. Nicholas Avenue, Room 620, New York, NY, 10032, USA
- Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, USA
- Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
21
Pires JG, da Silva GF, Weyssow T, Conforte AJ, Pagnoncelli D, da Silva FAB, Carels N. Galaxy and MEAN Stack to Create a User-Friendly Workflow for the Rational Optimization of Cancer Chemotherapy. Front Genet 2021; 12:624259. [PMID: 33679888 PMCID: PMC7935533 DOI: 10.3389/fgene.2021.624259] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/31/2020] [Accepted: 01/22/2021] [Indexed: 12/24/2022] Open
Abstract
One aspect of personalized medicine is aiming at identifying specific targets for therapy considering the gene expression profile of each patient individually. The real-world implementation of this approach is better achieved by user-friendly bioinformatics systems for healthcare professionals. In this report, we present an online platform that endows users with an interface designed using MEAN stack supported by a Galaxy pipeline. This pipeline targets connection hubs in the subnetworks formed by the interactions between the proteins of genes that are up-regulated in tumors. This strategy has been proved to be suitable for the inhibition of tumor growth and metastasis in vitro. Therefore, Perl and Python scripts were enclosed in Galaxy for translating RNA-seq data into protein targets suitable for the chemotherapy of solid tumors. Consequently, we validated the process of target diagnosis by (i) reference to subnetwork entropy, (ii) the critical value of density probability of differential gene expression, and (iii) the inhibition of the most relevant targets according to TCGA and GDC data. Finally, the most relevant targets identified by the pipeline are stored in MongoDB and can be accessed through the aforementioned internet portal designed to be compatible with mobile or small devices through Angular libraries.
Affiliation(s)
- Jorge Guerra Pires
- Plataforma de Modelagem de Sistemas Biológicos, Center for Technology Development in Health (CDTS), Oswaldo Cruz Foundation (FIOCRUZ), Rio de Janeiro, Brazil
- Gilberto Ferreira da Silva
- Plataforma de Modelagem de Sistemas Biológicos, Center for Technology Development in Health (CDTS), Oswaldo Cruz Foundation (FIOCRUZ), Rio de Janeiro, Brazil
- Thomas Weyssow
- Informatic Department, Free University of Brussels (ULB), Brussels, Belgium
- Alessandra Jordano Conforte
- Plataforma de Modelagem de Sistemas Biológicos, Center for Technology Development in Health (CDTS), Oswaldo Cruz Foundation (FIOCRUZ), Rio de Janeiro, Brazil
- Laboratório de Modelagem Computacional de Sistemas Biológicos, Scientific Computing Program, FIOCRUZ, Rio de Janeiro, Brazil
- Fabricio Alves Barbosa da Silva
- Laboratório de Modelagem Computacional de Sistemas Biológicos, Scientific Computing Program, FIOCRUZ, Rio de Janeiro, Brazil
- Nicolas Carels
- Plataforma de Modelagem de Sistemas Biológicos, Center for Technology Development in Health (CDTS), Oswaldo Cruz Foundation (FIOCRUZ), Rio de Janeiro, Brazil
22
Gargiulo R, Kull T, Fay MF. Effective double-digest RAD sequencing and genotyping despite large genome size. Mol Ecol Resour 2021; 21:1037-1055. [PMID: 33351289 DOI: 10.1111/1755-0998.13314] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/15/2020] [Revised: 12/03/2020] [Accepted: 12/14/2020] [Indexed: 11/28/2022]
Abstract
Obtaining informative data is the ambition of any genomic project, but in nonmodel species with very large genomes, pursuing such a goal requires surmounting a series of analytical challenges. Double-digest RAD sequencing is routinely used in nonmodel organisms and offers some control over the volume of data obtained. However, the volume of data recovered is not always an indication of the reliability of data sets, and quality checks are necessary to ensure that true and artefactual information is set apart. In the present study, we aim to fill the gap existing between the known applicability of RAD sequencing methods in plants with large genomes and the use of the retrieved loci for population genetic inference. By analysing two populations of Cypripedium calceolus, a nonmodel orchid species with a large genome size (1C ~ 31.6 Gbp), we provide a complete workflow from library preparation to bioinformatic filtering and inference of genetic diversity and differentiation. We show how filtering strategies to dismiss potentially misleading data need to be explored and adapted to data set-specific features. Moreover, we suggest that the occurrence of organellar sequences in libraries should not be neglected when planning the experiment and analysing the results. Finally, we explain how, in the absence of prior information about the genome of the species, seeking high standards of quality during library preparation and sequencing can provide an insurance against unpredicted technical or biological constraints.
Affiliation(s)
- Tiiu Kull
- Estonian University of Life Sciences, Tartu, Estonia
- Michael F Fay
- Royal Botanic Gardens, Kew, Richmond, Surrey, UK
- School of Biological Sciences, University of Western Australia, Crawley, WA, Australia
23
Corbett RD, Eveleigh R, Whitney J, Barai N, Bourgey M, Chuah E, Johnson J, Moore RA, Moradin N, Mungall KL, Pereira S, Reuter MS, Thiruvahindrapuram B, Wintle RF, Ragoussis J, Strug LJ, Herbrick JA, Aziz N, Jones SJM, Lathrop M, Scherer SW, Staffa A, Mungall AJ. A Distributed Whole Genome Sequencing Benchmark Study. Front Genet 2020; 11:612515. [PMID: 33335541 PMCID: PMC7736078 DOI: 10.3389/fgene.2020.612515] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/30/2020] [Accepted: 11/10/2020] [Indexed: 12/30/2022] Open
Abstract
Population sequencing often requires collaboration across a distributed network of sequencing centers for the timely processing of thousands of samples. In such massive efforts, it is important that participating scientists can be confident that the accuracy of the sequence data produced is not affected by which center generates the data. A study was conducted across three established sequencing centers, located in Montreal, Toronto, and Vancouver, constituting Canada's Genomics Enterprise (www.cgen.ca). Whole genome sequencing was performed at each center, on three genomic DNA replicates from three well-characterized cell lines. Secondary analysis pipelines employed by each site were applied to sequence data from each of the sites, resulting in three datasets for each of four variables (cell line, replicate, sequencing center, and analysis pipeline), for a total of 81 datasets. These datasets were each assessed according to multiple quality metrics including concordance with benchmark variant truth sets to assess consistent quality across all three conditions for each variable. Three-way concordance analysis of variants across conditions for each variable was performed. Our results showed that the variant concordance between datasets differing only by sequencing center was similar to the concordance for datasets differing only by replicate, using the same analysis pipeline. We also showed that the statistically significant differences between datasets result from the analysis pipeline used, which can be unified and updated as new approaches become available. We conclude that genome sequencing projects can rely on the quality and reproducibility of aggregate data generated across a network of distributed sites.
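The count of 81 datasets in this abstract follows from crossing three conditions on each of four variables (3 × 3 × 3 × 3). The short sketch below only illustrates that arithmetic and the grouping used for a three-way comparison; the cell-line and replicate labels are hypothetical placeholders, and the pipeline labels assume one pipeline per site, as the abstract implies.

```python
from collections import defaultdict
from itertools import product

# Hypothetical labels: the abstract names only the three cities.
cell_lines = ["cell_line_1", "cell_line_2", "cell_line_3"]
replicates = ["replicate_1", "replicate_2", "replicate_3"]
centers = ["Montreal", "Toronto", "Vancouver"]
pipelines = ["Montreal_pipeline", "Toronto_pipeline", "Vancouver_pipeline"]

# Every combination of the four variables is one dataset.
datasets = list(product(cell_lines, replicates, centers, pipelines))


def groups_varying(items, index):
    """Group datasets that match on every variable except the one at `index`."""
    groups = defaultdict(list)
    for item in items:
        key = item[:index] + item[index + 1:]
        groups[key].append(item)
    return list(groups.values())


# Triplets that differ only by sequencing center (tuple position 2).
center_groups = groups_varying(datasets, 2)

print(len(datasets))       # total number of datasets
print(len(center_groups))  # number of three-way comparisons for one variable
```

Each group holds the three datasets that would enter one three-way concordance comparison for the chosen variable.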
Affiliation(s)
- Richard D. Corbett
- Canada’s Michael Smith Genome Sciences Centre, BC Cancer Research Institute, Provincial Health Services Authority, Vancouver, BC, Canada
- Robert Eveleigh
- McGill Genome Centre, McGill University, Montreal, QC, Canada
- Joe Whitney
- The Centre for Applied Genomics, The Hospital for Sick Children and University of Toronto, Toronto, ON, Canada
- Namrata Barai
- The Centre for Applied Genomics, The Hospital for Sick Children and University of Toronto, Toronto, ON, Canada
- Mathieu Bourgey
- McGill Genome Centre, McGill University, Montreal, QC, Canada
- Eric Chuah
- Canada’s Michael Smith Genome Sciences Centre, BC Cancer Research Institute, Provincial Health Services Authority, Vancouver, BC, Canada
- Joanne Johnson
- Canada’s Michael Smith Genome Sciences Centre, BC Cancer Research Institute, Provincial Health Services Authority, Vancouver, BC, Canada
- Richard A. Moore
- Canada’s Michael Smith Genome Sciences Centre, BC Cancer Research Institute, Provincial Health Services Authority, Vancouver, BC, Canada
- Neda Moradin
- The Centre for Applied Genomics, The Hospital for Sick Children and University of Toronto, Toronto, ON, Canada
- Karen L. Mungall
- Canada’s Michael Smith Genome Sciences Centre, BC Cancer Research Institute, Provincial Health Services Authority, Vancouver, BC, Canada
- Sergio Pereira
- The Centre for Applied Genomics, The Hospital for Sick Children and University of Toronto, Toronto, ON, Canada
- Miriam S. Reuter
- Canada’s Genomics Enterprise (CGEn), The Hospital for Sick Children, Toronto, ON, Canada
- Bhooma Thiruvahindrapuram
- The Centre for Applied Genomics, The Hospital for Sick Children and University of Toronto, Toronto, ON, Canada
- Richard F. Wintle
- The Centre for Applied Genomics, The Hospital for Sick Children and University of Toronto, Toronto, ON, Canada
- Lisa J. Strug
- The Centre for Applied Genomics, The Hospital for Sick Children and University of Toronto, Toronto, ON, Canada
- Jo-Anne Herbrick
- The Centre for Applied Genomics, The Hospital for Sick Children and University of Toronto, Toronto, ON, Canada
- Naveed Aziz
- Canada’s Genomics Enterprise (CGEn), The Hospital for Sick Children, Toronto, ON, Canada
- Steven J. M. Jones
- Canada’s Michael Smith Genome Sciences Centre, BC Cancer Research Institute, Provincial Health Services Authority, Vancouver, BC, Canada
- Mark Lathrop
- McGill Genome Centre, McGill University, Montreal, QC, Canada
- Stephen W. Scherer
- The Centre for Applied Genomics, The Hospital for Sick Children and University of Toronto, Toronto, ON, Canada
- Alfredo Staffa
- McGill Genome Centre, McGill University, Montreal, QC, Canada
- Andrew J. Mungall
- Canada’s Michael Smith Genome Sciences Centre, BC Cancer Research Institute, Provincial Health Services Authority, Vancouver, BC, Canada
24
Mukerjee S, Gonzalez-Reymundez A, Lunt SY, Vazquez AI. DNA Methylation and Gene Expression with Clinical Covariates Explain Variation in Aggressiveness and Survival of Pancreatic Cancer Patients. Cancer Invest 2020; 38:502-506. [PMID: 32935594 DOI: 10.1080/07357907.2020.1812079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 02/07/2023]
Abstract
Pancreatic cancer (PC) is associated with a high mortality rate. We explored the interindividual variation of cancer outcomes, attributable to DNA methylation, gene expression, and clinical factors among PC patients. We aim to determine whether we could differentiate subjects with greater nodal involvement, higher cancer staging, and subsequent survival. We modeled every response variable as a function of a linear predictor involving the effects of clinical variables, methylation, and gene expression in a Bayesian framework. Our results highlight the overall importance of wide-spread alterations in methylation and gene expression patterns associated with survival, nodal metastasis, and staging.
Affiliation(s)
- Shyamali Mukerjee
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, USA
- Agustin Gonzalez-Reymundez
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, USA
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, Michigan, USA
- Sophia Y Lunt
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, USA
- Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, Michigan, USA
- Ana I Vazquez
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, USA
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, Michigan, USA
25
Schultze AE, Bennet B, Rae JC, Chiang AY, Frazier K, Katavolos P, McKinney L, Patrick DJ, Tripathi N. Scientific Regulatory Policy Committee Points to Consider*: Nuisance Factors, Block Effects, and Batch Effects in Nonclinical Safety Assessment Studies. Toxicol Pathol 2020; 48:537-548. [PMID: 32122253 DOI: 10.1177/0192623320906385] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/22/2022]
Abstract
Detection of test article-related effects and the determination of the adversity of those changes are the primary goals of nonclinical safety assessment studies for drugs and chemicals in development. During these studies, variables that are not of primary interest to investigators may change and influence data interpretation. These variables, often referred to as "nuisance factors," may influence other groups of data and result in "block or batch effects" that complicate data interpretation. Definitions of the terms "nuisance factors," "block effects," and "batch effects," as they apply to nonclinical safety assessment studies, are reviewed. Multiple case examples of block and batch effects in safety assessment studies are provided, and the challenges these bring to pathology data interpretation are discussed. Methods to mitigate the occurrence of block and batch effects in safety assessment studies, including statistical blocking and utilization of study designs that minimize potential confounding variables, incorporation of adequate randomization, and use of an appropriate number of animals or repeated measurement of specific parameters for increased precision, are reviewed.
26
Zhu N, Pauciulo MW, Welch CL, Lutz KA, Coleman AW, Gonzaga-Jauregui C, Wang J, Grimes JM, Martin LJ, He H, Shen Y, Chung WK, Nichols WC. Novel risk genes and mechanisms implicated by exome sequencing of 2572 individuals with pulmonary arterial hypertension. Genome Med 2019; 11:69. [PMID: 31727138 PMCID: PMC6857288 DOI: 10.1186/s13073-019-0685-z] [Citation(s) in RCA: 76] [Impact Index Per Article: 12.7] [Received: 07/03/2019] [Accepted: 11/06/2019] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Group 1 pulmonary arterial hypertension (PAH) is a rare disease with high mortality despite recent therapeutic advances. Pathogenic remodeling of pulmonary arterioles leads to increased pulmonary pressures, right ventricular hypertrophy, and heart failure. Mutations in bone morphogenetic protein receptor type 2 and other risk genes predispose to disease, but the vast majority of non-familial cases remain genetically undefined. METHODS To identify new risk genes, we performed exome sequencing in a large cohort from the National Biological Sample and Data Repository for PAH (PAH Biobank, n = 2572). We then carried out rare deleterious variant identification followed by case-control gene-based association analyses. To control for population structure, only unrelated European cases (n = 1832) and controls (n = 12,771) were used in association tests. Empirical p values were determined by permutation analyses, and the threshold for significance defined by Bonferroni's correction for multiple testing. RESULTS Tissue kallikrein 1 (KLK1) and gamma glutamyl carboxylase (GGCX) were identified as new candidate risk genes for idiopathic PAH (IPAH) with genome-wide significance. We note that variant carriers had later mean age of onset and relatively moderate disease phenotypes compared to bone morphogenetic receptor type 2 variant carriers. We also confirmed the genome-wide association of recently reported growth differentiation factor (GDF2) with IPAH and further implicate T-box 4 (TBX4) with child-onset PAH. CONCLUSIONS We report robust association of novel genes KLK1 and GGCX with IPAH, accounting for ~ 0.4% and 0.9% of PAH Biobank cases, respectively. Both genes play important roles in vascular hemodynamics and inflammation but have not been implicated in PAH previously. These data suggest new genes, pathogenic mechanisms, and therapeutic targets for this lethal vasculopathy.
Affiliation(s)
- Na Zhu
- Department of Pediatrics, Columbia University Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
- Michael W Pauciulo
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue MLC 7016, Cincinnati, OH, USA
- Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH, USA
- Carrie L Welch
- Department of Pediatrics, Columbia University Medical Center, New York, NY, USA
- Katie A Lutz
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue MLC 7016, Cincinnati, OH, USA
- Anna W Coleman
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue MLC 7016, Cincinnati, OH, USA
- Jiayao Wang
- Department of Pediatrics, Columbia University Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
- Joseph M Grimes
- Department of Pediatrics, Columbia University Medical Center, New York, NY, USA
- Lisa J Martin
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue MLC 7016, Cincinnati, OH, USA
- Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH, USA
- Hua He
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue MLC 7016, Cincinnati, OH, USA
- Yufeng Shen
- Department of Systems Biology, Columbia University, New York, NY, USA
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Wendy K Chung
- Department of Pediatrics, Columbia University Medical Center, New York, NY, USA
- Herbert Irving Comprehensive Cancer Center, Columbia University Medical Center, New York, NY, USA
- Department of Medicine, Columbia University Medical Center, New York, NY, USA
- William C Nichols
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue MLC 7016, Cincinnati, OH, USA
- Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH, USA
27
Rasnic R, Brandes N, Zuk O, Linial M. Substantial batch effects in TCGA exome sequences undermine pan-cancer analysis of germline variants. BMC Cancer 2019; 19:783. [PMID: 31391007 PMCID: PMC6686424 DOI: 10.1186/s12885-019-5994-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 11/18/2018] [Accepted: 07/30/2019] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND In recent years, research on cancer predisposition germline variants has emerged as a prominent field. The identity of somatic mutations is based on a reliable mapping of the patient germline variants. In addition, the statistics of germline variants frequencies in healthy individuals and cancer patients is the basis for seeking candidates for cancer predisposition genes. The Cancer Genome Atlas (TCGA) is one of the main sources of such data, providing a diverse collection of molecular data including deep sequencing for more than 30 types of cancer from > 10,000 patients. METHODS Our hypothesis in this study is that whole exome sequences from blood samples of cancer patients are not expected to show systematic differences among cancer types. To test this hypothesis, we analyzed common and rare germline variants across six cancer types, covering 2241 samples from TCGA. In our analysis we accounted for inherent variables in the data including the different variant calling protocols, sequencing platforms, and ethnicity. RESULTS We report on substantial batch effects in germline variants associated with cancer types. We attribute the effect to the specific sequencing centers that produced the data. Specifically, we measured 30% variability in the number of reported germline variants per sample across sequencing centers. The batch effect is further expressed in nucleotide composition and variant frequencies. Importantly, the batch effect causes substantial differences in germline variant distribution patterns across numerous genes, including prominent cancer predisposition genes such as BRCA1, RET, MAX, and KRAS. For most of known cancer predisposition genes, we found a distinct batch-dependent difference in germline variants. CONCLUSION TCGA germline data is exposed to strong batch effects with substantial variabilities among TCGA sequencing centers. We claim that those batch effects are consequential for numerous TCGA pan-cancer studies. 
In particular, these effects may compromise the reliability and the potency to detect new cancer predisposition genes. Furthermore, interpretation of pan-cancer analyses should be revisited in view of the source of the genomic data after accounting for the reported batch effects.
Affiliation(s)
- Roni Rasnic
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Nadav Brandes
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Or Zuk
- Department of Statistics, The Hebrew University of Jerusalem, Jerusalem, Israel
- Michal Linial
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
28
Fitzgerald JC, Zimprich A, Reddy Bobbili D, Sharma M, May P, Krüger R. Reply: No evidence for rare TRAP1 mutations influencing the risk of idiopathic Parkinson's disease. Brain 2019; 141:e17. [PMID: 29373630 DOI: 10.1093/brain/awx380] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 02/07/2023] Open
Affiliation(s)
- Julia C Fitzgerald
- Department of Neurodegenerative Diseases, Center of Neurology and Hertie-Institute for Clinical Brain Research, University of Tübingen and German Centre for Neurodegenerative Diseases, Tübingen, Germany
- Dheeraj Reddy Bobbili
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Manu Sharma
- Centre for Genetic Epidemiology, Institute for Clinical Epidemiology and Applied Biometry, University of Tübingen, Germany
- Patrick May
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Rejko Krüger
- Department of Neurodegenerative Diseases, Center of Neurology and Hertie-Institute for Clinical Brain Research, University of Tübingen and German Centre for Neurodegenerative Diseases, Tübingen, Germany
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Parkinson Research Clinic, Centre Hospitalier de Luxembourg (CHL), Luxembourg
29
Varma M, Paskov KM, Jung JY, Chrisman BS, Stockham NT, Washington PY, Wall DP. Outgroup Machine Learning Approach Identifies Single Nucleotide Variants in Noncoding DNA Associated with Autism Spectrum Disorder. Pacific Symposium on Biocomputing 2019; 24:260-271. [PMID: 30864328 PMCID: PMC6417813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Autism spectrum disorder (ASD) is a heritable neurodevelopmental disorder affecting 1 in 59 children. While noncoding genetic variation has been shown to play a major role in many complex disorders, the contribution of these regions to ASD susceptibility remains unclear. Genetic analyses of ASD typically use unaffected family members as controls; however, we hypothesize that this method does not effectively elevate variant signal in the noncoding region due to family members having subclinical phenotypes arising from common genetic mechanisms. In this study, we use a separate, unrelated outgroup of individuals with progressive supranuclear palsy (PSP), a neurodegenerative condition with no known etiological overlap with ASD, as a control population. We use whole genome sequencing data from a large cohort of 2182 children with ASD and 379 controls with PSP, sequenced at the same facility with the same machines and variant calling pipeline, in order to investigate the role of noncoding variation in the ASD phenotype. We analyze seven major types of noncoding variants: microRNAs, human accelerated regions, hypersensitive sites, transcription factor binding sites, DNA repeat sequences, simple repeat sequences, and CpG islands. After identifying and removing batch effects between the two groups, we trained an ℓ1-regularized logistic regression classifier to predict ASD status from each set of variants. The classifier trained on simple repeat sequences performed well on a held-out test set (AUC-ROC = 0.960); this classifier was also able to differentiate ASD cases from controls when applied to a completely independent dataset (AUC-ROC = 0.960). This suggests that variation in simple repeat regions is predictive of the ASD phenotype and may contribute to ASD risk. Our results show the importance of the noncoding region and the utility of independent control groups in effectively linking genetic variation to disease phenotype for complex disorders.
Affiliation(s)
- Maya Varma
- Departments of Computer Science, Stanford University, Stanford, CA 94305, USA
- Kelley Marie Paskov
- Departments of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Jae-Yoon Jung
- Departments of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Departments of Pediatrics, Stanford University, Stanford, CA 94305, USA
- Dennis Paul Wall
- Departments of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Departments of Pediatrics, Stanford University, Stanford, CA 94305, USA
|
30
|
Dennin RH. Overlooked: Extrachromosomal DNA and Their Possible Impact on Whole Genome Sequencing. Malays J Med Sci 2018; 25:20-26. [PMID: 30918452 PMCID: PMC6422590 DOI: 10.21315/mjms2018.25.2.3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/12/2017] [Accepted: 01/05/2018] [Indexed: 02/08/2023] Open
Abstract
Extrachromosomal (ec) DNA in eukaryotic cells has been known for decades. The structures described range from linear double-stranded (ds) DNA to circular dsDNA, distinct from mitochondrial (mt) DNA. Circular forms range in size from a few hundred base pairs (bp) to more than 150 kbp. The number of molecules per cell ranges from several hundred to a thousand. Semi-quantitative determinations of circular dsDNA show proportions as high as several percent of the total DNA per cell. These ecDNA fractions harbor sequences that are also known to be present in chromosomal DNA (chrDNA). Sequencing projects, for example on the human genome, have to take into account the ecDNA sequences that are ascertained simultaneously; corrections cannot be performed retrospectively. When sequencing results are derived from extracted whole DNA, erroneous conclusions at the chromosomal level may result if the ecDNA fractions contained therein are not taken into account.
Affiliation(s)
- Reinhard H Dennin
- Department of Infectious Diseases and Microbiology, University of Luebeck, UKSH, Campus Luebeck, D-23538 Luebeck, Germany
|
31
|
Beaudry FEG, Barrett SCH, Wright SI. Genomic Loss and Silencing on the Y Chromosomes of Rumex. Genome Biol Evol 2017; 9:3345-3355. [PMID: 29211839 PMCID: PMC5737746 DOI: 10.1093/gbe/evx254] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Accepted: 11/30/2017] [Indexed: 12/19/2022] Open
Abstract
Across many unrelated lineages of plants and animals, Y chromosomes show a recurrent pattern of gene degeneration and loss, but the relative importance of inefficient selection, adaptive gene silencing, and neutral genetic drift in causing degeneration remains poorly understood. Here, we use next-generation genome and transcriptome sequencing to investigate patterns of ongoing Y chromosome degeneration in two annual plant species of Rumex (Polygonaceae) differing in their degree of degeneration and sex chromosome heteromorphism. We find evidence for both gene loss and silencing in these young plant sex chromosomes. Our analyses revealed significantly more gene deletion relative to silencing in R. rothschildianus, which has had a larger nonrecombining region for a longer period than R. hastatulus, consistent with this system being at a more advanced stage of degeneration. Intra- and interspecific comparisons of genomic coverage and heterozygosity indicated that loss of expression precedes gene deletion, implying that the final stages of mutation accumulation and gene loss may often occur neutrally. We found no evidence for adaptive silencing of genes that have lost expression. Our results suggest that the initial spread of deleterious regulatory variants and/or epigenetic silencing may be an important driver of early degeneration of Y chromosomes.
Affiliation(s)
- Felix E G Beaudry
- Department of Ecology & Evolutionary Biology, University of Toronto, Ontario, Canada
- Spencer C H Barrett
- Department of Ecology & Evolutionary Biology, University of Toronto, Ontario, Canada
- Stephen I Wright
- Department of Ecology & Evolutionary Biology, University of Toronto, Ontario, Canada
|