1
Rutter LA, MacKay MJ, Cope H, Szewczyk NJ, Kim J, Overbey E, Tierney BT, Muratani M, Lamm B, Bezdan D, Paul AM, Schmidt MA, Church GM, Giacomello S, Mason CE. Protective alleles and precision healthcare in crewed spaceflight. Nat Commun 2024; 15:6158. [PMID: 39039045 PMCID: PMC11263583 DOI: 10.1038/s41467-024-49423-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Accepted: 06/05/2024] [Indexed: 07/24/2024]
Abstract
Common and rare alleles are now being annotated across millions of human genomes, and omics technologies are increasingly being used to develop health and treatment recommendations. However, these alleles have not yet been systematically characterized relative to aerospace medicine. Here, we review published alleles naturally found in human cohorts that have a likely protective effect, which is linked to decreased cancer risk and improved bone, muscular, and cardiovascular health. Although some technical and ethical challenges remain, research into these protective mechanisms could translate into improved nutrition, exercise, and health recommendations for crew members during deep space missions.
Affiliation(s)
- Lindsay A Rutter
  - Transborder Medical Research Center, University of Tsukuba, Ibaraki, 305-8575, Japan
  - Department of Genome Biology, Institute of Medicine, University of Tsukuba, Ibaraki, 305-8575, Japan
  - School of Chemistry, University of Glasgow, Glasgow, G12 8QQ, UK
- Matthew J MacKay
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10065, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, 10021, USA
  - The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, 10065, USA
- Henry Cope
  - School of Medicine, University of Nottingham, Nottingham, DE22 3DT, UK
- Nathaniel J Szewczyk
  - School of Medicine, University of Nottingham, Nottingham, DE22 3DT, UK
  - Ohio Musculoskeletal and Neurological Institute (OMNI), Heritage College of Osteopathic Medicine, Ohio University, Athens, OH, 45701, USA
- JangKeun Kim
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10065, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Eliah Overbey
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10065, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Braden T Tierney
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10065, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Masafumi Muratani
  - Transborder Medical Research Center, University of Tsukuba, Ibaraki, 305-8575, Japan
  - Department of Genome Biology, Institute of Medicine, University of Tsukuba, Ibaraki, 305-8575, Japan
- Ben Lamm
  - Colossal Biosciences, 1401 Lavaca St, Unit #155, Austin, TX, 78701, USA
- Daniela Bezdan
  - Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
  - NGS Competence Center Tübingen (NCCT), University of Tübingen, Tübingen, Germany
  - Yuri GmbH, Meckenbeuren, Germany
- Amber M Paul
  - Embry-Riddle Aeronautical University, Department of Human Factors and Behavioral Neurobiology, Daytona Beach, FL, 32114, USA
- Michael A Schmidt
  - Sovaris Aerospace, Boulder, CO, 80302, USA
  - Advanced Pattern Analysis & Human Performance Group, Boulder, CO, 80302, USA
- George M Church
  - GC Therapeutics Inc, Cambridge, MA, 02139, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
  - Wyss Institute for Biologically Inspired Engineering, Harvard University, Cambridge, MA, 02115, USA
- Christopher E Mason
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10065, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, 10021, USA
  - The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, 10065, USA
  - Wyss Institute for Biologically Inspired Engineering, Harvard University, Cambridge, MA, 02115, USA
  - The Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, 10065, USA
2
Eynard SE, Klopp C, Canale-Tabet K, Marande W, Vandecasteele C, Roques C, Donnadieu C, Boone Q, Servin B, Vignal A. The black honey bee genome: insights on specific structural elements and a first step towards pangenomes. Genet Sel Evol 2024; 56:51. [PMID: 38943059 PMCID: PMC11212449 DOI: 10.1186/s12711-024-00917-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2024] [Accepted: 06/04/2024] [Indexed: 07/01/2024]
Abstract
BACKGROUND The honey bee reference genome, HAv3.1, was produced from a commercial line sample that was thought to have a largely dominant Apis mellifera ligustica genetic background. Apis mellifera mellifera, often referred to as the black bee, has a separate evolutionary history and is the original type in western and northern Europe. Growing interest in this subspecies for conservation and non-professional apicultural practices, together with the necessity of deciphering genome backgrounds in hybrids, triggered the necessity for a specific genome assembly. Moreover, having several high-quality genomes is becoming key for taking structural variations into account in pangenome analyses. RESULTS Pacific Biosciences technology long reads were produced from a single haploid black bee drone. Scaffolding contigs into chromosomes was done using a high-density genetic map. This allowed for re-estimation of the recombination rate, which was over-estimated in some previous studies due to mis-assemblies, which resulted in spurious inversions in the older reference genomes. The sequence continuity obtained was very high and the only limit towards continuous chromosome-wide sequences seemed to be due to tandem repeat arrays that were usually longer than 10 kb and that belonged to two main families, the 371 and 91 bp repeats, causing problems in the assembly process due to high internal sequence similarity. Our assembly was used together with the reference genome to genotype two structural variants by a pangenome graph approach with Graphtyper2. Genotypes obtained were either correct or missing, when compared to an approach based on sequencing depth analysis, and genotyping rates were 89 and 76% for the two variants. CONCLUSIONS Our new assembly for the Apis mellifera mellifera honey bee subspecies demonstrates the utility of multiple high-quality genomes for the genotyping of structural variants, with a test case on two insertions and deletions. It will therefore be an invaluable resource for future studies, for instance by including structural variants in GWAS. Having used a single haploid drone for sequencing allowed a refined analysis of very large tandem repeat arrays, raising the question of their function in the genome. High-quality genome assemblies for multiple subspecies, such as the one presented here, are crucial for emerging projects using pangenomes.
Affiliation(s)
- Sonia E Eynard
  - GenPhySE, Université de Toulouse, INRAE, INPT, INP-ENVT, Castanet Tolosan, France
- Kamila Canale-Tabet
  - GenPhySE, Université de Toulouse, INRAE, INPT, INP-ENVT, Castanet Tolosan, France
- Céline Roques
  - INRAE, US 1426, GeT-PlaGe, Genotoul, Castanet-Tolosan, France
- Quentin Boone
  - GenPhySE, Université de Toulouse, INRAE, INPT, INP-ENVT, Castanet Tolosan, France
  - Sigenae, MIAT, INRAE, Castanet Tolosan, France
- Bertrand Servin
  - GenPhySE, Université de Toulouse, INRAE, INPT, INP-ENVT, Castanet Tolosan, France
- Alain Vignal
  - GenPhySE, Université de Toulouse, INRAE, INPT, INP-ENVT, Castanet Tolosan, France
3
Cope H, Elsborg J, Demharter S, McDonald JT, Wernecke C, Parthasarathy H, Unadkat H, Chatrathi M, Claudio J, Reinsch S, Avci P, Zwart SR, Smith SM, Heer M, Muratani M, Meydan C, Overbey E, Kim J, Chin CR, Park J, Schisler JC, Mason CE, Szewczyk NJ, Willis CRG, Salam A, Beheshti A. Transcriptomics analysis reveals molecular alterations underpinning spaceflight dermatology. Communications Medicine 2024; 4:106. [PMID: 38862781 PMCID: PMC11166967 DOI: 10.1038/s43856-024-00532-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/11/2023] [Accepted: 05/23/2024] [Indexed: 06/13/2024]
Abstract
BACKGROUND Spaceflight poses a unique set of challenges to humans and the hostile spaceflight environment can induce a wide range of increased health risks, including dermatological issues. The biology driving the frequency of skin issues in astronauts is currently not well understood. METHODS To address this issue, we used a systems biology approach utilizing NASA's Open Science Data Repository (OSDR) on space flown murine transcriptomic datasets focused on the skin, biochemical profiles of 50 NASA astronauts and human transcriptomic datasets generated from blood and hair samples of JAXA astronauts, as well as blood samples obtained from the NASA Twins Study, and skin and blood samples from the first civilian commercial mission, Inspiration4. RESULTS Key biological changes related to skin health, DNA damage & repair, and mitochondrial dysregulation are identified as potential drivers for skin health risks during spaceflight. Additionally, a machine learning model is utilized to determine gene pairings associated with spaceflight response in the skin. While we identified spaceflight-induced dysregulation, such as alterations in genes associated with skin barrier function and collagen formation, our results also highlight the remarkable ability for organisms to re-adapt back to Earth via post-flight re-tuning of gene expression. CONCLUSION Our findings can guide future research on developing countermeasures for mitigating spaceflight-associated skin damage.
Affiliation(s)
- Henry Cope
  - School of Medicine, University of Nottingham, Derby, DE22 3DT, UK
- Jonas Elsborg
  - Department of Energy Conversion and Storage, Technical University of Denmark, 2800, Kongens Lyngby, Denmark
  - Abzu, Copenhagen, 2150, Denmark
- J Tyson McDonald
  - Department of Radiation Medicine, School of Medicine, Georgetown University, Washington, DC, 20057, USA
- Chiara Wernecke
  - NASA GeneLab For High Schools Program (GL4HS), Space Biology Program, NASA Ames Research Center, Moffett Field, CA, USA
  - Department of Aerospace and Geodesy, TUM School of Engineering and Design, Technical University of Munich, Munich, Germany
- Hari Parthasarathy
  - NASA GeneLab For High Schools Program (GL4HS), Space Biology Program, NASA Ames Research Center, Moffett Field, CA, USA
  - College of Engineering and Haas School of Business, University of California, Berkeley, Berkeley, CA, 94720, USA
- Hriday Unadkat
  - NASA GeneLab For High Schools Program (GL4HS), Space Biology Program, NASA Ames Research Center, Moffett Field, CA, USA
  - School of Engineering and Applied Science, Princeton University, Princeton, NJ, 08540, USA
- Mira Chatrathi
  - NASA GeneLab For High Schools Program (GL4HS), Space Biology Program, NASA Ames Research Center, Moffett Field, CA, USA
  - College of Letters and Science, University of California, Berkeley, Berkeley, CA, 94720, USA
- Jennifer Claudio
  - NASA GeneLab For High Schools Program (GL4HS), Space Biology Program, NASA Ames Research Center, Moffett Field, CA, USA
  - Blue Marble Space Institute of Science, Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA, USA
- Sigrid Reinsch
  - NASA GeneLab For High Schools Program (GL4HS), Space Biology Program, NASA Ames Research Center, Moffett Field, CA, USA
  - Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA, USA
- Pinar Avci
  - Department of Dermatology and Allergy, University Hospital, LMU Munich, 80337, Munich, Germany
- Sara R Zwart
  - University of Texas Medical Branch, Galveston, TX, USA
- Scott M Smith
  - Biomedical Research and Environmental Sciences Division, Human Health and Performance Directorate, NASA Johnson Space Center, Houston, TX, 77058, USA
- Martina Heer
  - IU International University of Applied Sciences, Erfurt and University of Bonn, Bonn, Germany
- Masafumi Muratani
  - Transborder Medical Research Center, University of Tsukuba, Ibaraki, 305-8575, Japan
  - Department of Genome Biology, Institute of Medicine, University of Tsukuba, Ibaraki, 305-8575, Japan
- Cem Meydan
  - Department of Physiology, Biophysics and Systems Biology, Weill Cornell Medicine, New York, NY, USA
- Eliah Overbey
  - Department of Physiology, Biophysics and Systems Biology, Weill Cornell Medicine, New York, NY, USA
- Jangkeun Kim
  - Department of Physiology, Biophysics and Systems Biology, Weill Cornell Medicine, New York, NY, USA
- Christopher R Chin
  - Department of Physiology, Biophysics and Systems Biology, Weill Cornell Medicine, New York, NY, USA
- Jiwoon Park
  - Department of Physiology, Biophysics and Systems Biology, Weill Cornell Medicine, New York, NY, USA
  - Laboratory of Virology and Infectious Disease, The Rockefeller University, New York, NY, 10065, USA
- Jonathan C Schisler
  - McAllister Heart Institute and Department of Pharmacology, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Christopher E Mason
  - Department of Physiology, Biophysics and Systems Biology, Weill Cornell Medicine, New York, NY, USA
  - Laboratory of Virology and Infectious Disease, The Rockefeller University, New York, NY, 10065, USA
- Nathaniel J Szewczyk
  - School of Medicine, University of Nottingham, Derby, DE22 3DT, UK
  - Ohio Musculoskeletal and Neurological Institute, Heritage College of Osteopathic Medicine, Ohio University, Athens, OH, 45701, USA
- Craig R G Willis
  - School of Chemistry and Biosciences, Faculty of Life Sciences, University of Bradford, Bradford, BD7 1DP, UK
- Amr Salam
  - St John's Institute of Dermatology, King's College London, Guy's and St Thomas' NHS Foundation Trust, Guy's Hospital, Great Maze Pond, London, SE1 9RT, UK
- Afshin Beheshti
  - Blue Marble Space Institute of Science, Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
4
Seylani A, Galsinh AS, Tasoula A, I AR, Camera A, Calleja-Agius J, Borg J, Goel C, Kim J, Clark KB, Das S, Arif S, Boerrigter M, Coffey C, Szewczyk N, Mason CE, Manoli M, Karouia F, Schwertz H, Beheshti A, Tulodziecki D. Ethical considerations for the age of non-governmental space exploration. Nat Commun 2024; 15:4774. [PMID: 38862473 PMCID: PMC11166968 DOI: 10.1038/s41467-023-44357-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Accepted: 12/05/2023] [Indexed: 06/13/2024]
Abstract
Mounting ambitions and capabilities for public and private, non-government sector crewed space exploration bring with them an increasingly diverse set of space travelers, raising new and nontrivial ethical, legal, and medical policy and practice concerns which are still relatively underexplored. In this piece, we lay out several pressing issues related to ethical considerations for selecting space travelers and conducting human subject research on them, especially in the context of non-governmental and commercial/private space operations.
Affiliation(s)
- Allen Seylani
  - School of Medicine, University of California, Riverside. 92521 Botanical Garden Dr, Riverside, CA, 92507, USA
- Aman Singh Galsinh
  - School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Aberdeen, AB24 3FX, UK
- Alexia Tasoula
  - Department of Life Science Engineering, FH Technikum, Vienna, Austria
  - Heritage College of Osteopathic Medicine, Ohio University, Athens, OH, USA
- Anu R I
  - Department of Cancer Biology and Therapeutics, MVR Cancer Centre and Research Institute, Calicut, India
  - Department of Clinical Biochemistry, MVR Cancer Centre and Research Institute, Calicut, India
- Andrea Camera
  - Department of Molecular Medicine, Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway
- Jean Calleja-Agius
  - Department of Anatomy, Faculty of Medicine and Surgery, University of Malta, MSD2080, Msida, Malta
- Joseph Borg
  - Department of Applied Biomedical Science, Faculty of Health Sciences, University of Malta, MSD2080, Msida, Malta
- Chirag Goel
  - Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- JangKeun Kim
  - Department of Physiology & Biophysics, Weill Cornell Medicine, New York, NY, USA
- Kevin B Clark
  - Cures Within Reach, Chicago, IL, 60602, USA
  - Peace Innovation Institute, The Hague 2511, Netherlands & Stanford University, Palo Alto, CA, 94305, USA
  - Biometrics and Nanotechnology Councils, Institute of Electrical and Electronics Engineers, New York, NY, 10016-5997, USA
- Saswati Das
  - Department of Biochemistry, Atal Bihari Vajpayee Institute of Medical Sciences, New Delhi, India
- Shehbeel Arif
  - Center for Data-Driven Discovery in Biomedicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Caroline Coffey
  - Heritage College of Osteopathic Medicine, Ohio University, Athens, OH, USA
- Nathaniel Szewczyk
  - Heritage College of Osteopathic Medicine, Ohio University, Athens, OH, USA
- Christopher E Mason
  - Department of Physiology & Biophysics, Weill Cornell Medicine, New York, NY, USA
- Maria Manoli
  - School of Law, University of Aberdeen, Aberdeen, AB24 3UB, UK
- Fathi Karouia
  - Blue Marble Space Institute of Science, Exobiology Branch, NASA Ames Research Center, Moffett Field, CA, USA
  - Space Research Within Reach, San Francisco, CA, USA
  - Center for Space Medicine, Baylor College of Medicine, Houston, TX, USA
- Hansjörg Schwertz
  - Molecular Medicine Program at the University of Utah, Salt Lake City, UT, 84112, USA
  - Division of Occupational Medicine at the University of Utah, Salt Lake City, UT, 84112, USA
  - Occupational Medicine at Billings Clinic Bozeman, Bozeman, MT, 59715, USA
- Afshin Beheshti
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Blue Marble Space Institute of Science, Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA, USA
- Dana Tulodziecki
  - Department of Philosophy, Purdue University, West Lafayette, IN, USA
5
Spillmann RC, Tan QKG, Reuter C, Schoch K, Kohler J, Bonner D, Zastrow D, Alkelai A, Baugh E, Cope H, Marwaha S, Wheeler MT, Bernstein JA, Shashi V. A concurrent dual analysis of genomic data augments diagnoses: Experiences of 2 clinical sites in the Undiagnosed Diseases Network. Genet Med 2023; 25:100353. [PMID: 36481303 PMCID: PMC10506157 DOI: 10.1016/j.gim.2022.12.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Revised: 11/28/2022] [Accepted: 12/01/2022] [Indexed: 12/12/2022]
Abstract
PURPOSE Next-generation sequencing (NGS) has revolutionized the diagnostic process for rare/ultrarare conditions. However, diagnosis rates differ between analytical pipelines. In the National Institutes of Health-Undiagnosed Diseases Network (UDN) study, each individual's NGS data are concurrently analyzed by the UDN sequencing core laboratory and the clinical sites. We examined the outcomes of this practice. METHODS A retrospective review was performed at 2 UDN clinical sites to compare the variants and diagnoses/candidate genes identified with the dual analyses of the NGS data. RESULTS In total, 95 individuals had 100 diagnoses/candidate genes. There was 59% concordance between the UDN sequencing core laboratories and the clinical sites in identifying diagnoses/candidate genes. The core laboratory provided more diagnoses, whereas the clinical sites prioritized more research variants/candidate genes (P < .001). The clinical sites solely identified 15% of the diagnoses/candidate genes. The differences between the 2 pipelines were more often because of variant prioritization disparities than variant detection. CONCLUSION The unique dual analysis of NGS data in the UDN synergistically enhances outcomes. The core laboratory provided a clinical analysis with more diagnoses and the clinical sites prioritized more research variants/candidate genes. Implementing such concurrent dual analyses in other genomic research studies and clinical settings can improve both variant detection and prioritization.
Affiliation(s)
- Rebecca C Spillmann
  - Division of Medical Genetics, Department of Pediatrics, Duke University School of Medicine, Durham, NC
- Queenie K-G Tan
  - Division of Medical Genetics, Department of Pediatrics, Duke University School of Medicine, Durham, NC
- Chloe Reuter
  - Stanford Center for Inherited Cardiovascular Disease, Division of Cardiovascular Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, CA; Stanford Center for Undiagnosed Diseases, Stanford University, and Department of Pediatrics, Stanford University School of Medicine, Stanford, CA
- Kelly Schoch
  - Division of Medical Genetics, Department of Pediatrics, Duke University School of Medicine, Durham, NC
- Jennefer Kohler
  - Stanford Center for Undiagnosed Diseases, Stanford University, and Department of Pediatrics, Stanford University School of Medicine, Stanford, CA
- Devon Bonner
  - Stanford Center for Undiagnosed Diseases, Stanford University, and Department of Pediatrics, Stanford University School of Medicine, Stanford, CA
- Diane Zastrow
  - Stanford Center for Undiagnosed Diseases, Stanford University, and Department of Pediatrics, Stanford University School of Medicine, Stanford, CA
- Anna Alkelai
  - Institute for Genome Medicine, Columbia University Medical Center, New York, NY
- Evan Baugh
  - Institute for Genome Medicine, Columbia University Medical Center, New York, NY
- Heidi Cope
  - Division of Medical Genetics, Department of Pediatrics, Duke University School of Medicine, Durham, NC
- Shruti Marwaha
  - Stanford Center for Undiagnosed Diseases, Stanford University, and Department of Pediatrics, Stanford University School of Medicine, Stanford, CA
- Matthew T Wheeler
  - Stanford Center for Inherited Cardiovascular Disease, Division of Cardiovascular Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, CA; Stanford Center for Undiagnosed Diseases, Stanford University, and Department of Pediatrics, Stanford University School of Medicine, Stanford, CA
- Jonathan A Bernstein
  - Stanford Center for Undiagnosed Diseases, Stanford University, and Department of Pediatrics, Stanford University School of Medicine, Stanford, CA
- Vandana Shashi
  - Division of Medical Genetics, Department of Pediatrics, Duke University School of Medicine, Durham, NC
6
Park J, Kim J, Lewy T, Rice CM, Elemento O, Rendeiro AF, Mason CE. Spatial omics technologies at multimodal and single cell/subcellular level. Genome Biol 2022; 23:256. [PMID: 36514162 PMCID: PMC9746133 DOI: 10.1186/s13059-022-02824-6] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 05/06/2022] [Accepted: 11/29/2022] [Indexed: 12/15/2022]
Abstract
Spatial omics technologies enable a deeper understanding of cellular organizations and interactions within a tissue of interest. These assays can identify specific compartments or regions in a tissue with differential transcript or protein abundance, delineate their interactions, and complement other methods in defining cellular phenotypes. A variety of spatial methodologies are being developed and commercialized; however, these techniques differ in spatial resolution, multiplexing capability, scale/throughput, and coverage. Here, we review the current and prospective landscape of single cell to subcellular resolution spatial omics technologies and analysis tools to provide a comprehensive picture for both research and clinical applications.
Affiliation(s)
- Jiwoon Park
  - Department of Physiology, Biophysics and Systems Biology, Weill Cornell Medicine, New York, NY, USA
  - Laboratory of Virology and Infectious Disease, The Rockefeller University, New York, NY, 10065, USA
- Junbum Kim
  - Department of Physiology, Biophysics and Systems Biology, Weill Cornell Medicine, New York, NY, USA
- Tyler Lewy
  - Laboratory of Virology and Infectious Disease, The Rockefeller University, New York, NY, 10065, USA
- Charles M Rice
  - Laboratory of Virology and Infectious Disease, The Rockefeller University, New York, NY, 10065, USA
- Olivier Elemento
  - Department of Physiology, Biophysics and Systems Biology, Weill Cornell Medicine, New York, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
  - Caryl and Israel Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
- André F Rendeiro
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
  - Caryl and Israel Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Christopher E Mason
  - Department of Physiology, Biophysics and Systems Biology, Weill Cornell Medicine, New York, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
  - The Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, USA
7
Muñoz-Barrera A, Rubio-Rodríguez LA, Díaz-de Usera A, Jáspez D, Lorenzo-Salazar JM, González-Montelongo R, García-Olivares V, Flores C. From Samples to Germline and Somatic Sequence Variation: A Focus on Next-Generation Sequencing in Melanoma Research. Life (Basel) 2022; 12:1939. [PMID: 36431075 PMCID: PMC9695713 DOI: 10.3390/life12111939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 11/12/2022] [Accepted: 11/16/2022] [Indexed: 11/24/2022]
Abstract
Next-generation sequencing (NGS) applications have flourished in the last decade, permitting the identification of cancer driver genes and profoundly expanding the possibilities of genomic studies of cancer, including melanoma. Here we aimed to present a technical review across many of the methodological approaches brought by the use of NGS applications with a focus on assessing germline and somatic sequence variation. We provide cautionary notes and discuss key technical details involved in library preparation, the most common problems with the samples, and guidance to circumvent them. We also provide an overview of the sequence-based methods for cancer genomics, exposing the pros and cons of targeted sequencing vs. exome or whole-genome sequencing (WGS), the fundamentals of the most common commercial platforms, and a comparison of throughputs and key applications. Details of the steps and the main software involved in the bioinformatics processing of the sequencing results, from preprocessing to variant prioritization and filtering, are also provided in the context of the full spectrum of genetic variation (SNVs, indels, CNVs, structural variation, and gene fusions). Finally, we put the emphasis on selected bioinformatic pipelines behind (a) short-read WGS identification of small germline and somatic variants, (b) detection of gene fusions from transcriptomes, and (c) de novo assembly of genomes from long-read WGS data. Overall, we provide comprehensive guidance across the main methodological procedures involved in obtaining sequencing results for the most common short- and long-read NGS platforms, highlighting key applications in melanoma research.
Affiliation(s)
- Adrián Muñoz-Barrera
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- Luis A. Rubio-Rodríguez
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- Ana Díaz-de Usera
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
  - Research Unit, Hospital Universitario Nuestra Señora de Candelaria, 38010 Santa Cruz de Tenerife, Spain
- David Jáspez
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- José M. Lorenzo-Salazar
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- Rafaela González-Montelongo
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- Víctor García-Olivares
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- Carlos Flores
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
  - Research Unit, Hospital Universitario Nuestra Señora de Candelaria, 38010 Santa Cruz de Tenerife, Spain
  - CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, 28029 Madrid, Spain
  - Facultad de Ciencias de la Salud, Universidad Fernando de Pessoa Canarias, 35450 Las Palmas de Gran Canaria, Spain
8
Xiao C, Chen Z, Chen W, Padilla C, Colgan M, Wu W, Fang LT, Liu T, Yang Y, Schneider V, Wang C, Xiao W. Personalized genome assembly for accurate cancer somatic mutation discovery using tumor-normal paired reference samples. Genome Biol 2022; 23:237. [PMID: 36352452 PMCID: PMC9648002 DOI: 10.1186/s13059-022-02803-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/05/2021] [Accepted: 10/25/2022] [Indexed: 11/11/2022]
Abstract
BACKGROUND The use of a personalized haplotype-specific genome assembly, rather than an unrelated, mosaic genome like GRCh38, as a reference for detecting the full spectrum of somatic events from cancers has long been advocated but has never been explored in tumor-normal paired samples. Here, we provide the first demonstrated use of de novo assembled personalized genome as a reference for cancer mutation detection and quantifying the effects of the reference genomes on the accuracy of somatic mutation detection. RESULTS We generate de novo assemblies of the first tumor-normal paired genomes, both nuclear and mitochondrial, derived from the same individual with triple negative breast cancer. The personalized genome was chromosomal scale, haplotype phased, and annotated. We demonstrate that it provides individual specific haplotypes for complex regions and medically relevant genes. We illustrate that the personalized genome reference not only improves read alignments for both short-read and long-read sequencing data but also ameliorates the detection accuracy of somatic SNVs and SVs. We identify the equivalent somatic mutation calls between two genome references and uncover novel somatic mutations only when personalized genome assembly is used as a reference. CONCLUSIONS Our findings demonstrate that use of a personalized genome with individual-specific haplotypes is essential for accurate detection of the full spectrum of somatic mutations in the paired tumor-normal samples. The unique resource and methodology established in this study will be beneficial to the development of precision oncology medicine not only for breast cancer, but also for other cancers.
Affiliation(s)
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20894 USA
| | - Zhong Chen
- Center for Genomics, Loma Linda University School of Medicine, 11021 Campus St., Loma Linda, CA 92350 USA
| | - Wanqiu Chen
- Center for Genomics, Loma Linda University School of Medicine, 11021 Campus St., Loma Linda, CA 92350 USA
| | - Cory Padilla
- Dovetail Genomics, 100 Enterprise Way, Scotts Valley, CA 95066 USA
| | - Michael Colgan
- The Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD USA
| | - Wenjun Wu
- Blood Cell Development and Function Program, Fox Chase Cancer Center, Philadelphia, PA 19111 USA
| | - Li-Tai Fang
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., 1301 Shoreway Road, Belmont, CA 94002 USA
| | - Tiantian Liu
- Center for Genomics, Loma Linda University School of Medicine, 11021 Campus St., Loma Linda, CA 92350 USA
| | - Yibin Yang
- Blood Cell Development and Function Program, Fox Chase Cancer Center, Philadelphia, PA 19111 USA
| | - Valerie Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20894 USA
| | - Charles Wang
- Center for Genomics, Loma Linda University School of Medicine, 11021 Campus St., Loma Linda, CA 92350 USA
| | - Wenming Xiao
- The Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD USA
|
9
|
Hertzog A, Selvanathan A, Devanapalli B, Ho G, Bhattacharya K, Tolun AA. A narrative review of metabolomics in the era of "-omics": integration into clinical practice for inborn errors of metabolism. Transl Pediatr 2022; 11:1704-1716. [PMID: 36345452 PMCID: PMC9636448 DOI: 10.21037/tp-22-105] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/20/2022] [Accepted: 08/23/2022] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND AND OBJECTIVE: Traditional targeted metabolomic investigations identify a pre-defined list of analytes in samples and have been widely used for decades in the diagnosis and monitoring of inborn errors of metabolism (IEMs). Recent technological advances have resulted in the development and maturation of untargeted metabolomics: a holistic, unbiased, analytical approach to detecting metabolic disturbances in human disease. We aim to provide a summary of untargeted metabolomics [focusing on tandem mass spectrometry (MS-MS)] and its application in the field of IEMs. METHODS: Data for this review were identified through a literature search using PubMed, Google Scholar, and personal repositories of articles collected by the authors. Findings are presented within several sections describing the metabolome, the current use of targeted metabolomics in the diagnostic pathway of patients with IEMs, the more recent integration of untargeted metabolomics into clinical care, and the limitations of this newly employed analytical technique. KEY CONTENT AND FINDINGS: Untargeted metabolomic investigations are increasingly utilized in screening for rare disorders, improving understanding of cellular and subcellular physiology, discovering novel biomarkers, monitoring therapy, and functionally validating genomic variants. Although the untargeted metabolomic approach has some limitations, this "next generation metabolic screening" platform is becoming increasingly affordable and accessible. CONCLUSIONS: When used in conjunction with genomics and the other promising "-omic" technologies, untargeted metabolomics has the potential to revolutionize the diagnostics of IEMs (and other rare disorders), improving both clinical and health economic outcomes.
Affiliation(s)
- Ashley Hertzog
- NSW Biochemical Genetics Service, The Children's Hospital at Westmead, Westmead, NSW, Australia
| | - Arthavan Selvanathan
- Genetic Metabolic Disorders Service, The Children's Hospital at Westmead, Westmead, NSW, Australia
| | - Beena Devanapalli
- NSW Biochemical Genetics Service, The Children's Hospital at Westmead, Westmead, NSW, Australia
| | - Gladys Ho
- Sydney Genome Diagnostics, The Children's Hospital at Westmead, Westmead, NSW, Australia; Specialty of Genomic Medicine, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia
| | - Kaustuv Bhattacharya
- Genetic Metabolic Disorders Service, The Children's Hospital at Westmead, Westmead, NSW, Australia; Specialty of Genomic Medicine, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia
| | - Adviye Ayper Tolun
- NSW Biochemical Genetics Service, The Children's Hospital at Westmead, Westmead, NSW, Australia; Specialty of Genomic Medicine, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia
|
10
|
Tetikol HS, Turgut D, Narci K, Budak G, Kalay O, Arslan E, Demirkaya-Budak S, Dolgoborodov A, Kabakci-Zorlu D, Semenyuk V, Jain A, Davis-Dusenbery BN. Pan-African genome demonstrates how population-specific genome graphs improve high-throughput sequencing data analysis. Nat Commun 2022; 13:4384. [PMID: 35927245 PMCID: PMC9352875 DOI: 10.1038/s41467-022-31724-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/27/2021] [Accepted: 06/30/2022] [Indexed: 11/29/2022] Open
Abstract
Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human genome reference to represent the diverse genetic information from different human populations and its inability to maintain the same level of accuracy for non-European ancestries. While there have been many efforts to develop computationally efficient graph-based toolkits for NGS read alignment and variant calling, methods to curate genomic variants and subsequently construct genome graphs remain an understudied problem that inevitably determines the effectiveness of the overall bioinformatics pipeline. In this study, we discuss obstacles encountered during graph construction and propose methods for sample selection based on population diversity, graph augmentation with structural variants and resolution of graph reference ambiguity caused by information overload. Moreover, we present the case for iteratively augmenting tailored genome graphs for targeted populations and demonstrate this approach on the whole-genome samples of African ancestry. Our results show that population-specific graphs, as more representative alternatives to linear or generic graph references, can achieve significantly lower read mapping errors and enhanced variant calling sensitivity, in addition to providing the improvements of joint variant calling without the need of computationally intensive post-processing steps.
Affiliation(s)
| | | | - Kubra Narci
- Seven Bridges Genomics, Charlestown, MA, USA
| | | | - Ozem Kalay
- Seven Bridges Genomics, Charlestown, MA, USA
| | - Elif Arslan
- Seven Bridges Genomics, Charlestown, MA, USA
| | | | | | | | | | - Amit Jain
- Seven Bridges Genomics, Charlestown, MA, USA
|
11
|
Kaminow B, Ballouz S, Gillis J, Dobin A. Pan-human consensus genome significantly improves the accuracy of RNA-seq analyses. Genome Res 2022; 32:738-749. [PMID: 35256454 PMCID: PMC8997357 DOI: 10.1101/gr.275613.121] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/07/2021] [Accepted: 03/02/2022] [Indexed: 11/25/2022]
Abstract
The Human Reference Genome serves as the foundation for modern genomic analyses. However, in its present form, it does not adequately represent the vast genetic diversity of the human population. In this study, we explored the consensus genome as a potential successor of the current reference genome and assessed its effect on the accuracy of RNA-seq read alignment. In order to find the best haploid genome representation, we constructed consensus genomes at the pan-human, super-population, and population levels, utilizing variant information from the 1000 Genomes Project. Using personal haploid genomes as the ground truth, we compared mapping errors for real RNA-seq reads aligned to the consensus genomes versus the reference genome. For reads overlapping homozygous variants, we found that the mapping error decreased by a factor of ~2-3 when the reference was replaced with the pan-human consensus genome. We also found that using more population-specific consensuses resulted in little to no increase over using the pan-human consensus, suggesting a limit in the utility of incorporating more specific genomic variation. Replacing reference with consensus genomes impacts functional analyses, such as differential expressions of isoforms, genes, and splice junctions.
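The abstract does not spell out how the consensus genomes were assembled. As a minimal sketch, assuming the common approach of substituting the major allele wherever the alternate allele frequency exceeds 0.5, a haploid consensus could be derived as follows. The function name `build_consensus`, the tuple layout of `variants`, and the toy sequence are all hypothetical and restricted to single-nucleotide variants.

```python
def build_consensus(reference: str, variants: list[tuple[int, str, str, float]]) -> str:
    """Return a haploid consensus by swapping in the alternate allele where it is the major allele.

    Each variant is (0-based position, reference allele, alternate allele,
    alternate allele frequency). Only SNVs are handled in this sketch.
    """
    seq = list(reference)
    for pos, ref, alt, alt_af in variants:
        if len(ref) != 1 or len(alt) != 1:
            continue  # indels would shift coordinates; they are out of scope here
        if seq[pos] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        if alt_af > 0.5:  # assumed threshold: the alternate allele is the major allele
            seq[pos] = alt
    return "".join(seq)


# Toy data (hypothetical): two of the three variants have the alternate allele as major.
reference = "ACGTACGT"
variants = [(1, "C", "T", 0.82), (4, "A", "G", 0.10), (6, "G", "C", 0.51)]
consensus = build_consensus(reference, variants)
print(consensus)
```

Computing the allele frequencies over all samples, a super-population, or a single population would give the pan-human, super-population, and population-level consensuses that the abstract compares.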
Affiliation(s)
- Benjamin Kaminow
- Cold Spring Harbor Laboratory; Weill Cornell Graduate School of Medical Sciences
| | - Sara Ballouz
- Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research; School of Medical Sciences, University of New South Wales; Cold Spring Harbor Laboratory
|
12
|
Moore KJM, Cahill J, Aidelberg G, Aronoff R, Bektaş A, Bezdan D, Butler DJ, Chittur SV, Codyre M, Federici F, Tanner NA, Tighe SW, True R, Ware SB, Wyllie AL, Afshin EE, Bendesky A, Chang CB, Dela Rosa R, Elhaik E, Erickson D, Goldsborough AS, Grills G, Hadasch K, Hayden A, Her SY, Karl JA, Kim CH, Kriegel AJ, Kunstman T, Landau Z, Land K, Langhorst BW, Lindner AB, Mayer BE, McLaughlin LA, McLaughlin MT, Molloy J, Mozsary C, Nadler JL, D'Silva M, Ng D, O'Connor DH, Ongerth JE, Osuolale O, Pinharanda A, Plenker D, Ranjan R, Rosbash M, Rotem A, Segarra J, Schürer S, Sherrill-Mix S, Solo-Gabriele H, To S, Vogt MC, Yu AD, Mason CE. Loop-Mediated Isothermal Amplification Detection of SARS-CoV-2 and Myriad Other Applications. J Biomol Tech 2021; 32:228-275. [PMID: 35136384 PMCID: PMC8802757 DOI: 10.7171/jbt.21-3203-017] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/23/2022]
Abstract
As the second year of the COVID-19 pandemic begins, it remains clear that a massive increase in the ability to test for SARS-CoV-2 infections in a myriad of settings is critical to controlling the pandemic and to preparing for future outbreaks. The current gold standard for molecular diagnostics is the polymerase chain reaction (PCR), but the extraordinary and unmet demand for testing in a variety of environments means that both complementary and supplementary testing solutions are still needed. This review highlights the role that loop-mediated isothermal amplification (LAMP) has had in filling this global testing need, providing a faster and easier means of testing, and what it can do for future applications, pathogens, and the preparation for future outbreaks. This review describes the current state of the art for research of LAMP-based SARS-CoV-2 testing, as well as its implications for other pathogens and testing. The authors represent the global LAMP (gLAMP) Consortium, an international research collective, which has regularly met to share their experiences on LAMP deployment and best practices; sections are devoted to all aspects of LAMP testing, including preanalytic sample processing, target amplification, and amplicon detection, then the hardware and software required for deployment are discussed, and finally, a summary of the current regulatory landscape is provided. Included as well are a series of first-person accounts of LAMP method development and deployment. The final discussion section provides the reader with a distillation of the most validated testing methods and their paths to implementation. 
This review also aims to provide practical information and insight for a range of audiences: for a research audience, to help accelerate research through sharing of best practices; for an implementation audience, to help get testing up and running quickly; and for a public health, clinical, and policy audience, to help convey the breadth of the effect that LAMP methods have to offer.
Affiliation(s)
- Keith J M Moore
- School of Science and Engineering, Ateneo de Manila University, Quezon City 1108, Philippines
| | | | - Guy Aidelberg
- Université de Paris, INSERM U1284, Center for Research and Interdisciplinarity (CRI), 75006 Paris, France
- Just One Giant Lab, Centre de Recherches Interdisciplinaires (CRI), 75004 Paris, France
| | - Rachel Aronoff
- Just One Giant Lab, Centre de Recherches Interdisciplinaires (CRI), 75004 Paris, France
- Action for Genomic Integrity Through Research! (AGiR!), Lausanne, Switzerland
- Association Hackuarium, Lausanne, Switzerland
| | - Ali Bektaş
- Oakland Genomics Center, Oakland, CA 94609, USA
| | - Daniela Bezdan
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, 72076 Tübingen, Germany
- NGS Competence Center Tübingen (NCCT), University of Tübingen, 72076 Tübingen, Germany
- Poppy Health, Inc, San Francisco, CA 94158, USA
- Institute of Medical Virology and Epidemiology of Viral Diseases, University Hospital, 72076 Tübingen, Germany
| | - Daniel J Butler
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10065, USA
| | - Sridar V Chittur
- Center for Functional Genomics, Department of Biomedical Sciences, School of Public Health, University at Albany, State University of New York, Rensselaer, 12222, USA
| | - Martin Codyre
- GiantLeap Biotechnology Ltd, Wicklow A63 Kv91, Ireland
| | - Fernan Federici
- ANID, Millennium Science Initiative Program, Millennium Institute for Integrative Biology (iBio), Institute for Biological and Medical Engineering, Schools of Engineering, Biology and Medicine, Pontificia Universidad Católica de Chile, Santiago 8331150, Chile
| | | | | | - Randy True
- FloodLAMP Biotechnologies, San Carlos, CA 94070, USA
| | - Sarah B Ware
- Just One Giant Lab, Centre de Recherches Interdisciplinaires (CRI), 75004 Paris, France
- BioBlaze Community Bio Lab, 1800 W Hawthorne Ln, Ste J-1, West Chicago, IL 60185, USA
- Blossom Bio Lab, 1800 W Hawthorne Ln, Ste K-2, West Chicago, IL 60185, USA
| | - Anne L Wyllie
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT 06510, USA
| | - Evan E Afshin
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10065, USA
- The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY 10065, USA
| | - Andres Bendesky
- Department of Ecology, Evolution and Environmental Biology, Columbia University, New York, NY 10027, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
| | - Connie B Chang
- Department of Chemical and Biological Engineering, Montana State University, Bozeman, 59717, USA
- Center for Biofilm Engineering, Montana State University, Bozeman, 59717, USA
| | - Richard Dela Rosa
- School of Science and Engineering, Ateneo de Manila University, Quezon City 1108, Philippines
| | - Eran Elhaik
- Department of Biology, Lund University, Sölvegatan 35, Lund, Sweden
| | - David Erickson
- Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, NY 14850, USA
| | | | - George Grills
- Department of Microbiology, University of Pennsylvania, Philadelphia, 19104, USA
| | - Kathrin Hadasch
- Université de Paris, INSERM U1284, Center for Research and Interdisciplinarity (CRI), 75006 Paris, France
- Department of Biology, Membrane Biophysics, Technische Universität Darmstadt, 64289 Darmstadt, Germany
- Lab3 eV, Labspace Darmstadt, 64295 Darmstadt, Germany
- IANUS Verein für Friedensorientierte Technikgestaltung eV, 64289 Darmstadt, Germany
| | - Andrew Hayden
- Center for Functional Genomics, Department of Biomedical Sciences, School of Public Health, University at Albany, State University of New York, Rensselaer, 12222, USA
| | | | - Julie A Karl
- Department of Pathology and Laboratory Medicine, School of Medicine and Public Health, University of Wisconsin, Madison, Madison 53705, USA
| | | | | | | | - Zeph Landau
- Department of Computer Science, University of California, Berkeley, Berkeley, 94720, USA
| | - Kevin Land
- Mologic, Centre for Advanced Rapid Diagnostics, (CARD), Bedford Technology Park, Thurleigh MK44 2YA, England
- Department of Electrical, Electronic and Computer Engineering, University of Pretoria, 0028 Pretoria, South Africa
| | | | - Ariel B Lindner
- Université de Paris, INSERM U1284, Center for Research and Interdisciplinarity (CRI), 75006 Paris, France
| | - Benjamin E Mayer
- Department of Biology, Membrane Biophysics, Technische Universität Darmstadt, 64289 Darmstadt, Germany
- Lab3 eV, Labspace Darmstadt, 64295 Darmstadt, Germany
| | | | - Matthew T McLaughlin
- Department of Pathology and Laboratory Medicine, School of Medicine and Public Health, University of Wisconsin, Madison, Madison 53705, USA
| | - Jenny Molloy
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, England
| | - Christopher Mozsary
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10065, USA
| | - Jerry L Nadler
- Department of Pharmacology, New York Medical College, Valhalla, 10595, USA
| | - Melinee D'Silva
- Department of Pharmacology, New York Medical College, Valhalla, 10595, USA
| | - David Ng
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
| | - David H O'Connor
- Department of Pathology and Laboratory Medicine, School of Medicine and Public Health, University of Wisconsin, Madison, Madison 53705, USA
| | - Jerry E Ongerth
- University of Wollongong, Environmental Engineering, Wollongong NSW 2522, Australia
| | - Olayinka Osuolale
- Applied Environmental Metagenomics and Infectious Diseases Research (AEMIDR), Department of Biological Sciences, Elizade University, Ilara Mokin, Nigeria
| | - Ana Pinharanda
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
| | - Dennis Plenker
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Ravi Ranjan
- Genomics Resource Laboratory, Institute for Applied Life Sciences, University of Massachusetts, Amherst, 01003, USA
| | - Michael Rosbash
- Howard Hughes Medical Institute and Department of Biology, Brandeis University, Waltham, MA 02453, USA
| | | | | | | | - Scott Sherrill-Mix
- Department of Microbiology, University of Pennsylvania, Philadelphia, 19104, USA
| | | | - Shaina To
- School of Science and Engineering, Ateneo de Manila University, Quezon City 1108, Philippines
| | - Merly C Vogt
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
| | - Albert D Yu
- Howard Hughes Medical Institute and Department of Biology, Brandeis University, Waltham, MA 02453, USA
| | - Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10065, USA
- The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY 10065, USA
- The Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY 10065, USA
|
13
|
Daw Elbait G, Henschel A, Tay GK, Al Safar HS. A Population-Specific Major Allele Reference Genome From The United Arab Emirates Population. Front Genet 2021; 12:660428. [PMID: 33968136 PMCID: PMC8102833 DOI: 10.3389/fgene.2021.660428] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/01/2021] [Accepted: 03/19/2021] [Indexed: 12/30/2022] Open
Abstract
The ethnic composition of the population of a country contributes to the uniqueness of each national DNA sequencing project and, ideally, individual reference genomes are required to reduce the confounding nature of ethnic bias. This work presents a representative whole genome sequencing effort of an understudied population. Specifically, high coverage consensus sequences from 120 whole genomes and 33 whole exomes were used to construct the first ever population-specific major allele reference genome for the United Arab Emirates (UAE). When this was applied and compared to the archetype hg19 reference, assembly of local Emirati genomes was reduced by ∼19% (i.e., some 1 million fewer calls). In compiling the United Arab Emirates Reference Genome (UAERG), 23,038,090 annotated short variants (novel: 1,790,171) and 137,713 structural variants (novel: 8,462) were identified, together with their allele frequencies (AFs) and distribution across the genome. Population-specific genetic characteristics including loss-of-function variants, admixture, and ancestral haplogroup distribution were identified and reported here. We also detect a strong correlation between F_ST and admixture components in the UAE. This baseline study was conceived to establish a high-quality reference genome and a genetic variations resource to enable the development of regional population-specific initiatives and thus inform the application of population studies and precision medicine in the UAE.
Affiliation(s)
- Gihan Daw Elbait
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
| | - Andreas Henschel
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Department of Electrical Engineering and Computer Science, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
| | - Guan K. Tay
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Division of Psychiatry, Faculty of Health and Medical Sciences, The University of Western Australia, Crawley, WA, Australia
- School of Medical and Health Sciences, Edith Cowan University, Joondalup, WA, Australia
| | - Habiba S. Al Safar
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Department of Genetics and Molecular Biology, College of Medicine and Health Sciences, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
|
14
|
Du L, Liu K, Yao X, Risacher SL, Han J, Saykin AJ, Guo L, Shen L. Multi-Task Sparse Canonical Correlation Analysis with Application to Multi-Modal Brain Imaging Genetics. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:227-239. [PMID: 31634139 PMCID: PMC7156329 DOI: 10.1109/tcbb.2019.2947428] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 05/04/2023]
Abstract
Brain imaging genetics studies the genetic basis of brain structures and functionalities by integrating genotypic data such as single nucleotide polymorphisms (SNPs) and imaging quantitative traits (QTs). In this area, both multi-task learning (MTL) and sparse canonical correlation analysis (SCCA) methods are widely used since they are superior to independent and pairwise univariate analyses. MTL methods generally incorporate a few QTs and cannot select features from multiple QTs, while SCCA methods typically employ one modality of QTs to study its association with SNPs. Both MTL and SCCA are computationally expensive as the number of SNPs increases. In this paper, we propose a novel multi-task SCCA (MTSCCA) method to identify bi-multivariate associations between SNPs and multi-modal imaging QTs. MTSCCA can make use of the complementary information carried by different imaging modalities. MTSCCA enforces sparsity at the group level via the G_{2,1}-norm, and jointly selects features across multiple tasks for SNPs and QTs via the ℓ_{2,1}-norm. A fast optimization algorithm is proposed using the grouping information of SNPs. Compared with conventional SCCA methods, MTSCCA obtains better correlation coefficients and canonical weight patterns. In addition, MTSCCA runs very fast and is easy to implement, indicating its potential power in genome-wide brain-wide imaging genetics.
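The two penalties named in the abstract can be illustrated with a small sketch. It assumes the standard definitions, which the abstract itself does not state: the ℓ2,1-norm sums the Euclidean norms of the rows of a weight matrix (rows as SNPs or QTs, columns as tasks), and the group version sums one Euclidean norm per group of rows. The function names and the toy matrix are hypothetical, and this is not the authors' optimization code.

```python
import math


def l21_norm(matrix):
    """Sum of row-wise Euclidean norms: one row per feature, one column per task."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def group_l21_norm(matrix, groups):
    """Sum over groups of the Euclidean norm of all entries in that group's rows."""
    total = 0.0
    for rows in groups:
        total += math.sqrt(sum(v * v for r in rows for v in matrix[r]))
    return total


# Toy canonical weights (hypothetical): 3 features, 2 tasks.
U = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]
row_penalty = l21_norm(U)                         # 5 + 0 + 10
group_penalty = group_l21_norm(U, [[0, 2], [1]])  # sqrt(125) + 0
print(row_penalty, group_penalty)
```

Because each row or group contributes a single unsquared norm, the penalty tends to drive whole rows or whole groups to zero together, which is what allows a feature to be selected or dropped jointly across tasks.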
|
15
|
Lee H, Shuaibi A, Bell JM, Pavlichin DS, Ji HP. Unique k-mer sequences for validating cancer-related substitution, insertion and deletion mutations. NAR Cancer 2020; 2:zcaa034. [PMID: 33345188 PMCID: PMC7727745 DOI: 10.1093/narcan/zcaa034] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/23/2020] [Revised: 10/23/2020] [Accepted: 11/12/2020] [Indexed: 12/26/2022] Open
Abstract
Cancer genome sequencing has led to important discoveries such as the identification of cancer genes. However, challenges remain in the analysis of cancer genome sequencing. One significant issue is that mutations identified by multiple variant callers are frequently discordant even when using the same genome sequencing data. For insertion and deletion mutations, oftentimes there is no agreement among different callers. Identifying somatic mutations involves read mapping and variant calling, a complicated process that uses many parameters and model tuning. To validate the identification of true mutations, we developed a method using k-mer sequences. First, we characterized the landscape of unique versus non-unique k-mers in the human genome. Second, we developed a software package, KmerVC, to validate the given somatic mutations from sequencing data. Our program validates the occurrence of a mutation based on a statistically significant difference in the frequency of k-mers with and without a mutation from matched normal and tumor sequences. Third, we tested our method on both simulated and cancer genome sequencing data. Counting k-mers involving mutations effectively validated true positive mutations including insertions and deletions across different individual samples in a reproducible manner. Thus, we demonstrated a straightforward approach for rapidly validating mutations from cancer genome sequencing data.
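The validation idea described above can be sketched in a few lines. The abstract does not name the statistical test or the interface of KmerVC, so this sketch swaps in a one-sided Fisher's exact test as an assumption, and every name in it (`reads_with_kmer`, `fisher_one_sided`, the toy reads and k-mers) is hypothetical rather than taken from the software.

```python
from math import comb


def reads_with_kmer(reads, kmer):
    """Count the reads that contain the given k-mer at least once."""
    return sum(kmer in read for read in reads)


def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null for the 2x2 table [[a, b], [c, d]].

    Rows are tumor and normal; columns are mutant and wild-type k-mer counts.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, x) * comb(n - col1, row1 - x) for x in range(a, upper + 1))
    return tail / comb(n, row1)


# Toy data (hypothetical): a T>A substitution seen only in tumor reads.
wild_type_kmer, mutant_kmer = "GTTGC", "GTAGC"
tumor_reads = ["CCGTAGCTT"] * 4 + ["CCGTTGCTT"] * 4
normal_reads = ["CCGTTGCTT"] * 8

p_value = fisher_one_sided(
    reads_with_kmer(tumor_reads, mutant_kmer),
    reads_with_kmer(tumor_reads, wild_type_kmer),
    reads_with_kmer(normal_reads, mutant_kmer),
    reads_with_kmer(normal_reads, wild_type_kmer),
)
validated = p_value < 0.05  # assumed significance threshold
print(p_value, validated)
```

Counting k-mers directly in the reads avoids read mapping altogether, which is why the choice of k-mers that are unique in the genome matters for this kind of check.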
Affiliation(s)
- HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Ahmed Shuaibi
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - John M Bell
- Stanford Genome Technology Center, Stanford University, Palo Alto, CA 94304, USA
| | - Dmitri S Pavlichin
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Hanlee P Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
|
16
|
Swart Y, van Eeden G, Sparks A, Uren C, Möller M. Prospective avenues for human population genomics and disease mapping in southern Africa. Mol Genet Genomics 2020; 295:1079-1089. [PMID: 32440765 PMCID: PMC7240165 DOI: 10.1007/s00438-020-01684-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/13/2019] [Accepted: 05/06/2020] [Indexed: 12/22/2022]
Abstract
Population substructure within human populations is globally evident and a well-known confounding factor in many genetic studies. In contrast, admixture mapping exploits population stratification to detect genotype-phenotype correlations in admixed populations. Southern Africa has untapped potential for disease mapping of ancestry-specific disease risk alleles due to the distinct genetic diversity in its populations compared to other populations worldwide. This diversity contributes to a number of phenotypes, including ancestry-specific disease risk and response to pathogens. Although the 1000 Genomes Project significantly improved our understanding of genetic variation globally, southern African populations are still severely underrepresented in biomedical and human genetic studies due to insufficient large-scale publicly available data. In addition to a lack of genetic data in public repositories, existing software, algorithms and resources used for imputation and phasing of genotypic data (amongst others) are largely ineffective for populations with a complex genetic architecture such as that seen in southern Africa. This review article, therefore, aims to summarise the current limitations of conducting genetic studies on populations with a complex genetic architecture to identify potential areas for further research and development.
Affiliation(s)
- Yolandi Swart
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Gerald van Eeden
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Anel Sparks
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Caitlin Uren
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Marlo Möller
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa.
| |
|
17
|
Balachandran P, Beck CR. Structural variant identification and characterization. Chromosome Res 2020; 28:31-47. [PMID: 31907725 PMCID: PMC7131885 DOI: 10.1007/s10577-019-09623-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/15/2019] [Revised: 10/15/2019] [Accepted: 11/24/2019] [Indexed: 01/06/2023]
Abstract
Structural variant (SV) differences between human genomes can cause germline and mosaic disease as well as inter-individual variation. De-regulation of accurate DNA repair and genomic surveillance mechanisms results in a large number of SVs in cancer. Analysis of the DNA sequences at SV breakpoints can help identify pathways of mutagenesis and regions of the genome that are more susceptible to rearrangement. Large-scale SV analyses have been enabled by high-throughput genome-level sequencing on humans in the past decade. These studies have shed light on the mechanisms and prevalence of complex genomic rearrangements. Recent advancements in both sequencing and other mapping technologies as well as calling algorithms for detection of genomic rearrangements have helped propel SV detection into population-scale studies, and have begun to elucidate previously inaccessible regions of the genome. Here, we discuss the genomic organization of simple and complex SVs, the molecular mechanisms of their formation, and various ways to detect them. We also introduce methods for characterizing SVs and their consequences on human genomes.
Affiliation(s)
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA.
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, 06030, USA.
| |
|
18
|
Roddy AC, Jurek-Loughrey A, Souza J, Gilmore A, O’Reilly PG, Stupnikov A, Gonzalez de Castro D, Prise KM, Salto-Tellez M, McArt DG. NUQA: Estimating Cancer Spatial and Temporal Heterogeneity and Evolution through Alignment-Free Methods. Mol Biol Evol 2019; 36:2883-2889. [PMID: 31424551 PMCID: PMC6878956 DOI: 10.1093/molbev/msz182] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/22/2022] Open
Abstract
Longitudinal next-generation sequencing of cancer patient samples has enhanced our understanding of the evolution and progression of various cancers. As a result, and due to our increasing knowledge of heterogeneity, such sampling is becoming increasingly common in research and clinical trial sample collections. Traditionally, the evolutionary analysis of these cohorts involves the use of an aligner followed by stringent downstream analyses. However, this can lead to substantial information loss due to the vast mutational landscape that characterizes tumor samples. Here, we propose an alignment-free approach for sequence comparison, a well-established approach in a range of biological applications including typical phylogenetic classification. Such methods could be used to compare information collated in raw sequence files to allow an unsupervised assessment of the evolutionary trajectory of patient genomic profiles. To highlight this utility in cancer research, we have applied our alignment-free approach, using a previously established metric, Jensen-Shannon divergence, and a metric novel to this area, Hellinger distance, to two longitudinal cancer patient cohorts in glioma and clear cell renal cell carcinoma using our software, NUQA. We hypothesize that this approach has the potential to reveal novel information about the heterogeneity and evolutionary trajectory of spatiotemporal tumor samples, potentially revealing early events in tumorigenesis and the origins of metastases and recurrences. Key words: alignment-free, Hellinger distance, exome-seq, evolution, phylogenetics, longitudinal.
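The two metrics named in this abstract can be illustrated with a minimal sketch. It assumes that each sample is summarised as a k-mer relative-frequency profile, which is a common alignment-free representation but is not confirmed here as the representation NUQA itself uses. The function names `kmer_profile`, `hellinger` and `jensen_shannon`, and the default `k=3`, are hypothetical choices for illustration only.

```python
from collections import Counter
from math import log2, sqrt


def kmer_profile(seq, k=3):
    """Return the relative frequency of each k-mer observed in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than k")
    return {kmer: n / total for kmer, n in counts.items()}


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 to 1)."""
    keys = set(p) | set(q)
    squared = sum((sqrt(p.get(x, 0.0)) - sqrt(q.get(x, 0.0))) ** 2 for x in keys)
    return sqrt(squared / 2)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence with base-2 logarithms (0 to 1)."""
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in keys}

    def kl_to_mixture(a):
        # Kullback-Leibler divergence from a to M; zero-probability terms contribute 0
        return sum(a[x] * log2(a[x] / m[x]) for x in a if a[x] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Toy reads standing in for two time points of one patient (made-up sequences)
early = kmer_profile("ACGTACGTACGA")
late = kmer_profile("ACGTTCGTACGA")
print(hellinger(early, late), jensen_shannon(early, late))
```

Both functions return 0 for identical profiles and 1 for profiles with no k-mers in common, so pairwise values over a set of samples can feed directly into a distance matrix for clustering.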
Affiliation(s)
- Aideen C Roddy
- Centre for Cancer Research and Cell Biology, Queen’s University Belfast, Belfast, United Kingdom
| | - Anna Jurek-Loughrey
- School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, Belfast, United Kingdom
| | - Jose Souza
- Centre for Cancer Research and Cell Biology, Queen’s University Belfast, Belfast, United Kingdom
| | - Alan Gilmore
- Centre for Cancer Research and Cell Biology, Queen’s University Belfast, Belfast, United Kingdom
| | - Paul G O’Reilly
- Centre for Cancer Research and Cell Biology, Queen’s University Belfast, Belfast, United Kingdom
| | - Alexey Stupnikov
- Centre for Cancer Research and Cell Biology, Queen’s University Belfast, Belfast, United Kingdom
- Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD
| | - David Gonzalez de Castro
- Centre for Cancer Research and Cell Biology, Queen’s University Belfast, Belfast, United Kingdom
| | - Kevin M Prise
- Centre for Cancer Research and Cell Biology, Queen’s University Belfast, Belfast, United Kingdom
| | - Manuel Salto-Tellez
- Centre for Cancer Research and Cell Biology, Queen’s University Belfast, Belfast, United Kingdom
| | - Darragh G McArt
- Centre for Cancer Research and Cell Biology, Queen’s University Belfast, Belfast, United Kingdom
| |
|
19
|
Ibrahim O, Sutherland HG, Haupt LM, Griffiths LR. Saliva as a comparable-quality source of DNA for Whole Exome Sequencing on Ion platforms. Genomics 2019; 112:1437-1443. [PMID: 31445087 DOI: 10.1016/j.ygeno.2019.08.014] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/24/2019] [Revised: 08/05/2019] [Accepted: 08/19/2019] [Indexed: 11/17/2022]
Abstract
BACKGROUND: Whole Exome Sequencing (WES) utilises overlapping fragments prone to sequencing artefacts. Saliva, a non-invasive source of DNA, has been successfully used in WES studies on various platforms. This study explored the validity and quality of DNA sourced from saliva compared to whole blood on an Ion Platform. METHODS: DNA was extracted from both sample types from four individuals. WES, performed on the Ion Proton platform, was assessed for quality metrics (Depth, Genotyping Quality, etc.) and variant identification for the same source sample-pairs. RESULTS: No significant differences in quality metrics were identified between data obtained from whole blood and saliva samples, with several saliva samples demonstrating higher coverage depth. Variants within the same sample, from the two genomic DNA sources, had an average concordance similar to other studies and platforms with different chemistry. CONCLUSION: Saliva-extracted DNA provides comparable sequencing quality to whole blood for WES on Ion Torrent Platforms.
Affiliation(s)
- Omar Ibrahim
- Genomics Research Centre, Institute of Health and Biomedical Innovation, School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, Australia
| | - Heidi G Sutherland
- Genomics Research Centre, Institute of Health and Biomedical Innovation, School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, Australia
| | - Larisa M Haupt
- Genomics Research Centre, Institute of Health and Biomedical Innovation, School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, Australia
| | - Lyn R Griffiths
- Genomics Research Centre, Institute of Health and Biomedical Innovation, School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, Australia.
| |
|
20
|
Abstract
The use of the human reference genome has shaped methods and data across modern genomics. This has offered many benefits while creating a few constraints. In the following opinion, we outline the history, properties, and pitfalls of the current human reference genome. In a few illustrative analyses, we focus on its use for variant-calling, highlighting its nearness to a 'type specimen'. We suggest that switching to a consensus reference would offer important advantages over the continued use of the current reference with few disadvantages.
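As a rough illustration of the consensus idea in this abstract, the sketch below builds a reference by taking the most frequent base at each position across a set of aligned haplotypes. Reading "consensus" as the per-position major allele is an assumption made for this example, as are the function names `consensus` and `differences`, the alphabetical tie-break, and the toy sequences.

```python
from collections import Counter


def consensus(haplotypes):
    """Build a major-allele sequence from equal-length aligned haplotypes."""
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be aligned to the same length")
    bases = []
    for column in zip(*haplotypes):
        counts = Counter(column)
        top = max(counts.values())
        # Break ties alphabetically so the result is deterministic
        bases.append(min(b for b, n in counts.items() if n == top))
    return "".join(bases)


def differences(sample, reference):
    """Count positions at which a sample differs from a reference."""
    return sum(a != b for a, b in zip(sample, reference))


# Toy panel: the first haplotype plays the role of a single-individual reference
panel = ["ACGT", "ACGA", "TCGA"]
major = consensus(panel)
print(major)
print([differences(h, panel[0]) for h in panel])
print([differences(h, major) for h in panel])
```

In this toy panel, the summed number of differences is smaller against the major-allele sequence than against the first haplotype, which is the kind of effect on variant calls that the abstract attributes to a consensus reference.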
Affiliation(s)
- Sara Ballouz
- Cold Spring Harbor Laboratory, The Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY, 11724, USA
| | - Alexander Dobin
- Cold Spring Harbor Laboratory, The Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY, 11724, USA
| | - Jesse A Gillis
- Cold Spring Harbor Laboratory, The Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY, 11724, USA.
| |
|
21
|
Du L, Liu K, Yao X, Risacher SL, Han J, Guo L, Saykin AJ, Shen L. Fast Multi-Task SCCA Learning with Feature Selection for Multi-Modal Brain Imaging Genetics. Proceedings. IEEE International Conference on Bioinformatics and Biomedicine 2019; 2018:356-361. [PMID: 30881731 DOI: 10.1109/bibm.2018.8621298] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/13/2022]
Abstract
Brain imaging genetics studies the genetic basis of brain structures and functions by integrating genotypic data such as single nucleotide polymorphisms (SNPs) with imaging quantitative traits (QTs). In this area, both multi-task learning (MTL) and sparse canonical correlation analysis (SCCA) methods are widely used since they are superior to independent and pairwise univariate analyses. MTL methods generally incorporate a few QTs and are not designed for feature selection from a large number of QTs, while existing SCCA methods typically employ only one modality of QTs to study its association with SNPs. Both MTL and SCCA encounter computational challenges as the number of SNPs increases. In this paper, combining the merits of MTL and SCCA, we propose a novel multi-task SCCA (MTSCCA) learning framework to identify bi-multivariate associations between SNPs and multi-modal imaging QTs. MTSCCA could make use of the complementary information carried by different imaging modalities. Using the G2,1-norm regularization, MTSCCA treats all SNPs in the same group together to enforce sparsity at the group level. The ℓ2,1-norm penalty is used to jointly select features across multiple tasks for SNPs, and across multiple modalities for QTs. A fast optimization algorithm is proposed using the grouping information of SNPs. Compared with conventional SCCA methods, MTSCCA obtains improved performance regarding both correlation coefficients and canonical weight patterns. In addition, our method runs very fast and is easy to implement, and thus could provide a powerful tool for genome-wide brain-wide imaging genetic studies.
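The two penalties mentioned in this abstract can be made concrete with a small sketch. It assumes the usual definitions: the ℓ2,1-norm of a weight matrix is the sum of the Euclidean norms of its rows, and the group version sums one Euclidean norm per group of rows. The exact formulation in the paper may differ, and the function names `l21_norm` and `group_l21_norm`, the row/column layout, and the toy matrix are hypothetical.

```python
from math import sqrt


def l21_norm(weights):
    """Sum of row-wise Euclidean norms; rows are features, columns are tasks."""
    return sum(sqrt(sum(v * v for v in row)) for row in weights)


def group_l21_norm(weights, groups):
    """Sum over groups of the Euclidean norm of all entries in that group's rows.

    groups is a list of lists of row indices, e.g. SNPs grouped by LD block.
    """
    return sum(
        sqrt(sum(v * v for i in rows for v in weights[i]))
        for rows in groups
    )


# Toy weights: 3 SNPs (rows) by 2 imaging tasks (columns)
W = [[3.0, 4.0],
     [0.0, 0.0],
     [6.0, 8.0]]
print(l21_norm(W))
print(group_l21_norm(W, [[0, 1], [2]]))
```

Because each row (or group of rows) contributes a single norm, shrinking this penalty tends to push entire rows or groups to zero at once. That is how a SNP, or a group of SNPs, is selected or dropped jointly across all tasks.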
Affiliation(s)
- Lei Du
- School of Automation, Northwestern Polytechnical University
| | - Kefei Liu
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine
| | - Xiaohui Yao
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine
| | - Shannon L Risacher
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine
| | - Junwei Han
- School of Automation, Northwestern Polytechnical University
| | - Lei Guo
- School of Automation, Northwestern Polytechnical University
| | - Andrew J Saykin
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine
| | - Li Shen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine
| | | |
|
22
|
Kaiser VB, Semple CA. Chromatin loop anchors are associated with genome instability in cancer and recombination hotspots in the germline. Genome Biol 2018; 19:101. [PMID: 30060743 PMCID: PMC6066925 DOI: 10.1186/s13059-018-1483-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 12/14/2017] [Accepted: 07/13/2018] [Indexed: 01/07/2023] Open
Abstract
Background: Chromatin loops form a basic unit of interphase nuclear organization, with chromatin loop anchor points providing contacts between regulatory regions and promoters. However, the mutational landscape at these anchor points remains under-studied. Here, we describe the unusual patterns of somatic mutations and germline variation associated with loop anchor points and explore the underlying features influencing these patterns. Results: Analyses of whole genome sequencing datasets reveal that anchor points are strongly depleted for single nucleotide variants (SNVs) in tumours. Despite low SNV rates in their genomic neighbourhood, anchor points emerge as sites of evolutionary innovation, showing enrichment for structural variant (SV) breakpoints and a peak of SNVs at focal CTCF sites within the anchor points. Both CTCF-bound and non-CTCF anchor points harbour an excess of SV breakpoints in multiple tumour types and are prone to double-strand breaks in cell lines. Common fragile sites, which are hotspots for genome instability, also show elevated numbers of intersecting loop anchor points. Recurrently disrupted anchor points are enriched for genes with functions in cell cycle transitions and regions associated with predisposition to cancer. We also discover a novel class of CTCF-bound anchor points which overlap meiotic recombination hotspots and are enriched for the core PRDM9 binding motif, suggesting that the anchor points have been foci for diversity generated during recent human evolution. Conclusions: We suggest that the unusual chromatin environment at loop anchor points underlies the elevated rates of variation observed, marking them as sites of regulatory importance but also genomic fragility.
Affiliation(s)
- Vera B Kaiser
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK.
| | - Colin A Semple
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
| |
|
23
|
Toubia J, Conn VM, Conn SJ. Don't go in circles: confounding factors in gene expression profiling. EMBO J 2018; 37:embj.201797945. [PMID: 29735571 DOI: 10.15252/embj.201797945] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/03/2023] Open
Affiliation(s)
- John Toubia
- ACRF Cancer Genomics Facility, SA Pathology, Adelaide, SA, Australia; Centre for Cancer Biology, SA Pathology and University of South Australia, Adelaide, SA, Australia
| | - Vanessa M Conn
- Centre for Cancer Biology, SA Pathology and University of South Australia, Adelaide, SA, Australia; Flinders Centre for Innovation in Cancer, College of Medicine & Public Health, Flinders University, Adelaide, SA, Australia
| | - Simon J Conn
- Centre for Cancer Biology, SA Pathology and University of South Australia, Adelaide, SA, Australia; Flinders Centre for Innovation in Cancer, College of Medicine & Public Health, Flinders University, Adelaide, SA, Australia
| |
|
24
|
Borghesi A, Mencarelli MA, Memo L, Ferrero GB, Bartuli A, Genuardi M, Stronati M, Villani A, Renieri A, Corsello G. Intersociety policy statement on the use of whole-exome sequencing in the critically ill newborn infant. Ital J Pediatr 2017; 43:100. [PMID: 29100554 PMCID: PMC5670717 DOI: 10.1186/s13052-017-0418-0] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 08/10/2017] [Accepted: 10/17/2017] [Indexed: 01/05/2023] Open
Abstract
The rapid advancement of next-generation sequencing (NGS) technology and the decrease in costs for whole-exome sequencing (WES) and whole-genome sequencing (WGS) have prompted their clinical application in several fields of medicine. Currently, there are no specific guidelines for the use of NGS in the field of neonatal medicine and in the diagnosis of genetic diseases in critically ill newborn infants. As a consequence, NGS may be underused, with a reduced diagnostic success rate, or overused, with increased costs for the healthcare system. Most genetic diseases may already be expressed during the neonatal age, but their identification may be complicated by nonspecific presentation, especially in the setting of critical clinical conditions. The differential diagnosis process in the neonatal intensive care unit (NICU) may be time-consuming, uncomfortable for the patient due to repeated sampling, and ineffective in reaching a molecular diagnosis during NICU stay. Serial gene sequencing (Sanger sequencing) may be successful only for conditions for which the clinical phenotype strongly suggests a diagnostic hypothesis and for genetically homogeneous diseases. Newborn screenings with Guthrie cards, which vary from country to country, are designed to test for only a few dozen genetic diseases out of the more than 6000 diseases for which a genetic characterization is available. The use of WES in selected cases in the NICU may overcome these issues. We present an intersociety document that aims to define the best indications for the use of WES in different clinical scenarios in the NICU. We propose that WES is used in the NICU for critically ill newborn infants when an early diagnosis is desirable to guide the clinical management during NICU stay, when a strong hypothesis cannot be formulated based on the clinical phenotype or the disease is genetically heterogeneous, and when specific non-genetic laboratory tests are not available. The use of WES may reduce the time for diagnosis in infants during NICU stay and may eventually result in cost-effectiveness.
Affiliation(s)
- Alessandro Borghesi
- Neonatal Intensive Care Unit, Fondazione IRCCS Policlinico San Matteo, Piazzale Golgi, 19, 27100 Pavia, Italy
| | | | - Luigi Memo
- Pediatric Department, S. Martino Hospital, Belluno, Italy
| | | | - Andrea Bartuli
- Rare Diseases and Medical Genetic Unit, Bambino Gesù Children’s Hospital, IRCCS, Rome, Italy
| | - Maurizio Genuardi
- Institute of Genomic Medicine, Università Cattolica Del Sacro Cuore, Fondazione Policlinico A. Gemelli, Rome, Italy
| | - Mauro Stronati
- Neonatal Intensive Care Unit, Fondazione IRCCS Policlinico San Matteo, Piazzale Golgi, 19, 27100 Pavia, Italy
| | - Alberto Villani
- Pediatric and Infectious Disease Unit, Bambino Gesù Children’s Hospital, IRCCS, Rome, Italy
| | - Alessandra Renieri
- Genetica Medica, Azienda Ospedaliera Universitaria Senese, Siena, Italy
- Medical Genetics, University of Siena, Siena, Italy
| | - Giovanni Corsello
- Operative Unit of Pediatrics and Neonatal Intensive Therapy, Mother and Child Department, University of Palermo, Palermo, Italy
| |
|
25
|
Worthey EA. Analysis and Annotation of Whole-Genome or Whole-Exome Sequencing Derived Variants for Clinical Diagnosis. Curr Protoc Hum Genet 2017; 95:9.24.1-9.24.28. [PMID: 29044471 DOI: 10.1002/cphg.49] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/17/2022]
Abstract
Over the last 10 years, next-generation sequencing (NGS) has transformed genomic research through substantial advances in technology and reduction in the cost of sequencing, and also in the systems required for analysis of these large volumes of data. This technology is now being used as a standard molecular diagnostic test in some clinical settings. The advances in sequencing have come so rapidly that the major bottleneck in identification of causal variants is no longer the sequencing or analysis (given access to appropriate tools), but rather clinical interpretation. Interpretation of genetic findings in a complex and ever changing clinical setting is scarcely a new challenge, but the task is increasingly complex in clinical genome-wide sequencing given the dramatic increase in dataset size and complexity. This increase requires application of appropriate interpretation tools, as well as development and application of appropriate methodologies and standard procedures. This unit provides an overview of these items. Specific challenges related to implementation of genome-wide sequencing in a clinical setting are discussed. © 2017 by John Wiley & Sons, Inc.
|
26
|
Highly Variable Genomic Landscape of Endogenous Retroviruses in the C57BL/6J Inbred Strain, Depending on Individual Mouse, Gender, Organ Type, and Organ Location. Int J Genomics 2017; 2017:3152410. [PMID: 28951865 PMCID: PMC5603323 DOI: 10.1155/2017/3152410] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/27/2017] [Revised: 06/16/2017] [Accepted: 07/03/2017] [Indexed: 11/17/2022] Open
Abstract
Transposable repetitive elements, named the "TREome," represent ~40% of the mouse genome. We postulate that the germ line genome undergoes temporal and spatial diversification into somatic genomes in conjunction with the TREome activity. C57BL/6J inbred mice were subjected to genomic landscape analyses using a TREome probe from murine leukemia virus-type endogenous retroviruses (MLV-ERVs). None shared the same MLV-ERV landscape within each comparison group: (1) sperm and 18 tissues from one mouse, (2) six brain compartments from two females, (3) spleen and thymus samples from four age groups, (4) three spatial tissue sets from two females, and (5) kidney and liver samples from three females and three males. Interestingly, males had more genomic MLV-ERV copies than females; moreover, only in the males, the kidneys had higher MLV-ERV copies than the livers. Perhaps, the mouse-, gender-, and tissue/cell-dependent MLV-ERV landscapes are linked to the individual-specific and dynamic phenotypes of the C57BL/6J inbred population.
|
27
|
Schneider VA, Graves-Lindsay T, Howe K, Bouk N, Chen HC, Kitts PA, Murphy TD, Pruitt KD, Thibaud-Nissen F, Albracht D, Fulton RS, Kremitzki M, Magrini V, Markovic C, McGrath S, Steinberg KM, Auger K, Chow W, Collins J, Harden G, Hubbard T, Pelan S, Simpson JT, Threadgold G, Torrance J, Wood JM, Clarke L, Koren S, Boitano M, Peluso P, Li H, Chin CS, Phillippy AM, Durbin R, Wilson RK, Flicek P, Eichler EE, Church DM. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res 2017; 27:849-864. [PMID: 28396521 PMCID: PMC5411779 DOI: 10.1101/gr.213611.116] [Citation(s) in RCA: 569] [Impact Index Per Article: 81.3] [Received: 07/29/2016] [Accepted: 03/14/2017] [Indexed: 11/24/2022]
Abstract
The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.
Affiliation(s)
- Valerie A Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Tina Graves-Lindsay
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
| | - Kerstin Howe
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Nathan Bouk
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Hsiu-Chuan Chen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Paul A Kitts
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Derek Albracht
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
| | - Robert S Fulton
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
| | - Milinn Kremitzki
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
| | - Vincent Magrini
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
| | - Chris Markovic
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
| | - Sean McGrath
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
| | | | - Kate Auger
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - William Chow
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Joanna Collins
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Glenn Harden
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Timothy Hubbard
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Sarah Pelan
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Jared T Simpson
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Glen Threadgold
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - James Torrance
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Jonathan M Wood
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Laura Clarke
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Sergey Koren
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
| | | | - Paul Peluso
- Pacific Biosciences, Menlo Park, California 94025, USA
| | - Heng Li
- Broad Institute, Cambridge, Massachusetts 02142, USA
| | | | - Adam M Phillippy
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
| | - Richard Durbin
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Richard K Wilson
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
| | - Deanna M Church
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| |
|
28
|
Mason CE, Afshinnekoo E, Tighe S, Wu S, Levy S. International Standards for Genomes, Transcriptomes, and Metagenomes. J Biomol Tech 2017; 28:8-18. [PMID: 28337071 PMCID: PMC5359768 DOI: 10.7171/jbt.17-2801-006] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Indexed: 12/22/2022]
Abstract
Challenges and biases in preparing, characterizing, and sequencing DNA and RNA can have significant impacts on research in genomics across all kingdoms of life, including experiments in single-cells, RNA profiling, and metagenomics (across multiple genomes). Technical artifacts and contamination can arise at each point of sample manipulation, extraction, sequencing, and analysis. Thus, the measurement and benchmarking of these potential sources of error are of paramount importance as next-generation sequencing (NGS) projects become more global and ubiquitous. Fortunately, a variety of methods, standards, and technologies have recently emerged that improve measurements in genomics and sequencing, from the initial input material to the computational pipelines that process and annotate the data. Here we review current standards and their applications in genomics, including whole genomes, transcriptomes, mixed genomic samples (metagenomes), and the modified bases within each (epigenomes and epitranscriptomes). These standards, tools, and metrics are critical for quantifying the accuracy of NGS methods, which will be essential for robust approaches in clinical genomics and precision medicine.
Affiliation(s)
- Christopher E. Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York 10065, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, USA
- Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, New York, New York 10065, USA
| | - Ebrahim Afshinnekoo
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York 10065, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, USA
- School of Medicine, New York Medical College, Valhalla, New York 10595, USA
| | - Scott Tighe
- Advanced Genomics Lab, University of Vermont Cancer Center, Burlington, Vermont 05405, USA
| | - Shixiu Wu
- Hangzhou Cancer Institute in Hangzhou Cancer Hospital, Hangzhou, China; and
| | - Shawn Levy
- HudsonAlpha Institute of Technology, Huntsville, Alabama 35806, USA
| |
Collapse
|
29
|
Ahsanuddin S, Afshinnekoo E, Gandara J, Hakyemezoğlu M, Bezdan D, Minot S, Greenfield N, Mason CE. Assessment of REPLI-g Multiple Displacement Whole Genome Amplification (WGA) Techniques for Metagenomic Applications. J Biomol Tech 2017; 28:46-55. [PMID: 28344519 DOI: 10.7171/jbt.17-2801-008] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 12/21/2022]
Abstract
Amplification of minute quantities of DNA is a fundamental challenge in low-biomass metagenomic and microbiome studies because of potential biases in coverage, guanine-cytosine (GC) content, and altered species abundances. Whole genome amplification (WGA), although widely used, is notorious for introducing artifact sequences, either by amplifying laboratory contaminants or by nonrandom amplification of a sample's DNA. In this study, we investigate the effect of REPLI-g multiple displacement amplification (MDA; Qiagen, Valencia, CA, USA) on sequencing data quality and species abundance detection in 8 paired metagenomic samples and 1 titrated, mixed control sample. We extracted and sequenced genomic DNA (gDNA) from 8 environmental samples and compared the quality of the sequencing data for the MDA and their corresponding non-MDA samples. The degree of REPLI-g MDA bias was evaluated by sequence metrics, species composition, and cross-validating observed species abundance and species diversity estimates using the One Codex and MetaPhlAn taxonomic classification tools. Here, we provide evidence of the overall efficacy of REPLI-g MDA on retaining sequencing data quality and species abundance measurements while providing increased yields of high-fidelity DNA. We find that species abundance estimates are largely consistent across samples, even with REPLI-g amplification, as demonstrated by the Spearman's rank order coefficient (R2 > 0.8). However, REPLI-g MDA often produced fewer classified reads at the species, genera, and family level, resulting in decreased species diversity. We also observed some areas with the PCR "jackpot effect," with varying input DNA values for the Metagenomics Research Group (MGRG) controls at specific genomic loci. We visualize this effect in whole genome coverage plots and with sequence composition analyses and note these caveats of the MDA method. 
Despite overall concordance of species abundance between the amplified and unamplified samples, these results demonstrate that amplification of DNA using the REPLI-g method has some limitations. These concerns could be addressed by future improvements in the enzymes or methods for REPLI-g to be considered a >99% robust method for increasing the amount of high-fidelity DNA from low-biomass samples or at the very least, accounted for during computational analysis of MDA samples.
Affiliation(s)
- Sofia Ahsanuddin
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York, USA
- Ebrahim Afshinnekoo
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York, USA
  - School of Medicine, New York Medical College, Valhalla, New York, USA
- Jorge Gandara
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York, USA
- Mustafa Hakyemezoğlu
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York, USA
- Daniela Bezdan
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York, USA
- Christopher E Mason
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York, USA
  - Feil Family Brain & Mind Research Institute, New York, New York, USA
|
30
|
Afshinnekoo E, Chou C, Alexander N, Ahsanuddin S, Schuetz AN, Mason CE. Precision Metagenomics: Rapid Metagenomic Analyses for Infectious Disease Diagnostics and Public Health Surveillance. J Biomol Tech 2017; 28:40-45. [PMID: 28337072 DOI: 10.7171/jbt.17-2801-007] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Indexed: 12/22/2022]
Abstract
Next-generation sequencing (NGS) technologies have ushered in the era of precision medicine, transforming the way we treat cancer patients and diagnose disease. Concomitantly, the advent of these technologies has created a surge of microbiome and metagenomic studies over the last decade, many of which are focused on investigating the host-gene-microbial interactions responsible for the development and spread of infectious diseases, as well as delineating their key role in maintaining health. As we continue to discover more information about the etiology of infectious diseases, the translational potential of metagenomic NGS methods for treatment and rapid diagnosis is becoming abundantly clear. Here, we present a robust protocol for the implementation and application of "precision metagenomics" across various sequencing platforms for clinical samples. Such a pipeline integrates DNA/RNA extraction, library preparation, sequencing, and bioinformatics analyses for taxonomic classification, antimicrobial resistance (AMR) marker screening, and functional analysis (biochemical and metabolic pathway abundance). Moreover, the pipeline has 3 tracks: STAT for results within 24 h; Comprehensive that affords a more in-depth analysis and takes between 5 and 7 d, but offers antimicrobial resistance information; and Targeted, which also requires 5-7 d, but with more sensitive analysis for specific pathogens. Finally, we discuss the challenges that need to be addressed before full integration in the clinical setting.
Affiliation(s)
- Ebrahim Afshinnekoo
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York 10065, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, New York 10021, USA
  - School of Medicine, New York Medical College, Valhalla, New York 10595, USA
- Chou Chou
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York 10065, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, New York 10021, USA
- Noah Alexander
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York 10065, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, New York 10021, USA
- Sofia Ahsanuddin
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York 10065, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, New York 10021, USA
- Audrey N Schuetz
  - Department of Laboratory Medicine and Pathology, Mayo Clinic College of Medicine and Science, Rochester, Minnesota 55905, USA
- Christopher E Mason
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York 10065, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, New York 10021, USA
  - Feil Family Brain & Mind Research Institute, New York, New York 10065, USA
|
31
|
GenePANDA-a novel network-based gene prioritizing tool for complex diseases. Sci Rep 2017; 7:43258. [PMID: 28252032 PMCID: PMC5333103 DOI: 10.1038/srep43258] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 09/06/2016] [Accepted: 01/23/2017] [Indexed: 02/08/2023] Open
Abstract
Here we describe GenePANDA, a novel network-based tool for prioritizing candidate disease genes. GenePANDA assesses whether a gene is likely a candidate disease gene based on its relative distance to known disease genes in a functional association network. A unique feature of GenePANDA is the introduction of adjusted network distance derived by normalizing the raw network distance between two genes with their respective mean raw network distance to all other genes in the network. The use of adjusted network distance significantly improves GenePANDA’s performance on prioritizing complex disease genes. GenePANDA achieves superior performance over five previously published algorithms for prioritizing disease genes. Finally, GenePANDA can assist in prioritizing functionally important SNPs identified by GWAS.
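As a rough illustration of the adjusted network distance described in this abstract, the sketch below computes raw shortest-path distances on a small unweighted graph and divides each pairwise distance by the geometric mean of the two genes' mean raw distances. The abstract only states that the raw distance is normalized by the genes' respective mean distances to all other genes; the geometric-mean form, the unweighted breadth-first search, the assumption of a connected network, and the toy gene names are all choices made here for illustration, not the published GenePANDA implementation.

```python
from collections import deque
from math import sqrt


def bfs_distances(graph, source):
    """Shortest-path hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def adjusted_distances(graph):
    """Adjusted distance for every ordered gene pair.

    Assumption (not from the abstract): the normaliser is the geometric
    mean of the two genes' mean raw distances, and the graph is connected.
    """
    raw = {gene: bfs_distances(graph, gene) for gene in graph}
    # Mean raw distance of each gene to all *other* genes (self-distance is 0).
    mean_raw = {
        gene: sum(raw[gene].values()) / (len(raw[gene]) - 1) for gene in graph
    }
    adjusted = {}
    for a in graph:
        for b, d in raw[a].items():
            if a != b:
                adjusted[(a, b)] = d / sqrt(mean_raw[a] * mean_raw[b])
    return adjusted


# Hypothetical toy network: hub gene B connected to A, C and D.
toy_network = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}

if __name__ == "__main__":
    for pair, value in sorted(adjusted_distances(toy_network).items()):
        print(pair, round(value, 3))
```

In this toy network the hub B has a small mean raw distance, so its distances to other genes are scaled down less than distances between two peripheral genes; this is the kind of hub correction such a normalization is meant to provide.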
|
32
|
A Fast SCCA Algorithm for Big Data Analysis in Brain Imaging Genetics. Graphs in Biomedical Image Analysis, Computational Anatomy and Imaging Genetics 2017. [DOI: 10.1007/978-3-319-67675-3_19] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/07/2023]
|
33
|
Brumme CJ, Poon AFY. Promises and pitfalls of Illumina sequencing for HIV resistance genotyping. Virus Res 2016; 239:97-105. [PMID: 27993623 DOI: 10.1016/j.virusres.2016.12.008] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 09/20/2016] [Revised: 12/15/2016] [Accepted: 12/15/2016] [Indexed: 12/13/2022]
Abstract
Genetic sequencing ("genotyping") plays a critical role in the modern clinical management of HIV infection. This virus evolves rapidly within patients because of its error-prone reverse transcriptase and short generation time. Consequently, HIV variants with mutations that confer resistance to one or more antiretroviral drugs can emerge during sub-optimal treatment. There are now multiple HIV drug resistance interpretation algorithms that take the region of the HIV genome encoding the major drug targets as inputs; expert use of these algorithms can significantly improve clinical outcomes in HIV treatment. Next-generation sequencing has the potential to revolutionize HIV resistance genotyping by lowering the threshold at which rare but clinically significant HIV variants can be detected reproducibly, and by conferring improved cost-effectiveness in high-throughput scenarios. In this review, we discuss the relative merits and challenges of deploying the Illumina MiSeq instrument for clinical HIV genotyping.
Affiliation(s)
- Chanson J Brumme
  - BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
- Art F Y Poon
  - Department of Pathology & Laboratory Medicine, Western University, London, Ontario, Canada
|
34
|
An ethnically relevant consensus Korean reference genome is a step towards personal reference genomes. Nat Commun 2016; 7:13637. [PMID: 27882922 PMCID: PMC5123046 DOI: 10.1038/ncomms13637] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 03/24/2016] [Accepted: 10/18/2016] [Indexed: 12/20/2022] Open
Abstract
Human genomes are routinely compared against a universal reference. However, this strategy could miss population-specific and personal genomic variations, which may be detected more efficiently using an ethnically relevant or personal reference. Here we report a hybrid assembly of a Korean reference genome (KOREF) for constructing personal and ethnic references by combining sequencing and mapping methods. We also build its consensus variome reference, providing information on millions of variants from 40 additional ethnically homogeneous genomes from the Korean Personal Genome Project. We find that the ethnically relevant consensus reference can be beneficial for efficient variant detection. Systematic comparison of human assemblies shows the importance of assembly quality, suggesting the necessity of new technologies to comprehensively map ethnic and personal genomic structure variations. In the era of large-scale population genome projects, the leveraging of ethnicity-specific genome assemblies as well as the human reference genome will accelerate mapping all human genome diversity.
|
35
|
Comparing genetic variants detected in the 1000 genomes project with SNPs determined by the International HapMap Consortium. J Genet 2016; 94:731-40. [PMID: 26690529 DOI: 10.1007/s12041-015-0588-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 02/06/2023]
Abstract
Single-nucleotide polymorphisms (SNPs) determined based on SNP arrays from the international HapMap consortium (HapMap) and the genetic variants detected in the 1000 genomes project (1KGP) can serve as two references for genomewide association studies (GWAS). We conducted comparative analyses to provide a means for assessing concerns regarding SNP array-based GWAS findings as well as for realistically bounding expectations for next generation sequencing (NGS)-based GWAS. We calculated and compared base composition, transitions to transversions ratio, minor allele frequency and heterozygous rate for SNPs from HapMap and 1KGP for the 622 common individuals. We analysed the genotype discordance between HapMap and 1KGP to assess consistency in the SNPs from the two references. In 1KGP, 90.58% of 36,817,799 SNPs detected were not measured in HapMap. More SNPs with minor allele frequencies less than 0.01 were found in 1KGP than HapMap. The two references have low discordance (generally smaller than 0.02) in genotypes of common SNPs, with most discordance from heterozygous SNPs. Our study demonstrated that SNP array-based GWAS findings were reliable and useful, although only a small portion of genetic variances were explained. NGS can detect not only common but also rare variants, supporting the expectation that NGS-based GWAS will be able to incorporate a much larger portion of genetic variance than SNP arrays-based GWAS.
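The summary statistics named in this abstract are straightforward to compute once genotypes are coded numerically. The following is a minimal sketch rather than the authors' code: it assumes SNPs are given as (reference, alternate) base pairs and genotypes as alternate-allele counts (0, 1, 2, or None when missing), and all function names are hypothetical.

```python
# Purine<->purine and pyrimidine<->pyrimidine substitutions are transitions;
# every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snps):
    """Transition/transversion ratio for a list of (ref, alt) base pairs."""
    ti = sum(1 for ref, alt in snps if (ref, alt) in TRANSITIONS)
    tv = sum(1 for ref, alt in snps if ref != alt and (ref, alt) not in TRANSITIONS)
    return ti / tv if tv else float("inf")


def minor_allele_frequency(genotypes):
    """MAF at one SNP from alt-allele counts (0/1/2); None means missing."""
    called = [g for g in genotypes if g is not None]
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def heterozygous_rate(genotypes):
    """Fraction of called genotypes that are heterozygous (count == 1)."""
    called = [g for g in genotypes if g is not None]
    return sum(1 for g in called if g == 1) / len(called)


def genotype_discordance(calls_a, calls_b):
    """Share of genotypes that differ between two call sets.

    Only positions called in both sets are compared.
    """
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a is not None and b is not None]
    return sum(1 for a, b in pairs if a != b) / len(pairs)
```

For example, `genotype_discordance(array_calls, sequence_calls)` applied to the same individuals at shared SNPs gives the kind of per-reference discordance figure the abstract reports as generally below 0.02; the variable names here are placeholders.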
|
36
|
Vicini P, Fields O, Lai E, Litwack ED, Martin AM, Morgan TM, Pacanowski MA, Papaluca M, Perez OD, Ringel MS, Robson M, Sakul H, Vockley J, Zaks T, Dolsten M, Søgaard M. Precision medicine in the age of big data: The present and future role of large-scale unbiased sequencing in drug discovery and development. Clin Pharmacol Ther 2015; 99:198-207. [PMID: 26536838 DOI: 10.1002/cpt.293] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 10/14/2015] [Accepted: 10/30/2015] [Indexed: 12/15/2022]
Abstract
High throughput molecular and functional profiling of patients is a key driver of precision medicine. DNA and RNA characterization has been enabled at unprecedented cost and scale through rapid, disruptive progress in sequencing technology, but challenges persist in data management and interpretation. We analyze the state-of-the-art of large-scale unbiased sequencing (LUS) in drug discovery and development, including technology, application, ethical, regulatory, policy and commercial considerations, and discuss issues of LUS implementation in clinical and regulatory practice.
Affiliation(s)
- P Vicini
  - Pfizer Worldwide Research & Development, La Jolla, California, Collegeville, Pennsylvania, and New York, New York, USA
- O Fields
  - Pfizer Worldwide Research & Development, La Jolla, California, Collegeville, Pennsylvania, and New York, New York, USA
- E Lai
  - Takeda Pharmaceuticals International, Deerfield, Illinois, USA
- E D Litwack
  - Food and Drug Administration, Silver Spring, Maryland, USA
- A-M Martin
  - GlaxoSmithKline, Collegeville, Pennsylvania, USA
- T M Morgan
  - Novartis Institutes for Biomedical Research, Cambridge, Massachusetts, and East Hanover, New Jersey, USA
- M A Pacanowski
  - Food and Drug Administration, Silver Spring, Maryland, USA
- O D Perez
  - Pfizer Worldwide Research & Development, La Jolla, California, Collegeville, Pennsylvania, and New York, New York, USA
- M S Ringel
  - Boston Consulting Group, Boston, Massachusetts, USA
- M Robson
  - Novartis Institutes for Biomedical Research, Cambridge, Massachusetts, and East Hanover, New Jersey, USA
- H Sakul
  - Pfizer Worldwide Research & Development, La Jolla, California, Collegeville, Pennsylvania, and New York, New York, USA
- J Vockley
  - Inova Translational Medicine Institute, Falls Church, Virginia, USA
- T Zaks
  - Sanofi, Cambridge, Massachusetts, USA
- M Dolsten
  - Pfizer Worldwide Research & Development, La Jolla, California, Collegeville, Pennsylvania, and New York, New York, USA
- M Søgaard
  - Pfizer Worldwide Research & Development, La Jolla, California, Collegeville, Pennsylvania, and New York, New York, USA
|
37
|
Ni G, Strom TM, Pausch H, Reimer C, Preisinger R, Simianer H, Erbe M. Comparison among three variant callers and assessment of the accuracy of imputation from SNP array data to whole-genome sequence level in chicken. BMC Genomics 2015; 16:824. [PMID: 26486989 PMCID: PMC4618161 DOI: 10.1186/s12864-015-2059-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 06/09/2015] [Accepted: 10/09/2015] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND The technical progress in the last decade has made it possible to sequence millions of DNA reads in a relatively short time frame. Several variant callers based on different algorithms have emerged and have made it possible to extract single nucleotide polymorphisms (SNPs) out of the whole-genome sequence. Often, only a few individuals of a population are sequenced completely and imputation is used to obtain genotypes for all sequence-based SNP loci for other individuals, which have been genotyped for a subset of SNPs using a genotyping array. METHODS First, we compared the sets of variants detected with different variant callers, namely GATK, freebayes and SAMtools, and checked the quality of genotypes of the called variants in a set of 50 fully sequenced white and brown layers. Second, we assessed the imputation accuracy (measured as the correlation between imputed and true genotype per SNP and per individual, and genotype conflict between father-progeny pairs) when imputing from high density SNP array data to whole-genome sequence using data from around 1000 individuals from six different generations. Three different imputation programs (Minimac, FImpute and IMPUTE2) were checked in different validation scenarios. RESULTS There were 1,741,573 SNPs detected by all three callers on the studied chromosomes 3, 6, and 28, which was 71.6 % (81.6 %, 88.0 %) of SNPs detected by GATK (SAMtools, freebayes) in total. Genotype concordance (GC) defined as the proportion of individuals whose array-derived genotypes are the same as the sequence-derived genotypes over all non-missing SNPs on the array were 0.98 (GATK), 0.97 (freebayes) and 0.98 (SAMtools). Furthermore, the percentage of variants that had high values (>0.9) for another three measures (non-reference sensitivity, non-reference genotype concordance and precision) were 90 (88, 75) for GATK (SAMtools, freebayes). 
With all imputation programs, correlation between original and imputed genotypes was >0.95 on average with randomly masked 1000 SNPs from the SNP array and >0.85 for a leave-one-out cross-validation within sequenced individuals. CONCLUSIONS Performance of all variant callers studied was very good in general, particularly for GATK and SAMtools. FImpute performed slightly worse than Minimac and IMPUTE2 in terms of genotype correlation, especially for SNPs with low minor allele frequency, while it had lowest numbers in Mendelian conflicts in available father-progeny pairs. Correlations of real and imputed genotypes remained constantly high even if individuals to be imputed were several generations away from the sequenced individuals.
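The two imputation accuracy measures mentioned in this abstract, per-SNP correlation between true and imputed genotypes and genotype conflicts in father-progeny pairs, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's pipeline: genotypes are coded as alternate-allele counts (0, 1, 2), matrices are lists of rows with one row per individual, a conflict is taken to mean opposing homozygotes, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns None when either vector has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a monomorphic SNP
    return cov / sqrt(var_x * var_y)


def per_snp_accuracy(true_genotypes, imputed_genotypes):
    """Correlation between true and imputed genotypes, one value per SNP.

    Both arguments are matrices with individuals as rows and SNPs as columns.
    """
    n_snps = len(true_genotypes[0])
    return [
        pearson([row[j] for row in true_genotypes],
                [row[j] for row in imputed_genotypes])
        for j in range(n_snps)
    ]


def mendelian_conflicts(father_progeny_pairs):
    """Count opposing-homozygote conflicts among (father, progeny) genotypes.

    Assumption: a father with genotype 0 cannot have a progeny with
    genotype 2, and vice versa; all other combinations are compatible.
    """
    return sum(1 for f, p in father_progeny_pairs if {f, p} == {0, 2})
```

Averaging the non-missing values returned by `per_snp_accuracy` gives a single summary comparable in spirit to the ">0.95 on average" figure quoted above, although the exact averaging used in the study is not stated in the abstract.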
Affiliation(s)
- Guiyan Ni
  - Animal Breeding and Genetics Group, Georg-August-Universität, Göttingen, Germany
- Tim M Strom
  - Institute of Human Genetics, Helmholtz Zentrum München, Neuherberg, Germany
- Hubert Pausch
  - Chair of Animal Breeding, Technische Universität München, Freising, Germany
- Christian Reimer
  - Animal Breeding and Genetics Group, Georg-August-Universität, Göttingen, Germany
- Henner Simianer
  - Animal Breeding and Genetics Group, Georg-August-Universität, Göttingen, Germany
- Malena Erbe
  - Animal Breeding and Genetics Group, Georg-August-Universität, Göttingen, Germany
  - Institute for Animal Breeding, Bavarian State Research Centre for Agriculture, Grub, Germany
|
38
|
Clark PM, Kunkel M, Monos DS. The dichotomy between disease phenotype databases and the implications for understanding complex diseases involving the major histocompatibility complex. Int J Immunogenet 2015; 42:413-22. [PMID: 26456690 DOI: 10.1111/iji.12236] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 02/06/2015] [Revised: 07/14/2015] [Accepted: 08/16/2015] [Indexed: 01/08/2023]
Abstract
Many genes related to innate and adaptive immunity reside within the major histocompatibility complex (MHC) and have been associated with a multitude of complex, immune-related disorders. Despite years of genetic study, this region has seen few causative determinants discovered for immune-mediated diseases. Reported associations have been curated in various databases including the Genetic Association Database, NCBI database of clinically relevant variants (ClinVar) and the Human Gene Mutation Database and together capture genetic associations and annotated pathogenic loci within the MHC and across the genome for a variety of complex, immune-mediated diseases. A review of these three distinct databases reveals disparate annotations between associated genes and pathogenic loci, alluding to the polygenic, multifactorial nature of immune-mediated diseases and the pleiotropic character of genes within the MHC. The technical limitations and inherent biases imposed by current approaches and technologies in studying the MHC create a strong case for the need to perform targeted deep sequencing of the MHC and other immunologically relevant loci in order to fully elucidate and study the causative elements of complex immune-mediated diseases.
Affiliation(s)
- P M Clark
  - Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- M Kunkel
  - Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- D S Monos
  - Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
39
|
Reinert K, Langmead B, Weese D, Evers DJ. Alignment of Next-Generation Sequencing Reads. Annu Rev Genomics Hum Genet 2015; 16:133-51. [DOI: 10.1146/annurev-genom-090413-025358] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Indexed: 11/09/2022]
Affiliation(s)
- Knut Reinert
  - Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Ben Langmead
  - Department of Computer Science and Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland 21218
- David Weese
  - Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
|
40
|
Tetreault M, Bareke E, Nadaf J, Alirezaie N, Majewski J. Whole-exome sequencing as a diagnostic tool: current challenges and future opportunities. Expert Rev Mol Diagn 2015; 15:749-60. [PMID: 25959410 DOI: 10.1586/14737159.2015.1039516] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Indexed: 12/26/2022]
Abstract
Whole-exome sequencing (WES) represents a significant breakthrough in the field of human genetics. This technology has largely contributed to the identification of new disease-causing genes and is now entering clinical laboratories. WES represents a powerful tool for diagnosis and could reduce the 'diagnostic odyssey' for many patients. In this review, we present a technical overview of WES analysis, variant annotation and interpretation in a clinical setting. We evaluate the usefulness of clinical WES in different clinical indications, such as rare diseases, cancer and complex diseases. Finally, we discuss the efficacy of WES as a diagnostic tool and the impact on patient management.
Affiliation(s)
- Martine Tetreault
  - Department of Human Genetics, McGill University, Montreal, QC H3A 1B1, Canada
|
41
|
Sequence and analysis of a whole genome from Kuwaiti population subgroup of Persian ancestry. BMC Genomics 2015; 16:92. [PMID: 25765185 PMCID: PMC4336699 DOI: 10.1186/s12864-015-1233-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 09/22/2014] [Accepted: 01/12/2015] [Indexed: 12/30/2022] Open
Abstract
Background The 1000 Genomes Project paved the way for sequencing diverse human populations. New genome projects are being established to sequence underrepresented populations, helping in understanding human genetic diversity. The Kuwait Genome Project is an initiative to sequence individual genomes from the three subgroups of the Kuwaiti population, namely Saudi Arabian tribe, “tent-dwelling” Bedouin, and Persian, which attribute their ancestry to different regions in the Arabian Peninsula and to modern-day Iran (West Asia). These subgroups are in line with settlement history and are confirmed by genetic studies. In this work, we report the whole genome sequence of a Kuwaiti native from the Persian subgroup at >37X coverage. Results We document 3,573,824 SNPs, 404,090 insertions/deletions, and 11,138 structural variations. Out of the reported SNPs and indels, 85,939 are novel. We identify 295 ‘loss-of-function’ and 2,314 ‘deleterious’ coding variants, some of which carry homozygous genotypes in the sequenced genome; the associated phenotypes include pharmacogenomic traits such as greater triglyceride lowering ability with fenofibrate treatment, and requirement of high warfarin dosage to elicit anticoagulation response. 6,328 non-coding SNPs associate with 811 phenotype traits: in congruence with the medical history of the participant for Type 2 diabetes and β-Thalassemia, and of the participant’s family for migraine, 72 (of 159 known) Type 2 diabetes, 3 (of 4) β-Thalassemia, and 76 (of 169) migraine variants are seen in the genome. Intergenome comparisons based on shared disease-causing variants position the sequenced genome between Asian and European genomes, in congruence with the geographical location of the region. On comparison, bead arrays perform better than sequencing platforms in correctly calling genotypes in low-coverage sequenced genome regions; however, a novel SNP or indel near the genotype calling position can lead to false calls with bead arrays.
Conclusions We report, for the first time, a reference genome resource for the population of Persian ancestry. The resource provides a starting point for designing large-scale genetic studies in the Peninsula, including Kuwait, and in the Persian population. Such efforts on populations under-represented in global genome variation surveys help augment current knowledge on human genome diversity.
|
42
|
Wijaya E, Shimizu K, Asai K, Hamada M. Reference-free prediction of rearrangement breakpoint reads. Bioinformatics 2014; 30:2559-67. [PMID: 24876376 DOI: 10.1093/bioinformatics/btu360] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/28/2022]
Abstract
MOTIVATION Chromosome rearrangement events are triggered by atypical breaking and rejoining of DNA molecules, which are observed in many cancer-related diseases. The detection of rearrangement is typically done by using short reads generated by next-generation sequencing (NGS) and combining the reads with knowledge of a reference genome. Because structural variations and genomes differ from one person to another, intermediate comparison via a reference genome may lead to loss of information. RESULTS In this article, we propose a reference-free method for detecting clusters of breakpoints from the chromosomal rearrangements. This is done by directly comparing a set of NGS normal reads with another set that may be rearranged. Our method SlideSort-BPR (breakpoint reads) is based on a fast algorithm for all-against-all comparisons of short reads and theoretical analyses of the number of neighboring reads. When applied to a dataset with a sequencing depth of 100×, it finds ∼ 88% of the breakpoints correctly with no false-positive reads. Moreover, evaluation on a real prostate cancer dataset shows that the proposed method predicts more fusion transcripts correctly than previous approaches, and yet produces fewer false-positive reads. To our knowledge, this is the first method to detect breakpoint reads without using a reference genome. AVAILABILITY AND IMPLEMENTATION The source code of SlideSort-BPR can be freely downloaded from https://code.google.com/p/slidesort-bpr/.
Collapse
Affiliation(s)
- Edward Wijaya
- Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562 and Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
| | - Kana Shimizu
- Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562 and Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
| | - Kiyoshi Asai
- Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562 and Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
| | - Michiaki Hamada
- Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562 and Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
|
43
|
Jacob HJ, Abrams K, Bick DP, Brodie K, Dimmock DP, Farrell M, Geurts J, Harris J, Helbling D, Joers BJ, Kliegman R, Kowalski G, Lazar J, Margolis DA, North P, Northup J, Roquemore-Goins A, Scharer G, Shimoyama M, Strong K, Taylor B, Tsaih SW, Tschannen MR, Veith RL, Wendt-Andrae J, Wilk B, Worthey EA. Genomics in clinical practice: lessons from the front lines. Sci Transl Med 2014; 5:194cm5. [PMID: 23863829 DOI: 10.1126/scitranslmed.3006468] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.8] [Indexed: 11/02/2022]
Abstract
The price of whole-genome and -exome sequencing has fallen to the point where these methods can be applied to clinical medicine. Here, we outline the lessons we have learned in converting a sequencing laboratory designed for research into a fully functional clinical program.
Affiliation(s)
- Howard J Jacob
- Human and Molecular Genetic Center, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53226, USA.
|
44
|
Watson CT, Marques-Bonet T, Sharp AJ, Mefford HC. The genetics of microdeletion and microduplication syndromes: an update. Annu Rev Genomics Hum Genet 2014; 15:215-244. [PMID: 24773319 DOI: 10.1146/annurev-genom-091212-153408] [Citation(s) in RCA: 115] [Impact Index Per Article: 11.5] [Indexed: 12/19/2022]
Abstract
Chromosomal abnormalities, including microdeletions and microduplications, have long been associated with abnormal developmental outcomes. Early discoveries relied on a common clinical presentation and the ability to detect chromosomal abnormalities by standard karyotype analysis or specific assays such as fluorescence in situ hybridization. Over the past decade, the development of novel genomic technologies has allowed more comprehensive, unbiased discovery of microdeletions and microduplications throughout the human genome. The ability to quickly interrogate large cohorts using chromosome microarrays and, more recently, next-generation sequencing has led to the rapid discovery of novel microdeletions and microduplications associated with disease, including very rare but clinically significant rearrangements. In addition, the observation that some microdeletions are associated with risk for several neurodevelopmental disorders contributes to our understanding of shared genetic susceptibility for such disorders. Here, we review current knowledge of microdeletion/duplication syndromes, with a particular focus on recurrent rearrangement syndromes.
Affiliation(s)
- Corey T Watson
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029
| | - Tomas Marques-Bonet
- Institut de Biologia Evolutiva, Universitat Pompeu Fabra/CSIC, 08003 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain; Centro Nacional de Análisis Genómico, 08023 Barcelona, Spain
| | - Andrew J Sharp
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029
| | - Heather C Mefford
- Department of Pediatrics, University of Washington, Seattle, Washington 98195
|
45
|
Bodian DL, McCutcheon JN, Kothiyal P, Huddleston KC, Iyer RK, Vockley JG, Niederhuber JE. Germline variation in cancer-susceptibility genes in a healthy, ancestrally diverse cohort: implications for individual genome sequencing. PLoS One 2014; 9:e94554. [PMID: 24728327 PMCID: PMC3984285 DOI: 10.1371/journal.pone.0094554] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.8] [Received: 09/25/2013] [Accepted: 02/17/2014] [Indexed: 01/05/2023] Open
Abstract
Technological advances coupled with decreasing costs are bringing whole genome and whole exome sequencing closer to routine clinical use. One of the hurdles to clinical implementation is the high number of variants of unknown significance. For cancer-susceptibility genes, the difficulty in interpreting the clinical relevance of the genomic variants is compounded by the fact that most of what is known about these variants comes from the study of highly selected populations, such as cancer patients or individuals with a family history of cancer. The genetic variation in known cancer-susceptibility genes in the general population has not been well characterized to date. To address this gap, we profiled the nonsynonymous genomic variation in 158 genes causally implicated in carcinogenesis using high-quality whole genome sequences from an ancestrally diverse cohort of 681 healthy individuals. We found that all individuals carry multiple variants that may impact cancer susceptibility, with an average of 68 variants per individual. Of the 2,688 allelic variants identified within the cohort, most are very rare, with 75% found in only 1 or 2 individuals in our population. Allele frequencies vary between ancestral groups, and there are 21 variants for which the minor allele in one population is the major allele in another. Detailed analysis of a selected subset of 5 clinically important cancer genes, BRCA1, BRCA2, KRAS, TP53, and PTEN, highlights differences between germline variants and reported somatic mutations. The dataset can serve as a resource of genetic variation in cancer-susceptibility genes in 6 ancestry groups, an important foundation for the interpretation of cancer risk from personal genome sequences.
Affiliation(s)
- Dale L. Bodian
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
| | - Justine N. McCutcheon
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
| | - Prachi Kothiyal
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
| | - Kathi C. Huddleston
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
| | - Ramaswamy K. Iyer
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
| | - Joseph G. Vockley
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
| | - John E. Niederhuber
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
|
46
|
Ling Y, Jin Z, Su M, Zhong J, Zhao Y, Yu J, Wu J, Xiao J. VCGDB: a dynamic genome database of the Chinese population. BMC Genomics 2014; 15:265. [PMID: 24708222 PMCID: PMC4028056 DOI: 10.1186/1471-2164-15-265] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 09/22/2013] [Accepted: 03/28/2014] [Indexed: 12/18/2022] Open
Abstract
Background: The data released by the 1000 Genomes Project contain an increasing number of genome sequences from different nations and populations with a large number of genetic variations. As a result, the focus of human genome studies is changing from single and static to complex and dynamic. The currently available human reference genome (GRCh37) is based on sequencing data from 13 anonymous Caucasian volunteers, which might limit the scope of genomics, transcriptomics, epigenetics, and genome wide association studies.
Description: We used the massive amount of sequencing data published by the 1000 Genomes Project Consortium to construct the Virtual Chinese Genome Database (VCGDB), a dynamic genome database of the Chinese population based on the whole genome sequencing data of 194 individuals. VCGDB provides dynamic genomic information, which contains 35 million single nucleotide variations (SNVs), 0.5 million insertions/deletions (indels), and 29 million rare variations, together with genomic annotation information. VCGDB also provides a highly interactive user-friendly virtual Chinese genome browser (VCGBrowser) with functions like seamless zooming and real-time searching. In addition, we have established three population-specific consensus Chinese reference genomes that are compatible with mainstream alignment software.
Conclusions: VCGDB offers a feasible strategy for processing big data to keep pace with the biological data explosion by providing a robust resource for genomics studies; in particular, studies aimed at finding regions of the genome associated with diseases.
Affiliation(s)
- Jun Yu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
|
47
|
Patel ZH, Kottyan LC, Lazaro S, Williams MS, Ledbetter DH, Tromp H, Rupert A, Kohram M, Wagner M, Husami A, Qian Y, Valencia CA, Zhang K, Hostetter MK, Harley JB, Kaufman KM. The struggle to find reliable results in exome sequencing data: filtering out Mendelian errors. Front Genet 2014; 5:16. [PMID: 24575121 PMCID: PMC3921572 DOI: 10.3389/fgene.2014.00016] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 11/22/2013] [Accepted: 01/16/2014] [Indexed: 12/30/2022] Open
Abstract
Next Generation Sequencing studies generate a large quantity of genetic data in a relatively cost- and time-efficient manner and provide an unprecedented opportunity to identify candidate causative variants that lead to disease phenotypes. A challenge to these studies is the generation of sequencing artifacts by current technologies. To identify and characterize the properties that distinguish false positive variants from true variants, we sequenced a child and both parents (one trio) using DNA isolated from three sources (blood, buccal cells, and saliva). The trio strategy allowed us to identify variants in the proband that could not have been inherited from the parents (Mendelian errors) and would most likely indicate sequencing artifacts. Quality control measurements were examined and three measurements were found to identify the greatest number of Mendelian errors. These included read depth, genotype quality score, and alternate allele ratio. Filtering the variants on these measurements removed ~95% of the Mendelian errors while retaining 80% of the called variants. These filters were applied independently. After filtering, the concordance between identical samples isolated from different sources was 99.99% as compared to 87% before filtering. This high concordance suggests that different sources of DNA can be used in trio studies without affecting the ability to identify causative polymorphisms. To facilitate analysis of next generation sequencing data, we developed the Cincinnati Analytical Suite for Sequencing Informatics (CASSI) to store sequencing files and metadata (e.g., relatedness information), to handle file versioning, data filtering, and variant annotation, and to identify candidate causative polymorphisms that follow de novo, rare recessive homozygous, or compound heterozygous inheritance models.
We conclude the data cleaning process improves the signal to noise ratio in terms of variants and facilitates the identification of candidate disease causative polymorphisms.
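The trio check and the three quality measurements described in this abstract can be illustrated with a minimal Python sketch. The function names, the genotype encoding, and every threshold below are assumptions made for illustration; the abstract does not state the cutoffs used, and this is not the CASSI implementation.

```python
from itertools import product


def is_mendelian_error(child, mother, father):
    """Return True if the child's genotype cannot be formed by taking
    one allele from each parent (alleles encoded as ints, e.g. (0, 1))."""
    possible = {tuple(sorted(pair)) for pair in product(mother, father)}
    return tuple(sorted(child)) not in possible


def quality_flags(depth, genotype_quality, alt_ratio,
                  min_depth=10, min_gq=20, het_ratio=(0.25, 0.75)):
    """Evaluate each filter independently for a heterozygous call.
    All thresholds are hypothetical placeholders, not published values."""
    low, high = het_ratio
    return {
        "read_depth": depth >= min_depth,
        "genotype_quality": genotype_quality >= min_gq,
        "alt_allele_ratio": low <= alt_ratio <= high,
    }
```

For example, a heterozygous child call (0, 1) with both parents homozygous reference (0, 0) is flagged as a Mendelian error, which the abstract treats as most likely a sequencing artifact. Returning one flag per measurement mirrors the statement that the filters were applied independently.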
Affiliation(s)
- Zubin H Patel
- Division of Rheumatology, Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati OH, USA; Medical Scientist Training Program, University of Cincinnati College of Medicine, Cincinnati OH, USA
| | - Leah C Kottyan
- Division of Rheumatology, Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati OH, USA; Department of Veterans Affairs, Veterans Affairs Medical Center - Cincinnati, Cincinnati OH, USA
| | - Sara Lazaro
- Division of Rheumatology, Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati OH, USA; Department of Veterans Affairs, Veterans Affairs Medical Center - Cincinnati, Cincinnati OH, USA
| | - Marc S Williams
- Genomic Medicine Institute, Geisinger Health System, Danville PA, USA
| | - David H Ledbetter
- Genomic Medicine Institute, Geisinger Health System, Danville PA, USA
| | - Hbgerard Tromp
- Genomic Medicine Institute, Geisinger Health System, Danville PA, USA
| | - Andrew Rupert
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati OH, USA
| | - Mojtaba Kohram
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati OH, USA
| | - Michael Wagner
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati OH, USA
| | - Ammar Husami
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati OH, USA
| | - Yaping Qian
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati OH, USA
| | - C Alexander Valencia
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati OH, USA
| | - Kejian Zhang
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati OH, USA
| | - Margaret K Hostetter
- Division of Infectious Disease, Cincinnati Children's Hospital Medical Center, Cincinnati OH, USA
| | - John B Harley
- Division of Rheumatology, Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati OH, USA; Department of Veterans Affairs, Veterans Affairs Medical Center - Cincinnati, Cincinnati OH, USA
| | - Kenneth M Kaufman
- Division of Rheumatology, Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati OH, USA; Department of Veterans Affairs, Veterans Affairs Medical Center - Cincinnati, Cincinnati OH, USA
|
48
|
Moore CB, Wallace JR, Wolfe DJ, Frase AT, Pendergrass SA, Weiss KM, Ritchie MD. Low frequency variants, collapsed based on biological knowledge, uncover complexity of population stratification in 1000 genomes project data. PLoS Genet 2013; 9:e1003959. [PMID: 24385916 PMCID: PMC3873241 DOI: 10.1371/journal.pgen.1003959] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 02/07/2013] [Accepted: 10/01/2013] [Indexed: 12/13/2022] Open
Abstract
Analyses investigating low frequency variants have the potential for explaining additional genetic heritability of many complex human traits. However, the natural frequencies of rare variation between human populations strongly confound genetic analyses. We have applied a novel collapsing method to identify biological features with low frequency variant burden differences in thirteen populations sequenced by the 1000 Genomes Project. Our flexible collapsing tool utilizes expert biological knowledge from multiple publicly available database sources to direct feature selection. Variants were collapsed according to genetically driven features, such as evolutionary conserved regions, regulatory regions, genes, and pathways. We have conducted an extensive comparison of low frequency variant burden differences (MAF < 0.03) between populations from 1000 Genomes Project Phase I data. We found that on average 26.87% of gene bins, 35.47% of intergenic bins, 42.85% of pathway bins, 14.86% of ORegAnno regulatory bins, and 5.97% of evolutionary conserved regions show statistically significant differences in low frequency variant burden across populations from the 1000 Genomes Project. The proportion of bins with significant differences in low frequency burden depends on the ancestral similarity of the two populations compared and types of features tested. Even closely related populations had notable differences in low frequency burden, but fewer differences than populations from different continents. Furthermore, conserved or functionally relevant regions had fewer significant differences in low frequency burden than regions under less evolutionary constraint. This degree of low frequency variant differentiation across diverse populations and feature elements highlights the critical importance of considering population stratification in the new era of DNA sequencing and low frequency variant genomic analyses.
Low frequency variants are likely to play an important role in uncovering complex trait heritability; however, they are often continent or population specific. This specificity complicates genetic analyses investigating low frequency variants for two reasons: low frequency variant signals in an association test are often difficult to generalize beyond a single population or continental group, and there is an increase in false positive results in association analyses due to underlying population stratification. In order to reveal the magnitude of low frequency population stratification, we performed pairwise population comparisons using the 1000 Genomes Project Phase I data to investigate differences in low frequency variant burden across multiple biological features. We found that low frequency variant confounding is much more prevalent than one might expect, even within continental groups. The proportion of significant differences in low frequency variant burden was also dependent on the region of interest; for example, annotated regulatory regions showed fewer low frequency burden differences between populations than intergenic regions. Knowledge of population structure and the genomic landscape in a region of interest are important factors in determining the extent of confounding due to population stratification in a low frequency genomic analysis.
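The burden comparison rests on counting low frequency variants per biological feature. A minimal Python sketch of that collapsing step follows, assuming each variant arrives as a (bin name, alternate allele frequency) pair. The function name and input layout are hypothetical and do not reflect the authors' tool; only the MAF < 0.03 cutoff comes from the abstract.

```python
from collections import Counter

MAF_THRESHOLD = 0.03  # low frequency cutoff quoted in the abstract


def low_frequency_burden(variants, maf_threshold=MAF_THRESHOLD):
    """Count low frequency variants per bin.

    variants: iterable of (bin_name, alt_allele_frequency) pairs
    (a hypothetical input layout chosen for this sketch).
    """
    burden = Counter()
    for bin_name, freq in variants:
        # Minor allele frequency is the smaller of the two allele frequencies.
        maf = min(freq, 1.0 - freq)
        # Skip monomorphic sites (MAF of zero) and common variants.
        if 0.0 < maf < maf_threshold:
            burden[bin_name] += 1
    return burden
```

Running this once per population yields per-bin counts that a statistical test could then compare pairwise; the choice of test is left out because the abstract does not name one.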
Affiliation(s)
- Carrie B. Moore
- Center for Human Genetic Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, Tennessee, United States of America
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, Eberly College of Science, The Huck Institutes of the Life Sciences, University Park, Pennsylvania, United States of America
| | - John R. Wallace
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, Eberly College of Science, The Huck Institutes of the Life Sciences, University Park, Pennsylvania, United States of America
| | - Daniel J. Wolfe
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, Eberly College of Science, The Huck Institutes of the Life Sciences, University Park, Pennsylvania, United States of America
| | - Alex T. Frase
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, Eberly College of Science, The Huck Institutes of the Life Sciences, University Park, Pennsylvania, United States of America
| | - Sarah A. Pendergrass
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, Eberly College of Science, The Huck Institutes of the Life Sciences, University Park, Pennsylvania, United States of America
| | - Kenneth M. Weiss
- Department of Anthropology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Marylyn D. Ritchie
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, Eberly College of Science, The Huck Institutes of the Life Sciences, University Park, Pennsylvania, United States of America
|
49
|
Worthey EA. Analysis and annotation of whole-genome or whole-exome sequencing-derived variants for clinical diagnosis. Curr Protoc Hum Genet 2013; 79:9.24.1-9.24.24. [PMID: 24510652 DOI: 10.1002/0471142905.hg0924s79] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 01/07/2023]
Abstract
Over the last several years, next-generation sequencing (NGS) has transformed genomic research through substantial advances in technology and reduction in the cost of sequencing, and also in the systems required for analysis of these large volumes of data. This technology is now being used as a standard molecular diagnostic test under particular circumstances in some clinical settings. The advances in sequencing have come so rapidly that the major bottleneck in identification of causal variants is no longer the sequencing but rather the analysis and interpretation. Interpretation of genetic findings in a clinical setting is scarcely a new challenge, but the task is increasingly complex in clinical genome-wide sequencing given the dramatic increase in dataset size and complexity. This increase requires the development of novel or repositioned analysis tools, methodologies, and processes. This unit provides an overview of these items. Specific challenges related to implementation in a clinical setting are discussed.
Affiliation(s)
- Elizabeth A Worthey
- Department of Pediatrics, Medical College of Wisconsin, Milwaukee, Wisconsin; The Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, Wisconsin; Department of Computer Science, University of Wisconsin, Milwaukee, Wisconsin
|
50
|
Bromberg Y. Building a genome analysis pipeline to predict disease risk and prevent disease. J Mol Biol 2013; 425:3993-4005. [PMID: 23928561 DOI: 10.1016/j.jmb.2013.07.038] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 06/03/2013] [Revised: 07/26/2013] [Accepted: 07/28/2013] [Indexed: 12/24/2022]
Abstract
Reduced costs and increased speed and accuracy of sequencing can bring the genome-based evaluation of individual disease risk to the bedside. While past efforts have identified a number of actionable mutations, the bulk of genetic risk remains hidden in sequence data. The biggest challenge facing genomic medicine today is the development of new techniques to predict the specifics of a given human phenome (set of all expressed phenotypes) encoded by each individual variome (full set of genome variants) in the context of the given environment. Numerous tools exist for the computational identification of the functional effects of a single variant. However, the pipelines taking advantage of full genomic, exomic, transcriptomic (and other) sequences have only recently become a reality. This review looks at the building of methodologies for predicting "variome"-defined disease risk. It also discusses some of the challenges for incorporating such a pipeline into everyday medical practice.
Affiliation(s)
- Y Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Drive, New Brunswick, NJ 08873, USA.
|