1
Boehler NA, Seheult SDI, Wahid M, Hase K, D'Amico SF, Saini S, Mascarenhas B, Bergman ME, Phillips MA, Faure PA, Cheng HYM. A novel copy number variant in the murine Cdh23 gene gives rise to profound deafness and vestibular dysfunction. Hum Mol Genet 2024; 33:1648-1659. [PMID: 38981620 DOI: 10.1093/hmg/ddae095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 04/10/2024] [Accepted: 05/30/2024] [Indexed: 07/11/2024]
Abstract
Hearing loss is the most common congenital sensory deficit worldwide and exhibits high genetic heterogeneity, making molecular diagnoses elusive for most individuals. Detecting novel mutations that contribute to hearing loss is crucial to providing accurate personalized diagnoses, tailored interventions, and improving prognosis. Copy number variants (CNVs) are structural mutations that are understudied, potential contributors to hearing loss. Here, we present the Abnormal Wobbly Gait (AWG) mouse, the first documented mutant exhibiting waltzer-like locomotor dysfunction, hyperactivity, circling behaviour, and profound deafness caused by a spontaneous CNV deletion in cadherin 23 (Cdh23). We were unable to identify the causative mutation through a conventional whole-genome sequencing (WGS) and variant detection pipeline, but instead found a linked variant in hexokinase 1 (Hk1) that was insufficient to recapitulate the AWG phenotype when introduced into C57BL/6J mice using CRISPR-Cas9. Investigating nearby deafness-associated genes revealed a pronounced downregulation of Cdh23 mRNA and a complete absence of full-length CDH23 protein, which is critical for the development and maintenance of inner ear hair cells, in whole head extracts from AWG neonates. Manual inspection of WGS read depth plots of the Cdh23 locus revealed a putative 10.4 kb genomic deletion of exons 11 and 12 that was validated by PCR and Sanger sequencing. This study underscores the imperative to refine variant detection strategies to permit identification of pathogenic CNVs easily missed by conventional variant calling to enhance diagnostic precision and ultimately improve clinical outcomes for individuals with genetically heterogenous disorders such as hearing loss.
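The abstract describes finding the deletion by inspecting read depth at the Cdh23 locus. As a rough illustration of that idea only (not the authors' pipeline), the sketch below flags windows whose mean depth drops far below the typical depth. The function name `low_depth_windows`, the window size, and the 0.25 ratio are all hypothetical choices made for this example.

```python
from statistics import median


def low_depth_windows(depth, window=1000, ratio=0.25):
    """Return merged (start, end) index ranges whose mean read depth is
    below `ratio` times the median of all window means.

    `depth` is a list of per-base read depths for one region.
    The window size and ratio are illustrative assumptions.
    """
    if not depth:
        return []

    # Mean depth of each consecutive, non-overlapping window.
    means = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        means.append(sum(chunk) / len(chunk))

    baseline = median(means)
    flagged = []
    for idx, mean_depth in enumerate(means):
        if mean_depth < ratio * baseline:
            start = idx * window
            end = min(start + window, len(depth))
            # Merge with the previous range when the windows are adjacent.
            if flagged and flagged[-1][1] == start:
                flagged[-1] = (flagged[-1][0], end)
            else:
                flagged.append((start, end))
    return flagged


# Toy region: ~30x coverage with a 2 kb stretch of zero coverage.
toy_depth = [30] * 5000 + [0] * 2000 + [30] * 3000
print(low_depth_windows(toy_depth))  # [(5000, 7000)]
```

A candidate found this way would still need orthogonal confirmation, which in the study was PCR and Sanger sequencing.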
Affiliation(s)
- Nicholas A Boehler
  - Department of Biology, University of Toronto Mississauga, 3359 Mississauga Road, Mississauga, ON L5L 1C6, Canada
  - Department of Cell and Systems Biology, University of Toronto, 25 Harbord Street, Toronto, ON M5S 3G5, Canada
- Shane D I Seheult
  - Department of Psychology, Neuroscience & Behaviour, McMaster University, 1280 Main Street West, Hamilton, ON L8S 4K1, Canada
- Muhammad Wahid
  - Department of Biology, University of Toronto Mississauga, 3359 Mississauga Road, Mississauga, ON L5L 1C6, Canada
  - Department of Cell and Systems Biology, University of Toronto, 25 Harbord Street, Toronto, ON M5S 3G5, Canada
- Kazuma Hase
  - Department of Psychology, Neuroscience & Behaviour, McMaster University, 1280 Main Street West, Hamilton, ON L8S 4K1, Canada
- Sierra F D'Amico
  - Department of Psychology, Neuroscience & Behaviour, McMaster University, 1280 Main Street West, Hamilton, ON L8S 4K1, Canada
- Shakshi Saini
  - Department of Psychology, Neuroscience & Behaviour, McMaster University, 1280 Main Street West, Hamilton, ON L8S 4K1, Canada
- Brittany Mascarenhas
  - Department of Biology, University of Toronto Mississauga, 3359 Mississauga Road, Mississauga, ON L5L 1C6, Canada
  - Department of Cell and Systems Biology, University of Toronto, 25 Harbord Street, Toronto, ON M5S 3G5, Canada
- Matthew E Bergman
  - Department of Biology, University of Toronto Mississauga, 3359 Mississauga Road, Mississauga, ON L5L 1C6, Canada
  - Department of Cell and Systems Biology, University of Toronto, 25 Harbord Street, Toronto, ON M5S 3G5, Canada
- Michael A Phillips
  - Department of Biology, University of Toronto Mississauga, 3359 Mississauga Road, Mississauga, ON L5L 1C6, Canada
  - Department of Cell and Systems Biology, University of Toronto, 25 Harbord Street, Toronto, ON M5S 3G5, Canada
- Paul A Faure
  - Department of Psychology, Neuroscience & Behaviour, McMaster University, 1280 Main Street West, Hamilton, ON L8S 4K1, Canada
- Hai-Ying Mary Cheng
  - Department of Biology, University of Toronto Mississauga, 3359 Mississauga Road, Mississauga, ON L5L 1C6, Canada
  - Department of Cell and Systems Biology, University of Toronto, 25 Harbord Street, Toronto, ON M5S 3G5, Canada
2
Schulz T, Medvedev P. ESKEMAP: exact sketch-based read mapping. Algorithms Mol Biol 2024; 19:19. [PMID: 38704605 PMCID: PMC11069465 DOI: 10.1186/s13015-024-00261-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 03/19/2024] [Indexed: 05/06/2024]
Abstract
BACKGROUND: Given a sequencing read, the broad goal of read mapping is to find the location(s) in the reference genome that have a "similar sequence". Traditionally, "similar sequence" was defined as having a high alignment score and read mappers were viewed as heuristic solutions to this well-defined problem. For sketch-based mappers, however, there has not been a problem formulation to capture what problem an exact sketch-based mapping algorithm should solve. Moreover, there is no sketch-based method that can find all possible mapping positions for a read above a certain score threshold. RESULTS: In this paper, we formulate the problem of read mapping at the level of sequence sketches. We give an exact dynamic programming algorithm that finds all hits above a given similarity threshold. It runs in $O(|t| + |p| + \ell^2)$ time and $O(\ell \log \ell)$ space, where $|t|$ is the number of $k$-mers inside the sketch of the reference, $|p|$ is the number of $k$-mers inside the read's sketch and $\ell$ is the number of times that $k$-mers from the pattern sketch occur in the sketch of the text. We evaluate our algorithm's performance in mapping long reads to the T2T assembly of human chromosome Y, where ampliconic regions make it desirable to find all good mapping positions. For an equivalent level of precision as minimap2, the recall of our algorithm is 0.88, compared to only 0.76 of minimap2.
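The running time above depends on ℓ, the number of times k-mers from the pattern sketch occur in the text sketch. The sketch below only illustrates how such a count could be obtained for two sketches given as lists of k-mers. It is not the ESKEMAP algorithm, the function name is hypothetical, and it assumes one reading of the definition: count every text-sketch position whose k-mer also appears somewhere in the pattern sketch.

```python
def count_shared_occurrences(pattern_sketch, text_sketch):
    """Count text-sketch positions whose k-mer is present in the pattern sketch.

    Both arguments are sequences of k-mer strings. This is one
    interpretation of the parameter called ell in the abstract.
    """
    pattern_kmers = set(pattern_sketch)  # O(|p|) to build
    # A single pass over the text sketch, O(|t|).
    return sum(1 for kmer in text_sketch if kmer in pattern_kmers)


pattern = ["ACG", "CGT", "ACG"]
text = ["ACG", "TTT", "CGT", "ACG"]
ell = count_shared_occurrences(pattern, text)
print(ell)  # 3
```

When the text sketch is highly repetitive relative to the read, ℓ grows and the quadratic term in the stated bound starts to dominate.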
Affiliation(s)
- Tizian Schulz
  - Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany.
  - Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University, Bielefeld, Germany.
  - Graduate School "Digital Infrastructure for the Life Sciences" (DILS), Bielefeld University, Bielefeld, Germany.
- Paul Medvedev
  - Department of Computer Science and Engineering, The Pennsylvania State University, University Park, USA.
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, USA.
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, USA.
3
Hu H, Scheben A, Wang J, Li F, Li C, Edwards D, Zhao J. Unravelling inversions: Technological advances, challenges, and potential impact on crop breeding. Plant Biotechnology Journal 2024; 22:544-554. [PMID: 37961986 PMCID: PMC10893937 DOI: 10.1111/pbi.14224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 10/11/2023] [Accepted: 10/22/2023] [Indexed: 11/15/2023]
Abstract
Inversions, a type of chromosomal structural variation, significantly influence plant adaptation and gene functions by impacting gene expression and recombination rates. However, compared with other structural variations, their roles in functional biology and crop improvement remain largely unexplored. In this review, we highlight technological and methodological advancements that have allowed a comprehensive understanding of inversion variants through the pangenome framework and machine learning algorithms. Genome editing is an efficient method for inducing or reversing inversion mutations in plants, providing an effective mechanism to modify local recombination rates. Given the potential of inversions in crop breeding, we anticipate increasing attention on inversions from the scientific community in future research and breeding applications.
Affiliation(s)
- Haifei Hu
  - Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co‐construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangzhou, China
- Armin Scheben
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
- Jian Wang
  - Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co‐construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangzhou, China
- Fangping Li
  - Guangdong Provincial Key Laboratory of Plant Molecular Breeding, State Key Laboratory for Conservation and Utilization of Subtropical Agro‐Bioresources, South China Agricultural University, Guangzhou, China
- Chengdao Li
  - Western Crop Genetics Alliance, Centre for Crop & Food Innovation, Food Futures Institute, College of Science, Health, Engineering and Education, Murdoch University, Murdoch, Western Australia, Australia
- David Edwards
  - School of Biological Sciences, University of Western Australia, Perth, Western Australia, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, Western Australia, Australia
- Junliang Zhao
  - Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co‐construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangzhou, China
4
Jang MA. Genomic technologies for detecting structural variations in hematologic malignancies. Blood Res 2024; 59:1. [PMID: 38485792 PMCID: PMC10903520 DOI: 10.1007/s44313-024-00001-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Accepted: 12/18/2023] [Indexed: 03/18/2024]
Abstract
Genomic structural variations in myeloid, lymphoid, and plasma cell neoplasms can provide key diagnostic, prognostic, and therapeutic information while elucidating the underlying disease biology. Several molecular diagnostic approaches play a central role in evaluating hematological malignancies. Traditional cytogenetic diagnostic assays, such as chromosome banding and fluorescence in situ hybridization, are essential components of the current diagnostic workup that guide clinical care for most hematologic malignancies. However, each assay has inherent limitations, including limited resolution for detecting small structural variations and low coverage, and can only detect alterations in the target regions. Recently, the rapid expansion and increasing availability of novel and comprehensive genomic technologies have led to their use in clinical laboratories for clinical management and translational research. This review aims to describe the clinical relevance of structural variations in hematologic malignancies and introduce genomic technologies that may facilitate personalized tumor characterization and treatment.
Affiliation(s)
- Mi-Ae Jang
  - Department of Laboratory Medicine and Genetics, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-Ro, Gangnam-Gu, Seoul, 06351, Korea.
5
Dyrbekk APH, Warsame AA, Suhrke P, Ludahl MO, Zecic N, Moe JO, Lund-Iversen M, Brustugun OT. Evaluation of NTRK expression and fusions in a large cohort of early-stage lung cancer. Clin Exp Med 2024; 24:10. [PMID: 38240952 PMCID: PMC10798916 DOI: 10.1007/s10238-023-01273-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Accepted: 11/29/2023] [Indexed: 01/22/2024]
Abstract
Tropomyosin receptor kinases (TRK) are attractive targets for cancer therapy. As TRK inhibitors are approved for all solid cancers with detectable fusions involving the neurotrophic tyrosine receptor kinase (NTRK) genes, there has been an increased interest in optimizing testing regimes. In this project, we wanted to find the prevalence of NTRK fusions in a cohort of various histopathological types of early-stage lung cancer in Norway and to investigate the association between TRK protein expression and specific histopathological types, including their molecular and epidemiological characteristics. We used immunohistochemistry (IHC) as a screening tool for TRK expression, and next-generation sequencing (NGS) and fluorescence in situ hybridization (FISH) as confirmatory tests for underlying NTRK fusion. Among 940 cases, 43 (4.6%) had positive TRK IHC, but in none of these could an NTRK fusion be confirmed by NGS or FISH. IHC-positive cases showed various staining intensities and patterns, including cytoplasmic or nuclear staining. IHC-positivity was more common in squamous cell carcinoma (LUSC) (10.3%) and adenoid cystic carcinoma (40.0%), where the majority showed heterogeneous staining intensity. In comparison, only 1.1% of the adenocarcinomas were positive. IHC-positivity was also more common in men, but this association could be explained by the dominance of LUSC in TRK IHC-positive cases. Protein expression was not associated with differences in time to relapse or overall survival. Our study indicates that NTRK fusion is rare in early-stage lung cancer. Due to the high level of false positive cases with IHC, pan-TRK IHC is less suited as a screening tool for NTRK fusions in LUSC and adenoid cystic carcinoma.
Affiliation(s)
- Anne Pernille Harlem Dyrbekk
  - University of Oslo, NO-0316, Oslo, Norway.
  - Department of Pathology, Vestfold Hospital Trust, NO-3103, Tønsberg, Norway.
  - Department of Cancer Genetics, Institute for Cancer Research, The Norwegian Radium Hospital, NO-0310, Oslo, Norway.
- Abdirashid Ali Warsame
  - Department of Pathology, Oslo University Hospital, The Norwegian Radium Hospital, NO-0310, Oslo, Norway
- Pål Suhrke
  - Department of Pathology, Vestfold Hospital Trust, NO-3103, Tønsberg, Norway
- Marianne Odnakk Ludahl
  - Department of Microbiology/Division for Gene-Technology, Vestfold Hospital Trust, NO-3103, Tønsberg, Norway
- Nermin Zecic
  - Department of Microbiology/Division for Gene-Technology, Vestfold Hospital Trust, NO-3103, Tønsberg, Norway
- Joakim Oliu Moe
  - Department of Internal Medicine, Vestfold Hospital Trust, NO-3103, Tønsberg, Norway
- Marius Lund-Iversen
  - Department of Pathology, Oslo University Hospital, The Norwegian Radium Hospital, NO-0310, Oslo, Norway
- Odd Terje Brustugun
  - University of Oslo, NO-0316, Oslo, Norway
  - Department of Cancer Genetics, Institute for Cancer Research, The Norwegian Radium Hospital, NO-0310, Oslo, Norway
  - Department of Oncology, Vestre Viken Hospital Trust, NO-3004, Drammen, Norway
6
Oketch DJA, Giulietti M, Piva F. Copy Number Variations in Pancreatic Cancer: From Biological Significance to Clinical Utility. Int J Mol Sci 2023; 25:391. [PMID: 38203561 PMCID: PMC10779192 DOI: 10.3390/ijms25010391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 12/20/2023] [Accepted: 12/24/2023] [Indexed: 01/12/2024]
Abstract
Pancreatic ductal adenocarcinoma (PDAC) is the most common type of pancreatic cancer, characterized by high tumor heterogeneity and a poor prognosis. Inter- and intra-tumoral heterogeneity in PDAC is a major obstacle to effective PDAC treatment; therefore, it is highly desirable to explore the tumor heterogeneity and underlying mechanisms for the improvement of PDAC prognosis. Gene copy number variations (CNVs) are increasingly recognized as a common and heritable source of inter-individual variation in genomic sequence. In this review, we outline the origin, main characteristics, and pathological aspects of CNVs. We then describe the occurrence of CNVs in PDAC, including those that have been clearly shown to have a pathogenic role, and further highlight some key examples of their involvement in tumor development and progression. The ability to efficiently identify and analyze CNVs in tumor samples is important to support translational research and foster precision oncology, as copy number variants can be utilized to guide clinical decisions. We provide insights into understanding the CNV landscapes and the role of both somatic and germline CNVs in PDAC, which could lead to significant advances in diagnosis, prognosis, and treatment. Although there has been significant progress in this field, understanding the full contribution of CNVs to the genetic basis of PDAC will require further research, with more accurate CNV assays such as single-cell techniques and larger cohorts than have been performed to date.
Affiliation(s)
- Matteo Giulietti
  - Department of Specialistic Clinical and Odontostomatological Sciences, Polytechnic University of Marche, 60131 Ancona, Italy
- Francesco Piva
  - Department of Specialistic Clinical and Odontostomatological Sciences, Polytechnic University of Marche, 60131 Ancona, Italy
7
Harvey WT, Ebert P, Ebler J, Audano PA, Munson KM, Hoekzema K, Porubsky D, Beck CR, Marschall T, Garimella K, Eichler EE. Whole-genome long-read sequencing downsampling and its effect on variant-calling precision and recall. Genome Res 2023; 33:2029-2040. [PMID: 38190646 PMCID: PMC10760522 DOI: 10.1101/gr.278070.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 11/03/2023] [Indexed: 01/10/2024]
Abstract
Advances in long-read sequencing (LRS) technologies continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant-calling precision and recall of Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant-calling precision and recall of SVs and indels in HiFi data sets with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant call sets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.
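The abstract reports variant-calling quality as an F1 score, the harmonic mean of precision and recall. As a small worked example (the function name and the toy counts are assumptions for illustration, not data from the study), the score can be computed from true-positive, false-positive and false-negative call counts:

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall from call counts.

    Returns 0.0 when precision and recall are both zero or undefined.
    """
    called = true_pos + false_pos   # all variants the caller reported
    truth = true_pos + false_neg    # all variants in the truth set
    precision = true_pos / called if called else 0.0
    recall = true_pos / truth if truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy counts: half the calls are correct and half the truth set is found.
print(f1_score(50, 50, 50))  # 0.5
```

Because the harmonic mean is pulled toward the smaller of the two values, a call set cannot exceed an F1 of 0.5 unless both precision and recall are above 0.25.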
Affiliation(s)
- William T Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
- Peter Ebert
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
  - Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
- Jana Ebler
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
- Peter A Audano
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
- Christine R Beck
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
  - Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, Connecticut 06030-6403, USA
- Tobias Marschall
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
- Kiran Garimella
  - Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
8
Medvedev P. Theoretical Analysis of Sequencing Bioinformatics Algorithms and Beyond. Communications of the ACM 2023; 66:118-125. [PMID: 38736702 PMCID: PMC11087067 DOI: 10.1145/3571723] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/14/2024]
Abstract
A case study reveals the theoretical analysis of algorithms is not always as helpful as standard dogma might suggest.
Affiliation(s)
- Paul Medvedev
  - Department of Computer Science and Engineering and the Department of Biochemistry and Molecular Biology and the Director of the Center for Computational Biology and Bioinformatics at Pennsylvania State University, University Park, PA, USA
9
Ding Y, Liao Y, He J, Ma J, Wei X, Liu X, Zhang G, Wang J. Enhancing genomic mutation data storage optimization based on the compression of asymmetry of sparsity. Front Genet 2023; 14:1213907. [PMID: 37323665 PMCID: PMC10267386 DOI: 10.3389/fgene.2023.1213907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Accepted: 05/24/2023] [Indexed: 06/17/2023]
Abstract
Background: With the rapid development of high-throughput sequencing technology and the explosive growth of genomic data, storing, transmitting and processing massive amounts of data has become a new challenge. Achieving fast lossless compression and decompression that exploits the characteristics of the data, in order to speed up data transmission and processing, requires research on suitable compression algorithms. Methods: In this paper, a compression algorithm for sparse asymmetric gene mutations (CA_SAGM), based on the characteristics of sparse genomic mutation data, was proposed. The data were first sorted on a row-first basis so that neighboring non-zero elements were as close as possible to each other. The data were then renumbered using the reverse Cuthill-McKee sorting technique. Finally, the data were compressed into compressed sparse row (CSR) format and stored. We analyzed and compared the results of the CA_SAGM, coordinate format (COO) and compressed sparse column format (CSC) algorithms for sparse asymmetric genomic data. Nine types of single-nucleotide variation (SNV) data and six types of copy number variation (CNV) data from the TCGA database were used as the subjects of this study. Compression and decompression time, compression and decompression rate, compression memory and compression ratio were used as evaluation metrics. The correlation between each metric and the basic characteristics of the original data was further investigated. Results: The experimental results showed that the COO method had the shortest compression time, the fastest compression rate and the largest compression ratio, and had the best compression performance. CSC compression performance was the worst, and CA_SAGM compression performance was between the two. When decompressing the data, CA_SAGM performed the best, with the shortest decompression time and the fastest decompression rate. COO decompression performance was the worst. With increasing sparsity, the COO, CSC and CA_SAGM algorithms all exhibited longer compression and decompression times, lower compression and decompression rates, larger compression memory and lower compression ratios. When the sparsity was large, the compression memory and compression ratio of the three algorithms showed no differences, but the remaining metrics still differed. Conclusion: CA_SAGM was an efficient compression algorithm that combines compression and decompression performance for sparse genomic mutation data.
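The abstract compares the COO and CSR storage formats for sparse mutation matrices. To make the two layouts concrete, here is a minimal pure-Python sketch that converts COO triplets to CSR arrays and expands them back to a dense matrix. It does not implement CA_SAGM: the reverse Cuthill-McKee renumbering step is omitted, and the function names are hypothetical.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triplets (rows[i], cols[i], vals[i]) to CSR arrays.

    Returns (indptr, indices, data), where the entries of row r are
    data[indptr[r]:indptr[r + 1]] with column indices in the same slice.
    """
    # Row-first ordering, then by column within each row.
    order = sorted(range(len(vals)), key=lambda i: (rows[i], cols[i]))

    # Count entries per row, then take a running sum to get row offsets.
    indptr = [0] * (n_rows + 1)
    for r in rows:
        indptr[r + 1] += 1
    for r in range(n_rows):
        indptr[r + 1] += indptr[r]

    indices = [cols[i] for i in order]
    data = [vals[i] for i in order]
    return indptr, indices, data


def csr_to_dense(n_rows, n_cols, indptr, indices, data):
    """Expand CSR arrays back into a dense list-of-lists matrix."""
    dense = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for k in range(indptr[r], indptr[r + 1]):
            dense[r][indices[k]] = data[k]
    return dense


# Toy 3 x 4 matrix with three non-zero entries, given in arbitrary order.
indptr, indices, data = coo_to_csr(3, [2, 0, 0], [1, 3, 0], [5, 1, 2])
print(indptr, indices, data)  # [0, 2, 2, 3] [0, 3, 1] [2, 1, 5]
```

COO stores three values per non-zero entry, whereas CSR replaces the per-entry row index with one offset per row, so CSR tends to be more compact when rows hold many non-zeros.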
Affiliation(s)
- Youde Ding
  - The Sixth Affiliated Hospital of Guangzhou Medical University, Qingyuan People’s Hospital, Qingyuan, China
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
- Yuan Liao
  - The Sixth Affiliated Hospital of Guangzhou Medical University, Qingyuan People’s Hospital, Qingyuan, China
- Ji He
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
- Jianfeng Ma
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
- Xu Wei
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
- Xuemei Liu
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
- Guiying Zhang
  - The Sixth Affiliated Hospital of Guangzhou Medical University, Qingyuan People’s Hospital, Qingyuan, China
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
- Jing Wang
  - The Sixth Affiliated Hospital of Guangzhou Medical University, Qingyuan People’s Hospital, Qingyuan, China
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
10
Samelak-Czajka A, Wojciechowski P, Marszalek-Zenczak M, Figlerowicz M, Zmienko A. Differences in the intraspecies copy number variation of Arabidopsis thaliana conserved and nonconserved miRNA genes. Funct Integr Genomics 2023; 23:120. [PMID: 37036577 PMCID: PMC10085913 DOI: 10.1007/s10142-023-01043-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2023] [Revised: 03/23/2023] [Accepted: 03/25/2023] [Indexed: 04/11/2023]
Abstract
MicroRNAs (miRNAs) regulate gene expression through the RNA interference mechanism. In plants, miRNA genes (MIRs) that are grouped into conserved families, i.e. families present across different plant taxa, are involved in the regulation of many developmental and physiological processes. The roles of the nonconserved MIRs, which are restricted to one plant family, genus, or even species, are less recognized; however, many of them participate in the responses to biotic and abiotic stresses. Both over- and underproduction of miRNAs may influence various biological processes. Consequently, maintaining intracellular miRNA homeostasis seems to be crucial for the organism. Deletions and duplications in the genomic sequence may alter gene dosage and/or activity. We evaluated the extent of copy number variations (CNVs) among Arabidopsis thaliana (Arabidopsis) MIRs in over 1000 natural accessions, using population-based analysis of the short-read sequencing data. We showed that the conserved MIRs were unlikely to display CNVs and their deletions were extremely rare, whereas nonconserved MIRs presented moderate variation. Transposon-derived MIRs displayed exceptionally high diversity. Conversely, MIRs involved in the epigenetic control of transposons reactivated during development were mostly invariable. MIR overlap with the protein-coding genes also limited their variability. At the expression level, a higher rate of nonvariable, nonconserved miRNAs was detectable in Col-0 leaves, inflorescence, and siliques compared to nonconserved variable miRNAs, although the expression of both groups was much lower than that of the conserved MIRs. Our data indicate that the CNV rate of Arabidopsis MIRs is related to their age, function, and genomic localization.
Affiliation(s)
- Anna Samelak-Czajka
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704, Poznan, Poland
- Pawel Wojciechowski
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704, Poznan, Poland
  - Institute of Computing Science, Faculty of Computing and Telecommunications, Poznan University of Technology, 60-965, Poznan, Poland
- Marek Figlerowicz
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704, Poznan, Poland.
- Agnieszka Zmienko
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704, Poznan, Poland.
11
Denti L, Khorsand P, Bonizzoni P, Hormozdiari F, Chikhi R. SVDSS: structural variation discovery in hard-to-call genomic regions using sample-specific strings from accurate long reads. Nat Methods 2023; 20:550-558. [PMID: 36550274 DOI: 10.1038/s41592-022-01674-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2022] [Accepted: 10/08/2022] [Indexed: 12/24/2022]
Abstract
Structural variants (SVs) account for a large amount of sequence variability across genomes and play an important role in human genomics and precision medicine. Despite intense efforts over the years, the discovery of SVs in individuals remains challenging due to the diploid and highly repetitive structure of the human genome, and to the presence of SVs that vastly exceed sequencing read lengths. However, the recent introduction of low-error long-read sequencing technologies such as PacBio HiFi may finally enable these barriers to be overcome. Here we present SV discovery with sample-specific strings (SVDSS), a method for discovery of SVs from long-read sequencing technologies (for example, PacBio HiFi) that combines and effectively leverages mapping-free, mapping-based and assembly-based methodologies for overall superior SV discovery performance. Our experiments on several human samples show that SVDSS outperforms state-of-the-art mapping-based methods for discovery of insertion and deletion SVs in PacBio HiFi reads and achieves notable improvements in calling SVs in repetitive regions of the genome.
Affiliation(s)
- Luca Denti
  - Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France
- Paola Bonizzoni
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy.
- Fereydoun Hormozdiari
  - Genome Center, UC Davis, Davis, CA, USA.
  - UC Davis MIND Institute, Sacramento, CA, USA.
  - Department of Biochemistry and Molecular Medicine, Sacramento, UC Davis, Sacramento, CA, USA.
- Rayan Chikhi
  - Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France.
12
|
Molecular Diagnosis of Hypertrophic Cardiomyopathy (HCM): In the Heart of Cardiac Disease. J Clin Med 2022; 12:jcm12010225. [PMID: 36615026 PMCID: PMC9821215 DOI: 10.3390/jcm12010225] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/01/2022] [Revised: 12/21/2022] [Accepted: 12/23/2022] [Indexed: 12/29/2022] Open
Abstract
Hypertrophic cardiomyopathy (HCM) is an inherited myocardial disease marked by left ventricular hypertrophy (LVH). The disease is characterized by high locus, allelic and phenotypic heterogeneity, even among members of the same family. The list of confirmed and potentially relevant genes implicated in the disease is constantly increasing, with novel genes frequently reported. Heterozygous alterations in the five main sarcomeric genes (MYBPC3, MYH7, TNNT2, TNNI3, and MYL2) are estimated to account for more than half of confirmed cases. The genetic discoveries of recent years have shed more light on the molecular pathogenic mechanisms of HCM, contributing to substantial advances in the diagnosis of the disease. Genetic testing with next-generation sequencing (NGS) technologies, and the early diagnosis it allows among family members before clinical manifestation of the disease, represent an important improvement in the field.
13
Craven KE, Fischer CG, Jiang L, Pallavajjala A, Lin MT, Eshleman JR. Optimizing Insertion and Deletion Detection Using Next-Generation Sequencing in the Clinical Laboratory. J Mol Diagn 2022; 24:1217-1231. [PMID: 36162758 PMCID: PMC9808503 DOI: 10.1016/j.jmoldx.2022.08.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/20/2022] [Revised: 07/18/2022] [Accepted: 08/31/2022] [Indexed: 01/13/2023] Open
Abstract
Detection of insertions and deletions (InDels) by short-read next-generation sequencing (NGS) technology can be challenging because of frequent misaligned reads. A systematic analysis of short InDels (1 to 30 bases) and fms-related receptor tyrosine kinase 3 (FLT3) internal tandem duplications (ITDs; 6 to 183 bases) from 46 clinical cases of solid or hematologic malignancy processed with a clinical NGS assay identified misaligned reads in every case, ranging from 3% to 100% of reads with the InDel showing mismapped bases. Mismaps also increased with InDel size. As a consequence, the clinical NGS bioinformatics pipeline undercalled the variant allele frequency by 1% to 84%, incorrectly called simultaneous single-base substitutions along with InDels, or did not report an FLT3 ITD that had been detected by capillary electrophoresis. To improve the ability of the pipeline to better detect and quantify InDels, we utilized a software program called Assembly-Based ReAligner (ABRA2) to more accurately remap reads. ABRA2 was able to correct 41% to 100% of the reads with mismapped bases and led to absolute increases in the variant allele frequency from 1% to 61% along with correction of all of the single-base substitutions except for two cases. ABRA2 could also detect multiple FLT3 ITD clones except for one 183-base ITD. Our analysis has found that ABRA2 performs well on short InDels as well as FLT3 ITDs that are <100 bases.
Affiliation(s)
- Kelly E Craven
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Catherine G Fischer
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland; Division of Cancer Prevention, National Cancer Institute, Rockville, Maryland
- LiQun Jiang
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Aparna Pallavajjala
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Ming-Tseh Lin
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland
- James R Eshleman
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland; Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland; The Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins University School of Medicine, Baltimore, Maryland.
14
Yin ZZ, Yao J, Wei FX, Chen CY, Yan HM, Zhang M. Targeted Next-Generation Sequencing Reveals a Large Novel β-Thalassemia Deletion that Removes the Entire HBB Gene. Hemoglobin 2022; 46:290-295. [PMID: 36412578 DOI: 10.1080/03630269.2022.2145964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
β-Thalassemia (β-thal) is one of the most common monogenic recessive inherited diseases worldwide. The mutation spectrum of β-thal has been increasingly broadened by various genetic testing methods. The discovery and identification of novel and rare pathogenic thalassemia variants enable better disease prevention, especially in high prevalence regions. In this study, a Chinese thalassemia family with an unclear etiology was recruited to the Thalassemia Screening Program. Blood samples collected from the family were first screened by hematology analysis and routine clinical genetic screening. Subsequently, targeted next-generation sequencing (NGS) and Sanger sequencing were performed to find and identify a novel deletion variant. The deletion, discovered by targeted NGS, was validated through real-time quantitative polymerase chain reaction (qPCR). First, a large novel β-thal deletion (3488 bp) related to a high Hb F level, NC_000011.9: g.5245533_5249020del (Chongqing deletion) (GRCh37/hg19), was found and identified in the proband and her mother. The deletion removed the entire β-globin gene and led to absent β-globin (β0). We then validated this large novel deletion in the proband and her mother by qPCR. We are the first to discover and identify this large novel β-thal deletion related to an elevated Hb F level; the finding helps broaden the spectrum of pathogenic mutants that may cause β-thal intermedia (β-TI) or β-thal major (β-TM), paving the way for effective thalassemia screening. Next-generation sequencing has the potential to find rare and novel thalassemia mutants.
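The reported deletion size can be checked directly against the HGVS coordinates quoted in the abstract. The following is a minimal sketch, assuming only that the description has the form g.START_ENDdel with both positions inclusive; the function name `deletion_length` is hypothetical and not taken from the paper.

```python
import re


def deletion_length(hgvs: str) -> int:
    """Length in bp of a genomic deletion written as g.START_ENDdel."""
    match = re.search(r"g\.(\d+)_(\d+)del", hgvs)
    if match is None:
        raise ValueError(f"not a g.START_ENDdel description: {hgvs!r}")
    start, end = int(match.group(1)), int(match.group(2))
    # Both coordinates are part of the deleted range, hence the +1.
    return end - start + 1


print(deletion_length("NC_000011.9:g.5245533_5249020del"))  # 3488
```

The result agrees with the 3488 bp stated for the Chongqing deletion.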
Affiliation(s)
- Zhen-Zhen Yin
- Nanfang College, Guangzhou, People's Republic of China
- Jian Yao
- Nanfang College, Guangzhou, People's Republic of China
- Feng-Xiang Wei
- The Genetics Laboratory, Shenzhen Longgang District Maternity and Child Healthcare Hospital, Shenzhen, People's Republic of China
- Chu-Yan Chen
- Nanfang College, Guangzhou, People's Republic of China
- Hong-Mei Yan
- Guangzhou Development District Hospital, Guangzhou, People's Republic of China
- Ming Zhang
- Nanfang College, Guangzhou, People's Republic of China
15
Xie H, Yin H, Ye X, Liu Y, Liu N, Zhang Y, Chen X, Chen X. Detection of Small CYP11B1 Deletions and One Founder Chimeric CYP11B2/CYP11B1 Gene in 11β-Hydroxylase Deficiency. Front Endocrinol (Lausanne) 2022; 13:882863. [PMID: 35685215 PMCID: PMC9171383 DOI: 10.3389/fendo.2022.882863] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/24/2022] [Accepted: 03/22/2022] [Indexed: 12/02/2022] Open
Abstract
OBJECTIVE: 11β-Hydroxylase deficiency (11β-OHD) caused by mutations in the CYP11B1 gene is the second most common form of congenital adrenal hyperplasia. Both point mutations and genomic rearrangements of CYP11B1 are important causes of 11β-OHD. However, the high degree of sequence identity between CYP11B1 and its homologous gene CYP11B2 presents unique challenges for molecular diagnosis of suspected 11β-OHD. The aim of this study was to detect point mutations, indels, small deletions of CYP11B1 and the chimeric CYP11B2/CYP11B1 gene in a one-tube test, improving the genetic diagnosis of 11β-OHD. METHODS: An optimized custom-designed target sequencing strategy was applied to three patients with suspected 11β-OHD, in which both the coverage depth of paired-end reads and the breakpoint information of split reads from sequencing data were analysed in order to detect genomic rearrangements covering CYP11B1. Long-range PCR was performed with breakpoint-specific primers to validate the suspected CYP11B1 rearrangements. RESULTS: Using the optimized target sequencing approach, we detected two intragenic/intergenic deletions of CYP11B1 and one chimeric CYP11B2/CYP11B1 gene in three patients with suspected 11β-OHD, in addition to three pathogenic heterozygous point mutations/indels. Furthermore, we mapped the precise breakpoint of this chimeric CYP11B2/CYP11B1 gene to chr8:143994517 (hg19) and confirmed it as a founder rearrangement event in the Chinese population. CONCLUSIONS: Our optimized target sequencing approach improved the genetic diagnosis of 11β-OHD.
Affiliation(s)
- Hua Xie
- Department of Medical Genetics, Capital Institute of Pediatrics, Beijing, China
- Hui Yin
- Department of Endocrinology, Affiliated Children’s Hospital of Capital Institute of Pediatrics, Beijing, China
- Xue Ye
- Department of Endocrinology, Affiliated Children’s Hospital of Capital Institute of Pediatrics, Beijing, China
- Ying Liu
- Department of Endocrinology, Affiliated Children’s Hospital of Capital Institute of Pediatrics, Beijing, China
- Na Liu
- Bioinformation Department, Beijing Mygenostics Co., Ltd, Beijing, China
- Yu Zhang
- Department of Laboratory Center, Capital Institute of Pediatrics, Beijing, China
- Xiaoli Chen
- Department of Medical Genetics, Capital Institute of Pediatrics, Beijing, China
- *Correspondence: Xiaobo Chen; Xiaoli Chen
- Xiaobo Chen
- Department of Endocrinology, Affiliated Children’s Hospital of Capital Institute of Pediatrics, Beijing, China
- *Correspondence: Xiaobo Chen; Xiaoli Chen
16
Singh AK, Olsen MF, Lavik LAS, Vold T, Drabløs F, Sjursen W. Detecting copy number variation in next generation sequencing data from diagnostic gene panels. BMC Med Genomics 2021; 14:214. [PMID: 34465341 PMCID: PMC8406611 DOI: 10.1186/s12920-021-01059-x] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 05/14/2021] [Accepted: 08/16/2021] [Indexed: 01/21/2023] Open
Abstract
BACKGROUND: Detection of copy number variation (CNV) in genes associated with disease is important in genetic diagnostics, and next generation sequencing (NGS) technology provides data that can be used for CNV detection. However, CNV detection based on NGS data is not often used in diagnostic labs because the data analysis is challenging, especially with data from targeted gene panels. Wet lab methods like MLPA (MRC Holland) are widely used, but are expensive, time consuming and have gene-specific limitations. Our aim has been to develop a bioinformatic tool for CNV detection from NGS data in medical genetic diagnostic samples. RESULTS: Our computational pipeline for detection of CNVs in NGS data from targeted gene panels utilizes coverage depth of the captured regions and calculates a copy number ratio score for each region. This is computed by comparing the mean coverage of the sample with the mean coverage of the same region in other samples, defined as a pool. The pipeline selects pools for comparison dynamically from previously sequenced samples, using the pool with an average coverage depth that is nearest to that of the sample. A sliding window-based approach is used to analyze each region, where the length of the sliding window and the sliding distance can be chosen dynamically to increase or decrease the resolution. This helps in detecting CNVs in small or partial exons. With this pipeline we have correctly identified the CNVs in 36 positive control samples, with sensitivity of 100% and specificity of 91%. We have detected whole gene level deletion/duplication, single/multi exonic level deletion/duplication, partial exonic deletion and mosaic deletion. Since its implementation in mid-2018 it has proven its diagnostic value with more than 45 CNV findings in routine tests. CONCLUSIONS: With this pipeline as part of our diagnostic practices it is now possible to detect partial, single or multi-exonic, and intragenic CNVs in all genes in our target panel. This has helped our diagnostic lab to expand the portfolio of genes where we offer CNV detection, which previously was limited by the availability of MLPA kits.
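The ratio-score idea described in this abstract can be illustrated with a toy example. This is a sketch under stated assumptions, not the authors' pipeline: the function names (`window_means`, `nearest_pool`, `ratio_scores`), the window and step defaults, and the depth values are all hypothetical, and no normalization beyond pool selection is attempted.

```python
from statistics import mean


def window_means(depth, window, step):
    """Mean depth of each sliding window across one captured region."""
    if len(depth) <= window:
        return [mean(depth)]
    starts = range(0, len(depth) - window + 1, step)
    return [mean(depth[i:i + window]) for i in starts]


def nearest_pool(sample_depth, pools):
    """Choose the pool whose average depth is closest to the sample's."""
    target = mean(sample_depth)

    def pool_average(pool):
        return mean([mean(member) for member in pool])

    return min(pools, key=lambda pool: abs(pool_average(pool) - target))


def ratio_scores(sample_depth, pool, window=4, step=2):
    """Ratio of sample to pool mean depth for each window."""
    sample = window_means(sample_depth, window, step)
    members = [window_means(member, window, step) for member in pool]
    reference = [mean(column) for column in zip(*members)]
    return [s / r for s, r in zip(sample, reference)]


# Toy region of 8 positions; the second half has half the expected depth.
sample = [100, 100, 100, 100, 50, 50, 50, 50]
pools = [
    [[100] * 8, [100] * 8],  # pool with depth similar to the sample
    [[300] * 8, [300] * 8],  # pool sequenced much more deeply
]
pool = nearest_pool(sample, pools)
print(ratio_scores(sample, pool))  # [1.0, 0.75, 0.5]
```

In this toy case the ratio falls towards 0.5 across the second half of the region, which is the pattern one would expect for loss of one copy in a diploid region. A shorter window or step gives finer resolution at the cost of noisier ratios, and a real implementation would also need to guard against windows where the pool depth is zero.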
Affiliation(s)
- Ashish Kumar Singh
- Department of Medical Genetics, St. Olavs Hospital, Trondheim, Norway.
- Department of Clinical and Molecular Medicine, Faculty of Medicine and Health Sciences, NTNU - Norwegian University of Science and Technology, Trondheim, Norway.
- Trine Vold
- Department of Medical Genetics, St. Olavs Hospital, Trondheim, Norway
- Finn Drabløs
- Department of Clinical and Molecular Medicine, Faculty of Medicine and Health Sciences, NTNU - Norwegian University of Science and Technology, Trondheim, Norway
- Wenche Sjursen
- Department of Medical Genetics, St. Olavs Hospital, Trondheim, Norway
- Department of Clinical and Molecular Medicine, Faculty of Medicine and Health Sciences, NTNU - Norwegian University of Science and Technology, Trondheim, Norway
17
Cumer T, Boyer F, Pompanon F. Genome-Wide Detection of Structural Variations Reveals New Regions Associated with Domestication in Small Ruminants. Genome Biol Evol 2021; 13:evab165. [PMID: 34264322 PMCID: PMC8350358 DOI: 10.1093/gbe/evab165] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Accepted: 07/08/2021] [Indexed: 11/28/2022] Open
Abstract
During domestication processes, changes in selective pressures induce multiple phenotypical, physiological, and behavioral changes in target species. The rise of next-generation sequencing has provided a chance to study the genetic bases of these changes, most of the time based on single nucleotide polymorphisms (SNPs). However, several studies have highlighted the impact of structural variations (SVs) on individual fitness, particularly in domestic species. We aimed at unraveling the role of SVs during the domestication and later improvement of small ruminants by analyzing whole-genome sequences of 40 domestic sheep and 11 of their close wild relatives (Ovis orientalis), and 40 goats and 18 of their close wild relatives (Capra aegagrus). Using a combination of detection tools, we called 45,796 SVs in Ovis and 15,047 SVs in Capra genomes, including insertions, deletions, inversions, copy number variations, and chromosomal translocations. Most of these SVs were previously unreported in small ruminants. In sheep and goats, respectively, 69 and 45 SVs were in genomic regions with neighboring SNPs highly differentiated between wilds and domestics (i.e., putatively related to domestication). Among them, 25 and 20 SVs were close to or overlapping with genes related to physiological and morpho-anatomical traits linked with productivity (e.g., size, meat or milk quality, wool color), reproduction, or immunity. Finally, several of the SVs differentiated between wilds and domestics would not have been detected by screening only the differentiation of SNPs surrounding them, highlighting the complementarity of SV- and SNP-based approaches to detect signatures of selection.
Affiliation(s)
- Tristan Cumer
- Université Grenoble Alpes, Université Savoie Mont-Blanc, CNRS, LECA, Grenoble, France
- Frédéric Boyer
- Université Grenoble Alpes, Université Savoie Mont-Blanc, CNRS, LECA, Grenoble, France
- François Pompanon
- Université Grenoble Alpes, Université Savoie Mont-Blanc, CNRS, LECA, Grenoble, France
18
Kan Y, Jiang L, Tang J, Guo Y, Guo F. A systematic view of computational methods for identifying driver genes based on somatic mutation data. Brief Funct Genomics 2021; 20:333-343. [PMID: 34312663 DOI: 10.1093/bfgp/elab032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/01/2021] [Revised: 06/16/2021] [Accepted: 06/22/2021] [Indexed: 11/13/2022] Open
Abstract
Aberrant changes in driver genes matter greatly for human health and biomedical research. Accurately identifying driver genes among the very large number of mutated genes promotes accurate diagnosis and treatment of cancer. Many approaches for uncovering driver genes have been developed over the past decades. By analyzing previous work, we find that computational methods are more efficient than traditional biological experiments at distinguishing driver genes in large data sets. In this study, we summarize eight common computational algorithms that use only somatic mutation data. We first group these methods into three categories according to the mutation features they apply. Then, we outline a general process for nominating candidate cancer driver genes. Finally, we evaluate three representative methods on 10 kinds of cancer derived from The Cancer Genome Atlas Program and five Chinese projects from the International Cancer Genome Consortium. In addition, we compare results of the methods under various parameters. Evaluation is performed from four perspectives: CGC, OG/TSG, Q-value and quantile-quantile (Q-Q) plot. To sum up, we present algorithms using somatic mutation data in order to offer a systematic view of various mutation features and lay the foundation for methods based on integration of mutation information and other types of data.
Affiliation(s)
- Yingxin Kan
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Limin Jiang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Jijun Tang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; School of Computational Science and Engineering, University of South Carolina, Columbia, USA
- Yan Guo
- Comprehensive Cancer Center, Department of Internal Medicine, University of New Mexico, Albuquerque, USA
- Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha, China
19
Upadhyay M, Derks MFL, Andersson G, Medugorac I, Groenen MAM, Crooijmans RPMA. Introgression contributes to distribution of structural variations in cattle. Genomics 2021; 113:3092-3102. [PMID: 34242710 DOI: 10.1016/j.ygeno.2021.07.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/06/2021] [Revised: 06/24/2021] [Accepted: 07/03/2021] [Indexed: 11/19/2022]
Abstract
Structural variations (SVs) are an important source of phenotypic diversity in cattle. Here, 72 whole genome sequences representing taurine and zebu cattle were used to identify SVs. Applying multiple approaches, 16,738 SVs were identified. A comparison against the Database of Genomic Variants archives revealed that 1575 SVs were novel in our data. A novel duplication covering the entire GALNT15 gene was observed only in N'Dama. A duplication that was previously reported only in zebu and associated with navel length was also observed in N'Dama. Investigation of a novel deletion located upstream of the CAST13 gene and identified only in Italian cattle and zebu revealed its introgressed origin in the former. Overall, our data highlight how the distribution of SVs in cattle is also shaped by forces such as demographic differences and gene flow. The cattle SVs of this study and their metadata can be visualized on an interactive genome browser at https://tinyurl.com/svCowArs.
Affiliation(s)
- Maulik Upadhyay
- Animal Breeding and Genomics, Wageningen University & Research, Droevendaalsesteeg 1, 6708PB, Wageningen, The Netherlands; Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, 75007 Uppsala, Sweden; Population Genomics Group, Department of Veterinary Sciences, Ludwig-Maximilians-University Munich, 80539 Munich, Germany.
- Martijn F L Derks
- Animal Breeding and Genomics, Wageningen University & Research, Droevendaalsesteeg 1, 6708PB, Wageningen, The Netherlands.
- Göran Andersson
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, 75007 Uppsala, Sweden.
- Ivica Medugorac
- Population Genomics Group, Department of Veterinary Sciences, Ludwig-Maximilians-University Munich, 80539 Munich, Germany.
- Martien A M Groenen
- Animal Breeding and Genomics, Wageningen University & Research, Droevendaalsesteeg 1, 6708PB, Wageningen, The Netherlands.
- Richard P M A Crooijmans
- Animal Breeding and Genomics, Wageningen University & Research, Droevendaalsesteeg 1, 6708PB, Wageningen, The Netherlands.
20
Vijg J. From DNA damage to mutations: All roads lead to aging. Ageing Res Rev 2021; 68:101316. [PMID: 33711511 PMCID: PMC10018438 DOI: 10.1016/j.arr.2021.101316] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 12/03/2020] [Revised: 02/26/2021] [Accepted: 03/03/2021] [Indexed: 12/20/2022]
Abstract
Damage to the repository of genetic information in cells has plagued life since its very beginning 3-4 billion years ago. Initially, in the absence of an ozone layer, damage from solar UV radiation in particular must have been frequent, with other sources, most notably endogenous sources related to cell metabolism, gaining in importance over time. To cope with this high frequency of damage to the increasingly long DNA molecules that came to encode the growing complexity of cellular functions, DNA repair evolved as one of the earliest genetic traits. Then as now, errors during the repair of DNA damage generated mutations, which provide the substrate for evolution by natural selection. With the emergence of multicellular organisms, the soma also became a target of DNA damage and mutations. In somatic cells, selection against the adverse effects of DNA damage is greatly diminished, especially in postmitotic cells after the age of first reproduction. Based on an abundance of evidence, DNA damage is now considered the single most important driver of the degenerative processes that collectively cause aging. Here I will first briefly review the evidence for DNA damage as a cause of aging since the beginning of life. Then, after discussing the possible direct adverse effects of DNA damage and its cellular responses, I will provide an overview of the considerable progress that has recently been made in analyzing a major consequence of DNA damage in humans and other complex organisms: somatic mutations and the resulting genome mosaicism. Recent advances in studying somatic mutagenesis and genome mosaicism in different human and animal tissues will be discussed with a focus on the possible mechanisms through which loss of DNA sequence integrity could cause age-related functional decline and disease.
Affiliation(s)
- Jan Vijg
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA; Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University, School of Medicine, Shanghai, China.
21
Liu G, Zhang J. A Cluster-Based Approach for the Discovery of Copy Number Variations From Next-Generation Sequencing Data. Front Genet 2021; 12:699510. [PMID: 34262604 PMCID: PMC8273656 DOI: 10.3389/fgene.2021.699510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2021] [Accepted: 06/07/2021] [Indexed: 11/13/2022] Open
Abstract
Next-generation sequencing technology offers a wealth of data for the detection of copy number variations (CNVs) at high resolution. However, it is still challenging to correctly detect CNVs of different lengths, and new CNV detection tools are needed to meet this demand. In this work, we propose a new CNV detection method, called CBCNV, for the detection of CNVs of different lengths from whole genome sequencing data. CBCNV uses a clustering algorithm to divide the read depth segment profile, and assigns an abnormal score to each read depth segment. Based on the abnormal score profile, Tukey's fences method is adopted in CBCNV to predict CNVs. The performance of the proposed method is evaluated on simulated data sets and compared with that of several existing methods. The experimental results show that CBCNV performs better than several existing methods. The proposed method is further tested and verified on real data sets, and the experimental results are found to be consistent with the simulation results. Therefore, the proposed method can be expected to become a routine tool in the analysis of CNVs from tumor-normal matched samples.
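Tukey's fences, the outlier rule named in this abstract, flag values that lie more than k interquartile ranges outside the first or third quartile. The sketch below shows only that rule applied to a list of per-segment scores; how CBCNV derives its abnormal scores from clustering is not reproduced here, and the function name `tukey_outliers`, the example scores, and the conventional choice k = 1.5 are assumptions for illustration.

```python
from statistics import quantiles


def tukey_outliers(scores, k=1.5):
    """Indices of scores lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, score in enumerate(scores) if score < lower or score > upper]


# Hypothetical per-segment scores; only the last segment stands out.
scores = [1.0, 1.1, 0.9, 1.0, 1.2, 0.95, 1.05, 4.0]
print(tukey_outliers(scores))  # [7]
```

Segments flagged this way would be candidate CNVs; a larger k makes the rule more conservative.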
Affiliation(s)
- Junying Zhang
- School of Computer Science and Technology, Xidian University, Xi’an, China
22
Khorsand P, Hormozdiari F. Nebula: ultra-efficient mapping-free structural variant genotyper. Nucleic Acids Res 2021; 49:e47. [PMID: 33503255 PMCID: PMC8096284 DOI: 10.1093/nar/gkab025] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/18/2020] [Revised: 01/03/2021] [Accepted: 01/11/2021] [Indexed: 11/24/2022] Open
Abstract
Large-scale catalogs of common genetic variants (including indels and structural variants) are being created using data from second- and third-generation whole-genome sequencing technologies. However, the genotyping of these variants in newly sequenced samples is a nontrivial task that requires extensive computational resources. Furthermore, current approaches are mostly limited to specific types of variants and are generally prone to various errors and ambiguities when genotyping complex events. We propose an ultra-efficient approach for genotyping any type of structural variation that is not limited by the shortcomings and complexities of current mapping-based approaches. Our method, Nebula, utilizes the changes in the count of k-mers to predict the genotype of structural variants. We show that Nebula is not only an order of magnitude faster than mapping-based approaches for genotyping structural variants, but also has comparable accuracy to state-of-the-art approaches. Furthermore, Nebula is a generic framework not limited to any specific type of event. Nebula is publicly available at https://github.com/Parsoa/Nebula.
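The general idea of genotyping from k-mer counts, without mapping reads, can be illustrated with a deliberately simplified sketch. It is not Nebula's algorithm: the function names, the fixed k, the 0.2 and 0.8 cut-offs and the toy sequences are all assumptions, and sequencing errors, reverse complements and repeated k-mers elsewhere in the genome are ignored.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def genotype(reads, ref_allele, alt_allele, k=5):
    """Call 0/0, 0/1 or 1/1 from counts of allele-specific k-mers in the reads."""
    ref_set, alt_set = set(kmers(ref_allele, k)), set(kmers(alt_allele, k))
    ref_only, alt_only = ref_set - alt_set, alt_set - ref_set
    if not ref_only or not alt_only:
        return "./."  # the alleles cannot be told apart at this k
    counts = Counter(km for read in reads for km in kmers(read, k))
    # Average count per distinguishing k-mer, so that the allele with more
    # distinguishing k-mers is not favoured.
    ref_support = sum(counts[km] for km in ref_only) / len(ref_only)
    alt_support = sum(counts[km] for km in alt_only) / len(alt_only)
    total = ref_support + alt_support
    if total == 0:
        return "./."  # no informative k-mers observed
    alt_fraction = alt_support / total
    if alt_fraction < 0.2:
        return "0/0"
    if alt_fraction > 0.8:
        return "1/1"
    return "0/1"


ref = "ACGTACGGATTCCAGT"
alt = "ACGTACCCAGT"  # same sequence with GGATT deleted
print(genotype([ref, alt], ref, alt))  # 0/1
print(genotype([alt, alt], ref, alt))  # 1/1
```

Because only exact k-mer lookups are needed, a scheme of this kind avoids read alignment entirely, which is the property the abstract credits for the speed advantage.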
Affiliation(s)
- Fereydoun Hormozdiari
- Genome Center, UC Davis, Davis, California, 95616, USA; UC Davis MIND Institute, Sacramento, California, 95817, USA; Department of Biochemistry and Molecular Medicine, UC Davis, Sacramento, California, 95817, USA
23
Khorsand P, Denti L, Bonizzoni P, Chikhi R, Hormozdiari F. Comparative genome analysis using sample-specific string detection in accurate long reads. Bioinformatics Advances 2021; 1:vbab005. [PMID: 36700094 PMCID: PMC9710709 DOI: 10.1093/bioadv/vbab005] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/28/2023]
Abstract
Motivation: Comparative genome analysis of two or more whole-genome sequenced (WGS) samples is at the core of most applications in genomics. These include the discovery of genomic differences segregating in populations, case-control analysis in common diseases and diagnosing rare disorders. With the current progress of accurate long-read sequencing technologies (e.g. circular consensus sequencing from PacBio sequencers), we can dive into studying repeat regions of the genome (e.g. segmental duplications) and hard-to-detect variants (e.g. complex structural variants). Results: We propose a novel framework for comparative genome analysis through the discovery of strings that are specific to one genome ('sample-specific' strings). We have developed a novel, accurate and efficient computational method for the discovery of sample-specific strings between two groups of WGS samples. The proposed approach makes it possible to perform comparative genome analysis without the need to map the reads and is not hindered by shortcomings of the reference genome and mapping algorithms. We show that the proposed approach is capable of accurately finding sample-specific strings representing nearly all variation (>98%) reported across pairs or trios of WGS samples using accurate long reads (e.g. PacBio HiFi data). Availability and implementation: Data, code and instructions for reproducing the results presented in this manuscript are publicly available at https://github.com/Parsoa/PingPong. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Luca Denti
- Department of Computational Biology, Institut Pasteur, Paris 75015, France
- Paola Bonizzoni
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, 20126, Italy. To whom correspondence should be addressed.
- Rayan Chikhi
- Department of Computational Biology, Institut Pasteur, Paris 75015, France. To whom correspondence should be addressed.
- Fereydoun Hormozdiari
- Genome Center, UC Davis, Davis, CA 95616, USA; UC Davis MIND Institute, Sacramento, CA 95817, USA; Department of Biochemistry and Molecular Medicine, Sacramento, UC Davis, Sacramento, CA 95817, USA. To whom correspondence should be addressed.
24
Abstract
Non-Hodgkin lymphoma encompasses a diverse group of B-cell and T-cell neoplasms. Current classification is based on clinical information, histologic assessment, immunophenotypic characteristics, and molecular alterations. A wide range of genetic alterations, including large chromosomal structural rearrangements, aneuploidies, point mutations, and copy number alterations, have been reported across all types of lymphomas. Many of these are now incorporated into the World Health Organization-defined criteria for the diagnostic evaluation of patients with lymphoid proliferations and, therefore, their accurate identification is paramount for diagnosis, subclassification, and selection of treatment. In addition to their value in the diagnostic setting, many alterations that are not routinely evaluated in standard clinical practice may still define specific disease entities as they have important implications in risk stratification, as well as roles in emerging alternate therapies and disease monitoring. Because of the complexity and range of alterations, their accurate and sensitive assessment requires a careful selection of technology. Here, we discuss the most commonly used molecular techniques in current clinical practice and highlight some of the benefits and pitfalls based on the type of alteration.
|
25
|
Thomas GWC, Wang RJ, Nguyen J, Harris RA, Raveendran M, Rogers J, Hahn MW. Origins and Long-Term Patterns of Copy-Number Variation in Rhesus Macaques. Mol Biol Evol 2021; 38:1460-1471. [PMID: 33226085 PMCID: PMC8042740 DOI: 10.1093/molbev/msaa303] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/16/2022] Open
Abstract
Mutations play a key role in the development of disease in an individual and the evolution of traits within species. Recent work in humans and other primates has clarified the origins and patterns of single-nucleotide variants, showing that most arise in the father's germline during spermatogenesis. It remains unknown whether larger mutations, such as deletions and duplications of hundreds or thousands of nucleotides, follow similar patterns. Such mutations lead to copy-number variation (CNV) within and between species, and can have profound effects by deleting or duplicating genes. Here, we analyze patterns of CNV mutations in 32 rhesus macaque individuals from 14 parent-offspring trios. We find the rate of CNV mutations per generation is low (less than one per genome) and we observe no correlation between parental age and the number of CNVs that are passed on to offspring. We also examine segregating CNVs within the rhesus macaque sample and compare them to a similar data set from humans, finding that both species have far more segregating deletions than duplications. We contrast this with long-term patterns of gene copy-number evolution between 17 mammals, where the proportion of deletions that become fixed along the macaque lineage is much smaller than the proportion of segregating deletions. These results suggest purifying selection acting on deletions, such that the majority of them are removed from the population over time. Rhesus macaques are an important biomedical model organism, so these results will aid in our understanding of this species and the disease models it supports.
Affiliation(s)
- Gregg W C Thomas
- Division of Biological Sciences, University of Montana, Missoula, MT, USA
| | - Richard J Wang
- Department of Biology, Indiana University, Bloomington, IN, USA
| | - Jelena Nguyen
- Department of Computer Science, Indiana University, Bloomington, IN, USA
| | - R Alan Harris
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Muthuswamy Raveendran
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Jeffrey Rogers
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Matthew W Hahn
- Department of Biology, Indiana University, Bloomington, IN, USA
- Department of Computer Science, Indiana University, Bloomington, IN, USA
|
26
|
Jilani M, Haspel N. Computational Methods for Detecting Large-Scale Structural Rearrangements in Chromosomes. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
|
27
|
Yuan X, Yu J, Xi J, Yang L, Shang J, Li Z, Duan J. CNV_IFTV: An Isolation Forest and Total Variation-Based Detection of CNVs from Short-Read Sequencing Data. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:539-549. [PMID: 31180897 DOI: 10.1109/tcbb.2019.2920889] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 06/09/2023]
Abstract
Accurate detection of copy number variations (CNVs) from short-read sequencing data is challenging due to the uneven distribution of reads and the unbalanced amplitudes of gains and losses. The direct use of read depths to measure CNVs tends to limit performance. Thus, robust computational approaches equipped with appropriate statistics are required to detect CNV regions and boundaries. This study proposes a new method called CNV_IFTV to address this need. CNV_IFTV assigns an anomaly score to each genome bin through a collection of isolation trees. The trees are trained with the isolation forest algorithm by subsampling the measured read depths. With the anomaly scores, CNV_IFTV uses a total variation model to smooth adjacent bins, leading to a denoised score profile. Finally, a statistical model is established to test the denoised scores for calling CNVs. CNV_IFTV is tested on both simulated and real data in comparison to several peer methods. The results indicate that the proposed method outperforms the peer methods. CNV_IFTV is a reliable tool for detecting CNVs from short-read sequencing data, even at low coverage and low tumor purity. The detection results on tumor samples can help evaluate known cancer genes and predict target drugs for disease diagnosis.
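The three stages named in this abstract (per-bin anomaly scoring with isolation trees, smoothing of adjacent bins, and a test on the smoothed scores) can be illustrated with a minimal sketch. This is not the authors' implementation: all function names are hypothetical, the data are simulated, a running median stands in for the total variation model, and a median/MAD cutoff stands in for the statistical test.

```python
import math
import random
from statistics import median


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup on n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(values, depth, max_depth, rng):
    """Recursively split 1-D read depths at random cut points (one isolation tree)."""
    lo, hi = min(values), max(values)
    if depth >= max_depth or len(values) <= 1 or lo == hi:
        return {"size": len(values)}
    cut = rng.uniform(lo, hi)
    left = [v for v in values if v < cut]
    right = [v for v in values if v >= cut]
    if not left or not right:  # degenerate cut: stop here
        return {"size": len(values)}
    return {
        "cut": cut,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, x):
    """Number of splits needed to reach x's leaf, plus a correction for leaf size."""
    depth = 0
    while "cut" in tree:
        tree = tree["left"] if x < tree["cut"] else tree["right"]
        depth += 1
    return depth + c_factor(tree["size"])


def anomaly_scores(depths, n_trees=100, sample_size=64, seed=0):
    """Isolation-forest score per bin; values near 1 mean 'easy to isolate'."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(depths))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(depths, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = c_factor(sample_size)
    return [
        2.0 ** (-sum(path_length(t, d) for t in trees) / (n_trees * norm))
        for d in depths
    ]


def median_smooth(scores, half_window=2):
    """Running median over adjacent bins (a stand-in for total variation smoothing)."""
    n = len(scores)
    return [
        median(scores[max(0, i - half_window): min(n, i + half_window + 1)])
        for i in range(n)
    ]


def call_bins(scores, z_cut=3.0):
    """Flag bins whose smoothed score is a robust outlier (median/MAD rule)."""
    med = median(scores)
    mad = median(abs(s - med) for s in scores) or 1e-9
    return [i for i, s in enumerate(scores) if (s - med) / (1.4826 * mad) > z_cut]


# Simulated read depths: 200 bins near 30x, with a deletion in bins 90-99.
data_rng = random.Random(1)
depths = [max(0.0, data_rng.gauss(30, 3)) for _ in range(200)]
for i in range(90, 100):
    depths[i] = max(0.0, data_rng.gauss(2, 1))

scores = anomaly_scores(depths)
smoothed = median_smooth(scores)
calls = call_bins(smoothed)
print(calls)
```

Scoring bins by how easily they are isolated, rather than by raw depth, means gains and losses are treated symmetrically as "unusual" bins; the smoothing step then suppresses isolated noisy bins so that only runs of adjacent outliers survive.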
|
28
|
Liu G, Zhang J, Yuan X, Wei C. RKDOSCNV: A Local Kernel Density-Based Approach to the Detection of Copy Number Variations by Using Next-Generation Sequencing Data. Front Genet 2020; 11:569227. [PMID: 33329705 PMCID: PMC7673372 DOI: 10.3389/fgene.2020.569227] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/03/2020] [Accepted: 09/04/2020] [Indexed: 12/04/2022] Open
Abstract
Copy number variations (CNVs) are significant causes of many human cancers and genetic diseases. The detection of CNVs has become a common method by which to analyze human diseases using next-generation sequencing (NGS) data. However, effective detection of insignificant CNVs is still a challenging task. In this study, we propose a new detection method, RKDOSCNV, to meet this need. RKDOSCNV uses a kernel density estimation method to evaluate the local kernel density distribution of each read depth segment (RDS) based on an expanded nearest neighbor (k-nearest neighbors, reverse nearest neighbors, and shared nearest neighbors of each RDS) data set, and assigns a relative kernel density outlier score (RKDOS) to each RDS. According to the RKDOS profile, RKDOSCNV predicts the candidate CNVs by choosing a reasonable threshold, and then uses a split-read approach to correct the boundaries of the candidate CNVs. The performance of RKDOSCNV is assessed by comparing it with several current popular methods via experiments with simulated and real data at different tumor purity levels. The experimental results verify that the performance of RKDOSCNV is superior to that of several other methods. In summary, RKDOSCNV is a simple and effective method for the detection of CNVs from whole genome sequencing (WGS) data, especially for samples with low tumor purity.
Affiliation(s)
- Guojun Liu
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Junying Zhang
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Xiguo Yuan
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Chao Wei
- School of Computer Science and Technology, Xidian University, Xi'an, China
|
29
|
Yang L, Niu Q, Zhang T, Zhao G, Zhu B, Chen Y, Zhang L, Gao X, Gao H, Liu GE, Li J, Xu L. Genomic sequencing analysis reveals copy number variations and their associations with economically important traits in beef cattle. Genomics 2020; 113:812-820. [PMID: 33080318 DOI: 10.1016/j.ygeno.2020.10.012] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/11/2020] [Revised: 09/21/2020] [Accepted: 10/05/2020] [Indexed: 11/25/2022]
Abstract
Copy number variation (CNV) represents a major source of genetic variation, which may have potentially large effects, including altering gene regulation and dosage, as well as contributing to gene expression and risk for normal phenotypic variability. We carried out a comprehensive analysis of CNV based on whole genome sequencing in Chinese Simmental beef cattle. In total, we found 9313 deletion and 234 duplication events, covering 147.5 Mb of autosomal regions. Of these, 257 high-frequency deletion events overlapped with 193 known RefGenes. Several of these genes were related to economically important traits, such as residual feed intake, immune response, pregnancy rate and muscle differentiation. Using a locus-based analysis, we identified 11 deletions and 1 duplication that were significantly associated with three traits: carcass weight, tenderloin and longissimus muscle area. Our sequencing-based study provides important insights into the association of CNVs with important traits in beef cattle.
Affiliation(s)
- Liu Yang
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
| | - Qunhao Niu
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Tianliu Zhang
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Guoyao Zhao
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Bo Zhu
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Yan Chen
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Lupei Zhang
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
| | - Xue Gao
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Huijiang Gao
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
| | - George E Liu
- Animal Genomics and Improvement Laboratory, United States Department of Agriculture-Agricultural Research Service, Beltsville, MD 20705, USA.
| | - Junya Li
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
| | - Lingyang Xu
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
|
30
|
Shin SJ, Wu Y, Hao N. A backward procedure for change‐point detection with applications to copy number variation detection. Can J Stat 2020. [DOI: 10.1002/cjs.11535] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
Affiliation(s)
- Seung Jun Shin
- Department of Statistics, Korea University, Seoul, South Korea
| | - Yichao Wu
- Department of Mathematics, Statistics, and Computer Science, The University of Illinois at Chicago, Chicago, IL, U.S.A.
| | - Ning Hao
- Department of Mathematics, The University of Arizona, Tucson, AZ, U.S.A.
|
31
|
Thorpe RK, Smith RJH. Future directions for screening and treatment in congenital hearing loss. Precis Clin Med 2020; 3:175-186. [PMID: 33209510 PMCID: PMC7653508 DOI: 10.1093/pcmedi/pbaa025] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/14/2020] [Revised: 07/06/2020] [Accepted: 07/12/2020] [Indexed: 02/06/2023] Open
Abstract
Hearing loss is the most common neurosensory deficit. It results from a variety of heritable and acquired causes and is linked to multiple deleterious effects on a child's development that can be ameliorated by prompt identification and individualized therapies. Diagnosing hearing loss in newborns is challenging, especially in mild or progressive cases, and its management requires a multidisciplinary team of healthcare providers comprising audiologists, pediatricians, otolaryngologists, and genetic counselors. While physiologic newborn hearing screening has resulted in earlier diagnosis of hearing loss than ever before, a growing body of knowledge supports the concurrent implementation of genetic and cytomegalovirus testing to offset the limitations inherent to a singular screening modality. In this review, we discuss the contemporary role of screening for hearing loss in newborns as well as future directions in its diagnosis and treatment.
Affiliation(s)
- Ryan K Thorpe
- Molecular Otolaryngology and Renal Research Laboratories, Carver College of Medicine, University of Iowa, 375 Newton Rd, Iowa City, IA 52242, USA
- Department of Otolaryngology – Head and Neck Surgery, University of Iowa, 200 Hawkins Dr, Iowa City, IA 52242, USA
| | - Richard J H Smith
- Molecular Otolaryngology and Renal Research Laboratories, Carver College of Medicine, University of Iowa, 375 Newton Rd, Iowa City, IA 52242, USA
- Department of Otolaryngology – Head and Neck Surgery, University of Iowa, 200 Hawkins Dr, Iowa City, IA 52242, USA
- The Interdisciplinary Graduate Program in Genetics, University of Iowa, 375 Newton Rd, Iowa City, IA 52242, USA
- Iowa Institute of Human Genetics, University of Iowa, 375 Newton Rd, Iowa City, IA 52242, USA
|
32
|
Melas M, Subbiah S, Saadat S, Rajurkar S, McDonnell KJ. The Community Oncology and Academic Medical Center Alliance in the Age of Precision Medicine: Cancer Genetics and Genomics Considerations. J Clin Med 2020; 9:E2125. [PMID: 32640668 PMCID: PMC7408957 DOI: 10.3390/jcm9072125] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/13/2020] [Revised: 06/28/2020] [Accepted: 07/02/2020] [Indexed: 12/15/2022] Open
Abstract
Recent public policy, governmental regulatory and economic trends have motivated the establishment and deepening of community health and academic medical center alliances. Accordingly, community oncology practices now deliver a significant portion of their oncology care in association with academic cancer centers. In the age of precision medicine, this alliance has acquired critical importance; novel advances in nucleic acid sequencing, the generation and analysis of immense data sets, the changing clinical landscape of hereditary cancer predisposition and ongoing discovery of novel, targeted therapies challenge community-based oncologists to deliver molecularly-informed health care. The active engagement of community oncology practices with academic partners helps with meeting these challenges; community/academic alliances result in improved cancer patient care and provider efficacy. Here, we review the community oncology and academic medical center alliance. We examine how practitioners may leverage academic center precision medicine-based cancer genetics and genomics programs to advance their patients' needs. We highlight a number of project initiatives at the City of Hope Comprehensive Cancer Center that seek to optimize community oncology and academic cancer center precision medicine interactions.
Affiliation(s)
- Marilena Melas
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children’s Hospital, Columbus, OH 43205, USA
| | - Shanmuga Subbiah
- Department of Medical Oncology and Therapeutics Research, City of Hope Comprehensive Cancer Center, Glendora, CA 91741, USA
| | - Siamak Saadat
- Department of Medical Oncology and Therapeutics Research, City of Hope Comprehensive Cancer Center, Colton, CA 92324, USA
| | - Swapnil Rajurkar
- Department of Medical Oncology and Therapeutics Research, City of Hope Comprehensive Cancer Center, Upland, CA 91786, USA
| | - Kevin J. McDonnell
- Department of Medical Oncology and Therapeutics Research, City of Hope Comprehensive Cancer Center and Beckman Research Institute, Duarte, CA 91010, USA
- Center for Precision Medicine, City of Hope Comprehensive Cancer Center, Duarte, CA 91010, USA
|
33
|
Soylev A, Le TM, Amini H, Alkan C, Hormozdiari F. Discovery of tandem and interspersed segmental duplications using high-throughput sequencing. Bioinformatics 2020; 35:3923-3930. [PMID: 30937433 DOI: 10.1093/bioinformatics/btz237] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 06/20/2018] [Revised: 01/20/2019] [Accepted: 03/29/2019] [Indexed: 01/01/2023] Open
Abstract
MOTIVATION: Several algorithms have been developed that use high-throughput sequencing technology to characterize structural variations (SVs). Most of the existing approaches focus on detecting relatively simple types of SVs such as insertions, deletions and short inversions. In fact, complex SVs are of crucial importance and several have been associated with genomic disorders. To better understand the contribution of complex SVs to human disease, we need new algorithms to accurately discover and genotype such variants. Additionally, due to similar sequencing signatures, inverted duplications or gene conversion events that include inverted segmental duplications are often characterized as simple inversions; likewise, duplications and gene conversions in direct orientation may be called as simple deletions. Therefore, there is still a need for accurate algorithms to fully characterize complex SVs and thus improve calling accuracy of more simple variants. RESULTS: We developed novel algorithms to accurately characterize tandem, direct and inverted interspersed segmental duplications using short read whole genome sequencing datasets. We integrated these methods into our TARDIS tool, which is now capable of detecting various types of SVs using multiple sequence signatures such as read pair, read depth and split read. We evaluated the prediction performance of our algorithms through several experiments using both simulated and real datasets. In the simulation experiments, at 30× coverage, TARDIS achieved 96% sensitivity with only 4% false discovery rate. For experiments that involve real data, we used two haploid genomes (CHM1 and CHM13) and one human genome (NA12878) from the Illumina Platinum Genomes set. Comparison of our results with orthogonal PacBio call sets from the same genomes revealed higher accuracy for TARDIS than state-of-the-art methods. Furthermore, we showed a surprisingly low false discovery rate of our approach for the prediction of tandem, direct and inverted interspersed segmental duplications on CHM1 (<5% for the top 50 predictions). AVAILABILITY AND IMPLEMENTATION: TARDIS source code is available at https://github.com/BilkentCompGen/tardis, and a corresponding Docker image is available at https://hub.docker.com/r/alkanlab/tardis/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Arda Soylev
- Department of Computer Engineering, Bilkent University, Ankara; Department of Computer Engineering, Konya Food and Agriculture University, Konya, Turkey
| | - Thong Minh Le
- UC-Davis Genome Center, University of California, Davis, CA, USA; Department of Computer Science, University of California, Davis, CA, USA
| | - Hajar Amini
- Department of Neurology, School of Medicine, University of California, Davis, CA, USA
| | - Can Alkan
- Department of Computer Engineering, Bilkent University, Ankara; Bilkent-Hacettepe Health Sciences and Technologies Program, Ankara, Turkey; Department of Computer Science, ETH Zürich, Zurich, Switzerland
| | - Fereydoun Hormozdiari
- UC-Davis Genome Center, University of California, Davis, CA, USA; Department of Biochemistry and Molecular Medicine, University of California, Davis, CA, USA; MIND Institute, University of California, Davis, CA, USA
|
34
|
Wei YC, Huang GH. CONY: A Bayesian procedure for detecting copy number variations from sequencing read depths. Sci Rep 2020; 10:10493. [PMID: 32591545 PMCID: PMC7319969 DOI: 10.1038/s41598-020-64353-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/11/2019] [Accepted: 04/15/2020] [Indexed: 12/26/2022] Open
Abstract
Copy number variations (CNVs) are genomic structural mutations consisting of abnormal numbers of fragment copies. Read-depth signals from next-generation sequencing mirror these variants. Some tools used to predict CNVs by depth have been published, but most of these tools can be applied to only a specific data type due to modeling limitations. We develop a tool for copy number variation detection by a Bayesian procedure, i.e., CONY, that adopts a Bayesian hierarchical model and an efficient reversible-jump Markov chain Monte Carlo inference algorithm for whole-genome sequencing read-depth data. CONY can be applied not only to individual samples for estimating the absolute number of copies but also to case-control pairs for detecting patient-specific variations. We evaluate the performance of CONY and compare CONY with competing approaches through simulations and by using experimental data from the 1000 Genomes Project. CONY outperforms the other methods in terms of accuracy in both single-sample and paired-samples analyses. In addition, CONY performs well regardless of whether the data coverage is high or low. CONY is useful for detecting both absolute and relative CNVs from read-depth data sequences. The package is available at https://github.com/weiyuchung/CONY.
Affiliation(s)
- Yu-Chung Wei
- Graduate Institute of Statistics and Information Science, National Changhua University of Education, No.1 Jinde Road, Changhua City, Changhua County, 50007, Taiwan
| | - Guan-Hua Huang
- Institute of Statistics, National Chiao Tung University, 1001 University Road, Hsinchu, 30010, Taiwan.
|
35
|
Liu Y, Sun J, Zhang M, Yang G, Wang R, Xu J, Li Q, Zhang S, Le W, Hao B, Li Y, Wu J. Identification of key genes related to seedlessness by genome-wide detection of structural variation and transcriptome analysis in 'Shijiwuhe' pear. Gene 2020; 738:144480. [PMID: 32081696 DOI: 10.1016/j.gene.2020.144480] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/31/2019] [Revised: 02/14/2020] [Accepted: 02/14/2020] [Indexed: 11/30/2022]
Abstract
Seedless fruits are highly marketable because they are easier to eat than fruits with seeds. 'Shijiwuhe' is a seedless pear cultivar that is a mutant derived from an F1 hybridization population ('Bartlett' x 'Yali'). Little is known about the key genes controlling seedless pear fruit. In this study, field experiments revealed that seedless 'Shijiwuhe' pear was not due to parthenocarpy, and that it was self-incompatible. Single nucleotide polymorphisms (SNPs), small insertions and deletions (InDels) and structural variations (SVs) were characterized using DNA sequencing data between 'Shijiwuhe' and parental cultivars. A total of 1498 genes were found to be affected by SV and over 50% of SVs were located in promoter regions. Transcriptome analysis was conducted at three time points (4, 8, and 12 days after cross-pollination) during early fruit development of 'Shijiwuhe', 'Bartlett', and 'Yali'. In total, 1438 differentially expressed genes (DEGs) were found between 'Shijiwuhe' and parental cultivars 'Bartlett' and 'Yali'. We found 1193 SVs that caused differential expression of genes at 4 DACP. Among them, over 100 genes were in pathways related to seed nutrition and energy storage and 41 candidate genes encoded several important transcription factors, such as MYB, WRKY, NAC, and bHLH, which might play important roles in seed development. The qRT-PCR results also confirmed that the candidate genes with SVs showed differential expression between 'Shijiwuhe' pear and 'Bartlett' or 'Yali'. This study, which combined field experiments, SV detection, and transcriptome analysis might provide an effective way to predict the candidate genes regulating the seedless trait and important gene resources for genetic improvement of pear.
Affiliation(s)
- Yueyuan Liu
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
| | - Jieying Sun
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
| | - Mingyue Zhang
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
| | - Guangyan Yang
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
| | - Runze Wang
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
| | - Jintao Xu
- Changli Institute of Pomology, Hebei Academy of Agricultural and Forestry Sciences, Changli, Hebei 066600, China
| | - Qingyu Li
- Yantai Academy of Agricultural Sciences, Shandong 264000, China
| | - Shaoling Zhang
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
| | - Wenquan Le
- Changli Institute of Pomology, Hebei Academy of Agricultural and Forestry Sciences, Changli, Hebei 066600, China
| | - Baofeng Hao
- Changli Institute of Pomology, Hebei Academy of Agricultural and Forestry Sciences, Changli, Hebei 066600, China
| | - Yuanjun Li
- Yantai Academy of Agricultural Sciences, Shandong 264000, China
| | - Jun Wu
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China.
|
36
|
Karaoğlanoğlu F, Ricketts C, Ebren E, Rasekh ME, Hajirasouliha I, Alkan C. VALOR2: characterization of large-scale structural variants using linked-reads. Genome Biol 2020; 21:72. [PMID: 32192518 PMCID: PMC7083023 DOI: 10.1186/s13059-020-01975-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/17/2019] [Accepted: 02/24/2020] [Indexed: 12/31/2022] Open
Abstract
Most existing methods for structural variant detection focus on discovery and genotyping of deletions, insertions, and mobile elements. Detection of balanced structural variants with no gain or loss of genomic segments, for example, inversions and translocations, is a particularly challenging task. Furthermore, there are very few algorithms to predict the insertion locus of large interspersed segmental duplications and characterize translocations. Here, we propose novel algorithms to characterize large interspersed segmental duplications, inversions, deletions, and translocations using linked-read sequencing data. We redesign our earlier algorithm, VALOR, and implement our new algorithms in a new software package, called VALOR2.
Affiliation(s)
- Fatih Karaoğlanoğlu
- Department of Computer Engineering, Bilkent University, Ankara, 06800 Turkey
| | - Camir Ricketts
- Tri-Institutional Computational Biology & Medicine Program, Cornell University, 1300 York Ave, New York, 10065 NY USA
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine, 1300 York Ave, New York, 10065 NY USA
| | - Ezgi Ebren
- Department of Computer Engineering, Bilkent University, Ankara, 06800 Turkey
| | - Marzieh Eslami Rasekh
- Graduate Program in Bioinformatics, Boston University, 24 Cummington Mall, Boston, 02215 MA USA
| | - Iman Hajirasouliha
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine, 1300 York Ave, New York, 10065 NY USA
- Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, 1300 York Ave, New York, 10065 NY USA
| | - Can Alkan
- Department of Computer Engineering, Bilkent University, Ankara, 06800 Turkey
- Bilkent-Hacettepe Health Sciences and Technologies Program, Bilkent University, Ankara, 06800 Turkey
|
37
|
Redmond SN, Sharma A, Sharakhov I, Tu Z, Sharakhova M, Neafsey DE. Linked-read sequencing identifies abundant microinversions and introgression in the arboviral vector Aedes aegypti. BMC Biol 2020; 18:26. [PMID: 32164699 PMCID: PMC7068900 DOI: 10.1186/s12915-020-0757-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 09/29/2019] [Accepted: 02/21/2020] [Indexed: 11/17/2022] Open
Abstract
Background: Aedes aegypti is the principal mosquito vector of Zika, dengue, and yellow fever viruses. Two subspecies of Ae. aegypti exhibit phenotypic divergence with regard to habitat, host preference, and vectorial capacity. Chromosomal inversions have been shown to play a major role in adaptation and speciation in dipteran insects and would be of great utility for studies of Ae. aegypti. However, the large and highly repetitive genome of Ae. aegypti makes it difficult to detect inversions with paired-end short-read sequencing data, and polytene chromosome analysis does not provide sufficient resolution to detect chromosome banding patterns indicative of inversions. Results: To characterize chromosomal diversity in this species, we have carried out deep Illumina sequencing of linked-read (10X Genomics) libraries in order to discover inversion loci as well as SNPs. We analyzed individuals from colonies representing the geographic limits of each subspecies, one contact zone between subspecies, and a closely related sister species. Despite genome-wide SNP divergence and abundant microinversions, we do not find any inversions occurring as fixed differences between subspecies. Many microinversions are found in regions that have introgressed and have captured genes that could impact behavior, such as a cluster of odorant-binding proteins that may play a role in host feeding preference. Conclusions: Our study shows that inversions are abundant and widely shared among subspecies of Aedes aegypti and that introgression has occurred in regions of secondary contact. This library of 32 novel chromosomal inversions demonstrates the capacity for linked-read sequencing to identify previously intractable genomic rearrangements and provides a foundation for future population genetics studies in this species.
Affiliation(s)
- Seth N Redmond
- Institute of Vector Borne Disease, Monash University, Melbourne, Australia; Harvard TH Chan School of Public Health, Boston, MA, USA
| | - Atashi Sharma
- Fralin Life Science Institute, Virginia Polytechnic and State University, Blacksburg, VA, USA
| | - Igor Sharakhov
- Fralin Life Science Institute, Virginia Polytechnic and State University, Blacksburg, VA, USA
| | - Zhijian Tu
- Fralin Life Science Institute, Virginia Polytechnic and State University, Blacksburg, VA, USA
| | - Maria Sharakhova
- Fralin Life Science Institute, Virginia Polytechnic and State University, Blacksburg, VA, USA
| | - Daniel E Neafsey
- Harvard TH Chan School of Public Health, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
|
38
|
Abstract
Identifying structural variation (SV) is essential for genome interpretation but has been historically difficult due to limitations inherent to available genome technologies. Detection methods that use ensemble algorithms and emerging sequencing technologies have enabled the discovery of thousands of SVs, uncovering information about their ubiquity, relationship to disease and possible effects on biological mechanisms. Given the variability in SV type and size, along with unique detection biases of emerging genomic platforms, multiplatform discovery is necessary to resolve the full spectrum of variation. Here, we review modern approaches for investigating SVs and proffer that, moving forwards, studies integrating biological information with detection will be necessary to comprehensively understand the impact of SV in the human genome.
Affiliation(s)
- Steve S Ho
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Alexander E Urban
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
| | - Ryan E Mills
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA.
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
|
39
|
Spence M, Banuelos M, Marcia RF, Sindi S. Detecting inherited and novel structural variants in low-coverage parent-child sequencing data. Methods 2020; 173:61-68. [PMID: 31271880 DOI: 10.1016/j.ymeth.2019.06.025] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/02/2019] [Revised: 06/12/2019] [Accepted: 06/24/2019] [Indexed: 11/25/2022] Open
Abstract
Structural variants (SVs) are a class of genomic variation shared by members of the same species. Though relatively rare, they represent an increasingly important class of variation, as SVs have been associated with diseases and susceptibility to some types of cancer. Common approaches to SV detection require the sequencing and mapping of fragments from a test genome to a high-quality reference genome. Candidate SVs correspond to fragments with discordant mapped configurations. However, because errors in the sequencing and mapping will also create discordant arrangements, many of these predictions will be spurious. When sequencing coverage is low, distinguishing true SVs from errors is even more challenging. In recent work, we have developed SV detection methods that exploit genome information of closely related individuals, namely parents and children. Our previous approaches were based on the assumption that any SV present in a child's genome must have come from one of their parents. However, this strict restriction may have resulted in failing to predict rare but novel variants present only in the child. In this work, we generalize our previous approaches to allow the child to carry novel variants. We consider a constrained optimization approach where variants in the child are of two types: either inherited, and therefore present in a parent, or novel. For simplicity, we consider only a single parent and single child, each of which has a haploid genome. However, even in this restricted case, our approach has the power to improve variant prediction. We present results on simulated candidate variant regions, parent-child trios from the 1000 Genomes Project, and a subset of the 17 Platinum Genomes.
Affiliation(s)
- Melissa Spence
- Department of Applied Mathematics, University of California, Merced, Merced, CA 95343, USA
| | - Mario Banuelos
- Department of Mathematics, California State University, Fresno, Fresno, CA 93740, USA.
| | - Roummel F Marcia
- Department of Applied Mathematics, University of California, Merced, Merced, CA 95343, USA
| | - Suzanne Sindi
- Department of Applied Mathematics, University of California, Merced, Merced, CA 95343, USA
|
40
|
Serrano C, Vivancos A, López-Pousa A, Matito J, Mancuso FM, Valverde C, Quiroga S, Landolfi S, Castro S, Dopazo C, Sebio A, Virgili AC, Menso MM, Martín-Broto J, Sansó M, García-Valverde A, Rosell J, Fletcher JA, George S, Carles J, Arribas J. Clinical value of next generation sequencing of plasma cell-free DNA in gastrointestinal stromal tumors. BMC Cancer 2020; 20:99. [PMID: 32024476 PMCID: PMC7003348 DOI: 10.1186/s12885-020-6597-x] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 07/16/2019] [Accepted: 01/31/2020] [Indexed: 02/08/2023] Open
Abstract
Background: Gastrointestinal stromal tumor (GIST) initiation and evolution is commonly framed by KIT/PDGFRA oncogenic activation, and in later stages by the polyclonal expansion of resistant subpopulations harboring KIT secondary mutations after the onset of imatinib resistance. Thus, circulating tumor (ct)DNA determination is expected to be an informative non-invasive dynamic biomarker in GIST patients. Methods: We performed amplicon-based next-generation sequencing (NGS) across 60 clinically relevant genes in 37 plasma samples from 18 GIST patients collected prospectively. ctDNA alterations were compared with NGS of matched tumor tissue samples (obtained either simultaneously or at the time of diagnosis) and cross-validated with droplet digital PCR (ddPCR). Results: Five of 18 patients had detectable cfDNA mutations in at least one timepoint. Overall, NGS sensitivity for detection of cell-free (cf)DNA mutations in plasma was 28.6%, showing high concordance with ddPCR confirmation. We found that GIST had relatively low ctDNA shedding, and mutations were at low allele frequencies. ctDNA was detected only in GIST patients with advanced disease after imatinib failure, predicting tumor dynamics in serial monitoring. KIT secondary mutations were the only mechanism of resistance found across 10 imatinib-resistant GIST patients progressing to sunitinib or regorafenib. Conclusions: ctDNA evaluation with amplicon-based NGS detects KIT primary and secondary mutations in metastatic GIST patients, particularly after imatinib progression. GIST exhibits low ctDNA shedding, but ctDNA monitoring, when positive, reflects tumor dynamics.
Affiliation(s)
- César Serrano
- Medical Oncology Department, Vall d'Hebron University Hospital, P. Vall d'Hebron 119, 08035, Barcelona, Spain; Preclinical Research Program, Vall d'Hebron Institute of Oncology, Barcelona, Spain.
| | - Ana Vivancos
- Cancer Genomics Group, Vall d'Hebron Institute of Oncology, Natzaret 115, 08035, Barcelona, Spain.
| | | | - Judit Matito
- Cancer Genomics Group, Vall d'Hebron Institute of Oncology, Natzaret 115, 08035, Barcelona, Spain
| | - Francesco M Mancuso
- Cancer Genomics Group, Vall d'Hebron Institute of Oncology, Natzaret 115, 08035, Barcelona, Spain
| | - Claudia Valverde
- Medical Oncology Department, Vall d'Hebron University Hospital, P. Vall d'Hebron 119, 08035, Barcelona, Spain
| | - Sergi Quiroga
- Radiology Department, Vall d'Hebron University Hospital, Barcelona, Spain
| | - Stefania Landolfi
- Pathology Department, Vall d'Hebron University Hospital, Barcelona, Spain
| | - Sandra Castro
- Surgical Oncology Division, Vall d'Hebron University Hospital, Barcelona, Spain
| | - Cristina Dopazo
- Surgical Oncology Division, Vall d'Hebron University Hospital, Barcelona, Spain
| | - Ana Sebio
- Medical Oncology, Sant Pau University Hospital, Barcelona, Spain
| | - Anna C Virgili
- Medical Oncology, Sant Pau University Hospital, Barcelona, Spain
| | - María M Menso
- Radiology Department, Sant Pau University Hospital, Barcelona, Spain
| | | | - Miriam Sansó
- Cancer Genomics Group, Vall d'Hebron Institute of Oncology, Natzaret 115, 08035, Barcelona, Spain
| | | | - Jordi Rosell
- Preclinical Research Program, Vall d'Hebron Institute of Oncology, Barcelona, Spain
| | - Jonathan A Fletcher
- Pathology Department, Brigham and Women's Hospital/Harvard Medical School, Boston, USA
| | - Suzanne George
- Center for Sarcoma and Bone Oncology, Dana-Farber Cancer Institute, Boston, USA
| | - Joan Carles
- Medical Oncology Department, Vall d'Hebron University Hospital, P. Vall d'Hebron 119, 08035, Barcelona, Spain
| | - Joaquín Arribas
- Preclinical Research Program, Vall d'Hebron Institute of Oncology, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
|
41
|
Liu Y, Zhang M, Sun J, Chang W, Sun M, Zhang S, Wu J. Comparison of multiple algorithms to reliably detect structural variants in pears. BMC Genomics 2020; 21:61. [PMID: 31959124 PMCID: PMC6972009 DOI: 10.1186/s12864-020-6455-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/17/2019] [Accepted: 01/07/2020] [Indexed: 01/01/2023] Open
Abstract
Background: Structural variations (SVs) have been reported to play an important role in genetic diversity and trait regulation. Many computer algorithms detecting SVs have recently been developed, but the use of multiple algorithms to detect high-confidence SVs has not been studied. The most suitable sequencing depth for detecting SVs in pear is also not known. Results: In this study, a pipeline to detect SVs using next-generation and long-read sequencing data was constructed. The performances of seven types of SV detection software using next-generation sequencing (NGS) data and two types of software using long-read sequencing data (SVIM and Sniffles), which are based on different algorithms, were compared. Of the nine software packages evaluated, SVIM identified the most SVs, and Sniffles detected SVs with the highest accuracy (> 90%). When the results from multiple SV detection tools were combined, the SVs identified by both MetaSV and IMR/DENOM, which use NGS data, were more accurate than those identified by both SVIM and Sniffles, with mean accuracies of 98.7 and 96.5%, respectively. The software packages using long-read sequencing data required fewer CPU cores and less memory and ran faster than those using NGS data. In addition, according to the performances of assembly-based algorithms using NGS data, we found that a sequencing depth of 50× is appropriate for detecting SVs in the pear genome. Conclusion: This study provides strong evidence that more than one SV detection software package, each based on a different algorithm, should be used to detect SVs with higher confidence, and that long-read sequencing data are better than NGS data for SV detection. The SV detection pipeline that we have established will facilitate the study of diversity in other crops.
Affiliation(s)
- Yueyuan Liu
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
| | - Mingyue Zhang
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
| | - Jieying Sun
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
| | - Wenjing Chang
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
| | - Manyi Sun
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
| | - Shaoling Zhang
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
| | - Jun Wu
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China.
|
42
|
Vegesna R, Tomaszkiewicz M, Medvedev P, Makova KD. Dosage regulation, and variation in gene expression and copy number of human Y chromosome ampliconic genes. PLoS Genet 2019; 15:e1008369. [PMID: 31525193 PMCID: PMC6772104 DOI: 10.1371/journal.pgen.1008369] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/30/2019] [Revised: 10/01/2019] [Accepted: 08/13/2019] [Indexed: 12/28/2022] Open
Abstract
The Y chromosome harbors nine multi-copy ampliconic gene families expressed exclusively in testis. The gene copies within each family are >99% identical to each other, which poses a major challenge in evaluating their copy number. Recent studies demonstrated high variation in Y ampliconic gene copy number among humans. However, how this variation affects expression levels in human testis remains understudied. Here we developed a novel computational tool Ampliconic Copy Number Estimator (AmpliCoNE) that utilizes read sequencing depth information to estimate Y ampliconic gene copy number per family. We applied this tool to whole-genome sequencing data of 149 men with matched testis expression data whose samples are part of the Genotype-Tissue Expression (GTEx) project. We found that the Y ampliconic gene families with low copy number in humans were deleted or pseudogenized in non-human great apes, suggesting relaxation of functional constraints. Among the Y ampliconic gene families, higher copy number leads to higher expression. Within the Y ampliconic gene families, copy number does not influence gene expression, rather a high tolerance for variation in gene expression was observed in testis of presumably healthy men. No differences in gene expression levels were found among major Y haplogroups. Age positively correlated with expression levels of the HSFY and PRY gene families in the African subhaplogroup E1b, but not in the European subhaplogroups R1b and I1. We also found that expression of five Y ampliconic gene families is coordinated with that of their non-Y (i.e. X or autosomal) homologs. Indeed, five ampliconic gene families had consistently lower expression levels when compared to their non-Y homologs suggesting dosage regulation, while the HSFY family had higher expression levels than its X homolog and thus lacked dosage regulation.
MESH Headings
- Animals
- Chromosomes, Human, Y/genetics
- Chromosomes, Human, Y/physiology
- DNA Copy Number Variations/genetics
- Databases, Genetic
- Dosage Compensation, Genetic/genetics
- Dosage Compensation, Genetic/physiology
- Epigenesis, Genetic/genetics
- Gene Dosage/genetics
- Gene Expression/genetics
- Gene Expression Regulation/genetics
- Genes, Y-Linked/genetics
- Genes, Y-Linked/physiology
- Heat Shock Transcription Factors/genetics
- Heat Shock Transcription Factors/metabolism
- Humans
- Male
- Multigene Family/genetics
- Sequence Analysis, DNA/methods
- Testis/metabolism
Affiliation(s)
- Rahulsimham Vegesna
- Bioinformatics and Genomics Graduate Program, The Huck Institutes for the Life Sciences, Pennsylvania State University, University Park, PA, United States of America
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, United States of America
| | - Marta Tomaszkiewicz
- Department of Biology, Pennsylvania State University, University Park, PA, United States of America
| | - Paul Medvedev
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, United States of America
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, United States of America
- Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, United States of America
- Center for Medical Genomics, Pennsylvania State University, University Park, PA, United States of America
| | - Kateryna D. Makova
- Bioinformatics and Genomics Graduate Program, The Huck Institutes for the Life Sciences, Pennsylvania State University, University Park, PA, United States of America
- Department of Biology, Pennsylvania State University, University Park, PA, United States of America
- Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, United States of America
- Center for Medical Genomics, Pennsylvania State University, University Park, PA, United States of America
|
43
|
Zhang Z, Cheng H, Hong X, Di Narzo AF, Franzen O, Peng S, Ruusalepp A, Kovacic JC, Bjorkegren JLM, Wang X, Hao K. EnsembleCNV: an ensemble machine learning algorithm to identify and genotype copy number variation using SNP array data. Nucleic Acids Res 2019; 47:e39. [PMID: 30722045 PMCID: PMC6468244 DOI: 10.1093/nar/gkz068] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 10/04/2018] [Revised: 12/17/2018] [Accepted: 01/25/2019] [Indexed: 12/30/2022] Open
Abstract
The associations between diseases/traits and copy number variants (CNVs) have not been systematically investigated in genome-wide association studies (GWASs), primarily due to a lack of robust and accurate tools for CNV genotyping. Herein, we propose a novel ensemble learning framework, ensembleCNV, to detect and genotype CNVs using single nucleotide polymorphism (SNP) array data. EnsembleCNV (a) identifies and eliminates batch effects at raw data level; (b) assembles individual CNV calls into CNV regions (CNVRs) from multiple existing callers with complementary strengths by a heuristic algorithm; (c) re-genotypes each CNVR with local likelihood model adjusted by global information across multiple CNVRs; (d) refines CNVR boundaries by local correlation structure in copy number intensities; (e) provides direct CNV genotyping accompanied with confidence score, directly accessible for downstream quality control and association analysis. Benchmarked on two large datasets, ensembleCNV outperformed competing methods and achieved a high call rate (93.3%) and reproducibility (98.6%), while concurrently achieving high sensitivity by capturing 85% of common CNVs documented in the 1000 Genomes Project. Given this CNV call rate and accuracy, which are comparable to SNP genotyping, we suggest ensembleCNV holds significant promise for performing genome-wide CNV association studies and investigating how CNVs predispose to human diseases.
Affiliation(s)
- Zhongyang Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Haoxiang Cheng
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Xiumei Hong
- Center on the Early Life Origins of Disease, Department of Population, Family and Reproductive Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD 21205, USA
| | - Antonio F Di Narzo
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Oscar Franzen
- Integrated Cardio Metabolic Centre, Department of Medicine, Karolinska Institutet, Karolinska Universitetssjukhuset, Huddinge, Sweden
| | - Shouneng Peng
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Arno Ruusalepp
- Department of Cardiac Surgery, Tartu University Hospital, Tartu, Estonia
| | - Jason C Kovacic
- Cardiovascular Research Center, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Johan L M Bjorkegren
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Integrated Cardio Metabolic Centre, Department of Medicine, Karolinska Institutet, Karolinska Universitetssjukhuset, Huddinge, Sweden
| | - Xiaobin Wang
- Center on the Early Life Origins of Disease, Department of Population, Family and Reproductive Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Division of General Pediatrics & Adolescent Medicine, Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Ke Hao
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- The Tenth People's Hospital, Tongji University, Shanghai 200072, China
- College of Environmental Science and Engineering, Tongji University, Shanghai 200092, China
|
44
|
Fuentes RR, Chebotarov D, Duitama J, Smith S, De la Hoz JF, Mohiyuddin M, Wing RA, McNally KL, Tatarinova T, Grigoriev A, Mauleon R, Alexandrov N. Structural variants in 3000 rice genomes. Genome Res 2019; 29:870-880. [PMID: 30992303 PMCID: PMC6499320 DOI: 10.1101/gr.241240.118] [Citation(s) in RCA: 83] [Impact Index Per Article: 16.6] [Received: 06/29/2018] [Accepted: 03/11/2019] [Indexed: 12/24/2022]
Abstract
Investigation of large structural variants (SVs) is a challenging yet important task in understanding trait differences in highly repetitive genomes. Combining different bioinformatic approaches for SV detection, we analyzed whole-genome sequencing data from 3000 rice genomes and identified 63 million individual SV calls that grouped into 1.5 million allelic variants. We found enrichment of long SVs in promoters and an excess of shorter variants in 5′ UTRs. Across the rice genomes, we identified regions of high SV frequency enriched in stress response genes. We demonstrated how SVs may help in finding causative variants in genome-wide association analysis. These new insights into rice genome biology are valuable for understanding the effects SVs have on gene function, with the prospect of identifying novel agronomically important alleles that can be utilized to improve cultivated rice.
Affiliation(s)
- Roven Rommel Fuentes
- International Rice Research Institute, Laguna 4031, Philippines; Bioinformatics Group, Wageningen University and Research, 6708 PB Wageningen, the Netherlands
| | | | - Jorge Duitama
- Systems and Computing Engineering Department, Universidad de Los Andes, Bogotá 111711, Colombia; Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), Cali 6713, Colombia
| | - Sean Smith
- Biology Department, Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey 08102, USA
| | - Juan Fernando De la Hoz
- Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), Cali 6713, Colombia
| | | | - Rod A Wing
- International Rice Research Institute, Laguna 4031, Philippines; Arizona Genomics Institute, University of Arizona, Tucson, Arizona 85721, USA; King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
| | | | - Tatiana Tatarinova
- Department of Biology, University of La Verne, La Verne, California 91750, USA; Vavilov Institute of General Genetics, Moscow 119333, Russia; A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow 127051, Russia; Laboratory of Forest Genomics, Siberian Federal University, Krasnoyarsk 660041, Russia
| | - Andrey Grigoriev
- Biology Department, Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey 08102, USA
| | - Ramil Mauleon
- International Rice Research Institute, Laguna 4031, Philippines
|
45
|
Detection of False-Positive Deletions from the Database of Genomic Variants. BIOMED RESEARCH INTERNATIONAL 2019; 2019:8420547. [PMID: 31080831 PMCID: PMC6475568 DOI: 10.1155/2019/8420547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2018] [Revised: 02/24/2019] [Accepted: 03/04/2019] [Indexed: 11/24/2022]
Abstract
Next generation sequencing is an emerging technology that has been widely used in the detection of genomic variants. However, because depth of coverage, a main signature used for variant calling, is strongly affected by biases such as GC content and mappability, some calls are false positives. In this study, we utilized paired-end read mapping, another signature that is not affected by the aforementioned biases, to detect false-positive deletions in the Database of Genomic Variants. We first identified 1923 suspicious variants that may be false positives and then conducted validation studies on each suspicious variant, which detected 583 false-positive deletions. Finally, we analysed the distribution of these false positives by chromosome, sample, and size. Correcting these false positives in public repositories should help avoid incorrect documentation and annotations in downstream studies.
|
46
|
Khatri B, Kang S, Shouse S, Anthony N, Kuenzel W, Kong BC. Copy number variation study in Japanese quail associated with stress related traits using whole genome re-sequencing data. PLoS One 2019; 14:e0214543. [PMID: 30921419 PMCID: PMC6438477 DOI: 10.1371/journal.pone.0214543] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/09/2018] [Accepted: 03/15/2019] [Indexed: 02/06/2023] Open
Abstract
Copy number variation (CNV) is a major driving factor for genetic variation and phenotypic diversity in animals. To detect CNVs and understand genetic components underlying stress related traits, we performed whole genome re-sequencing of pooled DNA samples of 20 birds each from High Stress (HS) and Low Stress (LS) Japanese quail lines using the Illumina HiSeq 2×150 bp paired-end method. Sequencing data were aligned to the quail genome and CNVnator was used to detect CNVs in the aligned data sets. The depth of coverage reached 41.4x and 42.6x for HS and LS birds, respectively. We identified 262 and 168 CNV regions affecting 1.6 and 1.9% of the reference genome that completely overlapped 454 and 493 unique genes in HS and LS birds, respectively. Ingenuity pathway analysis showed that the CNV genes were significantly enriched in phospholipase C signaling, neuregulin signaling, reelin signaling in neurons, endocrine and nervous development, humoral immune response, and carbohydrate and amino acid metabolisms in HS birds, whereas CNV genes in LS birds were enriched in cell-mediated immune response, and protein and lipid metabolisms. These findings suggest CNV genes identified in HS and LS birds could be candidate markers responsible for stress responses in birds.
Affiliation(s)
- Bhuwan Khatri
- Department of Poultry Science, Center of Excellence for Poultry Science, University of Arkansas, Fayetteville, AR, United States of America
| | - Seong Kang
- Department of Poultry Science, Center of Excellence for Poultry Science, University of Arkansas, Fayetteville, AR, United States of America
| | - Stephanie Shouse
- Department of Poultry Science, Center of Excellence for Poultry Science, University of Arkansas, Fayetteville, AR, United States of America
| | - Nicholas Anthony
- Department of Poultry Science, Center of Excellence for Poultry Science, University of Arkansas, Fayetteville, AR, United States of America
| | - Wayne Kuenzel
- Department of Poultry Science, Center of Excellence for Poultry Science, University of Arkansas, Fayetteville, AR, United States of America
| | - Byungwhi C. Kong
- Department of Poultry Science, Center of Excellence for Poultry Science, University of Arkansas, Fayetteville, AR, United States of America
|
47
|
Genome maps across 26 human populations reveal population-specific patterns of structural variation. Nat Commun 2019; 10:1025. [PMID: 30833565 PMCID: PMC6399254 DOI: 10.1038/s41467-019-08992-7] [Citation(s) in RCA: 94] [Impact Index Per Article: 18.8] [Received: 12/18/2018] [Accepted: 02/12/2019] [Indexed: 01/10/2023] Open
Abstract
Large structural variants (SVs) in the human genome are difficult to detect and study by conventional sequencing technologies. With long-range genome analysis platforms, such as optical mapping, one can identify large SVs (>2 kb) across the genome in one experiment. Analyzing optical genome maps of 154 individuals from the 26 populations sequenced in the 1000 Genomes Project, we find that phylogenetic population patterns of large SVs are similar to those of single nucleotide variations in 86% of the human genome, while ~2% of the genome has high structural complexity. We are able to characterize SVs in many intractable regions of the genome, including segmental duplications and subtelomeric, pericentromeric, and acrocentric areas. In addition, we discover ~60 Mb of non-redundant genome content missing in the reference genome sequence assembly. Our results highlight the need for a comprehensive set of alternate haplotypes from different populations to represent SV patterns in the genome. Large structural variants (SV) are understudied in human genetics research because of the difficulty to detect them in the routinely generated short-read sequencing data. Here, the authors generate optical genome maps of 154 individuals from 26 populations that allow comprehensive examination of large SVs.
|
48
|
Characterization and evolutionary dynamics of complex regions in eukaryotic genomes. SCIENCE CHINA-LIFE SCIENCES 2019; 62:467-488. [PMID: 30810961 DOI: 10.1007/s11427-018-9458-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/01/2018] [Accepted: 11/05/2018] [Indexed: 01/07/2023]
Abstract
Complex regions in eukaryotic genomes are typically characterized by duplications of chromosomal stretches that often include one or more genes repeated in a tandem array or in relatively close proximity. Nevertheless, the repetitive nature of these regions, together with the often high sequence identity among repeats, has made complex regions particularly recalcitrant to proper molecular characterization, and they are often misassembled or completely absent in genome assemblies. This limitation has prevented accurate functional and evolutionary analyses of these regions. This is becoming increasingly relevant as evidence continues to support a central role for complex genomic regions in explaining human disease, developmental innovations, and ecological adaptations across phyla. With the advent of long-read sequencing technologies and suitable assemblers, the development of algorithms that can accommodate sample heterozygosity, and the adoption of a pangenomic-like view of these regions, accurate reconstructions of complex regions are now within reach. These reconstructions will finally allow for accurate functional and evolutionary studies of complex genomic regions, underlying the generation of genotype-phenotype maps of unprecedented resolution.
|
49
|
Wang X, Zhang H, Liu X. Defind: Detecting Genomic Deletions by Integrating Read Depth, GC Content, Mapping Quality and Paired-end Mapping Signatures of Next Generation Sequencing Data. Curr Bioinform 2019. [DOI: 10.2174/1574893613666180703110126] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/29/2023]
Abstract
Background:
Accurate and exhaustive identification of genomic deletion events is the basis for understanding their roles in phenotype variation. Developing effective algorithms to identify deletions using next generation sequencing (NGS) data remains a challenge.
Objective:
The accurate and exhaustive identification of genomic deletion events is important; we present a new approach, Defind, to detect deletions using NGS data from a single sample mapped to the reference genome sequences.
Method:
The operating system is Linux. The programming languages are Perl and R. We present Defind, a new approach for detecting medium- and large-sized deletions, based on simultaneously inspecting the depth of coverage, GC content, mapping quality, and paired-end information of NGS data. We carried out detailed comparisons between Defind and other deletion detection methods using both simulation data and real data.
Results:
In simulation studies, Defind could retrieve more deletions than other methods at low to medium sequencing coverage (e.g., 5 to 10×) with no false positives. Using real data, 94% of deletions commonly detected by at least two other methods were also detected by Defind. In addition, 90% of the deletions detected by Defind using the real data were positively supported by comparative genomic hybridization results, demonstrating the efficiency of Defind.
Conclusion:
Defind performed robustly at different sequence coverages and with different read lengths in the simulation study. Our studies also provide significant practical guidance for selecting appropriate methods to detect genomic deletions using NGS data.
Affiliation(s)
- Xin Wang: College of Life Science, Nanchang University, Nanchang 330031, China
- Huan Zhang: College of Life Science, Nanchang University, Nanchang 330031, China
- Xiaojing Liu: College of Life Science, Nanchang University, Nanchang 330031, China
|
50
|
Gagliano SA, Sengupta S, Sidore C, Maschio A, Cucca F, Schlessinger D, Abecasis GR. Relative impact of indels versus SNPs on complex disease. Genet Epidemiol 2018; 43:112-117. [PMID: 30565766 PMCID: PMC6330128 DOI: 10.1002/gepi.22175] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 06/25/2018] [Revised: 09/07/2018] [Accepted: 10/29/2018] [Indexed: 11/30/2022]
Abstract
It is unclear whether insertions and deletions (indels) are more likely to influence complex traits than abundant single‐nucleotide polymorphisms (SNPs). We sought to understand which category of variation is more likely to impact health. Using the SardiNIA study as an exemplar, we characterized 478,876 common indels and 8,246,244 common SNPs in up to 5,949 well‐phenotyped individuals from an isolated valley in Sardinia. We assessed association between 120 traits, resulting in 89 nonoverlapping‐associated loci. We evaluated whether indels were enriched among credible sets of potential causal variants. These credible sets included 1,319 SNPs and 88 indels. We did not find indels to be significantly enriched. Indels were the most likely causal variant in seven loci, including one locus associated with monocyte count where an indel with causality and mechanism previously demonstrated (rs200748895:TGCTG/T) had a 0.999 posterior probability. Overall, our results show a very modest and nonsignificant enrichment for common indels in associated loci.
Affiliation(s)
- Sarah A Gagliano: Center for Statistical Genetics, and Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
- Sebanti Sengupta: Center for Statistical Genetics, and Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
- Carlo Sidore: Istituto di Ricerca Genetica e Biomedica, Consiglio Nazionale delle Ricerche (CNR), Cagliari, Italy
- Andrea Maschio: Istituto di Ricerca Genetica e Biomedica, Consiglio Nazionale delle Ricerche (CNR), Cagliari, Italy
- Francesco Cucca: Istituto di Ricerca Genetica e Biomedica, Consiglio Nazionale delle Ricerche (CNR), Cagliari, Italy; Dipartimento di Scienze Biomediche, Università degli Studi di Sassari, Sassari, Italy
- David Schlessinger: Laboratory of Genetics, National Institute on Aging, US National Institutes of Health, Baltimore, Maryland
- Gonçalo R Abecasis: Center for Statistical Genetics, and Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
|