1
Korenková V, Weisz F, Perglerová A, Cacciò SM, Nohýnková E, Tůmová P. Comprehensive analysis of flavohemoprotein copy number variation in Giardia intestinalis: exploring links to metronidazole resistance. Parasit Vectors 2024; 17:336. [PMID: 39127700 DOI: 10.1186/s13071-024-06392-5] [Received: 02/20/2024] [Accepted: 07/03/2024] [Indexed: 08/12/2024]
Abstract
BACKGROUND: Giardiasis, caused by the protozoan parasite Giardia intestinalis, often presents a treatment challenge, particularly in terms of resistance to metronidazole. Despite extensive research, markers for metronidazole resistance have not yet been identified. METHODS: This study analysed 28 clinical samples of G. intestinalis from sub-assemblage AII, characterised by varying responses to metronidazole treatment. We focussed on copy number variation (CNV) of the multi-copy flavohemoprotein gene, analysed using digital polymerase chain reaction (dPCR) and next-generation sequencing (NGS). Additionally, chromosomal ploidy was tested in 18 of these samples. Flavohemoprotein CNV was also assessed in 17 samples from other sub-assemblages. RESULTS: Analyses revealed variable CNVs of the flavohemoprotein gene among the isolates, with no correlation to clinical metronidazole resistance. Discrepancies in CNVs detected from NGS data were attributed to biases linked to whole-genome amplification. However, dPCR helped to clarify these discrepancies by providing more consistent CNV data. Significant differences in flavohemoprotein CNVs were observed across different G. intestinalis sub-assemblages. Notably, Giardia exhibits a propensity for aneuploidy, contributing to genomic variability within and between sub-assemblages. CONCLUSIONS: Clinical metronidazole resistance in Giardia is complex and is influenced by multiple genetic factors, including CNVs and aneuploidy. No significant differences in the CNV of the flavohemoprotein gene between isolates from metronidazole-resistant and metronidazole-sensitive cases of giardiasis were found, underscoring the need for further research to identify reliable genetic markers for resistance. We demonstrate that dPCR and NGS are robust methods for analysing CNVs and provide cross-validating results, highlighting their utility in the genetic analyses of this parasite.
Affiliation(s)
- Vlasta Korenková
  - Institute of Immunology and Microbiology, 1st Faculty of Medicine, Charles University, Prague, Czech Republic
- Filip Weisz
  - Institute of Immunology and Microbiology, 1st Faculty of Medicine, Charles University, Prague, Czech Republic
- Aneta Perglerová
  - Institute of Immunology and Microbiology, 1st Faculty of Medicine, Charles University, Prague, Czech Republic
- Simone M Cacciò
  - Department of Infectious Diseases, Istituto Superiore Di Sanita, Rome, Italy
- Eva Nohýnková
  - Institute of Immunology and Microbiology, 1st Faculty of Medicine, Charles University, Prague, Czech Republic
- Pavla Tůmová
  - Institute of Immunology and Microbiology, 1st Faculty of Medicine, Charles University, Prague, Czech Republic
2
Yuan T, Dong J, Jia B, Jiang H, Zhao Z, Zhou M. DTDHM: detection of tandem duplications based on hybrid methods using next-generation sequencing data. PeerJ 2024; 12:e17748. [PMID: 39076774 PMCID: PMC11285389 DOI: 10.7717/peerj.17748] [Received: 01/19/2024] [Accepted: 06/24/2024] [Indexed: 07/31/2024]
Abstract
Background: Tandem duplication (TD) is a common and important type of structural variation in the human genome. TDs have been shown to play an essential role in many diseases, including cancer. However, it is difficult to accurately detect TDs due to the uneven distribution of reads and the inherent complexity of next-generation sequencing (NGS) data. Methods: This article proposes a method called DTDHM (detection of tandem duplications based on hybrid methods), which utilizes NGS data to detect TDs in a single sample. DTDHM builds a pipeline that integrates read depth (RD), split read (SR), and paired-end mapping (PEM) signals. To solve the problem of uneven distribution of normal and abnormal samples, DTDHM uses the K-nearest neighbor (KNN) algorithm for multi-feature classification prediction. Then, the qualified split reads and discordant reads are extracted and analyzed to achieve accurate localization of variation sites. This article compares DTDHM with three other methods on 450 simulated datasets and five real datasets. Results: On the 450 simulated datasets, DTDHM consistently maintained the highest F1-score. The average F1-scores of DTDHM, SVIM, TARDIS, and TIDDIT were 80.0%, 56.2%, 43.4%, and 67.1%, respectively. The F1-score of DTDHM varied within a small range, making its detection performance the most stable, and its average was 1.2 times that of the second-best method. Most of the boundary biases of DTDHM fluctuated around 20 bp, and its boundary accuracy was better than that of TARDIS and TIDDIT. In the real-data experiments, five sequencing samples (NA19238, NA19239, NA19240, HG00266, and NA12891) were used to test DTDHM. The results showed that DTDHM had the highest overlap density score (ODS) and F1-score of the four methods. Conclusions: Compared with the other three methods, DTDHM achieved excellent results in terms of sensitivity, precision, F1-score, and boundary bias. These results indicate that DTDHM can be used as a reliable tool for detecting TDs from NGS data, especially in samples with low coverage depth and low tumor purity.
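The "1.2 times" comparison in this abstract can be checked from the four average F1-scores it reports. The sketch below is only that arithmetic check; the helper name `best_and_runner_up` is an illustrative assumption and is not part of DTDHM.

```python
# Average F1-scores (percent) as reported in the abstract above.
f1_scores = {"DTDHM": 80.0, "SVIM": 56.2, "TARDIS": 43.4, "TIDDIT": 67.1}


def best_and_runner_up(scores):
    """Return the two (name, score) pairs with the highest scores, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[0], ranked[1]


(best_name, best_f1), (second_name, second_f1) = best_and_runner_up(f1_scores)

# Ratio of the best average F1-score to the second-best one.
ratio = best_f1 / second_f1
print(f"{best_name} vs {second_name}: {ratio:.2f}x")
```

Under these numbers the runner-up is TIDDIT, and the ratio rounds to the 1.2 quoted in the abstract.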
Affiliation(s)
- Tianting Yuan
  - School of Computer Science and Technology, Liaocheng University, Liaocheng, China
- Jinxin Dong
  - School of Computer Science and Technology, Liaocheng University, Liaocheng, China
- Baoxian Jia
  - School of Computer Science and Technology, Liaocheng University, Liaocheng, China
- Hua Jiang
  - School of Computer Science and Technology, Liaocheng University, Liaocheng, China
- Zuyao Zhao
  - Orthopedics Department, Liaocheng People’s Hospital, Liaocheng, China
- Mengjiao Zhou
  - School of Computer Science and Technology, Liaocheng University, Liaocheng, China
3
Sinha R, Pal RK, De RK. A novel method addressing NGS-based mappability bias for sensitive detection of DNA alterations. J Bioinform Comput Biol 2024; 22:2450009. [PMID: 39030667 DOI: 10.1142/s0219720024500094] [Indexed: 07/21/2024]
Abstract
A turning point in cancer research was the introduction of massively parallel sequencing technology, which greatly reduced the cost and time of genome sequencing. This widened the scope for detecting and analyzing the role of structural alterations in cancer. However, NGS-based approaches carry certain biases that adversely affect the CNV identification process. Moreover, DNA repeats in CNV regions need special attention, as they degrade the performance of the majority of existing CNV detection tools even after a generalized bias correction method has been applied. This motivated the present work, in which a novel method has been designed to address DNA repeats and the resulting mappability bias in CNV regions. The method consists of three phases. The first phase computes the alignment information of uniquely mapped DNA reads, considering the base quality and base mismatch parameters at nucleotide-level precision. The second and third phases use a novel approach to allocate the non-uniquely mapped reads to an optimal region of the DNA repeats based on a probabilistic membership model. The proposed method can identify CNVs in both coding and non-coding regions of the DNA, and can also detect CNVs in DNA repeat regions. The methodology achieves a sensitivity greater than [Formula: see text] in the simulations performed. On real data, the detected variants are validated against the Database of Genomic Variants, where the percentage overlap is also greater than 95%, and breakpoint prediction is much better than that of other popular bias-correcting CNV detection methods.
Affiliation(s)
- Rituparna Sinha
  - Information Technology, Heritage Institute of Technology, Anandapur Kolkata, West Bengal, India
- Rajat Kumar Pal
  - Computer Science and Engineering Department, University of Calcutta, Kolkata, India
- Rajat Kumar De
  - Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
4
Duan J, Zhao X, Wu X. LoRA-TV: read depth profile-based clustering of tumor cells in single-cell sequencing. Brief Bioinform 2024; 25:bbae277. [PMID: 38877886 PMCID: PMC11179121 DOI: 10.1093/bib/bbae277] [Received: 04/01/2024] [Revised: 05/17/2024] [Accepted: 05/29/2024] [Indexed: 06/18/2024]
Abstract
Single-cell sequencing has revolutionized our ability to dissect the heterogeneity within tumor populations. In this study, we present LoRA-TV (Low Rank Approximation with Total Variation), a novel method for clustering tumor cells based on the read depth profiles derived from single-cell sequencing data. Traditional analysis pipelines process read depth profiles of each cell individually. By aggregating shared genomic signatures distributed among individual cells using low-rank optimization and robust smoothing, the proposed method enhances clustering performance. Results from analyses of both simulated and real data demonstrate its effectiveness compared with state-of-the-art alternatives, as supported by improvements in the adjusted Rand index and computational efficiency.
Affiliation(s)
- Junbo Duan
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education and Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Xinrui Zhao
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education and Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Xiaoming Wu
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education and Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
5
Alfayyadh MM, Maksemous N, Sutherland HG, Lea RA, Griffiths LR. Unravelling the Genetic Landscape of Hemiplegic Migraine: Exploring Innovative Strategies and Emerging Approaches. Genes (Basel) 2024; 15:443. [PMID: 38674378 PMCID: PMC11049430 DOI: 10.3390/genes15040443] [Received: 03/12/2024] [Accepted: 03/25/2024] [Indexed: 04/28/2024]
Abstract
Migraine is a severe, debilitating neurovascular disorder. Hemiplegic migraine (HM) is a rare and debilitating neurological condition with a strong genetic basis. Sequencing technologies have improved the diagnosis and our understanding of the molecular pathophysiology of HM. Linkage analysis and sequencing studies in HM families have identified pathogenic variants in ion channels and related genes, including CACNA1A, ATP1A2, and SCN1A, that cause HM. However, approximately 75% of HM patients are negative for these mutations, indicating there are other genes involved in disease causation. In this review, we explored our current understanding of the genetics of HM. The evidence presented herein summarises the current knowledge of the genetics of HM, which can be expanded further to explain the remaining heritability of this debilitating condition. Innovative bioinformatics and computational strategies to cover the entire genetic spectrum of HM are also discussed in this review.
Affiliation(s)
- Lyn R. Griffiths
  - Centre for Genomics and Personalised Health, Genomics Research Centre, School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, QLD 4059, Australia; (M.M.A.); (N.M.); (H.G.S.); (R.A.L.)
6
Chen Z, Ain NU, Zhao Q, Zhang X. From tradition to innovation: conventional and deep learning frameworks in genome annotation. Brief Bioinform 2024; 25:bbae138. [PMID: 38581418 PMCID: PMC10998533 DOI: 10.1093/bib/bbae138] [Received: 12/01/2023] [Revised: 03/08/2024] [Accepted: 03/10/2024] [Indexed: 04/08/2024]
Abstract
Following the milestone success of the Human Genome Project, the 'Encyclopedia of DNA Elements (ENCODE)' initiative was launched in 2003 to unearth information about the numerous functional elements within the genome. This endeavor coincided with the emergence of numerous novel technologies, accompanied by the provision of vast amounts of whole-genome sequences and high-throughput data such as ChIP-Seq and RNA-Seq. Extracting biologically meaningful information from this massive dataset has become a critical aspect of many recent studies, particularly in annotating and predicting the functions of unknown genes. The core idea behind genome annotation is to identify genes and various functional elements within the genome sequence and infer their biological functions. Traditional wet-lab experimental methods still rely on extensive efforts for functional verification. However, early bioinformatics algorithms and software primarily employed shallow learning techniques; thus, their capacity for data characterization and feature learning was limited. With the widespread adoption of RNA-Seq technology, scientists from the biological community began to harness the potential of machine learning and deep learning approaches for gene structure prediction and functional annotation. In this context, we reviewed both conventional methods and contemporary deep learning frameworks, and highlighted novel perspectives on the challenges arising during annotation, underscoring the dynamic nature of this evolving scientific landscape.
Affiliation(s)
- Zhaojia Chen
  - National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
  - College of Biomedical Engineering, Taiyuan University of Technology, Jinzhong 030600, China
- Noor ul Ain
  - National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
- Qian Zhao
  - State Key Laboratory for Ecological Pest Control of Fujian/Taiwan Crops and College of Life Science, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
- Xingtan Zhang
  - National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
7
Helal AA, Saad BT, Saad MT, Mosaad GS, Aboshanab KM. Benchmarking long-read aligners and SV callers for structural variation detection in Oxford nanopore sequencing data. Sci Rep 2024; 14:6160. [PMID: 38486064 PMCID: PMC10940726 DOI: 10.1038/s41598-024-56604-2] [Received: 11/02/2023] [Accepted: 03/08/2024] [Indexed: 03/18/2024]
Abstract
Structural variants (SVs) are one of the significant types of DNA mutations and are typically defined as genomic alterations larger than 50 bp, including insertions, deletions, duplications, inversions, and translocations. These modifications can profoundly impact phenotypic characteristics and contribute to disorders like cancer, response to treatment, and infections. Four long-read aligners and five SV callers have been evaluated using three Oxford Nanopore NGS human genome datasets in terms of precision, recall, and F1-score, as well as depth of coverage and speed of analysis. The best SV caller regarding recall, precision, and F1-score, when paired with different aligners at different coverage levels, tends to vary depending on the dataset and the specific SV types being analyzed. However, based on our findings, Sniffles and CuteSV tend to perform well across different aligners and coverage levels, followed by SVIM and PBSV, with SVDSS in last place. The CuteSV caller has the highest average F1-score (82.51%) and recall (78.50%), and Sniffles has the highest average precision (94.33%). Minimap2 as an aligner and Sniffles as an SV caller form a strong base for an SV-calling pipeline because of their high speed and reasonable performance. PBSV has a lower average F1-score, precision, and recall and may generate more false positives and overlook some actual SVs. Our results provide insight into the performance of several popular long-read aligners and SV callers and serve as a reference for researchers in selecting the most suitable tools for SV detection.
Affiliation(s)
- Asmaa A Helal
  - Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt
- Bishoy T Saad
  - Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt
- Mina T Saad
  - Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt
- Gamal S Mosaad
  - Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt
- Khaled M Aboshanab
  - Department of Microbiology and Immunology, Faculty of Pharmacy, Ain Shams University, Organization of African Unity St., Abassi, Cairo, 11566, Egypt
8
Xie K, Ge X, Alvi HAK, Liu K, Song J, Yu Q. OTSUCNV: an adaptive segmentation and OTSU-based anomaly classification method for CNV detection using NGS data. BMC Genomics 2024; 25:126. [PMID: 38291375 PMCID: PMC10826217 DOI: 10.1186/s12864-024-10018-6] [Received: 10/12/2023] [Accepted: 01/15/2024] [Indexed: 02/01/2024]
Abstract
Copy-number variations (CNVs), which refer to deletions and duplications of chromosomal segments, represent a significant source of variation among individuals, contributing to human evolution and being implicated in various diseases ranging from mental illness and developmental disorders to cancer. Despite the development of several methods for detecting copy number variations based on next-generation sequencing (NGS) data, achieving robust detection performance for CNVs with arbitrary coverage and amplitude remains challenging due to the inherent complexity of sequencing samples. In this paper, we propose an alternative method called OTSUCNV for CNV detection on whole genome sequencing (WGS) data. This method utilizes a newly designed adaptive sequence segmentation algorithm and an OTSU-based CNV prediction algorithm, which does not rely on any distribution assumptions or involve complex outlier factor calculations. As a result, the effective detection of CNVs is achieved with lower computational complexity. The experimental results indicate that the proposed method demonstrates outstanding performance, and hence it may be used as an effective tool for CNV detection.
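For readers unfamiliar with the OTSU criterion named in this abstract, the following is a minimal sketch of generic one-dimensional Otsu thresholding, which picks the cut that maximizes between-class variance without assuming any distribution. It is not the OTSUCNV implementation: the function name `otsu_threshold`, the toy per-segment read depths, and the use of absolute deviation from the median as the anomaly score are all assumptions made for illustration.

```python
import statistics


def otsu_threshold(values):
    """Return the cut point that maximizes between-class variance (Otsu's criterion)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    best_threshold, best_variance = xs[0], -1.0
    left_sum = 0.0
    for i in range(1, n):  # candidate split: xs[:i] | xs[i:]
        left_sum += xs[i - 1]
        if xs[i] == xs[i - 1]:
            continue  # no valid cut between equal values
        w0 = i / n                          # weight of the lower class
        w1 = 1.0 - w0                       # weight of the upper class
        mu0 = left_sum / i                  # mean of the lower class
        mu1 = (total - left_sum) / (n - i)  # mean of the upper class
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_variance:
            best_variance = between
            best_threshold = (xs[i - 1] + xs[i]) / 2
    return best_threshold


# Hypothetical mean read depth per segment; the last three mimic a duplication.
segment_depths = [30, 31, 29, 30, 32, 28, 60, 61, 59]
baseline = statistics.median(segment_depths)
deviations = [abs(d - baseline) for d in segment_depths]

threshold = otsu_threshold(deviations)
anomalous = [i for i, dev in enumerate(deviations) if dev > threshold]
print(threshold, anomalous)
```

In this toy input the threshold falls between the small deviations of the normal segments and the large deviations of the last three segments, so only those three are flagged.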
Affiliation(s)
- Kun Xie
  - School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
- Xiaojun Ge
  - School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
- Haque A K Alvi
  - School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
- Kang Liu
  - School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
- Jianfeng Song
  - School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
- Qiang Yu
  - School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
  - Hangzhou Institute of Technology, Xidian University, Hangzhou, 311200, China
9
Zhang Y, Liu W, Duan J. On the core segmentation algorithms of copy number variation detection tools. Brief Bioinform 2024; 25:bbae022. [PMID: 38340093 PMCID: PMC10858679 DOI: 10.1093/bib/bbae022] [Received: 07/17/2023] [Revised: 10/26/2023] [Indexed: 02/12/2024]
Abstract
Shotgun sequencing is a high-throughput method used to detect copy number variants (CNVs). Although there are numerous CNV detection tools based on shotgun sequencing, their quality varies significantly, leading to performance discrepancies. Therefore, we conducted a comprehensive analysis of next-generation sequencing-based CNV detection tools from the past decade. Our findings revealed that the majority of mainstream tools employ a similar detection rationale: they calculate the so-called read depth signal from aligned sequencing reads and then segment the signal using either circular binary segmentation (CBS) or a hidden Markov model (HMM). Hence, we compared the performance of these two core segmentation algorithms in CNV detection, considering varying sequencing depths, segment lengths and complex types of CNVs. To ensure a fair comparison, we designed a parametric model using mainstream statistical distributions, which makes it possible to exclude bias corrections, such as that for guanine-cytosine (GC) content, from the preprocessing step. The results indicate the following key points: (1) Under ideal conditions, CBS demonstrates high precision, while HMM exhibits a high recall rate. (2) Under practical conditions, HMM is advantageous at lower sequencing depths, while CBS is more competitive than HMM in detecting small variant segments. (3) In cases involving complex CNVs resembling real sequencing, HMM is more robust than CBS. (4) When facing large-scale sequencing data, HMM takes less time than CBS, while their memory usage is approximately equal. These findings can provide important guidance and reference for researchers developing new tools for CNV detection.
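The detection rationale described in this abstract (compute a read depth signal, then segment it) can be illustrated with a deliberately simplified sketch. The code below bins read start positions into a depth signal and finds a single best change point by a scaled difference of means. It is neither CBS (there is no circular search or permutation test) nor an HMM, and the function names, bin size, and toy read positions are assumptions for illustration only.

```python
def read_depth(read_starts, genome_length, bin_size):
    """Count read start positions per fixed-size bin (a simple read depth signal)."""
    n_bins = (genome_length + bin_size - 1) // bin_size
    depth = [0] * n_bins
    for pos in read_starts:  # assumes 0 <= pos < genome_length
        depth[pos // bin_size] += 1
    return depth


def best_change_point(signal):
    """Return k so that signal[:k] | signal[k:] maximizes the scaled mean difference."""
    n = len(signal)
    total = sum(signal)
    best_k, best_stat = None, 0.0
    left = 0.0
    for k in range(1, n):
        left += signal[k - 1]
        mean_left = left / k
        mean_right = (total - left) / (n - k)
        # Weight the mean difference by segment sizes, as in binary segmentation.
        stat = abs(mean_left - mean_right) * (k * (n - k) / n) ** 0.5
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k


# Toy data: two reads per 100 bp bin in the first half, four per bin in the second.
read_starts = [10, 50, 110, 150, 210, 250,
               310, 320, 330, 340, 410, 420, 430, 440, 510, 520, 530, 540]
depth = read_depth(read_starts, genome_length=600, bin_size=100)
k = best_change_point(depth)
print(depth, k)
```

A full segmenter would apply such a split recursively with a significance test (the CBS family) or decode hidden copy-number states (the HMM family); this sketch shows only the shared first step and one split.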
Affiliation(s)
- Yibo Zhang
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education and Department of Biomedical Engineering, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, China
- Wenyu Liu
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education and Department of Biomedical Engineering, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, China
- Junbo Duan
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education and Department of Biomedical Engineering, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, China
10
Mandiracioglu B, Ozden F, Kaynar G, Yilmaz MA, Alkan C, Cicek AE. ECOLE: Learning to call copy number variants on whole exome sequencing data. Nat Commun 2024; 15:132. [PMID: 38167256 PMCID: PMC10762021 DOI: 10.1038/s41467-023-44116-y] [Received: 08/07/2023] [Accepted: 11/30/2023] [Indexed: 01/05/2024]
Abstract
Copy number variants (CNVs) have been shown to contribute to the etiology of several genetic disorders. Accurate detection of CNVs on whole exome sequencing (WES) data has been a long sought-after goal for use in clinics. This was not possible despite recent improvements in performance because algorithms mostly suffer from low precision and even lower recall on expert-curated gold standard call sets. Here, we present a deep learning-based somatic and germline CNV caller for WES data, named ECOLE. Based on a variant of the transformer architecture, the model learns to call CNVs per exon, using high-confidence calls made on matched WGS samples. We further train and fine-tune the model with a small set of expert calls via transfer learning. We show that ECOLE achieves high performance on human expert labelled data for the first time with 68.7% precision and 49.6% recall. This corresponds to precision and recall improvements of 18.7% and 30.8% over the next best-performing methods, respectively. We also show that the same fine-tuning strategy using tumor samples enables ECOLE to detect RT-qPCR-validated variations in bladder cancer samples without the need for a control sample. ECOLE is available at https://github.com/ciceklab/ECOLE.
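The abstract reports precision and recall separately. As a hedged worked example, the harmonic-mean F1-score implied by those two figures can be derived as follows; the F1 value is computed here and is not a number stated in the abstract, and the helper `f1_score` is a plain illustrative function rather than part of ECOLE.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall on expert-labelled data, as reported in the abstract above.
precision = 0.687
recall = 0.496

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.3f}")
```

Because F1 is a harmonic mean, it sits closer to the lower of the two inputs (recall, in this case) than an arithmetic average would.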
Affiliation(s)
- Berk Mandiracioglu
  - Department of Computer and Communication Sciences, EPFL, Lausanne, Switzerland
- Furkan Ozden
  - Department of Computer Science, Oxford University, Oxford, UK
- Gun Kaynar
  - Department of Computer Engineering, Bilkent University, Ankara, Turkey
- Can Alkan
  - Department of Computer Engineering, Bilkent University, Ankara, Turkey
- A Ercument Cicek
  - Department of Computer Engineering, Bilkent University, Ankara, Turkey
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, US
11
Contreras-Garrido A, Galanti D, Movilli A, Becker C, Bossdorf O, Drost HG, Weigel D. Transposon dynamics in the emerging oilseed crop Thlaspi arvense. PLoS Genet 2024; 20:e1011141. [PMID: 38295109 PMCID: PMC10881000 DOI: 10.1371/journal.pgen.1011141] [Received: 08/09/2023] [Revised: 02/21/2024] [Accepted: 01/17/2024] [Indexed: 02/02/2024]
Abstract
Genome evolution is partly driven by the mobility of transposable elements (TEs) which often leads to deleterious effects, but their activity can also facilitate genetic novelty and catalyze local adaptation. We explored how the intraspecific diversity of TE polymorphisms might contribute to the broad geographic success and adaptive capacity of the emerging oil crop Thlaspi arvense (field pennycress). We classified the TE inventory based on a high-quality genome assembly, estimated the age of retrotransposon TE families and comprehensively assessed their mobilization potential. A survey of 280 accessions from 12 regions across the Northern hemisphere allowed us to quantify over 90,000 TE insertion polymorphisms (TIPs). Their distribution mirrored the genetic differentiation as measured by single nucleotide polymorphisms (SNPs). The number and types of mobile TE families vary substantially across populations, but there are also shared patterns common to all accessions. Ty3/Athila elements are the main drivers of TE diversity in T. arvense populations, while a single Ty1/Alesia lineage might be particularly important for transcriptome divergence. The number of retrotransposon TIPs is associated with variation at genes related to epigenetic regulation, including an apparent knockout mutation in BROMODOMAIN AND ATPase DOMAIN-CONTAINING PROTEIN 1 (BRAT1), while DNA transposons are associated with variation at the HSP19 heat shock protein gene. We propose that the high rate of mobilization activity can be harnessed for targeted gene expression diversification, which may ultimately present a toolbox for the potential use of transposition in breeding and domestication of T. arvense.
Affiliation(s)
- Dario Galanti
  - Plant Evolutionary Ecology, University of Tübingen, Tübingen, Germany
- Andrea Movilli
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
- Claude Becker
  - LMU Biocenter, Faculty of Biology, Ludwig Maximilians University Munich, Martinsried, Germany
- Oliver Bossdorf
  - Plant Evolutionary Ecology, University of Tübingen, Tübingen, Germany
- Hajk-Georg Drost
  - Computational Biology Group, Max Planck Institute for Biology Tübingen, Tübingen, Germany
- Detlef Weigel
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
12
Sinha R, Pal RK, De RK. ENLIGHTENMENT: A Scalable Annotated Database of Genomics and NGS-Based Nucleotide Level Profiles. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:155-168. [PMID: 38055361 DOI: 10.1109/tcbb.2023.3340067] [Indexed: 12/08/2023]
Abstract
The revolution in sequencing technologies has enabled human genomes to be sequenced at very low cost and in little time, leading to exponential growth in the availability of whole-genome sequences. However, a complete understanding of our genome and its association with cancer is still a long way off. Researchers are striving to detect new variants and find their association with diseases, which in turn creates the need to aggregate this Big Data into a common, standard, scalable platform. In this work, a database named Enlightenment has been implemented, which makes genomic data integrated from eight public databases, together with DNA sequencing profiles of H. sapiens, available on a single platform. Annotated results with respect to cancer-specific biomarkers, pharmacogenetic biomarkers and their association with variability in drug response, and DNA profiles along with novel copy number variants are computed and stored, and are accessible through a web interface. To overcome the challenge of storing and processing NGS-based whole-genome DNA sequences, Enlightenment has been extended and deployed to HBase, a flexible and horizontally scalable database distributed over a Hadoop cluster, which would enable the integration of other omics data into the database, lighting the path towards the eradication of cancer.
13
Oketch DJA, Giulietti M, Piva F. Copy Number Variations in Pancreatic Cancer: From Biological Significance to Clinical Utility. Int J Mol Sci 2023; 25:391. [PMID: 38203561 PMCID: PMC10779192 DOI: 10.3390/ijms25010391] [Received: 11/24/2023] [Revised: 12/20/2023] [Accepted: 12/24/2023] [Indexed: 01/12/2024]
Abstract
Pancreatic ductal adenocarcinoma (PDAC) is the most common type of pancreatic cancer, characterized by high tumor heterogeneity and a poor prognosis. Inter- and intra-tumoral heterogeneity in PDAC is a major obstacle to effective PDAC treatment; therefore, it is highly desirable to explore the tumor heterogeneity and underlying mechanisms for the improvement of PDAC prognosis. Gene copy number variations (CNVs) are increasingly recognized as a common and heritable source of inter-individual variation in genomic sequence. In this review, we outline the origin, main characteristics, and pathological aspects of CNVs. We then describe the occurrence of CNVs in PDAC, including those that have been clearly shown to have a pathogenic role, and further highlight some key examples of their involvement in tumor development and progression. The ability to efficiently identify and analyze CNVs in tumor samples is important to support translational research and foster precision oncology, as copy number variants can be utilized to guide clinical decisions. We provide insights into understanding the CNV landscapes and the role of both somatic and germline CNVs in PDAC, which could lead to significant advances in diagnosis, prognosis, and treatment. Although there has been significant progress in this field, understanding the full contribution of CNVs to the genetic basis of PDAC will require further research, with more accurate CNV assays such as single-cell techniques and larger cohorts than have been performed to date.
Affiliation(s)
- Matteo Giulietti
  - Department of Specialistic Clinical and Odontostomatological Sciences, Polytechnic University of Marche, 60131 Ancona, Italy
- Francesco Piva
  - Department of Specialistic Clinical and Odontostomatological Sciences, Polytechnic University of Marche, 60131 Ancona, Italy
|
14
|
Louw N, Carstens N, Lombard Z. Incorporating CNV analysis improves the yield of exome sequencing for rare monogenic disorders-an important consideration for resource-constrained settings. Front Genet 2023; 14:1277784. [PMID: 38155715 PMCID: PMC10753787 DOI: 10.3389/fgene.2023.1277784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 11/22/2023] [Indexed: 12/30/2023] Open
Abstract
Exome sequencing (ES) is a recommended first-tier diagnostic test for many rare monogenic diseases. It allows for the detection of both single-nucleotide variants (SNVs) and copy number variants (CNVs) in coding exonic regions of the genome in a single test, and this dual analysis is a valuable approach, especially in limited-resource settings. Single-nucleotide variants are well studied; however, the incorporation of copy number variant analysis tools into variant calling pipelines has not yet been implemented in routine diagnostic testing, and chromosomal microarray is still more widely used to detect copy number variants. Research shows that combined single-nucleotide and copy number variant analysis can lead to a diagnostic yield of up to 58%, increasing the yield by as much as 18% over the single-nucleotide-variant-only pipeline. Importantly, this is achieved at the expense of computational costs only, without incurring any additional sequencing costs. This mini review provides an overview of copy number variant analysis from exome data and of the current recommendations for this type of analysis. We also present an overview of standard practices in rare monogenic disease research in resource-limited settings. We present evidence that integrating copy number variant detection tools into a standard exome sequencing analysis pipeline improves diagnostic yield and should be considered a significantly beneficial addition, with relatively low cost implications. Routine implementation in underrepresented populations and limited-resource settings will promote generation and sharing of CNV datasets and provide momentum to build core centers for this niche within genomic medicine.
Affiliation(s)
- Nadja Louw
  - Division of Human Genetics, National Health Laboratory Service and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Nadia Carstens
  - Division of Human Genetics, National Health Laboratory Service and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
  - Genomics Platform, South African Medical Research Council, Cape Town, South Africa
- Zané Lombard
  - Division of Human Genetics, National Health Laboratory Service and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
|
15
|
Li C, Fan S, Zhao H, Liu X. CNV-FB: A Feature bagging strategy-based approach to detect copy number variants from NGS data. J Bioinform Comput Biol 2023; 21:2350026. [PMID: 38212874 DOI: 10.1142/s0219720023500269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2024]
Abstract
Copy number variation (CNV) accounts for a large proportion of genomic structural variation and is related to the pathogenesis of, and susceptibility to, some human diseases, playing an important role in their development and progression. The development of next-generation sequencing (NGS) technology provides strong support for the design of CNV detection algorithms. Although a large number of methods have been developed to detect CNVs using NGS data, detecting CNVs in samples with low purity and low coverage is still considered a difficult problem. In this paper, a new computational method, CNV-FB, is proposed to detect CNVs from NGS data. The core idea of CNV-FB is to randomly sample the read depth values of the genome fragments, run outlier detection on each sample individually, and finally combine the results into a final outlier score. The CNV-FB method was applied to simulated and real data and compared with five other methods of the same type. The results show that the CNV-FB method detects CNVs better than the other methods. Therefore, the CNV-FB method may be an effective algorithm for detecting genomic mutations.
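The sampling-and-combining idea described in this abstract can be illustrated with a minimal Python sketch. It is not the CNV-FB implementation: the base detector (a median/MAD deviation score), the function name `bagged_outlier_scores`, and the parameters `n_rounds` and `sample_frac` are assumptions chosen only for illustration.

```python
import random
import statistics


def bagged_outlier_scores(read_depths, n_rounds=25, sample_frac=0.5, seed=0):
    """Average a simple outlier score over random subsamples of the bins.

    Hypothetical base detector: absolute deviation from the subsample
    median, scaled by the subsample MAD (median absolute deviation).
    """
    rng = random.Random(seed)
    n = len(read_depths)
    k = max(2, int(n * sample_frac))
    totals = [0.0] * n
    for _ in range(n_rounds):
        # Draw one random subsample of read-depth values.
        sample = rng.sample(read_depths, k)
        centre = statistics.median(sample)
        mad = statistics.median(abs(x - centre) for x in sample)
        scale = mad if mad > 0 else 1.0  # guard against a zero MAD
        # Score every bin against this subsample's centre and spread.
        for i, depth in enumerate(read_depths):
            totals[i] += abs(depth - centre) / scale
    # Combine the per-round scores into one final score per bin.
    return [t / n_rounds for t in totals]


# Toy read-depth profile: bins 6 and 7 mimic a duplication.
depths = [100, 98, 103, 101, 99, 100, 210, 205, 102, 97, 100, 101]
scores = bagged_outlier_scores(depths)
for i, (d, s) in enumerate(zip(depths, scores)):
    print(i, d, round(s, 2))
```

In this toy profile the two elevated bins receive the highest averaged scores. Real read-depth data would typically need bias correction (for example for GC content) before any such scoring.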
Affiliation(s)
- Chengyou Li
  - School of Computer Science, Liaocheng University, Liaocheng 252000, P. R. China
- Shiqiang Fan
  - School of Computer Science, Liaocheng University, Liaocheng 252000, P. R. China
- Haiyong Zhao
  - School of Computer Science, Liaocheng University, Liaocheng 252000, P. R. China
- Xiaotong Liu
  - School of Agronomy and Agricultural Engineering, Liaocheng University, Liaocheng 252000, P. R. China
|
16
|
Ahmad SF, Chandrababu Shailaja C, Vaishnav S, Kumar A, Gaur GK, Janga SC, Ahmad SM, Malla WA, Dutt T. Read-depth based approach on whole genome resequencing data reveals important insights into the copy number variation (CNV) map of major global buffalo breeds. BMC Genomics 2023; 24:616. [PMID: 37845620 PMCID: PMC10580622 DOI: 10.1186/s12864-023-09720-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 10/05/2023] [Indexed: 10/18/2023] Open
Abstract
BACKGROUND Elucidating genome-wide structural variants, including copy number variations (CNVs), has gained increased significance in recent times owing to their contribution to genetic diversity and association with important pathophysiological states. The present study aimed to elucidate the high-resolution CNV map of six different global buffalo breeds using whole genome resequencing data at two coverages (10X and 30X). Post-quality control, the sequence reads were aligned to the latest draft release of the Bubaline genome. The genome-wide CNVs were elucidated using a read-depth approach in CNVnator with different bin sizes. Adjacent CNVs were concatenated into copy number variation regions (CNVRs) in different breeds and their genomic coverage was elucidated. RESULTS Overall, the average size of CNVR was lower at 30X coverage, providing finer details. Most of the CNVRs were of either deletion or duplication type, while mixed events were comparatively fewer in all breeds. The average CNVR size was lower at 30X coverage (0.013 Mb) as compared to 10X (0.201 Mb), with the finest variants in Banni buffaloes. The maximum number of CNVs was observed in Murrah (2627) and Pandharpuri (25,688) at 10X and 30X coverages, respectively, whereas the minimum number of CNVs was scored in Surti at both coverages (2092 and 17,373). On the other hand, the highest and lowest numbers of CNVRs were scored in Jaffarabadi (833 and 10,179 events) and Surti (783 and 7553 events) at both coverages. Deletion events outnumbered duplications in all breeds at both coverages. Gene profiling of common overlapped genes and longest CNVRs provided important insights into the evolutionary history of these breeds and indicated the genomic regions under selection in respective breeds. CONCLUSION The present study is the first of its kind to elucidate the high-resolution CNV map in major buffalo populations using a read-depth approach on whole genome resequencing data. The results revealed important insights into the divergence of major global buffalo breeds along the evolutionary timescale.
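The step of concatenating adjacent CNVs into CNVRs can be sketched as an interval merge. The merging rule (overlap, or a gap of at most `max_gap` bp), the tuple layout, and the function name `merge_cnvs_to_cnvrs` are assumptions made for this illustration; the abstract does not state the criteria the authors used.

```python
def merge_cnvs_to_cnvrs(cnvs, max_gap=0):
    """Merge CNV calls into CNV regions (CNVRs).

    cnvs: iterable of (chrom, start, end, kind), where kind is
    'deletion' or 'duplication'. A region containing both kinds
    is labelled 'mixed'.
    """
    regions = []
    for chrom, start, end, kind in sorted(cnvs):
        last = regions[-1] if regions else None
        if (last is not None and last["chrom"] == chrom
                and start <= last["end"] + max_gap):
            # Overlapping or adjacent call: extend the current region.
            last["end"] = max(last["end"], end)
            last["kinds"].add(kind)
        else:
            regions.append(
                {"chrom": chrom, "start": start, "end": end, "kinds": {kind}}
            )
    return [
        (r["chrom"], r["start"], r["end"],
         "mixed" if len(r["kinds"]) > 1 else next(iter(r["kinds"])))
        for r in regions
    ]


calls = [
    ("chr1", 100, 200, "deletion"),
    ("chr1", 150, 300, "duplication"),
    ("chr1", 500, 600, "deletion"),
    ("chr2", 100, 200, "duplication"),
]
cnvrs = merge_cnvs_to_cnvrs(calls)
print(cnvrs)
```

The first two calls overlap and carry different event types, so they collapse into a single mixed CNVR; the remaining calls stay separate.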
Affiliation(s)
- Sheikh Firdous Ahmad
  - Division of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly, Uttar Pradesh, 243122, India
- Celus Chandrababu Shailaja
  - Division of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly, Uttar Pradesh, 243122, India
- Sakshi Vaishnav
  - Division of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly, Uttar Pradesh, 243122, India
- Amit Kumar
  - Division of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly, Uttar Pradesh, 243122, India
- Gyanendra Kumar Gaur
  - Division of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly, Uttar Pradesh, 243122, India
- Sarath Chandra Janga
  - Luddy School of Informatics, Computing & Engineering, Indiana University Indianapolis (IUI), Indianapolis, 46202, USA
- Syed Mudasir Ahmad
  - Division of Animal Biotechnology, Faculty of Veterinary Sciences and AH, Sher-e-Kashmir University of Agricultural Sciences and Technology, Srinagar, Jammu & Kashmir, 190006, India
- Waseem Akram Malla
  - Division of Veterinary Biotechnology, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly, Uttar Pradesh, 243122, India
- Triveni Dutt
  - Division of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly, Uttar Pradesh, 243122, India
|
17
|
Kobayashi K, Kawazu M, Yoshimoto S, Ueno T, Omura G, Saito Y, Ando M, Ryo E, Sakyo A, Yoshida A, Yatabe Y, Mano H, Mori T. Genome Doubling Shapes High-Grade Transformation and Novel EWSR1::LARP4 Fusion Shows SOX10 Immunostaining in Hyalinizing Clear Cell Carcinoma of Salivary Gland. Lab Invest 2023; 103:100213. [PMID: 37479138 DOI: 10.1016/j.labinv.2023.100213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 07/04/2023] [Accepted: 07/13/2023] [Indexed: 07/23/2023] Open
Abstract
Hyalinizing clear cell carcinoma (HCCC) is a rare indolent malignant tumor of minor salivary gland origin with EWSR1::ATF1 rearrangement. Pathologically, the tumor cells possess a clear cytoplasm in a background of hyalinized stroma. Generally, the tumor cells are positive for p63 and p40 and negative for S100 and α-smooth muscle actin, suggesting that they differentiate into squamous epithelium and not into myoepithelium. In this study, we performed a detailed histopathological and genomic analysis of 6 cases of HCCC, including 2 atypical subtypes: 1 case of "high-grade transformation" and 1 "possessing a novel partner gene for EWSR1." We performed a sequential analysis of the primary and recurrent tumor by whole-exome sequencing, RNA sequencing, Sanger sequencing, and fluorescence in situ hybridization to investigate the effect of genomic changes on histopathology and clinical prognosis. A fusion gene involving the EWSR1 gene was detected in all cases. Five cases, including the "high-grade transformation," harbored a known EWSR1::ATF1 fusion gene; however, 1 case harbored a novel EWSR1::LARP4 fusion gene. This novel EWSR1::LARP4-fused HCCC shows SOX10-positive staining, unlike the EWSR1::ATF1-fused HCCC. According to whole-exome sequencing and fluorescence in situ hybridization analysis, "whole-genome doubling" and focal deletions involving CDKN2A, CDKN2B, and PTEN were detected in HCCC with "high-grade transformation." In conclusion, we identified a novel partner gene for EWSR1, LARP4, in indolent HCCC. Importantly, "high-grade transformation" and poor prognosis were caused by whole-genome doubling and subsequent genomic aberrations.
Affiliation(s)
- Kenya Kobayashi
  - Department of Otolaryngology, Head and Neck Surgery, The University of Tokyo, Tokyo, Japan; Department of Head and Neck Surgery, National Cancer Center Hospital, Tokyo, Japan
- Masahito Kawazu
  - Division of Cell Therapy, Chiba Cancer Center, Chiba, Japan; Division of Cell Signaling, National Cancer Center Research Institute, Tokyo, Japan
- Seiichi Yoshimoto
  - Department of Head and Neck Surgery, National Cancer Center Hospital, Tokyo, Japan
- Toshihide Ueno
  - Division of Cell Signaling, National Cancer Center Research Institute, Tokyo, Japan
- Go Omura
  - Department of Head and Neck Surgery, National Cancer Center Hospital, Tokyo, Japan
- Yuki Saito
  - Department of Otolaryngology, Head and Neck Surgery, The University of Tokyo, Tokyo, Japan
- Mizuo Ando
  - Department of Otolaryngology, Head and Neck Surgery, Okayama University Graduate School of Medicine, Okayama, Japan
- Eigitsu Ryo
  - Division of Molecular Pathology, National Cancer Center Research Institute, Tokyo, Japan
- Airi Sakyo
  - Department of Diagnostic Pathology, National Cancer Center Hospital, Tokyo, Japan
- Akihiko Yoshida
  - Division of Molecular Pathology, National Cancer Center Research Institute, Tokyo, Japan; Department of Diagnostic Pathology, National Cancer Center Hospital, Tokyo, Japan
- Yasushi Yatabe
  - Division of Molecular Pathology, National Cancer Center Research Institute, Tokyo, Japan; Department of Diagnostic Pathology, National Cancer Center Hospital, Tokyo, Japan
- Hiroyuki Mano
  - Division of Cell Signaling, National Cancer Center Research Institute, Tokyo, Japan
- Taisuke Mori
  - Division of Molecular Pathology, National Cancer Center Research Institute, Tokyo, Japan; Department of Diagnostic Pathology, National Cancer Center Hospital, Tokyo, Japan
|
18
|
Liu G, Yang H, He Z. Detection of copy number variations based on a local distance using next-generation sequencing data. Front Genet 2023; 14:1147761. [PMID: 37811148 PMCID: PMC10556732 DOI: 10.3389/fgene.2023.1147761] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/01/2023] [Accepted: 09/14/2023] [Indexed: 10/10/2023] Open
Abstract
As one of the main types of structural variation in the human genome, copy number variation (CNV) plays an important role in the occurrence and development of human cancers. Next-generation sequencing (NGS) technology can provide base-level resolution, which provides favorable conditions for the accurate detection of CNVs. However, it is still very challenging to accurately detect CNVs in cancer samples with varying purity and low sequencing coverage. Local distance-based CNV detection (LDCNV), an innovative computational approach to predict CNVs using NGS data, is proposed in this work. LDCNV calculates the average distance between each read depth (RD) and its k nearest neighbors (KNNs) to define the distance of KNNs of each RD, and the average distance between the KNNs for each RD to define their internal distance. Based on these definitions, a local distance score is constructed as the ratio between the distance of KNNs and the internal distance of KNNs for each RD. The local distance scores are used to fit a normal distribution to evaluate the significance level of each RDS, and a hypothesis test is then used to predict CNVs. The performance of the proposed method is verified with simulated and real data and compared with several popular methods. The experimental results show that the proposed method is superior to various other techniques. Therefore, the proposed method can be helpful for cancer diagnosis and targeted drug development.
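The score defined in this abstract (distance to the k nearest neighbours divided by the neighbours' internal distance, followed by a normal fit and a significance test) can be sketched as follows. This is a simplified reading of the abstract rather than the LDCNV code: the one-dimensional absolute-difference distance, the one-sided test, the `eps` guard, and the names `local_distance_scores` and `flag_outliers` are all assumptions.

```python
import statistics
from itertools import combinations
from statistics import NormalDist


def local_distance_scores(read_depths, k=3, eps=1e-9):
    """Ratio of KNN distance to internal KNN distance for each read depth.

    k must be at least 2 so that the neighbours form at least one pair.
    """
    scores = []
    for i, rd in enumerate(read_depths):
        others = [x for j, x in enumerate(read_depths) if j != i]
        # k nearest neighbours in read-depth value (assumed 1-D distance).
        neighbours = sorted(others, key=lambda x: abs(x - rd))[:k]
        knn_dist = statistics.fmean(abs(x - rd) for x in neighbours)
        internal = statistics.fmean(
            abs(a - b) for a, b in combinations(neighbours, 2)
        )
        scores.append(knn_dist / (internal + eps))  # eps avoids division by zero
    return scores


def flag_outliers(scores, alpha=0.05):
    """Fit a normal distribution to the scores and flag the upper tail."""
    dist = NormalDist(statistics.fmean(scores), statistics.stdev(scores))
    return [i for i, s in enumerate(scores) if 1.0 - dist.cdf(s) < alpha]


depths = [100, 101, 99, 102, 98, 100, 180, 101, 99, 100]
scores = local_distance_scores(depths)
flagged = flag_outliers(scores)
print(flagged)
```

One caveat of this simplified form: when all neighbours of a bin share the same depth, the internal distance is zero and the `eps` guard produces a very large score, so a practical version would need a more careful treatment of ties.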
Affiliation(s)
- Guojun Liu
  - School of Mathematics, Xi’an University of Finance and Economics, Xi’an, China
- Hongzhi Yang
  - Department of Radiology, XD Group Hospital, Xi’an, China
- Zongzhen He
  - School of Mathematics, Xi’an University of Finance and Economics, Xi’an, China
|
19
|
Katche EI, Schierholt A, Schiessl SV, He F, Lv Z, Batley J, Becker HC, Mason AS. Genetic factors inherited from both diploid parents interact to affect genome stability and fertility in resynthesized allotetraploid Brassica napus. G3 (Bethesda, Md.) 2023; 13:jkad136. [PMID: 37313757 PMCID: PMC10411605 DOI: 10.1093/g3journal/jkad136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 04/24/2023] [Accepted: 05/31/2023] [Indexed: 06/15/2023]
Abstract
Established allopolyploids are known to be genomically stable and fertile. In contrast, most newly resynthesized allopolyploids are infertile and meiotically unstable. Identifying the genetic factors responsible for genome stability in newly formed allopolyploids is key to understanding how 2 genomes come together to form a species. One hypothesis is that established allopolyploids may have inherited specific alleles from their diploid progenitors which conferred meiotic stability. Resynthesized Brassica napus lines are often unstable and infertile, unlike B. napus cultivars. We tested this hypothesis by characterizing 41 resynthesized B. napus lines, produced by crosses between 8 Brassica rapa and 8 Brassica oleracea lines, for fertility and for copy number variation resulting from nonhomologous recombination events. We resequenced 8 B. rapa and 5 B. oleracea parent accessions and analyzed 19 resynthesized lines for allelic variation in a list of meiosis gene homologs. SNP genotyping was performed using the Illumina Infinium Brassica 60K array for 3 individuals per line. Self-pollinated seed set and genome stability (number of copy number variants) were significantly affected by the interaction between B. rapa and B. oleracea parental genotypes. We identified 13 putative meiosis gene candidates for further investigation, which were significantly associated with the frequency of copy number variants and which contained putatively harmful mutations in meiosis gene haplotypes. Our results support the hypothesis that allelic variants inherited from parental genotypes affect genome stability and fertility in resynthesized rapeseed.
Affiliation(s)
- Elizabeth Ihien Katche
  - Plant Breeding Department, University of Bonn, Bonn 53115, Germany
  - Department of Plant Breeding, Justus Liebig University, Giessen 35392, Germany
- Antje Schierholt
  - Department of Crop Sciences, Division of Plant Breeding Methodology, Georg-August University Göttingen, Göttingen 37073, Germany
- Sarah-Veronica Schiessl
  - Department of Plant Breeding, Justus Liebig University, Giessen 35392, Germany
  - Department of Botany and Molecular Evolution, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt am Main D-60325, Germany
- Fei He
  - Plant Breeding Department, University of Bonn, Bonn 53115, Germany
- Zhenling Lv
  - Plant Breeding Department, University of Bonn, Bonn 53115, Germany
  - Department of Plant Breeding, Justus Liebig University, Giessen 35392, Germany
- Jacqueline Batley
  - School of Biological Sciences, University of Western Australia, Perth, WA 6009, Australia
- Heiko C Becker
  - Department of Crop Sciences, Division of Plant Breeding Methodology, Georg-August University Göttingen, Göttingen 37073, Germany
- Annaliese S Mason
  - Plant Breeding Department, University of Bonn, Bonn 53115, Germany
  - Department of Plant Breeding, Justus Liebig University, Giessen 35392, Germany
|
20
|
Laufer VA, Glover TW, Wilson TE. Applications of advanced technologies for detecting genomic structural variation. Mutation Research. Reviews in Mutation Research 2023; 792:108475. [PMID: 37931775 PMCID: PMC10792551 DOI: 10.1016/j.mrrev.2023.108475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 09/07/2023] [Accepted: 11/02/2023] [Indexed: 11/08/2023]
Abstract
Chromosomal structural variation (SV) encompasses a heterogeneous class of genetic variants that exerts strong influences on human health and disease. Despite their importance, many structural variants (SVs) have remained poorly characterized at even a basic level, a discrepancy predicated upon the technical limitations of prior genomic assays. However, recent advances in genomic technology can identify and localize SVs accurately, opening new questions regarding SV risk factors and their impacts in humans. Here, we first define and classify human SVs and their generative mechanisms, highlighting characteristics leveraged by various SV assays. We next examine the first-ever gapless assembly of the human genome and the technical process of assembling it, which required third-generation sequencing technologies to resolve structurally complex loci. The new portions of that "telomere-to-telomere" assembly and of subsequent pangenome assemblies highlight aspects of SV biology likely to develop in the near term. We consider the strengths and limitations of the most promising new SV technologies and when they or longstanding approaches are best suited to meeting salient goals in the study of human SV in population-scale genomics research, clinical, and public health contexts. It is a watershed time in our understanding of human SV, when new approaches are expected to fundamentally change genomic applications.
Affiliation(s)
- Vincent A Laufer
  - Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Thomas W Glover
  - Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Thomas E Wilson
  - Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
|
21
|
Simonin M, Andrieu GP, Birsen R, Balsat M, Hypolite G, Courtois L, Graux C, Grardel N, Cayuela JM, Huguet F, Chalandon Y, Le Bris Y, Macintyre E, Gandemer V, Petit A, Rousselot P, Baruchel A, Bouscary D, Hermine O, Boissel N, Asnafi V. Prognostic value and oncogenic landscape of TP53 alterations in adult and pediatric T-ALL. Blood 2023; 141:1353-1358. [PMID: 36599110 DOI: 10.1182/blood.2022017755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 12/21/2022] [Accepted: 12/21/2022] [Indexed: 01/06/2023] Open
Affiliation(s)
- Mathieu Simonin
  - Laboratory of Onco-Hematology, Assistance Publique-Hôpitaux de Paris, Hôpital Necker Enfants-Malades, Université de Paris Cité, Paris, France
  - Institut Necker-Enfants Malades, INSERM U1151, Paris, France
  - Department of Pediatric Hematology and Oncology, Assistance Publique-Hôpitaux de Paris, Armand Trousseau Hospital, Sorbonne Université, Paris, France
- Guillaume P Andrieu
  - Laboratory of Onco-Hematology, Assistance Publique-Hôpitaux de Paris, Hôpital Necker Enfants-Malades, Université de Paris Cité, Paris, France
  - Institut Necker-Enfants Malades, INSERM U1151, Paris, France
- Rudy Birsen
  - Department of Hematology, Hôpital Cochin, Assistance Publique-Hôpitaux de Paris, Université de Paris Cité, Paris, France
  - Institut Cochin, INSERM U1016, Paris, France
- Marie Balsat
  - Hospices Civils de Lyon, Service d'Hématologie Clinique, Centre Hospitalier Lyon-Sud, Pierre-Bénite, France
- Guillaume Hypolite
  - Laboratory of Onco-Hematology, Assistance Publique-Hôpitaux de Paris, Hôpital Necker Enfants-Malades, Université de Paris Cité, Paris, France
  - Institut Necker-Enfants Malades, INSERM U1151, Paris, France
- Lucien Courtois
  - Laboratory of Onco-Hematology, Assistance Publique-Hôpitaux de Paris, Hôpital Necker Enfants-Malades, Université de Paris Cité, Paris, France
  - Institut Necker-Enfants Malades, INSERM U1151, Paris, France
- Carlos Graux
  - CHU UCLouvain Namur-Godinne, service d'Hématologie, Yvoir, Belgium
- Nathalie Grardel
  - Laboratory of Hematology, CHRU Lille, Lille, France
  - INSERM U1172, Lille, France
- Jean-Michel Cayuela
  - Laboratory of Hematology and EA3518, Saint-Louis University Hospital, Université de Paris Cité, Paris, France
- Françoise Huguet
  - Department of Hematology, CHRU-Institut Universitaire de Cancer Toulouse-Oncopole, Toulouse, France
- Yves Chalandon
  - Division of Hematology, Department of Oncology, University Hospital of Geneva and Faculty of Medicine, University of Geneva, Geneva, Switzerland
  - Swiss Group for Clinical Cancer Research, Bern, Switzerland
- Yannick Le Bris
  - Hematology Biology, Nantes University Hospital and Nantes-Angers Cancer and Immunology Research Center, Nantes, France
- Elizabeth Macintyre
  - Laboratory of Onco-Hematology, Assistance Publique-Hôpitaux de Paris, Hôpital Necker Enfants-Malades, Université de Paris Cité, Paris, France
  - Department of Pediatric Hematology and Oncology, Assistance Publique-Hôpitaux de Paris, Armand Trousseau Hospital, Sorbonne Université, Paris, France
- Virginie Gandemer
  - Department of Pediatric Hematology and Oncology, University Hospital of Rennes, Rennes, France
- Arnaud Petit
  - Department of Pediatric Hematology and Oncology, Assistance Publique-Hôpitaux de Paris, Armand Trousseau Hospital, Sorbonne Université, Paris, France
- Philippe Rousselot
  - Department of Hematology, Centre Hospitalier de Versailles, Le Chesnay, France
  - Université Paris-Saclay, Communauté Paris-Saclay, France
- André Baruchel
  - Department of Pediatric Hematology and Immunology, Assistance Publique-Hôpitaux de Paris, Robert Debré Hospital, Université de Paris Cité, Paris, France
- Didier Bouscary
  - Department of Hematology, Hôpital Cochin, Assistance Publique-Hôpitaux de Paris, Université de Paris Cité, Paris, France
  - Institut Cochin, INSERM U1016, Paris, France
- Olivier Hermine
  - Department of Hematology, INSERM U1163, IMAGINE Institute, Paris University, Necker Hospital, Paris, France
- Nicolas Boissel
  - Université Paris Cité, Institut de Recherche Saint-Louis, URP-3518, Assistance Publique-Hôpitaux de Paris, Saint-Louis University Hospital, Paris, France
- Vahid Asnafi
  - Laboratory of Onco-Hematology, Assistance Publique-Hôpitaux de Paris, Hôpital Necker Enfants-Malades, Université de Paris Cité, Paris, France
  - Institut Necker-Enfants Malades, INSERM U1151, Paris, France
|
22
|
Liu G, Yang H, Yuan X. A shortest path-based approach for copy number variation detection from next-generation sequencing data. Front Genet 2023; 13:1084974. [PMID: 36733945 PMCID: PMC9887524 DOI: 10.3389/fgene.2022.1084974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 12/27/2022] [Indexed: 01/18/2023] Open
Abstract
Copy number variation (CNV) is one of the main structural variations in the human genome and accounts for a considerable proportion of variations. As CNVs can directly or indirectly cause cancer, mental illness, and genetic disease in humans, their effective detection is of great interest in the fields of oncogene discovery, clinical decision-making, bioinformatics, and drug discovery. The advent of next-generation sequencing data makes CNV detection possible, and a large number of CNV detection tools are based on next-generation sequencing data. Due to the complexity (e.g., bias, noise, alignment errors) of next-generation sequencing data and CNV structures, the accuracy of existing methods in detecting CNVs remains low. In this work, we design a new CNV detection approach, called shortest path-based copy number variation (SPCNV), to improve the detection accuracy of CNVs. SPCNV calculates the k nearest neighbors of each read depth and defines the shortest path, shortest path relation, and shortest path cost sets, from which it further calculates the mean shortest path cost of each read depth and its k nearest neighbors. We utilize the ratio between the mean shortest path cost for each read depth and the mean of the mean shortest path cost of its k nearest neighbors to construct a relative shortest path score formula that is able to determine a score for each read depth. Based on the score profile, a boxplot is then applied to predict CNVs. The performance of the proposed method is verified by simulation data experiments and compared against several popular methods of the same type. Experimental results show that the proposed method achieves the best balance between recall and precision in each set of simulated samples. To further verify the performance of the proposed method in real application scenarios, we then select real sample data from the 1000 Genomes Project to conduct experiments. The proposed method achieves the best F1-scores in almost all samples. Therefore, the proposed method can be used as a more reliable tool for the routine detection of CNVs.
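The final step mentioned in this abstract, applying a boxplot to the score profile, can be sketched with Tukey-style fences. Only that last step is shown; the relative shortest path scores themselves are replaced by made-up numbers, and the 1.5 × IQR whisker, the two-sided rule, and the name `boxplot_outliers` are assumptions rather than details taken from the paper.

```python
import statistics


def boxplot_outliers(scores, whisker=1.5):
    """Return indices of scores that fall outside the boxplot fences."""
    q1, _, q3 = statistics.quantiles(scores, n=4)  # quartiles
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [i for i, s in enumerate(scores) if s < low or s > high]


# Hypothetical score profile; index 6 stands in for a CNV-like bin.
profile = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 3.2, 1.02, 0.98, 1.0]
candidates = boxplot_outliers(profile)
print(candidates)
```

A boxplot rule needs no distributional fit, which may be one reason to prefer it over a normal-based test when the score profile is skewed.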
Affiliation(s)
- Guojun Liu
  - School of Statistics, Xi’an University of Finance and Economics, Xi’an, China. *Correspondence: Guojun Liu; Xiguo Yuan
- Hongzhi Yang
  - Medical Imaging Center, Xidian Group Hospital, Xi’an, China
- Xiguo Yuan
  - Hangzhou Institute of Technology, Xidian University, Hangzhou, China. *Correspondence: Guojun Liu; Xiguo Yuan
|
23
|
Söylev A, Çokoglu SS, Koptekin D, Alkan C, Somel M. CONGA: Copy number variation genotyping in ancient genomes and low-coverage sequencing data. PLoS Comput Biol 2022; 18:e1010788. [PMID: 36516232 PMCID: PMC9873172 DOI: 10.1371/journal.pcbi.1010788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2022] [Revised: 01/24/2023] [Accepted: 12/03/2022] [Indexed: 12/15/2022] Open
Abstract
To date, ancient genome analyses have been largely confined to the study of single nucleotide polymorphisms (SNPs). Copy number variants (CNVs) are a major contributor to disease and to evolutionary adaptation, but identifying CNVs in ancient shotgun-sequenced genomes is hampered by typically low genome coverage (<1×) and short fragments (<80 bps), which preclude standard CNV detection software from being effectively applied to ancient genomes. Here we present CONGA, tailored for genotyping CNVs at low coverage. Simulations and down-sampling experiments suggest that CONGA can genotype deletions >1 kbps with F-scores >0.75 at ≥1×, and distinguish between heterozygous and homozygous states. We used CONGA to genotype 10,002 outgroup-ascertained deletions across a heterogeneous set of 71 ancient human genomes spanning the last 50,000 years, produced using variable experimental protocols. A fraction of these (21/71) display divergent deletion profiles unrelated to their population origin, but attributable to technical factors such as coverage and read length. The majority of the sample (50/71), despite originating from nine different laboratories and having coverages ranging from 0.44× to 26× (median 4×) and average read lengths of 52-121 bps (median 69), exhibit coherent deletion frequencies. Across these 50 genomes, inter-individual genetic diversity measured using SNPs and using CONGA-genotyped deletions is highly correlated. CONGA-genotyped deletions also display purifying selection signatures, as expected. CONGA thus paves the way for systematic CNV analyses in ancient genomes, despite the technical challenges posed by low and variable genome coverage.
Affiliation(s)
- Arda Söylev
- Department of Computer Engineering, Konya Food and Agriculture University, Konya, Turkey
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
| | | | - Dilek Koptekin
- Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
| | - Can Alkan
- Department of Computer Engineering, Bilkent University, Ankara, Turkey
| | - Mehmet Somel
- Department of Biology, Middle East Technical University, Ankara, Turkey
| |
|
24
|
Gao GF, Oh C, Saksena G, Deng D, Westlake LC, Hill BA, Reich M, Schumacher SE, Berger AC, Carter SL, Cherniack AD, Meyerson M, Tabak B, Beroukhim R, Getz G. Tangent normalization for somatic copy-number inference in cancer genome analysis. Bioinformatics 2022; 38:4677-4686. [PMID: 36040167 PMCID: PMC9563697 DOI: 10.1093/bioinformatics/btac586] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/20/2022] [Revised: 07/28/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Somatic copy-number alterations (SCNAs) play an important role in cancer development. Systematic noise in sequencing and array data presents a significant challenge to the inference of SCNAs for cancer genome analyses. As part of The Cancer Genome Atlas, the Broad Institute Genome Characterization Center developed the Tangent normalization method to generate copy-number profiles using data from single-nucleotide polymorphism (SNP) arrays and whole-exome sequencing (WES) technologies for over 10 000 pairs of tumors and matched normal samples. Here, we describe the Tangent method, which uses a unique linear combination of normal samples as a reference for each tumor sample, to subtract systematic errors that vary across samples. We also describe a modification of Tangent, called Pseudo-Tangent, which enables denoising through comparisons between tumor profiles when few normal samples are available. RESULTS Tangent normalization substantially increases signal-to-noise ratios (SNRs) compared to conventional normalization methods in both SNP array and WES analyses. Tangent and Pseudo-Tangent normalizations improve the SNR by reducing noise with minimal effect on signal, and their contribution exceeds that of other steps in the analysis, such as the choice of segmentation algorithm. Tangent and Pseudo-Tangent are broadly applicable and enable more accurate inference of SCNAs from DNA sequencing and array data. AVAILABILITY AND IMPLEMENTATION Tangent is available at https://github.com/broadinstitute/tangent and as a Docker image (https://hub.docker.com/r/broadinstitute/tangent). Tangent is also the normalization method for the copy-number pipeline in Genome Analysis Toolkit 4 (GATK4). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
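The core idea in this abstract, using a per-tumor linear combination of normal samples as the reference, can be sketched as a least-squares projection. The sketch below is only an illustration under stated assumptions: the function names, the Gram-Schmidt projection, and the toy profiles are hypothetical and are not taken from the Tangent code at the repository above.

```python
def _dot(u, v):
    """Dot product of two equal-length numeric sequences."""
    return sum(a * b for a, b in zip(u, v))


def subtract_normal_subspace(tumor, normals, eps=1e-12):
    """Return `tumor` minus its least-squares projection onto span(`normals`).

    `tumor` is a list of per-bin log coverage ratios; `normals` is a list of
    such profiles from normal samples (hypothetical input layout).
    """
    # Build an orthonormal basis of the normal profiles (modified Gram-Schmidt).
    basis = []
    for profile in normals:
        w = list(profile)
        for b in basis:
            c = _dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = _dot(w, w) ** 0.5
        if norm > eps:  # skip profiles that add no new direction
            basis.append([wi / norm for wi in w])

    # Remove the component of the tumor profile explained by the normals.
    residual = list(tumor)
    for b in basis:
        c = _dot(residual, b)
        residual = [ri - c * bi for ri, bi in zip(residual, b)]
    return residual


# Toy example: the tumor profile is 2x the first normal plus a signal
# that no combination of the normals can reproduce.
normals = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]]
tumor = [3.0, 1.0, 0.0, 0.0]
denoised = subtract_normal_subspace(tumor, normals)
```

The residual keeps only the part of the tumor profile that the normal samples cannot explain, which is the sense in which systematic noise shared with normals is subtracted.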
Affiliation(s)
- Galen F Gao
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Coyin Oh
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Gordon Saksena
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Davy Deng
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
| | | | - Barbara A Hill
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Michael Reich
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Division of Medical Genetics, University of California, San Diego, La Jolla, CA, USA
| | - Steven E Schumacher
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Ashton C Berger
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Scott L Carter
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | | | - Matthew Meyerson
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Barbara Tabak
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Rameen Beroukhim
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Gad Getz
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
| |
|
25
|
Ming C, Wang M, Wang Q, Neff R, Wang E, Shen Q, Reddy JS, Wang X, Allen M, Ertekin‐Taner N, De Jager PL, Bennett DA, Haroutunian V, Schadt E, Zhang B. Whole genome sequencing-based copy number variations reveal novel pathways and targets in Alzheimer's disease. Alzheimers Dement 2022; 18:1846-1867. [PMID: 34918867 PMCID: PMC9264340 DOI: 10.1002/alz.12507] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 04/02/2021] [Revised: 09/21/2021] [Accepted: 09/21/2021] [Indexed: 01/28/2023]
Abstract
INTRODUCTION A few copy number variations (CNVs) have been reported for Alzheimer's disease (AD). However, there is a lack of a systematic investigation of CNVs in AD based on whole genome sequencing (WGS) data. METHODS We used four methods to identify consensus CNVs from the WGS data of 1,411 individuals and further investigated their functional roles in AD using the matched transcriptomic and clinicopathological data. RESULTS We identified 3,012 rare AD-specific CNVs whose residing genes are enriched for cellular glucuronidation and neuron projection pathways. Genes whose mRNA expressions are significantly correlated with common CNVs are involved in major histocompatibility complex class II receptor activity. Integration of CNVs, gene expression, and clinical and pathological traits further pinpoints a key CNV that potentially regulates immune response in AD. DISCUSSION We identify CNVs as potential genetic regulators of immune response in AD. The identified CNVs and their downstream gene networks reveal novel pathways and targets for AD.
Affiliation(s)
- Chen Ming
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Minghui Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Qian Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Ryan Neff
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Erming Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Qi Shen
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Joseph S. Reddy
- Department of Quantitative Health Sciences, Mayo Clinic Florida, Jacksonville, Florida, USA
| | - Xue Wang
- Department of Quantitative Health Sciences, Mayo Clinic Florida, Jacksonville, Florida, USA
| | - Mariet Allen
- Department of Neuroscience, Mayo Clinic Florida, Jacksonville, Florida, USA
| | - Nilüfer Ertekin‐Taner
- Department of Neuroscience, Mayo Clinic Florida, Jacksonville, Florida, USA
- Department of Neurology, Mayo Clinic Florida, Jacksonville, Florida, USA
| | - Philip L. De Jager
- Center for Translational & Computational Neuroimmunology, Department of Neurology and the Taub Institute, Columbia University Medical Center, New York, New York, USA
- The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - David A. Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, Illinois, USA
| | - Vahram Haroutunian
- Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Alzheimer's Disease Research Center, Icahn School of Medicine at Mount Sinai, New York, New York
- Psychiatry, JJ Peters VA Medical Center, Bronx, New York, USA
| | - Eric Schadt
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| |
|
26
|
Davoudi P, Do DN, Rathgeber B, Colombo SM, Sargolzaei M, Plastow G, Wang Z, Karimi K, Hu G, Valipour S, Miar Y. Genome-wide detection of copy number variation in American mink using whole-genome sequencing. BMC Genomics 2022; 23:649. [PMID: 36096727 PMCID: PMC9468235 DOI: 10.1186/s12864-022-08874-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/15/2022] [Accepted: 09/05/2022] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Copy number variations (CNVs) represent a major source of genetic diversity and contribute to the phenotypic variation of economically important traits in livestock species. In this study, we report the first genome-wide CNV analysis of American mink using whole-genome sequence data from 100 individuals. The analyses were performed with three complementary software programs: CNVpytor, DELLY and Manta. RESULTS A total of 164,733 CNVs (144,517 deletions and 20,216 duplications) were identified, representing 5378 CNV regions (CNVR) after merging overlapping CNVs and covering 47.3 Mb (1.9%) of the mink autosomal genome. Gene Ontology and KEGG pathway enrichment analyses of 1391 genes that overlapped CNVR revealed a potential role of CNVs in a wide range of biological, molecular and cellular functions, e.g., pathways related to growth (regulation of actin cytoskeleton, and cAMP signaling pathways), behavior (axon guidance, circadian entrainment, and glutamatergic synapse), lipid metabolism (phospholipid binding, sphingolipid metabolism and regulation of lipolysis in adipocytes), and immune response (Wnt signaling, Fc receptor signaling, and GTPase regulator activity pathways). Furthermore, several CNVR-harbored genes were associated with fur characteristics and development (MYO5A, RAB27B, FGF12, SLC7A11, EXOC2) and with immune system processes (SWAP70, FYN, ORAI1, TRPM2, and FOXO3). CONCLUSIONS This study presents the first genome-wide CNV map of American mink. We identified 5378 CNVR in the mink genome and investigated genes that overlapped with CNVR. The results suggest potential links with mink behaviour as well as a possible impact on fur quality and immune response. Overall, the results provide new resources for mink genome analysis, serving as a guideline for future investigations in which genomic structural variations are present.
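The step of merging overlapping CNVs into CNV regions (CNVR) mentioned in this abstract can be illustrated with a simple interval merge. This is a minimal sketch, assuming CNVs are given as hypothetical `(chromosome, start, end)` tuples and that any overlap on the same chromosome triggers a merge; the abstract does not state the exact merging criterion the authors used.

```python
def merge_cnvs_into_regions(cnvs):
    """Merge overlapping CNV intervals on the same chromosome into CNVRs.

    `cnvs` is an iterable of (chrom, start, end) tuples with start <= end.
    Returns a sorted list of merged (chrom, start, end) tuples.
    """
    regions = []
    for chrom, start, end in sorted(cnvs):
        last = regions[-1] if regions else None
        if last and last[0] == chrom and start <= last[2]:
            # Overlaps the current region: extend its end if needed.
            last[2] = max(last[2], end)
        else:
            # Starts a new region (new chromosome or a gap).
            regions.append([chrom, start, end])
    return [tuple(r) for r in regions]


# Toy input: two overlapping calls on chr1, one separate call, one on chr2.
calls = [
    ("chr1", 150, 300),
    ("chr1", 100, 200),
    ("chr1", 400, 500),
    ("chr2", 100, 200),
]
cnvr = merge_cnvs_into_regions(calls)
```

Sorting first means each call only needs to be compared with the most recently opened region, so the merge runs in a single pass.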
Affiliation(s)
- Pourya Davoudi
- Department of Animal Science and Aquaculture, Dalhousie University, Truro, NS, Canada
| | - Duy Ngoc Do
- Department of Animal Science and Aquaculture, Dalhousie University, Truro, NS, Canada
| | - Bruce Rathgeber
- Department of Animal Science and Aquaculture, Dalhousie University, Truro, NS, Canada
| | - Stefanie M Colombo
- Department of Animal Science and Aquaculture, Dalhousie University, Truro, NS, Canada
| | - Mehdi Sargolzaei
- Department of Pathobiology, University of Guelph, Guelph, ON, Canada
- Select Sires Inc., Plain City, OH, USA
| | - Graham Plastow
- Livestock Gentec, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada
| | - Zhiquan Wang
- Livestock Gentec, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada
| | - Karim Karimi
- Department of Animal Science and Aquaculture, Dalhousie University, Truro, NS, Canada
| | - Guoyu Hu
- Department of Animal Science and Aquaculture, Dalhousie University, Truro, NS, Canada
| | - Shafagh Valipour
- Department of Animal Science and Aquaculture, Dalhousie University, Truro, NS, Canada
| | - Younes Miar
- Department of Animal Science and Aquaculture, Dalhousie University, Truro, NS, Canada.
| |
|
27
|
Sharma D, Denmat SHL, Matzke NJ, Hannan K, Hannan RD, O'Sullivan JM, Ganley ARD. A new method for determining ribosomal DNA copy number shows differences between Saccharomyces cerevisiae populations. Genomics 2022; 114:110430. [PMID: 35830947 DOI: 10.1016/j.ygeno.2022.110430] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/03/2021] [Revised: 05/23/2022] [Accepted: 07/04/2022] [Indexed: 11/26/2022]
Abstract
Ribosomal DNA genes (rDNA) encode the major ribosomal RNAs and in eukaryotes typically form tandem repeat arrays. Species have characteristic rDNA copy numbers, but there is substantial intra-species variation in copy number that results from frequent rDNA recombination. Copy number differences can have phenotypic consequences; however, difficulties in quantifying copy number mean we lack a comprehensive understanding of how copy number evolves and what the consequences are. Here we present a genomic sequence read approach to estimate rDNA copy number based on modal coverage to help overcome limitations with existing mean coverage-based approaches. We validated our method using Saccharomyces cerevisiae strains with known rDNA copy numbers. Application of our pipeline to a global sample of S. cerevisiae isolates showed that different populations have different rDNA copy numbers. Our results demonstrate the utility of the modal coverage method, and highlight the high level of rDNA copy number variation within and between populations.
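The contrast between modal and mean coverage in this abstract can be made concrete with a small sketch. It assumes per-window integer read depths are already available for the rDNA unit and for the rest of the genome, and it takes copy number as the ratio of the two modes; the function names, tie-breaking rule, and toy depths are hypothetical and do not reproduce the authors' pipeline.

```python
from collections import Counter


def modal_coverage(depths):
    """Return the most frequent depth value (smallest value wins ties)."""
    counts = Counter(depths)
    top = max(counts.values())
    return min(value for value, n in counts.items() if n == top)


def estimate_rdna_copy_number(rdna_depths, genome_depths):
    """Estimate rDNA copy number as rDNA modal depth / genome modal depth."""
    return modal_coverage(rdna_depths) / modal_coverage(genome_depths)


# Toy depths: outlier windows (900 and 40) shift a mean but not the mode.
rdna_depths = [300, 300, 310, 290, 300, 900]
genome_depths = [2, 2, 2, 3, 2, 1, 40]
copies = estimate_rdna_copy_number(rdna_depths, genome_depths)
```

In this toy input the mean genome depth is pulled up by a single high-coverage window, whereas the mode stays at the typical single-copy depth, which is the motivation for a mode-based estimate.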
Affiliation(s)
- Diksha Sharma
- School of Biological Sciences, University of Auckland, Auckland, New Zealand
| | - Sylvie Hermann-Le Denmat
- School of Biological Sciences, University of Auckland, Auckland, New Zealand; Ecole Normale Supérieure, PSL Research University, F-75005 Paris, France
| | - Nicholas J Matzke
- School of Biological Sciences, University of Auckland, Auckland, New Zealand
| | - Katherine Hannan
- ACRF Department of Cancer Biology and Therapeutics, The John Curtin School of Medical Research, ACT 2601, Australia; Department of Biochemistry and Molecular Biology, University of Melbourne, Parkville, Victoria 3010, Australia
| | - Ross D Hannan
- ACRF Department of Cancer Biology and Therapeutics, The John Curtin School of Medical Research, ACT 2601, Australia; Department of Biochemistry and Molecular Biology, University of Melbourne, Parkville, Victoria 3010, Australia; Division of Research, Peter MacCallum Cancer Centre, Melbourne, Victoria 3000, Australia; Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, Victoria 3010, Australia; Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3168, Australia
| | - Justin M O'Sullivan
- Liggins Institute, University of Auckland, Auckland, New Zealand; Maurice Wilkins Center, University of Auckland, New Zealand; MRC Lifecourse Unit, University of Southampton, United Kingdom; Brain Research New Zealand, The University of Auckland, Auckland, New Zealand
| | - Austen R D Ganley
- School of Biological Sciences, University of Auckland, Auckland, New Zealand.
| |
|
28
|
Mateus ID, Auxier B, Ndiaye MMS, Cruz J, Lee SJ, Sanders IR. Reciprocal recombination genomic signatures in the symbiotic arbuscular mycorrhizal fungi Rhizophagus irregularis. PLoS One 2022; 17:e0270481. [PMID: 35776745 PMCID: PMC9249182 DOI: 10.1371/journal.pone.0270481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2021] [Accepted: 06/12/2022] [Indexed: 11/24/2022] Open
Abstract
Arbuscular mycorrhizal fungi (AMF) are part of the most widespread fungal-plant symbiosis. They colonize at least 80% of plant species and promote plant growth and plant diversity. These fungi are multinucleated and contain either one or two haploid nuclear genotypes (monokaryon and dikaryon) identified by the alleles at a putative mating-type locus. This taxon has been considered an ancient asexual scandal because of the lack of observable sexual structures. Despite identification of a putative mating-type locus and functional activation of genes related to mating when two isolates co-exist, it remains unknown if the AMF life cycle involves a sexual or parasexual stage. We used publicly available genome sequences to test if Rhizophagus irregularis dikaryon genomes display signatures of sexual reproduction in the form of reciprocal recombination patterns, or if they display exclusively signatures of parasexual reproduction involving gene conversion. We used short-read and long-read sequence data to identify nucleus-specific alleles within dikaryons and then compared them to orthologous gene sequences from related monokaryon isolates displaying the same putative MAT-types as the dikaryon. We observed that the two nucleus-specific alleles of the dikaryon A5 are more closely related to the homologous sequences of monokaryon isolates displaying the same putative MAT-type than to each other. We also observed that these nucleus-specific alleles displayed reciprocal recombination signatures. These results confirm that dikaryon and monokaryon isolates displaying the same putative MAT-type are related in their life cycle. These results suggest that a genetic exchange mechanism, involving reciprocal recombination in dikaryon genomes, allows AMF to generate genetic diversity.
Affiliation(s)
- Ivan D. Mateus
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
| | - Ben Auxier
- Laboratory of Genetics, Wageningen University, Wageningen, The Netherlands
| | - Mam M. S. Ndiaye
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
| | - Joaquim Cruz
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
| | - Soon-Jae Lee
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
| | - Ian R. Sanders
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
| |
|
29
|
Sarwal V, Niehus S, Ayyala R, Kim M, Sarkar A, Chang S, Lu A, Rajkumar N, Darfci-Maher N, Littman R, Chhugani K, Soylev A, Comarova Z, Wesel E, Castellanos J, Chikka R, Distler MG, Eskin E, Flint J, Mangul S. A comprehensive benchmarking of WGS-based deletion structural variant callers. Brief Bioinform 2022; 23:6618239. [PMID: 35753701 DOI: 10.1093/bib/bbac221] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/08/2021] [Revised: 04/30/2022] [Accepted: 05/11/2022] [Indexed: 01/10/2023] Open
Abstract
Advances in whole-genome sequencing (WGS) promise to enable accurate and comprehensive structural variant (SV) discovery. Dissecting SVs from WGS data presents a substantial number of challenges, and a plethora of SV detection methods have been developed. Currently, evidence that investigators can use to select appropriate SV detection tools is lacking. In this article, we have evaluated the performance of SV detection tools on mouse and human WGS data using a comprehensive polymerase chain reaction-confirmed gold standard set of SVs and the genome-in-a-bottle variant set, respectively. In contrast to previous benchmarking studies, our gold standard dataset included a complete set of SVs, allowing us to report both precision and sensitivity rates of the SV detection methods. Our study investigates the ability of the methods to detect deletions, thus providing an optimistic estimate of SV detection performance, as SV detection methods that fail to detect deletions are likely to miss more complex SVs. We found that SV detection tools varied widely in their performance, with several methods providing a good balance between sensitivity and precision. Additionally, we have determined the SV callers best suited for low- and ultralow-pass sequencing data as well as for different deletion length categories.
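Reporting both precision and sensitivity, as this abstract describes, requires a rule for deciding when a called deletion matches a gold-standard deletion. The sketch below assumes a 50% reciprocal-overlap rule on `(start, end)` intervals from a single chromosome; that matching rule, the function names, and the toy intervals are illustrative assumptions and are not stated in the abstract.

```python
def reciprocal_overlap(a, b):
    """Smallest fraction of either interval covered by their intersection."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    if inter <= 0:
        return 0.0
    return min(inter / (a[1] - a[0]), inter / (b[1] - b[0]))


def precision_and_sensitivity(calls, truth, min_overlap=0.5):
    """Score deletion calls against a gold-standard set.

    precision   = matched calls / all calls
    sensitivity = gold-standard deletions recovered / all gold-standard deletions
    """
    matched_calls = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for t in truth) for c in calls
    )
    recovered = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for c in calls) for t in truth
    )
    precision = matched_calls / len(calls) if calls else 0.0
    sensitivity = recovered / len(truth) if truth else 0.0
    return precision, sensitivity


# Toy data: two of four calls match, two of three true deletions are found.
truth = [(100, 200), (500, 700), (1000, 1100)]
calls = [(110, 210), (520, 690), (2000, 2100), (3000, 3050)]
precision, sensitivity = precision_and_sensitivity(calls, truth)
```

A complete gold standard matters here because precision can only be computed when unmatched calls can be counted as false positives rather than as possibly unvalidated true events.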
Affiliation(s)
- Varuni Sarwal
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA; Indian Institute of Technology Delhi, Hauz Khas, New Delhi, Delhi 110016, India
| | - Sebastian Niehus
- Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Str. 2, 10178 Berlin, Germany; Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Charitéplatz 1, 10117 Berlin, Germany
| | - Ram Ayyala
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Minyoung Kim
- Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089
| | - Aditya Sarkar
- School of Computing and Electrical Engineering, Indian Institute of Technology Mandi, Kamand, Mandi, Himachal Pradesh 175001, India
| | - Sei Chang
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Angela Lu
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Neha Rajkumar
- Department of Bioengineering, University of California Los Angeles, Los Angeles, CA, 90095
| | - Nicholas Darfci-Maher
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Russell Littman
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Karishma Chhugani
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA 90089-9121
| | - Arda Soylev
- Department of Computer Engineering, Konya Food and Agriculture University, Konya, Turkey
| | - Zoia Comarova
- Department of Civil and Environmental Engineering, University of Southern California, Los Angeles, CA, United States
| | - Emily Wesel
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Jacqueline Castellanos
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Rahul Chikka
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Margaret G Distler
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Eleazar Eskin
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA; Department of Human Genetics, David Geffen School of Medicine at UCLA, 695 Charles E. Young Drive South, Box 708822, Los Angeles, CA, 90095, USA; Department of Computational Medicine, David Geffen School of Medicine at UCLA, 73-235 CHS, Los Angeles, CA, 90095, USA
| | - Jonathan Flint
- Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, 760 Westwood Plaza, Los Angeles, CA 90095, USA
| | - Serghei Mangul
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA 90089-9121
| |
|
30
|
Prodanov T, Bansal V. Robust and accurate estimation of paralog-specific copy number for duplicated genes using whole-genome sequencing. Nat Commun 2022; 13:3221. [PMID: 35680869 PMCID: PMC9184528 DOI: 10.1038/s41467-022-30930-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2021] [Accepted: 05/20/2022] [Indexed: 11/09/2022] Open
Abstract
The human genome contains hundreds of low-copy repeats (LCRs) that are challenging to analyze using short-read sequencing technologies due to extensive copy number variation and ambiguity in read mapping. Copy number and sequence variants in more than 150 duplicated genes that overlap LCRs have been implicated in monogenic and complex human diseases. We describe a computational tool, Parascopy, for estimating the aggregate and paralog-specific copy number of duplicated genes using whole-genome sequencing (WGS). Parascopy is an efficient method that jointly analyzes reads mapped to different repeat copies without the need for global realignment. It leverages multiple samples to mitigate sequencing bias and to identify reliable paralogous sequence variants (PSVs) that differentiate repeat copies. Analysis of WGS data for 2504 individuals from diverse populations showed that Parascopy is robust to sequencing bias, has higher accuracy compared to existing methods and enables prioritization of pathogenic copy number changes in duplicated genes.
Affiliation(s)
- Timofey Prodanov
- Bioinformatics and Systems Biology Graduate Program, University of California, La Jolla, San Diego, CA, 92093, USA
| | - Vikas Bansal
- Department of Pediatrics, School of Medicine, University of California, La Jolla, San Diego, CA, 92093, USA.
| |
|
31
|
Wang X, Junqing L, Huang T. CNVABNN: An AdaBoost algorithm and neural networks-based detection of copy number variations from NGS data. Comput Biol Chem 2022; 99:107720. [DOI: 10.1016/j.compbiolchem.2022.107720] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/28/2022] [Revised: 06/22/2022] [Accepted: 06/23/2022] [Indexed: 11/03/2022]
|
32
|
Späth GF, Bussotti G. GIP: an open-source computational pipeline for mapping genomic instability from protists to cancer cells. Nucleic Acids Res 2022; 50:e36. [PMID: 34928370 PMCID: PMC8989552 DOI: 10.1093/nar/gkab1237] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/08/2021] [Revised: 11/01/2021] [Accepted: 12/03/2021] [Indexed: 11/25/2022] Open
Abstract
Genome instability has been recognized as a key driver for microbial and cancer adaptation and thus plays a central role in many diseases. Genome instability encompasses different types of genomic alterations, yet most available genome analysis software is limited to just one type of mutation. To overcome this limitation and better understand the role of genetic changes in enhancing pathogenicity, we established GIP, a novel, powerful bioinformatic pipeline for comparative genome analysis. Here, we show its application to whole genome sequencing datasets of Leishmania, Plasmodium, Candida and cancer. Applying GIP to available data sets validated our pipeline and demonstrated the power of our tool to drive biological discovery. Applied to Plasmodium vivax genomes, our pipeline uncovered the convergent amplification of erythrocyte binding proteins and identified a nullisomic strain. Re-analyzing genomes of drug-adapted Candida albicans strains revealed correlated copy number variations of functionally related genes, strongly supporting a mechanism of epistatic adaptation through interacting gene-dosage changes. Our results illustrate how GIP can be used for the identification of aneuploidy, gene copy number variations, changes in nucleic acid sequences, and chromosomal rearrangements. Altogether, GIP can shed light on the genetic bases of cell adaptation and drive disease biomarker discovery.
Affiliation(s)
- Gerald F Späth
- Institut Pasteur, Université de Paris, INSERM U1201, Unité de Parasitologie moléculaire et Signalisation, Paris, France
| | - Giovanni Bussotti
- Institut Pasteur, Université de Paris, INSERM U1201, Unité de Parasitologie moléculaire et Signalisation, Paris, France
- Institut Pasteur, Université de Paris, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
| |
|
33
|
Wei P, Gao Y, Zhang J, Lin J, Liu H, Chen K, Lin W, Wang X, Wang C, Liu C. Diagnosis of lung squamous cell carcinoma based on metagenomic Next-Generation Sequencing. BMC Pulm Med 2022; 22:108. [PMID: 35346137 PMCID: PMC8958490 DOI: 10.1186/s12890-022-01894-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2021] [Accepted: 03/15/2022] [Indexed: 11/10/2022] Open
Abstract
Background The clinical treatment of patients suspected of pulmonary infections often relies on empirical antibiotics. However, preliminary diagnoses are based on clinical manifestations and conventional microbiological tests, and can later be proved wrong. Here, we present a patient whose initial diagnosis was lung abscess, but in whom antibiotic treatment had no effect and metagenomic Next-Generation Sequencing (mNGS) indicated the presence of a neoplasm.
Case presentation A 62-year-old female was diagnosed with lung abscess at three different health facilities. However, mNGS of bronchoalveolar lavage fluid did not support pulmonary infection. Rather, the copy number variation analysis using host DNA sequences suggested a neoplasm. Using H&E staining and immunohistochemistry of a lung biopsy, the patient was eventually diagnosed with lung squamous cell carcinoma.
Conclusions mNGS not only detects pathogens and helps diagnose infectious diseases, but also has potential for detecting neoplasms via host chromosomal copy number analysis. This might be beneficial for febrile patients with unknown or complex etiology, especially when an infectious disease was initially suspected but the empirical antibiotic regimen failed.
Affiliation(s)
- Ping Wei
  - Fujian University of Traditional Chinese Medicine, Fuzhou, 350122, Fujian, China
  - The Second Affiliated Hospital of Fujian Traditional Chinese Medical University, Fuzhou, 350003, Fujian, China
- Yang Gao
  - School of Biological Science and Medical Engineering, Beihang University, Beijing, China
- Jing Zhang
  - The Second Affiliated Hospital of Fujian Traditional Chinese Medical University, Fuzhou, 350003, Fujian, China
- Jianlong Lin
  - The Second Affiliated Hospital of Fujian Traditional Chinese Medical University, Fuzhou, 350003, Fujian, China
- Huibin Liu
  - The Second Affiliated Hospital of Fujian Traditional Chinese Medical University, Fuzhou, 350003, Fujian, China
- Keqiang Chen
  - The Second Affiliated Hospital of Fujian Traditional Chinese Medical University, Fuzhou, 350003, Fujian, China
- Weikai Lin
  - The Second Affiliated Hospital of Fujian Traditional Chinese Medical University, Fuzhou, 350003, Fujian, China
- Xiaojia Wang
  - Hangzhou Matridx Biotechnology Co., Ltd, Bd 2-4, 2073 Jinchang Rd, Hangzhou, 311100, Zhejiang, China
- Chune Wang
  - The Second Affiliated Hospital of Fujian Traditional Chinese Medical University, Fuzhou, 350003, Fujian, China
  - Director of Respiratory Department, The Second Affiliated Hospital of Fujian Traditional Chinese Medical University, Fuzhou, 350108, Fujian, China
- Chao Liu
  - Hangzhou Matridx Biotechnology Co., Ltd, Bd 2-4, 2073 Jinchang Rd, Hangzhou, 311100, Zhejiang, China
  - Director of Medical Department, Hangzhou Matridx Biotechnology Co., Ltd, Hangzhou, 311100, China
|
34
|
Gordeeva V, Sharova E, Arapidi G. Progress in Methods for Copy Number Variation Profiling. Int J Mol Sci 2022; 23:2143. [PMID: 35216262 PMCID: PMC8879278 DOI: 10.3390/ijms23042143] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/17/2022] [Revised: 02/09/2022] [Accepted: 02/11/2022] [Indexed: 02/04/2023] Open
Abstract
Copy number variations (CNVs) are the predominant class of structural genomic variations involved in the processes of evolutionary adaptation, genomic disorders, and disease progression. Compared with single-nucleotide variants, there have been challenges associated with the detection of CNVs owing to their diverse sizes. However, the field has seen significant progress in the past 20–30 years. This has been made possible due to the rapid development of molecular diagnostic methods which ensure a more detailed view of the genome structure, further complemented by recent advances in computational methods. Here, we review the major approaches that have been used to routinely detect CNVs, ranging from cytogenetics to the latest sequencing technologies, and then cover their specific features.
Affiliation(s)
- Veronika Gordeeva
  - Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russia
  - Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russia
  - Moscow Institute of Physics and Technology, National Research University, Moscow Oblast, 141701 Moscow, Russia
- Elena Sharova
  - Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russia
- Georgij Arapidi
  - Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russia
  - Moscow Institute of Physics and Technology, National Research University, Moscow Oblast, 141701 Moscow, Russia
  - Shemyakin–Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 117997 Moscow, Russia
|
35
|
Genome sequencing-based coverage analyses facilitate high-resolution detection of deletions linked to phenotypes of gamma-irradiated wheat mutants. BMC Genomics 2022; 23:111. [PMID: 35139819 PMCID: PMC8827196 DOI: 10.1186/s12864-022-08344-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/22/2021] [Accepted: 01/20/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Gamma-irradiated mutants of Triticum aestivum L., hexaploid wheat, provide novel and agriculturally important traits and are used as breeding materials. However, the identification of causative genomic regions of mutant phenotypes is challenging because of the large and complicated genome of hexaploid wheat. Recently, the combined use of high-quality reference genome sequences of common wheat and cost-effective resequencing technologies has made it possible to evaluate genome-wide polymorphisms, even in complex genomes. RESULTS To investigate whether the genome sequencing approach can effectively detect structural variations, such as deletions, frequently caused by gamma irradiation, we selected a grain-hardness mutant from the gamma-irradiated population of Japanese elite wheat cultivar "Kitahonami." The Hardness (Ha) locus, including the puroindoline protein-encoding genes Pina-D1 and Pinb-D1 on the short arm of chromosome 5D, primarily regulates the grain hardness variation in common wheat. We performed short-read genome sequencing of wild-type and grain-hardness mutant plants, and subsequently aligned their short reads to the reference genome of the wheat cultivar "Chinese Spring." Genome-wide comparisons of depth-of-coverage between wild-type and mutant strains detected ~ 130 Mbp deletion on the short arm of chromosome 5D in the mutant genome. Molecular markers for this deletion were applied to the progeny populations generated by a cross between the wild-type and the mutant. A large deletion in the region including the Ha locus was associated with the mutant phenotype, indicating that the genome sequencing is a powerful and efficient approach for detecting a deletion marker of a gamma-irradiated mutant phenotype. In addition, we investigated a pre-harvest sprouting tolerance mutant and identified a 67.8 Mbp deletion on chromosome 3B where Viviparous-B1 and GRAS family transcription factors are located. 
Co-dominant markers designed to detect the deletion-polymorphism confirmed the association with low germination rate, leading to pre-harvest sprouting tolerance. CONCLUSIONS Short read-based genome sequencing of gamma-irradiated mutants facilitates the identification of large deletions linked to mutant phenotypes when combined with segregation analyses in progeny populations. This method allows effective application of mutants with agriculturally important traits in breeding using marker-assisted selection.
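The genome-wide depth-of-coverage comparison between wild-type and mutant described in this abstract can be illustrated with a minimal sketch. It assumes that per-window read depths have already been computed for both samples and that most windows are not deleted, so the median depth is a usable scaling factor. The function names, the ratio threshold of 0.1, and the minimum run length are hypothetical choices for illustration, not values taken from the paper.

```python
from statistics import median


def normalized_ratios(wild_type, mutant):
    """Return per-window mutant/wild-type depth ratios, scaled by median depth."""
    # Dividing by each sample's median cancels out differences in sequencing depth.
    wt_med = median(wild_type)
    mut_med = median(mutant)
    ratios = []
    for wt, mut in zip(wild_type, mutant):
        if wt == 0:
            ratios.append(None)  # no information where the wild type has no reads
        else:
            ratios.append((mut / mut_med) / (wt / wt_med))
    return ratios


def deletion_runs(ratios, max_ratio=0.1, min_windows=3):
    """Return (start, end) window index pairs, end exclusive, of candidate deletions."""
    runs, start = [], None
    for i, r in enumerate(ratios + [None]):  # trailing sentinel closes an open run
        low = r is not None and r <= max_ratio
        if low and start is None:
            start = i
        elif not low and start is not None:
            if i - start >= min_windows:
                runs.append((start, i))
            start = None
    return runs


# Toy depths: windows 3 to 6 have almost no reads in the mutant.
wild_type_depth = [30, 31, 29, 30, 32, 30, 28, 31, 30, 29]
mutant_depth = [15, 16, 14, 0, 0, 1, 0, 15, 16, 14]

ratios = normalized_ratios(wild_type_depth, mutant_depth)
candidates = deletion_runs(ratios)
print(candidates)
```

Requiring several consecutive low-ratio windows is one simple way to keep single noisy windows from being reported; a real analysis of a hexaploid genome would also need to handle mapping ambiguity between homoeologous chromosomes, which this sketch ignores.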
|
36
|
Multi-omic profiling of peritoneal metastases in gastric cancer identifies molecular subtypes and therapeutic vulnerabilities. NATURE CANCER 2022; 2:962-977. [PMID: 35121863 DOI: 10.1038/s43018-021-00240-6] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Received: 01/02/2021] [Accepted: 06/25/2021] [Indexed: 12/24/2022]
Abstract
Peritoneal metastasis, a hallmark of incurable advanced gastric cancer (GC), presently has no curative therapy and its molecular features have not been examined extensively. Here we present a comprehensive multi-omic analysis of malignant ascitic fluid samples and their corresponding tumor cell lines from 98 patients, including whole-genome sequencing, RNA sequencing, DNA methylation and enhancer landscape. We identify a higher frequency of receptor tyrosine kinase and mitogen-activated protein kinase pathway alterations compared to primary GC; moreover, approximately half of the gene alterations are potentially treatable with targeted therapy. Our analyses also stratify ascites-disseminated GC into two distinct molecular subtypes: one displaying active super enhancers (SEs) at the ELF3, KLF5 and EHF loci, and a second subtype bearing transforming growth factor-β (TGF-β) pathway activation through SMAD3 SE activation and high expression of transcriptional enhancer factor TEF-1 (TEAD1). In the TGF-β subtype, inhibition of the TEAD pathway circumvents therapy resistance, suggesting a potential molecular-guided therapeutic strategy for this subtype of intractable GC.
|
37
|
Liang J, Liu X, Yang P, Yao Z, Qu K, Wang H, Zhang Z, Liang H, Cheng B, Li Z, Ru B, Zhang J, Qi Z, Wang E, Lei C, Chen H, Huang B, Huang Y. Copy number variation of GAL3ST1 gene is associated with growth traits of Chinese cattle. Anim Biotechnol 2022:1-7. [DOI: 10.1080/10495398.2021.1996385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Affiliation(s)
- Juntong Liang
  - College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Xian Liu
  - Henan Provincial Animal Husbandry General Station, Zhengzhou, China
- Peng Yang
  - College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Zhi Yao
  - College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Kaixing Qu
  - Yunnan Academy of Grassland and Animal Science, Kunming, China
- Hongli Wang
  - Jiaxian Animal Husbandry Bureau, Jiaxian, China
- Zijing Zhang
  - Institute of Animal Husbandry and Veterinary Science, Henan Academy of Agricultural Sciences, Zhengzhou, China
- Zhiming Li
  - Henan Provincial Animal Husbandry General Station, Zhengzhou, China
- Baorui Ru
  - Henan Provincial Animal Husbandry General Station, Zhengzhou, China
- Jicai Zhang
  - Yunnan Academy of Grassland and Animal Science, Kunming, China
- Zengfang Qi
  - Jiaxian Animal Husbandry Bureau, Jiaxian, China
- Eryao Wang
  - Institute of Animal Husbandry and Veterinary Science, Henan Academy of Agricultural Sciences, Zhengzhou, China
- Chuzhao Lei
  - College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Hong Chen
  - College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Bizhi Huang
  - Yunnan Academy of Grassland and Animal Science, Kunming, China
- Yongzhen Huang
  - College of Animal Science and Technology, Northwest A&F University, Yangling, China
|
38
|
Sinha R, Pal RK, De RK. GenSeg and MR-GenSeg: A Novel Segmentation Algorithm and its Parallel MapReduce Based Approach for Identifying Genomic Regions With Copy Number Variations. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:443-454. [PMID: 32750860 DOI: 10.1109/tcbb.2020.3000661] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
Identifying intragenic as well as intergenic sequences of the DNA having structural alterations is an important research area, since these alterations may be the root cause of many neurological and autoimmune diseases, including cancer. Working with whole genome NGS data has provided new insight in this regard, but has led to a huge explosion of data that is growing exponentially. Hence, the challenges lie in efficient means of storing and processing this big data. In this study, we have developed a novel segmentation algorithm, called GenSeg, and its parallel MapReduce based algorithm, called MR-GenSeg, for detecting copy number variations. In order to annotate CNVs (variants), segments formed by GenSeg/MR-GenSeg have been represented in a novel way using a binary tree, where each node is a CNV event. GenSeg considers each position specific data of whole genome DNA sequence, so that precise identification of breakpoints is possible. GenSeg/MR-GenSeg has been compared with twelve popular CNV detection algorithms, where it has outperformed the others in terms of sensitivity, and has achieved a good F-score value. MR-GenSeg has excelled in terms of SpeedUp, when compared with these algorithms. The effect of CNVs on immunoglobulin (IG) genes has also been analysed in this study. Availability: The source codes are available at https://github.com/rituparna-sinha/MapReduce-GENSEG.
|
39
|
Identification of Copy Number Alterations from Next-Generation Sequencing Data. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2022; 1361:55-74. [DOI: 10.1007/978-3-030-91836-1_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
40
|
A Retrospective Statistical Validation Approach for Panel of Normal-Based Single-Nucleotide Variant Detection in Tumor Sequencing. J Mol Diagn 2022; 24:41-47. [PMID: 34974877 DOI: 10.1016/j.jmoldx.2021.09.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2021] [Revised: 08/28/2021] [Accepted: 09/28/2021] [Indexed: 11/22/2022] Open
Abstract
An important step of somatic variant calling algorithms for deep sequencing data is quantifying the errors. For targeted sequencing in which hotspot mutations are of interest, site-specific error estimation allows more accurate calling. The site-specific error rates are often estimated from a panel of normal samples, which has limited size and is subject to sampling bias and variance. We propose a novel statistical validation method for single-nucleotide variation (SNV) calling based on historical data. The validation method extracts the high-quality reads from the Binary Alignment/Map (BAM) files, finds the negative samples in the data, and builds a statistical model to call individual samples. It is particularly useful in detecting low-frequency variants that may be missed by traditional panel of normal-based SNV methods. The proposed method makes it possible to launch a simple and parallel validation pipeline for SNV calling and improve the detection limit.
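The idea of site-specific error estimation from negative samples can be sketched as follows. The abstract does not specify the statistical model, so this example is only an assumption: it pools non-reference read counts from negative samples at one site, applies add-one smoothing, and calls a variant when a one-sided binomial test rejects the error-only hypothesis. The function names, the smoothing, and the significance level `alpha` are all hypothetical.

```python
from math import exp, lgamma, log


def site_error_rate(negative_samples):
    """Pooled non-reference rate at one site from (alt_reads, depth) pairs."""
    alt = sum(a for a, _ in negative_samples)
    depth = sum(d for _, d in negative_samples)
    # Add-one smoothing (an assumption) keeps the rate strictly between 0 and 1.
    return (alt + 1) / (depth + 2)


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    def log_pmf(i):
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1 - p))
    return sum(exp(log_pmf(i)) for i in range(k, n + 1))


def call_variant(alt_reads, depth, error_rate, alpha=1e-3):
    """Call a variant if this many alt reads is unlikely under sequencing error alone."""
    return binomial_tail(alt_reads, depth, error_rate) < alpha


# Toy negative samples at one hotspot: (alt reads, total depth).
negatives = [(1, 1000), (0, 1200), (2, 900), (1, 1100)]
rate = site_error_rate(negatives)

strong = call_variant(12, 1000, rate)  # 1.2% allele fraction
weak = call_variant(2, 1000, rate)     # 0.2% allele fraction
print(f"error rate {rate:.5f}; 12/1000 called: {strong}; 2/1000 called: {weak}")
```

Because the error rate is estimated per site, a noisy position needs more supporting reads before a call is made than a clean one, which is the motivation the abstract gives for site-specific estimation.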
|
41
|
Jo H, Yagishita S, Hayashi Y, Ryu S, Suzuki M, Kohsaka S, Ueno T, Matsumoto Y, Horinouchi H, Ohe Y, Watanabe SI, Motoi N, Yatabe Y, Mano H, Takahashi K, Hamada A. Comparative study on the efficacy and exposure of molecular target agents in non-small cell lung cancer PDX models with driver genetic alterations. Mol Cancer Ther 2021; 21:359-370. [PMID: 34911818 DOI: 10.1158/1535-7163.mct-21-0371] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/27/2021] [Revised: 08/11/2021] [Accepted: 12/13/2021] [Indexed: 11/16/2022]
Abstract
Patient-derived xenografts (PDXs) can adequately reflect clinical drug efficacy. However, the methods for evaluating drug efficacy are not fully established. We selected five non-small cell lung cancer (NSCLC) PDXs with genetic alterations from established PDXs and the corresponding molecular targeted therapy was administered orally for 21 consecutive days. Genetic analysis, measurement of drug concentrations in blood and tumors using liquid chromatography and tandem mass spectrometry, and analysis of drug distribution in tumors using matrix-assisted laser desorption/ionization mass spectrometry were performed. Fifteen (20%) PDXs were established using samples collected from 76 NSCLC patients with genetic alterations. The genetic alterations observed in original patients were largely maintained in PDXs. We compared the drug efficacy in original patients and PDX models; the efficacies against certain PDXs correlated with the clinical effects, while those against the others did not. We determined blood and intratumor concentrations in the PDX model, but both concentrations were low, and no evident correlation with the drug efficacy could be observed. The intratumoral spatial distribution of the drugs was both homogeneous and heterogeneous for each drug, and the distribution was independent of the expression of the target protein. The evaluation of drug efficacy in PDXs enabled partial reproduction of the therapeutic effect in original patients. A more detailed analysis of systemic and intratumoral pharmacokinetics may help clarify the mode of action of drugs. Further development of evaluation methods and indices to improve the prediction accuracy of clinical efficacy is warranted.
Affiliation(s)
- Hitomi Jo
  - Division of Molecular Pharmacology, National Cancer Center Research Institute
- Shigehiro Yagishita
  - Division of Molecular Pharmacology, National Cancer Center Research Institute
- Yoshiharu Hayashi
  - Division of Molecular Pharmacology, National Cancer Center Research Institute
- Shoraku Ryu
  - Division of Molecular Pharmacology, National Cancer Center Research Institute
- Mikiko Suzuki
  - Division of Molecular Pharmacology, National Cancer Center Research Institute
- Shinji Kohsaka
  - Division of Cellular Signaling, National Cancer Center Research Institute
- Toshihide Ueno
  - Division of Cellular Signaling, National Cancer Center Research Institute
- Yuichiro Ohe
  - Department of Thoracic Oncology, National Cancer Center Hospital
- Noriko Motoi
  - Department of Pathology, National Cancer Center Hospital
- Yasushi Yatabe
  - Department of Diagnostic Pathology, National Cancer Center Hospital
- Akinobu Hamada
  - Division of Molecular Pharmacology, National Cancer Center Research Institute
|
42
|
Yuan Y, Bayer PE, Batley J, Edwards D. Current status of structural variation studies in plants. PLANT BIOTECHNOLOGY JOURNAL 2021; 19:2153-2163. [PMID: 34101329 PMCID: PMC8541774 DOI: 10.1111/pbi.13646] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 02/18/2021] [Revised: 05/31/2021] [Accepted: 06/03/2021] [Indexed: 05/23/2023]
Abstract
Structural variations (SVs) including gene presence/absence variations and copy number variations are a common feature of genomes in plants and, together with single nucleotide polymorphisms and epigenetic differences, are responsible for the heritable phenotypic diversity observed within and between species. Understanding the contribution of SVs to plant phenotypic variation is important for plant breeders to assist in producing improved varieties. The low resolution of early genetic technologies and inefficient methods have previously limited our understanding of SVs in plants. However, with the rapid expansion in genomic technologies, it is possible to assess SVs with an ever-greater resolution and accuracy. Here, we review the current status of SV studies in plants, examine the roles that SVs play in phenotypic traits, compare current technologies and assess future challenges for SV studies.
Affiliation(s)
- Yuxuan Yuan
  - School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
  - School of Life Sciences and State Key Laboratory for Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong SAR, China
- Philipp E. Bayer
  - School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
- Jacqueline Batley
  - School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
- David Edwards
  - School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
|
43
|
Mwapagha LM, Chibanga V, Shipanga H, Parker MI. New insights from Whole Genome Sequencing: BCLAF1 deletion as a structural variant that predisposes cells towards cellular transformation. Oncol Rep 2021; 46:229. [PMID: 34490482 DOI: 10.3892/or.2021.8180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2021] [Accepted: 07/16/2021] [Indexed: 11/06/2022] Open
Abstract
Cancer arises from a multi‑step cellular transformation process where some mutations may be inherited, while others are acquired during the process of malignant transformation. Aberrations in the BCL2 associated transcription factor 1 (BCLAF1) gene have previously been identified in patients with cancer and the aim of the present study was to identify structural variants (SVs) and the effects of BCLAF1 gene silencing on cell transformation. Whole‑genome sequencing was performed on DNA isolated from tumour biopsies with a histologically confirmed diagnosis of oesophageal squamous cell carcinoma (OSCC). Paired‑end sequencing was performed on the Illumina HiSeq2000, with 300 bp reads. Reads were aligned to the Homo sapiens reference genome (NCBI37) using ELAND and CASAVA software. SVs reported from the alignment were collated with gene loci, using the variant effect predictor of Ensembl. The affected genes were subsequently cross‑checked against the Genetic Association Database for disease and cancer associations. BCLAF1 deletion was identified as a noteworthy SV that could be associated with OSCC. Transient small interfering RNA‑mediated knockdown of BCLAF1 resulted in the altered expression of several downstream genes, including downregulation of the proapoptotic genes Caspase‑3 and BAX and the DNA damage repair genes exonuclease 1, ATR‑interacting protein and transcription regulator protein BACH1. BCLAF1 deficiency also attenuated P53 gene expression. Inhibition of BCLAF1 expression also resulted in increased colony formation. These results provide evidence that the abrogation of BCLAF1 expression results in the dysregulation of several cancer signalling pathways and abnormal cell proliferation.
Affiliation(s)
- Lamech M Mwapagha
  - Department of Integrative Biomedical Sciences, Division of Medical Biochemistry and Structural Biology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, Western Cape 7925, South Africa
- Vimbaishe Chibanga
  - Department of Integrative Biomedical Sciences, Division of Medical Biochemistry and Structural Biology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, Western Cape 7925, South Africa
- Hendrina Shipanga
  - Department of Integrative Biomedical Sciences, Division of Medical Biochemistry and Structural Biology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, Western Cape 7925, South Africa
- M Iqbal Parker
  - Department of Integrative Biomedical Sciences, Division of Medical Biochemistry and Structural Biology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, Western Cape 7925, South Africa
|
44
|
Huang T, Li J, Jia B, Sang H. CNV-MEANN: A Neural Network and Mind Evolutionary Algorithm-Based Detection of Copy Number Variations From Next-Generation Sequencing Data. Front Genet 2021; 12:700874. [PMID: 34484298 PMCID: PMC8415314 DOI: 10.3389/fgene.2021.700874] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/27/2021] [Accepted: 07/19/2021] [Indexed: 11/20/2022] Open
Abstract
Copy number variation (CNV) is defined as repetitions or deletions of genomic segments of 1 Kb to 5 Mb, and is a major trigger for human disease. The high-throughput and low-cost characteristics of next-generation sequencing (NGS) technology make it possible to detect CNVs across the whole genome, and also greatly improve the clinical practicability of NGS testing. However, current methods for the detection of CNVs are easily affected by sequencing and mapping errors, and by the uneven distribution of reads. In this paper, we propose an improved approach, CNV-MEANN, for the detection of CNVs, which changes the structure of the neural network used in the MFCNV method. This method has three differences relative to the MFCNV method: (1) it utilizes a new feature, mapping quality, to replace two features in MFCNV, (2) it considers the influence of the loss categories of CNV on disease prediction, and refines the output structure, and (3) it uses a mind evolutionary algorithm to optimize the backpropagation neural network model, and calculates individual scores for each genome bin to predict CNVs. Using both simulated and real datasets, we tested the performance of CNV-MEANN and compared its performance with those of seven widely used CNV detection methods. Experimental results demonstrated that the CNV-MEANN approach outperformed other methods with respect to sensitivity, precision, and F1-score. The proposed method was able to detect many CNVs that other approaches could not, and it reduced the boundary bias. CNV-MEANN is expected to be an effective method for the analysis of changes in CNVs in the genome.
Affiliation(s)
- Tihao Huang
  - School of Computer Science and Technology, Liaocheng University, Liaocheng, China
- Junqing Li
  - School of Computer Science and Technology, Liaocheng University, Liaocheng, China
- Baoxian Jia
  - School of Computer Science and Technology, Liaocheng University, Liaocheng, China
- Hongyan Sang
  - School of Computer Science and Technology, Liaocheng University, Liaocheng, China
|
45
|
Yuan X, Li J, Bai J, Xi J. A Local Outlier Factor-Based Detection of Copy Number Variations From NGS Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1811-1820. [PMID: 31880558 DOI: 10.1109/tcbb.2019.2961886] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 06/10/2023]
Abstract
Copy number variation (CNV) is a major type of genomic structural variation that plays an important role in human disorders. Next generation sequencing (NGS) has fueled the advancement in algorithm design to detect CNVs at base-pair resolution. However, accurate detection of CNVs of low amplitudes remains a challenging task. This paper proposes a new computational method, CNV-LOF, to identify CNVs of full-range amplitudes from NGS data. CNV-LOF is distinctly different from traditional methods, which mainly consider aberrations from a global perspective and rely on some assumed distribution of NGS read depths. In contrast, CNV-LOF takes a local view of the read depths and assigns an outlier factor to each genome segment. With the outlier factor profile, CNV-LOF uses a boxplot procedure to declare CNVs without relying on any distributional assumptions. Simulation experiments indicate that CNV-LOF outperforms five existing methods with respect to F1-measure, sensitivity, and precision. CNV-LOF is further validated on real sequencing samples, yielding highly consistent results with peer methods. CNV-LOF is able to detect CNVs of low and moderate amplitudes where the other existing methods fail, and it is expected to become a routine approach for the discovery of novel CNVs in whole-genome sequencing data.
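The two ingredients named in this abstract, a local outlier factor per segment and a boxplot rule to declare outliers, can be sketched on a one-dimensional read-depth profile. This is a generic textbook LOF written from scratch, not the authors' implementation; the neighbourhood size `k`, the small `eps` guard, the simplified tie handling, and the toy depths are assumptions for illustration.

```python
from statistics import mean, quantiles


def k_nearest(values, i, k):
    """Indices of the k values closest to values[i] (ties broken by index)."""
    others = [j for j in range(len(values)) if j != i]
    return sorted(others, key=lambda j: abs(values[j] - values[i]))[:k]


def local_outlier_factors(values, k=3, eps=1e-9):
    """Return one LOF score per value; scores well above 1 suggest outliers."""
    n = len(values)
    neighbours = [k_nearest(values, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_dist = [abs(values[nb[-1]] - values[i]) for i, nb in enumerate(neighbours)]

    def reach_dist(a, b):
        return max(k_dist[b], abs(values[a] - values[b]))

    # Local reachability density: inverse of the mean reachability distance.
    # eps avoids division by zero when several depths are identical.
    lrd = [1.0 / (mean(reach_dist(i, j) for j in neighbours[i]) + eps)
           for i in range(n)]
    # LOF: how much denser the neighbours are than the point itself.
    return [mean(lrd[j] for j in neighbours[i]) / lrd[i] for i in range(n)]


def boxplot_outliers(scores):
    """Indices whose score exceeds the upper boxplot fence Q3 + 1.5 * IQR."""
    q1, _, q3 = quantiles(scores, n=4)
    upper = q3 + 1.5 * (q3 - q1)
    return [i for i, s in enumerate(scores) if s > upper]


# Toy per-segment read depths with one gain (index 8) and one loss (index 12).
depths = [100.0, 101.5, 99.2, 100.8, 98.7, 102.1, 99.9, 100.3,
          151.0, 101.1, 98.9, 100.6, 49.5, 99.5, 101.9, 100.1]

scores = local_outlier_factors(depths)
flagged = boxplot_outliers(scores)
print(flagged)
```

The boxplot fence is computed from the scores themselves, so no distribution of read depths has to be assumed; the direction of each flagged segment (gain or loss) would still need to be read off from its depth relative to its neighbours.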
|
46
|
Singh AK, Olsen MF, Lavik LAS, Vold T, Drabløs F, Sjursen W. Detecting copy number variation in next generation sequencing data from diagnostic gene panels. BMC Med Genomics 2021; 14:214. [PMID: 34465341 PMCID: PMC8406611 DOI: 10.1186/s12920-021-01059-x] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 05/14/2021] [Accepted: 08/16/2021] [Indexed: 01/21/2023] Open
Abstract
BACKGROUND Detection of copy number variation (CNV) in genes associated with disease is important in genetic diagnostics, and next generation sequencing (NGS) technology provides data that can be used for CNV detection. However, CNV detection based on NGS data is in general not often used in diagnostic labs as the data analysis is challenging, especially with data from targeted gene panels. Wet lab methods like MLPA (MRC Holland) are widely used, but are expensive, time consuming and have gene-specific limitations. Our aim has been to develop a bioinformatic tool for CNV detection from NGS data in medical genetic diagnostic samples. RESULTS Our computational pipeline for detection of CNVs in NGS data from targeted gene panels utilizes coverage depth of the captured regions and calculates a copy number ratio score for each region. This is computed by comparing the mean coverage of the sample with the mean coverage of the same region in other samples, defined as a pool. The pipeline selects pools for comparison dynamically from previously sequenced samples, using the pool with an average coverage depth that is nearest to the one of the samples. A sliding window-based approach is used to analyze each region, where length of sliding window and sliding distance can be chosen dynamically to increase or decrease the resolution. This helps in detecting CNVs in small or partial exons. With this pipeline we have correctly identified the CNVs in 36 positive control samples, with sensitivity of 100% and specificity of 91%. We have detected whole gene level deletion/duplication, single/multi exonic level deletion/duplication, partial exonic deletion and mosaic deletion. Since its implementation in mid-2018 it has proven its diagnostic value with more than 45 CNV findings in routine tests. CONCLUSIONS With this pipeline as part of our diagnostic practices it is now possible to detect partial, single or multi-exonic, and intragenic CNVs in all genes in our target panel. 
This has helped our diagnostic lab to expand the portfolio of genes where we offer CNV detection, which previously was limited by the availability of MLPA kits.
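The steps outlined in this abstract (choosing the pool whose average coverage is nearest to the sample, then computing a copy number ratio in sliding windows over a captured region) can be sketched as follows. This is a simplified illustration under stated assumptions, not the published pipeline: the function names, the window length and step, and the toy depths are all hypothetical.

```python
from statistics import mean


def nearest_pool(sample_depth, pools):
    """Pick the pool whose overall mean depth is closest to the sample's."""
    target = mean(sample_depth)
    return min(pools, key=lambda pool: abs(mean(mean(s) for s in pool) - target))


def window_ratios(sample_depth, pool, window=4, step=4):
    """Return (start, ratio) for each sliding window over one captured region."""
    results = []
    for start in range(0, len(sample_depth) - window + 1, step):
        stop = start + window
        sample_mean = mean(sample_depth[start:stop])
        # Mean coverage of the same window across all samples in the pool.
        pool_mean = mean(mean(s[start:stop]) for s in pool)
        results.append((start, sample_mean / pool_mean))
    return results


# Toy per-base depths for one region; positions 4 to 7 look like a
# heterozygous deletion (about half the expected coverage).
sample = [100, 102, 98, 100, 50, 51, 49, 50, 100, 99, 101, 100]
pool_low = [[50] * 12, [52] * 12]
pool_high = [[100] * 12, [104] * 12, [96] * 12]

pool = nearest_pool(sample, [pool_low, pool_high])
ratios = window_ratios(sample, pool)
print(ratios)
```

Shortening `window` and `step` raises the resolution, which is how a window-based scheme can pick up partial-exon events, at the cost of noisier ratios per window.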
Affiliation(s)
- Ashish Kumar Singh
  - Department of Medical Genetics, St. Olavs Hospital, Trondheim, Norway
  - Department of Clinical and Molecular Medicine, Faculty of Medicine and Health Sciences, NTNU - Norwegian University of Science and Technology, Trondheim, Norway
- Trine Vold
  - Department of Medical Genetics, St. Olavs Hospital, Trondheim, Norway
- Finn Drabløs
  - Department of Clinical and Molecular Medicine, Faculty of Medicine and Health Sciences, NTNU - Norwegian University of Science and Technology, Trondheim, Norway
- Wenche Sjursen
  - Department of Medical Genetics, St. Olavs Hospital, Trondheim, Norway
  - Department of Clinical and Molecular Medicine, Faculty of Medicine and Health Sciences, NTNU - Norwegian University of Science and Technology, Trondheim, Norway
|
47
|
Wold J, Koepfli KP, Galla SJ, Eccles D, Hogg CJ, Le Lec MF, Guhlin J, Santure AW, Steeves TE. Expanding the conservation genomics toolbox: Incorporating structural variants to enhance genomic studies for species of conservation concern. Mol Ecol 2021; 30:5949-5965. [PMID: 34424587 PMCID: PMC9290615 DOI: 10.1111/mec.16141] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/20/2021] [Revised: 07/28/2021] [Accepted: 08/18/2021] [Indexed: 12/28/2022]
Abstract
Structural variants (SVs) are large rearrangements (>50 bp) within the genome that impact gene function and the content and structure of chromosomes. As a result, SVs are a significant source of functional genomic variation, that is, variation at genomic regions underpinning phenotype differences, that can have large effects on individual and population fitness. While there are increasing opportunities to investigate functional genomic variation in threatened species via single nucleotide polymorphism (SNP) data sets, SVs remain understudied despite their potential influence on fitness traits of conservation interest. In this future-focused Opinion, we contend that characterizing SVs offers the conservation genomics community an exciting opportunity to complement SNP-based approaches to enhance species recovery. We also leverage the existing literature, predominantly in human health, agriculture and ecoevolutionary biology, to identify approaches for readily characterizing SVs and consider how integrating these into the conservation genomics toolbox may transform the way we manage some of the world's most threatened species.
Affiliation(s)
- Jana Wold
  - School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Klaus-Peter Koepfli
  - Smithsonian-Mason School of Conservation, Front Royal, Virginia, USA
  - Centre for Species Survival, Smithsonian Conservation Biology Institute, National Zoological Park, Washington, District of Columbia, USA
  - Computer Technologies Laboratory, ITMO University, Saint Petersburg, Russia
- Stephanie J Galla
  - School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
  - Department of Biological Sciences, Boise State University, Boise, Idaho, USA
- David Eccles
  - Malaghan Institute of Medical Research, Wellington, New Zealand
- Carolyn J Hogg
  - School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW, Australia
- Marissa F Le Lec
  - Department of Biochemistry, University of Otago, Dunedin, Otago, New Zealand
- Joseph Guhlin
  - Department of Biochemistry, University of Otago, Dunedin, Otago, New Zealand
  - Genomics Aotearoa, Dunedin, Otago, New Zealand
- Anna W Santure
  - School of Biological Sciences, The University of Auckland, Auckland, New Zealand
- Tammy E Steeves
  - School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
|
48
|
SILO: A Computational Method for Detecting Copy Number Gain in Clinical Specimens Analyzed on a Next-Generation Sequencing Platform. J Mol Diagn 2021; 23:1241-1248. [PMID: 34365010 DOI: 10.1016/j.jmoldx.2021.07.016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/06/2020] [Revised: 05/07/2021] [Accepted: 07/07/2021] [Indexed: 12/28/2022] Open
Abstract
Next-generation sequencing (NGS) has proved to be a beneficial approach for genotyping solid tumor specimens and for identifying clinically actionable mutations. However, copy number variations (CNVs), which can be equally important, are often challenging to detect from NGS data. Current bioinformatics methods for CNV detection from NGS often require comparison of tumor/normal pairs and/or the sequencing of whole genome or whole exome. These approaches are currently impractical for routine clinical practice. However, clinical practice does involve repeated use of the same gene panel on a large number of specimens over a long period of time. We take advantage of this repetitiveness and present SILO: a procedure for CNV detection based on NGS on a gene panel. The SILO algorithm analyzes coverage depth of the aligned reads from a sample and predicts CNV by comparing this depth to the average depth seen in a large training set of other samples. Such comparison is robust and can reliably detect copy number gain, although it is found to be unreliable in detecting copy number losses. Successful validation of SILO on NGS data from the Ion Torrent platform with two panels is presented: a small hotspot panel and a larger cancer gene panel.
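The general idea of comparing a sample's coverage depth with the average depth in a training set can be sketched as follows. This is not the SILO algorithm itself, whose details the abstract does not give: the per-sample normalization by total depth, the z-score statistic, the cutoff of 3.0, and the names `normalize` and `call_gains` are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def normalize(depths):
    """Scale per-amplicon depths by the sample's total depth (assumed normalization)."""
    total = sum(depths.values())
    return {amplicon: depth / total for amplicon, depth in depths.items()}

def call_gains(sample, training, z_cutoff=3.0):
    """Flag amplicons whose normalized depth is far above the training-set mean."""
    norm_sample = normalize(sample)
    norm_training = [normalize(t) for t in training]
    gains = []
    for amplicon, value in norm_sample.items():
        reference = [t[amplicon] for t in norm_training]
        mu, sd = mean(reference), stdev(reference)
        # Only the upper tail is tested, mirroring the focus on copy number gain.
        if sd > 0 and (value - mu) / sd > z_cutoff:
            gains.append(amplicon)
    return gains

# Toy training set (assumed values) of four previously sequenced samples.
training = [
    {"A": 100, "B": 100, "C": 100},
    {"A": 110, "B": 100, "C": 90},
    {"A": 90, "B": 100, "C": 110},
    {"A": 100, "B": 105, "C": 95},
]
sample = {"A": 300, "B": 100, "C": 100}   # amplicon A at three times the usual depth
print(call_gains(sample, training))
```

A one-sided test is used here because the abstract reports that the comparison is reliable for gains but not for losses; a production method would also need a much larger training set than this toy example.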
|
49
|
Jugas R, Sedlar K, Vitek M, Nykrynova M, Barton V, Bezdicek M, Lengerova M, Skutkova H. CNproScan: Hybrid CNV detection for bacterial genomes. Genomics 2021; 113:3103-3111. [PMID: 34224809 DOI: 10.1016/j.ygeno.2021.06.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2020] [Revised: 06/13/2021] [Accepted: 06/30/2021] [Indexed: 10/20/2022]
Abstract
Copy number variation (CNV) in bacteria has received less attention than CNV detection in eukaryotes. However, challenges arising from bacterial drug resistance bring further interest to the topic of CNV and its role in drug resistance. General CNV detection methods do not consider bacteria's features and there is room to improve detection accuracy. Here, we present a CNV detection method called CNproScan focused on bacterial genomes. CNproScan implements a hybrid approach and other bacteria-focused features and depends only on NGS data. We benchmarked our method against previously published methods and achieved a higher detection rate, together with other beneficial features such as CNV classification. Compared with other methods, CNproScan can detect much shorter CNV events.
Affiliation(s)
- Robin Jugas
  - Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
- Karel Sedlar
  - Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
- Martin Vitek
  - Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
- Marketa Nykrynova
  - Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
- Vojtech Barton
  - Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
- Matej Bezdicek
  - Department of Internal Medicine-Hematology and Oncology, University Hospital Brno, Brno, Czech Republic
- Martina Lengerova
  - Department of Internal Medicine-Hematology and Oncology, University Hospital Brno, Brno, Czech Republic
- Helena Skutkova
  - Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
|
50
|
Liu G, Zhang J. A Cluster-Based Approach for the Discovery of Copy Number Variations From Next-Generation Sequencing Data. Front Genet 2021; 12:699510. [PMID: 34262604 PMCID: PMC8273656 DOI: 10.3389/fgene.2021.699510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2021] [Accepted: 06/07/2021] [Indexed: 11/13/2022] Open
Abstract
The next-generation sequencing technology offers a wealth of data resources for the detection of copy number variations (CNVs) at a high resolution. However, it is still challenging to correctly detect CNVs of different lengths. It is necessary to develop new CNV detection tools to meet this demand. In this work, we propose a new CNV detection method, called CBCNV, for the detection of CNVs of different lengths from whole genome sequencing data. CBCNV uses a clustering algorithm to divide the read depth segment profile, and assigns an abnormal score to each read depth segment. Based on the abnormal score profile, Tukey's fences method is adopted in CBCNV to forecast CNVs. The performance of the proposed method is evaluated on simulated data sets, and is compared with those of several existing methods. The experimental results prove that the performance of CBCNV is better than those of several existing methods. The proposed method is further tested and verified on real data sets, and the experimental results are found to be consistent with the simulation results. Therefore, the proposed method can be expected to become a routine tool in the analysis of CNVs from tumor-normal matched samples.
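Tukey's fences, the outlier rule named in this abstract, can be sketched in a few lines of Python. The sketch covers only that final thresholding step, not CBCNV's clustering or its abnormal score; the function names, the two-sided test, the conventional multiplier k = 1.5, and the toy scores are assumptions for illustration.

```python
from statistics import quantiles

def tukey_fences(scores, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(scores, n=4)   # quartiles, default 'exclusive' method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_segments(scores, k=1.5):
    """Indices of segments whose score falls outside the fences (candidate CNVs)."""
    low, high = tukey_fences(scores, k)
    return [i for i, s in enumerate(scores) if s < low or s > high]

# Toy per-segment scores (assumed): one segment stands out from the rest.
scores = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 3.0, 1.0]
print(tukey_fences(scores))
print(flag_segments(scores))
```

Because the fences are derived from quartiles rather than the mean and standard deviation, a few extreme segments do not shift the threshold much, which is a common reason for choosing this rule for outlier detection.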
Affiliation(s)
- Junying Zhang
  - School of Computer Science and Technology, Xidian University, Xi’an, China
|