1
|
Louw N, Carstens N, Lombard Z. Incorporating CNV analysis improves the yield of exome sequencing for rare monogenic disorders-an important consideration for resource-constrained settings. Front Genet 2023; 14:1277784. [PMID: 38155715 PMCID: PMC10753787 DOI: 10.3389/fgene.2023.1277784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 11/22/2023] [Indexed: 12/30/2023]
Abstract
Exome sequencing (ES) is a recommended first-tier diagnostic test for many rare monogenic diseases. It allows for the detection of both single-nucleotide variants (SNVs) and copy number variants (CNVs) in coding exonic regions of the genome in a single test, and this dual analysis is a valuable approach, especially in limited resource settings. Single-nucleotide variants are well studied; however, the incorporation of copy number variant analysis tools into variant calling pipelines has not yet been implemented as a routine diagnostic test, and chromosomal microarray is still more widely used to detect copy number variants. Research shows that combined single-nucleotide and copy number variant analysis can lead to a diagnostic yield of up to 58%, increasing the yield by as much as 18% over the single-nucleotide-variant-only pipeline. Importantly, this is achieved at the expense of computational costs only, without incurring any additional sequencing costs. This mini review provides an overview of copy number variant analysis from exome data and of the current recommendations for this type of analysis. We also present an overview of standard practices in rare monogenic disease research in resource-limited settings. We present evidence that integrating copy number variant detection tools into a standard exome sequencing analysis pipeline improves diagnostic yield and should be considered a significantly beneficial addition, with relatively low cost implications. Routine implementation in underrepresented populations and limited resource settings will promote generation and sharing of CNV datasets and provide momentum to build core centers for this niche within genomic medicine.
Affiliation(s)
- Nadja Louw
- Division of Human Genetics, National Health Laboratory Service and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
| | - Nadia Carstens
- Division of Human Genetics, National Health Laboratory Service and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Genomics Platform, South African Medical Research Council, Cape Town, South Africa
| | - Zané Lombard
- Division of Human Genetics, National Health Laboratory Service and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
|
2
|
Kwon HJ, Park UH, Goh CJ, Park D, Lim YG, Lee IK, Do WJ, Lee KJ, Kim H, Yun SY, Joo J, Min NY, Lee S, Um SW, Lee MS. Enhancing Lung Cancer Classification through Integration of Liquid Biopsy Multi-Omics Data with Machine Learning Techniques. Cancers (Basel) 2023; 15:4556. [PMID: 37760525 PMCID: PMC10526503 DOI: 10.3390/cancers15184556] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/10/2023] [Revised: 08/30/2023] [Accepted: 09/07/2023] [Indexed: 09/29/2023]
Abstract
Early detection of lung cancer is crucial for patient survival and treatment. Recent advancements in next-generation sequencing (NGS) analysis enable cell-free DNA (cfDNA) liquid biopsy to detect changes, like chromosomal rearrangements, somatic mutations, and copy number variations (CNVs), in cancer. Machine learning (ML) analysis using cancer markers is a highly promising tool for identifying patterns and anomalies in cancers, making the development of ML-based analysis methods essential. We collected blood samples from 92 lung cancer patients and 80 healthy individuals to analyze the distinction between them. The detection of lung cancer markers Cyfra21 and carcinoembryonic antigen (CEA) in blood revealed significant differences between patients and controls. We performed machine learning analysis to obtain AUC values via Adaptive Boosting (AdaBoost), Multi-Layer Perceptron (MLP), and Logistic Regression (LR) using cancer markers, cfDNA concentrations, and CNV screening. Furthermore, combining the analysis of all multi-omics data for ML showed higher AUC values compared with analyzing each element separately, suggesting the potential for a highly accurate diagnosis of cancer. Overall, our results from ML analysis using multi-omics data obtained from blood demonstrate a remarkable ability of the model to distinguish between lung cancer and healthy individuals, highlighting the potential for a diagnostic model against lung cancer.
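Note: the abstract above compares classifiers by AUC. As a minimal sketch of what that metric measures, the following computes the area under the ROC curve as the fraction of (patient, control) pairs in which the patient receives the higher score. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they are not the study's data, and this is not the study's AdaBoost, MLP, or LR pipeline.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise ranking (Mann-Whitney form).

    labels: 1 for a case (e.g. lung cancer), 0 for a control.
    scores: model output, higher meaning "more likely a case".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case correctly ranked above control
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores for two cases and two controls.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(toy_labels, toy_scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why a higher AUC for the combined multi-omics input is read as better discrimination.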
Affiliation(s)
- Hyuk-Jung Kwon
- Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea; (H.-J.K.); (U.-H.P.); (C.J.G.); (D.P.); (Y.G.L.); (I.K.L.); (W.-J.D.); (K.J.L.); (H.K.); (N.Y.M.)
- Department of Computer Science and Engineering, Incheon National University (INU), Incheon 22012, Republic of Korea
| | - Ui-Hyun Park
- Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea; (H.-J.K.); (U.-H.P.); (C.J.G.); (D.P.); (Y.G.L.); (I.K.L.); (W.-J.D.); (K.J.L.); (H.K.); (N.Y.M.)
| | - Chul Jun Goh
- Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea; (H.-J.K.); (U.-H.P.); (C.J.G.); (D.P.); (Y.G.L.); (I.K.L.); (W.-J.D.); (K.J.L.); (H.K.); (N.Y.M.)
| | - Dabin Park
- Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea; (H.-J.K.); (U.-H.P.); (C.J.G.); (D.P.); (Y.G.L.); (I.K.L.); (W.-J.D.); (K.J.L.); (H.K.); (N.Y.M.)
| | - Yu Gyeong Lim
- Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea; (H.-J.K.); (U.-H.P.); (C.J.G.); (D.P.); (Y.G.L.); (I.K.L.); (W.-J.D.); (K.J.L.); (H.K.); (N.Y.M.)
| | - Isaac Kise Lee
- Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea; (H.-J.K.); (U.-H.P.); (C.J.G.); (D.P.); (Y.G.L.); (I.K.L.); (W.-J.D.); (K.J.L.); (H.K.); (N.Y.M.)
- Department of Computer Science and Engineering, Incheon National University (INU), Incheon 22012, Republic of Korea
- NGENI Foundation, San Diego, CA 92123, USA
| | - Woo-Jung Do
- Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea; (H.-J.K.); (U.-H.P.); (C.J.G.); (D.P.); (Y.G.L.); (I.K.L.); (W.-J.D.); (K.J.L.); (H.K.); (N.Y.M.)
| | - Kyoung Joo Lee
- Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea; (H.-J.K.); (U.-H.P.); (C.J.G.); (D.P.); (Y.G.L.); (I.K.L.); (W.-J.D.); (K.J.L.); (H.K.); (N.Y.M.)
| | - Hyojung Kim
- Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea; (H.-J.K.); (U.-H.P.); (C.J.G.); (D.P.); (Y.G.L.); (I.K.L.); (W.-J.D.); (K.J.L.); (H.K.); (N.Y.M.)
| | - Seon-Young Yun
- Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea; (H.-J.K.); (U.-H.P.); (C.J.G.); (D.P.); (Y.G.L.); (I.K.L.); (W.-J.D.); (K.J.L.); (H.K.); (N.Y.M.)
| | - Joungsu Joo
- Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea; (H.-J.K.); (U.-H.P.); (C.J.G.); (D.P.); (Y.G.L.); (I.K.L.); (W.-J.D.); (K.J.L.); (H.K.); (N.Y.M.)
| | - Na Young Min
- Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea; (H.-J.K.); (U.-H.P.); (C.J.G.); (D.P.); (Y.G.L.); (I.K.L.); (W.-J.D.); (K.J.L.); (H.K.); (N.Y.M.)
| | - Sunghoon Lee
- Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea; (H.-J.K.); (U.-H.P.); (C.J.G.); (D.P.); (Y.G.L.); (I.K.L.); (W.-J.D.); (K.J.L.); (H.K.); (N.Y.M.)
| | - Sang-Won Um
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro, Gangnam-gu, Seoul 06351, Republic of Korea;
| | - Min-Seob Lee
- Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea; (H.-J.K.); (U.-H.P.); (C.J.G.); (D.P.); (Y.G.L.); (I.K.L.); (W.-J.D.); (K.J.L.); (H.K.); (N.Y.M.)
- Diagnomics, Inc., 5795 Kearny Villa Rd., San Diego, CA 92123, USA
|
3
|
Lin Q, Tam PKH, Tang CSM. Artificial intelligence-based approaches for the detection and prioritization of genomic mutations in congenital surgical diseases. Front Pediatr 2023; 11:1203289. [PMID: 37593442 PMCID: PMC10429173 DOI: 10.3389/fped.2023.1203289] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/10/2023] [Accepted: 07/17/2023] [Indexed: 08/19/2023]
Abstract
Genetic mutations are critical factors leading to congenital surgical diseases and can be identified through genomic analysis. Early and accurate identification of genetic mutations underlying these conditions is vital for clinical diagnosis and effective treatment. In recent years, artificial intelligence (AI) has been widely applied for analyzing genomic data in various clinical settings, including congenital surgical diseases. This review paper summarizes current state-of-the-art AI-based approaches used in genomic analysis and highlights some successful applications that deepen our understanding of the etiology of several congenital surgical diseases. We focus on the AI methods designed for the detection of different variant types and the prioritization of deleterious variants located in different genomic regions, aiming to uncover susceptibility mutations that contribute to congenital surgical disorders.
Affiliation(s)
- Qiongfen Lin
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
| | - Paul Kwong-Hang Tam
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Faculty of Medicine, Macau University of Science and Technology, Macau, Macau SAR, China
| | - Clara Sze-Man Tang
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Dr Li Dak-Sum Research Centre, The University of Hong Kong - Karolinska Institutet Collaboration in Regenerative Medicine, Hong Kong, Hong Kong SAR, China
|
4
|
Farrell M, Dietterich TE, Harner MK, Bruno LM, Filmyer DM, Shaughnessy RA, Lichtenstein ML, Britt AM, Biondi TF, Crowley JJ, Lázaro-Muñoz G, Forsingdal AE, Nielsen J, Didriksen M, Berg JS, Wen J, Szatkiewicz J, Mary Xavier R, Sullivan PF, Josiassen RC. Increased Prevalence of Rare Copy Number Variants in Treatment-Resistant Psychosis. Schizophr Bull 2023; 49:881-892. [PMID: 36454006 PMCID: PMC10318882 DOI: 10.1093/schbul/sbac175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022]
Abstract
BACKGROUND It remains unknown why ~30% of patients with psychotic disorders fail to respond to treatment. Previous genomic investigations of treatment-resistant psychosis have been inconclusive, but some evidence suggests a possible link between rare disease-associated copy number variants (CNVs) and worse clinical outcomes in schizophrenia. Here, we identified schizophrenia-associated CNVs in patients with treatment-resistant psychotic symptoms and then compared the prevalence of these CNVs to previously published schizophrenia cases not selected for treatment resistance. METHODS CNVs were identified using chromosomal microarray (CMA) and whole exome sequencing (WES) in 509 patients with treatment-resistant psychosis (a lack of clinical response to ≥3 adequate antipsychotic medication trials over at least 5 years of psychiatric hospitalization). Prevalence of schizophrenia-associated CNVs in this sample was compared to that in a previously published large schizophrenia cohort study. RESULTS Integrating CMA and WES data, we identified 47 cases (9.2%) with at least one CNV of known or possible neuropsychiatric risk. 4.7% (n = 24) carried a known neurodevelopmental risk CNV. The prevalence of well-replicated schizophrenia-associated CNVs was 4.1%, with duplications of the 16p11.2 and 15q11.2-q13.1 regions, and deletions of the 22q11.2 chromosomal region as the most frequent CNVs. Pairwise loci-based analysis identified duplications of 15q11.2-q13.1 to be independently associated with treatment resistance. CONCLUSIONS These findings suggest that CNVs may uniquely impact clinical phenotypes beyond increasing risk for schizophrenia and may potentially serve as biological entry points for studying treatment resistance. Further investigation will be necessary to elucidate the spectrum of phenotypic characteristics observed in adult psychiatric patients with disease-associated CNVs.
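Note: the prevalence figures in the abstract above follow directly from the reported counts and the cohort size of 509. A minimal arithmetic check, where the helper name `prevalence_percent` is an illustrative assumption:

```python
def prevalence_percent(carriers: int, cohort_size: int) -> float:
    """Return carriers as a percentage of the cohort, rounded to one decimal place."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return round(100.0 * carriers / cohort_size, 1)


COHORT_SIZE = 509  # patients with treatment-resistant psychosis, from the abstract

print(prevalence_percent(47, COHORT_SIZE))  # any CNV of known or possible risk -> 9.2
print(prevalence_percent(24, COHORT_SIZE))  # known neurodevelopmental risk CNV -> 4.7
```

The abstract gives the 4.1% prevalence of well-replicated schizophrenia-associated CNVs without a count, so no carrier number is assumed for it here.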
Affiliation(s)
- Martilias Farrell
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | | | | | - Lisa M Bruno
- Translational Neuroscience, LLC, Conshohocken, PA, USA
| | | | | | | | - Allison M Britt
- School of Nursing, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Tamara F Biondi
- Office of the Vice Chancellor for Research, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - James J Crowley
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Gabriel Lázaro-Muñoz
- Center for Bioethics, Harvard Medical School, Boston, MA, USA
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
| | | | - Jacob Nielsen
- Division of Neuroscience, H. Lundbeck A/S, Valby, Denmark
| | | | - Jonathan S Berg
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Jia Wen
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Jin Szatkiewicz
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Rose Mary Xavier
- School of Nursing, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Patrick F Sullivan
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
|
5
|
Ye B, Tang X, Liao S, Ding K. A comparison of algorithms for identifying copy number variants in family-based whole-exome sequencing data and its implications in inheritance pattern analysis. Gene 2023; 861:147237. [PMID: 36731620 DOI: 10.1016/j.gene.2023.147237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 12/27/2022] [Accepted: 01/26/2023] [Indexed: 01/31/2023]
Abstract
There remain challenges in accurately identifying constitutional or germline copy number variants (gCNVs) from whole-exome sequencing data, with implications for the genetic diagnosis of 'rare undiagnosed disease' in the clinical setting. Although multiple algorithms have been proposed, a systematic comparison of these algorithms for calling gCNVs and analyzing inheritance patterns has yet to be fully conducted. Therefore, we empirically compared seven exome-based algorithms, including XHMM, CLAMMS, CODEX2, ExomeDepth, DECoN, CN.MOPS, and GATK gCNV, for calling gCNVs in 151 individuals from 44 pedigrees, using genotyping-derived gCNVs in the same cohort as the gold standard for performance assessment. These algorithms demonstrated varied power in identifying gCNVs, although the distribution of gCNV size was similar. The number of gCNVs shared across these algorithms was limited (e.g., only four gCNVs shared among all seven algorithms); however, several algorithms showed varying degrees of consistency (e.g., 1,843 gCNVs shared between DECoN and ExomeDepth). CLAMMS and CODEX2 outperformed the remaining algorithms according to a relatively higher F-score (i.e., 0.145 and 0.152, respectively). In addition, these algorithms exhibited different rates of Mendelian inconsistency in gCNVs, and significant challenges remained in inheritance pattern analysis. In conclusion, selecting good algorithms may have important implications for gCNV-based inheritance pattern analysis in family-based studies.
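Note: the abstract above ranks callers by F-score against a gold-standard call set. The sketch below shows one common way such a score can be computed; the 50% reciprocal-overlap matching rule, the tuple layout, and the toy intervals are assumptions for illustration and are not taken from the paper.

```python
from typing import List, Tuple

# (chromosome, start, end, type), where type is "DEL" or "DUP"
Cnv = Tuple[str, int, int, str]


def reciprocal_overlap(a: Cnv, b: Cnv) -> float:
    """Smaller of the two overlap fractions; 0.0 if chromosome or type differ."""
    if a[0] != b[0] or a[3] != b[3]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def f_score(calls: List[Cnv], truth: List[Cnv],
            min_overlap: float = 0.5) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for calls against a gold-standard set."""
    matched_calls = sum(
        1 for c in calls
        if any(reciprocal_overlap(c, t) >= min_overlap for t in truth))
    matched_truth = sum(
        1 for t in truth
        if any(reciprocal_overlap(c, t) >= min_overlap for c in calls))
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical caller output and gold standard.
toy_calls = [("chr1", 100, 200, "DEL"), ("chr1", 500, 600, "DUP"),
             ("chr2", 100, 300, "DEL")]
toy_truth = [("chr1", 110, 210, "DEL"), ("chr2", 1000, 2000, "DEL")]
print(f_score(toy_calls, toy_truth))
```

Because the F-score is the harmonic mean of precision and recall, values as low as the reported 0.145 and 0.152 indicate that at least one of the two is poor even for the best-performing callers.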
Affiliation(s)
- Bo Ye
- Department of Bioinformatics, School of Basic Medicine, Chongqing Medical University, Chongqing 400016, PR China
| | - Xia Tang
- State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai 200438, PR China
| | - Shixiu Liao
- Medical Genetic Institute of Henan Province, Henan Provincial People's Hospital, Henan Key Laboratory of Genetic Diseases and Functional Genomics, Henan Provincial People's Hospital of Henan University, People's Hospital of Zhengzhou University, Zhengzhou, Henan Province 450003, PR China.
| | - Keyue Ding
- Medical Genetic Institute of Henan Province, Henan Provincial People's Hospital, Henan Key Laboratory of Genetic Diseases and Functional Genomics, Henan Provincial People's Hospital of Henan University, People's Hospital of Zhengzhou University, Zhengzhou, Henan Province 450003, PR China; Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN 55905, United States.
|
6
|
Tan R, Shen Y. Accurate in silico confirmation of rare copy number variant calls from exome sequencing data using transfer learning. Nucleic Acids Res 2022; 50:e123. [PMID: 36124672 PMCID: PMC9756945 DOI: 10.1093/nar/gkac788] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/10/2022] [Revised: 08/08/2022] [Accepted: 09/01/2022] [Indexed: 12/24/2022]
Abstract
Exome sequencing is widely used in genetic studies of human diseases and clinical genetic diagnosis. Accurate detection of copy number variants (CNVs) is important to fully utilize exome sequencing data. However, exome data are noisy. None of the existing methods alone can achieve both high precision and high recall. A common practice is to perform heuristic filtration followed by manual inspection of the read depth of putative CNVs. This approach does not scale to large studies. To address this issue, we developed a transfer learning method, CNV-espresso, for in silico confirmation of rare CNVs from exome sequencing data. CNV-espresso encodes candidate CNVs from exome data as images and uses pretrained convolutional neural network models to classify copy number states. We trained CNV-espresso using an offspring-parents trio exome sequencing dataset, with inherited CNVs as positives and CNVs with Mendelian errors as negatives. We evaluated the performance using additional samples that have both exome and whole-genome sequencing (WGS) data. Taking the CNVs detected from WGS data as a proxy for ground truth, CNV-espresso significantly improves precision while keeping recall almost intact, especially for CNVs that span a small number of exons. CNV-espresso can effectively replace manual inspection of CNVs in large-scale exome sequencing studies.
Affiliation(s)
- Renjie Tan
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
| | - Yufeng Shen
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- JP Sulzberger Columbia Genome Center, Columbia University, New York, NY 10032, USA
|
7
|
Quazi S. Artificial intelligence and machine learning in precision and genomic medicine. Med Oncol 2022; 39:120. [PMID: 35704152 PMCID: PMC9198206 DOI: 10.1007/s12032-022-01711-1] [Citation(s) in RCA: 45] [Impact Index Per Article: 22.5] [Received: 03/06/2022] [Accepted: 03/14/2022] [Indexed: 10/28/2022]
Abstract
The advancement of precision medicine in medical care has left behind the conventional symptom-driven treatment process by allowing early risk prediction of disease through improved diagnostics and customization of more effective treatments. It is necessary to scrutinize overall patient data alongside broad factors to observe and differentiate between ill and relatively healthy people to take the most appropriate path toward precision medicine, resulting in an improved view of biological indicators that can signal health changes. Precision and genomic medicine combined with artificial intelligence have the potential to improve patient healthcare. Patients with less common therapeutic responses or unique healthcare demands are using genomic medicine technologies. AI provides insights through advanced computation and inference, enabling the system to reason and learn while enhancing physician decision making. Many cell characteristics, including gene up-regulation, proteins binding to nucleic acids, and splicing, can be measured at high throughput and used as training objectives for predictive models. Researchers can create a new era of effective genomic medicine with the improved availability of a broad range of datasets and modern computer techniques such as machine learning. This review article elucidates the contributions of ML algorithms in precision and genomic medicine.
Affiliation(s)
- Sameer Quazi
- GenLab Biosolutions Private Limited, Bangalore, Karnataka, 560043, India.
- Department of Biomedical Sciences, School of Life Sciences, Anglia Ruskin University, Cambridge, UK.
|
9
|
Özden F, Alkan C, Çiçek AE. Polishing copy number variant calls on exome sequencing data via deep learning. Genome Res 2022; 32:1170-1182. [PMID: 35697522 PMCID: PMC9248885 DOI: 10.1101/gr.274845.120] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/02/2020] [Accepted: 05/13/2022] [Indexed: 11/24/2022]
Abstract
Accurate and efficient detection of copy number variants (CNVs) is of critical importance owing to their significant association with complex genetic diseases. Although algorithms that use whole-genome sequencing (WGS) data provide stable results with mostly valid statistical assumptions, copy number detection on whole-exome sequencing (WES) data shows comparatively lower accuracy. This is unfortunate as WES data are cost-efficient, compact, and relatively ubiquitous. The bottleneck is primarily due to the noncontiguous nature of the targeted capture: biases in targeted genomic hybridization, GC content, targeting probes, and sample batching during sequencing. Here, we present a novel deep learning model, DECoNT, which uses the matched WES and WGS data, and learns to correct the copy number variations reported by any off-the-shelf WES-based germline CNV caller. We train DECoNT on the 1000 Genomes Project data, and we show that we can efficiently triple the duplication call precision and double the deletion call precision of the state-of-the-art algorithms. We also show that our model consistently improves the performance independent of (1) sequencing technology, (2) exome capture kit, and (3) CNV caller. Using DECoNT as a universal exome CNV call polisher has the potential to improve the reliability of germline CNV detection on WES data sets.
Affiliation(s)
- Furkan Özden
- Department of Computer Engineering, Bilkent University, 06800 Ankara, Turkey
| | - Can Alkan
- Department of Computer Engineering, Bilkent University, 06800 Ankara, Turkey
| | - A Ercüment Çiçek
- Department of Computer Engineering, Bilkent University, 06800 Ankara, Turkey
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
|
10
|
Liu Z, Roberts R, Mercer TR, Xu J, Sedlazeck FJ, Tong W. Towards accurate and reliable resolution of structural variants for clinical diagnosis. Genome Biol 2022; 23:68. [PMID: 35241127 PMCID: PMC8892125 DOI: 10.1186/s13059-022-02636-8] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 08/27/2021] [Accepted: 02/15/2022] [Indexed: 12/17/2022]
Abstract
Structural variants (SVs) are a major source of human genetic diversity and have been associated with different diseases and phenotypes. The detection of SVs is difficult, and a diverse range of detection methods and data analysis protocols has been developed. This difficulty and diversity make the detection of SVs for clinical applications challenging and require a framework to ensure accuracy and reproducibility. Here, we discuss current developments in the diagnosis of SVs and propose a roadmap for the accurate and reproducible detection of SVs that includes case studies provided from the FDA-led SEquencing Quality Control Phase II (SEQC-II) and other consortium efforts.
Affiliation(s)
- Zhichao Liu
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Ruth Roberts
- ApconiX, BioHub at Alderley Park, Alderley Edge, SK10 4TG, UK
- University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
| | - Timothy R Mercer
- Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, QLD, Australia
- Garvan Institute of Medical Research, Sydney, NSW, Australia
- St Vincent's Clinical School, University of New South Wales, Sydney, NSW, Australia
| | - Joshua Xu
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
| | - Weida Tong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA.
|
11
|
Gordeeva V, Sharova E, Arapidi G. Progress in Methods for Copy Number Variation Profiling. Int J Mol Sci 2022; 23:2143. [PMID: 35216262 PMCID: PMC8879278 DOI: 10.3390/ijms23042143] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/17/2022] [Revised: 02/09/2022] [Accepted: 02/11/2022] [Indexed: 02/04/2023]
Abstract
Copy number variations (CNVs) are the predominant class of structural genomic variations involved in the processes of evolutionary adaptation, genomic disorders, and disease progression. Compared with single-nucleotide variants, there have been challenges associated with the detection of CNVs owing to their diverse sizes. However, the field has seen significant progress in the past 20–30 years. This has been made possible due to the rapid development of molecular diagnostic methods which ensure a more detailed view of the genome structure, further complemented by recent advances in computational methods. Here, we review the major approaches that have been used to routinely detect CNVs, ranging from cytogenetics to the latest sequencing technologies, and then cover their specific features.
Affiliation(s)
- Veronika Gordeeva
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russia
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russia; (E.S.); (G.A.)
- Moscow Institute of Physics and Technology, National Research University, Moscow Oblast, 141701 Moscow, Russia
| | - Elena Sharova
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russia; (E.S.); (G.A.)
| | - Georgij Arapidi
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russia; (E.S.); (G.A.)
- Moscow Institute of Physics and Technology, National Research University, Moscow Oblast, 141701 Moscow, Russia
- Shemyakin–Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 117997 Moscow, Russia
|
12
|
Artificial Intelligence and Cardiovascular Genetics. Life (Basel) 2022; 12:279. [PMID: 35207566 PMCID: PMC8875522 DOI: 10.3390/life12020279] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/30/2021] [Revised: 01/26/2022] [Accepted: 02/09/2022] [Indexed: 12/13/2022]
Abstract
Polygenic diseases, which are genetic disorders caused by the combined action of multiple genes, pose unique and significant challenges for the diagnosis and management of affected patients. A major goal of cardiovascular medicine has been to understand how genetic variation leads to the clinical heterogeneity seen in polygenic cardiovascular diseases (CVDs). Recent advances and emerging technologies in artificial intelligence (AI), coupled with the ever-increasing availability of next generation sequencing (NGS) technologies, now provide researchers with unprecedented possibilities for dynamic and complex biological genomic analyses. Combining these technologies may lead to a deeper understanding of heterogeneous polygenic CVDs, better prognostic guidance, and, ultimately, greater personalized medicine. Advances will likely be achieved through increasingly frequent and robust genomic characterization of patients, as well as the integration of genomic data with other clinical data, such as cardiac imaging, coronary angiography, and clinical biomarkers. This review discusses the current opportunities and limitations of genomics; provides a brief overview of AI; and identifies the current applications, limitations, and future directions of AI in genomics.
|
13
|
Singh R, Kumar K, Bharadwaj C, Verma PK. Broadening the horizon of crop research: a decade of advancements in plant molecular genetics to divulge phenotype governing genes. PLANTA 2022; 255:46. [PMID: 35076815 DOI: 10.1007/s00425-022-03827-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/13/2021] [Accepted: 01/08/2022] [Indexed: 06/14/2023]
Abstract
Advancements in sequencing, genotyping, and computational technologies during the last decade (2011-2020) enabled new forward-genetic approaches, which overcome the impediments to precise gene mapping in varied crops. Modern crop improvement programs rely heavily on two major steps: identification of trait-associated QTLs, genes, and markers, and molecular breeding. Thus, it is vital for basic and translational crop research to identify genomic regions that govern the phenotype of interest. Until the advent of next-generation sequencing, the forward-genetic techniques were laborious and time-consuming. Over the last 10 years, advancements in the area of genome assembly, genotyping, large-scale data analysis, and statistical algorithms have led to faster identification of genomic variations regulating complex agronomic traits and pathogen resistance. In this review, we describe the latest developments in genome sequencing and genotyping along with a comprehensive evaluation of the past 10 years of progress in forward-genetic techniques that have shifted the focus of plant research from model plants to diverse crops. We have classified the available molecular genetic methods under bulk-segregant analysis-based (QTL-seq, GradedPool-Seq, QTG-Seq, Exome QTL-seq, and RapMap), target sequence enrichment-based (RenSeq, AgRenSeq, and TACCA), and mutation-based groups (MutMap, NIKS algorithm, MutRenSeq, MutChromSeq), alongside improvements in classical mapping and genome-wide association analyses. Newer methods for outcrossing, heterozygous, and polyploid plant genetics have also been discussed. The use of k-mers has enriched the nature of genetic variants which can be utilized to identify the phenotype-causing genes, independent of reference genomes. We envisage that the recent methods discussed herein will expand the repertoire of useful alleles and help in developing high-yielding and climate-resilient crops.
Affiliation(s)
- Ritu Singh
- Plant Immunity Laboratory, National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi, 110067, India
| | - Kamal Kumar
- Plant Immunity Laboratory, National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi, 110067, India
| | - Chellapilla Bharadwaj
- Division of Genetics, ICAR-Indian Agricultural Research Institute (IARI), New Delhi, 110020, India
| | - Praveen Kumar Verma
- Plant Immunity Laboratory, National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi, 110067, India.
- Plant Immunity Laboratory, School of Life Sciences, Jawaharlal Nehru University, New Delhi, 110067, India.
| |
|
14
|
Identification of Copy Number Alterations from Next-Generation Sequencing Data. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2022; 1361:55-74. [DOI: 10.1007/978-3-030-91836-1_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
15
|
Kim MJ, Lee S, Yun H, Cho SI, Kim B, Lee JS, Chae JH, Sun C, Park SS, Seong MW. Consistent count region-copy number variation (CCR-CNV): an expandable and robust tool for clinical diagnosis of copy number variation at the exon level using next-generation sequencing data. Genet Med 2021; 24:663-672. [PMID: 34906491 DOI: 10.1016/j.gim.2021.10.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2021] [Accepted: 10/29/2021] [Indexed: 11/29/2022] Open
Abstract
PURPOSE Despite the importance of exonic copy number variations (CNVs) in human genetic diseases, reliable next-generation sequencing-based methods for detecting them are unavailable. We developed an expandable and robust exonic CNV detection tool called consistent count region (CCR)-CNV. METHODS In total, about 1000 samples of the truth set were used for validating CCR-CNV. We compared CCR-CNV performance with 2 well-known CNV tools. Finally, to overcome the limitations of CCR-CNV, we devised a combined approach. RESULTS The mean sensitivity and specificity of CCR-CNV alone were above 95%, which was superior to that of other CNV tools, such as DECoN and Atlas-CNV. However, its limited covered region, low positive predictive value, and high false discovery rate are obstacles to its use in clinical settings. The combined approach performed much better than CCR-CNV alone. CONCLUSION In this study, we present a novel diagnostic tool that allows the identification of exonic CNVs with high confidence using various reagents and clinical next-generation sequencing platforms. We validated this method using the largest multiplex ligation-dependent probe amplification-confirmed data set, including sufficient copy normal control data. The approach, combined with existing CNV tools, allows the implementation of CCR-CNV in clinical settings.
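The abstract above reports sensitivity and specificity above 95% alongside a low positive predictive value and a high false discovery rate. A minimal sketch of how these four metrics relate is given here, assuming purely hypothetical counts (100 true exonic CNVs among 10,000 tested regions); the numbers are not taken from the study, and the `Confusion` class is an illustrative helper rather than part of CCR-CNV.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary CNV caller (hypothetical helper)."""
    tp: int  # true CNVs that were called
    fp: int  # copy-normal regions wrongly called as CNVs
    tn: int  # copy-normal regions correctly left uncalled
    fn: int  # true CNVs that were missed

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        # Positive predictive value: fraction of calls that are real.
        return self.tp / (self.tp + self.fp)

    @property
    def fdr(self) -> float:
        # False discovery rate is the complement of PPV.
        return self.fp / (self.tp + self.fp)


# Hypothetical counts, NOT from the study: 100 true CNVs, 9,900 normal regions.
counts = Confusion(tp=96, fp=396, tn=9504, fn=4)

for name in ("sensitivity", "specificity", "ppv", "fdr"):
    print(f"{name}: {getattr(counts, name):.3f}")
```

With these assumed counts, sensitivity and specificity are both 96%, yet fewer than one call in five is a true CNV, because copy-normal regions vastly outnumber true events. This is one plausible reason a caller with high sensitivity and specificity can still show a low positive predictive value.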
Affiliation(s)
- Man Jin Kim
- Department of Genomic Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea; Department of Laboratory Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea
| | - Sungyoung Lee
- Department of Genomic Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea; Center for Precision Medicine, Seoul National University Hospital, Seoul, Korea
| | - Hongseok Yun
- Department of Genomic Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea; Center for Precision Medicine, Seoul National University Hospital, Seoul, Korea
| | - Sung Im Cho
- Center for Precision Medicine, Seoul National University Hospital, Seoul, Korea
| | - Boram Kim
- Center for Precision Medicine, Seoul National University Hospital, Seoul, Korea
| | - Jee-Soo Lee
- Department of Laboratory Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea
| | - Jong Hee Chae
- Department of Genomic Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea; Department of Pediatrics, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea
| | | | - Sung Sup Park
- Department of Laboratory Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea
| | - Moon-Woo Seong
- Department of Laboratory Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea.
| |
|
16
|
Jensen M, Tyryshkina A, Pizzo L, Smolen C, Das M, Huber E, Krishnan A, Girirajan S. Combinatorial patterns of gene expression changes contribute to variable expressivity of the developmental delay-associated 16p12.1 deletion. Genome Med 2021; 13:163. [PMID: 34657631 PMCID: PMC8522054 DOI: 10.1186/s13073-021-00982-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/08/2021] [Accepted: 09/28/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Recent studies have suggested that individual variants do not sufficiently explain the variable expressivity of phenotypes observed in complex disorders. For example, the 16p12.1 deletion is associated with developmental delay and neuropsychiatric features in affected individuals, but is inherited in > 90% of cases from a mildly-affected parent. While children with the deletion are more likely to carry additional "second-hit" variants than their parents, the mechanisms for how these variants contribute to phenotypic variability are unknown. METHODS We performed detailed clinical assessments, whole-genome sequencing, and RNA sequencing of lymphoblastoid cell lines for 32 individuals in five large families with multiple members carrying the 16p12.1 deletion. We identified contributions of the 16p12.1 deletion and "second-hit" variants towards a range of expression changes in deletion carriers and their family members, including differential expression, outlier expression, alternative splicing, allele-specific expression, and expression quantitative trait loci analyses. RESULTS We found that the deletion dysregulates multiple autism and brain development genes such as FOXP1, ANK3, and MEF2. Carrier children also showed an average of 5323 gene expression changes compared with one or both parents, which matched with 33/39 observed developmental phenotypes. We identified significant enrichments for 13/25 classes of "second-hit" variants in genes with expression changes, where 4/25 variant classes were only enriched when inherited from the noncarrier parent, including loss-of-function SNVs and large duplications. In 11 instances, including for ZEB2 and SYNJ1, gene expression was synergistically altered by both the deletion and inherited "second-hits" in carrier children. Finally, brain-specific interaction network analysis showed strong connectivity between genes carrying "second-hits" and genes with transcriptome alterations in deletion carriers. 
CONCLUSIONS Our results suggest a potential mechanism for how "second-hit" variants modulate expressivity of complex disorders such as the 16p12.1 deletion through transcriptomic perturbation of gene networks important for early development. Our work further shows that family-based assessments of transcriptome data are highly relevant towards understanding the genetic mechanisms associated with complex disorders.
Affiliation(s)
- Matthew Jensen
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, 16802, USA
- Bioinformatics and Genomics Program, Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA
| | - Anastasia Tyryshkina
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, 16802, USA
- Neuroscience Program, Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA
| | - Lucilla Pizzo
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, 16802, USA
| | - Corrine Smolen
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, 16802, USA
- Bioinformatics and Genomics Program, Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA
| | - Maitreya Das
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, 16802, USA
| | - Emily Huber
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, 16802, USA
| | - Arjun Krishnan
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
| | - Santhosh Girirajan
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, 16802, USA.
- Bioinformatics and Genomics Program, Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA.
- Neuroscience Program, Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA.
- Department of Anthropology, Pennsylvania State University, University Park, PA, 16802, USA.
| |
|
17
|
Herai RH, Szeto RA, Trujillo CA, Muotri AR. Response to Comment on "Reintroduction of the archaic variant of NOVA1 in cortical organoids alters neurodevelopment". Science 2021; 374:eabi9881. [PMID: 34648331 DOI: 10.1126/science.abi9881] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/02/2022]
Abstract
Maricic et al. performed an undisclosed in silico-only whole-exome sequencing analysis of our data and found genomic alterations previously undetected in some clones. Some of the predicted alterations, if true, could change the original genotype of the clones. We failed to experimentally validate all but one of these genomic alterations, which did not affect our previous results or data interpretation.
Affiliation(s)
- Roberto H Herai
- Experimental Multiuser Laboratory (LEM), Graduate Program in Health Sciences, School of Medicine, Pontifícia Universidade Católica do Paraná, Curitiba, PR 80215-901, Brazil
| | - Ryan A Szeto
- Department of Pediatrics and Department of Cellular and Molecular Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92037, USA
| | - Cleber A Trujillo
- Department of Pediatrics and Department of Cellular and Molecular Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92037, USA
| | - Alysson R Muotri
- Department of Pediatrics and Department of Cellular and Molecular Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92037, USA; Department of Pediatrics and Department of Cellular and Molecular Medicine, School of Medicine, Center for Academic Research and Training in Anthropogeny (CARTA), Kavli Institute for Brain and Mind, Archealization Center (ArchC), University of California, San Diego, La Jolla, CA 92037, USA
| |
|
18
|
Xu Y, Su GH, Ma D, Xiao Y, Shao ZM, Jiang YZ. Technological advances in cancer immunity: from immunogenomics to single-cell analysis and artificial intelligence. Signal Transduct Target Ther 2021; 6:312. [PMID: 34417437 PMCID: PMC8377461 DOI: 10.1038/s41392-021-00729-7] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Received: 02/25/2021] [Revised: 07/06/2021] [Accepted: 07/18/2021] [Indexed: 02/07/2023] Open
Abstract
Immunotherapies play critical roles in cancer treatment. However, given that only a few patients respond to immune checkpoint blockades and other immunotherapeutic strategies, novel technologies are needed to decipher the complicated interplay between tumor cells and the components of the tumor immune microenvironment (TIME). Tumor immunomics refers to the integrated study of the TIME using immunogenomics, immunoproteomics, immune-bioinformatics, and other multi-omics data reflecting the immune states of tumors, which has relied on the rapid development of next-generation sequencing. High-throughput genomic and transcriptomic data may be used to calculate the abundance of immune cells and predict tumor antigens, an approach referred to as immunogenomics. However, as bulk sequencing represents the average characteristics of a heterogeneous cell population, it fails to distinguish distinct cell subtypes. Single-cell-based technologies enable better dissection of the TIME through precise immune cell subpopulation and spatial architecture investigations. In addition, radiomics and digital pathology-based deep learning models largely contribute to research on cancer immunity. These artificial intelligence technologies have performed well in predicting response to immunotherapy, with profound significance in cancer therapy. In this review, we briefly summarize conventional and state-of-the-art technologies in the field of immunogenomics, single-cell and artificial intelligence, and present prospects for future research.
Affiliation(s)
- Ying Xu
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
| | - Guan-Hua Su
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
| | - Ding Ma
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
| | - Yi Xiao
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China.
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China.
| | - Zhi-Ming Shao
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China.
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China.
- Institutes of Biomedical Sciences, Fudan University, Shanghai, China.
| | - Yi-Zhou Jiang
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China.
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China.
| |
|
19
|
Yeom KH, Pan Z, Lin CH, Lim HY, Xiao W, Xing Y, Black DL. Tracking pre-mRNA maturation across subcellular compartments identifies developmental gene regulation through intron retention and nuclear anchoring. Genome Res 2021; 31:1106-1119. [PMID: 33832989 PMCID: PMC8168582 DOI: 10.1101/gr.273904.120] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 11/11/2020] [Accepted: 04/01/2021] [Indexed: 12/24/2022]
Abstract
Steps of mRNA maturation are important gene regulatory events that occur in distinct cellular locations. However, transcriptomic analyses often lose information on the subcellular distribution of processed and unprocessed transcripts. We generated extensive RNA-seq data sets to track mRNA maturation across subcellular locations in mouse embryonic stem cells, neuronal progenitor cells, and postmitotic neurons. We find disparate patterns of RNA enrichment between the cytoplasmic, nucleoplasmic, and chromatin fractions, with some genes maintaining more polyadenylated RNA in chromatin than in the cytoplasm. We bioinformatically defined four regulatory groups for intron retention, including complete cotranscriptional splicing, complete intron retention in the cytoplasmic RNA, and two intron groups present in nuclear and chromatin transcripts but fully excised in cytoplasm. We found that introns switch their regulatory group between cell types, including neuronally excised introns repressed by polypyrimidine tract binding protein 1 (PTBP1). Transcripts for the neuronal gamma-aminobutyric acid (GABA) B receptor, 1 (Gabbr1) are highly expressed in mESCs but are absent from the cytoplasm. Instead, incompletely spliced Gabbr1 RNA remains sequestered on chromatin, where it is bound by PTBP1, similar to certain long noncoding RNAs. Upon neuronal differentiation, Gabbr1 RNA becomes fully processed and exported for translation. Thus, splicing repression and chromatin anchoring of RNA combine to allow posttranscriptional regulation of Gabbr1 over development. For this and other genes, polyadenylated RNA abundance does not indicate functional gene expression. Our data sets provide a rich resource for analyzing many other aspects of mRNA maturation in subcellular locations and across development.
Affiliation(s)
- Kyu-Hyeon Yeom
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, California 90095, USA
| | - Zhicheng Pan
- Bioinformatics Interdepartmental Graduate Program, University of California, Los Angeles, Los Angeles, California 90095, USA; Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
| | - Chia-Ho Lin
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, California 90095, USA
| | - Han Young Lim
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, California 90095, USA; Molecular Biology Interdepartmental Doctoral Program, University of California, Los Angeles, Los Angeles, California 90095, USA
| | - Wen Xiao
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, California 90095, USA
| | - Yi Xing
- Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA; Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
| | - Douglas L Black
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, California 90095, USA
| |
|
20
|
Bis-Brewer DM, Gan-Or Z, Sleiman P, Hakonarson H, Fazal S, Courel S, Cintra V, Tao F, Estiar MA, Tarnopolsky M, Boycott KM, Yoon G, Suchowersky O, Dupré N, Cheng A, Lloyd TE, Rouleau G, Schüle R, Züchner S. Assessing non-Mendelian inheritance in inherited axonopathies. Genet Med 2020; 22:2114-2119. [PMID: 32741968 PMCID: PMC7710562 DOI: 10.1038/s41436-020-0924-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 02/10/2020] [Revised: 07/22/2020] [Accepted: 07/22/2020] [Indexed: 01/05/2023] Open
Abstract
PURPOSE Inherited axonopathies (IA) are rare, clinically and genetically heterogeneous diseases that lead to length-dependent degeneration of the long axons in central (hereditary spastic paraplegia [HSP]) and peripheral (Charcot-Marie-Tooth type 2 [CMT2]) nervous systems. Mendelian high-penetrance alleles in over 100 different genes have been shown to cause IA; however, about 50% of IA cases do not receive a genetic diagnosis. A more comprehensive spectrum of causative genes and alleles is warranted, including causative and risk alleles, as well as oligogenic multilocus inheritance. METHODS Through international collaboration, IA exome studies are beginning to be sufficiently powered to perform a pilot rare variant burden analysis. After extensive quality control, our cohort contained 343 CMT cases, 515 HSP cases, and 935 non-neurological controls. We assessed the cumulative mutational burden across disease genes, explored the evidence for multilocus inheritance, and performed an exome-wide rare variant burden analysis. RESULTS We replicated the previously described mutational burden in a much larger cohort of CMT cases, and observed the same effect in HSP cases. We identified a preliminary risk allele for CMT in the EXOC4 gene (p value = 6.9 × 10⁻⁶, odds ratio [OR] = 2.1) and explored the possibility of multilocus inheritance in IA. CONCLUSION Our results support the continuing emergence of complex inheritance mechanisms in historically Mendelian disorders.
Affiliation(s)
- Dana M Bis-Brewer
- Dr. John T. Macdonald Foundation Department of Human Genetics, John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA.
| | - Ziv Gan-Or
- Department of Human Genetics, McGill University, Montréal, QC, Canada; Montreal Neurological Institute and Hospital, McGill University, Montréal, QC, Canada; Department of Neurology and Neurosurgery, McGill University, Montréal, QC, Canada
| | - Patrick Sleiman
- Center for Applied Genomics, The Children's Hospital of Philadelphia; Division of Human Genetics, Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | | | - Hakon Hakonarson
- Center for Applied Genomics, The Children's Hospital of Philadelphia; Division of Human Genetics, Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Sarah Fazal
- Dr. John T. Macdonald Foundation Department of Human Genetics, John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
| | - Steve Courel
- Dr. John T. Macdonald Foundation Department of Human Genetics, John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
| | - Vivian Cintra
- Dr. John T. Macdonald Foundation Department of Human Genetics, John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
| | - Feifei Tao
- Dr. John T. Macdonald Foundation Department of Human Genetics, John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
| | - Mehrdad A Estiar
- Department of Human Genetics, McGill University, Montréal, QC, Canada; Montreal Neurological Institute and Hospital, McGill University, Montréal, QC, Canada
| | - Mark Tarnopolsky
- Neuromuscular and Neurometabolics Division, Department of Pediatrics, McMaster University, Hamilton, ON, Canada
| | - Kym M Boycott
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, ON, Canada
| | - Grace Yoon
- Division of Clinical and Metabolic Genetics, Department of Paediatrics, The Hospital for Sick Children, University of Toronto, Toronto, ON, Canada; Division of Neurology, Department of Paediatrics, The Hospital for Sick Children, University of Toronto, Toronto, ON, Canada
| | - Oksana Suchowersky
- Department of Medicine, Medical Genetics and Pediatrics, University of Alberta, Edmonton, AB, Canada
| | - Nicolas Dupré
- Division of Neurosciences, CHU de Québec, Université Laval, Québec City, QC, Canada; Department of Medicine, Faculty of Medicine, Université Laval, Québec City, QC, Canada
| | - Andrew Cheng
- Department of Neurology and Neuroscience, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Thomas E Lloyd
- Department of Neurology and Neuroscience, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Guy Rouleau
- Department of Human Genetics, McGill University, Montréal, QC, Canada; Montreal Neurological Institute and Hospital, McGill University, Montréal, QC, Canada; Department of Neurology and Neurosurgery, McGill University, Montréal, QC, Canada
| | - Rebecca Schüle
- Center for Neurology and Hertie Institute for Clinical Brain Research, University of Tübingen, German Center for Neurodegenerative Diseases, Tübingen, Germany
| | - Stephan Züchner
- Dr. John T. Macdonald Foundation Department of Human Genetics, John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
| |
|
21
|
Ross JP, Dion PA, Rouleau GA. Exome sequencing in genetic disease: recent advances and considerations. F1000Res 2020; 9:F1000 Faculty Rev-336. [PMID: 32431803 PMCID: PMC7205110 DOI: 10.12688/f1000research.19444.1] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Accepted: 04/30/2020] [Indexed: 12/14/2022] Open
Abstract
Over the past decade, exome sequencing (ES) has allowed significant advancements to the field of disease research. By targeting the protein-coding regions of the genome, ES combines the depth of knowledge on protein-altering variants with high-throughput data generation and ease of analysis. New discoveries continue to be made using ES, and medical science has benefitted both theoretically and clinically from its continued use. In this review, we describe recent advances and successes of ES in disease research. Through selected examples of recent publications, we explore how ES continues to be a valuable tool to find variants that might explain disease etiology or provide insight into the biology underlying the disease. We then discuss shortcomings of ES in terms of variant discoveries made by other sequencing technologies that would be missed because of the scope and techniques of ES. We conclude with a brief outlook on the future of ES, suggesting that although newer and more thorough sequencing methods will soon supplant ES, its results will continue to be useful for disease research.
Affiliation(s)
- Jay P. Ross
- Department of Human Genetics, McGill University, 3640 University, Montréal, QC, H3A 0C7, Canada
- Montreal Neurological Institute and Hospital, McGill University, 3801 University, Montréal, QC, H3A 2B4, Canada
| | - Patrick A. Dion
- Montreal Neurological Institute and Hospital, McGill University, 3801 University, Montréal, QC, H3A 2B4, Canada
- Department of Neurology and Neurosurgery, McGill University, 3801 University, Montréal, QC, H3A 2B4, Canada
| | - Guy A. Rouleau
- Department of Human Genetics, McGill University, 3640 University, Montréal, QC, H3A 0C7, Canada
- Montreal Neurological Institute and Hospital, McGill University, 3801 University, Montréal, QC, H3A 2B4, Canada
- Department of Neurology and Neurosurgery, McGill University, 3801 University, Montréal, QC, H3A 2B4, Canada
| |
|
22
|
Balachandran P, Beck CR. Structural variant identification and characterization. Chromosome Res 2020; 28:31-47. [PMID: 31907725 PMCID: PMC7131885 DOI: 10.1007/s10577-019-09623-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/15/2019] [Revised: 10/15/2019] [Accepted: 11/24/2019] [Indexed: 01/06/2023]
Abstract
Structural variant (SV) differences between human genomes can cause germline and mosaic disease as well as inter-individual variation. De-regulation of accurate DNA repair and genomic surveillance mechanisms results in a large number of SVs in cancer. Analysis of the DNA sequences at SV breakpoints can help identify pathways of mutagenesis and regions of the genome that are more susceptible to rearrangement. Large-scale SV analyses have been enabled by high-throughput genome-level sequencing on humans in the past decade. These studies have shed light on the mechanisms and prevalence of complex genomic rearrangements. Recent advancements in both sequencing and other mapping technologies as well as calling algorithms for detection of genomic rearrangements have helped propel SV detection into population-scale studies, and have begun to elucidate previously inaccessible regions of the genome. Here, we discuss the genomic organization of simple and complex SVs, the molecular mechanisms of their formation, and various ways to detect them. We also introduce methods for characterizing SVs and their consequences on human genomes.
Affiliation(s)
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA.
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, 06030, USA.
| |
|
23
|
Abstract
Identifying structural variation (SV) is essential for genome interpretation but has been historically difficult due to limitations inherent to available genome technologies. Detection methods that use ensemble algorithms and emerging sequencing technologies have enabled the discovery of thousands of SVs, uncovering information about their ubiquity, relationship to disease and possible effects on biological mechanisms. Given the variability in SV type and size, along with unique detection biases of emerging genomic platforms, multiplatform discovery is necessary to resolve the full spectrum of variation. Here, we review modern approaches for investigating SVs and proffer that, moving forwards, studies integrating biological information with detection will be necessary to comprehensively understand the impact of SV in the human genome.
Affiliation(s)
- Steve S Ho
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Alexander E Urban
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Ryan E Mills
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
|
24
|
Xing Y, Dabney AR, Li X, Wang G, Gill CA, Casola C. SECNVs: A Simulator of Copy Number Variants and Whole-Exome Sequences From Reference Genomes. Front Genet 2020; 11:82. [PMID: 32153642 PMCID: PMC7046838 DOI: 10.3389/fgene.2020.00082] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/05/2019] [Accepted: 01/24/2020] [Indexed: 01/26/2023] Open
Abstract
Copy number variants are duplications and deletions of the genome that play an important role in phenotypic changes and human disease. Many software applications have been developed to detect copy number variants using either whole-genome sequencing or whole-exome sequencing data. However, there is poor agreement in the results from these applications. Simulated datasets containing copy number variants allow comprehensive comparisons of the operating characteristics of existing and novel copy number variant detection methods. Several software applications have been developed to simulate copy number variants and other structural variants in whole-genome sequencing data. However, none of the applications reliably simulate copy number variants in whole-exome sequencing data. We have developed and tested Simulator of Exome Copy Number Variants (SECNVs), a fast, robust and customizable software application for simulating copy number variants and whole-exome sequences from a reference genome. SECNVs is easy to install, implements a wide range of commands to customize simulations, can output multiple samples at once, and incorporates a pipeline to output rearranged genomes, short reads and BAM files in a single command. Variants generated by SECNVs are detected with high sensitivity and precision by tools commonly used to detect copy number variants. SECNVs is publicly available at https://github.com/YJulyXing/SECNVs.
Affiliation(s)
- Yue Xing
  - Interdisciplinary Program in Genetics, Texas A&M University, College Station, TX, United States
  - Department of Statistics, Texas A&M University, College Station, TX, United States
  - Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, United States
- Alan R. Dabney
  - Department of Statistics, Texas A&M University, College Station, TX, United States
- Xiao Li
  - Department of Molecular and Cellular Medicine, Texas A&M University, College Station, TX, United States
- Guosong Wang
  - Department of Animal Science, Texas A&M University, College Station, TX, United States
- Clare A. Gill
  - Department of Animal Science, Texas A&M University, College Station, TX, United States
- Claudio Casola
  - Department of Ecosystem Science and Management, Texas A&M University, College Station, TX, United States
|
25
|
Wijfjes RY, Smit S, de Ridder D. Hecaton: reliably detecting copy number variation in plant genomes using short read sequencing data. BMC Genomics 2019; 20:818. [PMID: 31699036 PMCID: PMC6836508 DOI: 10.1186/s12864-019-6153-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/31/2019] [Accepted: 09/30/2019] [Indexed: 01/27/2023] Open
Abstract
Background: Copy number variation (CNV) is thought to actively contribute to adaptive evolution of plant species. While many computational algorithms are available to detect copy number variation from whole genome sequencing datasets, the typical complexity of plant data likely introduces false positive calls.
Results: To enable reliable and comprehensive detection of CNV in plant genomes, we developed Hecaton, a novel computational workflow tailored to plants, that integrates calls from multiple state-of-the-art algorithms through a machine-learning approach. In this paper, we demonstrate that Hecaton outperforms current methods when applied to short read sequencing data of Arabidopsis thaliana, rice, maize, and tomato. Moreover, it correctly detects dispersed duplications, a type of CNV commonly found in plant species, in contrast to several state-of-the-art tools that erroneously represent this type of CNV as overlapping deletions and tandem duplications. Finally, Hecaton scales well in terms of memory usage and running time when applied to short read datasets of domesticated and wild tomato accessions.
Conclusions: Hecaton provides a robust method to detect CNV in plants. We expect it to be of immediate interest to both applied and fundamental research on the relationship between genotype and phenotype in plants.
Affiliation(s)
- Raúl Y Wijfjes
  - Bioinformatics Group, Wageningen University & Research, Wageningen, the Netherlands
- Sandra Smit
  - Bioinformatics Group, Wageningen University & Research, Wageningen, the Netherlands
- Dick de Ridder
  - Bioinformatics Group, Wageningen University & Research, Wageningen, the Netherlands
|
26
|
Long-read sequencing for rare human genetic diseases. J Hum Genet 2019; 65:11-19. [PMID: 31558760 DOI: 10.1038/s10038-019-0671-8] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 05/31/2019] [Revised: 08/28/2019] [Accepted: 09/03/2019] [Indexed: 12/19/2022]
Abstract
During the past decade, the search for pathogenic mutations in rare human genetic diseases has involved huge efforts to sequence coding regions, or the entire genome, using massively parallel short-read sequencers. However, the approximate current diagnostic rate is <50% using these approaches, and there remain many rare genetic diseases with unknown cause. There may be many reasons for this, but one plausible explanation is that the responsible mutations are in regions of the genome that are difficult to sequence using conventional technologies (e.g., tandem-repeat expansion or complex chromosomal structural aberrations). Despite the drawbacks of high cost and a shortage of standard analytical methods, several studies have analyzed pathogenic changes in the genome using long-read sequencers. The results of these studies provide hope that further application of long-read sequencers to identify the causative mutations in unsolved genetic diseases may expand our understanding of the human genome and diseases. Such approaches may also be applied to molecular diagnosis and therapeutic strategies for patients with genetic diseases in the future.
|