1
Weißbach S, Milkovits J, Pastore S, Heine M, Gerber S, Todorov H. Cortexa: a comprehensive resource for studying gene expression and alternative splicing in the murine brain. BMC Bioinformatics 2024; 25:293. [PMID: 39237879 PMCID: PMC11378610 DOI: 10.1186/s12859-024-05919-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2024] [Accepted: 08/28/2024] [Indexed: 09/07/2024] Open
Abstract
BACKGROUND: Gene expression and alternative splicing are tightly regulated processes that shape brain development and determine the cellular identity of differentiated neural cell populations. Despite the availability of multiple valuable datasets, many functional implications, especially those related to alternative splicing, remain poorly understood. Moreover, neuroscientists who work primarily experimentally often lack the bioinformatics expertise required to process alternative splicing data and produce meaningful, interpretable results. Notably, re-analyzing publicly available datasets and integrating them with in-house data can provide substantial novel insights. However, such analyses require harmonized data handling and processing pipelines, which in turn demand considerable computational resources and in-depth bioinformatics expertise. RESULTS: Here, we present Cortexa, a comprehensive web portal that incorporates RNA-sequencing datasets from the mouse cerebral cortex (longitudinal or cell-specific) and the hippocampus. Cortexa provides clear visualizations of the expression and alternative splicing patterns of individual genes. Our platform also provides SplicePCA, a tool that allows users to integrate their own alternative splicing dataset and compare it to cell-specific or developmental neocortical splicing patterns. All standardized gene expression and alternative splicing datasets can be downloaded for further in-depth downstream analysis without the need for extensive preprocessing. CONCLUSIONS: Cortexa provides a robust and readily available resource for unraveling the complexity of gene expression and alternative splicing regulatory processes in the mouse brain. The data portal is available at https://cortexa-rna.com/.
Affiliation(s)
- Stephan Weißbach
  - Institute of Developmental Biology and Neurobiology (iDN), Johannes Gutenberg University Mainz, 55128, Mainz, Germany
  - Institute of Human Genetics, University Medical Center, Johannes Gutenberg University Mainz, 55131, Mainz, Germany
- Jonas Milkovits
  - Institute of Developmental Biology and Neurobiology (iDN), Johannes Gutenberg University Mainz, 55128, Mainz, Germany
- Stefan Pastore
  - Institute of Human Genetics, University Medical Center, Johannes Gutenberg University Mainz, 55131, Mainz, Germany
  - Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg University Mainz, 55128, Mainz, Germany
- Martin Heine
  - Institute of Developmental Biology and Neurobiology (iDN), Johannes Gutenberg University Mainz, 55128, Mainz, Germany
- Susanne Gerber
  - Institute of Human Genetics, University Medical Center, Johannes Gutenberg University Mainz, 55131, Mainz, Germany
- Hristo Todorov
  - Institute of Human Genetics, University Medical Center, Johannes Gutenberg University Mainz, 55131, Mainz, Germany
2
Gauthier MA, Kadam A, Leveque G, Golabi N, Zeitouni A, Richardson K, Mascarella M, Sadeghi N, Loganathan SK. Long-read sequencing of oropharyngeal squamous cell carcinoma tumors reveal diverse patterns of high-risk Human Papillomavirus integration. Front Oncol 2023; 13:1264646. [PMID: 37916168 PMCID: PMC10616875 DOI: 10.3389/fonc.2023.1264646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 09/27/2023] [Indexed: 11/03/2023] Open
Abstract
Introduction: In North America and in most European countries, Human Papillomavirus (HPV) is responsible for over 70% of oropharyngeal squamous cell carcinomas (OPSCC). The burden of OPSCC in high-income countries has been steadily increasing over the past 20 years. As a result, in the USA and in the UK, the burden of HPV-related OPSCC in men has now surpassed that of cervical cancer in women. However, the oncogenic impact of high-risk HPV integration in OPSCC has not been extensively studied. The present study aimed to explore the patterns of HPV integration in OPSCC and to assess the feasibility and reliability of long-read sequencing technology in detecting viral integration events in oropharyngeal head and neck cancers. Methods: A cohort of eight HPV-positive OPSCC pre-treatment patient tumors (four males and four females) was selected. All patients received a p16INK4A-positive OPSCC diagnosis and were treated at the McGill University Health Centre, a quaternary center in Montreal. A minimum of 20 mg of tumor tissue was used for DNA extraction. Extracted DNA was subjected to Nanopore long-read sequencing to detect and analyze high-risk HPV sequences. PCR and Sanger sequencing experiments were performed to confirm the Nanopore long-read sequencing findings. Results: Nanopore long-read sequencing showed that seven out of eight patient samples displayed either integrated or episomal high-risk HPV sequences. Of these seven samples, four displayed verifiable integration events upon bioinformatic analysis. Integration confirmation experiments were designed for all four samples using PCR-based methods, and Sanger sequencing was also performed. Four distinct HPV integration patterns were identified: concatemer chromosomal integration in a single chromosome, bi-chromosomal concatemer integration, single-chromosome complete integration, and bi-chromosomal complete integration. HPV concatemer integration also proved more common than full HPV integration events. Conclusion and relevance: Long-read sequencing technologies can be effectively used to assess HPV integration patterns in OPSCC tumors. Clinically, more research should be conducted on the prognostic value of high-risk HPV integration in OPSCC tumors using long-read sequencing technology.
Affiliation(s)
- Marc-Andre Gauthier
  - Department of Otolaryngology, Head and Neck Surgery, Faculty of Medicine, McGill University, Montreal, QC, Canada
  - Cancer Research Program, Research Institute of the McGill University Health Centre, Montreal, QC, Canada
- Adway Kadam
  - Cancer Research Program, Research Institute of the McGill University Health Centre, Montreal, QC, Canada
  - Department of Experimental Surgery, Faculty of Medicine, McGill University, Montreal, QC, Canada
- Gary Leveque
  - Canadian Centre for Computational Genomics, McGill University, Montreal, QC, Canada
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
- Nahid Golabi
  - Department of Otolaryngology, Head and Neck Surgery, Faculty of Medicine, McGill University, Montreal, QC, Canada
  - Cancer Research Program, Research Institute of the McGill University Health Centre, Montreal, QC, Canada
- Anthony Zeitouni
  - Department of Otolaryngology, Head and Neck Surgery, Faculty of Medicine, McGill University, Montreal, QC, Canada
- Keith Richardson
  - Department of Otolaryngology, Head and Neck Surgery, Faculty of Medicine, McGill University, Montreal, QC, Canada
- Marco Mascarella
  - Department of Otolaryngology, Head and Neck Surgery, Faculty of Medicine, McGill University, Montreal, QC, Canada
  - Cancer Research Program, Research Institute of the McGill University Health Centre, Montreal, QC, Canada
- Nader Sadeghi
  - Department of Otolaryngology, Head and Neck Surgery, Faculty of Medicine, McGill University, Montreal, QC, Canada
  - Cancer Research Program, Research Institute of the McGill University Health Centre, Montreal, QC, Canada
  - Department of Experimental Surgery, Faculty of Medicine, McGill University, Montreal, QC, Canada
  - Department of Oncology, McGill University, Montreal, QC, Canada
- Sampath Kumar Loganathan
  - Department of Otolaryngology, Head and Neck Surgery, Faculty of Medicine, McGill University, Montreal, QC, Canada
  - Cancer Research Program, Research Institute of the McGill University Health Centre, Montreal, QC, Canada
  - Department of Experimental Surgery, Faculty of Medicine, McGill University, Montreal, QC, Canada
  - Department of Experimental Medicine, Department of Biochemistry and Goodman Cancer Research Institute, McGill University, Montreal, QC, Canada
3
Corominas J, Smeekens SP, Nelen MR, Yntema HG, Kamsteeg EJ, Pfundt R, Gilissen C. Clinical exome sequencing - mistakes and caveats. Hum Mutat 2022; 43:1041-1055. [PMID: 35191116 PMCID: PMC9541396 DOI: 10.1002/humu.24360] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 07/10/2021] [Revised: 01/11/2022] [Accepted: 02/18/2022] [Indexed: 11/30/2022]
Abstract
Massively parallel sequencing technology has become the predominant technique for genetic diagnostics and research. Many genetic laboratories have wrestled with the challenges of setting up genetic testing workflows based on a completely new technology. The learning curve we went through as a laboratory was accompanied by growing pains while we gained new knowledge and expertise. Here we discuss some important mistakes that were made in our laboratory during 10 years of clinical exome sequencing but that gave us important new insights into how to adapt our working methods. We provide these examples, and the lessons that we learned, to help other laboratories avoid making the same mistakes.
Affiliation(s)
- Jordi Corominas
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Sanne P Smeekens
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Marcel R Nelen
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Helger G Yntema
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
- Erik-Jan Kamsteeg
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
- Rolph Pfundt
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
- Christian Gilissen
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
  - Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, The Netherlands
4
Gerber S, Pospisil L, Sys S, Hewel C, Torkamani A, Horenko I. Co-Inference of Data Mislabelings Reveals Improved Models in Genomics and Breast Cancer Diagnostics. Front Artif Intell 2022; 4:739432. [PMID: 35072059 PMCID: PMC8766632 DOI: 10.3389/frai.2021.739432] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/10/2021] [Accepted: 11/19/2021] [Indexed: 11/13/2022] Open
Abstract
Mislabeling of cases as well as controls in case–control studies is a frequent source of strong bias in prognostic and diagnostic tests and algorithms. Common data processing methods available to researchers in the biomedical community do not allow for consistent and robust treatment of labeled data in situations where both the case and the control groups contain a non-negligible proportion of mislabeled data instances. This is an especially prominent issue in studies of late-onset conditions, where individuals who may later convert to cases can populate the control group, and in screening studies, which often have high false-positive and false-negative rates. To address this problem, we propose a method for the simultaneous robust inference of Lasso-reduced discriminative models and of latent group-specific mislabeling risks that does not require any exactly labeled data. We apply it to a standard breast cancer imaging dataset and infer the mislabeling probabilities (the rates of false-negative and false-positive core-needle biopsies) together with a small set of simple diagnostic rules, outperforming the state-of-the-art BI-RADS diagnostics on these data. The inferred mislabeling rates for breast cancer biopsies agree with published, purely empirical studies. Applying the method to human genomic data from a healthy-ageing cohort reveals a previously unreported compact combination of single-nucleotide polymorphisms that is strongly associated with a healthy-ageing phenotype for Caucasians. It determines that 7.5% of Caucasians in the 1000 Genomes dataset (selected as a control group) carry a pattern characteristic of healthy ageing.
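To make the notion of group-specific mislabeling rates concrete, the following is a minimal Python sketch of the standard class-conditional label-noise relation. It is not the authors' inference method: the function names and the example rates are hypothetical, chosen only to show how false-positive and false-negative rates distort an observed case fraction and how that distortion can be inverted when both rates are known.

```python
def observed_positive_rate(p_true: float, fp_rate: float, fn_rate: float) -> float:
    """Fraction of records *labelled* positive under class-conditional label noise.

    p_true  : true fraction of positives (hypothetical input)
    fp_rate : probability that a true negative is labelled positive
    fn_rate : probability that a true positive is labelled negative
    """
    for value in (p_true, fp_rate, fn_rate):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all arguments must be probabilities in [0, 1]")
    # True positives that keep their label, plus true negatives flipped to positive.
    return (1.0 - fn_rate) * p_true + fp_rate * (1.0 - p_true)


def corrected_positive_rate(p_obs: float, fp_rate: float, fn_rate: float) -> float:
    """Invert the relation above to estimate the true positive fraction.

    Only identifiable when fp_rate + fn_rate < 1, i.e. the labels are
    better than chance.
    """
    denom = 1.0 - fp_rate - fn_rate
    if denom <= 0.0:
        raise ValueError("fp_rate + fn_rate must be below 1")
    estimate = (p_obs - fp_rate) / denom
    # Clip to a valid probability in case p_obs is a noisy empirical value.
    return min(1.0, max(0.0, estimate))
```

For example, with an assumed true case fraction of 0.3, a false-positive rate of 0.05, and a false-negative rate of 0.1, `observed_positive_rate` gives about 0.305, and `corrected_positive_rate` maps that value back to about 0.3. The method described in the abstract goes further by inferring such rates jointly with the discriminative model rather than assuming them.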
Affiliation(s)
- Susanne Gerber
  - Institute of Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
  - *Correspondence: Susanne Gerber; Illia Horenko
- Lukas Pospisil
  - Faculty of Informatics, Institute of Computational Science, Università Della Svizzera Italiana, Lugano, Switzerland
- Stanislav Sys
  - Institute of Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Charlotte Hewel
  - Institute of Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Ali Torkamani
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, United States
- Illia Horenko
  - Faculty of Informatics, Institute of Computational Science, Università Della Svizzera Italiana, Lugano, Switzerland
  - *Correspondence: Susanne Gerber; Illia Horenko
5
Mc Cartney AM, Mahmoud M, Jochum M, Agustinho DP, Zorman B, Al Khleifat A, Dabbaghie F, Kesharwani RK, Smolka M, Dawood M, Albin D, Aliyev E, Almabrazi H, Arslan A, Balaji A, Behera S, Billingsley K, Cameron DL, Daw J, Dawson ET, De Coster W, Du H, Dunn C, Esteban R, Jolly A, Kalra D, Liao C, Liu Y, Lu TY, Havrilla JM, Khayat MM, Marin M, Monlong J, Price S, Gener AR, Ren J, Sagayaradj S, Sapoval N, Sinner C, Soto DC, Soylev A, Subramaniyan A, Syed N, Tadimeti N, Tater P, Vats P, Vaughn J, Walker K, Wang G, Zeng Q, Zhang S, Zhao T, Kille B, Biederstedt E, Chaisson M, English A, Kronenberg Z, Treangen TJ, Hefferon T, Chin CS, Busby B, Sedlazeck FJ. An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates. F1000Res 2021; 10:246. [PMID: 34621504 PMCID: PMC8479851 DOI: 10.12688/f1000research.51477.2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 08/23/2021] [Indexed: 11/20/2022] Open
Abstract
In October 2020, 62 scientists from nine nations worked together remotely in the Second Baylor College of Medicine & DNAnexus hackathon, focusing on related topics in structural variation, pan-genomes, and SARS-CoV-2 research. The overarching aim was to assess the current status of the field, identify the remaining challenges, and determine how to combine the strengths of the different interests to drive research and method development forward. Over the four days, eight groups each designed and developed new open-source methods to improve the identification and analysis of variations among species, including humans and SARS-CoV-2. These included improvements in SV calling, genotyping, annotation, and filtering, together with advancements in benchmarking existing methods. Furthermore, groups focused on the diversity of SARS-CoV-2. Daily discussion summaries and methods are publicly available at https://github.com/collaborativebioinformatics and provide valuable insights for both participants and the research community.
Affiliation(s)
- Fawaz Dabbaghie
  - Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
- Ahmed Arslan
  - Stanford University School of Medicine, California, USA
- Daniel L Cameron
  - Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Joyjit Daw
  - NVIDIA Corporation, Santa Clara, California, USA
- Haowei Du
  - Baylor College of Medicine, Houston, USA
- Jean Monlong
  - UC Santa Cruz Genomics Institute, Santa Cruz, USA
- Arda Soylev
  - Konya Food and Agriculture University, Konya, Turkey
- Pankaj Vats
  - NVIDIA Corporation, Santa Clara, California, USA
- Qiandong Zeng
  - Laboratory Corporation of America Holdings, Westborough, USA
6
Mc Cartney AM, Mahmoud M, Jochum M, Agustinho DP, Zorman B, Al Khleifat A, Dabbaghie F, Kesharwani RK, Smolka M, Dawood M, Albin D, Aliyev E, Almabrazi H, Arslan A, Balaji A, Behera S, Billingsley K, Cameron DL, Daw J, Dawson ET, De Coster W, Du H, Dunn C, Esteban R, Jolly A, Kalra D, Liao C, Liu Y, Lu TY, Havrilla JM, Khayat MM, Marin M, Monlong J, Price S, Gener AR, Ren J, Sagayaradj S, Sapoval N, Sinner C, Soto DC, Soylev A, Subramaniyan A, Syed N, Tadimeti N, Tater P, Vats P, Vaughn J, Walker K, Wang G, Zeng Q, Zhang S, Zhao T, Kille B, Biederstedt E, Chaisson M, English A, Kronenberg Z, Treangen TJ, Hefferon T, Chin CS, Busby B, Sedlazeck FJ. An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates. F1000Res 2021; 10:246. [PMID: 34621504 PMCID: PMC8479851 DOI: 10.12688/f1000research.51477.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 03/04/2021] [Indexed: 11/08/2023] Open