1. Moeckel C, Mareboina M, Konnaris MA, Chan CS, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J 2024; 23:2289-2303. [PMID: 38840832 PMCID: PMC11152613 DOI: 10.1016/j.csbj.2024.05.025] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 06/07/2024] [Open access]
Abstract
The rapid progression of genomics and proteomics has been driven by the advent of advanced sequencing technologies, large, diverse, and readily available omics datasets, and the evolution of computational data processing capabilities. The vast amount of data generated by these advancements necessitates efficient algorithms to extract meaningful information. K-mers serve as a valuable tool when working with large sequencing datasets, offering several advantages in computational speed and memory efficiency and carrying the potential for intrinsic biological functionality. This review provides an overview of the methods, applications, and significance of k-mers in genomic and proteomic data analyses, as well as the utility of absent sequences, including nullomers and nullpeptides, in disease detection, vaccine development, therapeutics, and forensic science. Therefore, the review highlights the pivotal role of k-mers in addressing current genomic and proteomic problems and underscores their potential for future breakthroughs in research.
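As a rough illustration of the two ideas the abstract names, the sketch below counts overlapping k-mers in a sequence and lists the k-mers of the same length that never occur. It is a toy under stated assumptions, not a method from the review: the function names `count_kmers` and `absent_kmers` and the example sequence are hypothetical, reverse complements are ignored, and the absent set is computed at one fixed k rather than searching for the shortest absent length, as a strict nullomer definition would require.

```python
from collections import Counter
from itertools import product


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def absent_kmers(seq: str, k: int, alphabet: str = "ACGT") -> list[str]:
    """List the k-mers over `alphabet` that never occur in seq."""
    present = count_kmers(seq, k)
    candidates = ("".join(p) for p in product(alphabet, repeat=k))
    return [kmer for kmer in candidates if kmer not in present]


# Hypothetical toy sequence, chosen only to keep the output small.
toy = "ACGTACGTTG"
counts = count_kmers(toy, 2)
missing = absent_kmers(toy, 2)
```

A hash-based counter like this keeps one entry per distinct k-mer, so memory grows with the number of distinct k-mers rather than with sequence length; dedicated k-mer counters use more compact encodings for real genomes.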
Affiliation(s)
- Camille Moeckel
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Manvita Mareboina
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Maxwell A. Konnaris
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Candace S.Y. Chan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
- Austin Montgomery
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
2. Wang L, Yin N, Shi W, Xie Y, Yi J, Tang Z, Tang J, Xiang J. Splicing inhibition mediated by reduced splicing factors and helicases is associated with the cellular response of lung cancer cells to cisplatin. Comput Struct Biotechnol J 2024; 23:648-658. [PMID: 38283853 PMCID: PMC10819863 DOI: 10.1016/j.csbj.2023.12.039] [Received: 09/13/2023] [Revised: 12/17/2023] [Accepted: 12/26/2023] [Indexed: 01/30/2024] [Open access]
Abstract
Lung cancer mortality is predominantly linked to post-chemotherapy recurrence, driven by the reactivation of dormant cancer cells. Despite the critical role of these reactivated cells in cancer recurrence and metastasis, the molecular mechanisms governing their therapeutic selection remain poorly understood. In this study, we conducted an integrative analysis by combining PacBio single-molecule real-time (SMRT) sequencing with short-read Illumina RNA-seq. Our study revealed that cisplatin-induced dormant and reactivated cancer cells exhibited a noteworthy reduction in gene transcripts and alternative splicing events. In particular, the differential alternative splicing events overlapped with the differentially expressed genes and were enriched in genes related to cell cycle and cell division. Using the ENCORI database and correlation analysis, we identified key splicing factors, including SRSF7, SRSF3, PRPF8, and HNRNPC, as well as RNA helicases such as EIF4A3, DDX39A, DDX11, and BRIP1, which were associated with the observed reduction in alternative splicing and subsequent decrease in gene expression. Our study demonstrated that lung cancer cells reduce gene transcripts through diminished alternative splicing events mediated by specific splicing factors and RNA helicases in response to chemotherapeutic stress. These findings provide insights into the molecular mechanisms underlying the therapeutic selection and reactivation of dormant cancer cells. This discovery opens a potential avenue for the development of therapeutic strategies aimed at preventing cancer recurrence following chemotherapy.
Affiliation(s)
- Lujuan Wang
  - Hunan Key Laboratory of Tumor Models and Individualized Medicine, The Second Xiangya Hospital, Changsha, Hunan 410011, China
  - Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, Hunan 410011, China
  - NHC Key Laboratory of Carcinogenesis and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Xiangya Hospital, Central South University, Changsha, Hunan 410013, China
- Na Yin
  - Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, Hunan 410011, China
  - NHC Key Laboratory of Carcinogenesis and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Xiangya Hospital, Central South University, Changsha, Hunan 410013, China
- Wenhua Shi
  - Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, Hunan 410011, China
  - NHC Key Laboratory of Carcinogenesis and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Xiangya Hospital, Central South University, Changsha, Hunan 410013, China
- Yaohuan Xie
  - Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, Hunan 410011, China
  - NHC Key Laboratory of Carcinogenesis and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Xiangya Hospital, Central South University, Changsha, Hunan 410013, China
- Junqi Yi
  - Hunan Key Laboratory of Early Diagnosis and Precise Treatment of Lung Cancer, The Second Xiangya Hospital, Changsha, Hunan 410011, China
  - Department of Thoracic Surgery, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410011, China
- Ziying Tang
  - Hunan Key Laboratory of Early Diagnosis and Precise Treatment of Lung Cancer, The Second Xiangya Hospital, Changsha, Hunan 410011, China
  - Department of Thoracic Surgery, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410011, China
- Jingqun Tang
  - Hunan Key Laboratory of Early Diagnosis and Precise Treatment of Lung Cancer, The Second Xiangya Hospital, Changsha, Hunan 410011, China
  - Department of Thoracic Surgery, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410011, China
- Juanjuan Xiang
  - Hunan Key Laboratory of Early Diagnosis and Precise Treatment of Lung Cancer, The Second Xiangya Hospital, Changsha, Hunan 410011, China
  - Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, Hunan 410011, China
  - NHC Key Laboratory of Carcinogenesis and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Xiangya Hospital, Central South University, Changsha, Hunan 410013, China
3. Cheng Y, Xu SM, Santucci K, Lindner G, Janitz M. Machine learning and related approaches in transcriptomics. Biochem Biophys Res Commun 2024; 724:150225. [PMID: 38852503 DOI: 10.1016/j.bbrc.2024.150225] [Received: 02/25/2024] [Revised: 05/18/2024] [Accepted: 06/03/2024] [Indexed: 06/11/2024]
Abstract
Data acquisition for transcriptomic studies used to be the bottleneck in the transcriptomic analytical pipeline. However, recent developments in transcriptome profiling technologies have increased researchers' ability to obtain data, resulting in a shift in focus to data analysis. Incorporating machine learning (ML) into traditional analytical methods makes it possible to handle larger volumes of complex data more efficiently. Many bioinformaticians, especially those unfamiliar with ML in the study of human transcriptomics and complex biological systems, face a significant barrier stemming from their limited awareness of the current landscape of ML utilisation in this field. To address this gap, this review endeavours to introduce those individuals to the general types of ML, followed by a comprehensive range of more specific techniques, demonstrated through examples of their incorporation into analytical pipelines for human transcriptome investigations. Important computational aspects such as data pre-processing, task formulation, results (performance of ML models), and validation methods are covered. For better practical relevance, there is a strong focus on studies published within the last five years, almost exclusively examining human transcriptomes, with outcomes compared with standard non-ML tools.
Affiliation(s)
- Yuning Cheng
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Si-Mei Xu
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Kristina Santucci
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Grace Lindner
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Michael Janitz
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
4. Gai W, Wang G, Lam WKJ, Yuen LYP, Jiang P, Yu SCY, Leung TY, Lau SL, Lo YMD, Chan KCA. Universal Targeted Haplotyping by Droplet Digital PCR Sequencing and Its Applications in Noninvasive Prenatal Testing and Pharmacogenetics Analysis. Clin Chem 2024; 70:1046-1055. [PMID: 38873917 DOI: 10.1093/clinchem/hvae076] [Received: 02/04/2024] [Accepted: 05/01/2024] [Indexed: 06/15/2024]
Abstract
BACKGROUND The analysis of haplotypes of variants is important for pharmacogenomics analysis and noninvasive prenatal testing for monogenic diseases. However, there is a lack of robust methods for targeted haplotyping. METHODS We developed digital PCR haplotype sequencing (dHapSeq) for targeted haplotyping of variants, a method that compartmentalizes long DNA molecules into droplets. Within one droplet, 2 target regions are PCR amplified from one template molecule, and their amplicons are fused together. The fused products are then sequenced to determine the phase relationship of the single nucleotide polymorphism (SNP) alleles. The entire haplotype of tens of SNPs can be deduced after the phase relationships of individual SNPs are determined in a pairwise manner. We applied dHapSeq to noninvasive prenatal testing in 4 families at risk for thalassemia and utilized it to detect NUDT15 diplotypes for predicting drug tolerance in pediatric acute lymphoblastic leukemia (72 cases and 506 controls). RESULTS For SNPs within 40 kb, the phase relationship can be determined with 100% accuracy. In 7 trio families, the haplotyping results for 97 SNPs spanning 185 kb determined by dHapSeq were concordant with the results deduced from the genotypes of both parents and the fetus. In 4 thalassemia families, a 19.3-kb Southeast Asian deletion was successfully phased with 97 downstream SNPs, enabling noninvasive determination of fetal inheritance using relative haplotype dosage analysis. In the NUDT15 analysis, the variant status and phase of the variants were successfully determined in all cases and controls. CONCLUSIONS dHapSeq represents a robust and scalable haplotyping approach with numerous clinical and research applications.
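The step of deducing a full haplotype from pairwise phase relationships can be sketched as a graph traversal. The code below is an illustrative toy, not the authors' dHapSeq pipeline: the function name `chain_phase`, the 0/1 allele coding, the `in_cis` flag, and the SNP labels are all assumptions made for the example. Each pairwise observation says whether allele 0 of one SNP sits on the same molecule as allele 0 of the other (cis) or opposite it (trans), and the traversal propagates that information outward from an arbitrary starting SNP.

```python
from collections import defaultdict, deque


def chain_phase(pairs):
    """Deduce one haplotype (SNP -> allele 0 or 1) from pairwise phase calls.

    pairs: iterable of (snp_a, snp_b, in_cis) tuples. Only SNPs connected
    to the first SNP seen are phased; contradictory evidence raises ValueError.
    """
    graph = defaultdict(list)
    for a, b, in_cis in pairs:
        graph[a].append((b, in_cis))
        graph[b].append((a, in_cis))
    if not graph:
        return {}
    start = next(iter(graph))
    hap = {start: 0}  # anchor: the first SNP carries allele 0 on this haplotype
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for nxt, in_cis in graph[cur]:
            allele = hap[cur] if in_cis else 1 - hap[cur]
            if nxt not in hap:
                hap[nxt] = allele
                queue.append(nxt)
            elif hap[nxt] != allele:
                raise ValueError(f"conflicting phase evidence at {nxt}")
    return hap


# Hypothetical pairwise calls for three heterozygous SNPs.
pairs = [("snp1", "snp2", True), ("snp2", "snp3", False), ("snp1", "snp3", False)]
hap_a = chain_phase(pairs)
# The second haplotype is the complement at every heterozygous site.
hap_b = {snp: 1 - allele for snp, allele in hap_a.items()}
```

Redundant pairs, such as the third call above, act as a consistency check: a pair that disagrees with the phase already implied by other pairs is reported instead of being silently overwritten.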
Affiliation(s)
- Wanxia Gai
  - Centre for Novostics, Hong Kong Science Park, Pak Shek Kok, Hong Kong SAR, China
  - Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
  - Department of Chemical Pathology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong SAR, China
- Guangya Wang
  - Centre for Novostics, Hong Kong Science Park, Pak Shek Kok, Hong Kong SAR, China
  - Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
  - Department of Chemical Pathology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong SAR, China
- W K Jacky Lam
  - Centre for Novostics, Hong Kong Science Park, Pak Shek Kok, Hong Kong SAR, China
  - Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
  - Department of Chemical Pathology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong SAR, China
  - State Key Laboratory of Translational Oncology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong SAR, China
- Liz Y P Yuen
  - Division of Genetic and Genomic Pathology, Department of Pathology, Hong Kong Children's Hospital, Kowloon, Hong Kong SAR, China
- Peiyong Jiang
  - Centre for Novostics, Hong Kong Science Park, Pak Shek Kok, Hong Kong SAR, China
  - Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
  - Department of Chemical Pathology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong SAR, China
  - State Key Laboratory of Translational Oncology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong SAR, China
- Stephanie C Y Yu
  - Centre for Novostics, Hong Kong Science Park, Pak Shek Kok, Hong Kong SAR, China
  - Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
  - Department of Chemical Pathology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong SAR, China
- Tak Y Leung
  - Department of Obstetrics and Gynaecology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong SAR, China
- So Ling Lau
  - Department of Obstetrics and Gynaecology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong SAR, China
- Y M Dennis Lo
  - Centre for Novostics, Hong Kong Science Park, Pak Shek Kok, Hong Kong SAR, China
  - Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
  - Department of Chemical Pathology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong SAR, China
  - State Key Laboratory of Translational Oncology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong SAR, China
- K C Allen Chan
  - Centre for Novostics, Hong Kong Science Park, Pak Shek Kok, Hong Kong SAR, China
  - Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
  - Department of Chemical Pathology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong SAR, China
  - State Key Laboratory of Translational Oncology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong SAR, China
5. Lysenkova Wiklander M, Arvidsson G, Bunikis I, Lundmark A, Raine A, Marincevic-Zuniga Y, Gezelius H, Bremer A, Feuk L, Ameur A, Nordlund J. A multiomic characterization of the leukemia cell line REH using short- and long-read sequencing. Life Sci Alliance 2024; 7:e202302481. [PMID: 38777370 PMCID: PMC11111970 DOI: 10.26508/lsa.202302481] [Received: 11/14/2023] [Revised: 05/02/2024] [Accepted: 05/02/2024] [Indexed: 05/25/2024] [Open access]
Abstract
The B-cell acute lymphoblastic leukemia (ALL) cell line REH, with the t(12;21) ETV6::RUNX1 translocation, is known to have a complex karyotype defined by a series of large-scale chromosomal rearrangements. Taken from a 15-yr-old at relapse, the cell line offers a practical model for the study of pediatric B-ALL. In recent years, short- and long-read DNA and RNA sequencing have emerged as a complement to karyotyping techniques in the resolution of structural variants in an oncological context. Here, we explore the integration of long-read PacBio and Oxford Nanopore whole-genome sequencing, IsoSeq RNA sequencing, and short-read Illumina sequencing to create a detailed genomic and transcriptomic characterization of the REH cell line. Whole-genome sequencing clarified the molecular traits of disrupted ALL-associated genes including CDKN2A, PAX5, BTG1, VPREB1, and TBL1XR1, as well as the glucocorticoid receptor NR3C1. Meanwhile, transcriptome sequencing identified seven fusion genes within the genomic breakpoints. Together, our extensive whole-genome investigation makes high-quality open-source data available to the leukemia genomics community.
Affiliation(s)
- Mariya Lysenkova Wiklander
  - https://ror.org/048a87296 Department of Medical Sciences, Uppsala University, Uppsala, Sweden
  - https://ror.org/048a87296 SciLifeLab, Uppsala University, Uppsala, Sweden
- Gustav Arvidsson
  - https://ror.org/048a87296 Department of Medical Sciences, Uppsala University, Uppsala, Sweden
  - https://ror.org/048a87296 SciLifeLab, Uppsala University, Uppsala, Sweden
- Ignas Bunikis
  - https://ror.org/048a87296 SciLifeLab, Uppsala University, Uppsala, Sweden
  - https://ror.org/048a87296 Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
  - https://ror.org/048a87296 National Genomics Infrastructure, Uppsala University, Uppsala, Sweden
- Anders Lundmark
  - https://ror.org/048a87296 Department of Medical Sciences, Uppsala University, Uppsala, Sweden
  - https://ror.org/048a87296 SciLifeLab, Uppsala University, Uppsala, Sweden
- Amanda Raine
  - https://ror.org/048a87296 Department of Medical Sciences, Uppsala University, Uppsala, Sweden
  - https://ror.org/048a87296 SciLifeLab, Uppsala University, Uppsala, Sweden
  - https://ror.org/048a87296 National Genomics Infrastructure, Uppsala University, Uppsala, Sweden
- Yanara Marincevic-Zuniga
  - https://ror.org/048a87296 Department of Medical Sciences, Uppsala University, Uppsala, Sweden
  - https://ror.org/048a87296 SciLifeLab, Uppsala University, Uppsala, Sweden
  - https://ror.org/048a87296 National Genomics Infrastructure, Uppsala University, Uppsala, Sweden
- Henrik Gezelius
  - https://ror.org/048a87296 Department of Medical Sciences, Uppsala University, Uppsala, Sweden
  - https://ror.org/048a87296 SciLifeLab, Uppsala University, Uppsala, Sweden
  - https://ror.org/048a87296 National Genomics Infrastructure, Uppsala University, Uppsala, Sweden
- Anna Bremer
  - https://ror.org/048a87296 SciLifeLab, Uppsala University, Uppsala, Sweden
  - https://ror.org/048a87296 Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
  - https://ror.org/01apvbh93 Department of Clinical Genetics, Uppsala University Hospital, Uppsala, Sweden
- Lars Feuk
  - https://ror.org/048a87296 SciLifeLab, Uppsala University, Uppsala, Sweden
  - https://ror.org/048a87296 Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
  - https://ror.org/048a87296 National Genomics Infrastructure, Uppsala University, Uppsala, Sweden
- Adam Ameur
  - https://ror.org/048a87296 SciLifeLab, Uppsala University, Uppsala, Sweden
  - https://ror.org/048a87296 Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
  - https://ror.org/048a87296 National Genomics Infrastructure, Uppsala University, Uppsala, Sweden
- Jessica Nordlund
  - https://ror.org/048a87296 Department of Medical Sciences, Uppsala University, Uppsala, Sweden
  - https://ror.org/048a87296 SciLifeLab, Uppsala University, Uppsala, Sweden
  - https://ror.org/048a87296 National Genomics Infrastructure, Uppsala University, Uppsala, Sweden
6. Shekhar R, O'Grady T, Keil N, Feswick A, Amador DM, Tibbetts S, Flemington E, Renne R. High-density resolution of the Kaposi's sarcoma associated herpesvirus transcriptome identifies novel transcript isoforms generated by long-range transcription and alternative splicing. Nucleic Acids Res 2024; 52:7720-7739. [PMID: 38922687 PMCID: PMC11260491 DOI: 10.1093/nar/gkae540] [Received: 12/15/2023] [Revised: 05/14/2024] [Accepted: 06/11/2024] [Indexed: 06/28/2024] [Open access]
Abstract
Kaposi's sarcoma-associated herpesvirus (KSHV) is the etiologic agent of Kaposi's sarcoma (KS) and two B-cell malignancies. Recent advancements in sequencing technologies have led to high-resolution transcriptomes for several human herpesviruses that densely encode genes on both strands. However, for KSHV, progress has remained limited due to the overall low percentage of KSHV transcripts, even during lytic replication. To address this challenge, we have developed a target enrichment method to increase the KSHV-specific reads for both short- and long-read sequencing platforms. Furthermore, we combined this approach with the previously developed Transcriptome Resolution through Integration of Multi-platform Data (TRIMD) pipeline to annotate transcript structures. TRIMD first builds a scaffold based on long-read sequencing and validates each transcript feature with supporting evidence from Illumina RNA-Seq and deepCAGE sequencing data. Our stringent approach identified 994 unique KSHV transcripts, thus providing the first high-density KSHV lytic transcriptome. We describe a plethora of novel coding and non-coding KSHV transcript isoforms with alternative untranslated regions, splice junctions and open-reading frames, thus providing deeper insights into the regulation of KSHV gene expression. Interestingly, as described for Epstein-Barr virus, we identified transcription start sites that augment long-range transcription and may increase the number of latency-associated genes potentially expressed in KS tumors.
Affiliation(s)
- Ritu Shekhar
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, USA
- Tina O'Grady
  - Department of Pathology, Tulane University, New Orleans, LA, USA
- Netanya Keil
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, USA
  - UF Genetics Institute, University of Florida, Gainesville, FL, USA
- April Feswick
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, USA
- David A Moraga Amador
  - UF Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL, USA
- Scott A Tibbetts
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, USA
  - UF Health Cancer Center, University of Florida, Gainesville, FL, USA
  - UF Genetics Institute, University of Florida, Gainesville, FL, USA
- Rolf Renne
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, USA
  - UF Health Cancer Center, University of Florida, Gainesville, FL, USA
  - UF Genetics Institute, University of Florida, Gainesville, FL, USA
7. Dyshlovoy SA, Paigin S, Afflerbach AK, Lobermeyer A, Werner S, Schüller U, Bokemeyer C, Schuh AH, Bergmann L, von Amsberg G, Joosse SA. Applications of Nanopore sequencing in precision cancer medicine. Int J Cancer 2024. [PMID: 39031959 DOI: 10.1002/ijc.35100] [Received: 03/09/2024] [Revised: 04/25/2024] [Accepted: 06/25/2024] [Indexed: 07/22/2024]
Abstract
Oxford Nanopore Technologies sequencing, also referred to as Nanopore sequencing, stands at the forefront of a revolution in clinical genetics, offering the potential for rapid, long-read, and real-time DNA and RNA sequencing. This technology is currently making sequencing more accessible and affordable. In this comprehensive review, we explore its potential for precision cancer diagnostics and treatment. We include a critical analysis of clinical cases where Nanopore sequencing was successfully applied to identify point mutations, splice variants, gene fusions, epigenetic modifications, non-coding RNAs, and other pivotal biomarkers that defined subsequent treatment strategies. Additionally, we address the challenges of clinical applications of Nanopore sequencing and discuss the current efforts to overcome them.
Affiliation(s)
- Sergey A Dyshlovoy
  - Department of Oncology, Oxford Molecular Diagnostics Centre, University of Oxford, Level 4, John Radcliffe Hospital, Oxford, UK
  - Department of Oncology, Hematology and Bone Marrow Transplantation with Section Pneumology, University Cancer Center Hamburg (UCCH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Stefanie Paigin
  - Department of Tumor Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Institute of Pathology and Neuropathology, University Hospital Tübingen, Tübingen, Germany
- Ann-Kristin Afflerbach
  - Department of Tumor Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Annabelle Lobermeyer
  - Department of Tumor Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Stefan Werner
  - Department of Tumor Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Ulrich Schüller
  - Research Institute Children's Cancer Center Hamburg, Hamburg, Germany
  - Institute for Neuropathology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Department of Paediatric Hematology and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Carsten Bokemeyer
  - Department of Oncology, Hematology and Bone Marrow Transplantation with Section Pneumology, University Cancer Center Hamburg (UCCH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Anna H Schuh
  - Department of Oncology, Oxford Molecular Diagnostics Centre, University of Oxford, Level 4, John Radcliffe Hospital, Oxford, UK
- Lina Bergmann
  - Department of Tumor Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Gunhild von Amsberg
  - Department of Oncology, Hematology and Bone Marrow Transplantation with Section Pneumology, University Cancer Center Hamburg (UCCH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Martini-Klinik, Prostate Cancer Center, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Simon A Joosse
  - Department of Tumor Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Mildred Scheel Cancer Career Center HaTriCS4, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
8. Trang KB, Chesi A, Toikumo S, Pippin JA, Pahl MC, O’Brien JM, Amundadottir LT, Brown KM, Yang W, Welles J, Santoleri D, Titchenell PM, Seale P, Zemel BS, Wagley Y, Hankenson KD, Kaestner KH, Anderson SA, Kayser MS, Wells AD, Kranzler HR, Kember RL, Grant SF. Shared and unique 3D genomic features of substance use disorders across multiple cell types. medRxiv 2024:2024.07.18.24310649. [PMID: 39072016 PMCID: PMC11275669 DOI: 10.1101/2024.07.18.24310649] [Indexed: 07/30/2024]
Abstract
Recent genome-wide association studies (GWAS) have revealed shared genetic components among alcohol, opioid, tobacco and cannabis use disorders. However, the extent of the underlying shared causal variants and effector genes, along with their cellular context, remains unclear. We leveraged our existing 3D genomic datasets comprising high-resolution promoter-focused Capture-C/Hi-C, ATAC-seq and RNA-seq across >50 diverse human cell types to focus on genomic regions that coincide with GWAS loci. Using stratified LD score regression, we determined the proportion of genome-wide SNP heritability attributable to the features assayed across our cell types by integrating recent GWAS summary statistics for the relevant traits: alcohol use disorder (AUD), tobacco use disorder (TUD), opioid use disorder (OUD) and cannabis use disorder (CanUD). Statistically significant enrichments (P<0.05) were observed in 14 specific cell types, with heritability enrichment reaching 9.2-fold for iPSC-derived cortical neurons and neural progenitors, confirming that they are crucial cell types for further functional exploration. Additionally, several pancreatic cell types, notably pancreatic beta cells, showed enrichment for TUD, with heritability enrichments up to 4.8-fold, suggesting genomic overlap with metabolic processes. Further investigation revealed significant positive genetic correlations of type 2 diabetes (T2D) with both TUD and CanUD (FDR<0.05) and a significant negative genetic correlation with AUD. Interestingly, after partitioning the heritability for each cell type's cis-regulatory elements, the correlation between T2D and TUD for pancreatic beta cells was greater (r=0.2) than the global genetic correlation value. Our study provides new genomic insights into substance use disorders and implicates cell types where functional follow-up studies could reveal causal variant-gene mechanisms underpinning these disorders.
Affiliation(s)
- Khanh B. Trang
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
| | - Alessandra Chesi
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Sylvanus Toikumo
- Mental Illness Research, Education and Clinical Center, Crescenz Veterans Affairs Medical Center, Philadelphia, PA, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - James A. Pippin
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
| | - Matthew C. Pahl
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
| | - Joan M. O’Brien
- Scheie Eye Institute, Department of Ophthalmology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Penn Medicine Center for Ophthalmic Genetics in Complex Disease, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Laufey T. Amundadottir
- Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
| | - Kevin M. Brown
- Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
| | - Wenli Yang
- Institute for Diabetes, Obesity and Metabolism, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Cell and Developmental Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Jaclyn Welles
- Institute for Diabetes, Obesity and Metabolism, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Physiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Dominic Santoleri
- Institute for Diabetes, Obesity and Metabolism, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Physiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Paul M. Titchenell
- Institute for Diabetes, Obesity and Metabolism, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Physiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Patrick Seale
- Institute for Diabetes, Obesity and Metabolism, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Cell and Developmental Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Babette S. Zemel
- Division of Gastroenterology, Hepatology, and Nutrition, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Yadav Wagley
- Department of Orthopedic Surgery, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Kurt D. Hankenson
- Department of Orthopedic Surgery, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Klaus H. Kaestner
- Institute for Diabetes, Obesity and Metabolism, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Stewart A. Anderson
- Department of Child and Adolescent Psychiatry, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Matthew S. Kayser
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Chronobiology Sleep Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Andrew D. Wells
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Henry R. Kranzler
- Mental Illness Research, Education and Clinical Center, Crescenz Veterans Affairs Medical Center, Philadelphia, PA, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Rachel L. Kember
- Mental Illness Research, Education and Clinical Center, Crescenz Veterans Affairs Medical Center, Philadelphia, PA, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Struan F.A. Grant
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Institute for Diabetes, Obesity and Metabolism, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Division of Endocrinology and Diabetes, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
|
9
|
Loving R, Sullivan DK, Reese F, Rebboah E, Sakr J, Rezaie N, Liang HY, Filimban G, Kawauchi S, Oakes C, Trout D, Williams BA, MacGregor G, Wold BJ, Mortazavi A, Pachter L. Long-read sequencing transcriptome quantification with lr-kallisto. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.07.19.604364. [PMID: 39071335 PMCID: PMC11275803 DOI: 10.1101/2024.07.19.604364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
RNA abundance quantification has become routine and affordable thanks to high-throughput "short-read" technologies that provide accurate molecule counts at the gene level. Similarly accurate and affordable quantification of definitive full-length transcript isoforms has remained a stubborn challenge, despite its obvious biological significance across a wide range of problems. "Long-read" sequencing platforms now produce data types that can, in principle, drive routine definitive isoform quantification. However, some particulars of contemporary long-read data types, together with isoform complexity and genetic variation, present bioinformatic challenges. We show here, using ONT data, that fast and accurate quantification of long-read data is possible and that it is improved by exome capture. To perform quantifications, we developed lr-kallisto, which adapts the kallisto bulk and single-cell RNA-seq quantification methods for long-read technologies.
|
10
|
Tang X, Berger MF, Solit DB. Precision oncology: current and future platforms for treatment selection. Trends Cancer 2024:S2405-8033(24)00135-3. [PMID: 39030146 DOI: 10.1016/j.trecan.2024.06.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 06/20/2024] [Accepted: 06/21/2024] [Indexed: 07/21/2024]
Abstract
Genomic profiling of hundreds of cancer-associated genes is now a component of routine cancer care. DNA sequencing can identify mutations, mutational signatures, and structural alterations predictive of therapy response and assess for heritable cancer risk, but it has been less useful for identifying predictive biomarkers of sensitivity to cytotoxic chemotherapies, antibody-drug conjugates, and immunotherapies. The clinical adoption of molecular profiling platforms, such as RNA sequencing, that are better suited to identifying those patients most likely to respond to immunotherapies and drug combinations will be critical to expanding the benefits of precision oncology. This review discusses the potential advantages of innovative molecular and functional profiling platforms designed to replace or complement targeted DNA sequencing and the major hurdles to their clinical adoption.
Affiliation(s)
- Xinran Tang
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Graduate School of Medical Sciences, Weill Cornell Medicine, New York, NY 10065, USA
| | - Michael F Berger
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - David B Solit
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
|
11
|
Record CJ, Reilly MM. Lessons and pitfalls of whole genome sequencing. Pract Neurol 2024; 24:263-274. [PMID: 38548322 DOI: 10.1136/pn-2023-004083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/02/2024] [Indexed: 07/18/2024]
Abstract
Whole-genome sequencing (WGS) has recently become the first-line genetic investigation for many suspected genetic neurological disorders. While its diagnostic capabilities are innumerable, as with any test, it has its limitations. Clinicians should be aware of where WGS is extremely reliable (detecting single-nucleotide variants), where its reliability is much improved (detecting copy number variants and small repeat expansions) and where it may miss/misinterpret a variant (large repeat expansions, balanced structural variants or low heteroplasmy mitochondrial DNA variants). Bioinformatic technology and virtual gene panels are constantly evolving, and it is important to know what genes and what types of variant are being tested; the current National Health Service Genomic Medicine Service WGS offers more than early iterations of the 100 000 Genomes Project analysis. Close communication between clinician and laboratory, ideally through a multidisciplinary team meeting, is encouraged where there is diagnostic uncertainty.
Affiliation(s)
- Christopher J Record
- Centre for Neuromuscular Diseases, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
| | - Mary M Reilly
- Centre for Neuromuscular Diseases, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
|
12
|
Liu Z, Xie Z, Li M. Comprehensive and deep evaluation of structural variation detection pipelines with third-generation sequencing data. Genome Biol 2024; 25:188. [PMID: 39010145 PMCID: PMC11247875 DOI: 10.1186/s13059-024-03324-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 06/26/2024] [Indexed: 07/17/2024] Open
Abstract
BACKGROUND: Structural variation (SV) detection methods using third-generation sequencing data are widely employed, yet accurately detecting SVs remains challenging. Different methods often yield inconsistent results for certain SV types, complicating tool selection and revealing biases in detection. RESULTS: This study comprehensively evaluates 53 SV detection pipelines using simulated and real data from PacBio (CLR: Continuous Long Read, CCS: Circular Consensus Sequencing) and Nanopore (ONT) platforms. We assess their performance in detecting various sizes and types of SVs, breakpoint biases, and genotyping accuracy with various sequencing depths. Notably, pipelines such as Minimap2-cuteSV2, NGMLR-SVIM, PBMM2-pbsv, Winnowmap-Sniffles2, and Winnowmap-SVision exhibit comparatively higher recall and precision. Our findings also show that combining multiple pipelines with the same aligner, like pbmm2 or winnowmap, can significantly enhance performance. The individual pipelines' detailed ranking and performance metrics can be viewed in a dynamic table (http://pmglab.top/SVPipelinesRanking). CONCLUSIONS: This study comprehensively characterizes the strengths and weaknesses of numerous pipelines, providing valuable insights that can improve SV detection in third-generation sequencing data and inform SV annotation and function prediction.
Affiliation(s)
- Zhi Liu
- Program in Bioinformatics, Zhongshan School of Medicine, The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, China
| | - Zhi Xie
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
| | - Miaoxin Li
- Program in Bioinformatics, Zhongshan School of Medicine, The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, China
- Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China
- Department of Psychiatry, The University of Hong Kong, Hong Kong, SAR, China
- Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-Sen University, Zhuhai, China
|
13
|
Campanelli A, Pibiri GE, Fan J, Patro R. Where the patterns are: repetition-aware compression for colored de Bruijn graphs. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.07.09.602727. [PMID: 39026859 PMCID: PMC11257547 DOI: 10.1101/2024.07.09.602727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2024]
Abstract
We describe lossless compressed data structures for the colored de Bruijn graph (or c-dBG). Given a collection of reference sequences, a c-dBG can essentially be regarded as a map from k-mers to their color sets. The color set of a k-mer is the set of all identifiers, or colors, of the references that contain the k-mer. While these maps find countless applications in computational biology (e.g., basic query, read mapping, abundance estimation, etc.), their memory usage represents a serious challenge for large-scale sequence indexing. Our solutions leverage the intrinsic repetitiveness of the color sets when indexing large collections of related genomes. Hence, the described algorithms factorize the color sets into patterns that repeat across the entire collection and represent these patterns once, instead of redundantly replicating their representation as would happen if the sets were encoded as atomic lists of integers. Experimental results across a range of datasets and query workloads show that these representations substantially improve over the space effectiveness of the best previous solutions (sometimes even dramatically, yielding indexes that are smaller by an order of magnitude). Despite the space reduction, these indexes only moderately impact the efficiency of the queries compared to the fastest indexes. Software: The implementation of the indexes used for all experiments in this work is written in C++17 and is available on GitHub (https://github.com/jermp/fulgor).
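The map from k-mers to color sets described in this abstract can be illustrated with a minimal Python sketch. This is not the Fulgor implementation: the function names `build_color_map` and `deduplicate_color_sets` and the toy references are assumptions made for illustration, and the second function only stores each distinct color set once, which is a much simpler idea than the pattern factorization the authors describe.

```python
from collections import defaultdict


def build_color_map(references, k):
    """Map each k-mer to the set of reference indices (colors) containing it."""
    kmer_to_colors = defaultdict(set)
    for color, seq in enumerate(references):
        for i in range(len(seq) - k + 1):
            kmer_to_colors[seq[i:i + k]].add(color)
    return kmer_to_colors


def deduplicate_color_sets(kmer_to_colors):
    """Store every distinct color set once; each k-mer keeps only an integer id.

    Illustrative simplification: whole sets are shared, not sub-patterns.
    """
    set_ids = {}         # frozenset of colors -> id
    distinct_sets = []   # id -> sorted tuple of colors
    kmer_to_set_id = {}
    for kmer, colors in kmer_to_colors.items():
        key = frozenset(colors)
        if key not in set_ids:
            set_ids[key] = len(distinct_sets)
            distinct_sets.append(tuple(sorted(key)))
        kmer_to_set_id[kmer] = set_ids[key]
    return kmer_to_set_id, distinct_sets


# Hypothetical toy references; real collections hold whole genomes.
references = ["ACGTAC", "ACGTTT", "GGACGT"]
color_map = build_color_map(references, k=3)
kmer_to_set_id, distinct_sets = deduplicate_color_sets(color_map)
```

In this toy input, eight distinct 3-mers share only four distinct color sets, which hints at why representing repeated sets once can save space when the indexed references are closely related.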
|
14
|
Kolesnikov A, Cook D, Nattestad M, Brambrink L, McNulty B, Gorzynski J, Goenka S, Ashley EA, Jain M, Miga KH, Paten B, Chang PC, Carroll A, Shafin K. Local read haplotagging enables accurate long-read small variant calling. Nat Commun 2024; 15:5907. [PMID: 39003259 PMCID: PMC11246426 DOI: 10.1038/s41467-024-50079-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 06/28/2024] [Indexed: 07/15/2024] Open
Abstract
Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and enabled rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing platforms like Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information with long reads to improve genotyping accuracy. However, using local haplotype information creates an overhead, as variant calling needs to be performed multiple times, which ultimately makes it difficult to extend to new data types and platforms as they are introduced. In this work, we have developed a local haplotype approximation method that enables state-of-the-art variant calling performance with multiple sequencing platforms, including the PacBio Revio system and ONT R10.4 simplex and duplex data. This addition of local haplotype approximation simplifies long-read variant calling with DeepVariant.
Affiliation(s)
- Daniel Cook
- Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
| | - Brandy McNulty
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Miten Jain
- Northeastern University, Boston, MA, USA
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Pi-Chuan Chang
- Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
| | - Andrew Carroll
- Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
| | - Kishwar Shafin
- Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
|
15
|
Tay AP, Didi K, Wickramarachchi A, Bauer DC, Wilson LOW, Maselko M. Synsor: a tool for alignment-free detection of engineered DNA sequences. Front Bioeng Biotechnol 2024; 12:1375626. [PMID: 39070163 PMCID: PMC11272466 DOI: 10.3389/fbioe.2024.1375626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Accepted: 06/18/2024] [Indexed: 07/30/2024] Open
Abstract
DNA sequences of nearly any desired composition, length, and function can be synthesized to alter the biology of an organism for purposes ranging from the bioproduction of therapeutic compounds to invasive pest control. Yet despite offering many great benefits, engineered DNA poses a risk due to its possible misuse or abuse by malicious actors, or its unintentional introduction into the environment. Monitoring the presence of engineered DNA in biological or environmental systems is therefore crucial for routine and timely detection of emerging biological threats, and for improving public acceptance of genetic technologies. To address this, we developed Synsor, a tool for identifying engineered DNA sequences in high-throughput sequencing data. Synsor leverages the k-mer signature differences between naturally occurring and engineered DNA sequences and uses an artificial neural network to classify whether a DNA sequence is natural or engineered. By querying suspected sequences against the model, Synsor can identify sequences that are likely to have been engineered. Using natural plasmid and engineered vector sequences, we showed that Synsor identifies engineered DNA with >99% accuracy. We demonstrate how Synsor can be used to detect potential genetically engineered organisms and locate where engineered DNA is being introduced into the environment by analysing genomic and metagenomic data from yeast and wastewater samples, respectively. Synsor is therefore a powerful tool that will streamline the process of identifying engineered DNA in poorly characterized biological or environmental systems, thereby allowing for enhanced monitoring of emerging biological threats.
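A k-mer signature of the kind this abstract refers to can be sketched as a normalized k-mer frequency vector. The snippet below is an assumption-laden illustration, not Synsor's code: the function name `kmer_signature`, the choice of k, and the normalization are hypothetical, and the neural-network classifier that would consume such vectors is not shown.

```python
from collections import Counter
from itertools import product


def kmer_signature(seq, k=3):
    """Return k-mer frequencies over the ACGT alphabet in a fixed order.

    k-mers containing other symbols (for example N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


# Hypothetical input; a real signature would be computed per read or contig.
sig = kmer_signature("ACGTACGT", k=2)
```

Because the vector has a fixed length of 4^k regardless of sequence length, sequences of different sizes can be compared or passed to a classifier without alignment.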
Affiliation(s)
- Aidan P. Tay
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, NSW, Australia
- Applied Biosciences, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, Australia
| | - Kieran Didi
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, NSW, Australia
| | - Anuradha Wickramarachchi
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, NSW, Australia
| | - Denis C. Bauer
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, NSW, Australia
- Applied Biosciences, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, Australia
| | - Laurence O. W. Wilson
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, NSW, Australia
- Applied Biosciences, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, Australia
| | - Maciej Maselko
- Applied Biosciences, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, Australia
- Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, NSW, Australia
|
16
|
Nolan DJ, DaRoza J, Brody R, Ganta K, Luzuriaga K, Huston C, Rosenthal S, Lamers SL, Rose R. Comparing Gold-Standard Sanger Sequencing with Two Next-Generation Sequencing Platforms of HIV-1 gp160 Single Genome Amplicons. AIDS Res Hum Retroviruses 2024. [PMID: 38940749 DOI: 10.1089/aid.2024.0012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
Our goal was to assess the accuracy of next-generation sequencing (NGS) compared with Sanger sequencing. We performed single genome amplification (SGA) of HIV-1 gp160 on extracted tissue DNA from two HIV+ individuals. Amplicons (n = 30) were sequenced with Sanger or reamplified with barcoded primers and pooled before sequencing using Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PB). For each amplicon, a consensus sequence for NGS reads was obtained by (1) mapping reads to the Sanger sequence when available ("reference-based") or (2) mapping reads to a "pseudo-reference" sequence, i.e., a consensus sequence of a subset of NGS reads ("reference-free"). PB reads were clustered based on genetic similarity. A Sanger consensus sequence was obtained for 23/30 amplicons, for which all NGS consensus sequences were identical (n = 9) or nearly identical (n = 14) compared with Sanger. For the nine mismatches between Sanger/NGS, the nucleotide in the NGS sequence matched all other sequences from that patient. Of the 7/30 amplicons without a Sanger sequence, NGS sequences had ≥35 ambiguous calls in five amplicons and 0 ambiguities in two amplicons. Analysis of the electropherograms showed failure of a single sequencing primer for the latter two amplicons (consistent with a single template) and overlapping peaks for the other five (consistent with multiple templates). Clustering results closely followed the Sanger/NGS consensus results, where amplicons derived from a single template also had a single cluster and vice versa (with one exception, which could be the result of barcode misidentification). Representative sequences from the clusters contained 2-13 differences compared with Sanger/NGS. In summary, we show that both ONT and PB can produce amplicon consensus sequences with similar or higher accuracy compared with Sanger and, importantly, without the need for a known reference sequence. Clustering could be useful in some circumstances to predict or confirm the presence of multiple starting templates.
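The idea of deriving a consensus sequence with ambiguous calls from mapped reads can be sketched as a per-column majority vote. The abstract does not describe the authors' actual consensus procedure, so everything here is an assumption for illustration: the function name `consensus`, the `min_fraction` threshold, and the use of pre-aligned, equal-length reads with `-` for gaps.

```python
from collections import Counter


def consensus(aligned_reads, min_fraction=0.6):
    """Per-column majority consensus over equal-length aligned reads.

    A column is called 'N' (ambiguous) when the most common base is supported
    by less than min_fraction of the non-gap reads covering that column.
    """
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    out = []
    for column in zip(*aligned_reads):
        bases = [b for b in column if b != "-"]
        if not bases:
            out.append("-")
            continue
        base, count = Counter(bases).most_common(1)[0]
        out.append(base if count / len(bases) >= min_fraction else "N")
    return "".join(out)


# Hypothetical aligned reads; the last column is split evenly between A and T.
reads = ["ACGTA", "ACGTA", "ACCTT", "ACGTT"]
result = consensus(reads)
```

Under this sketch, a column split between two bases yields an ambiguous call, loosely analogous to how amplicons derived from multiple templates would accumulate ambiguities.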
Affiliation(s)
- Robin Brody
- Molecular Medicine, UMass Chan Medical School, Worcester, Massachusetts, USA
| | - Krishna Ganta
- Molecular Medicine, UMass Chan Medical School, Worcester, Massachusetts, USA
| | - Katherine Luzuriaga
- Molecular Medicine, UMass Chan Medical School, Worcester, Massachusetts, USA
|
17
|
Westhaeusser F, Fuhlert P, Dietrich E, Lennartz M, Khatri R, Kaiser N, Röbeck P, Bülow R, von Stillfried S, Witte A, Ladjevardi S, Drotte A, Severgardh P, Baumbach J, Puelles VG, Häggman M, Brehler M, Boor P, Walhagen P, Dragomir A, Busch C, Graefen M, Bengtsson E, Sauter G, Zimmermann M, Bonn S. Robust, credible, and interpretable AI-based histopathological prostate cancer grading. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.07.09.24310082. [PMID: 39040171 PMCID: PMC11261944 DOI: 10.1101/2024.07.09.24310082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/24/2024]
Abstract
Background: Prostate cancer (PCa) is among the most common cancers in men and its diagnosis requires the histopathological evaluation of biopsies by human experts. While several recent artificial intelligence-based (AI) approaches have reached human expert-level PCa grading, they often display significantly reduced performance on external datasets. This reduced performance can be caused by variations in sample preparation, for instance the staining protocol, section thickness, or scanner used. Another limiting factor of contemporary AI-based PCa grading is the prediction of ISUP grades, which leads to the perpetuation of human annotation errors. Methods: We developed the prostate cancer aggressiveness index (PCAI), an AI-based PCa detection and grading framework that is trained on objective patient outcome, rather than subjective ISUP grades. We designed PCAI as a clinical application, containing algorithmic modules that offer robustness to data variation, medical interpretability, and a measure of prediction confidence. To train and evaluate PCAI, we generated a multicentric, retrospective, observational trial consisting of six cohorts with 25,591 patients, 83,864 images, and 5 years of median follow-up from 5 different centers and 3 countries. This includes a high-variance dataset of 8,157 patients and 28,236 images with variations in sample thickness, staining protocol, and scanner, allowing for the systematic evaluation and optimization of model robustness to data variation. The performance of PCAI was assessed on three external test cohorts from two countries, comprising 2,255 patients and 9,437 images. Findings: Using our high-variance datasets, we show how differences in sample processing, particularly slide thickness and staining time, significantly reduce the performance of AI-based PCa grading by up to 6.2 percentage points in the concordance index (C-index). We show how a select set of algorithmic improvements, including domain adversarial training, conferred robustness to data variation, interpretability, and a measure of credibility to PCAI. These changes lead to significant prediction improvement across two biopsy cohorts and one TMA cohort, systematically exceeding expert ISUP grading in C-index and AUROC by up to 22 percentage points. Interpretation: Data variation poses serious risks for AI-based histopathological PCa grading, even when models are trained on large datasets. Algorithmic improvements for model robustness, interpretability, credibility, and training on high-variance data as well as outcome-based severity prediction give rise to robust models with above ISUP-level PCa grading performance.
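The concordance index used as the evaluation metric in this abstract can be illustrated with a short sketch of Harrell's C-index for right-censored outcomes. This is a generic textbook formulation, not the PCAI evaluation code: the function name `concordance_index` and the toy follow-up data are assumptions, and pairs with tied event times are simply skipped.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event and a
    strictly shorter follow-up time than patient j. It is concordant when
    the model assigns patient i the higher risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: follow-up in years, event flag (1 = event), model risk.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a change of a few percentage points, as reported in the abstract, reflects a shift in the share of correctly ordered patient pairs.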
Affiliation(s)
- Fabian Westhaeusser
- Institute of Medical Systems Biology, Center for Biomedical AI (bAIome), Center for Molecular Neurobiology Hamburg (ZMNH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Spearpoint Analytics AB, Stockholm, Sweden
| | - Patrick Fuhlert
- Institute of Medical Systems Biology, Center for Biomedical AI (bAIome), Center for Molecular Neurobiology Hamburg (ZMNH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Spearpoint Analytics AB, Stockholm, Sweden
| | - Esther Dietrich
- Institute of Medical Systems Biology, Center for Biomedical AI (bAIome), Center for Molecular Neurobiology Hamburg (ZMNH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Maximilian Lennartz
- Institute of Pathology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Robin Khatri
- Institute of Medical Systems Biology, Center for Biomedical AI (bAIome), Center for Molecular Neurobiology Hamburg (ZMNH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Nico Kaiser
- Institute of Medical Systems Biology, Center for Biomedical AI (bAIome), Center for Molecular Neurobiology Hamburg (ZMNH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- III. Department of Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Pontus Röbeck
- Department of Urology, Uppsala University Hospital, Uppsala, Sweden
| | - Roman Bülow
- Institute of Pathology, RWTH Aachen University Hospital, Aachen, Germany
| | - Anja Witte
- Institute of Medical Systems Biology, Center for Biomedical AI (bAIome), Center for Molecular Neurobiology Hamburg (ZMNH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Sam Ladjevardi
- Department of Urology, Uppsala University Hospital, Uppsala, Sweden
| | - Jan Baumbach
- Institute of Computational Systems Biology, University of Hamburg, Germany
| | - Victor G. Puelles
- III. Department of Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
- Department of Pathology, Aarhus University Hospital, Aarhus, Denmark
| | - Michael Häggman
- Department of Urology, Uppsala University Hospital, Uppsala, Sweden
| | - Michael Brehler
- Institute of Medical Systems Biology, Center for Biomedical AI (bAIome), Center for Molecular Neurobiology Hamburg (ZMNH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Peter Boor
- Institute of Pathology, RWTH Aachen University Hospital, Aachen, Germany
| | - Anca Dragomir
- Department of Pathology, Uppsala University Hospital and Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Christer Busch
- Spearpoint Analytics AB, Stockholm, Sweden
- Department of Urology, Uppsala University Hospital, Uppsala, Sweden
| | - Markus Graefen
- Martini-Klinik Prostate Cancer Center, University Hospital Hamburg-Eppendorf, Hamburg, Germany
| | - Ewert Bengtsson
- Spearpoint Analytics AB, Stockholm, Sweden
- Uppsala University, Department of Information Technology, Centre for Image Analysis, Uppsala, Sweden
| | - Guido Sauter
- Institute of Pathology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Marina Zimmermann
- Institute of Medical Systems Biology, Center for Biomedical AI (bAIome), Center for Molecular Neurobiology Hamburg (ZMNH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Stefan Bonn
- Institute of Medical Systems Biology, Center for Biomedical AI (bAIome), Center for Molecular Neurobiology Hamburg (ZMNH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Spearpoint Analytics AB, Stockholm, Sweden
|
18
|
Wang S, Liu Y, Tam WH, Ching JYL, Xu W, Yan S, Qin B, Lin L, Peng Y, Zhu J, Cheung CP, Ip KL, Wong YM, Cheong PK, Yeung YL, Kan WHB, Leung TF, Leung TY, Chang EB, Rubin DT, Claud EC, Wu WKK, Tun HM, Chan FKL, Ng SC, Zhang L. Maternal gestational diabetes mellitus associates with altered gut microbiome composition and head circumference abnormalities in male offspring. Cell Host Microbe 2024; 32:1192-1206.e5. [PMID: 38955186 DOI: 10.1016/j.chom.2024.06.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 05/02/2024] [Accepted: 06/05/2024] [Indexed: 07/04/2024]
Abstract
The impact of gestational diabetes mellitus (GDM) on maternal or infant microbiome trajectory remains poorly understood. Utilizing large-scale longitudinal fecal samples from 264 mother-baby dyads, we present the gut microbiome trajectory of the mothers throughout pregnancy and infants during the first year of life. GDM mothers had a distinct microbiome diversity and composition during the gestation period. GDM leaves fingerprints on the infant's gut microbiome, which are confounded by delivery mode. Further, Clostridium species positively correlate with a larger head circumference at month 12 in male offspring but not females. The gut microbiome of GDM mothers with male fetuses displays depleted gut-brain modules, including acetate synthesis I and degradation and glutamate synthesis II. The gut microbiome of female infants of GDM mothers has higher histamine degradation and dopamine degradation. Together, our integrative analysis indicates that GDM affects maternal and infant gut composition, which is associated with sexually dimorphic infant head growth.
Affiliation(s)
- Shilan Wang
- Microbiota I-Center (MagIC), Hong Kong SAR, China; Department of Medicine and Therapeutics, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Yingzhi Liu
- Microbiota I-Center (MagIC), Hong Kong SAR, China
| | - Wing Hung Tam
- Department of Obstetrics and Gynaecology, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Jessica Y L Ching
- Microbiota I-Center (MagIC), Hong Kong SAR, China; Department of Medicine and Therapeutics, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Wenye Xu
- Microbiota I-Center (MagIC), Hong Kong SAR, China; Department of Medicine and Therapeutics, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Shuai Yan
- Microbiota I-Center (MagIC), Hong Kong SAR, China
| | - Biyan Qin
- Microbiota I-Center (MagIC), Hong Kong SAR, China
| | - Ling Lin
- Microbiota I-Center (MagIC), Hong Kong SAR, China
| | - Ye Peng
- Microbiota I-Center (MagIC), Hong Kong SAR, China; JC School of Public Health and Primary Care, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China; Li Ka Shing Institute of Health Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Jie Zhu
- Microbiota I-Center (MagIC), Hong Kong SAR, China
| | - Chun Pan Cheung
- Microbiota I-Center (MagIC), Hong Kong SAR, China; Department of Medicine and Therapeutics, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Ka Long Ip
- Microbiota I-Center (MagIC), Hong Kong SAR, China; Department of Medicine and Therapeutics, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Yuen Man Wong
- Microbiota I-Center (MagIC), Hong Kong SAR, China; Department of Medicine and Therapeutics, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Pui Kuan Cheong
- Microbiota I-Center (MagIC), Hong Kong SAR, China; Department of Medicine and Therapeutics, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Yuk Ling Yeung
- Department of Obstetrics and Gynaecology, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Wing Him Betty Kan
- Department of Obstetrics and Gynaecology, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Ting Fan Leung
- Department of Paediatrics, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China; Hong Kong Hub of Paediatric Excellence, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Tak Yeung Leung
- Department of Obstetrics and Gynaecology, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Eugene B Chang
- Department of Medicine, Section of Gastroenterology, Hepatology, and Nutrition, University of Chicago, Chicago, IL 60637, USA
| | - David T Rubin
- Department of Medicine, Section of Gastroenterology, Hepatology, and Nutrition, University of Chicago, Chicago, IL 60637, USA
| | - Erika C Claud
- Departments of Pediatrics and Medicine, Pritzker School of Medicine/Biological Sciences Division, University of Chicago, Chicago, IL 60637, USA
| | - William K K Wu
- Department of Anaesthesia and Intensive Care, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Hein M Tun
- Microbiota I-Center (MagIC), Hong Kong SAR, China; JC School of Public Health and Primary Care, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China; Li Ka Shing Institute of Health Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Francis K L Chan
- Microbiota I-Center (MagIC), Hong Kong SAR, China; Centre for Gut Microbiota Research, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Siew C Ng
- Microbiota I-Center (MagIC), Hong Kong SAR, China; Department of Medicine and Therapeutics, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China; Li Ka Shing Institute of Health Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China; State Key Laboratory of Digestive Disease Institute of Digestive Disease, The Chinese University of Hong Kong, Hong Kong SAR, China.
| | - Lin Zhang
- Microbiota I-Center (MagIC), Hong Kong SAR, China; Department of Medicine and Therapeutics, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China.
| |
|
19
|
Chari T, Gorin G, Pachter L. Stochastic Modeling of Biophysical Responses to Perturbation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.07.04.602131. [PMID: 39005347 PMCID: PMC11245117 DOI: 10.1101/2024.07.04.602131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
Recent advances in high-throughput, multi-condition experiments allow for genome-wide investigation of how perturbations affect transcription and translation in the cell across multiple biological entities or modalities, from chromatin and mRNA information to protein production and spatial morphology. This presents an unprecedented opportunity to unravel how the processes of DNA and RNA regulation direct cell fate determination and disease response. Most methods designed for analyzing large-scale perturbation data focus on the observational outcomes, e.g., expression; however, many potential transcriptional mechanisms, such as transcriptional bursting or splicing dynamics, can underlie these complex and noisy observations. In this analysis, we demonstrate how a stochastic biophysical modeling approach to interpreting high-throughput perturbation data enables deeper investigation of the 'how' behind such molecular measurements. Our approach takes advantage of modalities already present in data produced with current technologies, such as nascent and mature mRNA measurements, to illuminate transcriptional dynamics induced by perturbation, predict kinetic behaviors in new perturbation settings, and uncover novel populations of cells with distinct kinetic responses to perturbation.
Affiliation(s)
- Tara Chari
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
| | | | - Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California
| |
|
20
|
Li Z, Chen F, Chen L, Liu J, Tseng D, Hadi F, Omarjee S, Kishore K, Kent J, Kirkpatrick J, D’Santos C, Lawson M, Gertz J, Sikora MJ, McDonnell DP, Carroll JS, Polyak K, Oesterreich S, Lee AV. EstroGene2.0: A multi-omic database of response to estrogens, ER-modulators, and resistance to endocrine therapies in breast cancer. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.28.601163. [PMID: 39005294 PMCID: PMC11244912 DOI: 10.1101/2024.06.28.601163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
Endocrine therapies targeting the estrogen receptor (ER/ESR1) are the cornerstone of treatment for ER-positive breast cancer patients, but resistance often limits their effectiveness. Understanding the molecular mechanisms is thus key to optimizing the existing drugs and developing new ER-modulators. Notable progress has been made, although the fragmented way data are reported has reduced their potential impact. Here, we introduce EstroGene2.0, an expansion of its precursor, version 1.0. EstroGene2.0 focuses on response and resistance to endocrine therapies in breast cancer models. Incorporating multi-omic profiling of 361 experiments from 212 studies across 28 cell lines, a user-friendly browser offers comprehensive data visualization and metadata mining capabilities (https://estrogeneii.web.app/). Taking advantage of the harmonized data collection, our follow-up meta-analysis revealed substantial diversity in response to different classes of ER-modulators including SERMs, SERDs, SERCA and LDD/PROTAC. Notably, endocrine-resistant models exhibit a spectrum of transcriptomic alterations including a contra-directional shift in ER and interferon signaling, which is recapitulated clinically. Furthermore, dissecting multiple ESR1-mutant cell models revealed the different clinical relevance of genome-edited versus ectopic overexpression model engineering and identified high-confidence mutant-ER targets, such as NPY1R. These examples demonstrate how EstroGene2.0 helps investigate breast cancer's response to endocrine therapies and explore resistance mechanisms.
Affiliation(s)
- Zheqi Li
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Fangyuan Chen
- School of Medicine, Tsinghua University, Beijing, China
- Women’s Cancer Research Center, UPMC Hillman Cancer Center, Pittsburgh PA, USA
| | - Li Chen
- Computational Biology Department, Carnegie Mellon University, Pittsburgh PA, USA
| | - Jiebin Liu
- Women’s Cancer Research Center, UPMC Hillman Cancer Center, Pittsburgh PA, USA
- Medical Scientist Training Program, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
| | - Danielle Tseng
- Women’s Cancer Research Center, UPMC Hillman Cancer Center, Pittsburgh PA, USA
| | | | - Soleilmane Omarjee
- Cancer Research UK, Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Kamal Kishore
- Cancer Research UK, Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Joshua Kent
- Cancer Research UK, Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Joanna Kirkpatrick
- Cancer Research UK, Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Clive D’Santos
- Cancer Research UK, Cambridge Institute, University of Cambridge, Cambridge, UK
| | | | - Jason Gertz
- Department of Oncological Sciences, University of Utah, Salt Lake City, UT, USA
| | - Matthew J. Sikora
- Department of Pathology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Donald P. McDonnell
- Department of Pharmacology and Cancer Biology, Duke University School of Medicine, Durham, NC, USA
| | - Jason S. Carroll
- Cancer Research UK, Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Kornelia Polyak
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Steffi Oesterreich
- Women’s Cancer Research Center, UPMC Hillman Cancer Center, Pittsburgh PA, USA
- Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh PA, USA
| | - Adrian V. Lee
- Women’s Cancer Research Center, UPMC Hillman Cancer Center, Pittsburgh PA, USA
- Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh PA, USA
- Institute for Precision Medicine, University of Pittsburgh, Pittsburgh, PA, USA
| |
|
21
|
Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024; 25:460-475. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
Affiliation(s)
- Hope A Tanudisastro
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
| | - Ira W Deveson
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
| | - Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
| | - Daniel G MacArthur
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia.
| |
|
22
|
Haque MM, Kuppusamy P, Melemedjian OK. Disruption of mitochondrial pyruvate oxidation in dorsal root ganglia drives persistent nociceptive sensitization and causes pervasive transcriptomic alterations. Pain 2024; 165:1531-1549. [PMID: 38285538 PMCID: PMC11189764 DOI: 10.1097/j.pain.0000000000003158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Revised: 10/04/2023] [Accepted: 10/18/2023] [Indexed: 01/31/2024]
Abstract
Metabolism is inextricably linked to every aspect of cellular function. In addition to energy production and biosynthesis, metabolism plays a crucial role in regulating signal transduction and gene expression. Altered metabolic states have been shown to maintain aberrant signaling and transcription, contributing to diseases like cancer, cardiovascular disease, and neurodegeneration. Metabolic gene polymorphisms and defects are also associated with chronic pain conditions, as are increased levels of nerve growth factor (NGF). However, the mechanisms by which NGF may modulate sensory neuron metabolism remain unclear. This study demonstrated that intraplantar NGF injection reprograms sensory neuron metabolism. Nerve growth factor suppressed mitochondrial pyruvate oxidation and enhanced lactate extrusion, requiring 24 hours to increase lactate dehydrogenase A and pyruvate dehydrogenase kinase 1 (PDHK1) expression. Inhibiting these metabolic enzymes reversed NGF-mediated effects. Remarkably, directly disrupting mitochondrial pyruvate oxidation induced severe, persistent allodynia, implicating this metabolic dysfunction in chronic pain. Nanopore long-read sequencing of poly(A) mRNA uncovered extensive transcriptomic changes upon metabolic disruption, including altered gene expression, splicing, and poly(A) tail lengths. By linking metabolic disturbance of dorsal root ganglia to transcriptome reprogramming, this study enhances our understanding of the mechanisms underlying persistent nociceptive sensitization. These findings imply that impaired mitochondrial pyruvate oxidation may drive chronic pain, possibly by impacting transcriptomic regulation. Exploring these metabolite-driven mechanisms further might reveal novel therapeutic targets for intractable pain.
Affiliation(s)
- Md Mamunul Haque
- Department of Neural and Pain Sciences, University of Maryland School of Dentistry, Baltimore, MD, United States
| | - Panjamurthy Kuppusamy
- Department of Neural and Pain Sciences, University of Maryland School of Dentistry, Baltimore, MD, United States
| | - Ohannes K. Melemedjian
- Department of Neural and Pain Sciences, University of Maryland School of Dentistry, Baltimore, MD, United States
- UM Center to Advance Chronic Pain Research, Baltimore, MD, United States
- UM Marlene and Stewart Greenebaum Comprehensive Cancer Center, Baltimore, MD, United States
| |
|
23
|
Wertenbroek R, Hofmeister RJ, Xenarios I, Thoma Y, Delaneau O. Improving population scale statistical phasing with whole-genome sequencing data. PLoS Genet 2024; 20:e1011092. [PMID: 38959269 PMCID: PMC11251608 DOI: 10.1371/journal.pgen.1011092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Revised: 07/16/2024] [Accepted: 06/11/2024] [Indexed: 07/05/2024] Open
Abstract
Haplotype estimation, or phasing, has gained significant traction in large-scale projects due to its valuable contributions to population genetics, variant analysis, and the creation of reference panels for imputation and phasing of new samples. To scale with the growing number of samples, haplotype estimation methods designed for population scale rely on highly optimized statistical models to phase genotype data, and usually ignore read-level information. Statistical methods excel in resolving common variants; however, they still struggle with rare variants due to the lack of statistical information. In this study we introduce SAPPHIRE, a new method that leverages whole-genome sequencing data to enhance the precision of haplotype calls produced by statistical phasing. SAPPHIRE achieves this by refining haplotype estimates through the realignment of sequencing reads, particularly targeting low-confidence phase calls. Our findings demonstrate that SAPPHIRE significantly enhances the accuracy of haplotypes obtained from state-of-the-art methods and also provides the subset of phase calls that are validated by sequencing reads. Finally, we show that our method scales to large data sets by its successful application to the extensive 3.6 petabytes of sequencing data of the last UK Biobank 200,031 sample release.
Affiliation(s)
- Rick Wertenbroek
- University of Lausanne, Lausanne, Vaud, Switzerland
- School of Engineering and Management Vaud (HEIG-VD), HES-SO University of Applied Sciences and Arts Western Switzerland, Yverdon-les-Bains, Vaud, Switzerland
| | | | | | - Yann Thoma
- School of Engineering and Management Vaud (HEIG-VD), HES-SO University of Applied Sciences and Arts Western Switzerland, Yverdon-les-Bains, Vaud, Switzerland
| | - Olivier Delaneau
- Regeneron Genetics Center, Tarrytown, New York, United States of America
| |
|
24
|
Gao Z, Lu Y, Chong Y, Li M, Hong J, Wu J, Wu D, Xi D, Deng W. Beef Cattle Genome Project: Advances in Genome Sequencing, Assembly, and Functional Genes Discovery. Int J Mol Sci 2024; 25:7147. [PMID: 39000250 PMCID: PMC11240973 DOI: 10.3390/ijms25137147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 06/23/2024] [Accepted: 06/26/2024] [Indexed: 07/16/2024] Open
Abstract
Beef is a major global source of protein, playing an essential role in the human diet. The worldwide production and consumption of beef continue to rise, reflecting a significant trend. However, despite the critical importance of beef cattle resources in agriculture, the diversity of cattle breeds faces severe challenges, with many breeds at risk of extinction. The initiation of the Beef Cattle Genome Project is crucial. By constructing a high-precision functional annotation map of their genome, it becomes possible to analyze the genetic mechanisms underlying important traits in beef cattle, laying a solid foundation for breeding more efficient and productive cattle breeds. This review details advances in genome sequencing and assembly technologies, iterative upgrades of the beef cattle reference genome, and its application in pan-genome research. Additionally, it summarizes relevant studies on the discovery of functional genes associated with key traits in beef cattle, such as growth, meat quality, reproduction, polled traits, disease resistance, and environmental adaptability. Finally, the review explores the potential of telomere-to-telomere (T2T) genome assembly, structural variations (SVs), and multi-omics techniques in future beef cattle genetic breeding. These advancements collectively offer promising avenues for enhancing beef cattle breeding and improving genetic traits.
Affiliation(s)
- Zhendong Gao
- Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
| | - Ying Lu
- Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
| | - Yuqing Chong
- Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
| | - Mengfei Li
- Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
| | - Jieyun Hong
- Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
| | - Jiao Wu
- Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
| | - Dongwang Wu
- Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
| | - Dongmei Xi
- Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
| | - Weidong Deng
- Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
- State Key Laboratory for Conservation and Utilization of Bio-Resource in Yunnan, Kunming 650201, China
| |
|
25
|
Carbonell-Sala S, Perteghella T, Lagarde J, Nishiyori H, Palumbo E, Arnan C, Takahashi H, Carninci P, Uszczynska-Ratajczak B, Guigó R. CapTrap-seq: a platform-agnostic and quantitative approach for high-fidelity full-length RNA sequencing. Nat Commun 2024; 15:5278. [PMID: 38937428 PMCID: PMC11211341 DOI: 10.1038/s41467-024-49523-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Accepted: 06/10/2024] [Indexed: 06/29/2024] Open
Abstract
Long-read RNA sequencing is essential to produce accurate and exhaustive annotation of eukaryotic genomes. Despite advancements in throughput and accuracy, achieving reliable end-to-end identification of RNA transcripts remains a challenge for long-read sequencing methods. To address this limitation, we develop CapTrap-seq, a cDNA library preparation method, which combines the Cap-trapping strategy with oligo(dT) priming to detect 5' capped, full-length transcripts. In our study, we evaluate the performance of CapTrap-seq alongside other widely used RNA-seq library preparation protocols in human and mouse tissues, employing both ONT and PacBio sequencing technologies. To explore the quantitative capabilities of CapTrap-seq and its accuracy in reconstructing full-length RNA molecules, we implement a capping strategy for synthetic RNA spike-in sequences that mimics the natural 5' cap formation. Our benchmarks, incorporating the Long-read RNA-seq Genome Annotation Assessment Project (LRGASP) data, demonstrate that CapTrap-seq is a competitive, platform-agnostic RNA library preparation method for generating full-length transcript sequences.
Affiliation(s)
- Sílvia Carbonell-Sala
- Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
| | - Tamara Perteghella
- Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
| | - Julien Lagarde
- Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Flomics Biotech, SL, Carrer de Roc Boronat 31, 08005, Barcelona, Catalonia, Spain
| | - Hiromi Nishiyori
- Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences (IMS), Yokohama, Kanagawa, Japan
| | - Emilio Palumbo
- Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
| | - Carme Arnan
- Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
| | - Hazuki Takahashi
- Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences (IMS), Yokohama, Kanagawa, Japan
| | - Piero Carninci
- Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences (IMS), Yokohama, Kanagawa, Japan
- Human Technopole, Milan, Italy
| | - Barbara Uszczynska-Ratajczak
- Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain.
- Department of Computational Biology of Noncoding RNA, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland.
| | - Roderic Guigó
- Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain.
- Universitat Pompeu Fabra, Barcelona, Catalonia, Spain.
| |
|
26
|
Liu S, Obert C, Yu YP, Zhao J, Ren BG, Liu JJ, Wiseman K, Krajacich BJ, Wang W, Metcalfe K, Smith M, Ben-Yehezkel T, Luo JH. Utility Analyses of AVITI Sequencing Chemistry. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.18.590136. [PMID: 38712138 PMCID: PMC11071311 DOI: 10.1101/2024.04.18.590136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Background: DNA sequencing is a critical tool in modern biology. Over the last two decades, it has been revolutionized by the advent of massively parallel sequencing, leading to significant advances in the genome and transcriptome sequencing of various organisms. Nevertheless, challenges with accuracy, lack of competitive options and prohibitive costs associated with high-throughput parallel short-read sequencing persist. Results: Here, we conduct a comparative analysis using matched DNA and RNA short-read assays between Element Biosciences' AVITI and Illumina's NextSeq 550 chemistries. Similar comparisons were evaluated for synthetic long-read sequencing for RNA and targeted single-cell transcripts between the AVITI and Illumina's NovaSeq 6000. For both DNA and RNA short-read applications, the study found that the AVITI produced significantly higher per-sequence quality scores. For PCR-free DNA libraries, we observed an average 89.7% lower experimentally determined error rate when using the AVITI chemistry, compared to the NextSeq 550. For short-read RNA quantification, the AVITI platform had, on average, a 32.5% lower error rate than the NextSeq 550. With regard to synthetic long-read mRNA and targeted synthetic long-read single-cell mRNA sequencing, both platforms' respective chemistries performed comparably in quantification of genes and isoforms. The AVITI displayed a marginally lower error rate for long reads, with fewer chemistry-specific errors and a higher mutation detection rate. Conclusion: These results point to the potential of the AVITI platform as a competitive candidate in high-throughput short-read sequencing analyses when juxtaposed with the Illumina NextSeq 550.
Affiliation(s)
- Silvia Liu
- Department of Pathology, University of Pittsburgh School of Medicine, United States
- High Throughput Genome Center, University of Pittsburgh School of Medicine, United States
- Pittsburgh Liver Research Center, University of Pittsburgh School of Medicine, United States
| | - Caroline Obert
- Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
| | - Yan-Ping Yu
- Department of Pathology, University of Pittsburgh School of Medicine, United States
- High Throughput Genome Center, University of Pittsburgh School of Medicine, United States
- Pittsburgh Liver Research Center, University of Pittsburgh School of Medicine, United States
| | - Junhua Zhao
- Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
| | - Bao-Guo Ren
- Department of Pathology, University of Pittsburgh School of Medicine, United States
- High Throughput Genome Center, University of Pittsburgh School of Medicine, United States
| | - Jia-Jun Liu
- Department of Pathology, University of Pittsburgh School of Medicine, United States
- High Throughput Genome Center, University of Pittsburgh School of Medicine, United States
- Pittsburgh Liver Research Center, University of Pittsburgh School of Medicine, United States
| | - Kelly Wiseman
- Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
| | - Benjamin J Krajacich
- Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
| | - Wenjia Wang
- Department of Biostatistics, University of Pittsburgh School of Public Health, United States
| | - Kyle Metcalfe
- Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
| | - Mat Smith
- Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
| | - Tuval Ben-Yehezkel
- Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
| | - Jian-Hua Luo
- Department of Pathology, University of Pittsburgh School of Medicine, United States
- High Throughput Genome Center, University of Pittsburgh School of Medicine, United States
- Pittsburgh Liver Research Center, University of Pittsburgh School of Medicine, United States
| |
|
27
|
Li C, Hong W, Reuben A, Wang L, Maitra A, Zhang J, Cheng C. TimiGP-Response: the pan-cancer immune landscape associated with response to immunotherapy. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.21.600089. [PMID: 38979334 PMCID: PMC11230183 DOI: 10.1101/2024.06.21.600089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/10/2024]
Abstract
Accumulating evidence suggests that the tumor immune microenvironment (TIME) significantly influences the response to immunotherapy, yet this complex relationship remains elusive. To address this issue, we developed TimiGP-Response (TIME Illustration based on Gene Pairing designed for immunotherapy Response), a computational framework leveraging single-cell and bulk transcriptomic data, along with response information, to construct cell-cell interaction networks associated with responders and estimate the role of immune cells in treatment response. This framework was showcased in triple-negative breast cancer treated with immune checkpoint inhibitors targeting the PD-1:PD-L1 interaction, and orthogonally validated with imaging mass cytometry. As a result, we identified CD8+ GZMB+ T cells associated with responders, and their interaction with regulatory T cells emerged as a potential feature for selecting patients who may benefit from these therapies. Subsequently, we analyzed 3,410 patients with seven cancer types (melanoma, non-small cell lung cancer, renal cell carcinoma, metastatic urothelial carcinoma, hepatocellular carcinoma, breast cancer, and esophageal cancer) treated with various immunotherapies and combination therapies, as well as several chemo- and targeted therapies as controls. Using TimiGP-Response, we depicted the pan-cancer immune landscape associated with immunotherapy response at different resolutions. At the TIME level, CD8 T cells and CD4 memory T cells were associated with responders, while anti-inflammatory (M2) macrophages and mast cells were linked to non-responders across most cancer types and datasets. Given that T cells are the primary targets of these immunotherapies and our TIME analysis highlights their importance in response to treatment, we portrayed the pan-cancer landscape on 40 T cell subtypes.
Notably, CD8+ and CD4+ GZMK+ effector memory T cells emerged as crucial across all cancer types and treatments, while IL-17-producing CD8+ T cells were top candidates associated with immunotherapy non-responders. In summary, this study provides a computational method to study the association between TIME and response across the pan-cancer immune landscape, offering resources and insights into immune cell interactions and their impact on treatment efficacy.
Affiliation(s)
- Chenyang Li
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Graduate School of Biomedical Sciences, The University of Texas MD Anderson Cancer Center UTHealth Houston, Houston, TX 77030, USA
| | - Wei Hong
- Department of Medicine, Baylor College of Medicine, Houston, TX 77030, USA
| | - Alexandre Reuben
- Graduate School of Biomedical Sciences, The University of Texas MD Anderson Cancer Center UTHealth Houston, Houston, TX 77030, USA
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Linghua Wang
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Graduate School of Biomedical Sciences, The University of Texas MD Anderson Cancer Center UTHealth Houston, Houston, TX 77030, USA
| | - Anirban Maitra
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Sheikh Ahmed Center for Pancreatic Cancer Research, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Jianjun Zhang
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Graduate School of Biomedical Sciences, The University of Texas MD Anderson Cancer Center UTHealth Houston, Houston, TX 77030, USA
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Lung Cancer Genomics Program, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Lung Cancer Interception Program, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Chao Cheng
- Department of Medicine, Baylor College of Medicine, Houston, TX 77030, USA
- Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA
- The Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX 77030, USA
| |
|
28
|
Lai J, Yang Y, Liu Y, Scharpf RB, Karchin R. Assessing the merits: an opinion on the effectiveness of simulation techniques in tumor subclonal reconstruction. BIOINFORMATICS ADVANCES 2024; 4:vbae094. [PMID: 38948008 PMCID: PMC11213631 DOI: 10.1093/bioadv/vbae094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/12/2024] [Revised: 05/28/2024] [Accepted: 06/15/2024] [Indexed: 07/02/2024]
Abstract
Summary Neoplastic tumors originate from a single cell, and their evolution can be traced through lineages characterized by mutations, copy number alterations, and structural variants. These lineages are reconstructed and mapped onto evolutionary trees with algorithmic approaches. However, without ground truth benchmark sets, the validity of an algorithm remains uncertain, limiting potential clinical applicability. With a growing number of algorithms available, there is an urgent need for standardized benchmark sets to evaluate their merits. Benchmark sets rely on in silico simulations of tumor sequence, but there are no accepted standards for simulation tools, presenting a major obstacle to progress in this field. Availability and implementation All analysis done in the paper was based on publicly available data from the publication of each accessed tool.
Affiliation(s)
- Jiaying Lai
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
| | - Yi Yang
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
| | - Yunzhou Liu
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
| | - Robert B Scharpf
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21231, United States
- Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, United States
| | - Rachel Karchin
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21231, United States
- Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, United States
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
| |
|
29
|
Deaville LA, Berrens RV. Technology to the rescue: how to uncover the role of transposable elements in preimplantation development. Biochem Soc Trans 2024; 52:1349-1362. [PMID: 38752836 DOI: 10.1042/bst20231262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2024] [Revised: 04/23/2024] [Accepted: 04/24/2024] [Indexed: 06/27/2024]
Abstract
Transposable elements (TEs) are highly expressed in preimplantation development. Preimplantation development is the phase when the cells of the early embryo undergo the first cell fate choice and change from being totipotent to pluripotent. A range of studies have advanced our understanding of TEs in preimplantation, as well as their epigenetic regulation and functional roles. However, many questions remain about the implications of TE expression during early development. Challenges originate first due to the abundance of TEs in the genome, and second because of the limited cell numbers in preimplantation. Here we review the most recent technological advancements promising to shed light onto the role of TEs in preimplantation development. We explore novel avenues to identify genomic TE insertions and improve our understanding of the regulatory mechanisms and roles of TEs and their RNA and protein products during early development.
Affiliation(s)
- Lauryn A Deaville
- Institute for Developmental and Regenerative Medicine, Oxford University, IMS-Tetsuya Nakamura Building, Old Road Campus, Roosevelt Dr, Oxford OX3 7TY, U.K
- Department of Paediatrics, Oxford University, Level 2, Children's Hospital, John Radcliffe Headington, Oxford OX3 9DU, U.K
- MRC Weatherall Institute of Molecular Medicine, Oxford University, John Radcliffe Hospital, Oxford OX3 9DS, U.K
| | - Rebecca V Berrens
- Institute for Developmental and Regenerative Medicine, Oxford University, IMS-Tetsuya Nakamura Building, Old Road Campus, Roosevelt Dr, Oxford OX3 7TY, U.K
- Department of Paediatrics, Oxford University, Level 2, Children's Hospital, John Radcliffe Headington, Oxford OX3 9DU, U.K
| |
|
30
|
Posfai A, Zhou J, McCandlish DM, Kinney JB. Gauge fixing for sequence-function relationships. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.12.593772. [PMID: 38798671 PMCID: PMC11118547 DOI: 10.1101/2024.05.12.593772] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2024]
Abstract
Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
Affiliation(s)
- Anna Posfai
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
| | - Juannan Zhou
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Department of Biology, University of Florida, Gainesville, FL, 32611
| | - David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
| | - Justin B Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
| |
|
31
|
Henglin M, Ghareghani M, Harvey W, Porubsky D, Koren S, Eichler EE, Ebert P, Marschall T. Phasing Diploid Genome Assembly Graphs with Single-Cell Strand Sequencing. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.15.580432. [PMID: 38529499 PMCID: PMC10962706 DOI: 10.1101/2024.02.15.580432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/27/2024]
Abstract
Haplotype information is crucial for biomedical and population genetics research. However, current strategies to produce de-novo haplotype-resolved assemblies often require either difficult-to-acquire parental data or an intermediate haplotype-collapsed assembly. Here, we present Graphasing, a workflow which synthesizes the global phase signal of Strand-seq with assembly graph topology to produce chromosome-scale de-novo haplotypes for diploid genomes. Graphasing readily integrates with any assembly workflow that both outputs an assembly graph and has a haplotype assembly mode. Graphasing performs comparably to trio-phasing in contiguity, phasing accuracy, and assembly quality, outperforms Hi-C in phasing accuracy, and generates human assemblies with over 18 chromosome-spanning haplotypes.
Affiliation(s)
- Mir Henglin
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Germany
| | - Maryam Ghareghani
- Department of Mathematics and Computer Science, Freie Universität Berlin, Germany
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
| | - William Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Germany
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Germany
| |
|
32
|
Kinnersley B, Sud A, Everall A, Cornish AJ, Chubb D, Culliford R, Gruber AJ, Lärkeryd A, Mitsopoulos C, Wedge D, Houlston R. Analysis of 10,478 cancer genomes identifies candidate driver genes and opportunities for precision oncology. Nat Genet 2024:10.1038/s41588-024-01785-9. [PMID: 38890488 DOI: 10.1038/s41588-024-01785-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2023] [Accepted: 05/01/2024] [Indexed: 06/20/2024]
Abstract
Tumor genomic profiling is increasingly seen as a prerequisite to guide the treatment of patients with cancer. To explore the value of whole-genome sequencing (WGS) in broadening the scope of cancers potentially amenable to a precision therapy, we analysed whole-genome sequencing data on 10,478 patients spanning 35 cancer types recruited to the UK 100,000 Genomes Project. We identified 330 candidate driver genes, including 74 that are new to any cancer. We estimate that approximately 55% of patients studied harbor at least one clinically relevant mutation, predicting either sensitivity or resistance to certain treatments or clinical trial eligibility. By performing computational chemogenomic analysis of cancer mutations we identify additional targets for compounds that represent attractive candidates for future clinical trials. This study represents one of the most comprehensive efforts thus far to identify cancer driver genes in the real world setting and assess their impact on informing precision oncology.
Affiliation(s)
- Ben Kinnersley
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- University College London Cancer Institute, University College London, London, UK
| | - Amit Sud
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, UK
| | - Andrew Everall
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
| | - Alex J Cornish
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
| | - Daniel Chubb
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
| | - Richard Culliford
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
| | - Andreas J Gruber
- Systems Biology & Biomedical Data Science Laboratory, University of Konstanz, Konstanz, Germany
| | - Adrian Lärkeryd
- Division of Molecular Pathology, The Institute of Cancer Research, London, UK
| | - Costas Mitsopoulos
- Division of Cancer Therapeutics, The Institute of Cancer Research, London, UK
| | - David Wedge
- Manchester Cancer Research Centre, University of Manchester, Manchester, UK
| | - Richard Houlston
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
| |
|
33
|
Pimentel H, Freimer JW, Arce MM, Garrido CM, Marson A, Pritchard JK. A model for accurate quantification of CRISPR effects in pooled FACS screens. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.17.599448. [PMID: 38948774 PMCID: PMC11213010 DOI: 10.1101/2024.06.17.599448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/02/2024]
Abstract
CRISPR screens are powerful tools to identify key genes that underlie biological processes. One important type of screen uses fluorescence activated cell sorting (FACS) to sort perturbed cells into bins based on the expression level of marker genes, followed by guide RNA (gRNA) sequencing. Analysis of these data presents several statistical challenges due to multiple factors including the discrete nature of the bins and typically small numbers of replicate experiments. To address these challenges, we developed a robust and powerful Bayesian random effects model and software package called Waterbear. Furthermore, we used Waterbear to explore how various experimental design parameters affect statistical power to establish principled guidelines for future screens. Finally, we experimentally validated our experimental design model findings that, when using Waterbear for analysis, high power is maintained even at low cell coverage and a high multiplicity of infection. We anticipate that Waterbear will be of broad utility for analyzing FACS-based CRISPR screens.
Affiliation(s)
- Harold Pimentel
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Departments of Computational Medicine and Human Genetics, University of California, Los Angeles, Howard Hughes Medical Institute, Los Angeles, CA 90024, USA
- These authors contributed equally
| | - Jacob W Freimer
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Gladstone-UCSF Institute of Genomic Immunology, San Francisco, CA 94158, USA
- Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, CA 94143, USA
- Present address: Genentech Research and Early Development, South San Francisco, CA
- These authors contributed equally
| | - Maya M Arce
- Gladstone-UCSF Institute of Genomic Immunology, San Francisco, CA 94158, USA
- Department of Medicine, University of California San Francisco, San Francisco, CA 94143, USA
| | - Christian M Garrido
- Gladstone-UCSF Institute of Genomic Immunology, San Francisco, CA 94158, USA
| | - Alexander Marson
- Gladstone-UCSF Institute of Genomic Immunology, San Francisco, CA 94158, USA
- Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, CA 94143, USA
- Department of Medicine, University of California San Francisco, San Francisco, CA 94143, USA
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA
- UCSF Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94158, USA
- Parker Institute for Cancer Immunotherapy, University of California San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94143, USA
- These authors jointly supervised this work
| | - Jonathan K Pritchard
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- These authors jointly supervised this work
| |
|
34
|
Martorelli I, Pooryousefi A, van Thiel H, Sicking FJ, Ramackers GJ, Merckx V, Verbeek FJ. Multiple graphical views for automatically generating SQL for the MycoDiversity DB; making fungal biodiversity studies accessible. Biodivers Data J 2024; 12:e119660. [PMID: 38933486 PMCID: PMC11199959 DOI: 10.3897/bdj.12.e119660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2024] [Accepted: 06/06/2024] [Indexed: 06/28/2024] Open
Abstract
Fungi are a highly diverse group of eukaryotic organisms that live under an extremely wide range of environmental conditions. Nowadays, there is a fundamental focus on observing how biodiversity varies on different spatial scales, in addition to understanding the environmental factors which drive fungal biodiversity. Metabarcoding is a high-throughput DNA sequencing technology that has positively contributed to observing fungal communities in environments. While the DNA sequencing data generated from metabarcoding studies are available in public archives, this valuable data resource is not directly usable for fungal biodiversity investigation. Additionally, due to its fragmented storage and distributed nature, it is not immediately accessible through a single user interface. We developed the MycoDiversity DataBase User Interface (https://mycodiversity.liacs.nl) to provide direct access and retrieval of fungal data that was previously inaccessible in the public domain. The user interface provides multiple graphical views of the data components used to reveal fungal biodiversity. These components include reliable geo-location terms, the reference taxonomic scientific names associated with fungal species and the standard features describing the environment where they occur. Direct observation of the public DNA sequencing data in association with fungi is accessible through SQL search queries created by interactively manipulating topological maps and dynamic hierarchical tree views. The search results are presented in configurable data table views that can be downloaded for further use. With the MycoDiversity DataBase User Interface, we make fungal biodiversity data accessible, assisting researchers and other stakeholders in using metabarcoding studies for assessing fungal biodiversity.
Affiliation(s)
- Irene Martorelli
- Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Leiden, Netherlands
- Naturalis Biodiversity Center, Leiden, Netherlands
| | - Aram Pooryousefi
- Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Leiden, Netherlands
| | - Haike van Thiel
- Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Leiden, Netherlands
| | - Floris J Sicking
- Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Leiden, Netherlands
| | - Guus J Ramackers
- Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Leiden, Netherlands
| | - Vincent Merckx
- Naturalis Biodiversity Center, Leiden, Netherlands
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, Netherlands
| | - Fons J Verbeek
- Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Leiden, Netherlands
| |
|
35
|
Relier S, Schiffers S, Beiki H, Oberdoerffer S. Enhanced ac4C detection in RNA via chemical reduction and cDNA synthesis with modified dNTPs. RNA (NEW YORK, N.Y.) 2024; 30:938-953. [PMID: 38697668 PMCID: PMC11182010 DOI: 10.1261/rna.079863.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/02/2023] [Accepted: 04/04/2024] [Indexed: 05/05/2024]
Abstract
The functional analysis of epitranscriptomic modifications in RNA is constrained by a lack of methods that accurately capture their locations and levels. We previously demonstrated that the RNA modification N4-acetylcytidine (ac4C) can be mapped at base resolution through sodium borohydride reduction to tetrahydroacetylcytidine (tetrahydro-ac4C), followed by cDNA synthesis to misincorporate adenosine opposite reduced ac4C sites, culminating in C:T mismatches at acetylated cytidines (RedaC:T). However, this process is relatively inefficient, resulting in <20% C:T mismatches at a fully modified ac4C site in 18S rRNA. Considering that ac4C locations in other substrates including mRNA are unlikely to reach full penetrance, this method is not ideal for comprehensive mapping. Here, we introduce "RetraC:T" (reduction to tetrahydro-ac4C and reverse transcription with amino-dATP to induce C:T mismatches) as a method with enhanced ability to detect ac4C in cellular RNA. In brief, RNA is reduced through NaBH4 or the closely related reagent sodium cyanoborohydride (NaCNBH3) followed by cDNA synthesis in the presence of a modified DNA nucleotide, 2-amino-dATP, that preferentially binds to tetrahydro-ac4C. Incorporation of the modified dNTP substantially improved C:T mismatch rates, reaching stoichiometric detection of ac4C in 18S rRNA. Importantly, 2-amino-dATP did not result in truncated cDNA products nor increase mismatches at other locations. Thus, modified dNTPs are introduced as a new addition to the toolbox for detecting ac4C at base resolution.
Affiliation(s)
- Sebastien Relier
- Laboratory of Receptor Biology and Gene Expression, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
| | - Sarah Schiffers
- Laboratory of Receptor Biology and Gene Expression, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
| | - Hamid Beiki
- Laboratory of Receptor Biology and Gene Expression, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
| | - Shalini Oberdoerffer
- Laboratory of Receptor Biology and Gene Expression, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
| |
|
36
|
Sanita Lima M, Silva Domingues D, Rossi Paschoal A, Smith DR. Long-read RNA sequencing can probe organelle genome pervasive transcription. Brief Funct Genomics 2024:elae026. [PMID: 38880995 DOI: 10.1093/bfgp/elae026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2024] [Revised: 05/20/2024] [Accepted: 05/30/2024] [Indexed: 06/18/2024] Open
Abstract
Forty years ago, organelle genomes were assumed to be streamlined and, perhaps, unexciting remnants of their prokaryotic past. However, the field of organelle genomics has exposed an unparalleled diversity in genome architecture (i.e. genome size, structure, and content). The transcription of these eccentric genomes can be just as elaborate: organelle genomes are pervasively transcribed into a plethora of RNA types. However, while organelle protein-coding genes are known to produce polycistronic transcripts that undergo heavy posttranscriptional processing, the nature of organelle noncoding transcriptomes is still poorly resolved. Here, we review how wet-lab experiments and second-generation sequencing data (i.e. short reads) have been useful to determine certain types of organelle RNAs, particularly noncoding RNAs. We then explain how third-generation (long-read) RNA-Seq data represent the new frontier in organelle transcriptomics. We show that public repositories (e.g. NCBI SRA) already contain enough data for inter-phyla comparative studies and argue that organelle biologists can benefit from such data. We discuss the prospects of using publicly available sequencing data for organelle-focused studies and examine the challenges of such an approach. We highlight that the lack of a comprehensive database dedicated to organelle genomics/transcriptomics is a major impediment to the development of a field with implications in basic and applied science.
Affiliation(s)
- Matheus Sanita Lima
- Department of Biology, Western University, 1151 Richmond Street, London, Ontario N6A 5B7, Canada
| | - Douglas Silva Domingues
- Department of Genetics, "Luiz de Queiroz" College of Agriculture, University of São Paulo, Avenida Padua Dias 11, Piracicaba, SP 13418-900, Brazil
| | - Alexandre Rossi Paschoal
- Department of Computer Science, Bioinformatics and Pattern Recognition Group (BIOINFO-CP), Federal University of Technology - Paraná - UTFPR, Avenida Alberto Carazzai 1640, Cornélio Procópio, PR 86300000, Brazil
| | - David Roy Smith
- Department of Biology, Western University, 1151 Richmond Street, London, Ontario N6A 5B7, Canada
|
37
|
Krause GR, Shands W, Wheeler TJ. Sensitive and error-tolerant annotation of protein-coding DNA with BATH. BIOINFORMATICS ADVANCES 2024; 4:vbae088. [PMID: 38966592 PMCID: PMC11223822 DOI: 10.1093/bioadv/vbae088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Revised: 05/03/2024] [Accepted: 06/10/2024] [Indexed: 07/06/2024]
Abstract
Summary: We present BATH, a tool for highly sensitive annotation of protein-coding DNA based on direct alignment of that DNA to a database of protein sequences or profile hidden Markov models (pHMMs). BATH is built on top of the HMMER3 code base, and simplifies the annotation workflow for pHMM-based translated sequence annotation by providing a straightforward input interface and easy-to-interpret output. BATH also introduces novel frameshift-aware algorithms to detect frameshift-inducing nucleotide insertions and deletions (indels). BATH matches the accuracy of HMMER3 for annotation of sequences containing no errors, and produces superior accuracy to all tested tools for annotation of sequences containing nucleotide indels. These results suggest that BATH should be used when high annotation sensitivity is required, particularly when frameshift errors are expected to interrupt protein-coding regions, as is true with long-read sequencing data and in the context of pseudogenes. Availability and implementation: The software is available at https://github.com/TravisWheelerLab/BATH.
Affiliation(s)
- Genevieve R Krause
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ 85721, United States
- Department of Computer Science, University of Montana, Missoula, MT 59812, United States
| | - Walt Shands
- Department of Computer Science, University of Montana, Missoula, MT 59812, United States
- Genomics Institute, UC Santa Cruz, Santa Cruz, CA 95060, United States
| | - Travis J Wheeler
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ 85721, United States
- Department of Computer Science, University of Montana, Missoula, MT 59812, United States
|
38
|
Dobner J, Nguyen T, Pavez-Giani MG, Cyganek L, Distelmaier F, Krutmann J, Prigione A, Rossi A. mtDNA analysis using Mitopore. Mol Ther Methods Clin Dev 2024; 32:101231. [PMID: 38572068 PMCID: PMC10988129 DOI: 10.1016/j.omtm.2024.101231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Accepted: 03/08/2024] [Indexed: 04/05/2024]
Abstract
Mitochondrial DNA (mtDNA) analysis is crucial for the diagnosis of mitochondrial disorders, forensic investigations, and basic research. Existing pipelines are complex, expensive, and require specialized personnel. In many cases, including the diagnosis of detrimental single nucleotide variants (SNVs), mtDNA analysis is still carried out using Sanger sequencing. Here, we developed a simple workflow and a publicly available webserver named Mitopore that allows the detection of mtDNA SNVs, indels, and haplogroups. To simplify mtDNA analysis, we tailored our workflow to process noisy long-read sequencing data, focusing on sequence alignment and parameter optimization. We implemented Mitopore with eliBQ (eliminate bad quality reads), an innovative quality enhancement that increases per-base quality by over 20% for low-quality data. The whole Mitopore workflow and webserver were validated using patient-derived and induced pluripotent stem cells harboring mtDNA mutations. Mitopore streamlines mtDNA analysis as an easy-to-use, fast, reliable, and cost-effective analysis method for both long- and short-read sequencing data. This significantly enhances the accessibility of mtDNA analysis and reduces the cost per sample, contributing to the progress of mtDNA-related research and diagnosis.
Affiliation(s)
- Jochen Dobner
- Institut für Umweltmedizinische Forschung (IUF)-Leibniz Research Institute for Environmental Medicine, 40225 Düsseldorf, Germany
| | - Thach Nguyen
- Institut für Umweltmedizinische Forschung (IUF)-Leibniz Research Institute for Environmental Medicine, 40225 Düsseldorf, Germany
| | - Mario Gustavo Pavez-Giani
- Clinic for Cardiology and Pneumology, University Medical Center Göttingen, 37075 Göttingen, Germany
- DZHK (German Center for Cardiovascular Research), Partner Site Göttingen, 37075 Göttingen, Germany
| | - Lukas Cyganek
- Clinic for Cardiology and Pneumology, University Medical Center Göttingen, 37075 Göttingen, Germany
- DZHK (German Center for Cardiovascular Research), Partner Site Göttingen, 37075 Göttingen, Germany
- Cluster of Excellence “Multiscale Bioimaging: from Molecular Machines to Networks of Excitable Cells” (MBExC), University of Göttingen, 37075 Göttingen, Germany
| | - Felix Distelmaier
- Department of General Pediatrics, Neonatology and Pediatric Cardiology, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Jean Krutmann
- Institut für Umweltmedizinische Forschung (IUF)-Leibniz Research Institute for Environmental Medicine, 40225 Düsseldorf, Germany
- Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Alessandro Prigione
- Department of General Pediatrics, Neonatology and Pediatric Cardiology, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Andrea Rossi
- Institut für Umweltmedizinische Forschung (IUF)-Leibniz Research Institute for Environmental Medicine, 40225 Düsseldorf, Germany
|
39
|
Zou J, Li Z, Carleton N, Oesterreich S, Lee AV, Tseng GC. Mutual information for detecting multi-class biomarkers when integrating multiple bulk or single-cell transcriptomic studies. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.11.598484. [PMID: 38915481 PMCID: PMC11195192 DOI: 10.1101/2024.06.11.598484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
Motivation: Biomarker detection plays a pivotal role in biomedical research. Integrating omics studies from multiple cohorts can enhance statistical power, accuracy and robustness of the detection results. However, existing methods for horizontally combining omics studies are mostly designed for two-class scenarios (e.g., cases versus controls) and are not directly applicable to studies with a multi-class design (e.g., samples from multiple disease subtypes, treatments, tissues, or cell types). Results: We propose a statistical framework, namely Mutual Information Concordance Analysis (MICA), to detect biomarkers with concordant multi-class expression patterns across multiple omics studies from an information theoretic perspective. Our approach first detects biomarkers with concordant multi-class patterns across some or all of the omics studies using a global test by mutual information. A post hoc analysis is then performed for each detected biomarker to identify the studies with a concordant pattern. Extensive simulations demonstrate improved accuracy and successful false discovery rate control of MICA compared to an existing MCC method. The method is then applied to two practical scenarios: four tissues of mouse metabolism-related transcriptomic studies, and three sources of estrogen treatment expression profiles. Biomarkers detected by MICA show intriguing biological insights and functional annotations. Additionally, we implemented MICA for single-cell RNA-Seq data for tumor progression biomarkers, highlighting critical roles of ribosomal function in the tumor microenvironment of triple-negative breast cancer and underscoring the potential of MICA for detecting novel therapeutic targets. Availability: https://github.com/jianzou75/MICA.
Affiliation(s)
- Jian Zou
- Department of Statistics, School of Public Health, Chongqing Medical University, Chongqing, 400016, Chongqing, China
| | - Zheqi Li
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, 02215, Massachusetts, USA
- Department of Medicine, Harvard Medical School, Boston, 02215, Massachusetts, USA
| | - Neil Carleton
- Women’s Cancer Research Center, UPMC Hillman Cancer Center (HCC), Pittsburgh, 15232, Pennsylvania, USA
- Magee-Womens Research Institute, Pittsburgh, 15213, Pennsylvania, USA
- Medical Scientist Training Program, School of Medicine, University of Pittsburgh, Pittsburgh, 15213, Pennsylvania, USA
| | - Steffi Oesterreich
- Women’s Cancer Research Center, UPMC Hillman Cancer Center (HCC), Pittsburgh, 15232, Pennsylvania, USA
- Magee-Womens Research Institute, Pittsburgh, 15213, Pennsylvania, USA
- Department of Pharmacology & Chemical Biology, University of Pittsburgh, Pittsburgh, 15213, Pennsylvania, USA
| | - Adrian V. Lee
- Women’s Cancer Research Center, UPMC Hillman Cancer Center (HCC), Pittsburgh, 15232, Pennsylvania, USA
- Magee-Womens Research Institute, Pittsburgh, 15213, Pennsylvania, USA
- Department of Pharmacology & Chemical Biology, University of Pittsburgh, Pittsburgh, 15213, Pennsylvania, USA
| | - George C. Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, 15213, Pennsylvania, USA
|
40
|
Pan C, Reinert K. Leaf: an ultrafast filter for population-scale long-read SV detection. Genome Biol 2024; 25:155. [PMID: 38872200 PMCID: PMC11170821 DOI: 10.1186/s13059-024-03297-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Accepted: 06/04/2024] [Indexed: 06/15/2024] Open
Abstract
Advances in sequencing technology have facilitated population-scale long-read structural variant (SV) detection. Arguably, one of the main challenges in population-scale analysis is developing effective computational pipelines. Here, we present a new filter-based pipeline for population-scale long-read SV detection. It better captures SV signals at an early stage than conventional assembly-based or alignment-based pipelines. Assessments in this work suggest that the filter-based pipeline helps better resolve intra-read rearrangements. Moreover, it is also more computationally efficient than conventional pipelines and thus may facilitate population-scale long-read applications.
Affiliation(s)
- Chenxu Pan
- Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany
| | - Knut Reinert
- Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, 14195, Germany
|
41
|
Margalit S, Tulpová Z, Detinis Zur T, Michaeli Y, Deek J, Nifker G, Haldar R, Gnatek Y, Omer D, Dekel B, Feldman HB, Grunwald A, Ebenstein Y. Long-Read Structural and Epigenetic Profiling of a Kidney Tumor-Matched Sample with Nanopore Sequencing and Optical Genome Mapping. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.31.587463. [PMID: 38915648 PMCID: PMC11195078 DOI: 10.1101/2024.03.31.587463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
Carcinogenesis often involves significant alterations in the cancer genome architecture, marked by large structural and copy number variations (SVs and CNVs) that are difficult to capture with short-read sequencing. Traditionally, cytogenetic techniques are applied to detect such aberrations, but they are limited in resolution and do not cover features smaller than several hundred kilobases. Optical genome mapping and nanopore sequencing are attractive technologies that bridge this resolution gap and offer enhanced performance for cytogenetic applications. These methods profile native, individual DNA molecules, thus capturing epigenetic information. We applied both techniques to characterize a clear cell renal cell carcinoma (ccRCC) tumor's structural and copy number landscape, highlighting the relative strengths of each method in the context of variant size and average read length. Additionally, we assessed their utility for methylome and hydroxymethylome profiling, emphasizing differences in epigenetic analysis applicability.
Affiliation(s)
- Sapir Margalit
- Department of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- Department of Biomedical Engineering, Tel Aviv University, 6997801 Tel Aviv, Israel
| | - Zuzana Tulpová
- Department of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- Department of Biomedical Engineering, Tel Aviv University, 6997801 Tel Aviv, Israel
- Institute of Experimental Botany of the Czech Academy of Sciences, Olomouc, Czech Republic
| | - Tahir Detinis Zur
- Department of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- Department of Biomedical Engineering, Tel Aviv University, 6997801 Tel Aviv, Israel
| | - Yael Michaeli
- Department of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- Department of Biomedical Engineering, Tel Aviv University, 6997801 Tel Aviv, Israel
| | - Jasline Deek
- Department of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- Department of Biomedical Engineering, Tel Aviv University, 6997801 Tel Aviv, Israel
| | - Gil Nifker
- Department of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- Department of Biomedical Engineering, Tel Aviv University, 6997801 Tel Aviv, Israel
| | - Rita Haldar
- Department of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- Department of Biomedical Engineering, Tel Aviv University, 6997801 Tel Aviv, Israel
| | - Yehudit Gnatek
- Pediatric Stem Cell Research Institute, Edmond and Lily Safra Children’s Hospital, Sheba Medical Center, 52621 Ramat Gan, Israel
| | - Dorit Omer
- Pediatric Stem Cell Research Institute, Edmond and Lily Safra Children’s Hospital, Sheba Medical Center, 52621 Ramat Gan, Israel
| | - Benjamin Dekel
- Pediatric Stem Cell Research Institute, Edmond and Lily Safra Children’s Hospital, Sheba Medical Center, 52621 Ramat Gan, Israel
- Pediatric Nephrology Unit, The Edmond and Lily Safra Children’s Hospital, Sheba Medical Center, 52621 Ramat Gan, Israel
- School of Medicine, Faculty of Medical and Health Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
| | - Hagit Baris Feldman
- School of Medicine, Faculty of Medical and Health Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- The Genetics Institute and Genomics Center, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
| | - Assaf Grunwald
- Department of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- Department of Biomedical Engineering, Tel Aviv University, 6997801 Tel Aviv, Israel
| | - Yuval Ebenstein
- Department of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- Department of Biomedical Engineering, Tel Aviv University, 6997801 Tel Aviv, Israel
|
42
|
Grigorev K, Nelson TM, Overbey EG, Houerbi N, Kim J, Najjar D, Damle N, Afshin EE, Ryon KA, Thierry-Mieg J, Thierry-Mieg D, Melnick AM, Mateus J, Mason CE. Direct RNA sequencing of astronaut blood reveals spaceflight-associated m6A increases and hematopoietic transcriptional responses. Nat Commun 2024; 15:4950. [PMID: 38862496 PMCID: PMC11166648 DOI: 10.1038/s41467-024-48929-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 05/17/2024] [Indexed: 06/13/2024] Open
Abstract
The advent of civilian spaceflight challenges scientists to precisely describe the effects of spaceflight on human physiology, particularly at the molecular and cellular level. Newer, nanopore-based sequencing technologies can quantitatively map changes in chemical structure and expression at single molecule resolution across entire isoforms. We perform long-read, direct RNA nanopore sequencing, as well as Ultima high-coverage RNA-sequencing, of whole blood sampled longitudinally from four SpaceX Inspiration4 astronauts at seven timepoints, spanning pre-flight, day of return, and post-flight recovery. We report key genetic pathways affected by spaceflight, including erythrocyte regulation, stress induction, and immune changes. We also present the first m6A methylation profiles for a human space mission, suggesting a significant spike in m6A levels immediately post-flight. These data and results represent the first longitudinal long-read RNA profiles and RNA modification maps for each gene for astronauts, improving our understanding of the human transcriptome's dynamic response to spaceflight.
Affiliation(s)
- Kirill Grigorev
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
| | - Theodore M Nelson
- Department of Microbiology and Immunology, Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, NY, 10032, USA
| | - Eliah G Overbey
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- Center for STEM, University of Austin, Austin, TX, USA
- BioAstra, Inc, New York, NY, USA
| | - Nadia Houerbi
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
| | - JangKeun Kim
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
| | - Deena Najjar
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
| | - Namita Damle
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
| | - Evan E Afshin
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
| | - Krista A Ryon
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
| | - Jean Thierry-Mieg
- National Center for Biotechnology Information (NCBI), National Library of Medicine, NIH, Bethesda, MD, 20894, USA
| | - Danielle Thierry-Mieg
- National Center for Biotechnology Information (NCBI), National Library of Medicine, NIH, Bethesda, MD, 20894, USA
| | - Ari M Melnick
- Department of Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
| | - Jaime Mateus
- Space Exploration Technologies Corporation (SpaceX), Hawthorne, CA, USA
| | - Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- WorldQuant Initiative for Quantitative Prediction, New York, NY, USA
|
43
|
Tiberi S, Meili J, Cai P, Soneson C, He D, Sarkar H, Avalos-Pacheco A, Patro R, Robinson MD. DifferentialRegulation: a Bayesian hierarchical approach to identify differentially regulated genes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.08.17.553679. [PMID: 37645841 PMCID: PMC10462127 DOI: 10.1101/2023.08.17.553679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
Motivation: Although transcriptomics data is typically used to analyse mature spliced mRNA, recent attention has focused on jointly investigating spliced and unspliced (or precursor-) mRNA, which can be used to study gene regulation and changes in gene expression production. Nonetheless, most methods for spliced/unspliced inference (such as RNA velocity tools) focus on individual samples, and rarely allow comparisons between groups of samples (e.g., healthy vs. diseased). Furthermore, this kind of inference is challenging, because spliced and unspliced mRNA abundance is characterized by a high degree of quantification uncertainty, due to the prevalence of multi-mapping reads, i.e., reads compatible with multiple transcripts (or genes), and/or with both their spliced and unspliced versions. Results: Here, we present DifferentialRegulation, a Bayesian hierarchical method to discover changes between experimental conditions with respect to the relative abundance of unspliced mRNA (over the total mRNA). We model the quantification uncertainty via a latent variable approach, where reads are allocated to their gene/transcript of origin, and to the respective splice version. We designed several benchmarks where our approach shows good performance, in terms of sensitivity and error control, versus state-of-the-art competitors. Importantly, our tool is flexible, and works with both bulk and single-cell RNA-sequencing data. Availability and implementation: DifferentialRegulation is distributed as a Bioconductor R package.
Affiliation(s)
- Simone Tiberi
- Department of Statistical Sciences, University of Bologna, Bologna, Italy
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
| | - Joël Meili
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
| | - Peiying Cai
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
| | - Charlotte Soneson
- Computational Biology Platform, Friedrich Miescher Institute for Biomedical Research and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Dongze He
- Department of Cell Biology and Molecular Genetics, University of Maryland, MD, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, MD, USA
| | - Hirak Sarkar
- Department of Computer Science, Princeton University, NJ, USA
| | - Alejandra Avalos-Pacheco
- Research Unit of Applied Statistics, TU Wien, Vienna, Austria
- Harvard-MIT Center for Regulatory Science, Harvard Medical School, Boston, MA, USA
| | - Rob Patro
- Department of Computer Science, University of Maryland, MD, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, MD, USA
| | - Mark D Robinson
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
|
44
|
Hernandez SI, Berezin CT, Miller KM, Peccoud SJ, Peccoud J. Sequencing Strategy to Ensure Accurate Plasmid Assembly. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.25.586694. [PMID: 38585828 PMCID: PMC10996661 DOI: 10.1101/2024.03.25.586694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Despite the wide use of plasmids in research and clinical production, the need to verify plasmid sequences is a bottleneck that is too often underestimated in the manufacturing process. Although sequencing platforms continue to improve, the method and assembly pipeline chosen still influence the final plasmid assembly sequence. Furthermore, few dedicated tools exist for plasmid assembly, especially for de novo assembly. Here, we evaluated short-read, long-read, and hybrid (both short and long reads) de novo assembly pipelines across three replicates of a 24-plasmid library. Consistent with previous characterizations of each sequencing technology, short-read assemblies had issues resolving GC-rich regions, and long-read assemblies commonly had small insertions and deletions, especially in repetitive regions. The hybrid approach facilitated the most accurate, consistent assembly generation and identified mutations relative to the reference sequence. Although Sanger sequencing can be used to verify specific regions, some GC-rich and repetitive regions were difficult to resolve using any method, suggesting that easily sequenced genetic parts should be prioritized in the design of new genetic constructs.
Affiliation(s)
- Sarah I. Hernandez
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, 80523, United States of America
| | - Casey-Tyler Berezin
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, 80523, United States of America
| | - Katie M. Miller
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, 80523, United States of America
| | - Samuel J. Peccoud
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, 80523, United States of America
| | - Jean Peccoud
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, 80523, United States of America
|
45
|
Li W, Miller D, Liu X, Tosi L, Chkaiban L, Mei H, Hung PH, Parekkadan B, Sherlock G, Levy S. Arrayed in vivo barcoding for multiplexed sequence verification of plasmid DNA and demultiplexing of pooled libraries. Nucleic Acids Res 2024; 52:e47. [PMID: 38709890 PMCID: PMC11162764 DOI: 10.1093/nar/gkae332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 02/23/2024] [Accepted: 04/16/2024] [Indexed: 05/08/2024] Open
Abstract
Sequence verification of plasmid DNA is critical for many cloning and molecular biology workflows. To leverage high-throughput sequencing, several methods have been developed that add a unique DNA barcode to individual samples prior to pooling and sequencing. However, these methods require an individual plasmid extraction and/or in vitro barcoding reaction for each sample processed, limiting throughput and adding cost. Here, we develop an arrayed in vivo plasmid barcoding platform that enables pooled plasmid extraction and library preparation for Oxford Nanopore sequencing. This method has a high accuracy and recovery rate, and greatly increases throughput and reduces cost relative to other plasmid barcoding methods or Sanger sequencing. We use in vivo barcoding to sequence verify >45 000 plasmids and show that the method can be used to transform error-containing dispersed plasmid pools into sequence-perfect arrays or well-balanced pools. In vivo barcoding does not require any specialized equipment beyond a low-overhead Oxford Nanopore sequencer, enabling most labs to flexibly process hundreds to thousands of plasmids in parallel.
Affiliation(s)
- Weiyi Li
- SLAC National Accelerator Laboratory, Stanford University, Stanford, CA, USA
| | - Darach Miller
- SLAC National Accelerator Laboratory, Stanford University, Stanford, CA, USA
| | - Xianan Liu
- SLAC National Accelerator Laboratory, Stanford University, Stanford, CA, USA
| | - Lorenzo Tosi
- Department of Biomedical Engineering, Rutgers University, Piscataway, NJ, USA
| | - Lamia Chkaiban
- Department of Biomedical Engineering, Rutgers University, Piscataway, NJ, USA
| | - Han Mei
- SLAC National Accelerator Laboratory, Stanford University, Stanford, CA, USA
| | - Po-Hsiang Hung
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
| | - Biju Parekkadan
- Department of Biomedical Engineering, Rutgers University, Piscataway, NJ, USA
| | - Gavin Sherlock
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
| | - Sasha F Levy
- SLAC National Accelerator Laboratory, Stanford University, Stanford, CA, USA
|
46
|
Shelton WJ, Zandpazandi S, Nix JS, Gokden M, Bauer M, Ryan KR, Wardell CP, Vaske OM, Rodriguez A. Long-read sequencing for brain tumors. Front Oncol 2024; 14:1395985. [PMID: 38915364 PMCID: PMC11194609 DOI: 10.3389/fonc.2024.1395985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 05/27/2024] [Indexed: 06/26/2024] Open
Abstract
Brain tumors and genomics have a long-standing history, given that glioblastoma was the first cancer studied by The Cancer Genome Atlas. The numerous and continuous advances in sequencing technologies through the decades have aided the advanced molecular characterization of brain tumors for diagnosis, prognosis, and treatment. Since the implementation of molecular biomarkers by the WHO CNS in 2016, the genomics of brain tumors has been integrated into diagnostic criteria. Long-read sequencing, also known as third-generation sequencing, is an emerging technique that allows for the sequencing of longer DNA segments, leading to improved detection of structural variants and epigenetics. These capabilities are opening a way for better characterization of brain tumors. Here, we present a comprehensive summary of the state of the art of third-generation sequencing as applied to brain tumor diagnosis, prognosis, and treatment. We discuss the advantages and potential new implementations of long-read sequencing in clinical paradigms for neuro-oncology patients.
Affiliation(s)
- William J. Shelton
- Department of Neurosurgery, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, United States
| | - Sara Zandpazandi
- Department of Neurosurgery, Medical University of South Carolina, Charleston, SC, United States
| | - J Stephen Nix
- Department of Pathology, University of Arkansas for Medical Sciences, Little Rock, AR, United States
| | - Murat Gokden
- Department of Pathology, University of Arkansas for Medical Sciences, Little Rock, AR, United States
| | - Michael Bauer
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, United States
| | - Katie Rose Ryan
- Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR, United States
| | - Christopher P. Wardell
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, United States
| | - Olena Morozova Vaske
- Department of Molecular, Cell and Developmental Biology, University of California Santa Cruz, Santa Cruz, CA, United States
| | - Analiz Rodriguez
- Department of Neurosurgery, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, United States
|
47
|
Raabe K, Sun L, Schindfessel C, Honys D, Geelen D. A word of caution: T-DNA-associated mutagenesis in plant reproduction research. J Exp Bot 2024; 75:3248-3258. [PMID: 38477707 DOI: 10.1093/jxb/erae114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 03/12/2024] [Indexed: 03/14/2024]
Abstract
T-DNA transformation is prevalent in Arabidopsis research and has expanded to a broad range of crops and model plants. While major progress has been made in optimizing the Agrobacterium-mediated transformation process for various species, a variety of pitfalls associated with T-DNA insertion may lead to the misinterpretation of T-DNA mutant analysis. Indeed, secondary mutagenesis either at the integration site or elsewhere in the genome, together with epigenetic interactions between T-DNA inserts or frequent genomic rearrangements, can be difficult to distinguish from the effect of knocking out the gene of interest. This is mainly the case for genomic rearrangements that become balanced in filial generations without consequential phenotypic defects, which can be particularly confusing for studies that investigate fertility and gametogenesis. As a cautionary note to the plant research community studying gametogenesis, we here provide an overview of the consequences of T-DNA-induced secondary mutagenesis, with emphasis on the effect of genomic imbalance on gametogenesis. Additionally, we present a simple guideline for evaluating T-DNA-mutagenized transgenic lines to decrease the risk of faulty analysis with minimal experimental effort.
Affiliation(s)
- Karel Raabe
- Laboratory of Pollen Biology, Institute of Experimental Botany of the Czech Academy of Sciences, Rozvojová 263, 165 02 Prague 6, Czech Republic
- Department of Experimental Plant Biology, Faculty of Science, Charles University, Viničná 5, 128 44 Prague 2, Czech Republic
| | - Limin Sun
- Horticell, Department of Plants and Crops, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
| | - Cédric Schindfessel
- Horticell, Department of Plants and Crops, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
| | - David Honys
- Laboratory of Pollen Biology, Institute of Experimental Botany of the Czech Academy of Sciences, Rozvojová 263, 165 02 Prague 6, Czech Republic
| | - Danny Geelen
- Horticell, Department of Plants and Crops, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
|
48
|
Audo I, Nassisi M, Zeitz C, Sahel JA. The Extraordinary Phenotypic and Genetic Variability of Retinal and Macular Degenerations: The Relevance to Therapeutic Developments. Cold Spring Harb Perspect Med 2024; 14:a041652. [PMID: 37604589 PMCID: PMC11146306 DOI: 10.1101/cshperspect.a041652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/23/2023]
Abstract
Inherited retinal diseases (IRDs) are a clinically and genetically heterogeneous group of rare conditions leading to various degrees of visual handicap and, in more severe cases, to progressive blindness. Besides visual rehabilitation and educational and socio-professional support, therapeutic options are currently limited, but the approval of the first gene therapy product for RPE65-related IRDs raised hope for therapeutic innovations. Such developments face obstacles intrinsic to the disease and the affected tissue, including the extreme phenotypic and genetic variability of IRDs and the fine tuning of visual processing through the complex architecture of the postmitotic neural retina. A precise phenotypic characterization is required prior to genetic testing, which now relies on high-throughput sequencing. These challenges are discussed in this article, along with their implications for clinical trial design.
Affiliation(s)
- Isabelle Audo
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris 75012, France
- Centre Hospitalier National d'Ophtalmologie des Quinze-Vingts, National Rare Disease Center REFERET and INSERM-DGOS CIC 1423, Paris F-75012, France
| | - Marco Nassisi
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris 75012, France
- Department of Clinical Sciences and Community Health, University of Milan, Milan 20122, Italy
- Ophthalmology Unit, Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico di Milano, Milan 20122, Italy
| | - Christina Zeitz
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris 75012, France
| | - José-Alain Sahel
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris 75012, France
- Centre Hospitalier National d'Ophtalmologie des Quinze-Vingts, National Rare Disease Center REFERET and INSERM-DGOS CIC 1423, Paris F-75012, France
- Department of Ophthalmology, University of Pittsburgh Medical School, Pittsburgh, Pennsylvania 15213, USA
|
49
|
Wu CH, Zhou X, Chen M. The curses of performing differential expression analysis using single-cell data. bioRxiv 2024:2024.05.28.596315. [PMID: 38853843 PMCID: PMC11160624 DOI: 10.1101/2024.05.28.596315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Differential expression analysis is pivotal in single-cell transcriptomics for unraveling cell-type-specific responses to stimuli. While numerous methods are available to identify differentially expressed genes in single-cell data, recent evaluations of both single-cell-specific methods and methods adapted from bulk studies have revealed significant shortcomings in performance. In this paper, we dissect the four major challenges in single-cell DE analysis: normalization, excessive zeros, donor effects, and cumulative biases. These "curses" underscore the limitations and conceptual pitfalls in existing workflows. In response, we introduce a novel paradigm addressing several of these issues.
|
50
|
Yao B, Hsu C, Goldner G, Michaeli Y, Ebenstein Y, Listgarten J. Effective training of nanopore callers for epigenetic marks with limited labelled data. Open Biol 2024; 14:230449. [PMID: 38862018 DOI: 10.1098/rsob.230449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 03/04/2024] [Indexed: 06/13/2024] Open
Abstract
Nanopore sequencing platforms combined with supervised machine learning (ML) have been effective at detecting base modifications in DNA such as 5-methylcytosine (5mC) and N6-methyladenine (6mA). These ML-based nanopore callers have typically been trained on data that span all modifications on all possible DNA k-mer backgrounds, that is, a complete training dataset. However, as nanopore technology is pushed to more and more epigenetic modifications, such complete training data will not be feasible to obtain. Nanopore calling has historically been performed with hidden Markov models (HMMs) that cannot make successful calls for k-mer contexts not seen during training because of their independent emission distributions. However, deep neural networks (DNNs), which share parameters across contexts, are increasingly being used as callers, often outperforming their HMM cousins. It stands to reason that a DNN approach should be able to better generalize to unseen k-mer contexts. Indeed, herein we demonstrate that a common DNN approach (DeepSignal) outperforms a common HMM approach (Nanopolish) in the incomplete data setting. Furthermore, we propose a novel hybrid HMM-DNN approach, amortized-HMM, that outperforms both the pure HMM and DNN approaches on 5mC calling when the training data are incomplete. This type of approach is expected to be useful for calling other base modifications such as 5-hydroxymethylcytosine and for the simultaneous calling of different modifications, settings in which complete training data are not likely to be available.
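The generalization argument in this abstract can be illustrated with a toy sketch. It is not Nanopolish, DeepSignal, or the amortized-HMM: a per-k-mer lookup table stands in for independent emission distributions, and a small linear model with one weight per (position, base) stands in for a parameter-sharing network. The additive, noise-free signal model and all names (`TRUE_EFFECT`, `fit_table`, `fit_shared`, the held-out k-mers) are assumptions made purely for illustration.

```python
from itertools import product

BASES = "ACGT"
K = 3

# Hypothetical ground truth: each (position, base) pair adds a fixed amount
# to the mean signal level of a k-mer. Real pore signals are not this simple.
TRUE_EFFECT = {(pos, base): 10.0 * pos + 1.5 * BASES.index(base)
               for pos in range(K) for base in BASES}

def true_level(kmer):
    """Noise-free mean signal level of a k-mer under the toy model."""
    return sum(TRUE_EFFECT[(pos, base)] for pos, base in enumerate(kmer))

def one_hot(kmer):
    """Encode a k-mer as K concatenated one-hot blocks, one per position."""
    vec = [0.0] * (K * len(BASES))
    for pos, base in enumerate(kmer):
        vec[pos * len(BASES) + BASES.index(base)] = 1.0
    return vec

def fit_table(train_kmers):
    """Independent emissions: a separately stored mean for each seen k-mer."""
    return {kmer: true_level(kmer) for kmer in train_kmers}

def fit_shared(train_kmers, steps=500, lr=0.5):
    """Shared parameters: one weight per (position, base), fit by
    full-batch gradient descent on the mean squared error."""
    xs = [one_hot(k) for k in train_kmers]
    ys = [true_level(k) for k in train_kmers]
    w = [0.0] * (K * len(BASES))
    n = len(xs)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w

def predict_shared(w, kmer):
    return sum(wi * xi for wi, xi in zip(w, one_hot(kmer)))

# Incomplete training data: three k-mer contexts are never observed.
all_kmers = ["".join(p) for p in product(BASES, repeat=K)]
held_out = {"ACG", "TTA", "GCC"}
train = [k for k in all_kmers if k not in held_out]

table = fit_table(train)
weights = fit_shared(train)

for kmer in sorted(held_out):
    # The table has no entry for an unseen context (None), while the shared
    # weights still yield a prediction that can be compared with the truth.
    print(kmer, table.get(kmer), round(predict_shared(weights, kmer), 3), true_level(kmer))
```

In this sketch the lookup table simply has nothing to say about the held-out contexts, whereas the shared weights recover their levels because every (position, base) pair still appears in some training k-mer. That only holds because the toy signal is exactly additive; with real data, how well a shared-parameter model extrapolates is an empirical question.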
Affiliation(s)
- Brian Yao
- Department of Electrical Engineering & Computer Sciences, University of California, Berkeley, CA 94720, USA
| | - Chloe Hsu
- Department of Electrical Engineering & Computer Sciences, University of California, Berkeley, CA 94720, USA
| | - Gal Goldner
- Department of Chemical Physics, Tel Aviv University, Tel Aviv-Yafo, Israel
| | - Yael Michaeli
- Department of Chemical Physics, Tel Aviv University, Tel Aviv-Yafo, Israel
| | - Yuval Ebenstein
- Department of Chemical Physics, Tel Aviv University, Tel Aviv-Yafo, Israel
- Edmond J. Safra Center for Bioinformatics, Tel Aviv University, Tel Aviv-Yafo, Israel
| | - Jennifer Listgarten
- Department of Electrical Engineering & Computer Sciences, University of California, Berkeley, CA 94720, USA
- Center for Computational Biology, University of California, Berkeley, CA 94720, USA
|