1
Qin Y, Maggio A, Hawkins D, Beaudry L, Kim A, Pan D, Gong T, Fu Y, Yang H, Deng Y. Whole-genome bisulfite sequencing data analysis learning module on Google Cloud Platform. Brief Bioinform 2024; 25:bbae236. [PMID: 39041913 PMCID: PMC11264297 DOI: 10.1093/bib/bbae236] [Received: 11/16/2023] [Revised: 03/26/2024] [Accepted: 05/03/2024] [Indexed: 07/24/2024]
Abstract
This study describes the development of a resource module that is part of a learning platform named 'NIGMS Sandbox for Cloud-based Learning' https://github.com/NIGMS/NIGMS-Sandbox. The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox at the beginning of this Supplement. This module is designed to facilitate interactive learning of whole-genome bisulfite sequencing (WGBS) data analysis using cloud-based tools in Google Cloud Platform, such as Cloud Storage, Vertex AI notebooks and Google Batch. WGBS is a powerful technique that provides comprehensive insight into DNA methylation patterns at single-cytosine resolution, which is essential for understanding epigenetic regulation across the genome. The learning module first provides step-by-step tutorials that guide learners through the two main stages of WGBS data analysis: preprocessing and the identification of differentially methylated regions. It then provides a streamlined workflow and demonstrates how to use it effectively on large datasets by taking advantage of cloud infrastructure. Together, these interconnected submodules progressively deepen the user's understanding of the WGBS analysis process and of the use of cloud resources. This module can enhance the accessibility and adoption of cloud computing in epigenomic research, speeding up advances in the field and beyond.
Affiliation(s)
- Yujia Qin
- Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii at Manoa, 651 Ilalo Street, Honolulu, HI 96813, United States
- Angela Maggio
- Health Data and AI, Deloitte Consulting LLP, 1919 N. Lynn Street, Arlington VA 22209, United States
- Dale Hawkins
- Google Cloud, 1900 Reston Metro Plaza, Reston, VA 20190, United States
- Laura Beaudry
- Google Cloud, 1900 Reston Metro Plaza, Reston, VA 20190, United States
- Allen Kim
- Google Cloud, 1900 Reston Metro Plaza, Reston, VA 20190, United States
- Daniel Pan
- Health Data and AI, Deloitte Consulting LLP, 1919 N. Lynn Street, Arlington VA 22209, United States
- Ting Gong
- Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii at Manoa, 651 Ilalo Street, Honolulu, HI 96813, United States
- Yuanyuan Fu
- Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii at Manoa, 651 Ilalo Street, Honolulu, HI 96813, United States
- Hua Yang
- Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii at Manoa, 651 Ilalo Street, Honolulu, HI 96813, United States
- Youping Deng
- Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii at Manoa, 651 Ilalo Street, Honolulu, HI 96813, United States
2
Yuditskiy K, Bezdvornykh I, Kazantseva A, Kanapin A, Samsonova A. BSXplorer: analytical framework for exploratory analysis of BS-seq data. BMC Bioinformatics 2024; 25:96. [PMID: 38438881 PMCID: PMC10913661 DOI: 10.1186/s12859-024-05722-9] [Received: 11/16/2023] [Accepted: 02/27/2024] [Indexed: 03/06/2024]
Abstract
BACKGROUND Bisulfite sequencing detects and quantifies DNA methylation patterns, contributing to our understanding of gene expression regulation, genome stability maintenance, conservation of epigenetic mechanisms across divergent taxa, epigenetic inheritance and, eventually, phenotypic variation. Graphical representation of methylation data is crucial for exploring epigenetic regulation on a genome-wide scale in both plants and animals. This is especially relevant for non-model organisms with poorly annotated genomes and/or genomes not yet assembled to chromosome level. Although bisulfite sequencing has been the technology of choice for profiling DNA methylation for many years, there are surprisingly few lightweight and robust standalone tools for efficient graphical analysis of data from non-model systems. This significantly limits evolutionary studies and agrigenomics research. BSXplorer was developed specifically to fill this gap and to help researchers explore, visualise and interpret bisulfite sequencing data more easily. RESULTS BSXplorer provides in-depth graphical analysis of sequencing data, encompassing (a) profiling of methylation levels in metagenes or in user-defined regions using line plots and heatmaps, and generation of summary statistics charts; (b) comparative analyses of methylation patterns across experimental samples, methylation contexts and species; and (c) identification of modules sharing similar methylation signatures at functional genomic elements. The tool processes methylation data quickly, offers both API and CLI interfaces, and creates high-quality figures suitable for publication. CONCLUSIONS BSXplorer facilitates efficient mining, comparison and visualization of methylation data, making it an easy-to-use package that is highly useful for epigenetic research.
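The metagene profiling described in (a) can be illustrated with a small sketch. This is not BSXplorer's API: the `CytosineCall` record, the `metagene_profile` function, the (start, end, strand) gene tuples and the choice to pool read counts per bin are all hypothetical assumptions made for illustration. The idea is to rescale every gene body to the same number of bins, measured from the 5' end, and then average methylation within each bin across genes.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class CytosineCall:
    # Hypothetical per-cytosine record, not a BSXplorer data type.
    pos: int         # 0-based genomic coordinate of the cytosine
    methylated: int  # reads still showing C after bisulfite conversion
    total: int       # all reads covering this cytosine


def metagene_profile(
    calls: Sequence[CytosineCall],
    genes: Sequence[Tuple[int, int, str]],
    n_bins: int = 10,
) -> List[Optional[float]]:
    """Pooled methylation level in n_bins equal-width bins along gene bodies.

    genes holds (start, end, strand) tuples with half-open [start, end)
    coordinates on the same chromosome as calls. Bin 0 is always the 5' end.
    Bins with no covering reads are returned as None.
    """
    meth = [0] * n_bins
    cov = [0] * n_bins
    for start, end, strand in genes:
        length = end - start
        for c in calls:
            if c.total == 0 or not (start <= c.pos < end):
                continue
            # Distance from the gene's 5' end; minus-strand genes are read backwards.
            offset = c.pos - start if strand == "+" else end - 1 - c.pos
            b = min(offset * n_bins // length, n_bins - 1)
            meth[b] += c.methylated
            cov[b] += c.total
    # Pooled level per bin: methylated reads / covering reads.
    return [m / t if t else None for m, t in zip(meth, cov)]


calls = [CytosineCall(1, 3, 4), CytosineCall(2, 1, 4), CytosineCall(7, 0, 2)]
print(metagene_profile(calls, [(0, 10, "+")], n_bins=2))  # [0.5, 0.0]
```

The nested loop scans every call for every gene, which is fine for a toy example; a tool working on whole genomes would instead use sorted or interval-indexed data, and might average per-cytosine levels rather than pooling reads.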
Affiliation(s)
- Konstantin Yuditskiy
- Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia, 199004
- Igor Bezdvornykh
- Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia, 199004
- Anastasiya Kazantseva
- Laboratory of Neurocognitive Genomics, Department of Genetics and Fundamental Medicine, Ufa University of Science and Technology, Ufa, Russia, 450076
- Alexander Kanapin
- Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia, 199004
- Anastasia Samsonova
- Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia, 199004
3
Igder S, Zamani M, Fakher S, Siri M, Ashktorab H, Azarpira N, Mokarram P. Circulating Nucleic Acids in Colorectal Cancer: Diagnostic and Prognostic Value. Dis Markers 2024; 2024:9943412. [PMID: 38380073 PMCID: PMC10878755 DOI: 10.1155/2024/9943412] [Received: 04/26/2023] [Revised: 01/07/2024] [Accepted: 01/25/2024] [Indexed: 02/22/2024]
Abstract
Colorectal cancer (CRC) is the third most prevalent cancer in the world and the fourth leading cause of cancer-related mortality. Circulating DNA (cfDNA/ctDNA) and RNA (cfRNA/ctRNA) in the blood are promising noninvasive biomarkers for molecular profiling, screening, diagnosis, treatment management, and prognosis of CRC. Technological advances that enable precise detection of both genetic and epigenetic abnormalities, even when present in minute quantities in the circulation, can overcome some of the challenges in detecting these markers. This review focuses on testing for circulating nucleic acids as a noninvasive method for CRC detection, monitoring, detection of minimal residual disease, and patient management. In addition, the benefits and drawbacks of various diagnostic techniques and associated bioinformatics tools are detailed.
Affiliation(s)
- Somayeh Igder
- Department of Clinical Biochemistry, School of Medicine, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
- Mozhdeh Zamani
- Autophagy Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Department of Biochemistry, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Shima Fakher
- Department of Biochemistry, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Morvarid Siri
- Autophagy Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Hassan Ashktorab
- Department of Medicine, Gastroenterology Division and Cancer Center, Howard University College of Medicine, Washington, DC, USA
- Negar Azarpira
- Autophagy Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Pooneh Mokarram
- Autophagy Research Center, Department of Biochemistry, Shiraz University of Medical Sciences, Shiraz, Iran
4
Stuart T, Buckberry S, Nguyen TV, Lister R. Approaches for the Analysis and Interpretation of Whole-Genome Bisulfite Sequencing Data. Methods Mol Biol 2024; 2842:391-403. [PMID: 39012607 DOI: 10.1007/978-1-0716-4051-7_20] [Indexed: 07/17/2024]
Abstract
DNA methylation is a covalent modification of DNA that plays important roles in processes such as the regulation of gene expression, transcription factor binding, and suppression of transposable elements. Whole-genome bisulfite sequencing (WGBS) enables genome-wide identification and quantification of DNA methylation patterns at single-base resolution and is the gold standard for the analysis of DNA methylation. However, the computational analysis of WGBS data can be particularly challenging, as it requires many computationally intensive steps. Here, we outline a step-by-step approach for the analysis and interpretation of WGBS data. First, sequencing reads must be trimmed, quality-checked, and aligned to the genome. Second, DNA methylation levels are estimated at each cytosine position using the aligned sequence reads of the bisulfite-treated DNA. Third, regions of differential cytosine methylation between samples can be identified. Finally, these data need to be visualized and interpreted in the context of the biological question at hand.
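The second and third steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's method: it assumes per-cytosine counts of C reads (methylated) and T reads (unmethylated) have already been extracted from the alignments and collapsed across strands, and the function names, the 5-read coverage cutoff and the fixed-window comparison rule are hypothetical. Published DMR callers model read counts statistically; the window rule only shows the shape of the computation.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def methylation_level(
    methylated: int, unmethylated: int, min_coverage: int = 5
) -> Optional[float]:
    """Methylation level at one cytosine from bisulfite-read counts.

    Bisulfite converts unmethylated C to U (read as T), while methylated C
    stays C, so the level is C / (C + T). Sites below the hypothetical
    min_coverage cutoff return None.
    """
    coverage = methylated + unmethylated
    if coverage < min_coverage:
        return None
    return methylated / coverage


def candidate_dmrs(
    sites: Sequence[Tuple[int, float, float]],
    window: int = 1000,
    min_diff: float = 0.3,
    min_sites: int = 3,
) -> List[Tuple[int, int, float]]:
    """Flag fixed, non-overlapping windows with a large mean level difference.

    sites holds (pos, level_a, level_b) tuples for cytosines covered in both
    samples. Returns (window_start, window_end, mean_difference) tuples.
    """
    diffs_by_window: Dict[int, List[float]] = {}
    for pos, a, b in sites:
        diffs_by_window.setdefault(pos // window, []).append(a - b)
    hits = []
    for idx in sorted(diffs_by_window):
        diffs = diffs_by_window[idx]
        if len(diffs) < min_sites:
            continue  # too few cytosines to call this window
        mean_diff = sum(diffs) / len(diffs)
        if abs(mean_diff) >= min_diff:
            hits.append((idx * window, (idx + 1) * window, mean_diff))
    return hits
```

For example, a cytosine covered by 9 C reads and 1 T read gives `methylation_level(9, 1) == 0.9`, while one covered by only 3 reads is dropped as `None`.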
Affiliation(s)
- Tim Stuart
- Australian Research Council Centres of Excellence in Plant Energy Biology and Plants for Space, School of Molecular Sciences, The University of Western Australia, Crawley, WA, Australia
- Sam Buckberry
- Australian Research Council Centres of Excellence in Plant Energy Biology and Plants for Space, School of Molecular Sciences, The University of Western Australia, Crawley, WA, Australia
- Harry Perkins Institute of Medical Research, Nedlands, WA, Australia
- Trung Viet Nguyen
- Australian Research Council Centres of Excellence in Plant Energy Biology and Plants for Space, School of Molecular Sciences, The University of Western Australia, Crawley, WA, Australia
- Harry Perkins Institute of Medical Research, Nedlands, WA, Australia
- Ryan Lister
- Australian Research Council Centres of Excellence in Plant Energy Biology and Plants for Space, School of Molecular Sciences, The University of Western Australia, Crawley, WA, Australia
- Harry Perkins Institute of Medical Research, Nedlands, WA, Australia
5
Bhattacharyya S, Ehsan SF, Karacosta LG. Phenotypic maps for precision medicine: a promising systems biology tool for assessing therapy response and resistance at a personalized level. Front Netw Physiol 2023; 3:1256104. [PMID: 37964768 PMCID: PMC10642209 DOI: 10.3389/fnetp.2023.1256104] [Received: 07/10/2023] [Accepted: 09/28/2023] [Indexed: 11/16/2023]
Abstract
In this perspective we discuss how tumor heterogeneity and therapy resistance necessitate more personalized approaches, prompting a shift toward precision medicine. At the heart of this shift, omics-driven systems biology becomes a driving force, as it leverages high-throughput technologies and novel bioinformatics tools. These enable the creation of systems-based maps that provide a comprehensive view of an individual tumor's functional plasticity. We highlight the innovative PHENOSTAMP program, which leverages high-dimensional data to construct a visually intuitive and user-friendly map. This map was created to encapsulate complex transitional states in cancer cells, such as Epithelial-Mesenchymal Transition (EMT) and Mesenchymal-Epithelial Transition (MET), offering an intuitive way to understand disease progression and therapeutic responses at single-cell resolution in relation to EMT-related single-cell phenotypes. Most importantly, PHENOSTAMP functions as a reference map that allows researchers and clinicians to assess clinical specimens one at a time in terms of their phenotypic heterogeneity, laying the foundation for constructing phenotypic maps for personalized medicine. This perspective argues that such dynamic predictive maps could also catalyze the development of personalized cancer treatment. They hold the potential to transform our understanding of cancer biology, providing a foundation for a future where therapy is tailored to each patient's unique molecular and cellular tumor profile. As our knowledge of cancer expands, these maps can be continually refined, ensuring they remain a valuable tool in precision oncology.
Affiliation(s)
- Sayantan Bhattacharyya
- Department of Cancer Systems Imaging, University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Shafqat F. Ehsan
- Department of Cancer Systems Imaging, University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Department of Radiation Oncology, University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Loukia G. Karacosta
- Department of Cancer Systems Imaging, University of Texas MD Anderson Cancer Center, Houston, TX, United States
6
Lehle JD, McCarrey JR. Accelerating the alignment processing speed of the comprehensive end-to-end whole-genome bisulfite sequencing pipeline, wg-blimp. Biol Methods Protoc 2023; 8:bpad012. [PMID: 37431446 PMCID: PMC10329742 DOI: 10.1093/biomethods/bpad012] [Received: 04/18/2023] [Revised: 06/12/2023] [Accepted: 06/12/2023] [Indexed: 07/12/2023]
Abstract
Analyzing whole-genome bisulfite and related sequencing datasets is time-intensive because of the complexity and size of the raw input sequencing files and the lengthy read alignment step, which must correct for the genome-wide conversion of all unmethylated Cs to Ts. The objective of this study was to modify the read alignment algorithm of the whole-genome bisulfite sequencing methylation analysis pipeline (wg-blimp) to shorten the time required for this phase while retaining overall read alignment accuracy. Here, we report an update to the recently published wg-blimp pipeline in which the bwa-meth aligner is replaced by the faster gemBS aligner. This improvement accelerates sample processing more than 7-fold when scaled to larger publicly available FASTQ datasets containing 80-160 million reads, while maintaining nearly identical accuracy of properly mapped reads compared with data from the previous pipeline. The modifications reported here merge the speed and accuracy of the gemBS aligner with the comprehensive analysis and data visualization assets of wg-blimp, providing a significantly accelerated workflow that produces high-quality data much more rapidly without compromising read accuracy, at the cost of RAM requirements increasing to up to 48 GB.
Affiliation(s)
- Jake D Lehle
- Correspondence address. Department of Neuroscience, Developmental and Regenerative Biology, The University of Texas at San Antonio, 1 UTSA Circle, San Antonio, TX 78249, USA. Tel: +1 (512)-992-8144; E-mail:
- John R McCarrey
- Department of Neuroscience, Developmental and Regenerative Biology, The University of Texas at San Antonio, San Antonio, TX 78249, USA