1
Privitera GF, Alaimo S, Caruso A, Ferro A, Forte S, Pulvirenti A. TMBcalc: a computational pipeline for identifying pan-cancer Tumor Mutational Burden gene signatures. Front Genet 2024; 15:1285305. [PMID: 38645485; PMCID: PMC11026579; DOI: 10.3389/fgene.2024.1285305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Accepted: 03/11/2024] [Indexed: 04/23/2024]
Abstract
Background: In the precision medicine era, identifying predictive factors to select the patients most likely to benefit from treatment with immunological agents is a crucial and open challenge in oncology. Methods: This paper presents a pan-cancer analysis of Tumor Mutational Burden (TMB). We developed a novel computational pipeline, TMBcalc, to calculate the TMB. Our methodology can identify small and reliable gene signatures to estimate TMB from custom targeted-sequencing panels. For this purpose, our pipeline was trained on data from 17 cancer types obtained from TCGA. Results: Our results show that TMB computed through the identified signature strongly correlates with TMB obtained from whole-exome sequencing (WES). Conclusion: We rigorously analyzed the effectiveness of our methodology on several independent datasets. In particular, we conducted comprehensive testing on (i) 126 samples sourced from the TCGA database and (ii) a few independent WES datasets linked to colon, breast, and liver cancers, all acquired from the EGA and the ICGC Data Portal. This evaluation highlights the robustness and practicality of our approach, positioning it as a promising avenue for progress in clinical practice.
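As a rough illustration of the comparison described in this abstract, the sketch below expresses TMB as mutations per megabase of sequenced territory and correlates a panel-based estimate with a WES-based one. It is not the TMBcalc implementation: the territory sizes, the toy mutation counts, and the function names `tmb` and `pearson` are all assumptions made for the example.

```python
import math


def tmb(mutation_count, covered_bases):
    """Return mutations per megabase of covered sequence."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return mutation_count / (covered_bases / 1_000_000)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Assumed territory sizes (hypothetical, for illustration only).
WES_BASES = 30_000_000
PANEL_BASES = 1_000_000

# Hypothetical per-sample counts: (mutations seen by WES, mutations seen by panel).
samples = [(300, 12), (90, 2), (1500, 55), (45, 1)]

wes_tmb = [tmb(wes, WES_BASES) for wes, _ in samples]
panel_tmb = [tmb(panel, PANEL_BASES) for _, panel in samples]
r = pearson(wes_tmb, panel_tmb)
```

A high `r` on held-out samples is the kind of evidence the abstract refers to when it says the signature-based TMB correlates with WES-based TMB.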
Affiliation(s)
- Grete Francesca Privitera
  - Department of Clinical and Experimental Medicine, Bioinformatics Unit, University of Catania, Catania, Italy
- Salvatore Alaimo
  - Department of Clinical and Experimental Medicine, Bioinformatics Unit, University of Catania, Catania, Italy
- Anna Caruso
  - Department of Physics and Astronomy, University of Catania, Catania, Italy
- Alfredo Ferro
  - Department of Clinical and Experimental Medicine, Bioinformatics Unit, University of Catania, Catania, Italy
- Stefano Forte
  - Istituto Oncologico del Mediterraneo (IOM) Ricerca, Viagrande, Italy
- Alfredo Pulvirenti
  - Department of Clinical and Experimental Medicine, Bioinformatics Unit, University of Catania, Catania, Italy
2
Ko S, Kim J, Lim J, Lee SM, Park JY, Woo J, Scott-Nevros ZK, Kim JR, Yoon H, Kim D. Blanket antimicrobial resistance gene database with structural information, BOARDS, provides insights on historical landscape of resistance prevalence and effects of mutations in enzyme structure. mSystems 2024; 9:e0094323. [PMID: 38085058; PMCID: PMC10871167; DOI: 10.1128/msystems.00943-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 11/02/2023] [Indexed: 01/24/2024]
Abstract
Antimicrobial resistance (AMR) in pathogenic bacteria poses a significant threat to public health, yet tools for understanding AMR genes in depth from genetic or structural information remain underdeveloped. In this study, we present an interactive web database named Blanket Overarching Antimicrobial-Resistance gene Database with Structural information (BOARDS, sbml.unist.ac.kr), which comprehensively covers 3,943 reported AMR genes (1,997 extended-spectrum beta-lactamase (ESBL) genes and 1,946 other genes) as well as a total of 27,395 predicted protein structures. These structures, which include both wild-type AMR genes and their mutants, were derived from 80,094 publicly available whole-genome sequences. In addition, we developed the rapid analysis and detection tool of antimicrobial-resistance (RADAR), a one-stop analysis pipeline to detect AMR genes across whole-genome sequences (WGSs). By integrating BOARDS and RADAR, the AMR prevalence landscape for eight multi-drug resistant pathogens was reconstructed, leading to unexpected findings such as the presence of the MCR genes before their official reports. Enzymatic structure prediction-based analysis revealed that the occurrence of mutations in some ESBL genes was closely related to the binding affinities with their antibiotic substrates. Overall, BOARDS can play a significant role in performing in-depth analysis of AMR.
IMPORTANCE While increasing antimicrobial resistance (AMR) in pathogens has been a burden on public health, effective tools for a deep understanding of AMR based on genetic or structural information remain limited. In this study, a blanket overarching antimicrobial-resistance gene database with structural information (BOARDS), a web-based database that comprehensively collects AMR gene data with predicted protein structural information, was constructed. Additionally, we report the development of the RADAR pipeline, which can analyze whole-genome sequences. BOARDS, which includes sequence and structural information, has shown the historical landscape and prevalence of AMR genes and can provide insight into the effects of single-nucleotide polymorphisms on antibiotic-degrading enzymes within protein structures.
Affiliation(s)
- Seyoung Ko
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
  - School of Life Sciences, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
- Jaehyung Kim
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
- Jaewon Lim
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
- Sang-Mok Lee
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
- Joon Young Park
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
- Jihoon Woo
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
- Zoe K. Scott-Nevros
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
- Jong R. Kim
  - School of Engineering and Digital Sciences, Nazarbayev University, Astana, Kazakhstan
- Hyunjin Yoon
  - Department of Molecular Science and Technology, Ajou University, Suwon, South Korea
- Donghyuk Kim
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
  - School of Life Sciences, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
3
Lehle JD, McCarrey JR. Accelerating the alignment processing speed of the comprehensive end-to-end whole-genome bisulfite sequencing pipeline, wg-blimp. Biol Methods Protoc 2023; 8:bpad012. [PMID: 37431446; PMCID: PMC10329742; DOI: 10.1093/biomethods/bpad012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 06/12/2023] [Accepted: 06/12/2023] [Indexed: 07/12/2023]
Abstract
Analyzing whole-genome bisulfite and related sequencing datasets is time-intensive because the raw input sequencing files are large and complex, and because the lengthy read alignment step must correct for the conversion of all unmethylated Cs to Ts genome-wide. The objective of this study was to modify the read alignment algorithm associated with the whole-genome bisulfite sequencing methylation analysis pipeline (wg-blimp) to shorten the time required to complete this phase while retaining overall read alignment accuracy. Here, we report an update to the recently published pipeline wg-blimp, achieved by replacing the bwa-meth aligner with the faster gemBS aligner. This improvement has led to a more than 7× acceleration in sample processing speed when scaled to larger publicly available FASTQ datasets containing 80-160 million reads, while maintaining nearly identical accuracy of properly mapped reads compared with data from the previous pipeline. The modifications reported here merge the speed and accuracy of the gemBS aligner with the comprehensive analysis and data visualization assets of wg-blimp, providing a significantly accelerated workflow that produces high-quality data much more rapidly without compromising read accuracy, at the expense of increasing RAM requirements up to 48 GB.
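To make the "correction for conversion of all unmethylated Cs to Ts" concrete, the toy sketch below simulates bisulfite conversion on a forward-strand read and then recovers methylated positions by comparing the read with the reference. It illustrates the general idea only and is not how bwa-meth, gemBS, or wg-blimp are implemented; the function names and the example sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated=frozenset()):
    """Simulate bisulfite treatment: unmethylated C becomes T, methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def three_letter(seq):
    """Collapse every C to T so converted reads and reference can be compared directly."""
    return seq.upper().replace("C", "T")


def call_methylation(reference, read):
    """Return reference C positions where the (already aligned) read still shows C."""
    return [
        i
        for i, (ref_base, read_base) in enumerate(zip(reference, read))
        if ref_base == "C" and read_base == "C"
    ]


reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated=frozenset({1}))

# In three-letter space the converted read matches the reference exactly,
# which is why aligners can place it despite the C-to-T changes.
matches_reference = three_letter(read) == three_letter(reference)
methylated_positions = call_methylation(reference, read)
```

Reads from the reverse strand would need the complementary G-to-A treatment, which this sketch omits for brevity.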
Affiliation(s)
- Jake D Lehle
  - Correspondence address. Department of Neuroscience, Developmental and Regenerative Biology, The University of Texas at San Antonio, 1 UTSA Circle, San Antonio, TX 78249, USA. Tel: +1 (512)-992-8144; E-mail:
- John R McCarrey
  - Department of Neuroscience, Developmental and Regenerative Biology, The University of Texas at San Antonio, San Antonio, TX 78249, USA
4
Ferreira FA, He Q, Banning S, Roberts‐Sano O, Wilkins O, Kuritzkes DR, Tsibris A. HIV-1 proviral landscape characterization varies by pipeline analysis. J Int AIDS Soc 2021; 24:e25725. [PMID: 34235860; PMCID: PMC8264403; DOI: 10.1002/jia2.25725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2020] [Revised: 03/25/2021] [Accepted: 04/15/2021] [Indexed: 11/09/2022]
Abstract
INTRODUCTION HIV rebounds after cessation of antiretroviral therapy, representing a barrier to cure. To better understand the virus reservoir, analysis pipelines have been developed that categorize proviral sequences as intact or defective, and further determine the precise nature of the sequence defects that may be present. We investigated the effects that different analysis pipelines had on the characterization of HIV-1 proviral sequences. METHODS We used single genome amplification to generate near full-length (NFL) HIV-1 proviral DNA sequences, defined as amplicons greater than 8000 base pairs in length, isolated from peripheral blood mononuclear cells (PBMC) of treated suppressed participants with HIV-1. Amplicons underwent direct next-generation single genome sequencing and were analysed using four HIV-1 proviral characterization pipelines. Sequences were characterized as intact or defective; defective sequences were assessed for the number and types of defects present. To confirm and extend our findings, 691 proviruses from the Proviral Sequence Database (PSD) were analysed and the ProSeq-IT tool of the PSD was used to characterize both the participant and PSD proviruses. RESULTS AND DISCUSSION Virus sequences derived from thirteen ART-treated virologically suppressed participants with HIV were studied. A total of 693 HIV-1 proviral sequences were generated, 282 of which were NFL. An average of 53 sequences per participant was analysed. We found that proviruses often harbour multiple sequence defect types (mean 2.7, 95% confidence interval [CI] 2.5, 3.0); the elimination order used by each pipeline affected the percentage of proviruses allotted into each defect category. These differences varied between participants, depending on the number of defect categories present in a given provirus sequence. 
Pipeline-specific differences in characterizing the HIV-1 5' untranslated region (5' UTR) led to an overestimation of the number of intact NFL proviral sequences, a finding corroborated in the independent PSD analysis. A comparison of the four published pipelines to ProSeq-IT found that ProSeq-IT was more likely to characterize proviruses as intact. CONCLUSIONS The choice of pipeline used for HIV-1 provirus landscape analysis may bias the classification of defective sequences. To improve the comparison of provirus characterizations across research groups, the development of a consensus elimination pipeline should be prioritized.
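The effect of elimination order can be shown with a small sketch: when a provirus carries several defects, a pipeline that assigns it to the first matching category will report different category counts depending on the order in which categories are checked. The defect labels, the two orders, and the toy proviruses below are hypothetical and do not reproduce any of the four published pipelines or ProSeq-IT.

```python
from collections import Counter


def classify(defects, elimination_order):
    """Assign a provirus to the first category in the order that it matches."""
    for category in elimination_order:
        if category in defects:
            return category
    return "intact"


# Hypothetical proviruses, each described by the set of defects it carries.
proviruses = [
    {"hypermutation", "large_deletion"},
    {"premature_stop"},
    set(),
    {"large_deletion", "premature_stop"},
]

# Two hypothetical elimination orders over the same defect categories.
order_a = ["large_deletion", "hypermutation", "premature_stop"]
order_b = ["hypermutation", "premature_stop", "large_deletion"]

counts_a = Counter(classify(p, order_a) for p in proviruses)
counts_b = Counter(classify(p, order_b) for p in proviruses)
```

Both orders agree on which proviruses are intact here, but they split the defective ones differently, which mirrors the abstract's point that the order affects the percentage allotted to each defect category.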
Affiliation(s)
- Fernanda A Ferreira
  - Virology Program, Graduate School of Arts and Sciences, Harvard University, Cambridge, MA, USA
  - Division of Infectious Diseases, Brigham and Women's Hospital, Boston, MA, USA
- Qianjing He
  - Division of Infectious Diseases, Brigham and Women's Hospital, Boston, MA, USA
- Stephanie Banning
  - Division of Infectious Diseases, Brigham and Women's Hospital, Boston, MA, USA
- Olivia Wilkins
  - Division of Infectious Diseases, Brigham and Women's Hospital, Boston, MA, USA
- Daniel R. Kuritzkes
  - Division of Infectious Diseases, Brigham and Women's Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Athe Tsibris
  - Division of Infectious Diseases, Brigham and Women's Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
5
Chang Y, Fan Q, Hou J, Zhang Y, Li J. A community-supported metaproteomic pipeline for improving peptide identifications in hydrothermal vent microbiota. Brief Bioinform 2021; 22:6214661. [PMID: 33834201; DOI: 10.1093/bib/bbab052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2020] [Revised: 01/23/2021] [Accepted: 02/02/2021] [Indexed: 11/12/2022]
Abstract
Microorganisms in deep-sea hydrothermal vents provide valuable insights into life under extreme conditions. Mass spectrometry-based proteomics has been widely used to identify protein expression and function. However, metaproteomic studies of deep-sea microbiota have been constrained largely by low identification rates of proteins and peptides. To improve the efficiency of metaproteomics for hydrothermal vent microbiota, we first constructed a microbial gene database (HVentDB) based on 117 public metagenomic samples from hydrothermal vents and proposed a metaproteomic analysis strategy that takes advantage not only of the sample-matched metagenome but also of the metagenomic information publicly released by the hydrothermal vent community. A two-stage false discovery rate method was then applied to control the risk of false positives. By applying our community-supported strategy to a hydrothermal vent sediment sample, about twice as many peptides were identified compared with searches against the sample-matched metagenome or the public reference database. In addition, more enriched and explainable taxonomic and functional profiles were detected exclusively by the HVentDB-based approach, as well as many important proteins involved in methane, amino acid, sugar, and glycan metabolism and in DNA repair. The new metaproteomic analysis strategy will enhance our understanding of microbiota, including their lifestyles and metabolic capabilities in extreme environments. The database HVentDB is freely accessible from http://lilab.life.sjtu.edu.cn:8080/HventDB/main.html.
Affiliation(s)
- Yafei Chang
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Qilian Fan
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Jialin Hou
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Yu Zhang
  - School of Oceanography, Shanghai Jiao Tong University, Shanghai, China
- Jing Li
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
6
Ayrolles A, Brun F, Chen P, Djalovski A, Beauxis Y, Delorme R, Bourgeron T, Dikker S, Dumas G. HyPyP: a Hyperscanning Python Pipeline for inter-brain connectivity analysis. Soc Cogn Affect Neurosci 2021; 16:72-83. [PMID: 33031496; PMCID: PMC7812632; DOI: 10.1093/scan/nsaa141] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 06/03/2020] [Revised: 09/22/2020] [Accepted: 10/07/2020] [Indexed: 12/24/2022]
Abstract
The bulk of social neuroscience takes a 'stimulus-brain' approach, typically comparing brain responses to different types of social stimuli, but most of the time in the absence of direct social interaction. Over the last two decades, a growing number of researchers have adopted a 'brain-to-brain' approach, exploring similarities between brain patterns across participants as a novel way to gain insight into the social brain. This methodological shift has facilitated the introduction of naturalistic social stimuli into the study design (e.g. movies) and, crucially, has spurred the development of new tools to directly study social interaction, both in controlled experimental settings and in more ecologically valid environments. Specifically, 'hyperscanning' setups, which allow the simultaneous recording of brain activity from two or more individuals during social tasks, have gained popularity in recent years. However, there is currently no agreed-upon approach to carrying out such 'inter-brain connectivity analysis', resulting in a scattered landscape of analysis techniques. To accommodate a growing demand to standardize analysis approaches in this fast-growing research field, we have developed the Hyperscanning Python Pipeline (HyPyP), a comprehensive and easy-to-use open-source software package that allows (social) neuroscientists to carry out and interpret inter-brain connectivity analyses.
Affiliation(s)
- Anaël Ayrolles
  - Department of Neuroscience, Institut Pasteur, Paris, France
  - Child and Adolescent Psychiatry Department, Assistance Publique - Hôpitaux de Paris, Robert Debré Hospital, Paris, France
- Florence Brun
  - Department of Neuroscience, Institut Pasteur, Paris, France
- Phoebe Chen
  - Department of Psychology, New York University, New York City, USA
- Amir Djalovski
  - Baruch Ivcher School of Psychology, Center for Developmental Social Neuroscience, Interdisciplinary Center Herzliya, Herzliya, Israel
  - Department of Psychology, Bar-Ilan University, Ramat Gan, Israel
- Yann Beauxis
  - Department of Neuroscience, Institut Pasteur, Paris, France
- Richard Delorme
  - Department of Neuroscience, Institut Pasteur, Paris, France
  - Child and Adolescent Psychiatry Department, Assistance Publique - Hôpitaux de Paris, Robert Debré Hospital, Paris, France
- Suzanne Dikker
  - Department of Psychology, New York University, New York City, USA
  - Department of Clinical Psychology, Free University Amsterdam, Amsterdam, The Netherlands
- Guillaume Dumas
  - Department of Neuroscience, Institut Pasteur, Paris, France
  - Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, FL, USA
  - Department of Psychiatry, Université de Montréal, Montreal, QC, Canada
  - Precision Psychiatry and Social Physiology Laboratory, CHU Sainte-Justine Centre de Recherche, Montreal, QC, Canada
7
Schönung M, Hess J, Bawidamann P, Stäble S, Hey J, Langstein J, Assenov Y, Weichenhan D, Lutsik P, Lipka DB. AmpliconDesign - an interactive web server for the design of high-throughput targeted DNA methylation assays. Epigenetics 2020; 16:933-939. [PMID: 33100132; DOI: 10.1080/15592294.2020.1834921] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/16/2023]
Abstract
Targeted analysis of DNA methylation patterns based on bisulfite-treated genomic DNA (BT-DNA) is considered a gold standard for epigenetic biomarker development. Existing software tools facilitate primer design, primer quality control, or visualization of primer localization. However, high-throughput design of primers for BT-DNA amplification is hampered by the limited throughput and functionality of existing tools, which require users to repeatedly perform specific tasks manually. Consequently, the design of PCR primers for BT-DNA remains a tedious and time-consuming process. To bridge this gap, we developed AmpliconDesign, a web server providing a scalable and user-friendly platform for the design and analysis of targeted DNA methylation studies based on BT-DNA, e.g. deep amplicon bisulfite sequencing (ampBS-seq) or EpiTYPER MassArray. Core functionality of the web server includes high-throughput primer design and binding site validation based on in silico bisulfite-converted DNA sequences, prediction of fragmentation patterns for EpiTYPER MassArray, interactive quality control, and a streamlined analysis workflow for ampBS-seq.
Affiliation(s)
- Maximilian Schönung
  - Section Translational Cancer Epigenomics, Division of Translational Medical Oncology, German Cancer Research Center (DKFZ) & National Center for Tumor Diseases (NCT), Heidelberg, Germany
  - Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- Jana Hess
  - Saarland University, Saarbrücken, Germany
- Pascal Bawidamann
  - Department of Informatics, Technical University of Munich, Garching, Germany
- Sina Stäble
  - Section Translational Cancer Epigenomics, Division of Translational Medical Oncology, German Cancer Research Center (DKFZ) & National Center for Tumor Diseases (NCT), Heidelberg, Germany
  - Division of Experimental Hematology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Joschka Hey
  - Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
  - Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - German-Israeli Helmholtz Research School in Cancer Biology
- Jens Langstein
  - Section Translational Cancer Epigenomics, Division of Translational Medical Oncology, German Cancer Research Center (DKFZ) & National Center for Tumor Diseases (NCT), Heidelberg, Germany
  - Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- Yassen Assenov
  - Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Dieter Weichenhan
  - Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Pavlo Lutsik
  - Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Daniel B Lipka
  - Section Translational Cancer Epigenomics, Division of Translational Medical Oncology, German Cancer Research Center (DKFZ) & National Center for Tumor Diseases (NCT), Heidelberg, Germany
8
Smith B, Hermsen M, Lesser E, Ravichandar D, Kremers W. Developing image analysis pipelines of whole-slide images: Pre- and post-processing. J Clin Transl Sci 2020; 5:e38. [PMID: 33948260; DOI: 10.1017/cts.2020.531] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 02/07/2023]
Abstract
Deep learning has pushed the scope of digital pathology beyond simple digitization and telemedicine. The incorporation of these algorithms into routine workflow is on the horizon and may be a disruptive technology, reducing processing time and increasing the detection of anomalies. While the newest computational methods enjoy much of the press, incorporating deep learning into standard laboratory workflow requires many more steps than simply training and testing a model. Image analysis using deep learning methods often requires substantial pre- and post-processing in order to improve interpretation and prediction. As in any data processing pipeline, images must be prepared for modeling, and the resultant predictions need further processing for interpretation. Examples include artifact detection, color normalization, image subsampling or tiling, and removal of errant predictions. Once processed, predictions are complicated by image file size - typically several gigabytes when unpacked. This forces images to be tiled, meaning that a series of subsamples from the whole-slide image (WSI) are used in modeling. Herein, we review many of these methods as they pertain to the analysis of biopsy slides and discuss the multitude of unique issues that are part of the analysis of very large images.
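The tiling step mentioned in this abstract can be sketched as a generator of tile rectangles that together cover a large image, so each tile can be read and modeled separately. This is a minimal sketch rather than the authors' code: the function name `tile_coords`, the default tile size, and the example slide dimensions are assumptions, and reading the pixels themselves (for example with a slide-reading library) is left out.

```python
def tile_coords(width, height, tile=1024, stride=None):
    """Yield (x, y, w, h) rectangles covering a width x height image.

    stride defaults to the tile size (no overlap); a smaller stride gives
    overlapping tiles. Tiles at the right and bottom edges are clipped.
    """
    stride = tile if stride is None else stride
    if tile <= 0 or stride <= 0:
        raise ValueError("tile and stride must be positive")
    for y in range(0, height, stride):
        for x in range(0, width, stride):
            yield (x, y, min(tile, width - x), min(tile, height - y))


# Hypothetical slide dimensions, far smaller than a real whole-slide image.
tiles = list(tile_coords(2500, 1200, tile=1024))
covered_area = sum(w * h for _, _, w, h in tiles)
```

With non-overlapping tiles the rectangles partition the image exactly, so `covered_area` equals the image area; post-processing can then stitch per-tile predictions back using the same coordinates.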
9
Ko G, Kim PG, Cho Y, Jeong S, Kim JY, Kim KH, Lee HY, Han J, Yu N, Ham S, Jang I, Kang B, Shin S, Kim L, Lee SW, Nam D, Kim JF, Kim N, Kim SY, Lee S, Roh TY, Lee B. Bioinformatics services for analyzing massive genomic datasets. Genomics Inform 2020; 18:e8. [PMID: 32224841; PMCID: PMC7120352; DOI: 10.5808/gi.2020.18.1.e8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/09/2020] [Accepted: 03/11/2020] [Indexed: 11/25/2022]
Abstract
The explosive growth of next-generation sequencing data has resulted in ultra-large-scale datasets and ensuing computational problems. In Korea, the amount of genomic data has been increasing rapidly in recent years. Leveraging these big data requires researchers to use large-scale computational resources and analysis pipelines. A promising solution for addressing this computational challenge is cloud computing, where CPUs, memory, storage, and programs are accessible in the form of virtual machines. Here, we present a cloud computing-based system, Bio-Express, that provides user-friendly, cost-effective analysis of massive genomic datasets. Bio-Express is loaded with predefined multi-omics data analysis pipelines, which are divided into genome, transcriptome, epigenome, and metagenome pipelines. Users can employ predefined pipelines or create a new pipeline for analyzing their own omics data. We also developed several web-based services for facilitating downstream analysis of genome data. The Bio-Express web service is freely available at https://www.bioexpress.re.kr/.
Affiliation(s)
- Gunhwan Ko
  - Korea Bioinformation Center (KOBIC), KRIBB, Daejeon 34141, Korea
- Pan-Gyu Kim
  - Korea Bioinformation Center (KOBIC), KRIBB, Daejeon 34141, Korea
- Youngbum Cho
  - Genome Editing Research Center, KRIBB, Daejeon 34141, Korea
- Seongmun Jeong
  - Genome Editing Research Center, KRIBB, Daejeon 34141, Korea
- Jae-Yoon Kim
  - Genome Editing Research Center, KRIBB, Daejeon 34141, Korea
- Ho-Yeon Lee
  - Genome Editing Research Center, KRIBB, Daejeon 34141, Korea
- Jiyeon Han
  - Department of BioInformation Science, Ewha Womans University, Seoul 03760, Korea
- Namhee Yu
  - Department of BioInformation Science, Ewha Womans University, Seoul 03760, Korea
- Seokjin Ham
  - Department of Life Sciences and Division of Integrative Biosciences & Biotechnology, Pohang University of Science & Technology (POSTECH), Pohang 37673, Korea
- Insoon Jang
  - Department of Life Sciences and Division of Integrative Biosciences & Biotechnology, Pohang University of Science & Technology (POSTECH), Pohang 37673, Korea
- Byunghee Kang
  - Department of Life Sciences and Division of Integrative Biosciences & Biotechnology, Pohang University of Science & Technology (POSTECH), Pohang 37673, Korea
- Sunguk Shin
  - Department of Systems Biology, Division of Life Sciences, and Institute for Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
- Lian Kim
  - Bioposh Inc., Daejeon 34016, Korea
- Dougu Nam
  - School of Life Sciences, Ulsan National Institute of Science and Technology, Ulsan 44919, Korea
- Jihyun F Kim
  - Department of Systems Biology, Division of Life Sciences, and Institute for Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
  - Strategic Initiative for Microbiomes in Agriculture and Food, Yonsei University, Seoul 03722, Korea
- Namshin Kim
  - Genome Editing Research Center, KRIBB, Daejeon 34141, Korea
- Seon-Young Kim
  - Genome Structure Research Center, KRIBB, Daejeon 34141, Korea
- Sanghyuk Lee
  - Department of BioInformation Science, Ewha Womans University, Seoul 03760, Korea
- Tae-Young Roh
  - Department of Life Sciences and Division of Integrative Biosciences & Biotechnology, Pohang University of Science & Technology (POSTECH), Pohang 37673, Korea
  - SysGenLab Inc., Pohang 37613, Korea
- Byungwook Lee
  - Korea Bioinformation Center (KOBIC), KRIBB, Daejeon 34141, Korea
10
De Bonis G, Dasilva M, Pazienti A, Sanchez-Vives MV, Mattia M, Paolucci PS. Analysis Pipeline for Extracting Features of Cortical Slow Oscillations. Front Syst Neurosci 2019; 13:70. [PMID: 31824271; PMCID: PMC6882866; DOI: 10.3389/fnsys.2019.00070] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/08/2019] [Accepted: 11/05/2019] [Indexed: 11/17/2022]
Abstract
Cortical slow oscillations (≲1 Hz) are an emergent property of the cortical network that integrates connectivity and physiological features. This rhythm, highly revealing of the characteristics of the underlying dynamics, is a hallmark of low-complexity brain states like sleep, and represents a default activity pattern. Here, we present a methodological approach for quantifying the spatial and temporal properties of this emergent activity. We improved and enriched a robust analysis procedure that has already been successfully applied to both in vitro and in vivo data acquisitions. We tested the new tools of the methodology by analyzing the electrocorticography (ECoG) traces recorded from a custom 32-channel multi-electrode array in wild-type isoflurane-anesthetized mice. The enhanced analysis pipeline, named SWAP (Slow Wave Analysis Pipeline), detects Up and Down states, enables the characterization of the spatial dependency of their statistical properties, and supports the comparison of different subjects. The SWAP is implemented in a data-independent way, allowing its application to other data sets (acquired from different subjects, or with different recording tools), as well as to the outcome of numerical simulations. By using the SWAP, we report statistically significant differences in the observed slow oscillations (SO) across cortical areas and cortical sites. Computing cortical maps by interpolating the features of SO acquired at the electrode positions, we give evidence of gradients at the global scale along an oblique axis directed from fronto-lateral toward occipito-medial regions, further highlighting some heterogeneity within cortical areas. The results obtained using the SWAP will be essential for producing data-driven brain simulations. 
A spatial characterization of slow oscillations will also trigger a discussion on the role of, and the interplay between, the different regions in the cortex, improving our understanding of the mechanisms of generation and propagation of delta rhythms and, more generally, of cortical properties.
Affiliation(s)
- Giulia De Bonis
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Roma, Rome, Italy
- Miguel Dasilva
- Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Maria V. Sanchez-Vives
- Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain

11
Cascione L, Giudice L, Ferraresso S, Marconato L, Giannuzzi D, Napoli S, Bertoni F, Giugno R, Aresu L. Long Non-Coding RNAs as Molecular Signatures for Canine B-Cell Lymphoma Characterization. Noncoding RNA 2019; 5:ncrna5030047. [PMID: 31546795 PMCID: PMC6789837 DOI: 10.3390/ncrna5030047] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/01/2019] [Revised: 09/06/2019] [Accepted: 09/16/2019] [Indexed: 02/08/2023] Open
Abstract
Background: Diffuse large B-cell lymphoma (DLBCL), marginal zone lymphoma (MZL) and follicular lymphoma (FL) are the most common B-cell lymphomas (BCL) in dogs. Recent investigations have demonstrated overlaps of these histotypes with the human counterparts, including clinical presentation, biologic behavior, tumor genetics, and treatment response. The molecular mechanisms that underlie canine BCL are still unknown and new studies to improve diagnosis, therapy, and the utilization of canine species as spontaneous animal tumor models are undeniably needed. Recent work using human DLBCL transcriptomes has suggested that long non-coding RNAs (lncRNAs) play a key role in lymphoma pathogenesis and pinpointed a restricted number of lncRNAs as potential targets for further studies. Results: To expand the knowledge of non-coding molecules involved in canine BCL, we used transcriptomes obtained from a cohort of 62 dogs with newly-diagnosed multicentric DLBCL, MZL and FL that had undergone complete staging work-up and were treated with chemotherapy or chemo-immunotherapy. We developed a customized R pipeline performing a transcriptome assembly by multiple algorithms to uncover novel lncRNAs, and delineate genome-wide expression of unannotated and annotated lncRNAs. Our pipeline also included a new package for high performance system biology analysis, which detects high-scoring network biological neighborhoods to identify functional modules. Moreover, our customized pipeline quantified the expression of novel and annotated lncRNAs, allowing us to subtype DLBCLs into two main groups. The DLBCL subtypes showed statistically different survivals, indicating the potential use of lncRNAs as prognostic biomarkers in future studies. Conclusions: In this manuscript, we describe the methodology used to identify lncRNAs that differentiate B-cell lymphoma subtypes and we interpreted the biological and clinical values of the results. 
We inferred the potential functions of lncRNAs to obtain a comprehensive and integrative insight that highlights their impact in this neoplasm.
Affiliation(s)
- Luciano Cascione
- Institute of Oncology Research, Università della Svizzera Italiana, 6500 Bellinzona, Switzerland.
- Swiss Institute of Bioinformatics, 1000 Lausanne, Switzerland.
- Luca Giudice
- Department of Computer Science, University of Verona, 37100 Verona, Italy.
- Serena Ferraresso
- Department of Comparative Biomedicine and Food Science, University of Padova, 35100 Padova, Italy.
- Laura Marconato
- Centro Oncologico Veterinario, 40037 Sasso Marconi BO, Italy.
- Diana Giannuzzi
- Department of Comparative Biomedicine and Food Science, University of Padova, 35100 Padova, Italy.
- Sara Napoli
- Institute of Oncology Research, Università della Svizzera Italiana, 6500 Bellinzona, Switzerland.
- Francesco Bertoni
- Institute of Oncology Research, Università della Svizzera Italiana, 6500 Bellinzona, Switzerland.
- Rosalba Giugno
- Department of Computer Science, University of Verona, 37100 Verona, Italy.
- Luca Aresu
- Department of Veterinary Sciences, University of Turin, 10095 Grugliasco, Italy.

12
Hofmann A, Cross M, Karow MA, Straub JH, Clemen CS, Eichinger L. A convenient tool for bivariate data analysis and bar graph plotting with R. Biochem Mol Biol Educ 2019; 47:207-210. [PMID: 30629319 DOI: 10.1002/bmb.21205] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/17/2018] [Revised: 11/19/2018] [Accepted: 12/21/2018] [Indexed: 06/09/2023]
Abstract
The Java software jBar consists of a graphical user interface that allows the user to customize and assemble an included script for R. The scripted R pipeline calculates means and standard errors/deviations for replicates of numerical bivariate data and generates presentations in the form of bar graphs. A two-sided Student's t test is carried out against a user-selected reference and p-values are calculated. The user can enter the data conveniently through the built-in spreadsheet and configure the R pipeline in the graphical user interface. The configured R script is written into a file and then executed. Bar graphs can be generated as static PNG, PDF, and SVG files or as interactive HTML widgets. © 2019 International Union of Biochemistry and Molecular Biology, 47(2): 207-210, 2019.
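The statistics named in this abstract (means, standard deviations, standard errors, and a Student's t test against a reference group) can be illustrated with a minimal sketch. This is not jBar's R script: the function names `summarize` and `student_t` are hypothetical, and the equal-variance (pooled) form of the t statistic is an assumption, since the abstract does not say which variant is used. Only the t statistic and degrees of freedom are computed; turning them into a p-value needs a t-distribution routine from a statistics library.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(values):
    """Return n, mean, sample standard deviation and standard error of the mean."""
    n = len(values)
    sd = stdev(values)  # sample SD (n - 1 in the denominator)
    return {"n": n, "mean": mean(values), "sd": sd, "sem": sd / sqrt(n)}


def student_t(sample, reference):
    """Two-sample Student's t statistic (pooled variance assumed) and its degrees of freedom."""
    n1, n2 = len(sample), len(reference)
    v1, v2 = stdev(sample) ** 2, stdev(reference) ** 2
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (mean(sample) - mean(reference)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2
```

Calling `summarize` on each group of replicates gives the bar heights and error bars, and `student_t(group, reference)` compares one group with the chosen reference group.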
Affiliation(s)
- Andreas Hofmann
- Griffith Institute for Drug Discovery, Griffith University, Nathan, Queensland, 4111, Australia
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria, 3010, Australia
- Megan Cross
- Griffith Institute for Drug Discovery, Griffith University, Nathan, Queensland, 4111, Australia
- Malte A Karow
- Center for Biochemistry, Institute of Biochemistry I, Medical Faculty, University of Cologne, Cologne, Germany
- Jan H Straub
- Griffith Institute for Drug Discovery, Griffith University, Nathan, Queensland, 4111, Australia
- Christoph S Clemen
- Center for Biochemistry, Institute of Biochemistry I, Medical Faculty, University of Cologne, Cologne, Germany
- Department of Neurology, Heimer Institute for Muscle Research, University Hospital, Bergmannsheil, Ruhr-University Bochum, Bochum, Germany
- Ludwig Eichinger
- Center for Biochemistry, Institute of Biochemistry I, Medical Faculty, University of Cologne, Cologne, Germany

13
Andersen LM. Group Analysis in FieldTrip of Time-Frequency Responses: A Pipeline for Reproducibility at Every Step of Processing, Going From Individual Sensor Space Representations to an Across-Group Source Space Representation. Front Neurosci 2018; 12:261. [PMID: 29765297 PMCID: PMC5938406 DOI: 10.3389/fnins.2018.00261] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/28/2017] [Accepted: 04/04/2018] [Indexed: 11/13/2022] Open
Abstract
An important aim of an analysis pipeline for magnetoencephalographic (MEG) data is that it allows for the researcher spending maximal effort on making the statistical comparisons that will answer his or her questions. The example question being answered here is whether the so-called beta rebound differs between novel and repeated stimulations. Two analyses are presented: going from individual sensor space representations to, respectively, an across-group sensor space representation and an across-group source space representation. The data analyzed are neural responses to tactile stimulations of the right index finger in a group of 20 healthy participants acquired from an Elekta Neuromag System. The processing steps covered for the first analysis are MaxFiltering the raw data, defining, preprocessing and epoching the data, cleaning the data, finding and removing independent components related to eye blinks, eye movements and heart beats, calculating participants' individual evoked responses by averaging over epoched data and subsequently removing the average response from single epochs, calculating a time-frequency representation and baselining it with non-stimulation trials and finally calculating a grand average, an across-group sensor space representation. The second analysis starts from the grand average sensor space representation and after identification of the beta rebound the neural origin is imaged using beamformer source reconstruction. This analysis covers reading in co-registered magnetic resonance images, segmenting the data, creating a volume conductor, creating a forward model, cutting out MEG data of interest in the time and frequency domains, getting Fourier transforms and estimating source activity with a beamformer model where power is expressed relative to MEG data measured during periods of non-stimulation. Finally, morphing the source estimates onto a common template and performing group-level statistics on the data are covered. 
Functions for saving relevant figures in an automated and structured manner are also included. The protocol presented here can be applied to any research protocol where the emphasis is on source reconstruction of induced responses where the underlying sources are not coherent.
Affiliation(s)
- Lau M Andersen
- NatMEG, Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden

14
Andersen LM. Group Analysis in MNE-Python of Evoked Responses from a Tactile Stimulation Paradigm: A Pipeline for Reproducibility at Every Step of Processing, Going from Individual Sensor Space Representations to an across-Group Source Space Representation. Front Neurosci 2018; 12:6. [PMID: 29403349 PMCID: PMC5786561 DOI: 10.3389/fnins.2018.00006] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 09/28/2017] [Accepted: 01/04/2018] [Indexed: 11/30/2022] Open
Abstract
An important aim of an analysis pipeline for magnetoencephalographic data is that it allows for the researcher spending maximal effort on making the statistical comparisons that will answer the questions of the researcher, while in turn spending minimal effort on the intricacies and machinery of the pipeline. I here present a set of functions and scripts that allow for setting up a clear, reproducible structure for separating raw and processed data into folders and files such that minimal effort can be spent on: (1) double-checking that the right input goes into the right functions; (2) making sure that output and intermediate steps can be accessed meaningfully; (3) applying operations efficiently across groups of subjects; (4) re-processing data if changes to any intermediate step are desirable. Applying the scripts requires only general knowledge about the Python language. The data analyzed are neural responses to tactile stimulations of the right index finger in a group of 20 healthy participants acquired from an Elekta Neuromag System. Two analyses are presented: going from individual sensor space representations to, respectively, an across-group sensor space representation and an across-group source space representation. The processing steps covered for the first analysis are filtering the raw data, finding events of interest in the data, epoching data, finding and removing independent components related to eye blinks and heart beats, calculating participants' individual evoked responses by averaging over epoched data and calculating a grand average sensor space representation over participants.
The second analysis starts from the participants' individual evoked responses and covers: estimating noise covariance, creating a forward model, creating an inverse operator, estimating distributed source activity on the cortical surface using a minimum norm procedure, morphing those estimates onto a common cortical template and calculating the patterns of activity that are statistically different from baseline. To estimate source activity, processing of the anatomy of subjects based on magnetic resonance imaging is necessary. The necessary steps are covered here: importing magnetic resonance images, segmenting the brain, estimating boundaries between different tissue layers, making fine-resolution scalp surfaces for facilitating co-registration, creating source spaces and creating volume conductors for each subject.
Affiliation(s)
- Lau M Andersen
- NatMEG, Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden

15
Abstract
Complex biological systems are composed of multiple cell types whose transcriptional activity can vary due to differences in cell state, environmental stimulation, or intrinsic programs. Conventional bulk analysis methods capture the average transcriptional programs of the cell population, thus missing the unique cellular signature of each single cell. In recent years, the development of single-cell RNA-sequencing (scRNA-seq) technologies has provided a powerful approach to dissect the cellular heterogeneity of complex biological systems. However, such approaches require specialized equipment or are costly. In this article, we describe an improved Smart-seq2-based method to profile the transcriptome of hundreds of single cells simultaneously, without utilizing commercial kits or requiring any specialized single-cell capture/library preparation tools. Moreover, we introduce the Automated Single-cell Analysis Pipeline (ASAP), which allows researchers without strong computational expertise to explore scRNA-seq data using a wide range of commonly used algorithms and sophisticated visualization tools. © 2017 by John Wiley & Sons, Inc.
Affiliation(s)
- Wanze Chen
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Vincent Gardeux
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
- Antonio Meireles-Filho
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Bart Deplancke
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland

16
Crispatzu G, Kulkarni P, Toliat MR, Nürnberg P, Herling M, Herling CD, Frommolt P. Semi-automated cancer genome analysis using high-performance computing. Hum Mutat 2017; 38:1325-1335. [PMID: 28598576 DOI: 10.1002/humu.23275] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 05/19/2016] [Revised: 05/24/2017] [Accepted: 06/03/2017] [Indexed: 12/26/2022]
Abstract
Next-generation sequencing (NGS) has turned from a new and experimental technology into a standard procedure for cancer genome studies and clinical investigation. While a multitude of software packages for cancer genome data analysis have been made available, these need to be combined into efficient analytical workflows that cover multiple aspects relevant to a clinical environment and that deliver handy results within a reasonable time frame. Here, we introduce QuickNGS Cancer as a new suite of bioinformatics pipelines that is focused on cancer genomics and significantly reduces the analytical hurdles that still limit a broader applicability of NGS technology, particularly to clinically driven research. QuickNGS Cancer allows a highly efficient analysis of a broad variety of NGS data types, specifically considering cancer-specific issues, such as biases introduced by tumor impurity and aneuploidy or the assessment of genomic variations regarding their biomedical relevance. It delivers highly reproducible analysis results ready for interpretation within only a few days after sequencing, as shown by a reanalysis of 140 tumor/normal pairs from The Cancer Genome Atlas (TCGA) in which QuickNGS Cancer detected a significant number of mutations in key cancer genes missed by a well-established mutation calling pipeline. Finally, QuickNGS Cancer obtained several unexpected mutations in leukemias that could be confirmed by Sanger sequencing.
Affiliation(s)
- Giuliano Crispatzu
- Bioinformatics Core Facility, Cluster of Excellence on Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Laboratory of Lymphocyte Signaling and Oncoproteome, Cluster of Excellence on Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Pranav Kulkarni
- Bioinformatics Core Facility, Cluster of Excellence on Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Mohammad R Toliat
- Cologne Center for Genomics (CCG), University of Cologne, Cologne, Germany
- Peter Nürnberg
- Cologne Center for Genomics (CCG), University of Cologne, Cologne, Germany
- Center for Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany
- Marco Herling
- Laboratory of Lymphocyte Signaling and Oncoproteome, Cluster of Excellence on Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Center for Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany
- Carmen D Herling
- Laboratory of Functional Genomics in Lymphoid Malignancies, Department I of Internal Medicine, Center of Integrated Oncology (CIO) Cologne-Bonn, University of Cologne, Cologne, Germany
- Peter Frommolt
- Bioinformatics Core Facility, Cluster of Excellence on Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, Cologne, Germany

17
Cadzow M, Boocock J, Nguyen HT, Wilcox P, Merriman TR, Black MA. A bioinformatics workflow for detecting signatures of selection in genomic data. Front Genet 2014; 5:293. [PMID: 25206364 PMCID: PMC4144660 DOI: 10.3389/fgene.2014.00293] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 05/25/2014] [Accepted: 08/06/2014] [Indexed: 11/13/2022] Open
Abstract
The detection of "signatures of selection" is now possible on a genome-wide scale in many plant and animal species, and can be performed in a population-specific manner due to the wealth of per-population genome-wide genotype data that is available. With genomic regions that exhibit evidence of having been under selection shown to also be enriched for genes associated with biologically important traits, detection of evidence of selective pressure is emerging as an additional approach for identifying novel gene-trait associations. While high-density genotype data is now relatively easy to obtain, for many researchers it is not immediately obvious how to go about identifying signatures of selection in these data sets. Here we describe a basic workflow, constructed from open source tools, for detecting and examining evidence of selection in genomic data. Code to install and implement the pipeline components, and instructions to run a basic analysis using the workflow described here, can be downloaded from our public GitHub repository: http://www.github.com/smilefreak/selectionTools/
Affiliation(s)
- Murray Cadzow
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Virtual Institute of Statistical Genetics, Rotorua, New Zealand
- James Boocock
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Virtual Institute of Statistical Genetics, Rotorua, New Zealand
- Hoang T Nguyen
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Virtual Institute of Statistical Genetics, Rotorua, New Zealand
- Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand
- Phillip Wilcox
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Virtual Institute of Statistical Genetics, Rotorua, New Zealand
- New Zealand Forest Research Institute Ltd, Rotorua, New Zealand
- Tony R Merriman
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Virtual Institute of Statistical Genetics, Rotorua, New Zealand
- Michael A Black
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Virtual Institute of Statistical Genetics, Rotorua, New Zealand

18
Lee IH, Lee K, Hsing M, Choe Y, Park JH, Kim SH, Bohn JM, Neu MB, Hwang KB, Green RC, Kohane IS, Kong SW. Prioritizing disease-linked variants, genes, and pathways with an interactive whole-genome analysis pipeline. Hum Mutat 2014; 35:537-47. [PMID: 24478219 DOI: 10.1002/humu.22520] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 10/04/2013] [Accepted: 01/23/2014] [Indexed: 01/02/2023]
Abstract
Whole-genome sequencing (WGS) studies are uncovering disease-associated variants in both rare and nonrare diseases. Utilizing next-generation sequencing for WGS requires a series of computational methods for alignment, variant detection, and annotation, and the accuracy and reproducibility of annotation results are essential for clinical implementation. However, annotating WGS with up-to-date genomic information is still challenging for biomedical researchers. Here, we present one of the fastest and most scalable annotation, filtering, and analysis pipelines, gNOME, to prioritize phenotype-associated variants while minimizing false-positive findings. The intuitive graphical user interface of gNOME facilitates the selection of phenotype-associated variants, and the result summaries are provided at variant, gene, and genome levels. Moreover, the enrichment results for specific variants, genes, and gene sets, either between two groups or compared with population-scale WGS datasets that are already integrated in the pipeline, can help the interpretation. We found a small number of discordant results between annotation software tools, in part due to different reporting strategies for variants with complex impacts. Using two published whole-exome datasets of uveal melanoma and bladder cancer, we demonstrated gNOME's accuracy of variant annotation and the enrichment of loss-of-function variants in known cancer pathways. The gNOME Web server and source code are freely available to the academic community (http://gnome.tchlab.org).
Affiliation(s)
- In-Hee Lee
- Children's Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology, Department of Medicine, Boston Children's Hospital, Boston, Massachusetts, 02115