1
|
Yang J, Li L, Zhu X, He C, Li T, Qin J, Wang Y. Microbial Community Characterization and Molecular Resistance Monitoring in Geriatric Intensive Care Units in China Using mNGS. Infect Drug Resist 2023; 16:5121-5134. [PMID: 37576519 PMCID: PMC10422961 DOI: 10.2147/idr.s421702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2023] [Accepted: 07/29/2023] [Indexed: 08/15/2023] Open
Abstract
Background: Surface pathogens in the ICU pose a global public health threat, especially to elderly patients who are immunocompromised. To detect these pathogens, unbiased methods such as metagenomic next-generation sequencing (mNGS) are increasingly used for environmental microbiological surveillance. Methods: In a six-month study from January to July 2022, we investigated microbial communities in Chinese geriatric ICUs by monitoring multiple surfaces at three-month intervals. Using mNGS, we analyzed microorganisms present at eight specific locations within the ICU. Additionally, we compared pathogen profiles and drug resistance genes between patient cultures and environmental samples collected during the same period. Results: The microbial composition remained relatively stable over time, but significant differences in alpha diversity were observed among surfaces such as floors, hands, pumps, trolleys, and ventilator inlets/outlets. Surfaces with high contact frequency for healthcare workers, including workstations, ventilator panels, trolleys, pumps, and beds, harbored pathogenic microorganisms such as Acinetobacter baumannii, Cutibacterium acnes, Staphylococcus haemolyticus, Pseudomonas aeruginosa, and Enterococcus faecium. Acinetobacter baumannii, particularly the carbapenem-resistant strain (CRAB), was the most frequently identified pathogen in geriatric ICU patients regardless of the testing method used. The mNGS approach enabled detection of viruses, fungi, and parasites that are challenging to culture. Additionally, an abundance of drug resistance genes was found in almost all environmental samples. Conclusion: The microbial composition and abundance in the ICU remained relatively constant over time. The floor exhibited the highest microbial diversity and abundance in the ICU environment. Drug resistance genes in the ICU environment may migrate between patients. Overall, mNGS is an emerging and powerful tool for microbiological monitoring of the hospital environment.
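The abstract above compares alpha diversity across surfaces without naming the index used. As a minimal sketch, assuming the Shannon index and entirely made-up read counts (neither is taken from the study), the comparison could look like this:

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical per-taxon read counts for two surfaces (illustrative only).
floor_counts = [25, 25, 25, 25]   # evenly spread community
pump_counts = [97, 1, 1, 1]       # community dominated by one taxon

print(shannon_index(floor_counts), shannon_index(pump_counts))
```

An evenly spread community scores higher than one dominated by a single taxon, which is the sense in which a surface can show higher alpha diversity.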
Affiliation(s)
- Jilin Yang
- Department of Critical Care Medicine, The First Affiliated Hospital of Kunming Medical University, Kunming, People’s Republic of China
| | - Lingyi Li
- Department of Medical, Hangzhou Matridx Biotechnology Company, Hangzhou, People’s Republic of China
| | - Xiaolin Zhu
- Department of Critical Care Medicine, The First Affiliated Hospital of Kunming Medical University, Kunming, People’s Republic of China
| | - Chen He
- Department of Critical Care Medicine, The First Affiliated Hospital of Kunming Medical University, Kunming, People’s Republic of China
| | - Ting Li
- Department of Critical Care Medicine, The First Affiliated Hospital of Kunming Medical University, Kunming, People’s Republic of China
| | - Jiahong Qin
- Department of Critical Care Medicine, The First Affiliated Hospital of Kunming Medical University, Kunming, People’s Republic of China
| | - Yijie Wang
- Department of Critical Care Medicine, The First Affiliated Hospital of Kunming Medical University, Kunming, People’s Republic of China
| |
|
2
|
De Falco A, Caruso F, Su XD, Iavarone A, Ceccarelli M. A variational algorithm to detect the clonal copy number substructure of tumors from scRNA-seq data. Nat Commun 2023; 14:1074. [PMID: 36841879 PMCID: PMC9968345 DOI: 10.1038/s41467-023-36790-9] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 12/09/2021] [Accepted: 02/16/2023] [Indexed: 02/27/2023] Open
Abstract
Single-cell RNA sequencing is the reference technology to characterize the composition of the tumor microenvironment and to study tumor heterogeneity at high resolution. Here we report Single CEll Variational ANeuploidy analysis (SCEVAN), a fast variational algorithm for the deconvolution of the clonal substructure of tumors from single-cell RNA-seq data. It uses a multichannel segmentation algorithm exploiting the assumption that all the cells in a given copy number clone share the same breakpoints. Thus, the smoothed expression profile of every individual cell constitutes part of the evidence of the copy number profile in each subclone. SCEVAN can automatically and accurately discriminate between malignant and non-malignant cells, resulting in a practical framework to analyze tumors and their microenvironment. We apply SCEVAN to datasets encompassing 106 samples and 93,322 cells from different tumor types and technologies. We demonstrate its application to characterize the intratumor heterogeneity and geographic evolution of malignant brain tumors.
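The shared-breakpoint assumption in the abstract above can be illustrated with a toy example. This is a brute-force sketch and not SCEVAN's variational algorithm: the function names and the data are hypothetical, and it searches for a single breakpoint only. Each cell keeps its own segment means, but the split position is chosen to minimise the squared error summed over all cells, so every cell contributes evidence for the same breakpoint.

```python
def sse(values):
    """Sum of squared deviations from the mean of one segment."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_shared_breakpoint(profiles):
    """Return the split index k (1..n-1) minimising total SSE across all cells.

    Every profile is split at the same k; segment means are fitted per cell.
    """
    n = len(profiles[0])
    best_k, best_cost = None, float("inf")
    for k in range(1, n):
        cost = sum(sse(p[:k]) + sse(p[k:]) for p in profiles)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Hypothetical smoothed expression profiles for three cells of one clone.
cells = [
    [0.1, -0.1, 0.0, 1.1, 0.9, 1.0],
    [0.0, 0.2, -0.2, 0.6, 0.4, 0.5],
    [-0.1, 0.1, 0.0, 0.9, 1.1, 1.0],
]
print(best_shared_breakpoint(cells))
```

The second cell has a weaker shift than the others, yet the pooled cost still places the breakpoint at the same position for all three.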
Affiliation(s)
- Antonio De Falco
- Department of Electrical Engineering and Information Technology (DIETI), University of Naples 'Federico II', 80128, Naples, Italy; BIOGEM Institute of Molecular Biology and Genetics, 83031, Ariano Irpino, Italy
| | - Francesca Caruso
- Department of Electrical Engineering and Information Technology (DIETI), University of Naples 'Federico II', 80128, Naples, Italy; BIOGEM Institute of Molecular Biology and Genetics, 83031, Ariano Irpino, Italy
| | - Xiao-Dong Su
- Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, 5 Yiheyuan Road, Haidian District, 100871, Beijing, China
| | - Antonio Iavarone
- Sylvester Comprehensive Cancer Center, University of Miami, Miller School of Medicine, Miami, FL, USA; Department of Neurological Surgery, University of Miami, Miller School of Medicine, Miami, FL, USA
| | - Michele Ceccarelli
- Department of Electrical Engineering and Information Technology (DIETI), University of Naples 'Federico II', 80128, Naples, Italy; BIOGEM Institute of Molecular Biology and Genetics, 83031, Ariano Irpino, Italy
| |
|
3
|
Díez-Villanueva A, Sanz-Pamplona R, Solé X, Cordero D, Crous-Bou M, Guinó E, Lopez-Doriga A, Berenguer A, Aussó S, Paré-Brunet L, Obón-Santacana M, Moratalla-Navarro F, Salazar R, Sanjuan X, Santos C, Biondo S, Diez-Obrero V, Garcia-Serrano A, Alonso MH, Carreras-Torres R, Closa A, Moreno V. COLONOMICS - integrative omics data of one hundred paired normal-tumoral samples from colon cancer patients. Sci Data 2022; 9:595. [PMID: 36182938 PMCID: PMC9526730 DOI: 10.1038/s41597-022-01697-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2022] [Accepted: 08/16/2022] [Indexed: 11/29/2022] Open
Abstract
Colonomics is a multi-omics dataset that includes 250 samples: 50 samples from healthy colon mucosa donors and 100 paired samples from colon cancer patients (tumor/adjacent). From these samples, the Colonomics project includes data from genotyping, DNA methylation, gene expression, whole exome sequencing and micro-RNA (miRNA) expression. It also includes copy number variation (CNV) data from tumoral samples. In addition, clinical data for all these samples are available. The aims of the project were to explore and integrate these datasets to describe colon cancer at the molecular level, to compare normal and tumoral tissues, and to improve screening by finding biomarkers for the diagnosis and prognosis of colon cancer. The project has its own website, including four browsers that allow users to explore the Colonomics datasets. Since the generated data could be reused by the scientific community for exploratory or validation purposes, here we describe the omics datasets included in the Colonomics project as well as results from the integration of multi-omics layers.
Affiliation(s)
- Anna Díez-Villanueva
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - Rebeca Sanz-Pamplona
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - Xavier Solé
- Molecular Biology CORE, Center for Biomedical Diagnostics, Hospital Clínic de Barcelona, 08036, Barcelona, Spain
- Translational Genomic and Targeted Therapeutics in Solid Tumors, August Pi i Sunyer Biomedical Research Institute (IDIBAPS), 08036, Barcelona, Spain
| | - David Cordero
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - Marta Crous-Bou
- Unit of Nutrition and Cancer, Cancer Epidemiology Research Program, Catalan Institute of Oncology (ICO) - Bellvitge Biomedical Research Institute (IDIBELL). L'Hospitalet de Llobregat, Barcelona, 08908, Spain
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Elisabet Guinó
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - Adriana Lopez-Doriga
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - Antoni Berenguer
- Rheumatology Department - Parc Taulí Research and Innovation Institute (I3PT), Barcelona, Spain
| | - Susanna Aussó
- TIC Salut Social Foundation. Ministry of Health of Generalitat de Catalunya, Barcelona, Spain
| | | | - Mireia Obón-Santacana
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - Ferran Moratalla-Navarro
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain
- Department of Clinical Sciences, Faculty of Medicine and Health Sciences and Universitat de Barcelona Institute of Complex Systems (UBICS), University of Barcelona, Barcelona, Spain
| | - Ramon Salazar
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Department of Clinical Sciences, Faculty of Medicine and Health Sciences and Universitat de Barcelona Institute of Complex Systems (UBICS), University of Barcelona, Barcelona, Spain
- Medical Oncology Department. Catalan Institute of Oncology (ICO), Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Oncology (CIBERONC), Madrid, Spain
| | - Xavier Sanjuan
- Department of Clinical Sciences, Faculty of Medicine and Health Sciences and Universitat de Barcelona Institute of Complex Systems (UBICS), University of Barcelona, Barcelona, Spain
- Pathology Service, Bellvitge University Hospital (HUB), Hospitalet de Llobregat, Barcelona, Spain
| | - Cristina Santos
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Department of Clinical Sciences, Faculty of Medicine and Health Sciences and Universitat de Barcelona Institute of Complex Systems (UBICS), University of Barcelona, Barcelona, Spain
- Medical Oncology Department. Catalan Institute of Oncology (ICO), Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Oncology (CIBERONC), Madrid, Spain
| | - Sebastiano Biondo
- Department of Clinical Sciences, Faculty of Medicine and Health Sciences and Universitat de Barcelona Institute of Complex Systems (UBICS), University of Barcelona, Barcelona, Spain
- Digestive Surgery Service, Bellvitge University Hospital (HUB). Hospitalet de Llobregat, Barcelona, Spain
| | - Virginia Diez-Obrero
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
| | - Ainhoa Garcia-Serrano
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - Maria Henar Alonso
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain
- Department of Clinical Sciences, Faculty of Medicine and Health Sciences and Universitat de Barcelona Institute of Complex Systems (UBICS), University of Barcelona, Barcelona, Spain
| | - Robert Carreras-Torres
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - Adria Closa
- The John Curtin School of Medical Research, Australian National University, Canberra, Australia
- EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, Australia
| | - Víctor Moreno
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain.
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain.
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain.
- Department of Clinical Sciences, Faculty of Medicine and Health Sciences and Universitat de Barcelona Institute of Complex Systems (UBICS), University of Barcelona, Barcelona, Spain.
| |
|
4
|
Balagué-Dobón L, Cáceres A, González JR. Fully exploiting SNP arrays: a systematic review on the tools to extract underlying genomic structure. Brief Bioinform 2022; 23:bbac043. [PMID: 35211719 PMCID: PMC8921734 DOI: 10.1093/bib/bbac043] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/19/2021] [Revised: 01/25/2022] [Accepted: 01/28/2022] [Indexed: 12/12/2022] Open
Abstract
Single nucleotide polymorphisms (SNPs) are the most abundant type of genomic variation and the most accessible to genotype in large cohorts. However, they individually explain a small proportion of phenotypic differences between individuals. Ancestry, collective SNP effects, structural variants, somatic mutations or even differences in historic recombination can potentially explain a high percentage of genomic divergence. These genetic differences can be infrequent or laborious to characterize; however, many of them leave distinctive marks on the SNPs across the genome allowing their study in large population samples. Consequently, several methods have been developed over the last decade to detect and analyze different genomic structures using SNP arrays, to complement genome-wide association studies and determine the contribution of these structures to explain the phenotypic differences between individuals. We present an up-to-date collection of available bioinformatics tools that can be used to extract relevant genomic information from SNP array data including population structure and ancestry; polygenic risk scores; identity-by-descent fragments; linkage disequilibrium; heritability and structural variants such as inversions, copy number variants, genetic mosaicisms and recombination histories. From a systematic review of recently published applications of the methods, we describe the main characteristics of R packages, command-line tools and desktop applications, both free and commercial, to help make the most of a large amount of publicly available SNP data.
|
5
|
Xi J, Li A, Wang M. HetRCNA: A Novel Method to Identify Recurrent Copy Number Alternations from Heterogeneous Tumor Samples Based on Matrix Decomposition Framework. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:422-434. [PMID: 29994262 DOI: 10.1109/tcbb.2018.2846599] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 06/08/2023]
Abstract
A common strategy for discovering cancer-associated copy number aberrations (CNAs) in a cohort of cancer samples is to detect recurrent CNAs (RCNAs). Although previous methods can successfully identify communal RCNAs shared by nearly all tumor samples, they cannot detect subgroup-specific RCNAs and their related subgroup samples in heterogeneous cancer samples. In this paper, we introduce a novel integrated method called HetRCNA, which can identify statistically significant subgroup-specific RCNAs and their related subgroup samples. Based on a matrix decomposition framework with a weight constraint, HetRCNA identifies the subgroup samples from the coefficients of the left vectors under the weight constraint, and the subgroup-specific RCNAs from the coefficients of the right vectors together with a significance test. When we evaluate HetRCNA on a simulated dataset, the results show that HetRCNA gives the best performance among the competing methods and is robust to the noise factors of the simulated data. When HetRCNA is applied to a real breast cancer dataset, our approach successfully identifies a number of RCNA regions, and the result is highly correlated with the results of the other two investigated approaches. Notably, the genomic regions identified by HetRCNA harbor many breast cancer-related genes reported by previous studies.
|
6
|
Lopez-Doriga A, Valle L, Alonso MH, Aussó S, Closa A, Sanjuan X, Barquero D, Rodríguez-Moranta F, Sanz-Pamplona R, Moreno V. Telomere length alterations in microsatellite stable colorectal cancer and association with the immune response. Biochim Biophys Acta Mol Basis Dis 2018; 1864:2992-3000. [DOI: 10.1016/j.bbadis.2018.06.010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/19/2017] [Revised: 06/12/2018] [Accepted: 06/12/2018] [Indexed: 02/07/2023]
|
7
|
Girimurugan SB, Liu Y, Lung PY, Vera DL, Dennis JH, Bass HW, Zhang J. iSeg: an efficient algorithm for segmentation of genomic and epigenomic data. BMC Bioinformatics 2018; 19:131. [PMID: 29642840 PMCID: PMC5896135 DOI: 10.1186/s12859-018-2140-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 09/05/2017] [Accepted: 03/26/2018] [Indexed: 11/16/2022] Open
Abstract
Background: Identification of functional elements of a genome often requires dividing a sequence of measurements along a genome into segments where adjacent segments have different properties, such as different mean values. Despite dozens of algorithms developed to address this problem in genomics research, methods with improved accuracy and speed are still needed to effectively tackle both existing and emerging genomic and epigenomic segmentation problems. Results: We designed an efficient algorithm, called iSeg, for segmentation of genomic and epigenomic profiles. iSeg first uses dynamic programming to identify candidate segments and test them for significance. It then uses a novel data structure based on two coupled balanced binary trees to detect overlapping significant segments and update them simultaneously during the searching and refinement stages. Refinement and merging of significant segments are performed at the end to generate the final set of segments. By using an objective function based on the p-values of the segments, the algorithm can serve as a general computational framework to be combined with different assumptions on the distributions of the data. As a general segmentation method, it can segment different types of genomic and epigenomic data, such as DNA copy number variation, nucleosome occupancy, nuclease sensitivity, and differential nuclease sensitivity data. Using simple t-tests to compute p-values across multiple datasets of different types, we evaluate iSeg using both simulated and experimental datasets and show that it performs satisfactorily when compared with some other popular methods, which often employ more sophisticated statistical models. Implemented in C++, iSeg is also very computationally efficient and well suited for large numbers of input profiles and data with very long sequences. Conclusions: We have developed an efficient general-purpose segmentation tool and showed that its results are comparable to or more accurate than those of many of the most popular segment-calling algorithms used in contemporary genomic data analysis. iSeg is capable of analyzing datasets that have both positive and negative values. Tunable parameters allow users to readily adjust the statistical stringency to best match the biological nature of individual datasets, including widely or sparsely mapped genomic datasets or those with non-normal distributions.
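To make the idea of scoring candidate segments with a t-test concrete, here is a minimal sketch. It is not the iSeg implementation: it uses a one-sample t statistic against an assumed baseline of zero, scans fixed-width windows instead of using dynamic programming or balanced trees, and stops at the statistic because the Python standard library has no t-distribution for p-values. The function names and the signal are hypothetical.

```python
import math
import statistics

def t_statistic(segment, baseline=0.0):
    """One-sample t statistic for the segment mean against a baseline."""
    n = len(segment)
    mean = statistics.fmean(segment)
    sd = statistics.stdev(segment)  # sample standard deviation, needs n >= 2
    if sd == 0:
        # Degenerate case: constant segment.
        return 0.0 if mean == baseline else math.copysign(math.inf, mean - baseline)
    return (mean - baseline) / (sd / math.sqrt(n))

def most_significant_window(signal, width):
    """Return (start, t) for the fixed-width window with the largest |t|."""
    best = max(
        range(len(signal) - width + 1),
        key=lambda s: abs(t_statistic(signal[s:s + width])),
    )
    return best, t_statistic(signal[best:best + width])

# Hypothetical profile: a raised segment at positions 4..7 on a flat baseline.
signal = [0.1, -0.2, 0.05, -0.1, 1.0, 1.2, 0.9, 1.1, 0.0, -0.1, 0.15, -0.05]
start, t = most_significant_window(signal, 4)
print(start, round(t, 2))
```

Windows that straddle the edge of the raised segment have a larger variance and therefore a smaller statistic, so the scan favours the window aligned with the segment.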
Affiliation(s)
| | - Yuhang Liu
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Pei-Yau Lung
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Daniel L Vera
- Center for Genomics and Personalized Medicine, Florida State University, Tallahassee, FL, USA
| | - Jonathan H Dennis
- Department of Biological Science, Florida State University, Tallahassee, FL, USA
| | - Hank W Bass
- Department of Biological Science, Florida State University, Tallahassee, FL, USA
| | - Jinfeng Zhang
- Department of Statistics, Florida State University, Tallahassee, FL, USA.
| |
|
8
|
Alonso MH, Aussó S, Lopez-Doriga A, Cordero D, Guinó E, Solé X, Barenys M, de Oca J, Capella G, Salazar R, Sanz-Pamplona R, Moreno V. Comprehensive analysis of copy number aberrations in microsatellite stable colon cancer in view of stromal component. Br J Cancer 2017; 117:421-431. [PMID: 28683472 PMCID: PMC5537504 DOI: 10.1038/bjc.2017.208] [Citation(s) in RCA: 91] [Impact Index Per Article: 11.4] [Received: 02/21/2017] [Revised: 05/11/2017] [Accepted: 06/09/2017] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND: Somatic copy number aberrations (CNAs) are common acquired changes in cancer cells and have an important role in the progression of colon cancer (colorectal cancer, CRC). This study aimed to characterise CNAs and their impact on gene expression. METHODS: Copy number aberrations were inferred from SNP array data in a series of 99 CRC. Copy number aberration events were calculated and used to assess the association between copy number dosage, clinical and molecular characteristics of the tumours, and gene expression changes. All analyses were adjusted for the quantity of stroma in each sample, which was inferred from gene expression data. RESULTS: High heterogeneity among samples was observed; the proportion of altered genome ranged between 0.04 and 26.6%. Recurrent CNA regions with gains were frequent in chromosomes 7p, 8q, 13q, and 20, whereas 8p, 17p, and 18 accumulated losses. A significant positive correlation was observed between the number of somatic mutations and total CNA (Spearman's r=0.42, P=0.006). Approximately 37% of genes located in CNA regions changed their level of expression, and the average partial correlation (adjusted for stromal content) with copy number was 0.54 (interquartile range 0.20 to 0.81). Altered genes showed enrichment in pathways relevant for CRC. Tumours classified as CMS2 and CMS4 by the consensus molecular subtyping showed a higher frequency of CNA. Losses of one small region in 1p36.33, with gene CDK11B, were associated with poor prognosis. More than 66% of the recurrent CNAs were validated in The Cancer Genome Atlas (TCGA) data when analysed with the same procedure. Furthermore, 79% of the genes with altered expression in our data were validated in the TCGA. CONCLUSIONS: Although CNAs are frequent events in microsatellite stable CRC, few focal recurrent regions were found. These aberrations have strong effects on gene expression and contribute to the deregulation of relevant cancer pathways. Owing to the diploid nature of stromal cells, it is important to consider the purity of tumour samples to accurately calculate CNA events in CRC.
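The abstract above reports a partial correlation between copy number and expression, adjusted for stromal content, without stating the estimator. As a hedged illustration, the first-order partial correlation built from Pearson coefficients is

$$r_{xy\cdot z} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}},$$

where $x$ is copy number, $y$ is expression and $z$ is the stromal estimate. The sketch below assumes this Pearson-based form; the variable names and values are invented for illustration and are not data from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical per-sample values for one gene (illustrative only).
copy_number = [1.9, 2.0, 2.6, 2.1, 3.0, 2.4]
expression = [5.1, 5.6, 6.0, 5.2, 7.1, 6.4]
stroma = [0.50, 0.20, 0.45, 0.30, 0.25, 0.40]

print(partial_correlation(copy_number, expression, stroma))
```

When the covariate is uncorrelated with both variables, the partial correlation reduces to the ordinary correlation, which is a quick sanity check on any implementation.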
Affiliation(s)
- M Henar Alonso
- Unit of Biomarkers and Susceptibility, Cancer Prevention and Control Program, Catalan Institute of Oncology (ICO), CIBERESP, Gran Via 199, Hospitalet Llobregat, 08908 Barcelona, Spain; Molecular Mechanisms and Experimental Therapy Cancer Program, Bellvitge Biomedical Research Institute (IDIBELL), Barcelona, Spain
| | - Susanna Aussó
- Unit of Biomarkers and Susceptibility, Cancer Prevention and Control Program, Catalan Institute of Oncology (ICO), CIBERESP, Gran Via 199, Hospitalet Llobregat, 08908 Barcelona, Spain; Molecular Mechanisms and Experimental Therapy Cancer Program, Bellvitge Biomedical Research Institute (IDIBELL), Barcelona, Spain
| | - Adriana Lopez-Doriga
- Unit of Biomarkers and Susceptibility, Cancer Prevention and Control Program, Catalan Institute of Oncology (ICO), CIBERESP, Gran Via 199, Hospitalet Llobregat, 08908 Barcelona, Spain; Molecular Mechanisms and Experimental Therapy Cancer Program, Bellvitge Biomedical Research Institute (IDIBELL), Barcelona, Spain
| | - David Cordero
- Unit of Biomarkers and Susceptibility, Cancer Prevention and Control Program, Catalan Institute of Oncology (ICO), CIBERESP, Gran Via 199, Hospitalet Llobregat, 08908 Barcelona, Spain; Molecular Mechanisms and Experimental Therapy Cancer Program, Bellvitge Biomedical Research Institute (IDIBELL), Barcelona, Spain
| | - Elisabet Guinó
- Unit of Biomarkers and Susceptibility, Cancer Prevention and Control Program, Catalan Institute of Oncology (ICO), CIBERESP, Gran Via 199, Hospitalet Llobregat, 08908 Barcelona, Spain; Molecular Mechanisms and Experimental Therapy Cancer Program, Bellvitge Biomedical Research Institute (IDIBELL), Barcelona, Spain
| | - Xavier Solé
- Unit of Biomarkers and Susceptibility, Cancer Prevention and Control Program, Catalan Institute of Oncology (ICO), CIBERESP, Gran Via 199, Hospitalet Llobregat, 08908 Barcelona, Spain; Molecular Mechanisms and Experimental Therapy Cancer Program, Bellvitge Biomedical Research Institute (IDIBELL), Barcelona, Spain
| | - Mercè Barenys
- Molecular Mechanisms and Experimental Therapy Cancer Program, Bellvitge Biomedical Research Institute (IDIBELL), Barcelona, Spain; Gastroenterology Service, Hospital de Viladecans, Barcelona, Spain; Faculty of Medicine, Department of Clinical Sciences, University of Barcelona (UB), Barcelona, Spain
| | - Javier de Oca
- Molecular Mechanisms and Experimental Therapy Cancer Program, Bellvitge Biomedical Research Institute (IDIBELL), Barcelona, Spain; Faculty of Medicine, Department of Clinical Sciences, University of Barcelona (UB), Barcelona, Spain; Department of General and Digestive Surgery, Bellvitge University Hospital, Barcelona, Spain
| | - Gabriel Capella
- Molecular Mechanisms and Experimental Therapy Cancer Program, Bellvitge Biomedical Research Institute (IDIBELL), Barcelona, Spain; Faculty of Medicine, Department of Clinical Sciences, University of Barcelona (UB), Barcelona, Spain; Hereditary Cancer Program, Catalan Institute of Oncology (ICO) and CIBERONC, Barcelona, Spain
| | - Ramón Salazar
- Molecular Mechanisms and Experimental Therapy Cancer Program, Bellvitge Biomedical Research Institute (IDIBELL), Barcelona, Spain; Faculty of Medicine, Department of Clinical Sciences, University of Barcelona (UB), Barcelona, Spain; Oncology Department, Catalan Institute of Oncology (ICO) and CIBERONC, Barcelona, Spain
| | - Rebeca Sanz-Pamplona
- Unit of Biomarkers and Susceptibility, Cancer Prevention and Control Program, Catalan Institute of Oncology (ICO), CIBERESP, Gran Via 199, Hospitalet Llobregat, 08908 Barcelona, Spain; Molecular Mechanisms and Experimental Therapy Cancer Program, Bellvitge Biomedical Research Institute (IDIBELL), Barcelona, Spain
| | - Victor Moreno
- Unit of Biomarkers and Susceptibility, Cancer Prevention and Control Program, Catalan Institute of Oncology (ICO), CIBERESP, Gran Via 199, Hospitalet Llobregat, 08908 Barcelona, Spain; Molecular Mechanisms and Experimental Therapy Cancer Program, Bellvitge Biomedical Research Institute (IDIBELL), Barcelona, Spain; Faculty of Medicine, Department of Clinical Sciences, University of Barcelona (UB), Barcelona, Spain
| |
|
9
|
Mohammadi M, Abed Hodtani G. A robust aCGH data recovery framework based on half quadratic minimization. Comput Biol Med 2016; 70:58-66. [PMID: 26803290 DOI: 10.1016/j.compbiomed.2015.12.026] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/26/2015] [Revised: 12/29/2015] [Accepted: 12/30/2015] [Indexed: 11/27/2022]
Abstract
This paper presents a general half quadratic framework for simultaneous analysis of the whole array comparative genomic hybridization (aCGH) profiles in a data set. The proposed framework accommodates different M-estimation loss functions and two underlying assumptions for aCGH profiles of a data set: sparsity and low rank. Using M-estimation loss functions, this framework is more robust to various types of noise and outliers. The solution of the proposed framework is given by half quadratic (HQ) minimization. To hasten this procedure, accelerated proximal gradient (APG) is utilized. Experimental results support the robustness of the proposed framework in comparison to the state-of-the-art algorithms.
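Half-quadratic minimization with an M-estimation loss can be shown on a much smaller problem than the one in the abstract above. The sketch below is an assumption-laden toy, not the paper's framework: it estimates a single robust location value under the Welsch loss, with no sparsity or low-rank terms and no accelerated proximal gradient step. The multiplicative half-quadratic form alternates between computing auxiliary weights from the current residuals and solving the resulting weighted least-squares problem, which here has a closed form. The function name, the `sigma` scale and the data are hypothetical.

```python
import math

def hq_robust_mean(values, sigma=1.0, iterations=50, tol=1e-9):
    """Robust location estimate under the Welsch loss via half-quadratic updates.

    Step 1: auxiliary weights w_i = exp(-(x_i - mu)^2 / (2 * sigma^2)).
    Step 2: weighted least squares, mu = sum(w_i * x_i) / sum(w_i).
    """
    mu = sorted(values)[len(values) // 2]  # start from a median-like data point
    for _ in range(iterations):
        weights = [math.exp(-((v - mu) ** 2) / (2 * sigma ** 2)) for v in values]
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

# Hypothetical log-ratio measurements with one gross outlier.
noisy = [0.9, 1.1, 1.0, 0.95, 1.05, 25.0]
print(sum(noisy) / len(noisy), hq_robust_mean(noisy))
```

The outlier receives a weight close to zero after the first update, so the estimate stays near the bulk of the data while the ordinary mean is pulled far away.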
Affiliation(s)
- Majid Mohammadi
- Department of Computer Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
- Ghosheh Abed Hodtani
- Department of Electrical Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Room 848, Azadi Square, Mashhad, Iran.
|
10
|
Anjum S, Morganella S, D'Angelo F, Iavarone A, Ceccarelli M. VEGAWES: variational segmentation on whole exome sequencing for copy number detection. BMC Bioinformatics 2015; 16:315. [PMID: 26416038 PMCID: PMC4587906 DOI: 10.1186/s12859-015-0748-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/17/2015] [Accepted: 09/16/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Copy number variations are important in the detection and progression of significant tumors and diseases. Recently, whole exome sequencing (WES) has been gaining popularity for copy number variation detection because of its low cost and better efficiency. In this work, we developed VEGAWES for accurate and robust detection of copy number variations on WES data. VEGAWES is an extension of a variational-based segmentation algorithm, VEGA (Variational Estimator for Genomic Aberrations), which has previously outperformed several algorithms on segmenting array comparative genomic hybridization data. RESULTS We tested this algorithm on synthetic data and 100 Glioblastoma Multiforme primary tumor samples. The results on the real data were analyzed using the segmentation obtained from single-nucleotide polymorphism data as ground truth. We compared our results with two other segmentation algorithms and assessed the performance based on accuracy and time. CONCLUSIONS In terms of both accuracy and time, VEGAWES provided better results on the synthetic data and tumor samples, demonstrating its potential for robust detection of aberrant regions in the genome.
Affiliation(s)
- Samreen Anjum
- Computational Sciences and Engineering, Qatar Computing Research Institute, Doha, P. O. Box 5825, Qatar.
- Sandro Morganella
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK.
- Antonio Iavarone
- Institute for Cancer Genetics, Columbia University, New York, 10027, USA.
- Michele Ceccarelli
- Computational Sciences and Engineering, Qatar Computing Research Institute, Doha, P. O. Box 5825, Qatar; Department of Science and Technology, University of Sannio, Benevento, 82100, Italy.
|
11
|
Nutsua ME, Fischer A, Nebel A, Hofmann S, Schreiber S, Krawczak M, Nothnagel M. Family-Based Benchmarking of Copy Number Variation Detection Software. PLoS One 2015. [PMID: 26197066 PMCID: PMC4510559 DOI: 10.1371/journal.pone.0133465] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 01/01/2023] Open
Abstract
The analysis of structural variants, in particular of copy-number variations (CNVs), has proven valuable in unraveling the genetic basis of human diseases. Hence, a large number of algorithms have been developed for the detection of CNVs in SNP array signal intensity data. Using the European and African HapMap trio data, we undertook a comparative evaluation of six commonly used CNV detection software tools, namely Affymetrix Power Tools (APT), QuantiSNP, PennCNV, GLAD, R-gada and VEGA, and assessed their level of pair-wise prediction concordance. The tool-specific CNV prediction accuracy was assessed in silico by way of intra-familial validation. Software tools differed greatly in terms of the number and length of the CNVs predicted as well as the number of markers included in a CNV. All software tools predicted substantially more deletions than duplications. Intra-familial validation revealed consistently low levels of prediction accuracy as measured by the proportion of validated CNVs (34-60%). Moreover, up to 20% of apparent family-based validations were found to be due to chance alone. Software using Hidden Markov models (HMM) showed a trend to predict fewer CNVs than segmentation-based algorithms albeit with greater validity. PennCNV yielded the highest prediction accuracy (60.9%). Finally, the pairwise concordance of CNV prediction was found to vary widely with the software tools involved. We recommend HMM-based software, in particular PennCNV, rather than segmentation-based algorithms when validity is the primary concern of CNV detection. QuantiSNP may be used as an additional tool to detect sets of CNVs not detectable by the other tools. Our study also reemphasizes the need for laboratory-based validation, such as qPCR, of CNVs predicted in silico.
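The intra-familial validation idea can be sketched as follows. The abstract does not state the matching criterion, so this sketch assumes that a CNV called in a child counts as validated when a CNV of the same type in either parent covers at least half of it. The tuple layout, the `min_overlap` threshold, and the function names are hypothetical.

```python
def overlap_fraction(a, b):
    """Fraction of interval a = (start, end) that is covered by interval b."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    return max(inter, 0) / (a[1] - a[0])


def validated_fraction(child, parents, min_overlap=0.5):
    """Share of child CNVs matched by a same-type CNV in either parent."""
    hits = 0
    for chrom, start, end, kind in child:
        for p_chrom, p_start, p_end, p_kind in parents:
            same_locus = chrom == p_chrom and kind == p_kind
            if same_locus and overlap_fraction((start, end), (p_start, p_end)) >= min_overlap:
                hits += 1
                break  # one parental match is enough
    return hits / len(child) if child else 0.0


# Made-up calls as (chromosome, start, end, type) tuples.
child_calls = [("chr1", 100, 200, "del"), ("chr2", 500, 900, "dup"), ("chr3", 10, 50, "del")]
parent_calls = [("chr1", 120, 260, "del"), ("chr2", 500, 900, "del")]
rate = validated_fraction(child_calls, parent_calls)
```

Here only the first child call is matched: the second overlaps a parental call of a different type and the third has no parental counterpart.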
Affiliation(s)
- Marcel Elie Nutsua
- Institute of Clinical Molecular Biology, Christian-Albrechts University, Kiel, Germany
- Annegret Fischer
- Institute of Clinical Molecular Biology, Christian-Albrechts University, Kiel, Germany
- Almut Nebel
- Institute of Clinical Molecular Biology, Christian-Albrechts University, Kiel, Germany
- Sylvia Hofmann
- Institute of Clinical Molecular Biology, Christian-Albrechts University, Kiel, Germany
- Stefan Schreiber
- Institute of Clinical Molecular Biology, Christian-Albrechts University, Kiel, Germany
- Michael Krawczak
- Institute of Medical Informatics and Statistics, Christian-Albrechts University, Kiel, Germany
- Michael Nothnagel
- Institute of Medical Informatics and Statistics, Christian-Albrechts University, Kiel, Germany; Cologne Center for Genomics, University of Cologne, Cologne, Germany
|
12
|
Zhou X, Yang C, Wan X, Zhao H, Yu W. Multisample aCGH data analysis via total variation and spectral regularization. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:230-235. [PMID: 23702561 PMCID: PMC3715577 DOI: 10.1109/tcbb.2012.166] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 06/02/2023]
Abstract
DNA copy number variation (CNV) accounts for a large proportion of genetic variation. One commonly used approach to detecting CNVs is array-based comparative genomic hybridization (aCGH). Although many methods have been proposed to analyze aCGH data, it is not clear how to combine information from multiple samples to improve CNV detection. In this paper, we propose to use a matrix to approximate the multisample aCGH data and minimize the total variation of each sample as well as the nuclear norm of the whole matrix. In this way, we can make use of the smoothness property of each sample and the correlation among multiple samples simultaneously in a convex optimization framework. We also developed an efficient and scalable algorithm to handle large-scale data. Experiments demonstrate that the proposed method outperforms the state-of-the-art techniques under a wide range of scenarios and it is capable of processing large data sets with millions of probes.
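One way to write down the optimization problem this abstract describes is given below. The notation is assumed rather than taken from the paper: $D$ is the observed probes-by-samples matrix ($p$ probes, $n$ samples), $X$ is its approximation, $\lambda_1$ and $\lambda_2$ are regularization weights, and the squared-error data-fit term is an assumption.

```latex
\min_{X \in \mathbb{R}^{p \times n}} \;
  \tfrac{1}{2}\,\lVert D - X \rVert_F^2
  \;+\; \lambda_1 \sum_{j=1}^{n} \sum_{i=1}^{p-1} \lvert X_{i+1,j} - X_{i,j} \rvert
  \;+\; \lambda_2\,\lVert X \rVert_*
```

The middle term is the total variation of each sample (column), which favors piecewise-constant profiles, and the nuclear norm $\lVert X \rVert_*$ (sum of singular values) favors low rank, that is, correlation among samples. All three terms are convex, in line with the convex framework the abstract mentions.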
Affiliation(s)
- Xiaowei Zhou
- Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, China.
|
13
|
Comparative analysis of methods for identifying recurrent copy number alterations in cancer. PLoS One 2012; 7:e52516. [PMID: 23285074 PMCID: PMC3527554 DOI: 10.1371/journal.pone.0052516] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 08/07/2012] [Accepted: 11/14/2012] [Indexed: 11/19/2022] Open
Abstract
Recurrent copy number alterations (CNAs) play an important role in cancer genesis. While a number of computational methods have been proposed for identifying such CNAs, their relative merits remain largely unknown in practice, since very few efforts have focused on comparative analysis of the methods. To facilitate studies of recurrent CNA identification in cancer genomes, it is imperative to conduct a comprehensive comparison of the performance and limitations of existing methods. In this paper, six representative methods proposed in the last six years are compared. These include one-stage and two-stage approaches, working with raw intensity ratio data and discretized data, respectively. They are based on various techniques such as kernel regression, correlation matrix diagonal segmentation, semi-parametric permutation and cyclic permutation schemes. We explore multiple criteria, including type I error rate, detection power, the Receiver Operating Characteristic (ROC) curve and the area under the curve (AUC), and computational complexity, to evaluate the performance of the methods under multiple simulation scenarios. We also characterize their performance on two real datasets obtained from lung adenocarcinoma and glioblastoma. This comparison study reveals general characteristics of the existing methods for identifying recurrent CNAs, and further provides new insights into their strengths and weaknesses. We believe it will help accelerate the development of novel and improved methods.
|
14
|
Morganella S, Ceccarelli M. VegaMC: a R/bioconductor package for fast downstream analysis of large array comparative genomic hybridization datasets. Bioinformatics 2012; 28:2512-4. [PMID: 22815357 DOI: 10.1093/bioinformatics/bts453] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
SUMMARY Identification of genetic alterations of tumor cells has become a common method to detect the genes involved in development and progression of cancer. In order to detect driver genes, several samples need to be simultaneously analyzed. The Cancer Genome Atlas (TCGA) project provides access to a large amount of data for several cancer types. TCGA is an invaluable source of information, but analysis of this huge dataset poses important computational problems in terms of memory and execution times. Here, we present an R package, called VegaMC (Vega multi-channel), that enables fast and efficient detection of significant recurrent copy number alterations in very large datasets. VegaMC is integrated with the output of the common tools that convert allele signal intensities into log R ratio and B allele frequency. It also enables the detection of loss of heterozygosity and outputs two web pages allowing rapid and easy navigation of the aberrant genes. Synthetic data and real datasets are used for quantitative and qualitative evaluation purposes. In particular, we demonstrate the ability of VegaMC on two large TCGA datasets: colon adenocarcinoma and glioblastoma multiforme. For both datasets, we provide the list of aberrant genes, which contains previously validated genes and can be used as a basis for further investigations. AVAILABILITY VegaMC is an R/Bioconductor package, available at http://bioconductor.org/packages/release/bioc/html/VegaMC.html. CONTACT morganella@unisannio.it SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
15
|
Seifert M, Gohr A, Strickert M, Grosse I. Parsimonious higher-order hidden Markov models for improved array-CGH analysis with applications to Arabidopsis thaliana. PLoS Comput Biol 2012; 8:e1002286. [PMID: 22253580 PMCID: PMC3257270 DOI: 10.1371/journal.pcbi.1002286] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 05/05/2011] [Accepted: 10/11/2011] [Indexed: 12/19/2022] Open
Abstract
Array-based comparative genomic hybridization (Array-CGH) is an important technology in molecular biology for the detection of DNA copy number polymorphisms between closely related genomes. Hidden Markov Models (HMMs) are popular tools for the analysis of Array-CGH data, but current methods are only based on first-order HMMs having constrained abilities to model spatial dependencies between measurements of closely adjacent chromosomal regions. Here, we develop parsimonious higher-order HMMs enabling the interpolation between a mixture model ignoring spatial dependencies and a higher-order HMM exhaustively modeling spatial dependencies. We apply parsimonious higher-order HMMs to the analysis of Array-CGH data of the accessions C24 and Col-0 of the model plant Arabidopsis thaliana. We compare these models against first-order HMMs and other existing methods using a reference of known deletions and sequence deviations. We find that parsimonious higher-order HMMs clearly improve the identification of these polymorphisms. Moreover, we perform a functional analysis of identified polymorphisms revealing novel details of genomic differences between C24 and Col-0. Additional model evaluations are done on widely considered Array-CGH data of human cell lines indicating that parsimonious HMMs are also well-suited for the analysis of non-plant specific data. All these results indicate that parsimonious higher-order HMMs are useful for Array-CGH analyses. An implementation of parsimonious higher-order HMMs is available as part of the open source Java library Jstacs (www.jstacs.de/index.php/PHHMM).
Array-based comparative genomics is a standard approach for the identification of DNA copy number polymorphisms between closely related genomes. The huge amounts of data produced by these experiments require efficient and accurate bioinformatics tools for the identification of copy number polymorphisms. Hidden Markov Models (HMMs) are frequently used for analyzing such data sets, but current models are based on first-order HMMs only having limited capabilities to model spatial dependencies between measurements of closely adjacent chromosomal regions. We develop parsimonious higher-order HMMs enabling the interpolation between a mixture model ignoring spatial dependencies and a higher-order HMM exhaustively modeling these dependencies to overcome this limitation. In an in-depth case study with Arabidopsis thaliana, we find that parsimonious higher-order HMMs clearly improve the identification of copy number polymorphisms in comparison to standard first-order HMMs and other frequently used methods. Functional analysis of identified polymorphisms revealed details of genomic differences between the accessions C24 and Col-0 of Arabidopsis thaliana. An additional study on human cell lines further indicates that parsimonious HMMs are well-suited for the analysis of Array-CGH data.
Affiliation(s)
- Michael Seifert
- Department of Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany.
|
16
|
Rippe RCA, Meulman JJ, Eilers PHC. Visualization of genomic changes by segmented smoothing using an L0 penalty. PLoS One 2012; 7:e38230. [PMID: 22679492 PMCID: PMC3367998 DOI: 10.1371/journal.pone.0038230] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 12/19/2011] [Accepted: 05/05/2012] [Indexed: 11/22/2022] Open
Abstract
Copy number variations (CNV) and allelic imbalance in tumor tissue can show strong segmentation. Their graphical presentation can be enhanced by appropriate smoothing. Existing signal and scatterplot smoothers do not respect segmentation well. We present novel algorithms that use a penalty on the L0 norm of differences of neighboring values. Visualization is our main goal, but we compare classification performance to that of VEGA.
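To make the effect of an L0 penalty on neighboring differences concrete, the sketch below minimizes squared error plus `lam` times the number of jumps. Note that it swaps in a different technique from whatever the paper uses: an exact O(n²) dynamic program, chosen because it is short and easy to check. The names `l0_segment` and `lam` are hypothetical.

```python
def l0_segment(y, lam):
    """Piecewise-constant fit minimizing squared error + lam * (number of jumps)."""
    n = len(y)
    s1 = [0.0]  # prefix sums of y
    s2 = [0.0]  # prefix sums of y squared
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(i, j):
        """Squared error of approximating y[i:j] by its mean."""
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    best = [0.0] + [float("inf")] * n  # best[j]: optimal cost for y[:j]
    start = [0] * (n + 1)              # start[j]: where the last segment of y[:j] begins
    for j in range(1, n + 1):
        for i in range(j):
            # A segment starting at i > 0 introduces one jump, which costs lam.
            cost = best[i] + sse(i, j) + (lam if i > 0 else 0.0)
            if cost < best[j]:
                best[j], start[j] = cost, i

    # Backtrack through the segment starts and fill in the segment means.
    fit, j = [0.0] * n, n
    while j > 0:
        i = start[j]
        fit[i:j] = [(s1[j] - s1[i]) / (j - i)] * (j - i)
        j = i
    return fit


signal = [0, 0, 0, 1, 1, 1]
sharp = l0_segment(signal, lam=0.5)   # a small penalty keeps the jump
flat = l0_segment(signal, lam=10.0)   # a large penalty removes it
```

Unlike a quadratic roughness penalty, the jump count does not blur edges: the step is either kept intact or removed entirely, depending on `lam`.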
Affiliation(s)
- Ralph C A Rippe
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands.
|
17
|
Mahmud MP, Schliep A. Fast MCMC sampling for hidden Markov Models to determine copy number variations. BMC Bioinformatics 2011; 12:428. [PMID: 22047014 PMCID: PMC3371636 DOI: 10.1186/1471-2105-12-428] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 02/17/2011] [Accepted: 11/02/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Hidden Markov Models (HMM) are often used for analyzing Comparative Genomic Hybridization (CGH) data to identify chromosomal aberrations or copy number variations by segmenting observation sequences. For efficiency reasons the parameters of an HMM are often estimated with maximum likelihood and a segmentation is obtained with the Viterbi algorithm. This introduces considerable uncertainty in the segmentation, which can be avoided with Bayesian approaches integrating out parameters using Markov Chain Monte Carlo (MCMC) sampling. While the advantages of Bayesian approaches have been clearly demonstrated, the likelihood-based approaches are still preferred in practice for their lower running times; datasets coming from high-density arrays and next generation sequencing amplify these problems. RESULTS We propose an approximate sampling technique, inspired by compression of discrete sequences in HMM computations and by kd-trees to leverage spatial relations between data points in typical data sets, to speed up the MCMC sampling. CONCLUSIONS We test our approximate sampling method on simulated and biological ArrayCGH datasets and high-density SNP arrays, and demonstrate speed-ups of 10 to 60 and 90, respectively, while achieving competitive results with the state-of-the-art Bayesian approaches. AVAILABILITY An implementation of our method will be made available as part of the open source GHMM library from http://ghmm.org.
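For readers unfamiliar with the likelihood-based baseline this abstract refers to, the sketch below shows Viterbi segmentation for a small HMM with Gaussian emissions. It is not the paper's MCMC method. The three state means (loss, neutral, gain), the shared `sigma`, and the `stay` probability are fixed by hand here as assumptions, rather than estimated by maximum likelihood.

```python
import math


def viterbi_segment(obs, means, sigma=0.2, stay=0.95):
    """Most likely state path for a Gaussian-emission HMM, computed in the log domain."""
    k = len(means)
    switch = (1.0 - stay) / (k - 1)
    log_trans = [[math.log(stay if i == j else switch) for j in range(k)]
                 for i in range(k)]

    def log_emit(x, m):
        # Shared sigma: the normalizing constant is the same for every state.
        return -0.5 * ((x - m) / sigma) ** 2

    score = [math.log(1.0 / k) + log_emit(obs[0], m) for m in means]
    back = []  # back[t][j]: best predecessor of state j at step t + 1
    for x in obs[1:]:
        prev, score, ptr = score, [], []
        for j in range(k):
            best = max(range(k), key=lambda i: prev[i] + log_trans[i][j])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][j] + log_emit(x, means[j]))
        back.append(ptr)

    # Trace the best final state back to the first observation.
    state = max(range(k), key=lambda j: score[j])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical log ratios with a short gained region in the middle.
log_ratios = [0.0, 0.05, -0.1, 0.6, 0.55, 0.62, 0.02, -0.03]
states = viterbi_segment(log_ratios, means=[-0.6, 0.0, 0.6])
```

The returned path is a single point estimate of the segmentation, which is the limitation the abstract raises when it argues for sampling-based Bayesian alternatives.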
Affiliation(s)
- Md Pavel Mahmud
- Department of Computer Science, Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854, USA.
|
18
|
Morganella S, Pagnotta SM, Ceccarelli M. Finding recurrent copy number alterations preserving within-sample homogeneity. Bioinformatics 2011; 27:2949-56. [PMID: 21873327 DOI: 10.1093/bioinformatics/btr488] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 01/06/2023]
Abstract
MOTIVATION Copy number alterations (CNAs) represent an important component of genetic variation and play a significant role in many human diseases. Development of array comparative genomic hybridization (aCGH) technology has made it possible to identify CNAs. Identification of recurrent CNAs represents the first fundamental step to provide a list of genomic regions which form the basis for further biological investigations. The main problem in recurrent CNA discovery is the need to distinguish between functional changes and random events without pathological relevance. Within-sample homogeneity is a common feature of copy number profiles in cancer, so it can be used as an additional source of information to increase the accuracy of the results. Although several algorithms aimed at the identification of recurrent CNAs have been proposed, no comprehensive comparison of the different approaches has yet been published. RESULTS We propose a new approach, called Genomic Analysis of Important Alterations (GAIA), to find recurrent CNAs where a statistical hypothesis framework is extended to take into account within-sample homogeneity. Statistical significance and within-sample homogeneity are combined into an iterative procedure to extract the regions that are likely involved in functional changes. Results show that GAIA represents a valid alternative to other proposed approaches. In addition, we perform an accurate comparison by using two real aCGH datasets and a carefully planned simulation study. AVAILABILITY GAIA has been implemented as an R/Bioconductor package. It can be downloaded from the following page: http://bioinformatics.biogem.it/download/gaia. CONTACT ceccarelli@unisannio.it; morganella@unisannio.it. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sandro Morganella
- Department of Science, University of Sannio, 82100, Benevento, Italy.
|