1
Taxonomy and Phylogeny of Novel and Extant Taxa in Pleosporales Associated with Mangifera indica from Yunnan, China (Series I). J Fungi (Basel) 2022; 8:jof8020152. [PMID: 35205906 PMCID: PMC8876165 DOI: 10.3390/jof8020152] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/06/2021] [Revised: 01/27/2022] [Accepted: 01/28/2022] [Indexed: 11/17/2022]
Abstract
Pleosporales is the largest fungal order, with a worldwide distribution in terrestrial and aquatic environments. During investigations of saprobic fungi associated with mango (Mangifera indica) in Baoshan and Honghe, Yunnan, China, fungal taxa belonging to Pleosporales were collected. Morphological examinations and phylogenetic analyses of the ITS, LSU, SSU, rpb2 and tef1-α loci were used to identify the fungal taxa. A new genus, Mangifericomes; four new species, namely Mangifericomes hongheensis, Neomassaria hongheensis, Paramonodictys hongheensis, and Paramonodictys yunnanensis; and six new host and country records, namely Byssosphaeria siamensis, Crassiparies quadrisporus, Paradictyoarthrinium aquatica, Phaeoseptum mali, Torula fici, and Vaginatispora amygdali, are introduced. Photoplates, full descriptions, and phylogenetic trees showing the placement of new and known taxa are provided.
2
Shin SJ, Wu Y, Hao N. A backward procedure for change-point detection with applications to copy number variation detection. Can J Stat 2020. [DOI: 10.1002/cjs.11535] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
Affiliation(s)
- Seung Jun Shin
- Department of Statistics, Korea University, Seoul, South Korea
- Yichao Wu
- Department of Mathematics, Statistics, and Computer Science, The University of Illinois at Chicago, Chicago, IL, U.S.A.
- Ning Hao
- Department of Mathematics, The University of Arizona, Tucson, AZ, U.S.A.
3
An efficient method to handle the 'large p, small n' problem for genomewide association studies using Haseman-Elston regression. J Genet 2017; 95:847-852. [PMID: 27994183 DOI: 10.1007/s12041-016-0705-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 10/20/2022]
Abstract
The 'large p, small n' problem in genomewide association studies (GWAS) is an important subject in genetic studies. Many approaches have been proposed to address it, but none of them successfully combines Haseman-Elston (H-E) regression with sliding-window scan approaches in GWAS. In this article, we extended H-E regression to GWAS and replaced the original data with different measures of the phenotypes of sib pairs. We also applied a hidden Markov model to infer identity by state. In subsequent simulation studies, we found that this approach had higher statistical power than the corresponding single-marker association studies. H-E regression was also able to capture about 48.01% of the quantitative trait locus (QTL). The results further show that power decreases as the number of QTLs increases, and that the power of H-E regression is sensitive to heritability.
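For readers unfamiliar with the underlying model, the classical Haseman-Elston idea can be sketched in a few lines: the squared trait difference of each sib pair is regressed on the estimated proportion of alleles the pair shares at a marker, and a negative slope is read as evidence of linkage. This is only the textbook single-marker regression, not the authors' GWAS extension, sliding-window scan, or HMM step; the function name `he_slope` and the toy values are hypothetical.

```python
def he_slope(trait_a, trait_b, sharing):
    """Least-squares slope of (trait_a - trait_b)**2 on allele sharing.

    trait_a, trait_b: phenotypes of the two sibs in each pair.
    sharing: estimated proportion of alleles shared at the marker, per pair.
    """
    y = [(a - b) ** 2 for a, b in zip(trait_a, trait_b)]  # squared pair differences
    n = len(y)
    mean_x = sum(sharing) / n
    mean_y = sum(y) / n
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(sharing, y))
    sxx = sum((x - mean_x) ** 2 for x in sharing)
    return sxy / sxx


# Toy pairs whose squared differences fall as sharing rises (slope about -3).
print(he_slope([2.0, 2.5 ** 0.5, 1.0], [0.0, 0.0, 0.0], [0.0, 0.5, 1.0]))
```

In practice the sharing proportions would themselves be estimated from marker data, which this sketch takes as given.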
4
Vandeweyer G, Kooy RF. Detection and interpretation of genomic structural variation in health and disease. Expert Rev Mol Diagn 2014; 13:61-82. [DOI: 10.1586/erm.12.119] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 01/04/2023]
5
Yu T, Peng H. Hierarchical clustering of high-throughput expression data based on general dependences. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:1080-1085. [PMID: 24334400 PMCID: PMC3905248 DOI: 10.1109/tcbb.2013.99] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 06/03/2023]
Abstract
High-throughput expression technologies, such as gene expression arrays and liquid chromatography-mass spectrometry (LC-MS), measure thousands of features, i.e., genes or metabolites, on a continuous scale. In such data, both linear and nonlinear relations exist between features. Nonlinear relations can reflect critical regulation patterns in the biological system, yet traditional clustering methods based on linear associations neither identify nor exploit them. Clustering based on general dependences, i.e., both linear and nonlinear relations, is hampered by the high dimensionality and high noise level of the data. We developed a sensitive nonparametric measure of general dependence between (groups of) random variables in high dimensions and, based on this measure, a hierarchical clustering method. In simulation studies, the method outperformed correlation- and mutual information (MI)-based hierarchical clustering methods in clustering features with nonlinear dependences. We applied the method to a microarray data set measuring gene expression in a cell-cycle time series to show that it generates biologically relevant results. The R code is available at http://userwww.service.emory.edu/~tyu8/GDHC.
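The central point, that a linear measure can miss a strong nonlinear dependence, can be shown with a short sketch. The paper's own dependence measure is not reproduced here; as a stand-in this uses the sample distance correlation of Székely and colleagues, a different and well-known measure. A dissimilarity such as `1 - distance_correlation(x, y)` could then be passed to any standard hierarchical clustering routine. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Ordinary (linear) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _double_centred(v):
    """Pairwise |v_i - v_j| matrix with row, column and grand means removed."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # matrix is symmetric, so these are also column means
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation (the population version is 0 only under independence)."""
    a, b = _double_centred(x), _double_centred(y)
    n = len(x)

    def mean_prod(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    return math.sqrt(mean_prod(a, b) / math.sqrt(mean_prod(a, a) * mean_prod(b, b)))


x = [i / 10 - 1 for i in range(21)]  # evenly spaced on [-1, 1]
y = [v * v for v in x]               # purely nonlinear, symmetric relation
print(round(pearson(x, y), 6), round(distance_correlation(x, y), 3))
```

Here the Pearson coefficient is essentially zero while the distance correlation is clearly positive. The O(n²) memory of the pairwise matrices is acceptable for a sketch but would need care at the scale of thousands of features.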
Affiliation(s)
- Tianwei Yu
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA
- Hesen Peng
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA
6
Nilsen G, Liestøl K, Van Loo P, Moen Vollan HK, Eide MB, Rueda OM, Chin SF, Russell R, Baumbusch LO, Caldas C, Børresen-Dale AL, Lingjaerde OC. Copynumber: Efficient algorithms for single- and multi-track copy number segmentation. BMC Genomics 2012; 13:591. [PMID: 23442169 PMCID: PMC3582591 DOI: 10.1186/1471-2164-13-591] [Citation(s) in RCA: 191] [Impact Index Per Article: 15.9] [Received: 05/16/2012] [Accepted: 10/15/2012] [Indexed: 12/15/2022]
Abstract
Background Cancer progression is associated with genomic instability and an accumulation of gains and losses of DNA. The growing variety of tools for measuring genomic copy numbers, including various types of array-CGH, SNP arrays and high-throughput sequencing, calls for a coherent framework offering unified and consistent handling of single- and multi-track segmentation problems. In addition, there is a demand for highly computationally efficient segmentation algorithms, due to the emergence of very high density scans of copy number. Results A comprehensive Bioconductor package for copy number analysis is presented. The package offers a unified framework for single sample, multi-sample and multi-track segmentation and is based on statistically sound penalized least squares principles. Conditional on the number of breakpoints, the estimates are optimal in the least squares sense. A novel and computationally highly efficient algorithm is proposed that utilizes vector-based operations in R. Three case studies are presented. Conclusions The R package copynumber is a software suite for segmentation of single- and multi-track copy number data using algorithms based on coherent least squares principles.
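To make "penalized least squares" concrete for a single track: choose breakpoints that minimise the residual sum of squares plus a penalty γ for each segment. The exact dynamic program below is the textbook O(n²) version (often called optimal partitioning); it is not the faster algorithm implemented in the copynumber package, and the name `segment_pls` and the γ values used are illustrative assumptions.

```python
import math


def segment_pls(y, gamma):
    """Exact penalised least-squares segmentation of one track.

    Minimises (residual sum of squares) + gamma * (number of segments)
    and returns a list of (start, end, mean) with end exclusive.
    """
    n = len(y)
    # Prefix sums give the residual sum of squares of any y[i:j] in O(1).
    s, s2 = [0.0] * (n + 1), [0.0] * (n + 1)
    for i, v in enumerate(y):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def rss(i, j):
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [math.inf] * n  # best[j]: optimal penalised cost of y[:j]
    last = [0] * (n + 1)           # start of the final segment in that optimum
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + rss(i, j) + gamma
            if cost < best[j]:
                best[j], last[j] = cost, i
    segments, j = [], n
    while j > 0:  # backtrack through the stored segment starts
        i = last[j]
        segments.append((i, j, (s[j] - s[i]) / (j - i)))
        j = i
    return segments[::-1]


print(segment_pls([0.1, -0.1, 0.0, 0.05, 1.0, 0.9, 1.1, 1.0], gamma=0.5))
```

A larger γ yields fewer breakpoints; for a fixed number of segments the returned means are the least-squares estimates, which matches the optimality property stated in the abstract.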
Affiliation(s)
- Gro Nilsen
- Biomedical Informatics, Dept of Informatics, University of Oslo, Oslo, Norway
7
Donigan KA, Tuck D, Schulz V, Sweasy JB. DNA polymerase β variant Ile260Met generates global gene expression changes related to cellular transformation. Mutagenesis 2012; 27:683-91. [PMID: 22914675 DOI: 10.1093/mutage/ges034] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/27/2023]
Abstract
Maintenance of genomic stability is essential for cellular survival. The base excision repair (BER) pathway is critical for the resolution of abasic sites and damaged bases, which are estimated to occur 20,000 times in cells daily. DNA polymerase β (Pol β) participates in BER by filling DNA gaps that result from excision of damaged bases. Approximately 30% of human tumours express Pol β variants, many of which have altered fidelity and activity in vitro and, when expressed, induce cellular transformation. The prostate tumour variant Ile260Met transforms cells and is a sequence-context-dependent mutator. To test the hypothesis that mutations induced in vivo by Ile260Met lead to cellular transformation, we characterized the genome-wide expression profile of a clone expressing Ile260Met and compared it with its non-induced counterpart. Using a minimum 1.5-fold cut-off and a false discovery rate (FDR) of <0.05, 912 genes exhibited altered expression. Microarray results were confirmed by quantitative real-time polymerase chain reaction (qRT-PCR) and revealed unique expression profiles in other clones. Gene Ontology (GO) clusters were analyzed using Ingenuity Pathways Analysis to identify altered gene networks and associated nodes. We determined three nodes of interest that exhibited dysfunctional regulation of downstream gene products without themselves having altered expression. One node, peroxisome proliferator-activated receptor γ (PPARG), was sequenced and found to contain a coding region mutation in PPARG2 only in transformed cells. Further analysis suggests that this mutation leads to dominant negative activity of PPARG2. PPARG is a transcription factor implicated as having tumour suppressor function, which suggests that the PPARG2 mutant may have played a role in driving cellular transformation. We conclude that Ile260Met induces cellular transformation by a mutational mechanism.
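The gene-selection rule quoted above (at least a 1.5-fold change with FDR < 0.05) can be sketched as a Benjamini-Hochberg adjustment followed by a fold-change filter. The abstract does not say how the p-values or the FDR were obtained, so the use of Benjamini-Hochberg is an assumption, and `bh_adjust`, `select_genes`, and the example numbers are hypothetical.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(pvalues, fold_changes, max_fdr=0.05, min_fold=1.5):
    """Indices of genes passing both the FDR and the (two-sided) fold-change cut-off."""
    q = bh_adjust(pvalues)
    cutoff = math.log2(min_fold)
    return [i for i, (qi, fc) in enumerate(zip(q, fold_changes))
            if qi < max_fdr and abs(math.log2(fc)) >= cutoff]


p = [0.001, 0.01, 0.02, 0.045, 0.5]
fc = [2.0, 1.2, 0.5, 1.6, 3.0]
print(select_genes(p, fc))
```

Taking the absolute log2 ratio treats a 1.5-fold decrease (ratio below 1/1.5) the same as a 1.5-fold increase.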
Affiliation(s)
- Katherine A Donigan
- Department of Therapeutic Radiology, Yale University School of Medicine, New Haven, CT 06520, USA
8
ROCS: receiver operating characteristic surface for class-skewed high-throughput data. PLoS One 2012; 7:e40598. [PMID: 22792381 PMCID: PMC3391298 DOI: 10.1371/journal.pone.0040598] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 04/16/2012] [Accepted: 06/11/2012] [Indexed: 12/04/2022]
Abstract
The receiver operating characteristic (ROC) curve is an important tool for gauging the performance of classifiers. In certain situations of high-throughput data analysis, the data are heavily class-skewed, i.e., most features tested belong to the true negative class. In such cases, only a small portion of the ROC curve is relevant in practical terms, making the ROC curve and its area under the curve (AUC) insufficient for judging classifier performance. Here we define an ROC surface (ROCS) using the true positive rate (TPR), false positive rate (FPR), and true discovery rate (TDR). The ROC surface, together with the associated quantities, volume under the surface (VUS) and FDR-controlled area under the ROC curve (FCAUC), provides a useful approach for gauging classifier performance on class-skewed high-throughput data. The implementation as an R package is available at http://userwww.service.emory.edu/~tyu8/ROCS/.
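The three rates that span the ROC surface can be computed from scored features with known labels, as in the sketch below. It only produces the (FPR, TPR, TDR) points, taking TDR as TP / (TP + FP); the VUS and FCAUC summaries and the ROCS R package are not reproduced, and `roc_points` is a hypothetical name. Both classes are assumed to be present.

```python
def roc_points(scores, labels):
    """(threshold, FPR, TPR, TDR) for each distinct score used as a cut-off.

    A feature is called positive when its score is >= the threshold.
    labels: 1 for true positives, 0 for true negatives (both must occur).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 1)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 0)
        # tp + fp >= 1 because t is one of the observed scores.
        points.append((t, fp / neg, tp / pos, tp / (tp + fp)))
    return points


for point in roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]):
    print(point)
```

With heavy class skew, FPR stays small over a wide range of thresholds while TDR can drop sharply, which is why adding the TDR axis separates classifiers that look alike on the ROC curve alone.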
9
Cutts RJ, Dayem Ullah AZ, Sangaralingam A, Gadaleta E, Lemoine NR, Chelala C. O-miner: an integrative platform for automated analysis and mining of -omics data. Nucleic Acids Res 2012; 40:W560-8. [PMID: 22600742 PMCID: PMC3394300 DOI: 10.1093/nar/gks432] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]
Abstract
High-throughput profiling has generated massive amounts of data across basic, clinical and translational research fields. However, open-source comprehensive web tools for analysing data obtained from different platforms and technologies are still lacking. To fill this gap and meet the computational needs of ongoing research projects, we developed O-miner, a rapid, comprehensive, efficient web tool that covers all the steps required for the analysis of both transcriptomic and genomic data, from raw image files through in-depth bioinformatics analysis and annotation to biological knowledge extraction. O-miner was developed from a biologist end-user perspective; hence, it is as simple to use as possible within the confines of the complexity of the data being analysed. It provides a strong analytical suite able to overlay and harness large, complicated, raw and heterogeneous sets of profiles with biological/clinical data. Biologists can use O-miner to analyse and integrate different types of data and annotations to build knowledge of relevant altered mechanisms and pathways in order to identify and prioritize novel targets for further biological validation. Here we describe the analytical workflows currently available in O-miner and present examples of use. O-miner is freely available at www.o-miner.org.
Affiliation(s)
- Rosalind J Cutts
- Centre for Molecular Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
10
Ahn J, Yoon Y, Park C, Park S. CNV detection method optimized for high-resolution arrayCGH by normality test. Comput Biol Med 2012; 42:468-73. [DOI: 10.1016/j.compbiomed.2011.12.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2010] [Revised: 12/05/2011] [Accepted: 12/27/2011] [Indexed: 11/24/2022]
11
Park C, Ahn J, Yoon Y, Park S. A multi-sample based method for identifying common CNVs in normal human genomic structure using high-resolution aCGH data. PLoS One 2011; 6:e26975. [PMID: 22073121 PMCID: PMC3205051 DOI: 10.1371/journal.pone.0026975] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/21/2011] [Accepted: 10/07/2011] [Indexed: 01/08/2023]
Abstract
BACKGROUND It is difficult to identify copy number variations (CNVs) in normal human genomic data because of noise and non-linear relationships between different genomic regions and signal intensity. A high-resolution array comparative genomic hybridization (aCGH) design containing 42 million probes, far more than previous arrays, was recently published. Most existing CNV detection algorithms do not work well on such data, both because of the noise associated with the large amount of input data and because most current methods were not designed to analyze normal human samples. Normal human genome analysis often requires a joint approach across multiple samples, yet most existing methods can only identify CNVs from a single sample. METHODOLOGY AND PRINCIPAL FINDINGS We developed a multi-sample-based genomic variations detector (MGVD) that uses segmentation to identify common breakpoints across multiple samples and a k-means-based clustering strategy. Unlike previous methods, MGVD simultaneously considers multiple samples with different genomic intensities and identifies CNVs and CNV zones (CNVZs); a CNVZ is a more precise measure of the location of a genomic variant than the CNV region (CNVR). CONCLUSIONS AND SIGNIFICANCE We designed a specialized algorithm to detect common CNVs from extremely high-resolution multi-sample aCGH data. MGVD showed high sensitivity and a low false discovery rate on a simulated data set, and outperformed most current methods when real, high-resolution HapMap datasets were analyzed. MGVD also had the fastest runtime of the algorithms evaluated on actual high-resolution aCGH data. The CNVZs identified by MGVD can be used in association studies to reveal relationships between phenotypes and genomic aberrations. Our algorithm was developed in standard C++ using the STL and is available for Linux and MS Windows. It is freely available at: http://embio.yonsei.ac.kr/~Park/mgvd.php.
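The k-means step can be illustrated on segment means: one-dimensional Lloyd iterations group segments into, for example, loss, normal and gain clusters. This is a generic sketch, not MGVD's code; initialising the centroids at evenly spaced order statistics, the example values, and the name `kmeans_1d` are assumptions, and `k` must be at least 2.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on scalar values; returns (centroids, labels). Requires k >= 2."""
    s = sorted(values)
    # Deterministic start: centroids at evenly spaced order statistics.
    centroids = [s[round(i * (len(s) - 1) / (k - 1))] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every value.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: mean of each cluster (keep the old centroid if it empties).
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return centroids, labels


segment_means = [-1.0, -0.9, 0.0, 0.1, 0.05, 0.9, 1.1]
print(kmeans_1d(segment_means, 3))
```

Because the initial centroids are sorted and the data are one-dimensional, cluster 0 ends up as the lowest level and cluster k-1 as the highest, which makes mapping clusters to loss and gain straightforward.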
Affiliation(s)
- Chihyun Park
- Department of Computer Science, Yonsei University, Seoul, South Korea
- Jaegyoon Ahn
- Department of Computer Science, Yonsei University, Seoul, South Korea
- Youngmi Yoon
- Division of Information Engineering, Gachon University of Medicine and Science, Incheon, South Korea
- Sanghyun Park
- Department of Computer Science, Yonsei University, Seoul, South Korea
12
Dalmasso C, Broët P. Detection of chromosomal abnormalities using high resolution arrays in clinical cancer research. J Biomed Inform 2011; 44:936-42. [PMID: 21703362 DOI: 10.1016/j.jbi.2011.06.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/24/2010] [Revised: 05/11/2011] [Accepted: 06/06/2011] [Indexed: 01/15/2023]
Abstract
In clinical cancer research, high-throughput genomic technologies are increasingly used to identify copy number aberrations. However, the admixture of tumor and stromal cells and the inherent karyotypic heterogeneity of most solid tumor samples make this task highly challenging. Here, we propose a robust two-step strategy to detect copy number aberrations in such a context. A spatial mixture model is first fitted to the preprocessed data. A calling algorithm is then applied to classify the genomic segments into three biologically meaningful states (copy loss, copy gain and modal copy). The results of a simulation study show that the proposed procedure performs well on complex patterns of genomic aberrations. The usefulness of the procedure in clinical cancer research is then illustrated by the analysis of real lung adenocarcinoma samples.
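The calling step can be pictured as follows: once segments have been assigned to mixture components, the component covering the most probes is treated as the modal copy state, and components with lower or higher mean levels are called loss or gain. This majority rule is an assumption made for illustration, not the paper's calling algorithm; the spatial mixture fit itself is taken as given, and `call_states` is a hypothetical name.

```python
from collections import defaultdict


def call_states(segments):
    """Call each segment 'loss', 'modal' or 'gain'.

    segments: list of (component_id, component_mean, n_probes), where the
    component assignment comes from a previously fitted mixture model.
    """
    coverage = defaultdict(int)
    level = {}
    for comp, mean, n_probes in segments:
        coverage[comp] += n_probes
        level[comp] = mean
    modal = max(coverage, key=coverage.get)  # component covering the most probes
    calls = []
    for comp, mean, _ in segments:
        if comp == modal:
            calls.append("modal")
        elif mean < level[modal]:
            calls.append("loss")
        else:
            calls.append("gain")
    return calls


print(call_states([(0, -0.5, 10), (1, 0.1, 100), (2, 0.6, 20), (1, 0.1, 50)]))
```

Defining the reference level by coverage rather than by a fixed log-ratio of zero lets the modal state shift when tumour admixture or aneuploidy moves the bulk of the genome away from zero.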
Affiliation(s)
- Cyril Dalmasso
- Genome Institute of Singapore, 60 Biopolis Street, 02-01 Genome, Singapore.
13
Vandeweyer G, Reyniers E, Wuyts W, Rooms L, Kooy RF. CNV-WebStore: online CNV analysis, storage and interpretation. BMC Bioinformatics 2011; 12:4. [PMID: 21208430 PMCID: PMC3024943 DOI: 10.1186/1471-2105-12-4] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Received: 08/18/2010] [Accepted: 01/05/2011] [Indexed: 02/02/2023]
Abstract
Background Microarray technology allows the analysis of genomic aberrations at ever-increasing resolution, making functional interpretation of these vast amounts of data the main bottleneck in the routine implementation of high-resolution array platforms, and emphasising the need for a centralised, easy-to-use CNV data management and interpretation system. Results We present CNV-WebStore, an online platform to streamline the processing and downstream interpretation of microarray data in a clinical context, tailored towards but not limited to the Illumina BeadArray platform. Provided analysis tools include CNV analysis, parent-of-origin and uniparental disomy detection. Interpretation tools include data visualisation, gene prioritisation, automated PubMed searching, linking data to several genome browsers and annotation of CNVs based on several public databases. Finally, a module is provided for uniform reporting of results. Conclusion CNV-WebStore is able to present copy number data in an intuitive way to both lab technicians and clinicians, making it a useful tool in daily clinical practice.
Affiliation(s)
- Geert Vandeweyer
- Department of Medical Genetics, University Hospital Antwerp, Antwerp, Belgium
14
A Bayesian analysis for identifying DNA copy number variations using a compound Poisson process. EURASIP J Bioinform Syst Biol 2010; 2010:268513. [PMID: 20976296 DOI: 10.1155/2010/268513] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/03/2010] [Revised: 07/29/2010] [Accepted: 08/06/2010] [Indexed: 11/17/2022]
Abstract
To study chromosomal aberrations that may lead to cancer formation or genetic diseases, the array-based comparative genomic hybridization (aCGH) technique is often used to detect DNA copy number variants (CNVs). Various methods have been developed for extracting CNV information from aCGH data. However, most of these methods use only the log-intensity ratios in aCGH data without taking advantage of other information, such as the DNA probe (e.g., biomarker) positions and distances contained in the data. Motivated by these specific features of aCGH data, we developed a novel method that estimates a change point, or locus, of a CNV in aCGH data together with its associated biomarker position on the chromosome, using a compound Poisson process. We used a Bayesian approach to derive the posterior probability for the estimation of the CNV locus. To detect the loci of multiple CNVs in the data, we proposed a sliding-window process combined with the derived Bayesian posterior probability. To evaluate the performance of the method in estimating the CNV locus, we first performed simulation studies. Finally, we applied our approach to real data from aCGH experiments, demonstrating its applicability.
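A much simplified version of the idea: with a single change point, a uniform prior over its position and Gaussian noise of known standard deviation, the posterior over positions is taken as proportional to the likelihood with each segment's mean plugged in. This swaps in a Gaussian mean-shift model for the paper's compound Poisson formulation, uses plug-in means rather than a full marginal likelihood, and ignores probe distances; `changepoint_posterior` and the toy data are hypothetical.

```python
import math


def changepoint_posterior(y, sigma):
    """Approximate posterior over k, the change between y[k-1] and y[k], k = 1..n-1.

    Uniform prior on k; Gaussian noise with known sigma; segment means are
    plugged in (a profile-likelihood approximation, not a full marginal).
    """
    n = len(y)
    log_lik = []
    for k in range(1, n):
        sse = 0.0
        for seg in (y[:k], y[k:]):
            m = sum(seg) / len(seg)
            sse += sum((v - m) ** 2 for v in seg)
        log_lik.append(-sse / (2 * sigma ** 2))
    top = max(log_lik)  # subtract the maximum before exponentiating
    weights = [math.exp(v - top) for v in log_lik]
    total = sum(weights)
    return {k: w / total for k, w in zip(range(1, n), weights)}


post = changepoint_posterior([0.0] * 5 + [3.0] * 5, sigma=1.0)
print(max(post, key=post.get), round(post[5], 3))
```

Applying the same calculation inside a sliding window, as the abstract describes, would allow several loci to be scanned along a chromosome.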
15
Rueda OM, Diaz-Uriarte R. RJaCGH: Bayesian analysis of aCGH arrays for detecting copy number changes and recurrent regions. Bioinformatics 2009; 25:1959-60. [PMID: 19420051 PMCID: PMC2712338 DOI: 10.1093/bioinformatics/btp307] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 03/04/2009] [Revised: 04/20/2009] [Accepted: 04/30/2009] [Indexed: 11/29/2022]
Abstract
SUMMARY Several methods have been proposed to detect copy number changes and recurrent regions of copy number variation from aCGH, but few methods return probabilities of alteration explicitly, which are the direct answer to the question 'is this probe/region altered?' RJaCGH fits a Non-Homogeneous Hidden Markov Model to the aCGH data using Reversible Jump Markov Chain Monte Carlo, and returns the probability that each probe is gained or lost. Using these probabilities, recurrent regions (over sets of individuals) of copy number alteration can be found. AVAILABILITY RJaCGH is available as an R package from CRAN repositories (e.g., http://cran.r-project.org/web/packages).
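A probe-level posterior of the kind described can be sketched with the forward-backward algorithm for a homogeneous three-state Gaussian HMM (loss, normal, gain) whose parameters are fixed in advance. RJaCGH itself uses a non-homogeneous HMM and reversible-jump MCMC over models, neither of which is attempted here; the state means, noise level, transition probability, and `posterior_states` are assumptions.

```python
import math


def posterior_states(y, means, sd, stay):
    """P(state at probe t | all data) for a fixed Gaussian HMM.

    means: state means (e.g. loss, normal, gain); sd: shared noise sd;
    stay: probability of remaining in the same state between probes (>= 2 states).
    """
    k = len(means)
    switch = (1.0 - stay) / (k - 1)
    trans = [[stay if i == j else switch for j in range(k)] for i in range(k)]

    def emit(v, s):
        z = (v - means[s]) / sd
        return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

    n = len(y)
    # Forward pass, rescaled at each probe to avoid underflow.
    alpha, scale = [], []
    for t in range(n):
        if t == 0:
            a = [emit(y[0], s) / k for s in range(k)]  # uniform initial state
        else:
            a = [sum(alpha[-1][i] * trans[i][s] for i in range(k)) * emit(y[t], s)
                 for s in range(k)]
        c = sum(a)
        alpha.append([x / c for x in a])
        scale.append(c)
    # Backward pass, reusing the forward scaling factors.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [sum(trans[s][j] * emit(y[t + 1], j) * beta[t + 1][j] for j in range(k))
             for s in range(k)]
        beta[t] = [x / scale[t + 1] for x in b]
    # Combine forward and backward terms and normalise per probe.
    post = []
    for t in range(n):
        g = [alpha[t][s] * beta[t][s] for s in range(k)]
        total = sum(g)
        post.append([x / total for x in g])
    return post


probs = posterior_states([0, 0, 0, 1, 1, 1, 0, 0], [-1.0, 0.0, 1.0], 0.3, 0.9)
print([round(p[2], 3) for p in probs])  # probability of "gain" at each probe
```

Summing these per-probe gain or loss probabilities across individuals is one simple way to look for recurrent regions, in the spirit of the abstract.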
Affiliation(s)
- Oscar M Rueda
- Structural Biology and Biocomputing Programme, Spanish National Cancer Center (CNIO), Madrid 28029, Spain.
16
Sun W, Wright FA, Tang Z, Nordgard SH, Van Loo P, Yu T, Kristensen VN, Perou CM. Integrated study of copy number states and genotype calls using high-density SNP arrays. Nucleic Acids Res 2009; 37:5365-77. [PMID: 19581427 PMCID: PMC2935461 DOI: 10.1093/nar/gkp493] [Citation(s) in RCA: 92] [Impact Index Per Article: 6.1] [Indexed: 12/15/2022]
Abstract
We propose a statistical framework, named genoCN, to simultaneously dissect copy number states and genotypes using high-density SNP (single nucleotide polymorphism) arrays. There are at least two types of genomic DNA copy number differences: copy number variations (CNVs) and copy number aberrations (CNAs). While CNVs are naturally occurring and inheritable, CNAs are acquired somatic alterations most often observed in tumor tissues only. CNVs tend to be shorter and more sparsely located in the genome than CNAs. GenoCN consists of two components, genoCNV and genoCNA, designed for CNV and CNA studies, respectively. In contrast to most existing methods, genoCN is more flexible in that the model parameters are estimated from the data instead of being decided a priori. GenoCNA also incorporates two important strategies for CNA studies. First, the effects of tissue contamination are explicitly modeled. Second, if SNP arrays are performed for both tumor and normal tissues of one individual, the genotype calls from normal tissue are used to study CNAs in tumor tissue. We evaluated genoCN by applications to 162 HapMap individuals and a brain tumor (glioblastoma) dataset and showed that our method can successfully identify both types of copy number differences and produce high-quality genotype calls.
Affiliation(s)
- Wei Sun
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA.
17
Bicciato S, Spinelli R, Zampieri M, Mangano E, Ferrari F, Beltrame L, Cifola I, Peano C, Solari A, Battaglia C. A computational procedure to identify significant overlap of differentially expressed and genomic imbalanced regions in cancer datasets. Nucleic Acids Res 2009; 37:5057-70. [PMID: 19542187 PMCID: PMC2731905 DOI: 10.1093/nar/gkp520] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 12/12/2022]
Abstract
The integration of high-throughput genomic data represents an opportunity for deciphering the interplay between the structural and functional organization of genomes and for discovering novel biomarkers. However, developing integrative approaches that complement gene expression (GE) data with other types of gene information, such as copy number (CN) and chromosomal localization, still represents a computational challenge in the genomic arena. This work presents a computational procedure that directly integrates CN and GE profiles at the genome-wide level. When applied to DNA/RNA paired data, this approach leads to the identification of Significant Overlaps of Differentially Expressed and Genomic Imbalanced Regions (SODEGIR). This goal is accomplished in three steps. The first step extends to CN a method for detecting regional imbalances in GE. The second step integrates CN and GE data and identifies chromosomal regions with concordantly altered genomic and transcriptional status in a tumor sample. The last step extends the single-sample analysis to an entire dataset of tumor specimens. When applied to study chromosomal aberrations in a collection of astrocytoma and renal carcinoma samples, the procedure proved effective in identifying discrete chromosomal regions of coordinated CN alterations and changes in transcriptional levels.
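One simple way to ask whether a region contains more genes that are both copy-number-altered and differentially expressed than chance would predict is a one-sided hypergeometric tail probability. This is a generic stand-in for illustration only; SODEGIR's own three-step procedure is different, and `overlap_pvalue` and its arguments are hypothetical.

```python
from math import comb


def overlap_pvalue(n_genes, n_de, n_region, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric.

    n_region genes are drawn without replacement from n_genes, of which
    n_de are differentially expressed; X counts the DE genes drawn.
    """
    total = comb(n_genes, n_region)
    upper = min(n_de, n_region)
    return sum(comb(n_de, i) * comb(n_genes - n_de, n_region - i)
               for i in range(n_overlap, upper + 1)) / total


# A 5-gene region in which every gene is among the 5 DE genes out of 20.
print(overlap_pvalue(20, 5, 5, 5))
```

Requires Python 3.8 or later for `math.comb`; with many regions tested, the resulting p-values would also need a multiple-testing correction.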
Affiliation(s)
- Silvio Bicciato
- Department of Biomedical Sciences, University of Modena and Reggio Emilia, Modena 41100, Italy.
18
Abstract
Copy number variation (CNV) contributes in phenotypically relevant ways to the genetic variability of many organisms. Cost-effective genomewide methods for identifying copy number variation are necessary to elucidate the contribution that these structural variants make to the genomes of model organisms. We have developed a novel approach for the identification of copy number variation by next-generation sequencing. As a proof of concept, our method has been applied to map the deletions of three Drosophila deficiency strains. We demonstrate that low sequence coverage is sufficient for identifying and mapping large deletions at kilobase resolution, suggesting that data generated from high-throughput sequencing experiments are sufficient for simultaneously analyzing many strains. Genomic DNA from two Drosophila deficiency stocks was barcoded and sequenced in multiplex, and the breakpoints associated with each deletion were successfully identified. The approach we describe is immediately applicable to the systematic exploration of copy number variation in model organisms and humans.
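The read-depth reasoning behind such approaches (fewer reads map to a deleted region) can be sketched with fixed-width bins: count reads per bin in the sample and in a reference, normalise by library size, and flag bins whose log2 ratio falls below a threshold. The bin size, pseudocount, threshold, read start positions, and function names are illustrative assumptions; the abstract does not describe the authors' exact procedure, including how barcoded, multiplexed reads were handled.

```python
import math


def depth_log_ratios(sample_reads, reference_reads, genome_length, bin_size, pseudo=0.5):
    """log2 of (sample share / reference share) of reads in each fixed-width bin.

    *_reads: 0-based read start positions; pseudo avoids division by zero.
    """
    n_bins = math.ceil(genome_length / bin_size)

    def counts(reads):
        c = [0] * n_bins
        for pos in reads:
            c[pos // bin_size] += 1
        return c

    s, r = counts(sample_reads), counts(reference_reads)
    s_tot, r_tot = len(sample_reads), len(reference_reads)  # library sizes
    return [math.log2(((a + pseudo) / s_tot) / ((b + pseudo) / r_tot))
            for a, b in zip(s, r)]


def candidate_deletions(log_ratios, threshold=-0.5):
    """Indices of bins whose depth ratio suggests a loss of copy number."""
    return [i for i, v in enumerate(log_ratios) if v < threshold]


# Toy data: five 1-kb bins; the sample has half the reads in bin 2.
reference = [b * 1000 + 50 * i for b in range(5) for i in range(10)]
sample = [b * 1000 + 50 * i for b in range(5) for i in range(5 if b == 2 else 10)]
ratios = depth_log_ratios(sample, reference, 5000, 1000)
print(candidate_deletions(ratios))
```

Merging runs of adjacent flagged bins would give approximate deletion boundaries at the resolution of the chosen bin size.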
19
Cancer gene discovery in mouse and man. Biochim Biophys Acta Rev Cancer 2009; 1796:140-61. [PMID: 19285540 PMCID: PMC2756404 DOI: 10.1016/j.bbcan.2009.03.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 01/02/2009] [Revised: 03/03/2009] [Accepted: 03/05/2009] [Indexed: 12/31/2022]
Abstract
The elucidation of the human and mouse genome sequences, together with developments in high-throughput genome analysis and in computational tools, has made it possible to profile entire cancer genomes. In parallel with these advances, mouse models of cancer have evolved into a powerful tool for cancer gene discovery. Here we discuss the approaches that may be used for cancer gene identification in both human and mouse, and how a cross-species 'oncogenomics' approach to cancer gene discovery represents a powerful strategy for finding genes that drive tumourigenesis.
20
A response to Yu et al. "A forward-backward fragment assembling algorithm for the identification of genomic amplification and deletion breakpoints using high-density single nucleotide polymorphism (SNP) array", BMC Bioinformatics 2007, 8:145. BMC Bioinformatics 2007; 8:394. [PMID: 17939873 PMCID: PMC2222656 DOI: 10.1186/1471-2105-8-394] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 06/23/2007] [Accepted: 10/16/2007] [Indexed: 12/16/2022]
Abstract
Background Yu et al. (BMC Bioinformatics 2007, 8:145) have recently compared the performance of several methods for the detection of genomic amplification and deletion breakpoints using data from high-density single nucleotide polymorphism arrays. One of the methods compared is our non-homogeneous Hidden Markov Model approach. Our approach uses Markov Chain Monte Carlo for inference, but Yu et al. ran the sampler for far too few iterations for a Markov Chain Monte Carlo-based method. Moreover, they did not use the appropriate reference level for the non-altered state. Methods We reran the analysis of Yu et al. using appropriate settings for both the number of Markov Chain Monte Carlo iterations and the reference level. Additionally, to show how easily answers to additional specific questions can be obtained, we added a new analysis targeted specifically at the detection of breakpoints. Results The reanalysis shows that the performance of our method is comparable to that of the other methods analyzed. In addition, we can provide probabilities of a given spot being a breakpoint, something unique among the methods examined. Conclusion Markov Chain Monte Carlo methods require a sufficient number of iterations before they can be assumed to yield samples from the distribution of interest. Results from running our method with too few iterations cannot be representative of its performance. Moreover, our analysis shows how our original approach can be easily adapted to answer specific additional questions (e.g., identify edges).
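The general point about run length can be seen with a toy random-walk Metropolis sampler for a normal target started far from its mode: a very short run mostly reflects the starting value, while a long run with burn-in discarded recovers the target mean. This illustrates MCMC behaviour in general and is not the sampler used by the authors' method; the target, step size, seed, run lengths, and `metropolis_normal` are arbitrary choices.

```python
import math
import random


def metropolis_normal(n_iter, start, target_mean=3.0, step=0.5, seed=1):
    """Random-walk Metropolis draws from N(target_mean, 1)."""
    rng = random.Random(seed)

    def log_p(x):
        return -0.5 * (x - target_mean) ** 2

    x, draws = start, []
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, log_p(proposal) - log_p(x))):
            x = proposal
        draws.append(x)
    return draws


short = metropolis_normal(50, start=-20.0)
long_run = metropolis_normal(20000, start=-20.0)[2000:]  # discard burn-in
print(sum(short) / len(short), sum(long_run) / len(long_run))
```

The short run's average stays far below the target mean of 3 because the chain has not yet left the region near its starting point, which is the kind of artefact an insufficient number of iterations produces.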
21
Díaz-Uriarte R, Rueda OM. ADaCGH: A parallelized web-based application and R package for the analysis of aCGH data. PLoS One 2007; 2:e737. [PMID: 17710137 PMCID: PMC1940324 DOI: 10.1371/journal.pone.0000737] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 05/07/2007] [Accepted: 07/09/2007] [Indexed: 11/19/2022]
Abstract
BACKGROUND Copy number alterations (CNAs) in genomic DNA have been associated with complex human diseases, including cancer. One of the most common techniques to detect CNAs is array-based comparative genomic hybridization (aCGH). The availability of aCGH platforms and the need to identify CNAs have resulted in a wealth of methodological studies. METHODOLOGY/PRINCIPAL FINDINGS ADaCGH is an R package and a web-based application for the analysis of aCGH data. It implements eight methods for the detection of CNAs (gains and losses of genomic DNA), including all of the best performing ones from two recent reviews (CBS, GLAD, CGHseg, HMM). For improved speed, we use parallel computing (via MPI). Additional information (GO terms, PubMed citations, KEGG and Reactome pathways) is available for individual genes and for sets of genes with altered copy numbers. CONCLUSIONS/SIGNIFICANCE ADaCGH represents a qualitative increase in the standards of these types of applications: a) all of the best performing algorithms are included, not just one or two; b) we do not limit ourselves to providing a thin layer of CGI on top of existing BioConductor packages, but instead carefully use parallelization, examining different schemes, and are able to achieve significant decreases in user waiting time (factors up to 45x); c) we have added functionality not currently available in some methods, to adapt to recent recommendations (e.g., merging of segmentation results in wavelet-based and CGHseg algorithms); d) we incorporate redundancy, fault-tolerance and checkpointing, which are unique among web-based, parallelized applications; e) all of the code is available under open source licenses, allowing others to build upon, copy, and adapt it for other software projects.
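The merging of segmentation results mentioned in point (c) can be illustrated with a deliberately simple rule: adjacent segments whose mean levels differ by less than a tolerance are fused, and the new level is the probe-weighted mean. Published merging procedures rely on statistical criteria rather than a fixed tolerance, so this rule, the tolerance value, and `merge_segments` are assumptions for illustration only.

```python
def merge_segments(segments, tol):
    """Fuse adjacent segments whose levels differ by less than tol.

    segments: ordered list of (start, end, mean) with end exclusive.
    The merged level is the mean weighted by the number of probes.
    """
    merged = []
    for start, end, mean in segments:
        if merged and abs(merged[-1][2] - mean) < tol:
            p_start, p_end, p_mean = merged[-1]
            n_prev, n_cur = p_end - p_start, end - start
            pooled = (p_mean * n_prev + mean * n_cur) / (n_prev + n_cur)
            merged[-1] = (p_start, end, pooled)
        else:
            merged.append((start, end, mean))
    return merged


print(merge_segments([(0, 10, 0.0), (10, 20, 0.05), (20, 30, 1.0)], tol=0.1))
```

A fixed tolerance ignores segment length and noise level, which is one reason test-based merging is usually preferred in practice.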
Affiliation(s)
- Ramón Díaz-Uriarte
- Structural Biology and Biocomputing Programme, Spanish National Cancer Center, Madrid, Spain.