1
On the Hierarchical Bernoulli Mixture Model Using Bayesian Hamiltonian Monte Carlo. Symmetry (Basel) 2021. [DOI: 10.3390/sym13122404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
The model developed considers the uniqueness of a data-driven binary response (indicated by 0 and 1) identified as having a Bernoulli distribution with finite mixture components. In social science applications, Bernoulli data often have a hierarchical structure. This study introduces the Hierarchical Bernoulli mixture model (Hibermimo), a new analytical model that combines the Bernoulli mixture with hierarchically structured data. The proposed approach uses a Hamiltonian Monte Carlo algorithm with a No-U-Turn Sampler (HMC/NUTS). The study implemented compatible program syntax using HMC/NUTS to analyze the Bayesian Bernoulli mixture aggregate regression model (BBMARM) and Hibermimo. In the model estimation, Hibermimo yielded ~90% compliance with the modeling of each district and a small Widely Applicable Information Criterion (WAIC) value.
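As a rough illustration of the likelihood behind a finite Bernoulli mixture, the sketch below evaluates the log-likelihood of binary responses for fixed mixing weights and success probabilities. It is a minimal sketch only: the function name and example values are hypothetical, and it leaves out the hierarchical structure, the covariates, and the HMC/NUTS sampling that the abstract describes.

```python
import numpy as np

def bernoulli_mixture_loglik(y, weights, probs):
    """Log-likelihood of binary data under a finite Bernoulli mixture.

    weights: mixing proportions (assumed to sum to 1)
    probs:   success probability of each mixture component
    """
    y = np.asarray(y, dtype=float)[:, None]        # shape (n, 1)
    weights = np.asarray(weights, dtype=float)     # shape (K,)
    probs = np.asarray(probs, dtype=float)         # shape (K,)
    # Per-observation, per-component likelihood p^y * (1 - p)^(1 - y)
    comp = probs ** y * (1.0 - probs) ** (1.0 - y)  # shape (n, K)
    # Mix over components, then sum the log-likelihood over observations
    return float(np.sum(np.log(comp @ weights)))

# Hypothetical example: two equally weighted components
ll = bernoulli_mixture_loglik([1, 0], weights=[0.5, 0.5], probs=[0.2, 0.8])
```

In a Bayesian treatment, a function of this kind would supply the likelihood term of the log-posterior that a sampler explores.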
2
Synergistic Effects of Different Levels of Genomic Data for the Staging of Lung Adenocarcinoma: An Illustrative Study. Genes (Basel) 2021; 12:genes12121872. [PMID: 34946821 PMCID: PMC8700916 DOI: 10.3390/genes12121872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Revised: 11/18/2021] [Accepted: 11/24/2021] [Indexed: 11/17/2022]
Abstract
Lung adenocarcinoma (LUAD) is a common and very lethal cancer. Accurate staging is a prerequisite for its effective diagnosis and treatment. Therefore, improving the accuracy of the stage prediction of LUAD patients is of great clinical relevance. Previous works have mainly focused on single genomic data information or a small number of different omics data types concurrently for generating predictive models. A few of them have considered multi-omics data from genome to proteome. We used a publicly available dataset to illustrate the potential of multi-omics data for stage prediction in LUAD. In particular, we investigated the roles of the specific omics data types in the prediction process. We used a self-developed method, Omics-MKL, for stage prediction that combines an existing feature ranking technique Minimum Redundancy and Maximum Relevance (mRMR), which avoids redundancy among the selected features, and multiple kernel learning (MKL), applying different kernels for different omics data types. Each of the considered omics data types individually provided useful prediction results. Moreover, using multi-omics data delivered notably better results than using single-omics data. Gene expression and methylation information seem to play vital roles in the staging of LUAD. The Omics-MKL method retained 70 features after the selection process. Of these, 21 (30%) were methylation features and 34 (48.57%) were gene expression features. Moreover, 18 (25.71%) of the selected features are known to be related to LUAD, and 29 (41.43%) to lung cancer in general. Using multi-omics data from genome to proteome for predicting the stage of LUAD seems promising because each omics data type may improve the accuracy of the predictions. Here, methylation and gene expression data may play particularly important roles.
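The idea of applying a different kernel to each omics data type and then combining them can be sketched as follows. This is not the Omics-MKL implementation: the data are random placeholders, the kernel choices are assumptions, and the weights are fixed by hand, whereas multiple kernel learning would learn them from the data.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def combine_kernels(kernels, weights):
    """Convex combination of per-omics kernel matrices."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()   # normalize so the weights sum to 1
    return sum(w * K for w, K in zip(weights, kernels))

rng = np.random.default_rng(0)
expression = rng.normal(size=(6, 4))    # hypothetical: 6 samples x 4 expression features
methylation = rng.normal(size=(6, 3))   # hypothetical: 6 samples x 3 methylation features

# One kernel per omics type: RBF for expression, linear for methylation (assumed choices)
K = combine_kernels(
    [rbf_kernel(expression, gamma=0.5), methylation @ methylation.T],
    weights=[0.6, 0.4],
)
```

The combined matrix `K` is symmetric and positive semidefinite, so it could be passed to any kernel classifier that accepts a precomputed kernel.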
3
Wang R, Li S, Wen W, Zhang J. Multi-Omics Analysis of the Effects of Smoking on Human Tumors. Front Mol Biosci 2021; 8:704910. [PMID: 34796198 PMCID: PMC8592943 DOI: 10.3389/fmolb.2021.704910] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/04/2021] [Accepted: 09/15/2021] [Indexed: 12/13/2022]
Abstract
Comprehensive studies on cancer patients with different smoking histories, including non-smokers, former smokers, and current smokers, remain scarce. Therefore, we conducted a multi-omics analysis to explore the effect of smoking history on cancer patients. Patients with a smoking history were screened from The Cancer Genome Atlas database, and their multi-omics data and clinical information were downloaded. A total of 2,317 patients were included in this study; current smokers presented the worst prognosis, followed by former smokers, while non-smokers showed the best prognosis. More importantly, smoking history was an independent prognostic factor. Patients with different smoking histories exhibited different immune content, and former smokers had the highest levels of immune cells and tumor immune microenvironment. Smokers showed a higher incidence of genomic instability, some of which can be reversed following smoking cessation. We also noted that smoking reduced the sensitivity of patients to chemotherapeutic drugs, whereas smoking cessation can reverse the situation. A competing endogenous RNA network revealed that mir-193b-3p, mir-301b, mir-205-5p, mir-132-3p, mir-212-3p, mir-1271-5p, and mir-137 may contribute significantly to tobacco-mediated tumor formation. We identified 11 methylation driver genes (EIF5A2, GBP6, HGD, HS6ST1, ITGA5, NR2F2, PLS1, PPP1R18, PTHLH, SLC6A15, and YEATS2), and methylation modifications of some of these genes have not previously been reported to be associated with tumors. We constructed a 46-gene model that predicted overall survival with good predictive power. We next drew nomograms for each cancer type. Calibration plots and concordance indexes verified that the nomograms were highly accurate for the prognosis of patients. Meanwhile, we found that the 46-gene model has good applicability to overall survival as well as to disease-specific survival and progression-free intervals. The results of this research provide new and valuable insights for the diagnosis, treatment, and follow-up of cancer patients with different smoking histories.
Affiliation(s)
- Rui Wang
  - Department of Hepatobiliary Surgery, Affiliated Haikou Hospital of Xiangya Medical College, Central South University, Haikou, China
- Shanshan Li
  - Department of Nursing, Affiliated Haikou Hospital of Xiangya Medical College, Central South University, Haikou, China
- Wen Wen
  - Department of Hepatobiliary Surgery, Affiliated Haikou Hospital of Xiangya Medical College, Central South University, Haikou, China
- Jianquan Zhang
  - Department of Hepatobiliary Surgery, Affiliated Haikou Hospital of Xiangya Medical College, Central South University, Haikou, China
4
Li D, Wang X, Lu S, Wang P, Wang X, Yin W, Zhu W, Li S. Integrated analysis revealing genome-wide chromosomal copy number variation in supraglottic laryngeal squamous cell carcinoma. Oncol Lett 2020; 20:1201-1212. [PMID: 32724360 PMCID: PMC7377034 DOI: 10.3892/ol.2020.11653] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/24/2019] [Accepted: 04/27/2020] [Indexed: 01/22/2023]
Abstract
Laryngeal squamous cell carcinoma (LSCC) is a genetically complex tumor type and one of the leading causes of cancer-associated disability and mortality. Genetic instability, such as chromosomal instability, is associated with the tumorigenesis of LSCC. Copy number variations (CNVs) have been demonstrated to contribute to the genetic diversity of tumor pathogenesis. Comparative genomic hybridization (CGH) has emerged as a high-throughput genomic technology that facilitates the aggregation of high-resolution data of cancer-associated genomic imbalances. In the present study, a total of 38 primary supraglottic LSCC cases were analyzed by high-resolution array-based CGH (aCGH) to improve the understanding of the genetic alterations in LSCC. Additionally, integration with bioinformatic analysis of microarray expression profiling data from the Gene Expression Omnibus (GEO) database provided a fundamental method for the identification of putative target genes. Genomic CNVs were detected in all cases. The size of net genomic imbalances per case ranged between a loss of 682.3 Mb (~24% of the genome) and a gain of 1,958.6 Mb (~69% of the genome). Recurrent gains included 2pter-q22.1, 3q26.1-qter, 5pter-p12, 7p22.3p14.1, 8p12p11.22, 8q24.13q24.3, 11q13.2q13.4, 12pter-p12.2, 18pter-p11.31 and 20p13p12.1, whereas recurrent losses included 3pter-p21.32, 4q28.1-q35.2, 5q13.2-qter, 9pter-p21.3 and monosomy 13. Gains of 3q26.1-qter were associated with tumor stage, poor differentiation and smoking history. 
Additionally, through integration with bioinformatic analysis of data from the GEO database, putative target oncogenes, including sex-determining region Y-box 2, eukaryotic translation initiation factor 4 gamma 1, fragile X-related gene 1, disheveled segment polarity protein 3, defective in cullin neddylation 1 domain containing 1, insulin like growth factor 2 mRNA binding protein 2 and CCDC26 long non-coding RNA, and tumor suppressor genes, such as CUB and sushi multiple domains 1, cyclin dependent kinase inhibitor 2A, protocadherin 20, serine peptidase inhibitor Kazal type 5 and Nei like DNA glycosylase 3, were identified in supraglottic LSCC. Supraglottic LSCC is a genetically complex tumor type, and aCGH was demonstrated to be effective in determining molecular profiles at higher resolution. The present results enable the identification of putative target oncogenes and the mapping of tumor suppressor genes in supraglottic LSCC.
Affiliation(s)
- Dongjie Li
  - Department of Otorhinolaryngology, Head and Neck Surgery, The First Hospital of Jilin University, Changchun, Jilin 130021, P.R. China
- Xianfu Wang
  - Department of Pediatrics, Genetics Laboratory, University of Oklahoma Health Sciences Center, Oklahoma, OK 73104, USA
- Shunfei Lu
  - Department of Clinical Medicine, Lishui College of Medicine, Lishui, Zhejiang 323000, P.R. China
- Ping Wang
  - Department of Otorhinolaryngology, Head and Neck Surgery, The First Hospital of Jilin University, Changchun, Jilin 130021, P.R. China
- Xin Wang
  - Department of Otorhinolaryngology, Head and Neck Surgery, The First Hospital of Jilin University, Changchun, Jilin 130021, P.R. China
- Wanzhong Yin
  - Department of Otorhinolaryngology, Head and Neck Surgery, The First Hospital of Jilin University, Changchun, Jilin 130021, P.R. China
- Wei Zhu
  - Department of Otorhinolaryngology, Head and Neck Surgery, The First Hospital of Jilin University, Changchun, Jilin 130021, P.R. China
- Shibo Li
  - Department of Pediatrics, Genetics Laboratory, University of Oklahoma Health Sciences Center, Oklahoma, OK 73104, USA
5
Hu X, Moon JW, Li S, Xu W, Wang X, Liu Y, Lee JY. Amplification and overexpression of CTTN and CCND1 at chromosome 11q13 in Esophagus squamous cell carcinoma (ESCC) of North Eastern Chinese Population. Int J Med Sci 2016; 13:868-874. [PMID: 27877079 PMCID: PMC5118758 DOI: 10.7150/ijms.16845] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 07/14/2016] [Accepted: 09/01/2016] [Indexed: 12/13/2022]
Abstract
Esophageal squamous cell carcinoma (ESCC) is a genetically complex tumor type and a major cause of cancer-related mortality. The combination of genetics, diet, behavior, and environment plays an important role in the carcinogenesis of ESCC. To characterize the genomic aberrations of this disease, we investigated the genomic imbalances in 19 primary ESCC cases using high-resolution array comparative genomic hybridization (CGH). All cases showed either loss or gain of whole chromosomes or segments of chromosome(s) with variable genomic sizes. The copy number alterations per case affected a median of 34% (~1,034 Mb/3,000 Mb) of the whole genome. Recurrent gains were 1q21.3-qter, 3q13.11-qter, 5pter-p11, 7pter-p15.3, 7p12.1-p11.2, 7q11-q11.2, 8p12-qter, 11q13.2-q13.3, 12pter-p13.31, 17q24.2, 20q11.21-qter, and 22q11.21-q11.22, whereas the recurrent losses were 3pter-p11.1, 4pter-p12, 4q28.3-q31.22, 4q31.3-q32.1, 9pter-p12, 11q22.3-qter and 13q12.11-q22.1. Amplification of 11q13 resulting in overexpression of CTTN/CCND1 was the most prominent finding, observed in 13 of 19 ESCC cases. These unique profiles of copy number alteration should be validated by further studies and need to be taken into consideration when developing biomarkers for early detection of ESCC.
Affiliation(s)
- Xiaoxia Hu
  - Department of Pediatrics, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104 USA; Department of Clinical Medicine, College of Medicine and Health, Lishui University, Zhejiang, 323000, P.R. China
- Ji Wook Moon
  - Department of Pathology, Korea University College of Medicine, Seoul, 02841, Republic of Korea
- Shibo Li
  - Department of Pediatrics, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104 USA
- Weihong Xu
  - Department of Pathology, Korea University College of Medicine, Seoul, 02841, Republic of Korea
- Xianfu Wang
  - Department of Pathology, Korea University College of Medicine, Seoul, 02841, Republic of Korea
- Yuanyuan Liu
  - Department of Internal Medicine, the First Hospital of Jilin University, Jilin, 130021, P.R. China
- Ji-Yun Lee
  - Department of Pathology, Korea University College of Medicine, Seoul, 02841, Republic of Korea
6
Dissecting the clonal origins of childhood acute lymphoblastic leukemia by single-cell genomics. Proc Natl Acad Sci U S A 2014; 111:17947-52. [PMID: 25425670 DOI: 10.1073/pnas.1420822111] [Citation(s) in RCA: 207] [Impact Index Per Article: 20.7] [Indexed: 12/22/2022]
Abstract
Many cancers have substantial genomic heterogeneity within a given tumor, and to fully understand that diversity requires the ability to perform single cell analysis. We performed targeted sequencing of a panel of single nucleotide variants (SNVs), deletions, and IgH sequences in 1,479 single tumor cells from six acute lymphoblastic leukemia (ALL) patients. By accurately segregating groups of cooccurring mutations into distinct clonal populations, we identified codominant clones in the majority of patients. Evaluation of intraclonal mutation patterns identified clone-specific punctuated cytosine mutagenesis events, showed that most structural variants are acquired before SNVs, determined that KRAS mutations occur late in disease development but are not sufficient for clonal dominance, and identified clones within the same patient that are arrested at varied stages in B-cell development. Taken together, these data order the sequence of genetic events that underlie childhood ALL and provide a framework for understanding the development of the disease at single-cell resolution.
7
Kim D, Shin H, Sohn KA, Verma A, Ritchie MD, Kim JH. Incorporating inter-relationships between different levels of genomic data into cancer clinical outcome prediction. Methods 2014; 67:344-53. [PMID: 24561168 DOI: 10.1016/j.ymeth.2014.02.003] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 11/24/2013] [Revised: 01/25/2014] [Accepted: 02/07/2014] [Indexed: 01/06/2023]
Abstract
In order to improve our understanding of cancer and develop multi-layered theoretical models for the underlying mechanism, it is essential to have an enhanced understanding of the interactions between multiple levels of genomic data that contribute to tumor formation and progression. Although recent approaches exist, such as a graph-based framework that integrates multi-omics data including copy number alteration, methylation, gene expression, and miRNA data for cancer clinical outcome prediction, most previous methods treat each genomic data type as independent, and the possible interplay between them is not explicitly incorporated into the model. However, cancer is dysregulated at multiple levels of the biological system, through the genomic, epigenomic, transcriptomic, and proteomic levels. Thus, genomic features are likely to interact with other genomic features at different genomic levels. In order to deepen our knowledge, it would be desirable to incorporate such inter-relationship information when integrating multi-omics data for cancer clinical outcome prediction. In this study, we propose a new graph-based framework that integrates not only multi-omics data but also the inter-relationships between them to better elucidate cancer clinical outcomes. In order to highlight the validity of the proposed framework, serous cystadenocarcinoma data from TCGA was adopted as a pilot task. The proposed model incorporating inter-relationships between different genomic features showed significantly improved performance compared to the model that does not consider inter-relationships when integrating multi-omics data. For the pair of miRNA and gene expression data, for example, the model integrating miRNA, gene expression, and the inter-relationship between them, with an AUC of 0.8476 (REI), outperformed the model combining miRNA and gene expression data, with an AUC of 0.8404. Similar results were also obtained for other pairs of different levels of genomic data. 
Integration of different levels of data and inter-relationship between them can aid in extracting new biological knowledge by drawing an integrative conclusion from many pieces of information collected from diverse types of genomic data, eventually leading to more effective screening strategies and alternative therapies that may improve outcomes.
Affiliation(s)
- Dokyoon Kim
  - Seoul National University Biomedical Informatics (SNUBI), Div. of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Republic of Korea; Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA.
- Hyunjung Shin
  - Department of Industrial & Information Systems Engineering, Ajou University, San 5, Wonchun-dong, Yeoungtong-gu, 443-749 Suwon, Republic of Korea.
- Kyung-Ah Sohn
  - Department of Information and Computer Engineering, Ajou University, Suwon 443-749, Republic of Korea.
- Anurag Verma
  - Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA.
- Marylyn D Ritchie
  - Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA.
- Ju Han Kim
  - Seoul National University Biomedical Informatics (SNUBI), Div. of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Republic of Korea; Systems Biomedical Informatics Research Center, Seoul National University, Seoul 110799, Republic of Korea.
8
Kim D, Shin H, Joung JG, Lee SY, Kim JH. Intra-relation reconstruction from inter-relation: miRNA to gene expression. BMC Systems Biology 2013; 7 Suppl 3:S8. [PMID: 24521265 PMCID: PMC3852212 DOI: 10.1186/1752-0509-7-s3-s8] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 01/22/2023]
Abstract
BACKGROUND: In computational biology, novel knowledge has mostly been obtained by identifying 'intra-relations', the relations between entities on a specific biological level such as gene expression or microRNA (miRNA), and many such studies have been successful. However, intra-relations do not fully explain complex cancer mechanisms because the inter-relation information between different levels of genomic data is missing, e.g. between miRNA and its target genes. The 'inter-relation' between different levels of genomic data can be constructed from biological experimental data as well as genomic knowledge.
METHODS: Previously, we proposed a graph-based framework that integrates multiple layers of genomic data (copy number alteration, DNA methylation, gene expression, and miRNA expression) for cancer clinical outcome prediction. However, a limitation of that work was that it integrated the layers without considering inter-relationship information between genomic features. In this paper, we propose a new integrative framework that combines a genomic dataset from gene expression with genomic knowledge from the inter-relation between miRNA and gene expression for clinical outcome prediction, as a pilot study.
RESULTS: In order to demonstrate the validity of the proposed method, the prediction of short-term/long-term survival for 82 patients with glioblastoma multiforme (GBM) was adopted as a base task. Based on our results, the accuracy of our predictive model increases when it incorporates information fused from the gene expression dataset and the genomic knowledge of the inter-relation between miRNA and gene expression.
CONCLUSIONS: In the present study, the intra-relation of gene expression was reconstructed from the inter-relation between miRNA and gene expression for the prediction of short-term/long-term survival of GBM patients. Our finding suggests that using external knowledge representing miRNA-mediated regulation of gene expression is substantially useful for elucidating the cancer phenotype.
Affiliation(s)
- Dokyoon Kim
  - Seoul National University Biomedical Informatics (SNUBI), Div. of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Korea
  - Systems Biomedical Informatics Research Center, Seoul National University, Seoul 110799, Korea
  - Center for Systems Genomics, Pennsylvania State University, University Park, Pennsylvania, USA
- Hyunjung Shin
  - Department of Industrial Engineering, Ajou University, San 5, Wonchun-dong, Yeoungtong-gu, 443-749, Suwon, Korea
- Je-Gun Joung
  - Seoul National University Biomedical Informatics (SNUBI), Div. of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Korea
  - Systems Biomedical Informatics Research Center, Seoul National University, Seoul 110799, Korea
  - Translational Bioinformatics Lab (TBL), Samsung Genome Institute (SGI), Samsung Medical Center, Seoul, Korea
- Su-Yeon Lee
  - Seoul National University Biomedical Informatics (SNUBI), Div. of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Korea
  - Systems Biomedical Informatics Research Center, Seoul National University, Seoul 110799, Korea
- Ju Han Kim
  - Seoul National University Biomedical Informatics (SNUBI), Div. of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Korea
  - Systems Biomedical Informatics Research Center, Seoul National University, Seoul 110799, Korea
9
Nord KH, Macchia G, Tayebwa J, Nilsson J, Vult von Steyern F, Brosjö O, Mandahl N, Mertens F. Integrative genome and transcriptome analyses reveal two distinct types of ring chromosome in soft tissue sarcomas. Hum Mol Genet 2013; 23:878-88. [PMID: 24070870 DOI: 10.1093/hmg/ddt479] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022]
Abstract
Gene amplification is a common phenomenon in malignant neoplasms of all types. One mechanism behind increased gene copy number is the formation of ring chromosomes. Such structures are mitotically unstable and during tumor progression they accumulate material from many different parts of the genome. Hence, their content varies considerably between and within tumors. Partly due to this extensive variation, the genetic content of many ring-containing tumors remains poorly characterized. Ring chromosomes are particularly prevalent in specific subtypes of sarcoma. Here, we have combined fluorescence in situ hybridization (FISH), global genomic copy number and gene expression data on ring-containing soft tissue sarcomas and show that they harbor two fundamentally different types of ring chromosome: MDM2-positive and MDM2-negative rings. While the former are often found in an otherwise normal chromosome complement, the latter seem to arise in the context of general chromosomal instability. In line with this, sarcomas with MDM2-negative rings commonly show complete loss of either CDKN2A or RB1, both known to be important for genome integrity. Sarcomas with MDM2-positive rings instead show co-amplification of a variety of potential driver oncogenes. More than 100 different genes were found to be involved, many of which are known to induce cell growth, promote proliferation or inhibit apoptosis. Several of the amplified and overexpressed genes constitute potential drug targets.
Affiliation(s)
- Karolin H Nord
  - Department of Clinical Genetics, University and Regional Laboratories, Skåne University Hospital, Lund University, 221 84 Lund, Sweden
10
Discovering gene-environment interactions in glioblastoma through a comprehensive data integration bioinformatics method. Neurotoxicology 2012; 35:1-14. [PMID: 23261424 DOI: 10.1016/j.neuro.2012.11.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 11/06/2012] [Accepted: 11/07/2012] [Indexed: 11/23/2022]
Abstract
Glioblastoma multiforme (GBM) is the most common and aggressive type of human brain tumor. Although considerable efforts to delineate the underlying pathophysiological pathways have been made during the last decades, only very limited progress on treatment has been achieved because the molecular pathways that drive the aggressive nature of GBM are largely unknown. Recent studies have emphasized the importance of environmental factors and the role of gene-environment interactions (GEI) in the development of GBM. Factors such as small sample sizes and study costs have limited the conduct of GEI studies in brain tumors, however. Additionally, advances in high-throughput microarrays have produced a wealth of information concerning the molecular biology of glioma. In particular, microarrays have been used to identify genetic and epigenetic changes between normal non-tumor tissue and glioma tissue. Due to the relative rarity of gliomas, microarray data for these tumors are often the product of small studies, and thus pooling these data becomes desirable. To address the challenge of small sample sizes and GEI study difficulties, we introduce a comprehensive bioinformatics method using genetic variations (copy number variations and small-scale variations) and environmental data integration that links with glioblastoma (GEG) to identify: (1) genes that interact with chemicals and have genetic variants linked to the development of GBM, (2) important pathways that may be influenced by environmental exposures (or endogenous chemicals), and (3) genes with variants in GBM that have been understudied in relation to GBM development. The first step in our GEG method identified genes responsive to environmental exposures using the Environmental Genome Project, Comparative Toxicology, and Seattle SNPs databases. These environmentally responsive genes were then compared to a curated list of genes containing copy number variation and/or mutations in GBM. 
This comparison produced a list of genes responsive to the environment and important to GBM that was then further analyzed using gene networking tools such as RSpider, Cytoscape, and DAVID. Using this GEG bioinformatics method we were able to identify 173 genes with the potential to be involved in GEI that may be important to the development of GBM. Sixty five of these environmentally responsive genes have not been reported as important to GBM development, despite several of them having substantial potential for response to chemicals and subsequent disease related actions. The main biological functions of these 173 genes include signaling by nerve growth factor, DNA repair, integrin cell surface interactions, biological oxidations, apoptosis, synaptic transmission, cell cycle checkpoints, and arachidonic acid metabolism. Importantly, some of these functions have been implicated in the development of several cancers, including glioma. In summary, our GEG bioinformatics approach revealed potential gene-environment interactions, and generated new data for hypothesis generation, in GBM.
11
Synergistic effect of different levels of genomic data for cancer clinical outcome prediction. J Biomed Inform 2012; 45:1191-8. [DOI: 10.1016/j.jbi.2012.07.008] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.5] [Received: 10/31/2011] [Revised: 06/20/2012] [Accepted: 07/19/2012] [Indexed: 11/23/2022]
12
Liu YY, Chen HY, Zhang ML, Tian D, Li S, Lee JY. Loss of fragile histidine triad and amplification of 1p36.22 and 11p15.5 in primary gastric adenocarcinomas. World J Gastroenterol 2012; 18:4522-32. [PMID: 22969225 PMCID: PMC3435777 DOI: 10.3748/wjg.v18.i33.4522] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/19/2011] [Revised: 02/01/2012] [Accepted: 04/13/2012] [Indexed: 02/06/2023]
Abstract
AIM: To investigate the genomic copy number alterations that may harbor key driver genes in gastric tumorigenesis.
METHODS: Using high-resolution array comparative genomic hybridization (CGH), we investigated the genomic alterations of 20 advanced primary gastric adenocarcinomas (seventeen tubular and three mucinous) of Chinese patients from the Jilin province. Ten matching adjacent normal regions from the same patients were also studied.
RESULTS: The most frequent imbalances detected in these cancer samples were gains of 3q26.31-q27.2, 5p, 8q, 11p, 18p, 19q and 20q and losses of 3p, 4p, 18q and 21q. The use of high-resolution array CGH increased the resolution and sensitivity of the observed genomic changes and identified focal genetic imbalances, which included 54 gains and 16 losses that were smaller than 1 Mb in size. The most interesting focal imbalances were the intergenic loss/homozygous deletion of the fragile histidine triad gene and the amplicons 11q13, 18q11.2 and 19q12, as well as the novel amplicons 1p36.22 and 11p15.5.
CONCLUSION: These regions, especially the focal amplicons, may harbor key driver genes that will serve as biomarkers for either the diagnosis or the prognosis of gastric cancer, and therefore, a large-scale investigation is recommended.
13
Zhang K, Yang Y, Devanarayan V, Xie L, Deng Y, Donald S. A hidden Markov model-based algorithm for identifying tumour subtype using array CGH data. BMC Genomics 2011; 12 Suppl 5:S10. [PMID: 22369459 PMCID: PMC3287492 DOI: 10.1186/1471-2164-12-s5-s10] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]
Abstract
Background: The recent advancement in array CGH (aCGH) research has significantly improved tumor identification using DNA copy number data. A number of unsupervised learning methods have been proposed for clustering aCGH samples. Two of the major challenges for developing aCGH sample clustering are the high spatial correlation between aCGH markers and the low computing efficiency. A mixture hidden Markov model based algorithm was developed to address these two challenges.
Results: The hidden Markov model (HMM) was used to model the spatial correlation between aCGH markers. A fast clustering algorithm was implemented and real data analysis on glioma aCGH data has shown that it converges to the optimal cluster rapidly and the computation time is proportional to the sample size. Simulation results showed that this HMM based clustering (HMMC) method has a substantially lower error rate than NMF clustering. The HMMC results for glioma data were significantly associated with clinical outcomes.
Conclusions: We have developed a fast clustering algorithm to identify tumor subtypes based on DNA copy number aberrations. The performance of the proposed HMMC method has been evaluated using both simulated and real aCGH data. The software for HMMC in both R and C++ is available on the ND INBRE website http://ndinbre.org/programs/bioinformatics.php.
Affiliation(s)
- Ke Zhang
- Department of Pathology, School of Medicine and Health Sciences, University of North Dakota, Grand Forks, ND 58201, USA.
|
14
|
Liu Y, Lee YF, Ng MK. SNP and gene networks construction and analysis from classification of copy number variations data. BMC Bioinformatics 2011; 12 Suppl 5:S4. [PMID: 21989070 PMCID: PMC3226254 DOI: 10.1186/1471-2105-12-s5-s4] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND: Detection of genomic DNA copy number variations (CNVs) can provide a complete and more comprehensive view of human disease. It is interesting to identify and represent relevant CNVs from genome-wide data due to high data volume and the complexity of interactions.
RESULTS: In this paper, we incorporate the DNA copy number variation data derived from SNP arrays into a computational shrunken model and formalize the detection of copy number variations as a case-control classification problem. More than 80% accuracy can be obtained using our classification model, and by shrinkage, the number of CNVs relevant to disease can be determined. In order to understand relevant CNVs, we study their corresponding SNPs in the genome, and the statistical software PLINK is employed to compute the pair-wise SNP-SNP interactions and identify SNP networks based on their P-values. Our selected SNP networks are statistically significant compared with random SNP networks and play a role in the biological process. For the unique genes that those SNPs are located in, a gene-gene similarity value is computed using GOSemSim, and gene pairs that have similarity values greater than a threshold are selected to construct gene networks. A gene enrichment analysis shows that our gene networks are functionally important. Experimental results demonstrate that our selected SNP and gene networks based on the selected CNVs contain some functional relationships directly or indirectly to disease study.
CONCLUSIONS: Two datasets are given to demonstrate the effectiveness of the introduced method. Some statistical and biological analyses show that this shrunken classification model is effective in identifying CNVs from genome-wide data and our proposed framework has the potential to become a useful analysis tool for SNP data sets.
Affiliation(s)
- Yang Liu
- Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong
|
15
|
Lu X, Zhang K, Van Sant C, Coon J, Semizarov D. An algorithm for classifying tumors based on genomic aberrations and selecting representative tumor models. BMC Med Genomics 2010; 3:23. [PMID: 20569491 PMCID: PMC2901344 DOI: 10.1186/1755-8794-3-23] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 01/27/2010] [Accepted: 06/22/2010] [Indexed: 01/08/2023] Open
Abstract
Background: Cancer is a heterogeneous disease caused by genomic aberrations and characterized by significant variability in clinical outcomes and response to therapies. Several subtypes of common cancers have been identified based on alterations of individual cancer genes, such as HER2, EGFR, and others. However, cancer is a complex disease driven by the interaction of multiple genes, so the copy number status of individual genes is not sufficient to define cancer subtypes and predict responses to treatments. A classification based on genome-wide copy number patterns would be better suited for this purpose.
Method: To develop a more comprehensive cancer taxonomy based on genome-wide patterns of copy number abnormalities, we designed an unsupervised classification algorithm that identifies genomic subgroups of tumors. This algorithm is based on a modified genomic Non-negative Matrix Factorization (gNMF) algorithm and includes several additional components, namely a pilot hierarchical clustering procedure to determine the number of clusters, a multiple random initiation scheme, a new stop criterion for the core gNMF, as well as a 10-fold cross-validation stability test for quality assessment.
Result: We applied our algorithm to identify genomic subgroups of three major cancer types: non-small cell lung carcinoma (NSCLC), colorectal cancer (CRC), and malignant melanoma. High-density SNP array datasets for patient tumors and established cell lines were used to define genomic subclasses of the diseases and identify cell lines representative of each genomic subtype. The algorithm was compared with several traditional clustering methods and showed improved performance. To validate our genomic taxonomy of NSCLC, we correlated the genomic classification with disease outcomes. Overall survival time and time to recurrence were shown to differ significantly between the genomic subtypes.
Conclusions: We developed an algorithm for cancer classification based on genome-wide patterns of copy number aberrations and demonstrated its superiority to existing clustering methods. The algorithm was applied to define genomic subgroups of three cancer types and identify cell lines representative of these subgroups. Our data enabled the assembly of representative cell line panels for testing drug candidates.
Affiliation(s)
- Xin Lu
- Global Pharmaceutical Research and Development, Abbott Laboratories, 100 Abbott Park Road, Building AP-10, Dep. R4CD, Abbott Park, IL 60064, USA.
|
16
|
Tsui IFL, Poh CF, Garnis C, Rosin MP, Zhang L, Lam WL. Multiple pathways in the FGF signaling network are frequently deregulated by gene amplification in oral dysplasias. Int J Cancer 2009; 125:2219-28. [PMID: 19623652 DOI: 10.1002/ijc.24611] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Indexed: 11/10/2022]
Abstract
Genetic alteration in oral premalignant lesions (OPLs), the precursors of oral squamous cell carcinomas (OSCCs), may represent key changes in disease initiation and development. We ask if DNA amplification occurs at this early stage of cancer development and which oncogenic pathways are disrupted in OPLs. Here, we evaluated 50 high-grade dysplasias and low-grade dysplasias that later progressed to cancer for gene dosage aberrations using tiling-path DNA microarrays. Early occurrences of DNA amplification and homozygous deletion were frequently detected, with 40% (20/50) of these early lesions exhibiting such features. Expression of 88 genes in 7 recurrent amplicons was evaluated in 5 independent head and neck cancer datasets, with 40 candidates found to be overexpressed relative to normal tissues. These genes were significantly enriched in the canonical ERK/MAPK, FGF, p53, PTEN and PI3K/AKT signaling pathways (p = 8.95 × 10⁻³ to 3.18 × 10⁻²). These identified pathways share interactions in one signaling network, and amplification-mediated deregulation of this network was found in 30.0% of these preinvasive lesions. No such alterations were found in 14 low-grade dysplasias that did not progress, whereas 43.5% (10/23) of OSCCs were found to have altered genes within the pathways with DNA amplification. Multitarget FISH showed that amplification of EGFR and CCND1 can coexist in single cells of an oral dysplasia, suggesting the dependence on multiple oncogenes for OPL progression. Taken together, these findings identify a critical biological network that is frequently disrupted in high-risk OPLs, with different specific genes disrupted in different individuals.
Affiliation(s)
- Ivy F L Tsui
- Department of Cancer Genetics and Developmental Biology, British Columbia Cancer Research Centre, Vancouver, BC, Canada.
|
17
|
Alvegård T, Hall KS, Bauer H, Rydholm A. The Scandinavian Sarcoma Group: 30 years' experience. ACTA ORTHOPAEDICA. SUPPLEMENTUM 2009; 80:1-104. [PMID: 19919379 DOI: 10.1080/17453690610046602] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/13/2022]
|
18
|
Marttinen P, Myllykangas S, Corander J. Bayesian clustering and feature selection for cancer tissue samples. BMC Bioinformatics 2009; 10:90. [PMID: 19296858 PMCID: PMC2679022 DOI: 10.1186/1471-2105-10-90] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 09/23/2008] [Accepted: 03/18/2009] [Indexed: 01/14/2023] Open
Abstract
Background: The versatility of DNA copy number amplifications for profiling and categorization of various tissue samples has been widely acknowledged in the biomedical literature. For instance, this type of measurement technique provides possibilities for exploring sets of cancerous tissues to identify novel subtypes. The previously utilized statistical approaches to various kinds of analyses include traditional algorithmic techniques for clustering and dimension reduction, such as independent and principal component analyses, hierarchical clustering, as well as model-based clustering using maximum likelihood estimation for latent class models.
Results: While purely algorithmic methods are usually easily applicable, their suboptimal performance and limitations in making formal inference have been thoroughly discussed in the statistical literature. Here we introduce a Bayesian model-based approach to simultaneous identification of underlying tissue groups and the informative amplifications. The model-based approach provides the possibility of using formal inference to determine the number of groups from the data, in contrast to the ad hoc methods often exploited for similar purposes. The model also automatically recognizes the chromosomal areas that are relevant for the clustering.
Conclusion: Validatory analyses of simulated data and a large database of DNA copy number amplifications in human neoplasms are used to illustrate the potential of our approach. Our software implementation BASTA for performing Bayesian statistical tissue profiling is freely available for academic purposes at
Affiliation(s)
- Pekka Marttinen
- Department of Mathematics and Statistics, University of Helsinki, Finland.
|