1
Tiong KL, Sintupisut N, Lin MC, Cheng CH, Woolston A, Lin CH, Ho M, Lin YW, Padakanti S, Yeang CH. An integrated analysis of The Cancer Genome Atlas data discovers a hierarchical association structure across thirty-three cancer types. PLOS Digit Health 2022; 1:e0000151. [PMID: 36812605 PMCID: PMC9931374 DOI: 10.1371/journal.pdig.0000151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Accepted: 10/31/2022] [Indexed: 06/18/2023]
Abstract
Cancer cells harbor molecular alterations at all levels of information processing. Genomic/epigenomic and transcriptomic alterations are inter-related between genes, within and across cancer types, and may affect clinical phenotypes. Despite abundant prior studies integrating cancer multi-omics data, none of them organizes these associations in a hierarchical structure and validates the discoveries in extensive external data. We infer this Integrated Hierarchical Association Structure (IHAS) from the complete data of The Cancer Genome Atlas (TCGA) and compile a compendium of cancer multi-omics associations. Intriguingly, diverse alterations on genomes/epigenomes from multiple cancer types impact the transcription of 18 Gene Groups. Half of them are further reduced to three Meta Gene Groups enriched with (1) immune and inflammatory responses, (2) embryonic development and neurogenesis, and (3) cell cycle process and DNA repair. Over 80% of the clinical/molecular phenotypes reported in TCGA are aligned with the combinatorial expressions of Meta Gene Groups, Gene Groups, and other IHAS subunits. Furthermore, IHAS derived from TCGA is validated in more than 300 external datasets, including multi-omics measurements and cellular responses to drug treatments and gene perturbations in tumors, cancer cell lines, and normal tissues. In sum, IHAS stratifies patients in terms of the molecular signatures of its subunits, selects targeted genes or drugs for precision cancer therapy, and demonstrates that associations between survival times and transcriptional biomarkers may vary with cancer type. This rich information is critical for the diagnosis and treatment of cancers.
Affiliation(s)
- Khong-Loon Tiong
- Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- Nardnisa Sintupisut
- Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- Min-Chin Lin
- Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- Psomagen, Rockville, Maryland, United States of America
- Chih-Hung Cheng
- Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- Andrew Woolston
- Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- Translational Cancer Immunotherapy & Genomics Lab, Barts Cancer Institute, Charterhouse Square, London, United Kingdom
- Chih-Hsu Lin
- Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- C3.ai, Redwood City, California, United States of America
- Mirrian Ho
- Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- Yu-Wei Lin
- Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- AiLife Diagnostics, Pearland, Texas, United States of America
- Sridevi Padakanti
- Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- Chen-Hsiang Yeang
- Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
2
Abstract
Protein complexes are the fundamental units of many biological functions. Despite their many advantages, protein complexes have one major adverse effect: unassembled subunits can accumulate and may disrupt other processes or exert cytotoxic effects. Synthesis of excess subunits can be inhibited via negative feedback control, or excess subunits can be degraded more efficiently than assembled ones; the latter is termed cooperative stability. Whereas controlled synthesis of complex subunits has been investigated extensively, how cooperative stability acts in complex formation remains largely unexplored. To fill this knowledge gap, we have built quantitative models of heteromeric complexes with or without cooperative stability and compared their behaviours in the presence of synthesis rate variations. A system displaying cooperative stability is robust against synthesis rate variations, as it retains high dimer/monomer ratios across a broad range of parameter configurations. Moreover, cooperative stability can alleviate the constraint imposed by a limited supply of one subunit and make complex abundance more responsive to unilateral upregulation of another subunit. We also conducted an in silico experiment to comprehensively characterize and compare four types of circuits that incorporate combinations of negative feedback control and cooperative stability in terms of eight system characteristics pertaining to optimality, robustness and controllability. Intriguingly, although individual circuits prevailed for distinct characteristics, the system with cooperative stability alone achieved the most balanced performance across all characteristics. Our study provides a theoretical justification for the contribution of cooperative stability to natural biological systems and offers guidelines for designing synthetic complex formation systems with desirable characteristics.
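The abstract does not give the model equations. As an illustration only, the sketch below assumes a simple mass-action heterodimer model A + B <-> AB with constant synthesis rates, first-order degradation, and cooperative stability represented as faster degradation of free monomers than of the dimer. All function names, parameters and values are hypothetical choices, not the authors' model; the sketch compares how the steady-state dimer/monomer ratio shifts when the synthesis rate of one subunit is doubled.

```python
"""Hypothetical heterodimer sketch: A + B <-> AB (not the published model).

Assumptions: constant synthesis rates s_a and s_b, mass-action binding
(k_on) and unbinding (k_off), first-order degradation of free monomers
(deg_mono) and of the dimer (deg_dimer). Cooperative stability is modelled
here simply as deg_mono > deg_dimer.
"""


def steady_state(s_a, s_b, k_on=1.0, k_off=0.1,
                 deg_mono=0.1, deg_dimer=0.1, dt=0.01, steps=100_000):
    """Integrate the ODEs with forward Euler until (near) steady state.

    Returns the free monomer levels a, b and the dimer level ab.
    """
    a = b = ab = 0.0
    for _ in range(steps):
        bind = k_on * a * b - k_off * ab  # net association flux
        da = s_a - bind - deg_mono * a
        db = s_b - bind - deg_mono * b
        dab = bind - deg_dimer * ab
        a, b, ab = a + dt * da, b + dt * db, ab + dt * dab
    return a, b, ab


def dimer_monomer_ratio(s_a, s_b, **params):
    """Steady-state dimer level divided by the total free monomer level."""
    a, b, ab = steady_state(s_a, s_b, **params)
    return ab / (a + b)


if __name__ == "__main__":
    for label, deg_mono in [("no cooperative stability", 0.1),
                            ("cooperative stability", 1.0)]:
        balanced = dimer_monomer_ratio(1.0, 1.0, deg_mono=deg_mono)
        skewed = dimer_monomer_ratio(2.0, 1.0, deg_mono=deg_mono)
        print(f"{label}: ratio {balanced:.2f} -> {skewed:.2f} "
              f"when s_a is doubled")
```

With these arbitrary parameters, doubling s_a drops the ratio from about 3.29 to about 0.94 without cooperative stability, while with cooperative stability it moves only from about 1.83 to about 2.07. This mirrors the robustness claim qualitatively but says nothing about the parameter ranges explored in the paper.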
Affiliation(s)
- Kuan-Lun Hsu
- Institute of Molecular Biology, Academia Sinica, 128 Academia Road, Section 2, Taipei, Taiwan
- Hsueh-Chi S Yen
- Institute of Molecular Biology, Academia Sinica, 128 Academia Road, Section 2, Taipei, Taiwan
- Chen-Hsiang Yeang
- Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Taipei, Taiwan
3
Tiong KL, Lin YW, Yeang CH. Characterization of gene cluster heterogeneity in single-cell transcriptomic data within and across cancer types. Biol Open 2022; 11:275538. [PMID: 35665803 PMCID: PMC9235070 DOI: 10.1242/bio.059256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2022] [Accepted: 05/19/2022] [Indexed: 11/20/2022]
Abstract
Despite the remarkable progress in probing tumor transcriptomic heterogeneity with single-cell RNA sequencing (sc-RNAseq) data, several gaps exist in prior studies. Tumor heterogeneity is frequently mentioned but not quantified. Clustering analyses typically target cells rather than genes, and differential levels of transcriptomic heterogeneity among gene clusters are not characterized. Relations between gene clusters inferred from multiple datasets remain less explored. We provided a series of quantitative methods to analyze cancer sc-RNAseq data. First, we proposed two quantitative measures to assess intra-tumoral heterogeneity/homogeneity. Second, we established a hierarchy of gene clusters from sc-RNAseq data, devised an algorithm to reduce the gene cluster hierarchy to a compact structure, and characterized the gene clusters in terms of functional enrichment and heterogeneity. Third, we developed an algorithm to align the gene cluster hierarchies from multiple datasets to a small number of meta gene clusters. By applying these methods to nine cancer sc-RNAseq datasets, we discovered that cancer cell transcriptomes were more homogeneous within tumors than those of the accompanying normal cells. Furthermore, many gene clusters from the nine datasets were aligned to two large meta gene clusters, which had high and low heterogeneity, respectively, and were enriched with distinct functions. Finally, we found that the homogeneous meta gene cluster retained stronger expression coherence and stronger associations with survival times in bulk-level RNAseq data than the heterogeneous meta gene cluster, yet the combinatorial expression patterns of breast cancer subtypes in bulk-level data were not preserved in single-cell data. The inference outcomes derived from nine cancer sc-RNAseq datasets provide insights into the factors contributing to the transcriptomic heterogeneity of cancer cells and into the complex relations between bulk-level and single-cell RNAseq data. They demonstrate the utility of our methods for a comprehensive characterization of co-expressed gene clusters in a wide range of sc-RNAseq data in cancers and beyond.

Summary: We propose quantitative methods to analyze cancer sc-RNAseq data: measures of intra-tumoral heterogeneity, characterization of a hierarchy of gene clusters, and alignment of gene cluster hierarchies from multiple datasets.
Affiliation(s)
- Khong-Loon Tiong
- Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Taipei 115, Taiwan
- Yu-Wei Lin
- Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Taipei 115, Taiwan
- The University of Texas MD Anderson Cancer Center, School of Health Profession, Master Program of Diagnostic Genetics, Houston, Texas, 77030, USA
- Chen-Hsiang Yeang
- Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Taipei 115, Taiwan
4
Yeh CW, Huang WC, Hsu PH, Yeh KH, Wang LC, Hsu PWC, Lin HC, Chen YN, Chen SC, Yeang CH, Yen HCS. The C-degron pathway eliminates mislocalized proteins and products of deubiquitinating enzymes. EMBO J 2021; 40:e105846. [PMID: 33469951 PMCID: PMC8013793 DOI: 10.15252/embj.2020105846] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 06/05/2020] [Revised: 12/11/2020] [Accepted: 12/15/2020] [Indexed: 01/22/2023]
Abstract
Protein termini are determinants of protein stability. Proteins bearing degradation signals, or degrons, at their amino‐ or carboxyl‐termini are eliminated by the N‐ or C‐degron pathways, respectively. We aimed to elucidate the function of C‐degron pathways and to unveil how normal proteomes are exempt from C‐degron pathway‐mediated destruction. Our data reveal that C‐degron pathways remove mislocalized cellular proteins and cleavage products of deubiquitinating enzymes. Furthermore, the C‐degron and N‐degron pathways cooperate in protein removal. Proteome analysis revealed a shortfall of C‐degron pathway targets among normal proteins, but not among defective proteins, suggesting that proteolysis‐based immunity acts as a constraint on protein evolution/selection. Our work highlights the importance of protein termini for protein quality surveillance, and the relationship between the functional proteome and protein degradation pathways.
Affiliation(s)
- Chi-Wei Yeh
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
- Wei-Chieh Huang
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
- Pang-Hung Hsu
- Department of Life Science, Institute of Bioscience and Biotechnology, National Taiwan Ocean University, Keelung, Taiwan
- Kun-Hai Yeh
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
- Li-Chin Wang
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
- Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei, Taiwan
- Hsiu-Chuan Lin
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
- Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei, Taiwan
- Yi-Ning Chen
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
- Shu-Chuan Chen
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
- Chen-Hsiang Yeang
- Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei, Taiwan
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Hsueh-Chi S Yen
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
- Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei, Taiwan
5
Dai CL, Vazifeh MM, Yeang CH, Tachet R, Wells RS, Vilar MG, Daly MJ, Ratti C, Martin AR. Population Histories of the United States Revealed through Fine-Scale Migration and Haplotype Analysis. Am J Hum Genet 2020; 106:371-388. [PMID: 32142644 PMCID: PMC7058830 DOI: 10.1016/j.ajhg.2020.02.002] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 09/22/2019] [Accepted: 02/05/2020] [Indexed: 12/11/2022]
Abstract
The population of the United States has been shaped by centuries of migration, isolation, growth, and admixture between ancestors of global origins. Here, we assemble a comprehensive view of recent population history by studying the ancestry and population structure of more than 32,000 individuals in the US using genetic, ancestral birth origin, and geographic data from the National Geographic Genographic Project. We identify migration routes and barriers that reflect historical demographic events. We also uncover the spatial patterns of relatedness in subpopulations through the combination of haplotype clustering, ancestral birth origin analysis, and local ancestry inference. Examples of these patterns include substantial substructure and heterogeneity in Hispanics/Latinos, isolation-by-distance in African Americans, elevated levels of relatedness and homozygosity in Asian immigrants, and fine-scale structure in individuals of European descent. Taken together, our results provide detailed insights into the genetic structure and demographic history of the diverse US population.
Affiliation(s)
- Chengzhen L Dai
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Mohammad M Vazifeh
- Senseable City Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Chen-Hsiang Yeang
- Institute of Statistical Science, Academia Sinica, Nankang, Taipei, Taiwan
- Remi Tachet
- Senseable City Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Miguel G Vilar
- Genographic Project, National Geographic Society, Washington, DC 20036, USA
- Mark J Daly
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Carlo Ratti
- Senseable City Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Alicia R Martin
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
6
Simak M, Yeang CH, Lu HHS. Correction: Exploring candidate biological functions by Boolean Function Networks for Saccharomyces cerevisiae. PLoS One 2019; 14:e0221703. [PMID: 31437254 PMCID: PMC6706051 DOI: 10.1371/journal.pone.0221703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
[This corrects the article DOI: 10.1371/journal.pone.0185475.].
7
Akhmetzhanov AR, Kim JW, Sullivan R, Beckman RA, Tamayo P, Yeang CH. Modelling bistable tumour population dynamics to design effective treatment strategies. J Theor Biol 2019; 474:88-102. [DOI: 10.1016/j.jtbi.2019.05.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/10/2019] [Revised: 05/05/2019] [Accepted: 05/07/2019] [Indexed: 12/16/2022]
8
Kim JW, Abudayyeh OO, Yeerna H, Yeang CH, Stewart M, Jenkins RW, Kitajima S, Konieczkowski DJ, Medetgul-Ernar K, Cavazos T, Mah C, Ting S, Van Allen EM, Cohen O, Mcdermott J, Damato E, Aguirre AJ, Liang J, Liberzon A, Alexe G, Doench J, Ghandi M, Vazquez F, Weir BA, Tsherniak A, Subramanian A, Meneses-Cime K, Park J, Clemons P, Garraway LA, Thomas D, Boehm JS, Barbie DA, Hahn WC, Mesirov JP, Tamayo P. Decomposing Oncogenic Transcriptional Signatures to Generate Maps of Divergent Cellular States. Cell Syst 2017; 5:105-118.e9. [PMID: 28837809 DOI: 10.1016/j.cels.2017.08.002] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 07/25/2016] [Revised: 05/01/2017] [Accepted: 08/03/2017] [Indexed: 12/13/2022]
Abstract
The systematic sequencing of the cancer genome has led to the identification of numerous genetic alterations in cancer. However, a deeper understanding of the functional consequences of these alterations is necessary to guide appropriate therapeutic strategies. Here, we describe Onco-GPS (OncoGenic Positioning System), a data-driven analysis framework to organize individual tumor samples with shared oncogenic alterations onto a reference map defined by their underlying cellular states. We applied the methodology to the RAS pathway and identified nine distinct components that reflect transcriptional activities downstream of RAS. We also defined several functional states associated with patterns of transcriptional component activation; these states associate with genomic hallmarks and with responses to genetic and pharmacological perturbations. These results show that Onco-GPS is an effective approach to explore the complex landscape of oncogenic cellular states across cancers, and an analytic framework to summarize knowledge, establish relationships, and generate more effective disease models for research or as part of individualized precision medicine paradigms.
Affiliation(s)
- Jong Wook Kim
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Medicine, Brigham and Women's Hospital, Boston, MA 02215, USA
- Omar O Abudayyeh
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Huwate Yeerna
- Moores Cancer Center, University of California San Diego, La Jolla, CA 92103, USA
- Chen-Hsiang Yeang
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Institute of Statistical Science, Academia Sinica, Taipei, 11529, Taiwan
- Michelle Stewart
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Russell W Jenkins
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Shunsuke Kitajima
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- David J Konieczkowski
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Harvard Radiation Oncology Program, Massachusetts General Hospital, Boston, MA 02114, USA
- Kate Medetgul-Ernar
- Moores Cancer Center, University of California San Diego, La Jolla, CA 92103, USA
- Taylor Cavazos
- Moores Cancer Center, University of California San Diego, La Jolla, CA 92103, USA
- Clarence Mah
- School of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Moores Cancer Center, University of California San Diego, La Jolla, CA 92103, USA
- Stephanie Ting
- Moores Cancer Center, University of California San Diego, La Jolla, CA 92103, USA
- Eliezer M Van Allen
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Ofir Cohen
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- John Mcdermott
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Emily Damato
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Andrew J Aguirre
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Medicine, Brigham and Women's Hospital, Boston, MA 02215, USA
- Jonathan Liang
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Arthur Liberzon
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Gabriella Alexe
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- John Doench
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Mahmoud Ghandi
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Francisca Vazquez
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Barbara A Weir
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Aviad Tsherniak
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Aravind Subramanian
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Karina Meneses-Cime
- Moores Cancer Center, University of California San Diego, La Jolla, CA 92103, USA
- Jason Park
- Moores Cancer Center, University of California San Diego, La Jolla, CA 92103, USA
- Paul Clemons
- Center for the Science of Therapeutics, Broad Institute, Cambridge, MA 02142, USA
- Levi A Garraway
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Medicine, Brigham and Women's Hospital, Boston, MA 02215, USA
- David Thomas
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Jesse S Boehm
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- David A Barbie
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- William C Hahn
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Medicine, Brigham and Women's Hospital, Boston, MA 02215, USA; Center for Cancer Genome Discovery, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Jill P Mesirov
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; School of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Moores Cancer Center, University of California San Diego, La Jolla, CA 92103, USA
- Pablo Tamayo
- Cancer Program, Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; School of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Moores Cancer Center, University of California San Diego, La Jolla, CA 92103, USA
9
Abstract
BACKGROUND Gene Set Enrichment Analysis (GSEA) is a powerful tool to identify enriched functional categories of informative biomarkers. Canonical GSEA takes one-dimensional feature scores derived from the data of one platform as inputs. Numerous extensions of GSEA handling multimodal OMIC data have been proposed, yet none of them explicitly captures combinatorial relations of feature scores from multiple platforms.

RESULTS We propose multivariate GSEA (MGSEA) to capture combinatorial relations of gene set enrichment among multiple platform features. MGSEA successfully captures designed feature relations from simulated data. By applying it to the scores delineating breast cancer and glioblastoma multiforme (GBM) subtypes from The Cancer Genome Atlas (TCGA) datasets of CNV, DNA methylation and mRNA expression, we find that breast cancer and GBM data yield both similar and distinct outcomes. Among the enriched functional categories, subtype-specific biomarkers are dominated by mRNA expression in many functional categories in both cancer types, and also by CNV in many functional categories in breast cancer. The enriched functional categories belonging to distinct combinatorial patterns are involved in different oncogenic processes: cell proliferation (such as cell cycle control, estrogen responses, MYC and E2F targets) for mRNA expression in breast cancer, invasion and metastasis (such as cell adhesion and epithelial-mesenchymal transition (EMT)) for CNV in breast cancer, and diverse processes (such as immune and inflammatory responses, cell adhesion, angiogenesis, and EMT) for mRNA expression in GBM. These observations persist in two external datasets (Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) for breast cancer and Repository for Molecular Brain Neoplasia Data (REMBRANDT) for GBM) and are consistent with knowledge of cancer subtypes. We further compare the characteristics of MGSEA with several extensions of GSEA and point out the pros and cons of each method.

CONCLUSIONS We demonstrated the utility of MGSEA by inferring the combinatorial relations of multiple platforms for cancer subtype delineation in three multi-OMIC datasets: TCGA, METABRIC and REMBRANDT. The inferred combinatorial patterns are consistent with current knowledge and also reveal novel insights about cancer subtypes. MGSEA can be further applied to any genotype-phenotype association problem with multimodal OMIC data.
Affiliation(s)
- Khong-Loon Tiong
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
10
Simak M, Yeang CH, Lu HHS. Exploring candidate biological functions by Boolean Function Networks for Saccharomyces cerevisiae. PLoS One 2017; 12:e0185475. [PMID: 28981547 PMCID: PMC5628832 DOI: 10.1371/journal.pone.0185475] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 04/01/2017] [Accepted: 09/13/2017] [Indexed: 01/26/2023]
Abstract
The large volume of gene expression data poses a major challenge for the discovery of gene regulatory networks (GRNs). For network reconstruction and the investigation of regulatory relations, it is desirable to ensure the directness of links between genes on a map, infer their directionality and explore candidate biological functions from high-throughput transcriptomic data. To address these problems, we introduce a Boolean Function Network (BFN) model based on hidden Markov models (HMMs), likelihood ratio tests and Boolean logic functions. BFN applies two consecutive tests to establish links between pairs of genes and to check their directness. We evaluate the performance of BFN by applying it to S. cerevisiae time course data. BFN produces regulatory relations that are consistent with the succession of cell cycle phases. Furthermore, it improves sensitivity and specificity compared with alternative methods of genetic network reverse engineering. Moreover, we demonstrate that BFN can provide proper resolution for GO enrichment of gene sets. Finally, the Boolean functions discovered by BFN can provide useful insights for identifying the control mechanisms of regulatory processes, which is a particular advantage of the proposed approach. Combined with its low computational complexity, this makes BFN an efficient screening tool for reconstructing gene relations at the whole-genome level. In addition, the BFN approach is applicable to a wide range of time course datasets.
Affiliation(s)
- Maria Simak
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan
- Henry Horng-Shing Lu
- Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan
- Big Data Research Center, National Chiao Tung University, Hsinchu, Taiwan
11
Chen YF, Lin HC, Chuang KN, Lin CH, Yen HCS, Yeang CH. A quantitative model for the rate-limiting process of UGA alternative assignments to stop and selenocysteine codons. PLoS Comput Biol 2017; 13:e1005367. [PMID: 28178267 PMCID: PMC5323020 DOI: 10.1371/journal.pcbi.1005367] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 08/05/2016] [Revised: 02/23/2017] [Accepted: 01/18/2017] [Indexed: 12/20/2022]
Abstract
Ambiguity in genetic codes exists in cases where certain stop codons are alternatively used to encode non-canonical amino acids. In selenoprotein transcripts, the UGA codon may either represent a translation termination signal or a selenocysteine (Sec) codon. Translating UGA to Sec requires selenium and specialized Sec incorporation machinery such as the interaction between the SECIS element and SBP2 protein, but how these factors quantitatively affect alternative assignments of UGA has not been fully investigated. We developed a model simulating the UGA decoding process. Our model is based on the following assumptions: (1) charged Sec-specific tRNAs (Sec-tRNASec) and release factors compete for a UGA site, (2) Sec-tRNASec abundance is limited by the concentrations of selenium and Sec-specific tRNA (tRNASec) precursors, and (3) all synthesis reactions follow first-order kinetics. We demonstrated that this model captured two prominent characteristics observed from experimental data. First, UGA to Sec decoding increases with elevated selenium availability, but saturates under high selenium supply. Second, the efficiency of Sec incorporation is reduced with increasing selenoprotein synthesis. We measured the expressions of four selenoprotein constructs and estimated their model parameters. Their inferred Sec incorporation efficiencies did not correlate well with their SECIS-SBP2 binding affinities, suggesting the existence of additional factors determining the hierarchy of selenoprotein synthesis under selenium deficiency. This model provides a framework to systematically study the interplay of factors affecting the dual definitions of a genetic codon. The “code book” of protein translation maps 43 = 64 triplets of RNA sequences (codons) into 20 canonical amino acids and the stop signal. This code book is universal in almost all organisms on earth. 
Selenoproteins contain selenocysteines (Sec), selenium-containing amino acids that are not among the 20 canonical amino acids. Cells “borrow” the stop codon UGA to translate selenocysteines. Since UGA maps to two possible outcomes, the translation machinery can synthesize both full-length selenoproteins (when UGA encodes selenocysteine) and truncated peptide chains (when UGA signals translational termination). Despite extensive study of selenoprotein synthesis mechanisms, a quantitative model of how cells allocate resources to synthesize each species has yet to appear. We propose a quantitative model that can explain the dependence of experimental observables, such as protein stability and Sec incorporation efficiency, on various factors, such as selenium concentration and mRNA levels. Saturation of those quantities implies the existence of limiting factors such as mRNA transcripts and Sec-specific tRNAs. The match between model simulations and experimental data suggests that the cellular decision to synthesize either of the two protein species may follow simple first-order kinetics.
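The three modelling assumptions above can be turned into a small steady-state calculation. The sketch below is a hypothetical illustration and not the authors' fitted model. It assumes mass-action rates that are first order in each species. It folds the release-factor concentration into one constant, K_RF, and treats the selenoprotein mRNA load M as UGA sites that consume charged Sec-tRNA^Sec. Setting dS/dt = K_CHARGE·Se·(T - S) - (K_SEC·M + DECAY)·S to zero gives S* = K_CHARGE·Se·T / (K_CHARGE·Se + K_SEC·M + DECAY). The probability that a UGA is read as Sec is then K_SEC·S* / (K_SEC·S* + K_RF). All constants and function names are invented for illustration.

```python
# Hypothetical rate constants, chosen only to illustrate qualitative behavior.
K_CHARGE = 1.0  # charging rate of tRNA^Sec per unit selenium
K_SEC = 1.0     # rate at which charged Sec-tRNA^Sec engages a UGA site
K_RF = 0.5      # rate at which release factors engage a UGA site
DECAY = 0.1     # first-order loss of charged Sec-tRNA^Sec
T_TOTAL = 1.0   # total tRNA^Sec pool (charged + uncharged)


def charged_sec_trna(selenium, mrna):
    """Steady state of dS/dt = K_CHARGE*Se*(T - S) - (K_SEC*M + DECAY)*S."""
    return K_CHARGE * selenium * T_TOTAL / (K_CHARGE * selenium + K_SEC * mrna + DECAY)


def sec_incorporation_efficiency(selenium, mrna):
    """Probability a UGA is decoded as Sec when Sec-tRNA^Sec and release factors compete."""
    s = charged_sec_trna(selenium, mrna)
    return K_SEC * s / (K_SEC * s + K_RF)


for se in (0.1, 1.0, 10.0, 100.0):
    eff = sec_incorporation_efficiency(se, mrna=1.0)
    print(f"selenium={se:>6}: efficiency={eff:.3f}")
```

With these made-up constants, efficiency rises steeply from low to moderate selenium and then flattens. It also falls as the mRNA load grows. This mirrors the two qualitative features described in the abstract.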
Affiliation(s)
- Yen-Fu Chen
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
- Hsiu-Chuan Lin
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
- Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei, Taiwan
- Kai-Neng Chuang
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
- Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei, Taiwan
- Chih-Hsu Lin
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Hsueh-Chi S. Yen
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
- Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei, Taiwan
- * E-mail: (HCSY); (CHY)
- Chen-Hsiang Yeang
- Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei, Taiwan
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- * E-mail: (HCSY); (CHY)
12
Abstract
Background: Current cancer precision medicine strategies match therapies to static consensus molecular properties of an individual’s cancer, thus determining the next therapeutic maneuver. These strategies typically maintain a constant treatment while the cancer is not worsening. However, cancers feature complicated sub-clonal structure and dynamic evolution. We have recently shown, in a comprehensive simulation of two non-cross-resistant therapies across a broad parameter space representing realistic tumors, that substantial improvement in cure rates and median survival can be obtained using dynamic precision medicine strategies. These dynamic strategies explicitly consider intratumoral heterogeneity and evolutionary dynamics, including predicted future drug resistance states, and reevaluate optimal therapy every 45 days. However, the optimization is performed in single 45-day steps (“single-step optimization”).
Results: Herein we evaluate analogous strategies that think multiple therapeutic maneuvers ahead, considering potential outcomes 5 steps ahead (“multi-step optimization”) or 40 steps ahead (“adaptive long term optimization (ALTO)”) when recommending the optimal therapy in each 45-day block, in simulations involving both 2 and 3 non-cross-resistant therapies. We also evaluate an ALTO approach for situations where simultaneous combination therapy is not feasible (“Adaptive long term optimization: serial monotherapy only (ALTO-SMO)”). Simulations use populations of 764,000 and 1,700,000 virtual patients for the 2- and 3-drug cases, respectively. Each virtual patient represents a unique clinical presentation, including sizes of major and minor tumor subclones, growth rates, evolution rates, and drug sensitivities. While multi-step optimization and ALTO provide no significant average survival benefit, cure rates are significantly increased by ALTO. Furthermore, in the subset of individual virtual patients demonstrating a clinically significant difference in outcome between approaches, the large majority show an advantage of multi-step or ALTO over single-step optimization. ALTO-SMO delivers cure rates superior or equal to those of single- or multi-step optimization, in the 2- and 3-drug cases, respectively.
Conclusion: In selected virtual patients incurable by dynamic precision medicine using single-step optimization, analogous strategies that “think ahead” can deliver long-term survival and cure without any disadvantage for non-responders. When therapies require dose reduction in combination (due to toxicity), optimal strategies feature complex patterns involving rapidly interleaved pulses of combinations and high-dose monotherapy.
Reviewers: This article was reviewed by Wendy Cornell, Marek Kimmel, and Andrzej Swierniak. Wendy Cornell and Andrzej Swierniak are external reviewers (not members of the Biology Direct editorial board). Andrzej Swierniak was nominated by Marek Kimmel.
Electronic supplementary material: The online version of this article (doi:10.1186/s13062-016-0153-2) contains supplementary material, which is available to authorized users.
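A deliberately tiny model can illustrate the difference between single-step and look-ahead planning. The sketch is hypothetical: four subclones (S, R1, R2, R12), three treatment options per 45-day block, and deterministic growth, cell kill and mutation. The parameter values and the names step, single_step_plan and long_horizon_plan are illustrative assumptions, not the published simulation. Full enumeration of a short 6-block horizon stands in for ALTO's 40-step horizon, because 3^40 sequences could not be enumerated directly. The combination option is assumed to require halved doses, echoing the dose-reduction scenario in the conclusion.

```python
import itertools
import math

# Hypothetical toy parameters, chosen only for illustration.
GROWTH = 0.7  # net log-growth of every clone per 45-day block, untreated
KILL = 2.0    # log-kill per block at full dose for a sensitive clone
MUT = 1e-4    # fraction of a clone gaining one extra resistance per block

# Subclones as (sensitive to drug 1, sensitive to drug 2).
CLONES = {"S": (1, 1), "R1": (0, 1), "R2": (1, 0), "R12": (0, 0)}
# Doses (drug 1, drug 2); the combination is assumed to need halved doses.
TREATMENTS = {"drug1": (1.0, 0.0), "drug2": (0.0, 1.0), "combo": (0.5, 0.5)}
# Resistance-acquiring transitions (source clone, destination clone).
TRANSITIONS = [("S", "R1"), ("S", "R2"), ("R1", "R12"), ("R2", "R12")]


def step(pop, treatment):
    """Advance the subclone populations by one 45-day block."""
    d1, d2 = TREATMENTS[treatment]
    grown = {c: pop[c] * math.exp(GROWTH - KILL * (d1 * s1 + d2 * s2))
             for c, (s1, s2) in CLONES.items()}
    new = dict(grown)
    for src, dst in TRANSITIONS:
        moved = MUT * grown[src]
        new[src] -= moved
        new[dst] += moved
    return new


def burden(pop):
    """Total tumor cell count across subclones."""
    return sum(pop.values())


def run(pop, plan):
    for treatment in plan:
        pop = step(pop, treatment)
    return pop


def single_step_plan(pop, n_blocks):
    """Greedy: each block picks the treatment minimizing the next-block burden."""
    plan = []
    for _ in range(n_blocks):
        best = min(TREATMENTS, key=lambda t: burden(step(pop, t)))
        plan.append(best)
        pop = step(pop, best)
    return plan


def long_horizon_plan(pop, n_blocks):
    """Exhaustive search over every treatment sequence for the whole horizon."""
    return list(min(itertools.product(TREATMENTS, repeat=n_blocks),
                    key=lambda p: burden(run(pop, p))))


start = {"S": 1e9, "R1": 1e4, "R2": 1e2, "R12": 0.0}
greedy = single_step_plan(start, 6)
ahead = long_horizon_plan(start, 6)
print("single-step:", greedy, f"{burden(run(start, greedy)):.3g}")
print("look-ahead: ", ahead, f"{burden(run(start, ahead)):.3g}")
```

The full-horizon search enumerates every sequence, including the greedy one. Its final burden over that horizon can therefore never be worse, although it may coincide with the greedy result for some starting populations.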
Affiliation(s)
- Robert A Beckman
- Departments of Oncology and of Biostatistics, Bioinformatics, and Biomathematics, Lombardi Comprehensive Cancer Center and Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA.
13
Woolston A, Sintupisut N, Lu TP, Lai LC, Tsai MH, Chuang EY, Yeang CH. Putative effectors for prognosis in lung adenocarcinoma are ethnic and gender specific. Oncotarget 2015; 6:19483-99. [PMID: 26160836 PMCID: PMC4637300 DOI: 10.18632/oncotarget.4287] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 05/15/2015] [Accepted: 06/09/2015] [Indexed: 01/13/2023]
Abstract
Lung adenocarcinoma shows distinct patterns of EGFR/KRAS mutations between East Asian and Western patients and between male and female patients. However, beyond the well-known EGFR/KRAS distinction, gender- and ethnicity-specific molecular aberrations and their effects on prognosis remain largely unexplored. Association modules capture the dependency between an effector molecular aberration and target gene expression. We established association modules from the copy number variation (CNV), DNA methylation and mRNA expression data of a Taiwanese female cohort. The inferred modules were validated in four external datasets of East Asian and Caucasian patients by examining the coherence of the target gene expressions and their associations with prognostic outcomes. Modules 1 (cis-acting effects with chromosome 7 CNV) and 3 (DNA methylations of UBIAD1 and VAV1) possessed significantly negative associations with survival times among two East Asian patient cohorts. Module 2 (cis-acting effects with chromosome 18 CNV) possessed significantly negative associations with survival times among the East Asian female subpopulation alone. By examining the genomic locations and functions of the target genes, we identified several putative effectors of the two cis-acting CNV modules: RAC1, EGFR, CDK5 and RALBP1. Furthermore, module 3 targets were enriched with genes involved in cell proliferation and division and hence were consistent with the negative associations with survival times. We demonstrated that association modules in lung adenocarcinoma with significant links to prognostic outcomes were ethnic and/or gender specific. This discovery has profound implications for the diagnosis and treatment of lung adenocarcinoma and echoes the fundamental principles of the personalized medicine paradigm.
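The abstract validates modules by "the coherence of the target gene expressions" without defining the statistic. A common way to quantify such coherence is the mean absolute pairwise Pearson correlation of the targets, compared with random gene sets of the same size. The sketch below assumes that definition, which may differ from the authors' test. The gene names and the synthetic data are invented.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def coherence(expr, genes):
    """Mean absolute pairwise correlation among the given genes."""
    pairs = list(combinations(genes, 2))
    return sum(abs(pearson(expr[a], expr[b])) for a, b in pairs) / len(pairs)


def coherence_pvalue(expr, targets, n_perm=200, seed=0):
    """Empirical p-value of the observed coherence against random gene sets."""
    rng = random.Random(seed)
    observed = coherence(expr, targets)
    genes = list(expr)
    hits = sum(coherence(expr, rng.sample(genes, len(targets))) >= observed
               for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic example: three targets share one driver signal; twenty genes are noise.
rng = random.Random(1)
driver = [rng.gauss(0, 1) for _ in range(30)]
expr = {f"target{i}": [d + rng.gauss(0, 0.1) for d in driver] for i in range(3)}
expr.update({f"noise{i}": [rng.gauss(0, 1) for _ in range(30)] for i in range(20)})
observed, p = coherence_pvalue(expr, ["target0", "target1", "target2"])
print(f"coherence={observed:.3f}, permutation p={p:.3f}")
```

The "+1" in the permutation p-value keeps the estimate away from exactly zero when no random set matches the observed coherence.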
Affiliation(s)
- Andrew Woolston
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Tzu-Pin Lu
- Department of Public Health, National Taiwan University, Taipei, Taiwan
- Liang-Chuan Lai
- Graduate Institute of Physiology, National Taiwan University, Taipei, Taiwan
- Mong-Hsun Tsai
- Institute of Biotechnology, National Taiwan University, Taipei, Taiwan
- Eric Y Chuang
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
14
Beckman RA, Yeang CH. Abstract A1-47: Long-range personalized cancer treatment strategies incorporating evolutionary dynamics. Cancer Res 2015. [DOI: 10.1158/1538-7445.transcagen-a1-47] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Introduction: Current personalized cancer medicine tailors therapy to heterogeneity between cancers of the same organ type occurring within different individuals. However, it does not yet address heterogeneity at the single cell level within individual cancers or the dynamics of cancer, due to heritable genetic and epigenetic change, as well as transient functional changes.
We established computational methods for evaluating personalized medicine strategies, comparing the current personalized medicine strategy to alternatives. Current personalized medicine matches therapy to a tumor molecular profile at diagnosis and at tumor relapse or progression. This strategy focuses on the average, static, and current properties of the sample. Non-standard strategies also consider minor sub-clones, dynamics, and predicted future tumor states.
Previous simulation results in a system with two non-cross-resistant agents using non-standard strategies, optimized in single 45-day blocks, demonstrated significantly improved outcomes (Beckman, Schemmann, and Yeang, 109: 14586-91, 2012). The current work explores the effect of long-range planning, with a time horizon of up to five years, on the effectiveness of the strategies, and generalizes the model to three non-cross-resistant agents.
Methods: We developed a mathematical model of cancer therapy with two non-cross-resistant agents incorporating genetic evolutionary dynamics and single-cell heterogeneity, and examined simulated clinical outcomes (cell numbers of clones and sub-clones, projected survival).
Previously we compared the current personalized medicine strategy to 5 alternative personalized strategies. The latter strategies explicitly considered sub-clones, evolutionary dynamics, and likely future sub-clones in addition to the current predominant clone. Particular emphasis was given to the prevention of incurable, multiply resistant sub-clones. The optimization was performed in single 45-day blocks.
In the current work, we extended the model to three-drug systems. We also extended the previous single-step heuristic strategies to multistep heuristics of up to five 45-day steps, and to global optimization in 45-day blocks over a 5-year time horizon, with strategic updates every 45 days. Branch-and-bound methods were used to prune decision trees. Parallel processing (23 servers) facilitated the computations.
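The abstract says only that branch-and-bound was used to prune the treatment decision tree. The sketch below shows the general technique on a hypothetical four-subclone, two-drug toy model whose parameters are invented. It does a depth-first search over treatment sequences and abandons any branch whose optimistic lower bound on the final tumor burden cannot beat the best complete plan found so far. The bound is an assumption specific to this toy: the doubly resistant clone R12 is never killed and never loses cells. A brute-force enumeration is included to confirm that pruning does not change the optimum.

```python
import itertools
import math

# Hypothetical toy model; all values are illustrative only.
GROWTH, KILL, MUT = 0.7, 2.0, 1e-4
CLONES = {"S": (1, 1), "R1": (0, 1), "R2": (1, 0), "R12": (0, 0)}
TREATMENTS = {"drug1": (1.0, 0.0), "drug2": (0.0, 1.0), "combo": (0.5, 0.5)}
TRANSITIONS = [("S", "R1"), ("S", "R2"), ("R1", "R12"), ("R2", "R12")]


def step(pop, treatment):
    """One 45-day block: growth minus drug kill, then resistance-acquiring mutations."""
    d1, d2 = TREATMENTS[treatment]
    grown = {c: pop[c] * math.exp(GROWTH - KILL * (d1 * s1 + d2 * s2))
             for c, (s1, s2) in CLONES.items()}
    new = dict(grown)
    for src, dst in TRANSITIONS:
        moved = MUT * grown[src]
        new[src] -= moved
        new[dst] += moved
    return new


def burden(pop):
    return sum(pop.values())


def run(pop, plan):
    for treatment in plan:
        pop = step(pop, treatment)
    return pop


def lower_bound(pop, remaining):
    """R12 is never killed and never loses cells, so it grows by at least
    exp(GROWTH) per block and bounds the final burden from below."""
    return pop["R12"] * math.exp(GROWTH * remaining)


def branch_and_bound(pop, n_blocks):
    """Depth-first search that prunes branches unable to beat the incumbent plan."""
    best = {"burden": math.inf, "plan": None, "nodes": 0}

    def search(state, plan):
        best["nodes"] += 1
        remaining = n_blocks - len(plan)
        if remaining == 0:
            if burden(state) < best["burden"]:
                best["burden"], best["plan"] = burden(state), list(plan)
            return
        if lower_bound(state, remaining) >= best["burden"]:
            return  # prune: no completion of this branch can be strictly better
        for treatment in TREATMENTS:
            plan.append(treatment)
            search(step(state, treatment), plan)
            plan.pop()

    search(pop, [])
    return best["plan"], best["burden"], best["nodes"]


def brute_force(pop, n_blocks):
    """Reference answer: enumerate all 3**n_blocks treatment sequences."""
    best = min(itertools.product(TREATMENTS, repeat=n_blocks),
               key=lambda p: burden(run(pop, p)))
    return list(best), burden(run(pop, best))


start = {"S": 1e9, "R1": 1e4, "R2": 1e2, "R12": 10.0}
plan, best_burden, nodes = branch_and_bound(start, 5)
print(plan, f"{best_burden:.3g}", "nodes visited:", nodes)
```

A tighter bound, for example one that also credits the maximum possible shrinkage of the other clones, would prune more of the tree. The weak bound used here keeps the correctness argument short.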
Results: Previously we had carried out a computerized virtual clinical trial of over 3 million evaluable cancer “patients”, comparing current personalized medicine and 5 non-standard strategies. The 3 million virtual patients represented a comprehensive survey of likely population structures, growth rates, phenotypic transition rates (by heritable genetic or epigenetic mechanisms), and drug sensitivities. All alternatives tested resulted in an approximate doubling in mean and median survival compared to current personalized medicine and an increase in the apparent cure rate from 0.7% for current personalized medicine to 17-20% for alternatives. In no case was the current personalized medicine strategy superior.
In the current work, we found in large simulations that planning ahead led to further increases in cure rates, and we analyzed several examples where highly complex treatment sequences led to cures that were not possible with single-step 45-day optimization. Further, previous and current conclusions applied equally to three non-cross-resistant agents.
Conclusions: Explicit consideration of intratumoral heterogeneity and evolutionary dynamics, with probabilistic consideration of future outcomes with a long strategy horizon, can potentially lead to markedly improved patient outcomes, including cure rates. Application of knowledge from growing molecular and clinical oncology databases may allow more informative therapeutic simulations than previously possible.
Citation Format: Robert A. Beckman, Chen-Hsiang Yeang. Long-range personalized cancer treatment strategies incorporating evolutionary dynamics. [abstract]. In: Proceedings of the AACR Special Conference on Translation of the Cancer Genome; Feb 7-9, 2015; San Francisco, CA. Philadelphia (PA): AACR; Cancer Res 2015;75(22 Suppl 1):Abstract nr A1-47.
Affiliation(s)
- Robert A. Beckman
- Lombardi Cancer Center, Georgetown University Medical Center, Washington, DC
15
Beckman RA, Yeang CH. Abstract B2-52: Long range personalized cancer treatment strategies incorporating evolutionary dynamics. Cancer Res 2015. [DOI: 10.1158/1538-7445.compsysbio-b2-52] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Introduction: Current personalized cancer medicine tailors therapy to heterogeneity between cancers of the same organ type occurring within different individuals. However, it does not yet address heterogeneity at the single cell level within individual cancers or the dynamics of cancer, due to heritable genetic and epigenetic change, as well as transient functional changes.
We established computational methods for evaluating personalized medicine strategies, comparing the current personalized medicine strategy to alternatives. Current personalized medicine matches therapy to a tumor molecular profile at diagnosis and at tumor relapse or progression. This strategy focuses on the average, static, and current properties of the sample. Non-standard strategies also consider minor sub-clones, dynamics, and predicted future tumor states.
Previous simulation results in a system with two non-cross-resistant agents using non-standard strategies, optimized in single 45-day blocks, demonstrated significantly improved outcomes (Beckman, Schemmann, and Yeang, 109: 14586-91, 2012). The current work explores the effect of long-range planning, with a time horizon of up to five years, on the effectiveness of the strategies, and generalizes the model to three non-cross-resistant agents.
Methods: We developed a mathematical model of cancer therapy with two non-cross-resistant agents incorporating genetic evolutionary dynamics and single-cell heterogeneity, and examined simulated clinical outcomes (cell numbers of clones and sub-clones, projected survival).
Previously we compared the current personalized medicine strategy to 5 alternative personalized strategies. The latter strategies explicitly considered sub-clones, evolutionary dynamics, and likely future sub-clones in addition to the current predominant clone. Particular emphasis was given to the prevention of incurable, multiply resistant sub-clones. The optimization was performed in single 45-day blocks.
In the current work, we extended the model to three-drug systems. We also extended the previous single-step heuristic strategies to multistep heuristics of up to five 45-day steps, and to global optimization in 45-day blocks over a 5-year time horizon, with strategic updates every 45 days. Branch-and-bound methods were used to prune decision trees. Parallel processing (23 servers) facilitated the computations.
Results: Previously we had carried out a computerized virtual clinical trial of over 3 million evaluable cancer “patients”, comparing current personalized medicine and 5 non-standard strategies. The 3 million virtual patients represented a comprehensive survey of likely population structures, growth rates, phenotypic transition rates (by heritable genetic or epigenetic mechanisms), and drug sensitivities. All alternatives tested resulted in an approximate doubling in mean and median survival compared to current personalized medicine and an increase in the apparent cure rate from 0.7% for current personalized medicine to 17-20% for alternatives. In no case was the current personalized medicine strategy superior.
In the current work, we found in large simulations that planning ahead led to further increases in cure rates, and we analyzed several examples where highly complex treatment sequences led to cures that were not possible with single-step 45-day optimization. Further, previous and current conclusions applied equally to three non-cross-resistant agents.
Conclusions: Explicit consideration of intratumoral heterogeneity and evolutionary dynamics, with probabilistic consideration of future outcomes with a long strategy horizon, can potentially lead to markedly improved patient outcomes, including cure rates. Application of knowledge from growing molecular and clinical oncology databases may allow more informative therapeutic simulations than previously possible.
Citation Format: Robert A. Beckman, Chen-Hsiang Yeang. Long range personalized cancer treatment strategies incorporating evolutionary dynamics. [abstract]. In: Proceedings of the AACR Special Conference on Computational and Systems Biology of Cancer; Feb 8-11, 2015; San Francisco, CA. Philadelphia (PA): AACR; Cancer Res 2015;75(22 Suppl 2):Abstract nr B2-52.
16
Abstract
Cancer is an evolutionary process driven by mutation and selection. Tumors are genetically unstable, and research has shown that genetic instability is the most efficient way for cancers to evolve. Genetic instability leads to genetic heterogeneity and dynamic change within a single individual's tumor, in turn leading to therapeutic resistance. Cancer treatment has also evolved from an empirical science of killing dividing cells to the current era of 'personalized medicine', exquisitely targeting the molecular features of individual cancers. However, current personalized medicine regards a single individual's cancer as largely uniform and static. Moreover, from a strategic perspective, current personalized medicine thinks primarily of the immediate therapy selection. Ongoing research suggests that new, nonstandard personalized treatment strategies that plan further ahead and consider intratumoral heterogeneity and the evolving nature of cancer (due to genetic instability) may lead to the next level of therapeutic benefit beyond current personalized medicine.
Affiliation(s)
- Robert A Beckman
- Center for Evolution & Cancer, Helen Diller Family Cancer Center, University of California at San Francisco, San Francisco, CA, USA
- Chen-Hsiang Yeang
- Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei, Taiwan
17
Yeang CH, Ma GC, Hsu HW, Lin YS, Chang SM, Cheng PJ, Chen CA, Ni YH, Chen M. Genome-wide normalized score: a novel algorithm to detect fetal trisomy 21 during non-invasive prenatal testing. Ultrasound Obstet Gynecol 2014; 44:25-30. [PMID: 24700679 DOI: 10.1002/uog.13377] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 02/09/2014] [Revised: 03/19/2014] [Accepted: 03/26/2014] [Indexed: 06/03/2023]
Abstract
OBJECTIVES: Non-invasive prenatal testing for fetal trisomy 21 (T21) by massively parallel shotgun sequencing (MPSS) is available for clinical use, but its efficacy is limited by several factors, e.g. the proportion of cell-free fetal DNA in maternal plasma and sequencing depth. Existing algorithms discard DNA reads from the chromosomes that are not being tested (i.e. those other than chromosome 21) and are thus more susceptible to diluted fetal DNA and limited sequencing depth. We aimed to describe and evaluate a novel algorithm for aneuploidy detection, the genome-wide normalized score (GWNS), which normalizes read counts by the proportions of DNA fragments from chromosome 21 in normal controls.
METHODS: We assessed the GWNS approach by comparison with two existing algorithms, i.e. Z-score and normalized chromosome value (NCV), using theoretical approximations and computer simulations in a set of 86 cases (64 euploid and 22 T21 cases). We then validated GWNS by studying an expanded set of clinical samples (n = 208). Finally, dilution experiments were undertaken to compare the performance of the three algorithms (Z-score, NCV, GWNS) when fetal DNA concentration was low.
RESULTS: At fixed levels of significance and power, GWNS required a smaller fetal DNA proportion and fewer total MPSS reads than Z-score or NCV. In dilution experiments, GWNS also outperformed the other two methods by reaching the correct diagnosis with the lowest range of fetal DNA concentrations (GWNS, 3.83-4.75%; Z-score, 4.75-5.22%; NCV, 6.47-8.58%).
CONCLUSION: Our results demonstrate that GWNS is comparable to the Z-score and NCV methods in detecting fetal T21. Dilution experiments suggest that GWNS may perform better than the other methods when the fetal fraction is low.
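The abstract does not give the GWNS formula, so the sketch below covers only the Z-score comparator in its standard form, a test sample's chromosome 21 read fraction scored against euploid controls. It also includes the fetal-fraction arithmetic that explains why a low fetal DNA proportion hurts detection. A trisomic fetus carries 1.5 times the euploid chromosome 21 dosage. At fetal fraction f, the chromosome 21 share therefore rises by a factor of about (1 + f/2), ignoring the small change in total reads. The control fractions are made-up numbers, and the threshold of 3 is a commonly used convention assumed here.

```python
import statistics


def chr21_fraction(chr21_reads, total_reads):
    """Share of mapped reads that fall on chromosome 21."""
    return chr21_reads / total_reads


def z_score(test_fraction, control_fractions):
    """Standard Z-score of a test sample against euploid control fractions."""
    mu = statistics.mean(control_fractions)
    sd = statistics.stdev(control_fractions)
    return (test_fraction - mu) / sd


def expected_t21_fraction(euploid_fraction, fetal_fraction):
    """Approximate chr21 share for a trisomic fetus (ignores the small total-read change)."""
    return euploid_fraction * (1 + fetal_fraction / 2)


def min_detectable_fetal_fraction(control_fractions, z_threshold=3.0):
    """Smallest fetal fraction whose expected shift mu*f/2 reaches z_threshold*sd."""
    mu = statistics.mean(control_fractions)
    sd = statistics.stdev(control_fractions)
    return 2 * z_threshold * sd / mu


controls = [0.0130, 0.0131, 0.0129, 0.0130, 0.0132, 0.0128]  # hypothetical euploid controls
baseline = statistics.mean(controls)
for ff in (0.04, 0.10):
    z = z_score(expected_t21_fraction(baseline, ff), controls)
    print(f"fetal fraction {ff:.0%}: Z = {z:.2f}")
print(f"minimum detectable fetal fraction: {min_detectable_fetal_fraction(controls):.1%}")
```

The same expected shift, mu·f/2, appears in every method that compares chromosome 21 read fractions. Tighter control variability (smaller sd) or more reads therefore lowers the minimum detectable fetal fraction.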
Affiliation(s)
- C H Yeang
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
18
Lin IY, Chiu FL, Yeang CH, Chen HF, Chuang CY, Yang SY, Hou PS, Sintupisut N, Ho HN, Kuo HC, Lin KI. Suppression of the SOX2 neural effector gene by PRDM1 promotes human germ cell fate in embryonic stem cells. Stem Cell Reports 2014; 2:189-204. [PMID: 24527393 PMCID: PMC3923219 DOI: 10.1016/j.stemcr.2013.12.009] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 07/17/2013] [Revised: 12/10/2013] [Accepted: 12/16/2013] [Indexed: 01/02/2023]
Abstract
The mechanisms of transcriptional regulation underlying human primordial germ cell (PGC) differentiation are largely unknown. The transcriptional repressor Prdm1/Blimp-1 is known to play a critical role in controlling germ cell specification in mice. Here, we show that PRDM1 is expressed in developing human gonads and contributes to the determination of germline versus neural fate in early development. We show that knockdown of PRDM1 in human embryonic stem cells (hESCs) impairs germline potential and upregulates neural genes. Conversely, ectopic expression of PRDM1 in hESCs promotes the generation of cells that exhibit phenotypic and transcriptomic features of early PGCs. Furthermore, PRDM1 suppresses transcription of SOX2. Overexpression of SOX2 in hESCs under conditions favoring germline differentiation skews cell fate from the germline to the neural lineage. Collectively, our results demonstrate that PRDM1 serves as a molecular switch to modulate the divergence of neural or germline fates through repression of SOX2 during human development.
Highlights
- PRDM1 serves as a molecular switch that determines neural or germline fate in humans
- PRDM1 is expressed in early germ cells of developing human fetal gonads
- PRDM1 is expressed in hESC-derived germ cells and directs germline differentiation
- Germline differentiation in hESCs requires PRDM1-mediated suppression of SOX2
Affiliation(s)
- I-Ying Lin
- Genomics Research Center, Academia Sinica, Taipei 115, Taiwan
- Institute of Biochemistry and Molecular Biology, National Yang-Ming University, Taipei 112, Taiwan
- Feng-Lan Chiu
- Institute of Cellular and Organismic Biology, Academia Sinica, Taipei 115, Taiwan
- Chen-Hsiang Yeang
- Institute of Statistical Science, Academia Sinica, Taipei 115, Taiwan
- Hsin-Fu Chen
- Graduate Institute of Clinical Genomics, College of Medicine, National Taiwan University, Taipei 106, Taiwan
- Department of Obstetrics and Gynecology, Division of Reproductive Endocrinology and Infertility, National Taiwan University Hospital, Taipei 100, Taiwan
- Ching-Yu Chuang
- Institute of Cellular and Organismic Biology, Academia Sinica, Taipei 115, Taiwan
- Shii-Yi Yang
- Genomics Research Center, Academia Sinica, Taipei 115, Taiwan
- Pei-Shan Hou
- Institute of Cellular and Organismic Biology, Academia Sinica, Taipei 115, Taiwan
- Hong-Nerng Ho
- Graduate Institute of Clinical Genomics, College of Medicine, National Taiwan University, Taipei 106, Taiwan
- Department of Obstetrics and Gynecology, Division of Reproductive Endocrinology and Infertility, National Taiwan University Hospital, Taipei 100, Taiwan
- Hung-Chih Kuo
- Genomics Research Center, Academia Sinica, Taipei 115, Taiwan
- Institute of Cellular and Organismic Biology, Academia Sinica, Taipei 115, Taiwan
- Corresponding author
- Kuo-I Lin
- Genomics Research Center, Academia Sinica, Taipei 115, Taiwan
- Institute of Biochemistry and Molecular Biology, National Yang-Ming University, Taipei 112, Taiwan
- Corresponding author
19
Abstract
Glioblastoma multiforme (GBM) is the most common and malignant primary brain tumor in adults. Decades of investigation and the recent effort of The Cancer Genome Atlas (TCGA) project have mapped many molecular alterations in GBM cells. Alterations in DNA may dysregulate gene expression and drive tumor malignancy. It is thus important to uncover causal and statistical dependencies between ‘effector’ molecular aberrations and ‘target’ gene expression in GBM. A rich collection of prior studies attempted to combine copy number variation (CNV) and mRNA expression data. However, systematic methods to integrate multiple types of cancer genomic data (gene mutations, single nucleotide polymorphisms, CNVs, DNA methylation, mRNA and microRNA expression, and clinical information) are relatively scarce. We proposed an algorithm to build ‘association modules’ linking effector molecular aberrations and target gene expression and applied the module-finding algorithm to the integrated TCGA GBM data sets. The inferred association modules were validated by six tests using external information and datasets of central nervous system tumors: (i) indication of prognostic effects among patients; (ii) coherence of target gene expressions; (iii) retention of effector–target associations in external data sets; (iv) recurrence of effector molecular aberrations in GBM; (v) functional enrichment of target genes; and (vi) co-citations between effectors and targets. Modules associated with well-known molecular aberrations of GBM, such as chromosome 7 amplifications, chromosome 10 deletions, and EGFR and NF1 mutations, passed the majority of the validation tests. Furthermore, several modules associated with less well-reported molecular aberrations, such as chromosome 11 CNVs, CD40, PLXNB1 and GSTM1 methylations, and miR-21 expression, were also validated by external information.
In particular, modules constituting trans-acting effects with chromosome 11 CNVs and cis-acting effects with chromosome 10 CNVs manifested strong negative and positive associations with survival times in brain tumors. By aligning the information of association modules with the established GBM subclasses based on transcription or methylation levels, we found each subclass possessed multiple concurrent molecular aberrations. Furthermore, the joint molecular characteristics derived from 16 association modules had prognostic power not explained away by the strong biomarker of CpG island methylator phenotypes. Functional and survival analyses indicated that immune/inflammatory responses and epithelial-mesenchymal transitions were among the most important determining processes of prognosis. Finally, we demonstrated that certain molecular aberrations uniquely recurred in GBM but were relatively rare in non-GBM glioma cells. These results justify the utility of an integrative analysis on cancer genomes and provide testable characterizations of driver aberration events in GBM.
Affiliation(s)
- Nardnisa Sintupisut
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan, ROC and Institute of Information Science, Academia Sinica, Taipei, Taiwan, ROC
20
Chen DH, Chang AYF, Liao BY, Yeang CH. Functional characterization of motif sequences under purifying selection. Nucleic Acids Res 2013; 41:2105-20. [PMID: 23303791 PMCID: PMC3575792 DOI: 10.1093/nar/gks1456] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 09/18/2012] [Revised: 12/13/2012] [Accepted: 12/13/2012] [Indexed: 11/14/2022]
Abstract
Diverse life forms are driven by the evolution of gene regulatory programs, including changes in regulator proteins and cis-regulatory elements. Alterations of cis-regulatory elements are likely to dominate the evolution of gene regulatory networks, as they are subject to weaker selective constraints than proteins and hence may evolve quickly to adapt to the environment. Prior studies of cis-regulatory element evolution focus primarily on sequence substitutions of known transcription factor-binding motifs. However, evolutionary models for the dynamics of motif occurrence are relatively rare, and comprehensive characterization of the evolution of all possible motif sequences has not been pursued. In the present study, we propose an algorithm to estimate the strength of purifying selection on a motif sequence based on an evolutionary model capturing the birth and death of motif occurrences on promoters. We term this measure the 'evolutionary retention coefficient', as it is related to yet distinct from the canonical definition of the selection coefficient in population genetics. Using this algorithm, we estimate and report the evolutionary retention coefficients of all possible 10-nucleotide sequences from the aligned promoter sequences of 27,748 orthologous gene families in 34 mammalian species. Intriguingly, the evolutionary retention coefficients of motifs are intimately associated with their functional relevance. Top-ranking motifs (sorted by evolutionary retention coefficients) are significantly enriched with transcription factor-binding sequences according to the curated knowledge from the TRANSFAC database and the ChIP-seq data generated by the ENCODE Consortium. Moreover, genes harbouring high-scoring motifs on their promoters retain significantly coherent expression profiles, and those genes are over-represented in the functional classes involved in gene regulation.
The validation results reveal the dependencies between natural selection and functions of cis-regulatory elements and shed light on the evolution of gene regulatory networks.
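The abstract does not spell out its birth-death model or the exact definition of the evolutionary retention coefficient. The sketch below therefore uses a generic textbook stand-in: a two-state (absent/present) continuous-time Markov chain for one motif site along one branch. Its transition probabilities are P(present | present) = λ/(λ+μ) + μ/(λ+μ)·e^(-(λ+μ)t) and P(present | absent) = λ/(λ+μ)·(1 - e^(-(λ+μ)t)). The rates, branch length, counts and the grid-search fit are hypothetical. A lower fitted death rate μ means occurrences are retained more often, which is the intuition behind purifying selection.

```python
import math


def transition_probs(birth, death, t):
    """Two-state (absent/present) continuous-time Markov chain for one motif site.

    Returns (P(present at t | present at 0), P(present at t | absent at 0))."""
    rate = birth + death
    decay = math.exp(-rate * t)
    p_stay = birth / rate + (death / rate) * decay
    p_gain = (birth / rate) * (1 - decay)
    return p_stay, p_gain


def fit_death_rate(retained, present, birth, t, grid=None):
    """Grid-search maximum-likelihood death rate from retained/present counts."""
    grid = grid or [i / 100 for i in range(1, 501)]

    def loglik(death):
        p, _ = transition_probs(birth, death, t)
        return retained * math.log(p) + (present - retained) * math.log(1 - p)

    return max(grid, key=loglik)


# Hypothetical data: 1000 ancestral occurrences, retention generated with death = 0.5.
p_true, _ = transition_probs(birth=0.1, death=0.5, t=1.0)
retained = round(1000 * p_true)
print("retained:", retained, "fitted death rate:", fit_death_rate(retained, 1000, 0.1, 1.0))
```

The binomial likelihood peaks where the model's retention probability equals the observed retained fraction. This single-branch fit is only a toy analogue of estimating rates across a 34-species phylogeny.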
Affiliation(s)
- De-Hua Chen
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan, ROC and Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan, ROC
- Andrew Ying-Fei Chang
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan, ROC and Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan, ROC
| | - Ben-Yang Liao
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan, ROC and Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan, ROC
| | - Chen-Hsiang Yeang
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan, ROC and Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan, ROC
21
Abstract
Domain architectures and catalytic functions of enzymes constitute the centerpieces of a metabolic network. These types of information are formulated as a two-layered network consisting of domains, proteins, and reactions: a domain-protein-reaction (DPR) network. We propose an algorithm to reconstruct the evolutionary history of DPR networks across multiple species and categorize the mechanisms of metabolic system evolution in terms of network changes. The reconstructed history reveals distinct patterns of evolutionary mechanisms between prokaryotic and eukaryotic networks. Although the evolutionary mechanisms in early ancestors of prokaryotes and eukaryotes are quite similar, more novel and duplicated domain compositions with identical catalytic functions arise along the eukaryotic lineage. In contrast, prokaryotic enzymes become more versatile by catalyzing multiple reactions with similar chemical operations. Moreover, different metabolic pathways are enriched with distinct network evolution mechanisms. For instance, although the pathways of steroid biosynthesis, protein kinases, and glycosaminoglycan biosynthesis all constitute prominent features of animal-specific physiology, the evolution of their domain architectures and catalytic functions follows distinct patterns. Steroid biosynthesis is enriched with reaction creations but retains a relatively conserved repertoire of domain compositions and proteins. Protein kinases retain conserved reactions but possess many novel domains and proteins. In contrast, glycosaminoglycan biosynthesis has high rates of reaction/protein creations and domain recruitments. Finally, we elicit and validate two general principles underlying the evolution of DPR networks: 1) duplicated enzyme proteins possess similar catalytic functions and 2) the majority of novel domains arise to catalyze novel reactions. These results shed new light on the evolution of metabolic systems.
Affiliation(s)
- Summit Suen
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
22
Beckman RA, Schemmann GS, Yeang CH. Abstract LB-448: Next generation personalized medicine strategies incorporating genetic dynamics and single cell heterogeneity may lead to improved outcomes. Cancer Res 2012. [DOI: 10.1158/1538-7445.am2012-lb-448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Introduction: Cancers are heterogeneous and often genetically unstable. Current practice of personalized medicine tailors therapy to heterogeneity between cancers of the same organ type occurring within different individuals. However, it does not yet address heterogeneity at the single cell level within individual cancers or the dynamic nature of cancer, due to heritable genetic and epigenetic change, as well as transient functional changes. We established methods for evaluating personalized medicine strategies, and compared the current personalized medicine strategy to alternatives. Current personalized medicine matches therapy to a tumor molecular profile at diagnosis and at tumor relapse or progression. This strategy focuses on the average, static, and current properties of the sample. Next-generation strategies also consider minor sub-clones, dynamics, and predicted future tumor states. Methods: We developed a mathematical model of targeted cancer therapy incorporating genetic evolutionary dynamics and single cell heterogeneity, and examined simulated clinical outcomes (cell numbers of clones and sub-clones, projected survival). We compared the current personalized medicine strategy to 5 alternative personalized strategies. The latter strategies explicitly considered sub-clones, evolutionary dynamics, and likely future sub-clones in addition to the current predominant clone. Particular emphasis was given to the prevention of incurable, multiply resistant sub-clones. Results: We carried out a computerized virtual clinical trial of over 3 million evaluable cancer “patients,” comparing current personalized medicine and 5 alternative strategies. While the current personalized medicine strategy was as effective as the alternatives in 2/3 of the cases, in 1/3 of the cases alternative strategies led to improved outcomes.
All alternatives tested resulted in an approximate doubling in mean and median survival compared to current personalized medicine and an increase in the apparent cure rate from 0.7% for current personalized medicine to 17-20% for alternatives. In no case was the current personalized medicine strategy superior. Conclusions: These findings may lead to improved patient outcomes. Further, they suggest global enhancements to translational oncology research paradigms: for example, molecular characterization of incurable, multiply resistant “end states” from autopsy may be equally or more important than characterizing initial diagnostic states. We have developed methods to evaluate alternative personalized medicine strategies. Next generation strategies may consider sub-clones, evolutionary dynamics, and predicted future states. Application of knowledge from growing molecular and empirical oncology databases may allow more informative therapeutic simulations than previously possible.
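Citation Format: Beckman RA, Schemmann GS, Yeang CH. Next generation personalized medicine strategies incorporating genetic dynamics and single cell heterogeneity may lead to improved outcomes [abstract]. In: Proceedings of the 103rd Annual Meeting of the American Association for Cancer Research; 2012 Mar 31-Apr 4; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2012;72(8 Suppl):Abstract nr LB-448. doi:10.1158/1538-7445.AM2012-LB-448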
Affiliation(s)
- Robert A. Beckman
- Simons Center for Systems Biology, Institute for Advanced Study, Princeton, NJ
- Chen-Hsiang Yeang
- Simons Center for Systems Biology, Institute for Advanced Study, Princeton, NJ
23
Li SD, Tagami T, Ho YF, Yeang CH. Deciphering causal and statistical relations of molecular aberrations and gene expressions in NCI-60 cell lines. BMC Syst Biol 2011; 5:186. [PMID: 22051105 PMCID: PMC3259106 DOI: 10.1186/1752-0509-5-186] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 06/06/2011] [Accepted: 11/04/2011] [Indexed: 12/02/2022]
Abstract
BACKGROUND Cancer cells harbor a large number of molecular alterations such as mutations, amplifications and deletions on DNA sequences and epigenetic changes on DNA methylations. These aberrations may dysregulate gene expressions, which in turn drive the malignancy of tumors. Deciphering the causal and statistical relations of molecular aberrations and gene expressions is critical for understanding the molecular mechanisms of clinical phenotypes. RESULTS In this work, we proposed a computational method to reconstruct association modules containing driver aberrations, passenger mRNA or microRNA expressions, and putative regulators that mediate the effects from drivers to passengers. By applying the module-finding algorithm to the integrated datasets of NCI-60 cancer cell lines, we found that gene expressions were driven by diverse molecular aberrations including chromosomal segments' copy number variations, gene mutations and DNA methylations, microRNA expressions, and the expressions of transcription factors. In silico validation indicated that passenger genes were enriched with the regulator binding motifs, functional categories or pathways where the drivers were involved, and co-citations with the driver/regulator genes. Moreover, 6 of 11 predicted MYB targets were down-regulated in an MYB-siRNA treated leukemia cell line. In addition, microRNA expressions were driven by distinct mechanisms from mRNA expressions. CONCLUSIONS The results provide rich mechanistic information regarding molecular aberrations and gene expressions in cancer genomes. This kind of integrative analysis will become an important tool for the diagnosis and treatment of cancer in the era of personalized medicine.
Affiliation(s)
- Shyh-Dar Li
- Ontario Institute for Cancer Research, 101 College Street, Toronto, Canada
- Ying-Fu Ho
- Institute of Statistical Science, Academia Sinica, Academia Road, Sec 2, Taipei, Taiwan
- Chen-Hsiang Yeang
- Institute of Statistical Science, Academia Sinica, Academia Road, Sec 2, Taipei, Taiwan
24
Abstract
Metabolic reactions and gene regulation are two primary processes of cells. In response to environmental changes, cells often adjust their regulatory programs and shift their metabolic states. An integrative investigation and modeling of these two processes would improve our understanding of cellular systems and may have substantial impact on medicine, agriculture, environmental protection, and energy production. We review studies of various aspects of the crosstalk between metabolic reactions and gene regulation, including models, empirical evidence, and available databases.
25
Abstract
Background Cancer is a complex disease where various types of molecular aberrations drive the development and progression of malignancies. Large-scale screenings of multiple types of molecular aberrations (e.g., mutations, copy number variations, DNA methylations, gene expressions) have become increasingly important in the prognosis and study of cancer. Consequently, a computational model integrating multiple types of information is essential for analyzing such comprehensive data. Results We propose an integrated modeling framework to identify the statistical and putative causal relations of various molecular aberrations and gene expressions in cancer. To reduce spurious associations among the massive number of probed features, we sequentially applied three layers of logistic regression models with increasing complexity and uncertainty regarding the possible mechanisms connecting molecular aberrations and gene expressions. Layer 1 models associate gene expressions with the molecular aberrations at the same loci. Layer 2 models associate expressions with aberrations at different loci that have known mechanistic links. Layer 3 models associate expressions with nonlocal aberrations that have unknown mechanistic links. We applied the layered models to the integrated datasets of NCI-60 cancer cell lines and validated the results with large-scale statistical analysis. Furthermore, we discovered or reaffirmed the following prominent links: (1) Protein expressions are generally consistent with mRNA expressions. (2) Several gene expressions are modulated by composite local aberrations. For instance, CDKN2A expressions are repressed by either frame-shift mutations or DNA methylations. (3) Amplification of chromosome 6q in leukemia elevates the expression of MYB, and the downstream targets of MYB on other chromosomes are up-regulated accordingly.
(4) Amplification of chromosome 3p and hypo-methylation of PAX3 together elevate MITF expression in melanoma, which up-regulates the downstream targets of MITF. (5) Mutations of TP53 are negatively associated with the expressions of its direct target genes. Conclusions The analysis results on NCI-60 data justify the utility of the layered models for the incoming flow of cancer genomic data. Experimental validations on selected prominent links and application of the layered modeling framework to other integrated datasets will be carried out subsequently.
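The three layer definitions amount to a routing rule for candidate (aberration, expression) pairs. The sketch below covers only that routing step under simplifying assumptions: a locus is reduced to a gene symbol, and `known_links` is a hypothetical stand-in for curated mechanistic links such as transcription factor targets. It does not fit the logistic regression models themselves, and the gene names `GENE_A`, `GENE_B`, `GENE_C` are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aberration:
    locus: str  # gene/locus carrying the aberration (simplified to a gene symbol)
    kind: str   # e.g. "mutation", "copy_number", "methylation"


def assign_layer(aberration: Aberration, target_gene: str,
                 known_links: set) -> int:
    """Route an (aberration, expression) candidate pair to layer 1, 2 or 3.

    known_links is a hypothetical set of (source_gene, target_gene) pairs
    with a known mechanistic link, e.g. transcription factor -> target.
    """
    if aberration.locus == target_gene:
        return 1  # aberration on the same locus as the expressed gene
    if (aberration.locus, target_gene) in known_links:
        return 2  # different locus, known mechanistic link
    return 3      # nonlocal aberration, mechanism unknown


def group_by_layer(candidates, known_links):
    """Group candidate pairs so that layer-1 models can be fitted first."""
    layers = {1: [], 2: [], 3: []}
    for aberration, target in candidates:
        layers[assign_layer(aberration, target, known_links)].append((aberration, target))
    return layers


links = {("MYB", "GENE_A")}  # hypothetical MYB target
pairs = [
    (Aberration("CDKN2A", "methylation"), "CDKN2A"),
    (Aberration("MYB", "copy_number"), "GENE_A"),
    (Aberration("GENE_C", "mutation"), "GENE_B"),
]
print({layer: len(items) for layer, items in group_by_layer(pairs, links).items()})
```

Processing layer 1 before layers 2 and 3 mirrors the abstract's ordering by increasing mechanistic uncertainty; how associations found in an earlier layer constrain later ones is not specified in the abstract and is left out here.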
26
Kanabar PN, Vaske CJ, Yeang CH, Yildiz FH, Stuart JM. Inferring disease-related pathways using a probabilistic epistasis model. Pac Symp Biocomput 2009:480-491. [PMID: 19209724 DOI: 10.1142/9789812836939_0046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2023]
Abstract
We present a probabilistic model called a Joint Intervention Network (JIN) for inferring interactions among a chosen set of regulator genes. The inputs to the method are expression changes of downstream indicator genes observed under knock-outs of the regulators. JIN can use any number of perturbation combinations for model inference (e.g., single, double, and triple knock-outs). RESULTS/CONCLUSIONS: We applied JIN to a Vibrio cholerae regulatory network to uncover mechanisms critical to its environmental persistence. V. cholerae is a facultative human pathogen that causes cholera and has been responsible for seven pandemics. We analyzed the expression response of 17 V. cholerae biofilm indicator genes under various single and multiple knock-outs of three known biofilm regulators. Using the inferred network, we were able to identify new genes involved in biofilm formation more accurately than by clustering expression profiles.
Affiliation(s)
- P N Kanabar
- Department of Biomolecular Engineering, University of California Santa Cruz, 1156 High Street, Santa Cruz, CA 95062, USA
27
Abstract
Many methods have been developed to detect coevolution from aligned sequences. However, all existing methods require an a priori one-to-one mapping of candidate coevolving partners (nucleotides, amino acids). When two families of sequences have distinct duplication and loss histories, finding the one-to-one mapping of coevolving partners can be computationally involved. We propose an algorithm to identify the coevolving partners from two families of sequences with distinct phylogenetic trees. The algorithm maps each gene tree to a reference species tree, and builds a joint state of sequence composition and assignments of coevolving partners for each species tree node. By applying dynamic programming on the joint states, the optimal assignments can be identified. Time complexity is quadratic in the size of the species tree, and space complexity is exponential in the maximum number of gene tree nodes mapped to the same species tree node. Analysis of both simulated data and Pfam protein domain sequences demonstrates that the paralog coevolution algorithm identifies the coevolving partners with 60% to 88% accuracy. This algorithm extends phylogeny-based coevolutionary models and makes them applicable to a wide range of problems, such as predicting protein-protein, protein-DNA and DNA-RNA interactions between two distinct families of sequences.
Affiliation(s)
- Chen-Hsiang Yeang
- Simons Center for Systems Biology, Institute for Advanced Study, Princeton, NJ 08540, U.S.A
28
Affiliation(s)
- Chen-Hsiang Yeang
- Simons Center for Systems Biology, Institute for Advanced Study, Princeton, New Jersey, USA
- Frank McCormick
- Helen Diller Family Comprehensive Cancer Center and Cancer Research Institute, University of California, San Francisco, California, USA
- Arnold Levine
- Simons Center for Systems Biology, Institute for Advanced Study, Princeton, New Jersey, USA
29
Abstract
Correlated changes of nucleic or amino acids have provided strong information about the structures and interactions of molecules. Despite the rich literature on coevolutionary sequence analysis, previous methods often have to trade off between generality, simplicity, phylogenetic information, and specific knowledge about interactions. Furthermore, despite the evidence of coevolution in selected protein families, a comprehensive screening of coevolution among all protein domains is still lacking. We propose an augmented continuous-time Markov process model for sequence coevolution. The model can handle different types of interactions, incorporates phylogenetic information and sequence substitution, has only one extra free parameter, and requires no knowledge about interaction rules. We apply this model in large-scale screenings of the entire protein domain database (Pfam). Strikingly, across 0.1 trillion executed tests, the majority of the inferred coevolving protein domains are functionally related, and the coevolving amino acid residues are spatially coupled. Moreover, many of the coevolving positions are located at functionally important sites of proteins/protein complexes, such as the subunit linkers of superoxide dismutase, the tRNA binding sites of ribosomes, the DNA binding region of RNA polymerase, and the active and ligand binding sites of various enzymes. The results suggest that sequence coevolution manifests structural and functional constraints of proteins. The intricate relations between sequence coevolution and various selective constraints are worth pursuing at a deeper level.
The sequences of different components within and across genes often undergo coordinated changes in order to maintain the structures or functions of the genes. Identifying these coordinated changes (the “coevolution” of those components) in an evolutionary context is important for predicting the structures, interactions, and functions of genes. The authors carry out a large-scale screening of all known protein sequences and build a compendium of the coevolving relations of all protein domains (subunits of proteins). The majority of the coevolving protein domains either belong to the same proteins, appear in the same protein complexes, or share the same functional annotations. Furthermore, coevolving positions in the same proteins or protein complexes are spatially coupled, as they tend to be closer than random positions in the 3-D structures of the proteins/protein complexes. More strikingly, many coevolving positions are located at functionally important sites of the molecules. The results provide useful insights into the relations between sequence evolution and protein structures and functions.
Affiliation(s)
- Chen-Hsiang Yeang
- Simons Center for Systems Biology, Institute for Advanced Study, Princeton, New Jersey, United States of America.
30
Abstract
A probabilistic graphical model is proposed in order to detect the coevolution between different sites in biological sequences. The model extends the continuous-time Markov process of sequence substitution for single nucleic or amino acids and imposes general constraints regarding simultaneous changes on the substitution rate matrix. Given a multiple sequence alignment for each molecule of interest and a phylogenetic tree, the model can predict potential interactions within or between nucleic acids and proteins. Initial validation of the model is carried out using tRNA and 16S rRNA sequence data. The model accurately identifies the secondary interactions of tRNA as well as several known tertiary interactions. In addition, results on 16S rRNA data indicate this general and simple coevolutionary model outperforms several other parametric and nonparametric methods in predicting secondary interactions. Furthermore, the majority of the putative predictions exhibit either direct contact or proximity of the nucleotide pairs in the 3-dimensional structure of the Thermus thermophilus ribosomal small subunit. The results on RNA data suggest a general model of coevolution might be applied to other types of interactions between protein, DNA, and RNA molecules.
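The idea of extending a substitution rate matrix with simultaneous changes can be illustrated on the smallest case. This is a minimal sketch, assuming two binary sites (e.g., purine/pyrimidine), a hypothetical single-site rate `r`, and one hypothetical extra rate `c` for coordinated double changes; it is not the paper's exact parameterization. Transition probabilities come from the matrix exponential of the rate matrix, computed here by scaling and squaring so the example needs only the standard library.

```python
import itertools

# Joint states of two binary sites, e.g. purine (0) / pyrimidine (1).
STATES = list(itertools.product((0, 1), repeat=2))


def rate_matrix(r: float, c: float) -> list:
    """Rate matrix over joint states.

    r: rate of a substitution at a single site (other site unchanged).
    c: rate of a simultaneous change at both sites; c = 0 gives two
       independent sites. Both names are illustrative assumptions.
    """
    n = len(STATES)
    q = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(STATES):
        for j, b in enumerate(STATES):
            changed = sum(x != y for x, y in zip(a, b))
            if changed == 1:
                q[i][j] = r
            elif changed == 2:
                q[i][j] = c
        q[i][i] = -sum(q[i])  # each row of a rate matrix sums to zero
    return q


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(q, t, terms=20, squarings=8):
    """exp(Q t) by scaling and squaring with a truncated Taylor series."""
    n = len(q)
    scale = t / 2 ** squarings
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, a)]  # A^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


i00, i11 = STATES.index((0, 0)), STATES.index((1, 1))
p_indep = expm(rate_matrix(r=0.3, c=0.0), t=1.0)[i00][i11]
p_coevo = expm(rate_matrix(r=0.3, c=0.5), t=1.0)[i00][i11]
print(f"P(00 -> 11): independent {p_indep:.4f}, with coordinated changes {p_coevo:.4f}")
```

With c = 0 the joint process factorizes into two independent sites, so P(00 -> 11) is the product of two single-site change probabilities; comparing fits with c = 0 and c > 0 on a tree is one way such an extra parameter could be tested.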
Affiliation(s)
- Chen-Hsiang Yeang
- Simons Center for Systems Biology, Institute for Advanced Study, Princeton, New Jersey, USA.
31
Abstract
Background Gene regulation and metabolic reactions are two primary activities of life. Although many works have been dedicated to studying each system, the coupling between them is less well understood. To bridge this gap, we propose a joint model of gene regulation and metabolic reactions. Results We integrate regulatory and metabolic networks by adding links specifying the feedback control from the substrates of metabolic reactions to enzyme gene expressions. We adopt two alternative approaches to build those links: inferring the links between metabolites and transcription factors to fit the data, or explicitly encoding the general hypotheses of feedback control as links between metabolites and enzyme expressions. A perturbation dataset is explained by paths in the joint network if the predicted response along the paths is consistent with the observed response. The consistency requirement for explaining the perturbation data imposes constraints on the attributes in the network, such as the functions of links and the activities of paths. We build a probabilistic graphical model over the attributes to specify these constraints, and apply an inference algorithm to identify the attribute values that optimally explain the data. The inferred models allow us to 1) identify the feedback links between metabolites and regulators and their functions, 2) identify the active paths responsible for relaying perturbation effects, 3) computationally test the general hypotheses pertaining to the feedback control of enzyme expressions, and 4) evaluate the advantage of an integrated model over separate systems. Conclusion The modeling results provide insight into the mechanisms of the coupling between the two systems and possible "design rules" pertaining to enzyme gene regulation. The model can be used to investigate less well-probed systems and to generate consistent hypotheses and predictions for further validation.
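The consistency requirement (a path explains a perturbation if the response it predicts matches the observed one) can be illustrated with a minimal sign-propagation sketch. It assumes each link carries only a sign, +1 for activation and -1 for repression, and that the effect of perturbing a path's source is the product of the link signs. The paper's probabilistic graphical model and inference algorithm are not reproduced, and all names are hypothetical.

```python
from math import prod

ACTIVATE, REPRESS = +1, -1
KNOCKOUT, OVEREXPRESS = -1, +1


def predicted_response(link_signs, perturbation=KNOCKOUT):
    """Sign of the change at the end of a path after perturbing its source.

    link_signs: +1 (activating) or -1 (repressing) for each link on the path.
    The effect propagates multiplicatively: two repressions cancel out.
    """
    return perturbation * prod(link_signs)


def path_explains(link_signs, observed_sign, perturbation=KNOCKOUT):
    """A path explains an observation if its predicted sign matches."""
    return predicted_response(link_signs, perturbation) == observed_sign


# Hypothetical path: metabolite -| transcription factor -> enzyme gene.
path = [REPRESS, ACTIVATE]
# Depleting the metabolite relieves repression, so the enzyme gene goes up.
print(path_explains(path, observed_sign=+1))
```

In the full model the link signs and path activities are unknowns, so many candidate sign assignments are scored jointly against all perturbation data rather than checked one path at a time.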
Affiliation(s)
- Chen-Hsiang Yeang
- Center for Biomolecular Science & Engineering, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Martin Vingron
- Max-Planck Institute for Molecular Genetics, 73 Ihnestraße, Berlin, Germany
32
Abstract
A considerable fraction of gene promoters are bound by multiple transcription factors. It is therefore important to understand how such factors interact in regulating the genes. In this paper, we propose a computational method to identify groups of co-regulated genes and the corresponding regulatory programs of multiple transcription factors from protein-DNA binding and gene expression data. The key concept is to characterize a regulatory program in terms of two properties of individual transcription factors: the function of a regulator as an activator or a repressor, and its direction of effectiveness as necessary or sufficient. We apply a greedy algorithm to find the regulatory models that best explain the available data. Empirical analysis indicates that the inferred regulatory models agree with known combinatorial interactions between regulators and are robust against various parameter choices.
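The abstract characterizes each regulator by two properties but does not give the exact logic or search procedure. The sketch below is one plausible Boolean reading, stated as an assumption: a regulator is "permissive" when an activator is bound or a repressor is unbound; all necessary regulators must be permissive, and if any sufficient regulators exist, at least one must be permissive. A simple greedy search then changes one regulator's role at a time while the number of explained observations strictly increases. Like any greedy search, it can stop at a local optimum.

```python
from itertools import product

# Each regulator's role: (function, direction of effectiveness).
ROLES = list(product(("activator", "repressor"), ("necessary", "sufficient")))


def predict(roles, bound):
    """Predict whether the target is expressed under one Boolean reading.

    A regulator is 'permissive' when an activator is bound or a repressor is
    unbound. All necessary regulators must be permissive; if any sufficient
    regulators exist, at least one of them must be permissive.
    """
    necessary_ok = True
    sufficient = []
    for (function, direction), is_bound in zip(roles, bound):
        permissive = is_bound if function == "activator" else not is_bound
        if direction == "necessary":
            necessary_ok = necessary_ok and permissive
        else:
            sufficient.append(permissive)
    return necessary_ok and (any(sufficient) if sufficient else True)


def score(roles, data):
    """Number of (binding pattern, expressed?) observations the model explains."""
    return sum(predict(roles, bound) == expressed for bound, expressed in data)


def greedy_fit(n_regulators, data):
    """Change one regulator's role at a time while the fit strictly improves."""
    roles = [ROLES[0]] * n_regulators
    best = score(roles, data)
    improved = True
    while improved:
        improved = False
        for i in range(n_regulators):
            for role in ROLES:
                candidate = roles[:i] + [role] + roles[i + 1:]
                candidate_score = score(candidate, data)
                if candidate_score > best:
                    roles, best, improved = candidate, candidate_score, True
    return roles, best


# Toy data: the target is on only when regulator 1 is bound and regulator 2 is not.
data = [((b1, b2), b1 and not b2) for b1 in (False, True) for b2 in (False, True)]
roles, fit = greedy_fit(2, data)
print(roles, fit, "of", len(data))
```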
Affiliation(s)
- Chen-Hsiang Yeang
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, 95064, USA.
33
Yeang CH, Mak HC, McCuine S, Workman C, Jaakkola T, Ideker T. Validation and refinement of gene-regulatory pathways on a network of physical interactions. Genome Biol 2005; 6:R62. [PMID: 15998451 PMCID: PMC1175993 DOI: 10.1186/gb-2005-6-7-r62] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.6] [Received: 03/09/2005] [Revised: 05/03/2005] [Accepted: 06/03/2005] [Indexed: 01/10/2023]
Abstract
As genome-scale measurements lead to increasingly complex models of gene regulation, systematic approaches are needed to validate and refine these models. Towards this goal, we describe an automated procedure for prioritizing genetic perturbations in order to discriminate optimally between alternative models of a gene-regulatory network. Using this procedure, we evaluate 38 candidate regulatory networks in yeast and perform four high-priority gene knockout experiments. The refined networks support previously unknown regulatory mechanisms downstream of SOK2 and SWI4.
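The abstract does not state the criterion used to prioritize knockouts. A common choice, swapped in here as an assumption rather than the paper's method, is to favour the experiment whose predicted outcomes are split most evenly across the candidate models, measured by the Shannon entropy of those predictions under a uniform prior over models. The knockout names and predictions below are hypothetical.

```python
from collections import Counter
from math import log2


def prediction_entropy(predictions):
    """Shannon entropy (bits) of the outcomes predicted by candidate models.

    Assumes every candidate model is equally likely a priori.
    """
    n = len(predictions)
    return -sum((k / n) * log2(k / n) for k in Counter(predictions).values())


def rank_knockouts(model_predictions):
    """Order knockouts from most to least discriminating.

    model_predictions maps a knockout name to the list of outcomes
    (e.g. "up", "down", "none") predicted by each candidate model.
    """
    return sorted(model_predictions,
                  key=lambda ko: prediction_entropy(model_predictions[ko]),
                  reverse=True)


# Hypothetical predictions of four candidate networks for three knockouts.
predictions = {
    "ko_geneA": ["down", "down", "down", "down"],  # all models agree
    "ko_geneB": ["down", "up", "down", "up"],      # even split
    "ko_geneC": ["down", "down", "down", "up"],
}
print(rank_knockouts(predictions))
```

A knockout on which all candidate networks agree carries no information for choosing between them, which is why it ranks last under this criterion.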
Affiliation(s)
- Chen-Hsiang Yeang
- Center for Biomolecular Science and Engineering, Baskin School of Engineering, University of California at Santa Cruz, Santa Cruz, CA 95064, USA
- H Craig Mak
- Department of Bioengineering, University of California at San Diego, La Jolla, CA 92093, USA
- Scott McCuine
- Department of Bioengineering, University of California at San Diego, La Jolla, CA 92093, USA
- Christopher Workman
- Department of Bioengineering, University of California at San Diego, La Jolla, CA 92093, USA
- Tommi Jaakkola
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Trey Ideker
- Department of Bioengineering, University of California at San Diego, La Jolla, CA 92093, USA
34
Abstract
We develop a new framework for inferring models of transcriptional regulation. The models, which we call physical network models, are annotated molecular interaction graphs. The attributes in the model correspond to verifiable properties of the underlying biological system, such as the existence of protein-protein and protein-DNA interactions, the directionality of signal transduction in protein-protein interactions, and the signs of the immediate effects of these interactions. Possible configurations of these variables are constrained by the available data sources. Some of the data sources, such as factor-binding data, involve measurements that are directly tied to the variables in the model. Other sources, such as gene knock-outs, are functional in nature and provide only indirect evidence about the variables. We associate each observed knock-out effect in the deletion mutant data with a set of causal paths (molecular cascades) that could in principle explain the effect, resulting in aggregate constraints on the physical variables in the model. The most likely settings of all the variables, specifying the most likely graph annotations, are found by a recursive application of the max-product algorithm. By testing our approach on datasets related to the pheromone response pathway in S. cerevisiae, we demonstrate that the resulting model is consistent with previous studies of the pathway. Moreover, we successfully predict gene knock-out effects with a high degree of accuracy in a cross-validation setting. When applying this approach genome-wide, we extract submodels consistent with previous studies. The approach can be readily extended to other data sources or to facilitate automated experimental design.
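The max-product algorithm named in the abstract finds the most likely joint setting of the variables in a graphical model. As a minimal sketch, here it is on the simplest case, a chain of discrete variables, working with log-potentials so that max-product becomes max-sum. The paper applies the algorithm recursively on graphs built from causal paths, which this sketch does not reproduce, and the potentials in the usage example are made-up numbers.

```python
def max_product_chain(unary, pairwise):
    """Most likely joint assignment of a chain of discrete variables.

    Works with log-potentials, so max-product becomes max-sum.
    unary[i][s]         log-potential of variable i in state s
    pairwise[i][s][t]   log-potential of variable i in state s and
                        variable i + 1 in state t
    Returns (assignment, best total log-potential).
    """
    scores = list(unary[0])  # best prefix score ending in each state
    backpointers = []
    for i in range(1, len(unary)):
        new_scores, back = [], []
        for t in range(len(unary[i])):
            # Best predecessor state s for variable i being in state t.
            s_best = max(range(len(scores)),
                         key=lambda s: scores[s] + pairwise[i - 1][s][t])
            new_scores.append(scores[s_best] + pairwise[i - 1][s_best][t] + unary[i][t])
            back.append(s_best)
        scores = new_scores
        backpointers.append(back)
    # Trace back from the best state of the last variable.
    state = max(range(len(scores)), key=lambda s: scores[s])
    best = scores[state]
    assignment = [state]
    for back in reversed(backpointers):
        state = back[state]
        assignment.append(state)
    assignment.reverse()
    return assignment, best


# Made-up log-potentials for three binary variables.
unary = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
pairwise = [[[1.0, -1.0], [-1.0, 1.0]],  # favours equal neighbouring states
            [[0.3, 0.0], [0.0, 0.3]]]
print(max_product_chain(unary, pairwise))
```

On a chain (or any tree) this exact dynamic program returns the global optimum; on graphs with cycles, max-product is typically run as an approximate message-passing scheme instead.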
Affiliation(s)
- Chen-Hsiang Yeang
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
35
Yeang CH, Ramaswamy S, Tamayo P, Mukherjee S, Rifkin RM, Angelo M, Reich M, Lander E, Mesirov J, Golub T. Molecular classification of multiple tumor types. Bioinformatics 2002; 17 Suppl 1:S316-22. [PMID: 11473023 DOI: 10.1093/bioinformatics/17.suppl_1.s316] [Citation(s) in RCA: 156] [Impact Index Per Article: 7.1] [Indexed: 11/13/2022]
Abstract
Using gene expression data to classify tumor types is a very promising tool in cancer diagnosis. Previous work has shown that several pairs of tumor types can be successfully distinguished by their gene expression patterns (Golub et al. 1999, Ben-Dor et al. 2000, Alizadeh et al. 2000). However, simultaneous classification across a heterogeneous set of tumor types has not yet been well studied. We obtained 190 samples from 14 tumor classes and generated a combined expression dataset containing 16,063 genes for each of those samples. We performed multi-class classification by combining the outputs of binary classifiers. Three binary classifiers (k-nearest neighbors, weighted voting, and support vector machines) were applied in conjunction with three combination scenarios (one-vs-all, all-pairs, hierarchical partitioning). We achieved the best cross-validation error rate of 18.75% and the best test error rate of 21.74% by using the one-vs-all support vector machine algorithm. The results demonstrate the feasibility of performing clinically useful classification of samples from multiple tumor types.
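Two of the combination scenarios can be sketched independently of the underlying binary classifiers. In this minimal sketch the per-class confidence scores and the pairwise decision function are supplied as inputs (the labels and numbers in the tests are made up); training the k-nearest neighbor, weighted voting, or SVM classifiers that would produce them is not shown, nor is hierarchical partitioning.

```python
def one_vs_all_predict(scores):
    """scores maps each class label to the real-valued output of its
    'class vs. rest' binary classifier; the most confident class wins."""
    return max(scores, key=scores.get)


def all_pairs_predict(labels, pairwise_winner):
    """pairwise_winner(a, b) returns the label chosen by the a-vs-b
    classifier; the class winning the most pairwise contests is predicted
    (ties go to the label listed first)."""
    wins = {label: 0 for label in labels}
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            wins[pairwise_winner(a, b)] += 1
    return max(labels, key=wins.get)


n_classes = 14
print(n_classes, "one-vs-all classifiers;",
      n_classes * (n_classes - 1) // 2, "all-pairs classifiers")
```

With 14 tumor classes, one-vs-all needs 14 binary classifiers while all-pairs needs 91, one practical difference between the two scenarios.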
Affiliation(s)
- C H Yeang
- Center for Genome Research, MIT Whitehead Institute, One Kendall Square, Cambridge, MA 02139, USA.
36
Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH, Angelo M, Ladd C, Reich M, Latulippe E, Mesirov JP, Poggio T, Gerald W, Loda M, Lander ES, Golub TR. Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci U S A 2001; 98:15149-54. [PMID: 11742071 PMCID: PMC64998 DOI: 10.1073/pnas.211566398] [Citation(s) in RCA: 1085] [Impact Index Per Article: 47.2] [Indexed: 12/21/2022]
Abstract
The optimal treatment of patients with cancer depends on establishing accurate diagnoses by using a complex combination of clinical and histopathological data. In some instances, this task is difficult or impossible because of atypical clinical presentation or histopathology. To determine whether the diagnosis of multiple common adult malignancies could be achieved purely by molecular classification, we subjected 218 tumor samples, spanning 14 common tumor types, and 90 normal tissue samples to oligonucleotide microarray gene expression analysis. The expression levels of 16,063 genes and expressed sequence tags were used to evaluate the accuracy of a multiclass classifier based on a support vector machine algorithm. Overall classification accuracy was 78%, far exceeding the accuracy of random classification (9%). Poorly differentiated cancers resulted in low-confidence predictions and could not be accurately classified according to their tissue of origin, indicating that they are molecularly distinct entities with dramatically different gene expression patterns compared with their well differentiated counterparts. Taken together, these results demonstrate the feasibility of accurate, multiclass molecular cancer classification and suggest a strategy for future clinical implementation of molecular cancer diagnostics.
Affiliation(s)
- S Ramaswamy
- Whitehead Institute/Massachusetts Institute of Technology Center for Genome Research, Cambridge, MA 02138, USA