1
|
Liu Z, Huang YF. Deep multiple-instance learning accurately predicts gene haploinsufficiency and deletion pathogenicity. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.29.555384. [PMID: 37693607 PMCID: PMC10491176 DOI: 10.1101/2023.08.29.555384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
Copy number losses (deletions) are a major contributor to the etiology of severe genetic disorders. Although haploinsufficient genes play a critical role in deletion pathogenicity, current methods for deletion pathogenicity prediction fail to integrate multiple lines of evidence for haploinsufficiency at the gene level, limiting their power to pinpoint deleterious deletions associated with genetic disorders. Here we introduce DosaCNV, a deep multiple-instance learning framework that, for the first time, models deletion pathogenicity jointly with gene haploinsufficiency. By integrating over 30 gene-level features potentially predictive of haploinsufficiency, DosaCNV shows unmatched performance in prioritizing pathogenic deletions associated with a broad spectrum of genetic disorders. Furthermore, DosaCNV outperforms existing methods in predicting gene haploinsufficiency even though it is not trained on known haploinsufficient genes. Finally, DosaCNV leverages a state-of-the-art technique to quantify the contributions of individual gene-level features to haploinsufficiency, allowing for human-understandable explanations of model predictions. Altogether, DosaCNV is a powerful computational tool for both fundamental and translational research.
Affiliation(s)
- Zhihan Liu
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
- Molecular, Cellular, and Integrative Biosciences Program, Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
| | - Yi-Fei Huang
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
|
2
|
Liu Y, Yang C, Li HD, Wang J. IsoFrog: a reversible jump Markov Chain Monte Carlo feature selection-based method for predicting isoform functions. Bioinformatics 2023; 39:btad530. [PMID: 37647643 PMCID: PMC10491952 DOI: 10.1093/bioinformatics/btad530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 07/21/2023] [Accepted: 08/29/2023] [Indexed: 09/01/2023]
Abstract
MOTIVATION: A single gene may yield several isoforms with different functions through alternative splicing. Continuous efforts are devoted to developing machine-learning methods to predict isoform functions. However, existing methods do not consider the relevance of each feature to specific functions and ignore the noise caused by the irrelevant features. In this case, we hypothesize that constructing a feature selection framework to extract the function-relevant features might help improve the model accuracy in isoform function prediction. RESULTS: In this article, we present a feature selection-based approach named IsoFrog to predict isoform functions. First, IsoFrog adopts a reversible jump Markov Chain Monte Carlo (RJMCMC)-based feature selection framework to assess the feature importance to gene functions. Second, a sequential feature selection procedure is applied to select a subset of function-relevant features. This strategy screens the relevant features for the specific function while eliminating irrelevant ones, improving the effectiveness of the input features. Then, the selected features are input into our proposed method modified domain-invariant partial least squares, which prioritizes the most likely positive isoform for each positive MIG and utilizes diPLS for isoform function prediction. Tested on three datasets, our method achieves superior performance over six state-of-the-art methods, and the RJMCMC-based feature selection framework outperforms three classic feature selection methods. We expect this proposed methodology will promote the identification of isoform functions and further inspire the development of new methods. AVAILABILITY AND IMPLEMENTATION: IsoFrog is freely available at https://github.com/genemine/IsoFrog.
Affiliation(s)
- Yiwei Liu
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
| | - Changhuo Yang
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
| | - Hong-Dong Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
|
3
|
Munim Twaij B, Jameel Ibraheem L, Al-Shammari RHH, Hasan M, Akter Khoko R, Sunzid Ahomed M, Prodhan SH, Nazmul Hasan M. Identification and characterization of aldehyde dehydrogenase (ALDH) gene superfamily in garlic and expression profiling in response to drought, salinity, and ABA. Gene 2023; 860:147215. [PMID: 36709878 DOI: 10.1016/j.gene.2023.147215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Revised: 12/31/2022] [Accepted: 01/17/2023] [Indexed: 01/27/2023]
Abstract
In response to biotic and abiotic stressors, aldehydes are detoxified and converted to carboxylic acids by aldehyde dehydrogenases (ALDHs), which are enzymes that use NAD+/NADP+ as cofactors. Garlic (Allium sativum L.) has not yet undergone a systematic examination of the ALDH superfamily, despite the genome sequence having been made public. In this investigation, we identified, characterized, and profiled the expression of the garlic ALDH gene family over the entire genome. The ALDH Gene Nomenclature Committee (AGNC) classification was used to classify and name the 34 ALDH genes that were discovered. Except for chromosome 8, all AsALDH genes were dispersed across the chromosomes. AsALDH genes have various localizations, according to predictions about subcellular localization. The AsALDH proteins are more varied and closely related to rice than to Arabidopsis, according to a study of conserved motifs and phylogenetic relationships. The presence of stress modulation pathways is indicated by the abundance of stress-related cis-elements in the AsALDH genes' promoter regions. Analysis of the RNA-seq data showed that AsALDHs were expressed differently in various tissues and at various developmental stages. Nine AsALDHs were chosen for study using RT-qPCR, and the results revealed that the majority of the genes were upregulated in response to ABA and downregulated in response to salinity and drought. The results of this study improved our knowledge of the traits, evolutionary background, and biological functions of AsALDH genes in growth and development.
Affiliation(s)
- Baan Munim Twaij
- Department of Biology, College of Science, Mustansiriyah University, Baghdad, Iraq.
| | - Mahmudul Hasan
- Department of Genetic Engineering and Biotechnology, Shahjalal University of Science and Technology, Sylhet 3114, Bangladesh.
| | - Roksana Akter Khoko
- Department of Genetic Engineering and Biotechnology, Shahjalal University of Science and Technology, Sylhet 3114, Bangladesh.
| | - Md Sunzid Ahomed
- Department of Genetic Engineering and Biotechnology, Shahjalal University of Science and Technology, Sylhet 3114, Bangladesh.
| | - Shamsul H Prodhan
- Department of Genetic Engineering and Biotechnology, Shahjalal University of Science and Technology, Sylhet 3114, Bangladesh.
| | - Md Nazmul Hasan
- Department of Genetic Engineering and Biotechnology, Shahjalal University of Science and Technology, Sylhet 3114, Bangladesh.
|
4
|
Zhang X, Smith DR. An overview of online resources for intra-species detection of gene duplications. Front Genet 2022; 13:1012788. [PMCID: PMC9606816 DOI: 10.3389/fgene.2022.1012788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Accepted: 09/20/2022] [Indexed: 11/13/2022]
Abstract
Gene duplication is an important evolutionary mechanism that can act as a new source of genetic material in genome evolution. However, detecting duplicate genes from genomic data can be challenging. Various bioinformatics resources have been developed to identify duplicate genes from single and/or multiple species. Here, we summarize the metrics used to measure sequence identity among gene duplicates within species, compare several computational approaches that have been used to predict gene duplicates, and review recent advancements of a Basic Local Alignment Search Tool (BLAST)-based web tool and database, allowing future researchers to easily identify intra-species gene duplications. This article is a quick reference guide for research tools used for detecting gene duplicates.
Affiliation(s)
- Xi Zhang
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, Canada
- Institute for Comparative Genomics, Dalhousie University, Halifax, NS, Canada
- *Correspondence: Xi Zhang; David Roy Smith
| | - David Roy Smith
- Department of Biology, Western University, London, ON, Canada
- *Correspondence: Xi Zhang; David Roy Smith
|
5
|
Zhang X, Hu Y, Smith DR. HSDatabase-a database of highly similar duplicate genes from plants, animals, and algae. Database (Oxford) 2022; 2022:6754190. [PMID: 36208223 PMCID: PMC9547538 DOI: 10.1093/database/baac086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/05/2022] [Revised: 08/16/2022] [Accepted: 09/20/2022] [Indexed: 11/30/2022]
Abstract
Gene duplication is an important evolutionary mechanism capable of providing new genetic material, which in some instances can help organisms adapt to various environmental conditions. Recent studies, for example, have indicated that highly similar duplicate genes (HSDs) are aiding adaptation to extreme conditions via gene dosage. However, for most eukaryotic genomes HSDs remain uncharacterized, partly because they can be hard to identify and categorize efficiently and effectively. Here, we collected and curated HSDs in nuclear genomes from various model animals, land plants and algae and indexed them in an online, open-access sequence repository called HSDatabase. Currently, this database contains 117 864 curated HSDs from 40 distinct genomes; it includes statistics on the total number of HSDs per genome as well as individual HSD copy numbers/lengths and provides sequence alignments of the duplicate gene copies. HSDatabase also allows users to download sequences of gene copies, access genome browsers, and link out to other databases, such as Pfam and Kyoto Encyclopedia of Genes and Genomes. What is more, a built-in Basic Local Alignment Search Tool option is available to conveniently explore potential homologous sequences of interest within and across species. HSDatabase has a user-friendly interface and provides easy access to the source data. It can be used on its own for comparative analyses of gene duplicates or in conjunction with HSDFinder, a newly developed bioinformatics tool for identifying, annotating, categorizing and visualizing HSDs. Database URL: http://hsdfinder.com/database/
Affiliation(s)
- Xi Zhang
- Institute for Comparative Genomics, Dalhousie University, Halifax, Nova Scotia B3H 4R2, Canada.,Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia B3H 4R2, Canada
| | - Yining Hu
- Department of Computer Science, University of Western Ontario, London, Ontario N6A 3K7, Canada
| | - David Roy Smith
- Department of Biology, University of Western Ontario, London, Ontario N6A 3K7, Canada
|
6
|
Tatman PD, Black JC. Extrachromosomal Circular DNA from TCGA Tumors Is Generated from Common Genomic Loci, Is Characterized by Self-Homology and DNA Motifs near Circle Breakpoints. Cancers (Basel) 2022; 14:cancers14092310. [PMID: 35565439 PMCID: PMC9101409 DOI: 10.3390/cancers14092310] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/12/2022] [Revised: 04/27/2022] [Accepted: 04/29/2022] [Indexed: 02/06/2023]
Abstract
Extrachromosomal circular DNA has emerged as a frequent genomic alteration in tumors. High numbers of circular DNAs correspond to poor prognosis suggesting an important function in tumor biology. However, despite mounting evidence supporting the importance of circular DNA, little is known about their production, maintenance, or selection. To provide insight into these processes, we analyzed circular DNA elements computationally identified in 355 TCGA tumors spanning 22 tumor types. Circular DNAs originated from common genomic loci irrespective of cancer type. Genes found in circularized genomic regions were more likely to be expressed and were enriched in cancer-related pathways. Finally, in support of a model for circle generation through either a homology or microhomology-mediated process, circles exhibit homology near their breakpoint. These breakpoints are also enriched in specific DNA motifs. Our analysis supports a model where gene-containing circles emerge from common, highly transcribed regions through a homology-mediated process.
|
7
|
Bartenschlager F, Klymiuk N, Weise C, Kuropka B, Gruber AD, Mundhenk L. Evolutionarily conserved properties of CLCA proteins 1, 3 and 4, as revealed by phylogenetic and biochemical studies in avian homologues. PLoS One 2022; 17:e0266937. [PMID: 35417490 PMCID: PMC9007345 DOI: 10.1371/journal.pone.0266937] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/20/2021] [Accepted: 03/30/2022] [Indexed: 12/21/2022]
Abstract
Species-specific diversities are particular features of mammalian chloride channel regulator, calcium activated (CLCA) genes. In contrast to four complex gene clusters in mammals, only two CLCA genes appear to exist in chickens. CLCA2 is conserved in both, while only the galline CLCA1 (gCLCA1) displays close genetic distance to mammalian clusters 1, 3 and 4. In this study, sequence analyses and biochemical characterizations revealed that gCLCA1 as a putative avian prototype shares common protein domains and processing features with all mammalian CLCA homologues. It has a transmembrane (TM) domain in the carboxy terminal region and its mRNA and protein were detected in the alimentary canal, where the protein was localized in the apical membrane of enterocytes, similar to CLCA4. Both mammals and birds seem to have at least one TM domain containing CLCA protein with complex glycosylation in the apical membrane of enterocytes. However, some characteristic features of mammalian CLCA1 and 3 including entire protein secretion and expression in cell types other than enterocytes seem to be dispensable for chicken. Phylogenetic analyses including twelve bird species revealed that avian CLCA1 and mammalian CLCA3 form clades separate from a major branch containing mammalian CLCA1 and 4. Overall, our data suggest that gCLCA1 and mammalian CLCA clusters 1, 3 and 4 stem from a common ancestor which underwent complex gene diversification in mammals but not in birds.
Affiliation(s)
- Florian Bartenschlager
- Faculty of Veterinary Medicine, Department of Veterinary Pathology, Freie Universität Berlin, Berlin, Germany
| | - Nikolai Klymiuk
- Large Animal Models in Cardiovascular Research, Internal Medical Department I, Technical University of Munich, Munich, Germany
- Center for Innovative Medical Models, Ludwig-Maximilians University Munich, Munich, Germany
| | - Christoph Weise
- Institute of Chemistry and Biochemistry, Core Facility BioSupraMol, Freie Universität Berlin, Berlin, Germany
| | - Benno Kuropka
- Institute of Chemistry and Biochemistry, Core Facility BioSupraMol, Freie Universität Berlin, Berlin, Germany
| | - Achim D. Gruber
- Faculty of Veterinary Medicine, Department of Veterinary Pathology, Freie Universität Berlin, Berlin, Germany
| | - Lars Mundhenk
- Faculty of Veterinary Medicine, Department of Veterinary Pathology, Freie Universität Berlin, Berlin, Germany
|
8
|
Zhang X, Hu Y, Smith DR. HSDFinder: A BLAST-Based Strategy for Identifying Highly Similar Duplicated Genes in Eukaryotic Genomes. FRONTIERS IN BIOINFORMATICS 2021; 1:803176. [PMID: 36303740 PMCID: PMC9580922 DOI: 10.3389/fbinf.2021.803176] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/27/2021] [Accepted: 11/25/2021] [Indexed: 01/01/2023]
Abstract
Gene duplication is an important evolutionary mechanism capable of providing new genetic material for adaptive and nonadaptive evolution. However, bioinformatics tools for identifying duplicate genes are often limited to the detection of paralogs in multiple species or to specific types of gene duplicates, such as retrocopies. Here, we present a user-friendly, BLAST-based web tool, called HSDFinder, which can identify, annotate, categorize, and visualize highly similar duplicate genes (HSDs) in eukaryotic nuclear genomes. HSDFinder includes an online heatmap plotting option, allowing users to compare HSDs among different species and visualize the results in different Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway functional categories. The external software requirements are BLAST, InterProScan, and KEGG. The utility of HSDFinder was tested on various model eukaryotic species, including Chlamydomonas reinhardtii, Arabidopsis thaliana, Oryza sativa, and Zea mays as well as the psychrophilic green alga Chlamydomonas sp. UWO241, and was proven to be a practical and accurate tool for gene duplication analyses. The web tool is free to use at http://hsdfinder.com. Documentation and tutorials can be found via the GitHub: https://github.com/zx0223winner/HSDFinder.
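As a rough illustration of what a BLAST-based search for highly similar duplicates can look like, the sketch below filters an all-versus-all BLAST result in the standard 12-column tabular format (`-outfmt 6`). It is not HSDFinder's code: the function names and the identity and alignment-length thresholds are hypothetical values chosen for the example, and the abstract above does not state the cutoffs the tool actually applies.

```python
from typing import Iterable, Iterator, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float  # percent identity (column 3 of BLAST tabular output)
    aln_len: int     # alignment length (column 4 of BLAST tabular output)


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse BLAST tabular lines: qseqid sseqid pident length ..."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        yield Hit(fields[0], fields[1], float(fields[2]), int(fields[3]))


def similar_pairs(
    hits: Iterable[Hit],
    min_identity: float = 90.0,  # illustrative threshold, an assumption
    min_aln_len: int = 100,      # illustrative threshold, an assumption
) -> Set[Tuple[str, str]]:
    """Return unordered gene pairs whose hits pass both thresholds."""
    pairs: Set[Tuple[str, str]] = set()
    for hit in hits:
        if hit.query == hit.subject:
            continue  # skip self-hits from the all-versus-all search
        if hit.identity >= min_identity and hit.aln_len >= min_aln_len:
            a, b = sorted((hit.query, hit.subject))
            pairs.add((a, b))
    return pairs


# Made-up example rows in BLAST tabular format.
example = [
    "geneA\tgeneA\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t600",
    "geneA\tgeneB\t95.2\t290\t14\t0\t1\t290\t1\t290\t1e-150\t520",
    "geneB\tgeneA\t95.2\t290\t14\t0\t1\t290\t1\t290\t1e-150\t520",
    "geneA\tgeneC\t62.0\t280\t106\t2\t5\t284\t3\t282\t1e-40\t150",
]
candidate_pairs = similar_pairs(parse_outfmt6(example))
print(candidate_pairs)
```

Sorting each pair before storing it means reciprocal hits (A to B and B to A) collapse into a single candidate. A fuller pipeline would also compare sequence lengths and annotate the pairs, which this sketch leaves out.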
Affiliation(s)
- Xi Zhang
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, Canada
- Institute for Comparative Genomics, Dalhousie University, Halifax, NS, Canada
- *Correspondence: Xi Zhang; David Roy Smith
| | - Yining Hu
- Department of Computer Science, Western University, London, ON, Canada
| | - David Roy Smith
- Department of Biology, Western University, London, ON, Canada
- *Correspondence: Xi Zhang; David Roy Smith
|
9
|
Khan AH, Smith DJ. Cost-Effective Mapping of Genetic Interactions in Mammalian Cells. Front Genet 2021; 12:703738. [PMID: 34434222 PMCID: PMC8381747 DOI: 10.3389/fgene.2021.703738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2021] [Accepted: 07/13/2021] [Indexed: 11/23/2022]
Abstract
Comprehensive maps of genetic interactions in mammalian cells are daunting to construct because of the large number of potential interactions, ~2 × 10^8 for protein coding genes. We previously used co-inheritance of distant genes from published radiation hybrid (RH) datasets to identify genetic interactions. However, it was necessary to combine six legacy datasets from four species to obtain adequate statistical power. Mapping resolution was also limited by the low density PCR genotyping. Here, we employ shallow sequencing of nascent human RH clones as an economical approach to constructing interaction maps. In this initial study, 15 clones were analyzed, enabling construction of a network with 225 genes and 2,359 interactions (FDR < 0.05). Despite its small size, the network showed significant overlap with the previous RH network and with a protein-protein interaction network. Consumables were ≲$50 per clone, showing that affordable, high quality genetic interaction maps are feasible in mammalian cells.
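The number of potential pairwise interactions quoted in the abstract above follows from counting unordered gene pairs, n(n - 1)/2. A minimal sketch of that arithmetic is below; the gene count of 20,000 is an assumed round figure for human protein-coding genes, not a number given in the abstract.

```python
from math import comb


def pairwise_interactions(n_genes: int) -> int:
    """Number of unordered gene pairs: n * (n - 1) / 2."""
    return comb(n_genes, 2)


# Assumption: roughly 20,000 protein-coding genes (round figure, not from the abstract).
n_protein_coding = 20_000
total_pairs = pairwise_interactions(n_protein_coding)
print(f"{total_pairs:,} possible pairs")

# For scale: all possible pairs among the 225 genes in the reported network.
network_pairs = pairwise_interactions(225)
print(f"{network_pairs:,} possible pairs among 225 genes")
```

With 20,000 genes this gives just under 2 × 10^8 pairs, in line with the order of magnitude the abstract cites.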
Affiliation(s)
- Arshad H Khan
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States
| | - Desmond J Smith
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States
|
10
|
Ribeiro DM, Rubinacci S, Ramisch A, Hofmeister RJ, Dermitzakis ET, Delaneau O. The molecular basis, genetic control and pleiotropic effects of local gene co-expression. Nat Commun 2021; 12:4842. [PMID: 34376650 PMCID: PMC8355184 DOI: 10.1038/s41467-021-25129-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 01/29/2021] [Accepted: 07/23/2021] [Indexed: 01/01/2023]
Abstract
Nearby genes are often expressed as a group. Yet, the prevalence, molecular mechanisms and genetic control of local gene co-expression are far from being understood. Here, by leveraging gene expression measurements across 49 human tissues and hundreds of individuals, we find that local gene co-expression occurs in 13% to 53% of genes per tissue. By integrating various molecular assays (e.g. ChIP-seq and Hi-C), we estimate the ability of several mechanisms, such as enhancer-gene interactions, in distinguishing gene pairs that are co-expressed from those that are not. Notably, we identify 32,636 expression quantitative trait loci (eQTLs) which associate with co-expressed gene pairs and often overlap enhancer regions. Due to affecting several genes, these eQTLs are more often associated with multiple human traits than other eQTLs. Our study paves the way to comprehend trait pleiotropy and functional interpretation of QTL and GWAS findings. All local gene co-expression identified here is available through a public database ( https://glcoex.unil.ch/ ).
Affiliation(s)
- Diogo M Ribeiro
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Simone Rubinacci
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Anna Ramisch
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland
- Institute of Genetics and Genomics in Geneva, University of Geneva, Geneva, Switzerland
| | - Robin J Hofmeister
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Emmanouil T Dermitzakis
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland
- Institute of Genetics and Genomics in Geneva, University of Geneva, Geneva, Switzerland
| | - Olivier Delaneau
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland.
|
11
|
van Leeuwen J, Pons C, Tan G, Wang JZ, Hou J, Weile J, Gebbia M, Liang W, Shuteriqi E, Li Z, Lopes M, Ušaj M, Dos Santos Lopes A, van Lieshout N, Myers CL, Roth FP, Aloy P, Andrews BJ, Boone C. Systematic analysis of bypass suppression of essential genes. Mol Syst Biol 2021; 16:e9828. [PMID: 32939983 PMCID: PMC7507402 DOI: 10.15252/msb.20209828] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 07/01/2020] [Revised: 08/11/2020] [Accepted: 08/13/2020] [Indexed: 12/15/2022]
Abstract
Essential genes tend to be highly conserved across eukaryotes, but, in some cases, their critical roles can be bypassed through genetic rewiring. From a systematic analysis of 728 different essential yeast genes, we discovered that 124 (17%) were dispensable essential genes. Through whole-genome sequencing and detailed genetic analysis, we investigated the genetic interactions and genome alterations underlying bypass suppression. Dispensable essential genes often had paralogs, were enriched for genes encoding membrane-associated proteins, and were depleted for members of protein complexes. Functionally related genes frequently drove the bypass suppression interactions. These gene properties were predictive of essential gene dispensability and of specific suppressors among hundreds of genes on aneuploid chromosomes. Our findings identify yeast's core essential gene set and reveal that the properties of dispensable essential genes are conserved from yeast to human cells, correlating with human genes that display cell line-specific essentiality in the Cancer Dependency Map (DepMap) project.
Affiliation(s)
- Jolanda van Leeuwen
- Center for Integrative Genomics, Bâtiment Génopode, University of Lausanne, Lausanne, Switzerland.,Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
| | - Carles Pons
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute for Science and Technology, Barcelona, Spain
| | - Guihong Tan
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
| | - Jason Zi Wang
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada.,Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Jing Hou
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
| | - Jochen Weile
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada.,Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.,Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
| | - Marinella Gebbia
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada.,Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
| | - Wendy Liang
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
| | - Ermira Shuteriqi
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
| | - Zhijian Li
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
| | - Maykel Lopes
- Center for Integrative Genomics, Bâtiment Génopode, University of Lausanne, Lausanne, Switzerland
| | - Matej Ušaj
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
| | - Andreia Dos Santos Lopes
- Center for Integrative Genomics, Bâtiment Génopode, University of Lausanne, Lausanne, Switzerland
| | - Natascha van Lieshout
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada.,Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
| | - Chad L Myers
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, Minneapolis, MN, USA
| | - Frederick P Roth
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada.,Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.,Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada.,Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Patrick Aloy
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute for Science and Technology, Barcelona, Spain.,Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| | - Brenda J Andrews
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada.,Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Charles Boone
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada.,Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
|
12
|
Jiang JY, Ju CJT, Hao J, Chen M, Wang W. JEDI: circular RNA prediction based on junction encoders and deep interaction among splice sites. Bioinformatics 2021; 37:i289-i298. [PMID: 34252942 PMCID: PMC8336595 DOI: 10.1093/bioinformatics/btab288] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/03/2022]
Abstract
Motivation: Circular RNA (circRNA) is a novel class of long non-coding RNAs that have been broadly discovered in the eukaryotic transcriptome. The circular structure arises from a non-canonical splicing process, where the donor site is backspliced to an upstream acceptor site. These circRNA sequences are conserved across species. More importantly, rising evidence suggests their vital roles in gene regulation and association with diseases. As the fundamental effort toward elucidating their functions and mechanisms, several computational methods have been proposed to predict the circular structure from the primary sequence. Recently, advanced computational methods leverage deep learning to capture the relevant patterns from RNA sequences and model their interactions to facilitate the prediction. However, these methods fail to fully explore positional information of splice junctions and their deep interaction. Results: We present a robust end-to-end framework, Junction Encoder with Deep Interaction (JEDI), for circRNA prediction using only nucleotide sequences. JEDI first leverages the attention mechanism to encode each junction site based on deep bidirectional recurrent neural networks and then presents the novel cross-attention layer to model deep interaction among these sites for backsplicing. Finally, JEDI can not only predict circRNAs but also interpret relationships among splice sites to discover backsplicing hotspots within a gene region. Experiments demonstrate JEDI significantly outperforms state-of-the-art approaches in circRNA prediction on both isoform level and gene level. Moreover, JEDI also shows promising results on zero-shot backsplicing discovery, which none of the existing approaches can achieve. Availability and implementation: The implementation of our framework is available at https://github.com/hallogameboy/JEDI. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jyun-Yu Jiang
- Department of Computer Science, University of California, Los Angeles, CA 90024, USA
| | - Chelsea J-T Ju
- Department of Computer Science, University of California, Los Angeles, CA 90024, USA
| | - Junheng Hao
- Department of Computer Science, University of California, Los Angeles, CA 90024, USA
| | - Muhao Chen
- Department of Computer Science, University of Southern California, Los Angeles, CA 90007, USA
| | - Wei Wang
- Department of Computer Science, University of California, Los Angeles, CA 90024, USA
| |
|
13
|
Li H, Dawood M, Khayat MM, Farek JR, Jhangiani SN, Khan ZM, Mitani T, Coban-Akdemir Z, Lupski JR, Venner E, Posey JE, Sabo A, Gibbs RA. Exome variant discrepancies due to reference-genome differences. Am J Hum Genet 2021; 108:1239-1250. [PMID: 34129815 PMCID: PMC8322936 DOI: 10.1016/j.ajhg.2021.05.011] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 01/29/2021] [Accepted: 05/19/2021] [Indexed: 12/15/2022] Open
Abstract
Despite release of the GRCh38 human reference genome more than seven years ago, GRCh37 remains more widely used by most research and clinical laboratories. To date, no study has quantified the impact of utilizing different reference assemblies for the identification of variants associated with rare and common diseases from large-scale exome-sequencing data. By calling variants on both the GRCh37 and GRCh38 references, we identified single-nucleotide variants (SNVs) and insertion-deletions (indels) in 1,572 exomes from participants with Mendelian diseases and their family members. We found that a total of 1.5% of SNVs and 2.0% of indels were discordant when different references were used. Notably, 76.6% of the discordant variants were clustered within discrete discordant reference patches (DISCREPs) comprising only 0.9% of loci targeted by exome sequencing. These DISCREPs were enriched for genomic elements including segmental duplications, fix patch sequences, and loci known to contain alternate haplotypes. We identified 206 genes significantly enriched for discordant variants, most of which were in DISCREPs and caused by multi-mapped reads on the reference assembly that lacked the variant call. Among these 206 genes, eight are implicated in known Mendelian diseases and 53 are associated with common phenotypes from genome-wide association studies. In addition, variant interpretations could also be influenced by the reference after lifting-over variant loci to another assembly. Overall, we identified genes and genomic loci affected by reference assembly choice, including genes associated with Mendelian disorders and complex human diseases that require careful evaluation in both research and clinical applications.
Affiliation(s)
- He Li
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Moez Dawood
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA
| | - Michael M Khayat
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Jesse R Farek
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Shalini N Jhangiani
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Ziad M Khan
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Tadahiro Mitani
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Zeynep Coban-Akdemir
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - James R Lupski
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Department of Pediatrics, Texas Children's Hospital, Houston, TX 77030, USA
| | - Eric Venner
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Jennifer E Posey
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Aniko Sabo
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA.
| |
|
14
|
Nicewicz AW, Sawadro MK, Nicewicz Ł, Babczyńska AI. Juvenile hormone in spiders. Is this the solution to a mystery? Gen Comp Endocrinol 2021; 308:113781. [PMID: 33862048 DOI: 10.1016/j.ygcen.2021.113781] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/16/2020] [Revised: 02/25/2021] [Accepted: 04/09/2021] [Indexed: 11/17/2022]
Abstract
The juvenile hormone (JH) plays a crucial role in arthropod physiological processes, e.g., the regulation of metamorphosis, development, and reproduction (vitellogenesis, the development of gonads, egg production). Still, data about this sesquiterpenoid hormone in spiders (Araneae) are rudimentary and equivocal. The presence of the JH or its precursors (e.g., methyl farnesoate) has not been confirmed in spiders. Its site of synthesis is still undetermined. No JH receptors have been identified in spiders, and thus the molecular mechanism of action of this group of hormones is still unknown. Here, using phylogenetic analysis and qPCR, we show the presence of transcripts of the enzyme catalyzing the last phase of the JH biosynthesis pathway (epox CYP15A1), the JH receptor (Met), and a possible candidate for the methyl farnesoate receptor (USP) in various tissues and stages of ontogenesis in both sexes of the spider Parasteatoda tepidariorum. Our results indicate that the presence of the juvenile hormone and/or methyl farnesoate is possible in the spider P. tepidariorum. The presence of the Ptepox CYP15A1 gene suggests that the main site of juvenile hormone synthesis may be the integument and not the Schneider organ 2. It also seems that the juvenile hormone and/or methyl farnesoate may be biologically active hormones, given the presence of the transcript of the insect and crustacean JH/MG receptor, Met. Expression of Ptepox CYP15A1, PtMet, and Ptusp is sex-, tissue-, and time-specific. This study is the first report of the presence of the Ptepox CYP15A1 and PtMet transcripts in the Arachnida, which may indicate the presence of the juvenile hormone and/or methyl farnesoate in spiders.
Affiliation(s)
- Agata Wanda Nicewicz
- University of Silesia in Katowice, Faculty of Natural Sciences, Institute of Biology, Biotechnology and Environmental Protection, Bankowa 9, 40007 Katowice, Poland.
| | - Marta Katarzyna Sawadro
- University of Silesia in Katowice, Faculty of Natural Sciences, Institute of Biology, Biotechnology and Environmental Protection, Bankowa 9, 40007 Katowice, Poland
| | - Łukasz Nicewicz
- University of Silesia in Katowice, Faculty of Natural Sciences, Institute of Biology, Biotechnology and Environmental Protection, Bankowa 9, 40007 Katowice, Poland
| | - Agnieszka Izabela Babczyńska
- University of Silesia in Katowice, Faculty of Natural Sciences, Institute of Biology, Biotechnology and Environmental Protection, Bankowa 9, 40007 Katowice, Poland
| |
|
15
|
Li HD, Yang C, Zhang Z, Yang M, Wu FX, Omenn GS, Wang J. IsoResolve: predicting splice isoform functions by integrating gene and isoform-level features with domain adaptation. Bioinformatics 2021; 37:522-530. [PMID: 32966552 PMCID: PMC8088322 DOI: 10.1093/bioinformatics/btaa829] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/04/2020] [Revised: 08/12/2020] [Accepted: 09/09/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION High resolution annotation of gene functions is a central goal in functional genomics. A single gene may produce multiple isoforms with different functions through alternative splicing. Conventional approaches, however, consider a gene as a single entity without differentiating these functionally different isoforms. Towards understanding gene functions at higher resolution, recent efforts have focused on predicting the functions of isoforms. However, the performance of existing methods is far from satisfactory mainly because of the lack of isoform-level functional annotation. RESULTS We present IsoResolve, a novel approach for isoform function prediction, which leverages the information from gene function prediction models with domain adaptation (DA). IsoResolve treats gene-level and isoform-level features as source and target domains, respectively. It uses DA to project the two domains into a latent variable space in such a way that the latent variables from the two domains have similar distribution, which enables the gene domain information to be leveraged for isoform function prediction. We systematically evaluated the performance of IsoResolve in predicting functions. Compared with five state-of-the-art methods, IsoResolve achieved significantly better performance. IsoResolve was further validated by case studies of genes with isoform-level functional annotation. AVAILABILITY AND IMPLEMENTATION IsoResolve is freely available at https://github.com/genemine/IsoResolve. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hong-Dong Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering
| | - Changhuo Yang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering
| | - Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha, Hunan 410083, China
| | - Mengyun Yang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering
| | - Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N5A9, Canada
| | - Gilbert S Omenn
- Institute for Systems Biology, Seattle, WA 98101, USA; Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
| | - Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering
| |
|
16
|
Vershinina AO, Heintzman PD, Froese DG, Zazula G, Cassatt-Johnstone M, Dalén L, Der Sarkissian C, Dunn SG, Ermini L, Gamba C, Groves P, Kapp JD, Mann DH, Seguin-Orlando A, Southon J, Stiller M, Wooller MJ, Baryshnikov G, Gimranov D, Scott E, Hall E, Hewitson S, Kirillova I, Kosintsev P, Shidlovsky F, Tong HW, Tiunov MP, Vartanyan S, Orlando L, Corbett-Detig R, MacPhee RD, Shapiro B. Ancient horse genomes reveal the timing and extent of dispersals across the Bering Land Bridge. Mol Ecol 2021; 30:6144-6161. [PMID: 33971056 DOI: 10.1111/mec.15977] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/09/2021] [Revised: 03/24/2021] [Accepted: 04/27/2021] [Indexed: 01/02/2023]
Abstract
The Bering Land Bridge (BLB) last connected Eurasia and North America during the Late Pleistocene. Although the BLB would have enabled transfers of terrestrial biota in both directions, it also acted as an ecological filter whose permeability varied considerably over time. Here we explore the possible impacts of this ecological corridor on genetic diversity within, and connectivity among, populations of a once wide-ranging group, the caballine horses (Equus spp.). Using a panel of 187 mitochondrial and eight nuclear genomes recovered from present-day and extinct caballine horses sampled across the Holarctic, we found that Eurasian horse populations initially diverged from those in North America, their ancestral continent, around 1.0-0.8 million years ago. Subsequent to this split our mitochondrial DNA analysis identified two bidirectional long-range dispersals across the BLB ~875-625 and ~200-50 thousand years ago, during the Middle and Late Pleistocene. Whole genome analysis indicated low levels of gene flow between North American and Eurasian horse populations, which probably occurred as a result of these inferred dispersals. Nonetheless, mitochondrial and nuclear diversity of caballine horse populations retained strong phylogeographical structuring. Our results suggest that barriers to gene flow, currently unidentified but possibly related to habitat distribution across Beringia or ongoing evolutionary divergence, played an important role in shaping the early genetic history of caballine horses, including the ancestors of living horses within Equus ferus.
Affiliation(s)
- Alisa O Vershinina
- Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Peter D Heintzman
- The Arctic University Museum of Norway, UiT - The Arctic University of Norway, Tromsø, Norway
| | - Duane G Froese
- Department of Earth and Atmospheric Sciences, University of Alberta, Edmonton, AB, Canada
| | - Grant Zazula
- Collections and Research, Canadian Museum of Nature, Station D, Ottawa, ON, Canada; Government of Yukon, Department of Tourism and Culture, Palaeontology Program, Whitehorse, YT, Canada
| | | | - Love Dalén
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden; Centre for Palaeogenetics, Stockholm, Sweden
| | - Clio Der Sarkissian
- Centre d'Anthropobiologie et de Génomique de Toulouse UMR5288, Faculté de Médecine Purpan, Université Paul Sabatier, Toulouse, France
| | - Shelby G Dunn
- Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Luca Ermini
- Lundbeck Foundation GeoGenetics Center, University of Copenhagen, Copenhagen, Denmark
| | - Cristina Gamba
- Lundbeck Foundation GeoGenetics Center, University of Copenhagen, Copenhagen, Denmark
| | - Pamela Groves
- Institute of Arctic Biology, University of Alaska Fairbanks, Fairbanks, AK, USA
| | - Joshua D Kapp
- Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Daniel H Mann
- Institute of Arctic Biology, University of Alaska Fairbanks, Fairbanks, AK, USA
| | - Andaine Seguin-Orlando
- Centre d'Anthropobiologie et de Génomique de Toulouse UMR5288, Faculté de Médecine Purpan, Université Paul Sabatier, Toulouse, France
| | - John Southon
- Keck-CCAMS Group, Earth System Science Department, University of California, Irvine, CA, USA
| | - Mathias Stiller
- Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA; Division Molecular Pathology, Institute of Pathology, University Hospital Leipzig, Leipzig, Germany
| | - Matthew J Wooller
- Alaska Stable Isotope Facility, Water and Environmental Research Center, Institute of Northern Engineering, University of Alaska Fairbanks, Fairbanks, AK, USA; Department of Marine Biology, College of Fisheries and Ocean Sciences, University of Alaska Fairbanks, Fairbanks, AK, USA
| | - Gennady Baryshnikov
- Laboratory of Theriology, Zoological Institute of the Russian Academy of Sciences, St. Petersburg, Russia
| | - Dmitry Gimranov
- Institute of Plant & Animal Ecology of the Russian Academy of Sciences, Ural Branch, Ekaterinburg, Russia; Ural Federal University named after the first President of Russia B. N. Yeltsin, Ekaterinburg, Russia
| | - Eric Scott
- California State University, San Bernardino, CA, USA
| | - Elizabeth Hall
- Government of Yukon, Department of Tourism and Culture, Palaeontology Program, Whitehorse, YT, Canada
| | - Susan Hewitson
- Government of Yukon, Department of Tourism and Culture, Palaeontology Program, Whitehorse, YT, Canada
| | - Irina Kirillova
- Institute of Geography, Russian Academy of Sciences, Moscow, Russia
| | - Pavel Kosintsev
- Institute of Plant & Animal Ecology of the Russian Academy of Sciences, Ural Branch, Ekaterinburg, Russia
| | | | - Hao-Wen Tong
- Key Laboratory of Vertebrate Evolution and Human Origins of Chinese Academy of Sciences, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, China; CAS Center for Excellence in Life and Paleoenvironment, Chinese Academy of Sciences, Beijing, China
| | - Mikhail P Tiunov
- Federal Scientific Center of the East Asia Terrestrial Biodiversity, Far Eastern Branch of Russian Academy of Sciences, Vladivostok, Russia
| | - Sergey Vartanyan
- North-East Interdisciplinary Scientific Research Institute N.A. Shilo, Far East Branch, Russian Academy of Sciences, Magadan, Russia
| | - Ludovic Orlando
- Centre d'Anthropobiologie et de Génomique de Toulouse UMR5288, Faculté de Médecine Purpan, Université Paul Sabatier, Toulouse, France
| | | | | | - Beth Shapiro
- Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA; Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| |
|
17
|
Jaiswal A, Gautam P, Pietilä EA, Timonen S, Nordström N, Akimov Y, Sipari N, Tanoli Z, Fleischer T, Lehti K, Wennerberg K, Aittokallio T. Multi-modal meta-analysis of cancer cell line omics profiles identifies ECHDC1 as a novel breast tumor suppressor. Mol Syst Biol 2021; 17:e9526. [PMID: 33750001 PMCID: PMC7983037 DOI: 10.15252/msb.20209526] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/17/2020] [Revised: 02/17/2021] [Accepted: 02/19/2021] [Indexed: 12/12/2022] Open
Abstract
Molecular and functional profiling of cancer cell lines is subject to laboratory-specific experimental practices and data analysis protocols. The current challenge therefore is how to make integrated use of the omics profiles of cancer cell lines for reliable biological discoveries. Here, we carried out a systematic analysis of nine types of data modalities using meta-analysis of 53 omics studies across 12 research laboratories for 2,018 cell lines. To account for the relatively low consistency observed for certain data modalities, we developed a robust data integration approach that identifies reproducible signals shared among multiple data modalities and studies. We demonstrated the power of the integrative analyses by identifying a novel driver gene, ECHDC1, with a tumor-suppressive role validated both in breast cancer cells and patient tumors. The multi-modal meta-analysis approach also identified synthetic lethal partners of cancer drivers, including a co-dependency of PTEN-deficient endometrial cancer cells on RNA helicases.
Affiliation(s)
- Alok Jaiswal
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
- Present address: The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Prson Gautam
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
| | - Elina A Pietilä
- Individualized Drug Therapy, Research Programs Unit, University of Helsinki, Helsinki, Finland
| | - Sanna Timonen
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
- Hematology Research Unit Helsinki, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki, Finland
- Translational Immunology Research Program and Department of Clinical Chemistry and Hematology, University of Helsinki, Helsinki, Finland
| | - Nora Nordström
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
| | - Yevhen Akimov
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
| | - Nina Sipari
- Viikki Metabolomics Unit, Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
| | - Ziaurrehman Tanoli
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
| | - Thomas Fleischer
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
| | - Kaisa Lehti
- Individualized Drug Therapy, Research Programs Unit, University of Helsinki, Helsinki, Finland
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Department of Biomedical Laboratory Science, Norwegian University of Science and Technology, Trondheim, Norway
| | - Krister Wennerberg
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
- Biotech Research & Innovation Centre (BRIC) and Novo Nordisk Foundation Center for Stem Cell Biology (DanStem), University of Copenhagen, Copenhagen, Denmark
| | - Tero Aittokallio
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Department of Mathematics and Statistics, University of Turku, Turku, Finland
- Oslo Centre for Biostatistics and Epidemiology (OCBE), University of Oslo, Oslo, Norway
| |
|
18
|
Amici DR, Jackson JM, Truica MI, Smith RS, Abdulkadir SA, Mendillo ML. FIREWORKS: a bottom-up approach to integrative coessentiality network analysis. Life Sci Alliance 2021; 4:e202000882. [PMID: 33328249 PMCID: PMC7756899 DOI: 10.26508/lsa.202000882] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 08/17/2020] [Revised: 12/01/2020] [Accepted: 12/02/2020] [Indexed: 12/11/2022] Open
Abstract
Genetic coessentiality analysis, a computational approach which identifies genes sharing a common effect on cell fitness across large-scale screening datasets, has emerged as a powerful tool to identify functional relationships between human genes. However, widespread implementation of coessentiality to study individual genes and pathways is limited by systematic biases in existing coessentiality approaches and accessibility barriers for investigators without computational expertise. We created FIREWORKS, a method and interactive tool for the construction and statistical analysis of coessentiality networks centered around gene(s) provided by the user. FIREWORKS incorporates a novel bias reduction approach to reduce false discoveries, enables restriction of coessentiality analyses to custom subsets of cell lines, and integrates multiomic and drug-gene interaction datasets to investigate and target contextual gene essentiality. We demonstrate the broad utility of FIREWORKS through case vignettes investigating gene function and specialization, indirect therapeutic targeting of "undruggable" proteins, and context-specific rewiring of genetic networks.
Affiliation(s)
- David R Amici
- Department of Biochemistry and Molecular Genetics, Northwestern University, Chicago, IL, USA
- Simpson Querrey Center for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Robert H Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Medical Scientist Training Program, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | - Jasen M Jackson
- Department of Biochemistry and Molecular Genetics, Northwestern University, Chicago, IL, USA
- Simpson Querrey Center for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Robert H Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | - Mihai I Truica
- Robert H Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Medical Scientist Training Program, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Department of Urology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | - Roger S Smith
- Department of Biochemistry and Molecular Genetics, Northwestern University, Chicago, IL, USA
- Simpson Querrey Center for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Robert H Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Medical Scientist Training Program, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | - Sarki A Abdulkadir
- Robert H Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Department of Urology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | - Marc L Mendillo
- Department of Biochemistry and Molecular Genetics, Northwestern University, Chicago, IL, USA
- Simpson Querrey Center for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Robert H Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| |
|
19
|
Miao B, Fu S, Lyu C, Gontarz P, Wang T, Zhang B. Tissue-specific usage of transposable element-derived promoters in mouse development. Genome Biol 2020; 21:255. [PMID: 32988383 PMCID: PMC7520981 DOI: 10.1186/s13059-020-02164-3] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 03/24/2020] [Accepted: 09/07/2020] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Transposable elements (TEs) are a significant component of eukaryotic genomes and play essential roles in genome evolution. Mounting evidence indicates that TEs are highly transcribed in early embryo development and contribute to distinct biological functions and tissue morphology. RESULTS We examine the epigenetic dynamics of mouse TEs during the development of five tissues: intestine, liver, lung, stomach, and kidney. We found that TEs are associated with over 20% of open chromatin regions during development. Close to half of these accessible TEs are only activated in a single tissue and a specific developmental stage. Most accessible TEs are rodent-specific. Across these five tissues, 453 accessible TEs are found to create the transcription start sites of downstream genes in mouse, including 117 protein-coding genes and 144 lincRNA genes, 93.7% of which are mouse-specific. Species-specific TE-derived transcription start sites are found to drive the expression of tissue-specific genes and change their tissue-specific expression patterns during evolution. CONCLUSION Our results suggest that TE insertions increase the regulatory potential of the genome, and that some TEs have been domesticated to become crucial components of genes and to regulate tissue-specific expression during mouse tissue development.
Affiliation(s)
- Benpeng Miao
- Department of Developmental Biology, Center of Regenerative Medicine, Washington University School of Medicine, St. Louis, MO, 63108, USA
- Department of Genetics, Edison Family Center for Genomic Sciences and Systems Biology, McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, 63108, USA
| | - Shuhua Fu
- Department of Developmental Biology, Center of Regenerative Medicine, Washington University School of Medicine, St. Louis, MO, 63108, USA
| | - Cheng Lyu
- Department of Developmental Biology, Center of Regenerative Medicine, Washington University School of Medicine, St. Louis, MO, 63108, USA
| | - Paul Gontarz
- Department of Developmental Biology, Center of Regenerative Medicine, Washington University School of Medicine, St. Louis, MO, 63108, USA
| | - Ting Wang
- Department of Genetics, Edison Family Center for Genomic Sciences and Systems Biology, McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, 63108, USA.
| | - Bo Zhang
- Department of Developmental Biology, Center of Regenerative Medicine, Washington University School of Medicine, St. Louis, MO, 63108, USA.
| |
|
20
|
Cao C, Mak L, Jin G, Gordon P, Ye K, Long Q. PRESM: personalized reference editor for somatic mutation discovery in cancer genomics. Bioinformatics 2020; 35:1445-1452. [PMID: 30247633 DOI: 10.1093/bioinformatics/bty812] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/22/2018] [Revised: 08/27/2018] [Accepted: 09/19/2018] [Indexed: 12/16/2022] Open
Abstract
MOTIVATION Accurate detection of somatic mutations is a crucial step toward understanding cancer. Various tools have been developed to detect somatic mutations from cancer genome sequencing data by mapping reads to a universal reference genome and inferring likelihoods from complex statistical models. However, read mapping is frequently obstructed by mismatches between germline and somatic mutations on a read and the reference genome. Previous attempts to develop personalized genome tools are not compatible with downstream statistical models for somatic mutation detection. RESULTS We present PRESM, a tool that builds personalized reference genomes by integrating germline mutations into the reference genome. The aforementioned obstacle is circumvented by using a two-step germline substitution procedure, maintaining positional fidelity using an innovative workaround. Reads derived from tumor tissue can be positioned more accurately along a personalized reference than a universal reference due to the reduced genetic distance between the subject (tumor genome) and the target (the personalized genome). Application of PRESM's personalized genome reduced false-positive (FP) somatic mutation calls by as much as 55.5%, and facilitated the discovery of a novel somatic point mutation on a germline insertion in PDE1A, a phosphodiesterase associated with melanoma. Moreover, all improvements in calling accuracy were achieved without parameter optimization, as PRESM itself is parameter-free. Hence, similar increases in read mapping and decreases in the FP rate will persist when PRESM-built genomes are applied to any user-provided dataset. AVAILABILITY AND IMPLEMENTATION The software is available at https://github.com/precisionomics/PRESM. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chen Cao
- Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
| | - Lauren Mak
- Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
| | - Guangxu Jin
- Department of Cancer Biology, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Paul Gordon
- Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
- Kai Ye
- Department of Bioinformatics, Electronic and Information Engineering School, Xi'an Jiaotong University, Xi'an, China
- Quan Long
- Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
|
21
|
Fragoza R, Das J, Wierbowski SD, Liang J, Tran TN, Liang S, Beltran JF, Rivera-Erick CA, Ye K, Wang TY, Yao L, Mort M, Stenson PD, Cooper DN, Wei X, Keinan A, Schimenti JC, Clark AG, Yu H. Extensive disruption of protein interactions by genetic variants across the allele frequency spectrum in human populations. Nat Commun 2019; 10:4141. [PMID: 31515488 PMCID: PMC6742646 DOI: 10.1038/s41467-019-11959-3] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 01/09/2019] [Accepted: 08/06/2019] [Indexed: 12/19/2022] Open
Abstract
Each human genome carries tens of thousands of coding variants. The extent to which this variation is functional and the mechanisms by which it exerts its influence remain largely unexplored. To address this gap, we leverage the ExAC database of 60,706 human exomes to investigate experimentally the impact of 2009 missense single nucleotide variants (SNVs) across 2185 protein-protein interactions, generating interaction profiles for 4797 SNV-interaction pairs, of which 421 SNVs segregate at >1% allele frequency in human populations. We find that interaction-disruptive SNVs are prevalent at both rare and common allele frequencies. Furthermore, these results suggest that 10.5% of missense variants carried per individual are disruptive, a higher proportion than previously reported; this indicates that each individual's genetic makeup may be significantly more complex than expected. Finally, we demonstrate that candidate disease-associated mutations can be identified through shared interaction perturbations between variants of interest and known disease mutations.
Affiliation(s)
- Robert Fragoza
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Jishnu Das
- Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Shayne D Wierbowski
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Jin Liang
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Tina N Tran
- Department of Biomedical Science, Cornell University, Ithaca, NY, 14853, USA
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, 14853, USA
- Siqi Liang
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Juan F Beltran
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Christen A Rivera-Erick
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Kaixiong Ye
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Ting-Yi Wang
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Li Yao
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Matthew Mort
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Peter D Stenson
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- David N Cooper
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Xiaomu Wei
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Alon Keinan
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- John C Schimenti
- Department of Biomedical Science, Cornell University, Ithaca, NY, 14853, USA
- Andrew G Clark
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, 14853, USA
- Haiyuan Yu
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA.
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA.
|
22
|
Sawadro MK, Bednarek AW, Molenda AE, Babczyńska AI. Expression profile of genes encoding allatoregulatory neuropeptides in females of the spider Parasteatoda tepidariorum (Araneae, Theridiidae). PLoS One 2019; 14:e0222274. [PMID: 31504071 PMCID: PMC6736302 DOI: 10.1371/journal.pone.0222274] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/12/2018] [Accepted: 08/26/2019] [Indexed: 12/23/2022] Open
Abstract
Allatoregulatory neuropeptides are multifunctional proteins that take part in the synthesis and secretion of juvenile hormones. In insects, allatostatins are inhibitors of juvenile hormone biosynthesis in the corpora allata, while allatotropins act as stimulators. By quantitative real-time PCR, we analyzed the gene expression of allatostatin A (PtASTA), allatostatin B (PtASTB), allatostatin C (PtASTC), allatotropin (PtAT) and their receptors (PtASTA-R, PtASTB-R, PtASTC-R, PtAT-R) in various tissues in different age groups of female spiders. In the presented manuscript, the presence of allatostatin A, allatostatin C, and allatotropin is reported in females of the spider P. tepidariorum. The obtained results indicated substantial differences in gene expression levels for allatoregulatory neuropeptides and their receptors in the different tissues. Additionally, the gene expression levels varied depending on the female's age. Strong expression was observed coinciding with sexual maturation in the neuroendocrine and nervous system, and to a lesser extent in the digestive tissues and ovaries. Reverse trends were observed for the expression of genes encoding the receptors of these neuropeptides. In conclusion, our study provides the first indication that synthesis and secretion take place in structures similar to those observed in other arthropods. In addition, the analysis of spider physiology provides evidence that general functions of these neuropeptides, such as regulation of juvenile hormone synthesis, regulation of digestive tract and ovary activity, and control of vitellogenesis, seem to be conserved among arthropods; these results are a milestone for future functional studies.
Affiliation(s)
- Marta Katarzyna Sawadro
- Department of Animal Physiology and Ecotoxicology, University of Silesia in Katowice, Bankowa, Katowice, Poland
- Agata Wanda Bednarek
- Department of Animal Physiology and Ecotoxicology, University of Silesia in Katowice, Bankowa, Katowice, Poland
- Agnieszka Ewa Molenda
- Department of Animal Physiology and Ecotoxicology, University of Silesia in Katowice, Bankowa, Katowice, Poland
|
23
|
Kim P, Jang YE, Lee S. FusionScan: accurate prediction of fusion genes from RNA-Seq data. Genomics Inform 2019; 17:e26. [PMID: 31610622 PMCID: PMC6808644 DOI: 10.5808/gi.2019.17.3.e26] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/26/2019] [Accepted: 03/21/2019] [Indexed: 01/10/2023] Open
Abstract
Identification of fusion genes is of prominent importance in the cancer research field because of their potential as carcinogenic drivers. RNA sequencing (RNA-Seq) data have been the most useful source for identification of fusion transcripts. Although a number of algorithms have been developed thus far, most programs produce too many false-positives, thus making experimental confirmation almost impossible. We still lack a reliable program that achieves high precision with a reasonable recall rate. Here, we present FusionScan, a highly optimized tool for predicting fusion transcripts from RNA-Seq data. We specifically search for split reads composed of intact exons at the fusion boundaries. Using 269 known fusion cases as the reference, we have implemented various mapping and filtering strategies to remove false-positives without discarding genuine fusions. In the performance test using three cell line datasets with validated fusion cases (NCI-H660, K562, and MCF-7), FusionScan outperformed other existing programs by a considerable margin, achieving precision and recall rates of 60% and 79%, respectively. Simulation tests also demonstrated that FusionScan recovered most of the true positives without producing an overwhelming number of false-positives regardless of sequencing depth and read length. The computation time was comparable to other leading tools. We also provide several curative means to help users investigate the details of fusion candidates easily. We believe that FusionScan would be a reliable, efficient and convenient program for detecting fusion transcripts that meets the requirements of the clinical and experimental community. FusionScan is freely available at http://fusionscan.ewha.ac.kr/.
Affiliation(s)
- Pora Kim
- Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University, Seoul 03760, Korea
- Ye Eun Jang
- Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University, Seoul 03760, Korea.,Department of Bio-Information Science, Ewha Womans University, Seoul 03760, Korea
- Sanghyuk Lee
- Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University, Seoul 03760, Korea.,Department of Bio-Information Science, Ewha Womans University, Seoul 03760, Korea.,Department of Life Science, Ewha Womans University, Seoul 03760, Korea
|
24
|
Jeon H, Le MT, Ahn B, Cho HS, Le VCQ, Yum J, Hong K, Kim JH, Song H, Park C. Copy number variation of PR-39 cathelicidin, and identification of PR-35, a natural variant of PR-39 with reduced mammalian cytotoxicity. Gene 2019; 692:88-93. [PMID: 30641213 DOI: 10.1016/j.gene.2018.12.065] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/19/2018] [Revised: 12/12/2018] [Accepted: 12/30/2018] [Indexed: 01/10/2023]
Abstract
Proline-arginine-rich (PR)-39 is a neutrophil antimicrobial peptide that has potent antimicrobial activity against a broad spectrum of microorganisms, including bacteria, fungi, and some enveloped viruses, as a part of the innate immune system. We analyzed the nucleotide sequence variations of PR-39 exon 4, which is the mature peptide region responsible for antimicrobial activity, from 48 pigs of six breeds using sequence-based typing. The analysis identified four alleles including allele PR-35 with a 12-bp deletion near the N-terminus. Interestingly, 16.7% of individuals showed the presence of three alleles per individual, but only in the Berkshire and Duroc breeds. We further analyzed the genetic diversity of PR-39 for the entire genomic region of the gene from PR-39 exon 1 to the 3' untranslated region for different alleles by PCR amplification and cloning. The antimicrobial activity of chemically synthesized PR-35 was similar to that of PR-39, but the level of mammalian cell cytotoxicity was lower than that of the wild type. Better knowledge of the genetic diversity of PR-39 among different individuals and breeds may contribute to improved immune defense of pigs. PR-35, as a natural antimicrobial peptide variant, could be an interesting candidate for the development of peptide antibiotics.
Affiliation(s)
- Hyoim Jeon
- Department of Stem Cell and Regenerative Biotechnology, Konkuk University, Seoul 05029, Republic of Korea
- Minh Thong Le
- Department of Stem Cell and Regenerative Biotechnology, Konkuk University, Seoul 05029, Republic of Korea
- Byeongyong Ahn
- Department of Stem Cell and Regenerative Biotechnology, Konkuk University, Seoul 05029, Republic of Korea
- Hye-Sun Cho
- Department of Stem Cell and Regenerative Biotechnology, Konkuk University, Seoul 05029, Republic of Korea
- Van Chanh Quy Le
- Department of Stem Cell and Regenerative Biotechnology, Konkuk University, Seoul 05029, Republic of Korea
- Joori Yum
- Department of Stem Cell and Regenerative Biotechnology, Konkuk University, Seoul 05029, Republic of Korea
- Kwonho Hong
- Department of Stem Cell and Regenerative Biotechnology, Konkuk University, Seoul 05029, Republic of Korea
- Jin-Hoi Kim
- Department of Stem Cell and Regenerative Biotechnology, Konkuk University, Seoul 05029, Republic of Korea
- Hyuk Song
- Department of Stem Cell and Regenerative Biotechnology, Konkuk University, Seoul 05029, Republic of Korea
- Chankyu Park
- Department of Stem Cell and Regenerative Biotechnology, Konkuk University, Seoul 05029, Republic of Korea.
|
25
|
Batrakou DG, Heron ED, Nieduszynski CA. Rapid high-resolution measurement of DNA replication timing by droplet digital PCR. Nucleic Acids Res 2018; 46:e112. [PMID: 29986073 PMCID: PMC6212846 DOI: 10.1093/nar/gky590] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/14/2017] [Revised: 06/11/2018] [Accepted: 06/18/2018] [Indexed: 02/03/2023] Open
Abstract
Genomes are replicated in a reproducible temporal pattern. Current methods for assaying allele replication timing are time consuming and/or expensive. These include high-throughput sequencing which can be used to measure DNA copy number as a proxy for allele replication timing. Here, we use droplet digital PCR to study DNA replication timing at multiple loci in budding yeast and human cells. We establish that the method has temporal and spatial resolutions comparable to the high-throughput sequencing approaches, while being faster than alternative locus-specific methods. Furthermore, the approach is capable of allele discrimination. We apply this method to determine relative replication timing across timing transition zones in cultured human cells. Finally, multiple samples can be analysed in parallel, allowing us to rapidly screen kinetochore mutants for perturbation to centromere replication timing. Therefore, this approach is well suited to the study of locus-specific replication and the screening of cis- and trans-acting mutants to identify mechanisms that regulate local genome replication timing.
Affiliation(s)
- Dzmitry G Batrakou
- Sir William Dunn School of Pathology, University of Oxford, South Parks Road, Oxford OX1 3RE, UK
- Emma D Heron
- Sir William Dunn School of Pathology, University of Oxford, South Parks Road, Oxford OX1 3RE, UK
- Conrad A Nieduszynski
- Sir William Dunn School of Pathology, University of Oxford, South Parks Road, Oxford OX1 3RE, UK
|
26
|
Jung J, Kang Y, Paik H, Kwon M, Yu H, Lee D. Deconvoluting essential gene signatures for cancer growth from genomic expression in compound-treated cells. Bioinformatics 2018; 35:1167-1173. [DOI: 10.1093/bioinformatics/bty774] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/03/2018] [Revised: 08/10/2018] [Accepted: 08/31/2018] [Indexed: 11/15/2022] Open
Affiliation(s)
- Jinmyung Jung
- Bio-Synergy Research Center, 291 Daehak-ro, Yuseong-gu, Daejeon, Republic of Korea
- Department of Data Science, College of Information Technology, The University of Suwon, Bongdam-eup, Hwaseong, Republic of Korea
- Yeeok Kang
- Department of Bio and Brain Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, Republic of Korea
- Hyojung Paik
- Korea Institute of Science and Technology Information, Center for Applied Scientific Computing, Division of Supercomputing, Daejeon, Republic of Korea
- Mijin Kwon
- Department of Bio and Brain Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, Republic of Korea
- Hasun Yu
- Bio-Synergy Research Center, 291 Daehak-ro, Yuseong-gu, Daejeon, Republic of Korea
- Doheon Lee
- Bio-Synergy Research Center, 291 Daehak-ro, Yuseong-gu, Daejeon, Republic of Korea
- Department of Bio and Brain Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, Republic of Korea
|
27
|
Bayersdorf R, Fruscalzo A, Catania F. Linking autoimmunity to the origin of the adaptive immune system. EVOLUTION MEDICINE AND PUBLIC HEALTH 2018; 2018:2-12. [PMID: 29423226 PMCID: PMC5793817 DOI: 10.1093/emph/eoy001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/01/2023]
Abstract
In jawed vertebrates, the adaptive immune system (AIS) cooperates with the innate immune system (IIS) to protect hosts from infections. Although targeting non-self-components, the AIS also generates self-reactive antibodies which, when inadequately counter-selected, can give rise to autoimmune diseases (ADs). ADs are on the rise in western countries. Why haven’t ADs been eliminated during the evolution of a ∼500-million-year-old system? And why have they become more frequent in recent decades? Self-recognition is an attribute of the phylogenetically more ancient IIS and empirical data compellingly show that some self-reactive antibodies, which are classifiable as elements of the IIS rather than the AIS, may protect from (rather than cause) ADs. Here, we propose that the IIS’s self-recognition system originally fathered the AIS and, as a consequence of this relationship, its activity is dampened in hygienic environments. Rather than a mere breakdown or failure of the mechanisms of self-tolerance, ADs might thus arise from architectural constraints.
Affiliation(s)
- Robert Bayersdorf
- Institute for Genome Stability in Aging and Disease, Medical Faculty, University of Cologne, 50931 Cologne, Germany.,Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- Arrigo Fruscalzo
- Clinic of Obstetrics and Gynecology, St Franziskus Hospital, 59227 Ahlen, Germany.,Department of Obstetrics and Gynecology, University Hospital of Münster, 48149 Münster, Germany
- Francesco Catania
- Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
|
28
|
Diss G, Gagnon-Arsenault I, Dion-Coté AM, Vignaud H, Ascencio DI, Berger CM, Landry CR. Gene duplication can impart fragility, not robustness, in the yeast protein interaction network. Science 2017; 355:630-634. [PMID: 28183979 DOI: 10.1126/science.aai7685] [Citation(s) in RCA: 76] [Impact Index Per Article: 10.9] [Received: 09/13/2016] [Accepted: 01/13/2017] [Indexed: 12/18/2022]
Abstract
The maintenance of duplicated genes is thought to protect cells from genetic perturbations, but the molecular basis of this robustness is largely unknown. By measuring the interaction of yeast proteins with their partners in wild-type cells and in cells lacking a paralog, we found that 22 out of 56 paralog pairs compensate for the lost interactions. An equivalent number of pairs exhibit the opposite behavior and require each other's presence for maintaining their interactions. These dependent paralogs generally interact physically, regulate each other's abundance, and derive from ancestral self-interacting proteins. This reveals that gene duplication may actually increase mutational fragility instead of robustness in a large number of cases.
Affiliation(s)
- Guillaume Diss
- Département de Biologie, Université Laval, Québec, QC, Canada.,The Quebec Network for Research on Protein Function, Engineering, and Applications, Université Laval, Québec, QC, Canada.,Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, QC, Canada.,EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Doctor Aiguader 88, 08003 Barcelona, Spain.,Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Isabelle Gagnon-Arsenault
- Département de Biologie, Université Laval, Québec, QC, Canada.,The Quebec Network for Research on Protein Function, Engineering, and Applications, Université Laval, Québec, QC, Canada.,Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, QC, Canada
- Anne-Marie Dion-Coté
- Département de Biologie, Université Laval, Québec, QC, Canada.,The Quebec Network for Research on Protein Function, Engineering, and Applications, Université Laval, Québec, QC, Canada.,Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, QC, Canada
- Hélène Vignaud
- Département de Biologie, Université Laval, Québec, QC, Canada.,The Quebec Network for Research on Protein Function, Engineering, and Applications, Université Laval, Québec, QC, Canada.,Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, QC, Canada
- Diana I Ascencio
- Département de Biologie, Université Laval, Québec, QC, Canada.,The Quebec Network for Research on Protein Function, Engineering, and Applications, Université Laval, Québec, QC, Canada.,Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, QC, Canada.,Laboratorio Nacional de Genómica para la Biodiversidad (LANGEBIO), Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Irapuato, Guanajuato, Mexico
- Caroline M Berger
- Département de Biologie, Université Laval, Québec, QC, Canada.,The Quebec Network for Research on Protein Function, Engineering, and Applications, Université Laval, Québec, QC, Canada.,Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, QC, Canada
- Christian R Landry
- Département de Biologie, Université Laval, Québec, QC, Canada.,The Quebec Network for Research on Protein Function, Engineering, and Applications, Université Laval, Québec, QC, Canada.,Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, QC, Canada
|
29
|
Kirk IK, Weinhold N, Brunak S, Belling K. The impact of the protein interactome on the syntenic structure of mammalian genomes. PLoS One 2017; 12:e0179112. [PMID: 28910296 PMCID: PMC5598925 DOI: 10.1371/journal.pone.0179112] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 10/26/2016] [Accepted: 05/10/2017] [Indexed: 02/06/2023] Open
Abstract
Conserved synteny denotes evolutionarily preserved gene order across species. It is not well understood to what degree functional relationships between genes are preserved in syntenic blocks. Here we investigate whether protein-coding genes conserved in mammalian syntenic blocks encode gene products that serve the common functional purpose of interacting at the protein level, i.e. connectivity. High connectivity among protein-protein interactions (PPIs) was only moderately associated with conserved synteny on a genome-wide scale. However, we observed a smaller subset of 3.6% of all syntenic blocks with high-confidence PPIs that had significantly higher connectivity than expected at random. Additionally, syntenic blocks with high-confidence PPIs contained significantly more chromatin loops than the remaining blocks, indicating functional preservation among these syntenic blocks. Conserved synteny is typically defined by sequence similarity. In this study, we also examined whether a functional relationship, here PPI connectivity, can identify syntenic blocks independently of orthology. While orthology-based syntenic blocks with high-confidence PPIs and the connectivity-based syntenic blocks largely overlapped, the connectivity-based approach identified additional syntenic blocks that were not found by conventional sequence-based methods alone. Additionally, the connectivity-based approach enabled identification of potential orthologous genes between species. Our analyses demonstrate that subsets of syntenic blocks are associated with highly connected proteins, and that PPI connectivity can be used to detect conserved synteny even if sequence conservation drifts beyond what orthology algorithms normally can identify.
Affiliation(s)
- Isa Kristina Kirk
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Nils Weinhold
- Memorial Sloan Kettering Cancer Center, Computational Biology Program, New York, NY, United States of America
- Søren Brunak
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Kirstine Belling
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
|
30
|
Soler-Oliva ME, Guerrero-Martínez JA, Bachetti V, Reyes JC. Analysis of the relationship between coexpression domains and chromatin 3D organization. PLoS Comput Biol 2017; 13:e1005708. [PMID: 28902867 PMCID: PMC5612749 DOI: 10.1371/journal.pcbi.1005708] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 02/13/2017] [Revised: 09/25/2017] [Accepted: 08/03/2017] [Indexed: 01/08/2023] Open
Abstract
Gene order is not random in eukaryotic chromosomes, and co-regulated genes tend to be clustered. However, the mechanisms that determine co-regulation of large regions of the genome and their connection with chromatin three-dimensional (3D) organization are still unclear. Here we have adapted a recently described method for identifying chromatin topologically associating domains (TADs) to identify coexpression domains (which we term “CODs”). Using human normal breast and breast cancer RNA-seq data, we have identified approximately 500 CODs. CODs in the normal and breast cancer genomes share similar characteristics but differ in their gene composition. COD genes have a greater tendency to be coexpressed with genes that reside in other CODs than with non-COD genes. Such inter-COD coexpression is maintained over large chromosomal distances in the normal genome but is partially lost in the cancer genome. Analyzing the relationship between CODs and chromatin 3D organization using Hi-C contact data, we find that CODs do not correspond to TADs. In fact, intra-TAD gene coexpression is the same as random for most chromosomes. However, the contact profile is similar between gene pairs that reside either in the same COD or in coexpressed CODs. These data indicate that co-regulated genes in the genome present similar patterns of contacts irrespective of the frequency of physical chromatin contacts between them.
Prokaryotic operons normally comprise functionally related genes whose expression is coordinated. Even though operons do not exist in most eukaryotes, results from the last fifteen years indicate that gene order is nonetheless not random in eukaryotes, and that coexpressed genes tend to be grouped in the genome. We identify here about 500 coexpression domains (CODs) in normal breast tissue. Interestingly, we find that genes within CODs are often coexpressed with genes that reside in other CODs located very far away on the same chromosome, which is indicative of long-range inter-COD co-regulation. Furthermore, we find that coexpressed genes within CODs or within co-regulated CODs display similar three-dimensional chromatin contacts, suggesting a spatial coordination of CODs.
Affiliation(s)
- María E. Soler-Oliva
- Centro Andaluz de Biología Molecular y Medicina Regenerativa-CABIMER, Consejo Superior de Investigaciones Científicas-Universidad de Sevilla-Universidad Pablo de Olavide (CSIC-USE-UPO), Sevilla, Spain
- José A. Guerrero-Martínez
- Centro Andaluz de Biología Molecular y Medicina Regenerativa-CABIMER, Consejo Superior de Investigaciones Científicas-Universidad de Sevilla-Universidad Pablo de Olavide (CSIC-USE-UPO), Sevilla, Spain
- Valentina Bachetti
- Centro Andaluz de Biología Molecular y Medicina Regenerativa-CABIMER, Consejo Superior de Investigaciones Científicas-Universidad de Sevilla-Universidad Pablo de Olavide (CSIC-USE-UPO), Sevilla, Spain
- José C. Reyes
- Centro Andaluz de Biología Molecular y Medicina Regenerativa-CABIMER, Consejo Superior de Investigaciones Científicas-Universidad de Sevilla-Universidad Pablo de Olavide (CSIC-USE-UPO), Sevilla, Spain
|
31
|
Babbi G, Martelli PL, Profiti G, Bovo S, Savojardo C, Casadio R. eDGAR: a database of Disease-Gene Associations with annotated Relationships among genes. BMC Genomics 2017; 18:554. [PMID: 28812536 PMCID: PMC5558190 DOI: 10.1186/s12864-017-3911-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Genetic investigations, boosted by modern sequencing techniques, allow dissecting the genetic component of different phenotypic traits. These efforts result in the compilation of lists of genes related to diseases and show that an increasing number of diseases is associated with multiple genes. Investigating functional relations among genes associated with the same disease contributes to highlighting molecular mechanisms of the pathogenesis. RESULTS We present eDGAR, a database collecting and organizing the data on gene/disease associations as derived from OMIM, Humsavar and ClinVar. For each disease-associated gene, eDGAR collects information on its annotation. Specifically, for lists of genes, eDGAR provides information on: i) interactions retrieved from PDB, BIOGRID and STRING; ii) co-occurrence in stable and functional structural complexes; iii) shared Gene Ontology annotations; iv) shared KEGG and REACTOME pathways; v) enriched functional annotations computed with NET-GE; vi) regulatory interactions derived from TRRUST; vii) localization on chromosomes and/or co-localisation in neighboring loci. The present release of eDGAR includes 2672 diseases, related to 3658 different genes, for a total number of 5729 gene-disease associations. 71% of the genes are linked to 621 multigenic diseases and eDGAR highlights their common GO terms, KEGG/REACTOME pathways, physical and regulatory interactions. eDGAR includes a network based enrichment method for detecting statistically significant functional terms associated to groups of genes. CONCLUSIONS eDGAR offers a resource to analyze disease-gene associations. In multigenic diseases genes can share physical interactions and/or co-occurrence in the same functional processes. eDGAR is freely available at: edgar.biocomp.unibo.it.
Affiliation(s)
- Giulia Babbi
  - Biocomputing Group, BiGeA, University of Bologna, Bologna, Italy
- Giuseppe Profiti
  - Biocomputing Group, BiGeA, University of Bologna, Bologna, Italy
- Samuele Bovo
  - Biocomputing Group, BiGeA, University of Bologna, Bologna, Italy
- Rita Casadio
  - Biocomputing Group, BiGeA, University of Bologna, Bologna, Italy
  - Interdepartmental Center «Giorgio Prodi» for Cancer Research, University of Bologna, Bologna, Italy
|
32
|
Voloshanenko O, Gmach P, Winter J, Kranz D, Boutros M. Mapping of Wnt-Frizzled interactions by multiplex CRISPR targeting of receptor gene families. FASEB J 2017; 31:4832-4844. [PMID: 28733458 PMCID: PMC5636703 DOI: 10.1096/fj.201700144r] [Citation(s) in RCA: 81] [Impact Index Per Article: 11.6] [Received: 02/18/2017] [Accepted: 07/05/2017] [Indexed: 12/19/2022]
Abstract
Signaling pathway modules are often encoded by several closely related paralogous genes that can have redundant roles and are therefore difficult to analyze by loss-of-function analysis. A typical example is the Wnt signaling pathway, which in mammals is mediated by 19 Wnt ligands that can bind to 10 Frizzled (FZD) receptors. Although significant progress in understanding Wnt-FZD receptor interactions has been made in recent years, tools to generate systematic interaction maps have been largely lacking. Here we generated cell lines with multiplex mutant alleles of FZD1, FZD2, and FZD7 and demonstrate that these cells are unresponsive to canonical Wnt ligands. Subsequently, we performed genetic rescue experiments with combinations of FZDs and canonical Wnts to create a functional ligand–receptor interaction map. These experiments showed that whereas several Wnt ligands, such as Wnt3a, induce signaling through a broad spectrum of FZD receptors, others, such as Wnt8a, act through a restricted set of FZD genes. Together, our results map functional interactions of FZDs and 10 Wnt ligands and demonstrate how multiplex targeting by clustered regularly interspaced short palindromic repeat (CRISPR)/Cas9 can be used to systematically elucidate the functions of multigene families.
Affiliation(s)
- Oksana Voloshanenko
  - Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany; and Department of Cell and Molecular Biology, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
- Philipp Gmach
  - Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany; and Department of Cell and Molecular Biology, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
- Jan Winter
  - Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany; and Department of Cell and Molecular Biology, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
- Dominique Kranz
  - Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany; and Department of Cell and Molecular Biology, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
- Michael Boutros
  - Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany; and Department of Cell and Molecular Biology, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
|
33
|
Germline BRCA2 mutations drive prostate cancers with distinct evolutionary trajectories. Nat Commun 2017; 8:13671. [PMID: 28067867 PMCID: PMC5227331 DOI: 10.1038/ncomms13671] [Citation(s) in RCA: 164] [Impact Index Per Article: 23.4] [Received: 07/26/2016] [Accepted: 10/20/2016] [Indexed: 12/12/2022]
Abstract
Germline mutations in the BRCA2 tumour suppressor are associated with both an increased lifetime risk of developing prostate cancer (PCa) and increased risk of aggressive disease. To understand this aggression, here we profile the genomes and methylomes of localized PCa from 14 carriers of deleterious germline BRCA2 mutations (BRCA2-mutant PCa). We show that BRCA2-mutant PCa harbour increased genomic instability and a mutational profile that more closely resembles metastatic than localized disease. BRCA2-mutant PCa shows genomic and epigenomic dysregulation of the MED12L/MED12 axis, which is frequently dysregulated in metastatic castration-resistant prostate cancer (mCRPC). This dysregulation is enriched in BRCA2-mutant PCa harbouring intraductal carcinoma (IDC). Microdissection and sequencing of IDC and juxtaposed adjacent non-IDC invasive carcinoma in 10 patients demonstrates a common ancestor to both histopathologies. Overall we show that localized castration-sensitive BRCA2-mutant tumours are uniquely aggressive, due to de novo aberration in genes usually associated with metastatic disease, justifying aggressive initial treatment.
Men who carry BRCA2 germline mutations are at risk of developing prostate cancer. Here, the authors analyse the genomes of prostate cancer from these individuals and demonstrate increased genomic instability in comparison to sporadic prostate cancer.
|
34
|
Genomic hallmarks of localized, non-indolent prostate cancer. Nature 2017; 541:359-364. [PMID: 28068672 DOI: 10.1038/nature20788] [Citation(s) in RCA: 409] [Impact Index Per Article: 58.4] [Received: 12/09/2015] [Accepted: 11/14/2016] [Indexed: 12/25/2022]
Abstract
Prostate tumours are highly variable in their response to therapies, but clinically available prognostic factors can explain only a fraction of this heterogeneity. Here we analysed 200 whole-genome sequences and 277 additional whole-exome sequences from localized, non-indolent prostate tumours with similar clinical risk profiles, and carried out RNA and methylation analyses in a subset. These tumours had a paucity of clinically actionable single nucleotide variants, unlike those seen in metastatic disease. Rather, a significant proportion of tumours harboured recurrent non-coding aberrations, large-scale genomic rearrangements, and alterations in which an inversion repressed transcription within its boundaries. Local hypermutation events were frequent, and correlated with specific genomic profiles. Numerous molecular aberrations were prognostic for disease recurrence, including several DNA methylation events, and a signature comprised of these aberrations outperformed well-described prognostic biomarkers. We suggest that intensified treatment of genomically aggressive localized prostate cancer may improve cure rates.
|
35
|
Waks Z, Weissbrod O, Carmeli B, Norel R, Utro F, Goldschmidt Y. Driver gene classification reveals a substantial overrepresentation of tumor suppressors among very large chromatin-regulating proteins. Sci Rep 2016; 6:38988. [PMID: 28008934 PMCID: PMC5180091 DOI: 10.1038/srep38988] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 06/08/2016] [Accepted: 11/07/2016] [Indexed: 12/18/2022]
Abstract
Compiling a comprehensive list of cancer driver genes is imperative for oncology diagnostics and drug development. While driver genes are typically discovered by analysis of tumor genomes, infrequently mutated driver genes often evade detection due to limited sample sizes. Here, we address sample size limitations by integrating tumor genomics data with a wide spectrum of gene-specific properties to search for rare drivers, functionally classify them, and detect features characteristic of driver genes. We show that our approach, CAnceR geNe similarity-based Annotator and Finder (CARNAF), enables detection of potentially novel drivers that eluded over a dozen pan-cancer/multi-tumor type studies. In particular, feature analysis reveals a highly concentrated pool of known and putative tumor suppressors among the <1% of genes that encode very large, chromatin-regulating proteins. Thus, our study highlights the need for deeper characterization of very large, epigenetic regulators in the context of cancer causality.
Affiliation(s)
- Zeev Waks
  - Machine Learning for Healthcare and Life Sciences, IBM Research - Haifa, Mount Carmel Campus, Israel
- Omer Weissbrod
  - Machine Learning for Healthcare and Life Sciences, IBM Research - Haifa, Mount Carmel Campus, Israel
- Boaz Carmeli
  - Machine Learning for Healthcare and Life Sciences, IBM Research - Haifa, Mount Carmel Campus, Israel
- Raquel Norel
  - Computational Biology Center, IBM T. J. Watson Research, Yorktown Heights, NY 10598, USA
- Filippo Utro
  - Computational Biology Center, IBM T. J. Watson Research, Yorktown Heights, NY 10598, USA
- Yaara Goldschmidt
  - Machine Learning for Healthcare and Life Sciences, IBM Research - Haifa, Mount Carmel Campus, Israel
|
36
|
Karathia H, Kingsford C, Girvan M, Hannenhalli S. A pathway-centric view of spatial proximity in the 3D nucleome across cell lines. Sci Rep 2016; 6:39279. [PMID: 27976707 PMCID: PMC5157015 DOI: 10.1038/srep39279] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/09/2016] [Accepted: 11/22/2016] [Indexed: 12/29/2022]
Abstract
In various contexts, spatially proximal genes have been shown to be functionally related. However, the extent to which spatial proximity of genes in a pathway contributes to the pathway's context-specific activity is not known. Leveraging Hi-C data in six human cell-lines, we show that spatial proximity of genes in a pathway is highly correlated with the pathway's context-specific expression and function. Furthermore, spatial proximity of pathway genes correlates with interactions of their protein products, and the specific pathway genes that are proximal to one another tend to occupy higher levels in the regulatory hierarchy. In addition to intra-pathway proximity, related pathways are spatially proximal to one another and housekeeping-genes tend to be proximal to several other pathways suggesting their coordinating role. Substantially extending previous works, our study reveals a pathway-centric organization of 3D-nucleome, whereby, functionally related interacting driver genes tend to be in spatial-proximity in a context-specific manner.
Affiliation(s)
- Hiren Karathia
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA
- Carl Kingsford
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Michelle Girvan
  - Department of Physics, University of Maryland, College Park, MD, USA
- Sridhar Hannenhalli
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA
|
37
|
Quintero-Cadena P, Sternberg PW. Enhancer Sharing Promotes Neighborhoods of Transcriptional Regulation Across Eukaryotes. G3 (BETHESDA, MD.) 2016; 6:4167-4174. [PMID: 27799341 PMCID: PMC5144984 DOI: 10.1534/g3.116.036228] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 06/28/2016] [Accepted: 10/15/2016] [Indexed: 01/08/2023]
Abstract
Enhancers physically interact with transcriptional promoters, looping over distances that can span multiple regulatory elements. Given that enhancer-promoter (EP) interactions generally occur via common protein complexes, it is unclear whether EP pairing is predominantly deterministic or proximity guided. Here, we present cross-organismic evidence suggesting that most EP pairs are compatible, largely determined by physical proximity rather than specific interactions. By reanalyzing transcriptome datasets, we find that the transcription of gene neighbors is correlated over distances that scale with genome size. We experimentally show that nonspecific EP interactions can explain such correlation, and that EP distance acts as a scaling factor for the transcriptional influence of an enhancer. We propose that enhancer sharing is commonplace among eukaryotes, and that EP distance is an important layer of information in gene regulation.
Affiliation(s)
- Porfirio Quintero-Cadena
  - Division of Biology and Biological Engineering, California Institute of Technology, Howard Hughes Medical Institute, Pasadena, California 91125
- Paul W Sternberg
  - Division of Biology and Biological Engineering, California Institute of Technology, Howard Hughes Medical Institute, Pasadena, California 91125
|
38
|
Perry BR, Assis R. CDROM: Classification of Duplicate gene RetentiOn Mechanisms. BMC Evol Biol 2016; 16:82. [PMID: 27080514 PMCID: PMC4832533 DOI: 10.1186/s12862-016-0644-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 01/27/2016] [Accepted: 03/24/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND Gene duplication is a major source of new genes that is thought to play an important role in phenotypic innovation. Though several mechanisms have been hypothesized to drive the functional evolution and long-term retention of duplicate genes, there are currently no software tools for assessing their genome-wide contributions. Thus, the evolutionary mechanisms by which duplicate genes acquire novel functions remain unclear in a number of taxa. RESULTS In a recent study, researchers developed a phylogenetic approach that uses gene expression data from two species to classify the mechanisms underlying the retention of duplicate genes (Proc Natl Acad Sci USA 110:17409-17414, 2013). We have implemented their classification method, as well as a more generalized method, in the R package CDROM, enabling users to apply these methods to their data and gain insights into the origin of novel biological functions after gene duplication. The CDROM R package, source code, and user manual for the R package are available for download from CRAN at https://cran.rstudio.com/web/packages/CDROM/ . Additionally, the CDROM R source code, user manual for running CDROM from the source code, and sample dataset used in this manuscript can be accessed at www.personal.psu.edu/rua15/software.html . CONCLUSIONS CDROM is the first software package that enables genome-wide classification of the mechanisms driving the long-term retention of duplicate genes. It is user-friendly and flexible, providing researchers with a tool for studying the functional evolution of duplicate genes in a variety of taxa.
Affiliation(s)
- Brent R Perry
  - Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA
- Raquel Assis
  - Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA
|
39
|
Barrera LA, Vedenko A, Kurland JV, Rogers JM, Gisselbrecht SS, Rossin EJ, Woodard J, Mariani L, Kock KH, Inukai S, Siggers T, Shokri L, Gordân R, Sahni N, Cotsapas C, Hao T, Yi S, Kellis M, Daly MJ, Vidal M, Hill DE, Bulyk ML. Survey of variation in human transcription factors reveals prevalent DNA binding changes. Science 2016; 351:1450-1454. [PMID: 27013732 PMCID: PMC4825693 DOI: 10.1126/science.aad2257] [Citation(s) in RCA: 100] [Impact Index Per Article: 12.5] [Received: 09/21/2015] [Accepted: 02/18/2016] [Indexed: 12/13/2022]
Abstract
Sequencing of exomes and genomes has revealed abundant genetic variation affecting the coding sequences of human transcription factors (TFs), but the consequences of such variation remain largely unexplored. We developed a computational, structure-based approach to evaluate TF variants for their impact on DNA binding activity and used universal protein-binding microarrays to assay sequence-specific DNA binding activity across 41 reference and 117 variant alleles found in individuals of diverse ancestries and families with Mendelian diseases. We found 77 variants in 28 genes that affect DNA binding affinity or specificity and identified thousands of rare alleles likely to alter the DNA binding activity of human sequence-specific TFs. Our results suggest that most individuals have unique repertoires of TF DNA binding activities, which may contribute to phenotypic variation.
Affiliation(s)
- Luis A. Barrera
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
  - Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA 02138, USA
  - Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA 02115, USA
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Anastasia Vedenko
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Jesse V. Kurland
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Julia M. Rogers
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
  - Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA 02138, USA
- Stephen S. Gisselbrecht
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Elizabeth J. Rossin
  - Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA 02115, USA
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA 02139, USA
- Jaie Woodard
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
  - Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA 02138, USA
- Luca Mariani
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Kian Hong Kock
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
  - Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA 02138, USA
- Sachi Inukai
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Trevor Siggers
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Leila Shokri
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Raluca Gordân
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Nidhi Sahni
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Chris Cotsapas
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA 02139, USA
- Tong Hao
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Song Yi
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Manolis Kellis
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA 02139, USA
- Mark J. Daly
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA 02139, USA
  - Center for Human Genetics Research and Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, MA 02114, USA
- Marc Vidal
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- David E. Hill
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Martha L. Bulyk
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
  - Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA 02138, USA
  - Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA 02115, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA 02139, USA
  - Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA 02138, USA
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
|
40
|
Yang L, Wang S, Zhou M, Chen X, Zuo Y, Sun D, Lv Y. Comparative analysis of housekeeping and tissue-selective genes in human based on network topologies and biological properties. Mol Genet Genomics 2016; 291:1227-41. [PMID: 26897376 DOI: 10.1007/s00438-016-1178-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 10/26/2015] [Accepted: 01/26/2016] [Indexed: 01/14/2023]
Abstract
Housekeeping genes are genes that are turned on most of the time in almost every tissue to maintain cellular functions. Tissue-selective genes are predominantly expressed in one or a few biologically relevant tissue types. Benefitting from the massive gene expression microarray data obtained over the past decades, the properties of housekeeping and tissue-selective genes can now be investigated on a large scale. In this study, we analyzed the topological properties of housekeeping and tissue-selective genes in the protein-protein interaction (PPI) network. Furthermore, we compared the biological properties and amino acid usage between these two gene groups. The results indicated that there were significant differences in topological properties between housekeeping and tissue-selective genes in the PPI network, and housekeeping genes had higher centrality properties and may play important roles in the complex biological network environment. We also found that there were significant differences in multiple biological properties and many amino acid compositions. Functional gene enrichment and subcellular localization analyses were also performed to characterize housekeeping and tissue-selective genes. The results indicated that the two gene groups showed significantly different enrichment in drug targets, disease genes and toxin targets, and were located in different subcellular compartments. Finally, the discrimination between the properties of the two gene groups was measured by the F-score, and expression stage was the most discriminative of all properties. These findings may elucidate the biological mechanisms underlying housekeeping and tissue-selective genes and may contribute to better annotation of housekeeping and tissue-selective genes in other organisms.
Affiliation(s)
- Lei Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Shiyuan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Meng Zhou
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Xiaowen Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yongchun Zuo
  - The National Research Center for Animal Transgenic Biotechnology, Inner Mongolia University, Hohhot, 010021, China
- Dianjun Sun
  - Center for Endemic Disease Control, Chinese Center for Disease Control and Prevention, Harbin Medical University, Harbin, 150081, China
- Yingli Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
|
41
|
Agarwal D, Qi Y, Jiang T, Liu X, Shi W, Wali VB, Turk B, Ross JS, Fraser Symmans W, Pusztai L, Hatzis C. Characterization of DNA variants in the human kinome in breast cancer. Sci Rep 2015; 5:14736. [PMID: 26420498 PMCID: PMC4588561 DOI: 10.1038/srep14736] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/08/2015] [Accepted: 09/07/2015] [Indexed: 02/04/2023]
Abstract
Kinases play a key role in cancer biology, and serve as potential clinically useful targets for designing cancer therapies. We examined nucleic acid variations in the human kinome and several known cancer-related genes in breast cancer. DNA was extracted from fine needle biopsies of 73 primary breast cancers and 19 metastatic lesions. Targeted sequencing of 518 kinases and 68 additional cancer-related genes was performed using the SOLiD sequencing platform. We detected 1561 unique, non-synonymous variants in kinase genes in the 92 cases, and 74 unique variants in 43 kinases that were predicted to have major functional impact on the protein. Three kinase groups (CMGC, STE and TKL) showed greater mutational load in metastatic compared to primary cancer samples; however, after correction for multiple testing, the difference was significant only for the TKL group (P = 0.04). We also observed that a higher proportion of histologic grade 1 and 2 cases had high functional impact variants in the SCYL2 gene compared with grade 3 cases. Our findings indicate that individual breast cancers harbor a substantial number of potentially functionally important nucleotide variations in kinase genes, most of which are present in unique combinations and include both somatic and germline functional variants.
Affiliation(s)
- Divyansh Agarwal
  - Department of Breast Medical Oncology of Yale University, New Haven, CT, USA
  - Molecular, Cellular and Developmental Biology of Yale University, New Haven, CT, USA
- Yuan Qi
  - Department of Quantitative Sciences of the University of Texas M. D. Anderson Cancer Center, Houston, TX, USA
- Tingting Jiang
  - Department of Breast Medical Oncology of Yale University, New Haven, CT, USA
- Xiuping Liu
  - Experimental Therapeutics of the University of Texas M. D. Anderson Cancer Center, Houston, TX, USA
- Weiwei Shi
  - Department of Breast Medical Oncology of Yale University, New Haven, CT, USA
- Vikram B Wali
  - Department of Breast Medical Oncology of Yale University, New Haven, CT, USA
- Benjamin Turk
  - Department of Pharmacology of Yale University, New Haven, CT, USA
- Jeffrey S Ross
  - Department of Pathology and Laboratory Medicine, Albany Medical College, Albany, NY, USA
  - Foundation Medicine, Cambridge, MA, USA
- W Fraser Symmans
  - Pathology of the University of Texas M. D. Anderson Cancer Center, Houston, TX, USA
- Lajos Pusztai
  - Department of Breast Medical Oncology of Yale University, New Haven, CT, USA
- Christos Hatzis
  - Department of Breast Medical Oncology of Yale University, New Haven, CT, USA
|
42
|
Girgis HZ. Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale. BMC Bioinformatics 2015. [PMID: 26206263 PMCID: PMC4513396 DOI: 10.1186/s12859-015-0654-5] [Citation(s) in RCA: 107] [Impact Index Per Article: 11.9] [Indexed: 01/03/2023]
Abstract
Background With rapid advancements in technology, the sequences of thousands of species’ genomes are becoming available. Within the sequences are repeats that comprise significant portions of genomes. Successful annotations thus require accurate discovery of repeats. As species-specific elements, repeats in newly sequenced genomes are likely to be unknown. Therefore, annotating newly sequenced genomes requires tools to discover repeats de-novo. However, the currently available de-novo tools have limitations concerning the size of the input sequence, ease of use, sensitivities to major types of repeats, consistency of performance, speed, and false positive rate. Results To address these limitations, I designed and developed Red, applying Machine Learning. Red is the first repeat-detection tool capable of labeling its training data and training itself automatically on an entire genome. Red is easy to install and use. It is sensitive to both transposons and simple repeats; in contrast, available tools such as RepeatScout and ReCon are sensitive to transposons, and WindowMasker to simple repeats. Red performed consistently well on seven genomes; the other tools performed well only on some genomes. Red is much faster than RepeatScout and ReCon and has a much lower false positive rate than WindowMasker. On human genes with five or more copies, Red was more specific than RepeatScout by a wide margin. When tested on genomes of unusual nucleotide compositions, Red located repeats with high sensitivities and maintained moderate false positive rates. Red outperformed the related tools on a bacterial genome. Red identified 46,405 novel repetitive segments in the human genome. Finally, Red is capable of processing assembled and unassembled genomes. Conclusions Red’s innovative methodology and its excellent performance on seven different genomes represent a valuable advancement in the field of repeats discovery. 
Collapse
Affiliation(s)
- Hani Z Girgis
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, 20894, MD, USA. .,Tandy School of Computer Science, University of Tulsa, 800 South Tucker Drive, Tulsa, 74104, OK, USA.
43
Boutros PC, Fraser M, Harding NJ, de Borja R, Trudel D, Lalonde E, Meng A, Hennings-Yeomans PH, McPherson A, Sabelnykova VY, Zia A, Fox NS, Livingstone J, Shiah YJ, Wang J, Beck TA, Have CL, Chong T, Sam M, Johns J, Timms L, Buchner N, Wong A, Watson JD, Simmons TT, P'ng C, Zafarana G, Nguyen F, Luo X, Chu KC, Prokopec SD, Sykes J, Dal Pra A, Berlin A, Brown A, Chan-Seng-Yue MA, Yousif F, Denroche RE, Chong LC, Chen GM, Jung E, Fung C, Starmans MHW, Chen H, Govind SK, Hawley J, D'Costa A, Pintilie M, Waggott D, Hach F, Lambin P, Muthuswamy LB, Cooper C, Eeles R, Neal D, Tetu B, Sahinalp C, Stein LD, Fleshner N, Shah SP, Collins CC, Hudson TJ, McPherson JD, van der Kwast T, Bristow RG. Spatial genomic heterogeneity within localized, multifocal prostate cancer. Nat Genet 2015; 47:736-45. [PMID: 26005866 DOI: 10.1038/ng.3315] [Citation(s) in RCA: 347] [Impact Index Per Article: 38.6] [Received: 12/22/2014] [Accepted: 05/01/2015] [Indexed: 12/12/2022]
Abstract
Herein we provide a detailed molecular analysis of the spatial heterogeneity of clinically localized, multifocal prostate cancer to delineate new oncogenes or tumor suppressors. We initially determined the copy number aberration (CNA) profiles of 74 patients with index tumors of Gleason score 7. Of these, 5 patients were subjected to whole-genome sequencing using DNA quantities achievable in diagnostic biopsies, with detailed spatial sampling of 23 distinct tumor regions to assess intraprostatic heterogeneity in focal genomics. Multifocal tumors are highly heterogeneous for single-nucleotide variants (SNVs), CNAs and genomic rearrangements. We identified and validated a new recurrent amplification of MYCL, which is associated with TP53 deletion and unique profiles of DNA damage and transcriptional dysregulation. Moreover, we demonstrate divergent tumor evolution in multifocal cancer and, in some cases, tumors of independent clonal origin. These data represent the first systematic relation of intraprostatic genomic heterogeneity to predicted clinical outcome and inform the development of novel biomarkers that reflect individual prognosis.
Affiliation(s)
- Paul C Boutros
- [1] Ontario Institute for Cancer Research, Toronto, Ontario, Canada. [2] Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada. [3] Department of Pharmacology and Toxicology, University of Toronto, Toronto, Ontario, Canada
- Michael Fraser
- Ontario Cancer Institute, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Dominique Trudel
- Department of Pathology and Laboratory Medicine, Toronto General Hospital, University Health Network, Toronto, Ontario, Canada
- Emilie Lalonde
- [1] Ontario Institute for Cancer Research, Toronto, Ontario, Canada. [2] Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Alice Meng
- Department of Pharmacology and Toxicology, University of Toronto, Toronto, Ontario, Canada
- Andrew McPherson
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Amin Zia
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Natalie S Fox
- [1] Ontario Institute for Cancer Research, Toronto, Ontario, Canada. [2] Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Yu-Jia Shiah
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Jianxin Wang
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Timothy A Beck
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Cherry L Have
- Department of Pathology and Laboratory Medicine, Toronto General Hospital, University Health Network, Toronto, Ontario, Canada
- Taryne Chong
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Michelle Sam
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Jeremy Johns
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Lee Timms
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Ada Wong
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- John D Watson
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Trent T Simmons
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Christine P'ng
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Gaetano Zafarana
- Ontario Cancer Institute, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Francis Nguyen
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Xuemei Luo
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Kenneth C Chu
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Jenna Sykes
- Department of Biostatistics, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Alan Dal Pra
- Department of Radiation Oncology, University of Toronto, Toronto, Ontario, Canada
- Alejandro Berlin
- Department of Radiation Oncology, University of Toronto, Toronto, Ontario, Canada
- Andrew Brown
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Fouad Yousif
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Lauren C Chong
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Gregory M Chen
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Esther Jung
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Clement Fung
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Hanbo Chen
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- James Hawley
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Alister D'Costa
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Melania Pintilie
- Department of Biostatistics, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Daryl Waggott
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Faraz Hach
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Philippe Lambin
- Department of Radiotherapy, Maastricht University, Maastricht, the Netherlands
- Colin Cooper
- [1] Division of Genetics and Epidemiology, The Institute of Cancer Research, Sutton, UK. [2] Department of Biological Sciences, University of East Anglia, Norwich, UK. [3] School of Medicine, University of East Anglia, Norwich, UK
- Rosalind Eeles
- [1] Division of Genetics and Epidemiology, The Institute of Cancer Research, Sutton, UK. [2] Royal Marsden National Health Service (NHS) Foundation Trust, London and Sutton, UK
- David Neal
- [1] Urological Research Laboratory, Cancer Research UK Cambridge Research Institute, Cambridge, UK. [2] Department of Surgical Oncology, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK
- Bernard Tetu
- Department of Pathology, Laval University, Quebec City, Quebec, Canada
- Cenk Sahinalp
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Lincoln D Stein
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Neil Fleshner
- Division of Urology, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Sohrab P Shah
- [1] Department of Pathology, University of British Columbia, Vancouver, British Columbia, Canada. [2] Department of Computer Science, University of British Columbia, Vancouver, British Columbia, Canada. [3] British Columbia Cancer Agency Research Centre, Vancouver, British Columbia, Canada
- Colin C Collins
- [1] Department of Urologic Sciences, University of British Columbia, Vancouver, British Columbia, Canada. [2] Laboratory for Advanced Genome Analysis, Vancouver Prostate Centre, Vancouver, British Columbia, Canada
- Thomas J Hudson
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Theodorus van der Kwast
- Department of Pathology and Laboratory Medicine, Toronto General Hospital, University Health Network, Toronto, Ontario, Canada
- Robert G Bristow
- [1] Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada. [2] Ontario Cancer Institute, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada. [3] Department of Radiation Oncology, University of Toronto, Toronto, Ontario, Canada
44
Li W, Freudenberg J, Oswald M. Principles for the organization of gene-sets. Comput Biol Chem 2015; 59 Pt B:139-49. [PMID: 26188561 DOI: 10.1016/j.compbiolchem.2015.04.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 02/24/2015] [Accepted: 04/08/2015] [Indexed: 12/23/2022]
Abstract
A gene-set, an important concept in microarray expression analysis and systems biology, is a collection of genes and/or their products (i.e. proteins) that have some features in common. There are many different ways to construct gene-sets, but a systematic organization of these ways is lacking. Gene-sets are mainly organized ad hoc in current public-domain databases, with group header names often determined by practical reasons (such as the types of technology in obtaining the gene-sets or a balanced number of gene-sets under a header). Here we aim at providing a gene-set organization principle according to the level at which genes are connected: homology, physical map proximity, chemical interaction, biological, and phenotypic-medical levels. We also distinguish two types of connections between genes: actual connection versus sharing of a label. Actual connections denote direct biological interactions, whereas shared label connection denotes shared membership in a group. Some extensions of the framework are also addressed such as overlapping of gene-sets, modules, and the incorporation of other non-protein-coding entities such as microRNAs.
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, Manhasset, NY, USA.
- Jan Freudenberg
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, Manhasset, NY, USA
- Michaela Oswald
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, Manhasset, NY, USA
45
Weinreb I, Piscuoglio S, Martelotto LG, Waggott D, Ng CKY, Perez-Ordonez B, Harding NJ, Alfaro J, Chu KC, Viale A, Fusco N, da Cruz Paula A, Marchio C, Sakr RA, Lim R, Thompson LDR, Chiosea SI, Seethala RR, Skalova A, Stelow EB, Fonseca I, Assaad A, How C, Wang J, de Borja R, Chan-Seng-Yue M, Howlett CJ, Nichols AC, Wen YH, Katabi N, Buchner N, Mullen L, Kislinger T, Wouters BG, Liu FF, Norton L, McPherson JD, Rubin BP, Clarke BA, Weigelt B, Boutros PC, Reis-Filho JS. Hotspot activating PRKD1 somatic mutations in polymorphous low-grade adenocarcinomas of the salivary glands. Nat Genet 2014; 46:1166-9. [PMID: 25240283 DOI: 10.1038/ng.3096] [Citation(s) in RCA: 159] [Impact Index Per Article: 15.9] [Received: 05/20/2014] [Accepted: 08/27/2014] [Indexed: 12/15/2022]
Abstract
Polymorphous low-grade adenocarcinoma (PLGA) is the second most frequent type of malignant tumor of the minor salivary glands. We identified PRKD1 hotspot mutations encoding p.Glu710Asp in 72.9% of PLGAs but not in other salivary gland tumors. Functional studies demonstrated that this kinase-activating alteration likely constitutes a driver of PLGA.
Affiliation(s)
- Ilan Weinreb
- Department of Pathology, University Health Network, Toronto, Ontario, Canada
- Salvatore Piscuoglio
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Luciano G Martelotto
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Daryl Waggott
- [1] Informatics and Bio-Computing Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada. [2] Ontario Cancer Institute, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada. [3] Campbell Family Institute for Cancer Research, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Charlotte K Y Ng
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Nicholas J Harding
- Informatics and Bio-Computing Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Javier Alfaro
- [1] Informatics and Bio-Computing Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada. [2] Ontario Cancer Institute, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada. [3] Campbell Family Institute for Cancer Research, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada. [4] Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Kenneth C Chu
- Informatics and Bio-Computing Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Agnes Viale
- Integrated Genomics Operation, Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Nicola Fusco
- [1] Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA. [2] School of Pathology, University of Milan, Milan, Italy
- Arnaud da Cruz Paula
- [1] Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA. [2] Instituto Português de Oncologia, Oporto, Portugal
- Caterina Marchio
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Rita A Sakr
- Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Raymond Lim
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Lester D R Thompson
- Department of Pathology, Kaiser Permanente, Woodland Hills Medical Center, Woodland Hills, California, USA
- Simion I Chiosea
- Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, USA
- Raja R Seethala
- Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, USA
- Alena Skalova
- Department of Pathology and Laboratory Medicine, Charles University in Prague, Plzen, Czech Republic
- Edward B Stelow
- Department of Pathology, University of Virginia Medical Center, Charlottesville, Virginia, USA
- Isabel Fonseca
- [1] Instituto Português de Oncologia Francisco Gentil, Lisbon, Portugal. [2] Faculdade de Medicina de Lisboa, Lisbon, Portugal
- Adel Assaad
- Department of Pathology, Virginia Mason Hospital and Seattle Medical Center, Seattle, Washington, USA
- Christine How
- [1] Ontario Cancer Institute, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada. [2] Campbell Family Institute for Cancer Research, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Jianxin Wang
- Informatics and Bio-Computing Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Richard de Borja
- Informatics and Bio-Computing Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Michelle Chan-Seng-Yue
- Informatics and Bio-Computing Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Y Hannah Wen
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Nora Katabi
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Nicholas Buchner
- Cancer Genomics Platform, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Laura Mullen
- Cancer Genomics Platform, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Thomas Kislinger
- [1] Ontario Cancer Institute, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada. [2] Campbell Family Institute for Cancer Research, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada. [3] Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Bradly G Wouters
- [1] Ontario Cancer Institute, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada. [2] Campbell Family Institute for Cancer Research, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada. [3] Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Fei-Fei Liu
- [1] Ontario Cancer Institute, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada. [2] Campbell Family Institute for Cancer Research, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada. [3] Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada. [4] Department of Radiation Oncology, Princess Margaret Hospital and University of Toronto, Toronto, Ontario, Canada
- Larry Norton
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- John D McPherson
- [1] Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada. [2] Department of Pathology, Virginia Mason Hospital and Seattle Medical Center, Seattle, Washington, USA
- Brian P Rubin
- [1] Department of Molecular Genetics, Lerner Research Institute, Cleveland, Ohio, USA. [2] Robert J. Tomsich Pathology and Laboratory Medicine Institute, Taussig Cancer Center, Cleveland Clinic, Cleveland, Ohio, USA
- Blaise A Clarke
- Department of Pathology, University Health Network, Toronto, Ontario, Canada
- Britta Weigelt
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Paul C Boutros
- [1] Informatics and Bio-Computing Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada. [2] Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada. [3] Department of Pharmacology and Toxicology, University of Toronto, Toronto, Ontario, Canada
- Jorge S Reis-Filho
- [1] Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA. [2]
46
Modular organization of cancer signaling networks is associated with patient survivability. Biosystems 2013; 113:149-54. [DOI: 10.1016/j.biosystems.2013.06.003] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 05/10/2013] [Revised: 06/13/2013] [Accepted: 06/16/2013] [Indexed: 01/28/2023]
47
Zhou T, Hu Z, Zhou Z, Guo X, Sha J. Genome-wide analysis of human hotspot intersected genes highlights the roles of meiotic recombination in evolution and disease. BMC Genomics 2013; 14:67. [PMID: 23368819 PMCID: PMC3620679 DOI: 10.1186/1471-2164-14-67] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 09/20/2012] [Accepted: 01/29/2013] [Indexed: 11/21/2022]
Abstract
Background: Meiotic recombination events are not randomly located, but rather cluster at hotspot regions. Recently, the fine-scale mapping of genome-wide human recombination hotspots was performed. Here, we systematically analyzed the evolutionary and disease-associated features of hotspots that overlapped with protein-coding genes. Results: In this study, we defined hotspot intersected genes as HI genes. We found that HI genes tended to be located in the extracellular region and were functionally enriched in cell-to-cell communication. Tissue-specific genes and genes encoding secreted proteins were overrepresented in HI genes, while housekeeping genes were underrepresented. Compared to slowly evolving housekeeping genes and random genes with lower recombination rates, HI genes evolved faster. The fact that brain- and blood-specific genes were overrepresented in HI genes indicates that they may be involved in the evolution of human intelligence and the immune system. We also found that genes related to disease were enriched in HI genes, especially genes with disease-associated chromosomal rearrangements. Hotspot sequence motifs were overrepresented in common sequences of HI genes and genes with disease-associated chromosomal rearrangements. We further listed repeat elements that were enriched both in hotspots and genes with disease-associated chromosomal rearrangements. Conclusion: HI genes are evolving and may be involved in the generation of key human features during evolution. Disease-associated genes may be by-products of meiotic recombination. In addition, hotspot sequence motifs and repeat elements showed the connection between meiotic recombination and genes with disease-associated chromosomal rearrangements at the sequence level. Our study will enable us to better understand the evolutionary and biological significance of human meiotic recombination.
Affiliation(s)
- Tao Zhou
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, 140 Hanzhong Road, Nanjing, Jiangsu Province 210029, People's Republic of China