1. Canavati C, Sherill-Rofe D, Kamal L, Bloch I, Zahdeh F, Sharon E, Terespolsky B, Allan IA, Rabie G, Kawas M, Kassem H, Avraham KB, Renbaum P, Levy-Lahad E, Kanaan M, Tabach Y. Using multi-scale genomics to associate poorly annotated genes with rare diseases. Genome Med 2024; 16:4. [PMID: 38178268 PMCID: PMC10765705 DOI: 10.1186/s13073-023-01276-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Accepted: 12/15/2023] [Indexed: 01/06/2024]
Abstract
BACKGROUND Next-generation sequencing (NGS) has significantly transformed the landscape of identifying disease-causing genes associated with genetic disorders. However, a substantial portion of sequenced patients remains undiagnosed. This may be attributed not only to the challenges posed by harder-to-detect variants, such as non-coding and structural variations, but also to the existence of variants in genes not previously associated with the patient's clinical phenotype. This study introduces EvORanker, an algorithm that integrates unbiased data from 1,028 eukaryotic genomes to link mutated genes to clinical phenotypes. METHODS EvORanker utilizes clinical data, multi-scale phylogenetic profiling, and other omics data to prioritize disease-associated genes. It was evaluated on solved exomes and simulated genomes, compared with existing methods, and applied to 6,260 knockout genes with mouse phenotypes lacking human associations. Additionally, EvORanker was made accessible as a user-friendly web tool. RESULTS In the analyzed exomic cohort, EvORanker accurately identified the "true" disease gene as the top candidate in 69% of cases and within the top 5 candidates in 95% of cases, consistent with results from the simulated dataset. Notably, EvORanker outperformed existing methods, particularly for poorly annotated genes. In the case of the 6,260 knockout genes with mouse phenotypes, EvORanker linked 41% of these genes to observed human disease phenotypes. Furthermore, in two unsolved cases, EvORanker successfully identified DLGAP2 and LPCAT3 as disease candidates for previously uncharacterized genetic syndromes. CONCLUSIONS We highlight clade-based phylogenetic profiling as a powerful systematic approach for prioritizing potential disease genes. Our study showcases the efficacy of EvORanker in associating poorly annotated genes with disease phenotypes observed in patients. The EvORanker server is freely available at https://ccanavati.shinyapps.io/EvORanker/ .
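The top-1 and top-5 figures quoted in this abstract are rank-based hit rates. The following is a minimal sketch of how such a rate can be computed, assuming each solved case provides a ranked candidate list and one known causal gene. The `top_k_rate` helper and the gene labels are hypothetical placeholders and are not part of EvORanker.

```python
def top_k_rate(cases, k):
    """Fraction of cases whose known causal gene is among the first k candidates.

    cases: list of (ranked_candidates, true_gene) pairs (hypothetical input shape).
    """
    if not cases:
        raise ValueError("at least one case is required")
    hits = sum(1 for ranked, true_gene in cases if true_gene in ranked[:k])
    return hits / len(cases)


# Toy data with placeholder gene labels; not taken from the study.
cases = [
    (["G1", "G2", "G3"], "G1"),  # true gene ranked 1st
    (["G4", "G5", "G6"], "G5"),  # true gene ranked 2nd
    (["G7", "G8", "G9"], "G9"),  # true gene ranked 3rd
    (["G2", "G1", "G3"], "G2"),  # true gene ranked 1st
]

top1 = top_k_rate(cases, 1)
top2 = top_k_rate(cases, 2)
top3 = top_k_rate(cases, 3)
```

By construction the rate can only stay the same or increase as `k` grows, which is why a top-5 rate is always at least as high as the top-1 rate.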
Affiliation(s)
- Christina Canavati
  - Department of Developmental Biology and Cancer Research, Institute of Medical Research - Israel-Canada, The Hebrew University of Jerusalem, Jerusalem, 9112102, Israel
  - Molecular Genetics Lab, Istishari Arab Hospital, Ramallah, Palestine
- Dana Sherill-Rofe
  - Department of Developmental Biology and Cancer Research, Institute of Medical Research - Israel-Canada, The Hebrew University of Jerusalem, Jerusalem, 9112102, Israel
- Lara Kamal
  - Molecular Genetics Lab, Istishari Arab Hospital, Ramallah, Palestine
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, 6997801, Israel
- Idit Bloch
  - Department of Developmental Biology and Cancer Research, Institute of Medical Research - Israel-Canada, The Hebrew University of Jerusalem, Jerusalem, 9112102, Israel
- Fouad Zahdeh
  - Medical Genetics Institute, Shaare Zedek Medical Center, Jerusalem, 91031, Israel
- Elad Sharon
  - Department of Developmental Biology and Cancer Research, Institute of Medical Research - Israel-Canada, The Hebrew University of Jerusalem, Jerusalem, 9112102, Israel
- Batel Terespolsky
  - Department of Developmental Biology and Cancer Research, Institute of Medical Research - Israel-Canada, The Hebrew University of Jerusalem, Jerusalem, 9112102, Israel
  - Medical Genetics Institute, Shaare Zedek Medical Center, Jerusalem, 91031, Israel
- Islam Abu Allan
  - Molecular Genetics Lab, Istishari Arab Hospital, Ramallah, Palestine
- Grace Rabie
  - Hereditary Research Laboratory and Department of Life Sciences, Bethlehem University, Bethlehem, 72372, Palestine
- Mariana Kawas
  - Hereditary Research Laboratory and Department of Life Sciences, Bethlehem University, Bethlehem, 72372, Palestine
- Hanin Kassem
  - Molecular Genetics Lab, Istishari Arab Hospital, Ramallah, Palestine
- Karen B Avraham
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, 6997801, Israel
- Paul Renbaum
  - Medical Genetics Institute, Shaare Zedek Medical Center, Jerusalem, 91031, Israel
- Ephrat Levy-Lahad
  - Medical Genetics Institute, Shaare Zedek Medical Center, Jerusalem, 91031, Israel
  - Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, 9112102, Israel
- Moien Kanaan
  - Molecular Genetics Lab, Istishari Arab Hospital, Ramallah, Palestine
  - Hereditary Research Laboratory and Department of Life Sciences, Bethlehem University, Bethlehem, 72372, Palestine
- Yuval Tabach
  - Department of Developmental Biology and Cancer Research, Institute of Medical Research - Israel-Canada, The Hebrew University of Jerusalem, Jerusalem, 9112102, Israel

2. Gao S, Chen S, Yang M, Wu J, Chen S, Li H. Mining salt stress-related genes in Spartina alterniflora via analyzing co-evolution signal across 365 plant species using phylogenetic profiling. ABIOTECH 2023; 4:291-302. [PMID: 38106430 PMCID: PMC10721760 DOI: 10.1007/s42994-023-00125-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 10/23/2023] [Indexed: 12/19/2023]
Abstract
With the increasing number of sequenced species, phylogenetic profiling (PP) has become a powerful method to predict functional genes based on co-evolutionary information. However, its potential in plant genomics has not yet been fully explored. In this context, we combined the power of machine learning and PP to identify salt stress-related genes in a halophytic grass, Spartina alterniflora, using evolutionary information generated from 365 plant species. Our results showed that the genes highly co-evolved with known salt stress-related genes are enriched in biological processes of ion transport, detoxification and metabolic pathways. For ion transport, five identified genes encoding two sodium and three potassium transporters were validated as able to take up Na+. In addition, we identified two orthologs of trichome-related AtR3-MYB genes, SaCPC1 and SaCPC2, which may be involved in salinity responses. Genes co-evolved with SaCPCs were enriched in functions related to the circadian rhythm and abiotic stress responses. Overall, this work demonstrates the feasibility of mining salt stress-related genes using evolutionary information, highlighting the potential of PP as a valuable tool for plant functional genomics. Supplementary Information: The online version contains supplementary material available at 10.1007/s42994-023-00125-5.
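The core idea of phylogenetic profiling, ranking genes by how closely their presence/absence pattern across species tracks that of a known gene, can be sketched as follows. This is a minimal illustration that assumes binary profiles and Pearson correlation as the similarity measure; the abstract does not state which metric or machine-learning model the authors used, and the gene names and profiles below are invented toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("profiles must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-evolution signal
    return cov / sqrt(var_x * var_y)


def rank_coevolved(query, profiles):
    """Sort candidate genes by decreasing profile correlation with the query."""
    scores = {gene: pearson(query, profile) for gene, profile in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy presence (1) / absence (0) profiles across eight hypothetical species.
known_gene = [1, 1, 0, 1, 0, 0, 1, 1]
candidates = {
    "geneA": [1, 1, 0, 1, 0, 0, 1, 1],  # same pattern as the known gene
    "geneB": [0, 0, 1, 0, 1, 1, 0, 0],  # exactly complementary pattern
    "geneC": [1, 0, 1, 0, 1, 0, 1, 0],  # largely unrelated pattern
}

ranking = rank_coevolved(known_gene, candidates)
```

In this toy ranking `geneA` comes first and `geneB` last; real analyses typically use continuous conservation scores and correct for the shared phylogeny of the species.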
Affiliation(s)
- Shang Gao
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081 China
  - Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya, 572024 China
- Shoukun Chen
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081 China
  - Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya, 572024 China
  - Hainan Yazhou Bay Seed Laboratory, Sanya, 572024 China
- Maogeng Yang
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081 China
  - Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya, 572024 China
  - Key Laboratory of Plant Molecular & Developmental Biology, College of Life Sciences, Yantai University, Yantai, 264005 China
- Jinran Wu
  - The Institute for Learning Sciences and Teacher Education, Australian Catholic University, Brisbane, QLD 4001 Australia
- Shihua Chen
  - Key Laboratory of Plant Molecular & Developmental Biology, College of Life Sciences, Yantai University, Yantai, 264005 China
- Huihui Li
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081 China
  - Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya, 572024 China

3. Langschied F, Leisegang MS, Brandes RP, Ebersberger I. ncOrtho: efficient and reliable identification of miRNA orthologs. Nucleic Acids Res 2023; 51:e71. [PMID: 37260093 PMCID: PMC10359484 DOI: 10.1093/nar/gkad467] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/06/2022] [Revised: 05/04/2023] [Accepted: 05/30/2023] [Indexed: 06/02/2023]
Abstract
MicroRNAs (miRNAs) are post-transcriptional regulators that fine-tune gene expression via translational repression or degradation of their target mRNAs. Despite their functional relevance, frameworks for the scalable and accurate detection of miRNA orthologs are missing. Consequently, there is still no comprehensive picture of how miRNAs and their associated regulatory networks have evolved. Here we present ncOrtho, a synteny-informed pipeline for the targeted search of miRNA orthologs in unannotated genome sequences. ncOrtho matches miRNA annotations from multi-tissue transcriptomes in precision, while scaling to the analysis of hundreds of custom-selected species. The presence-absence pattern of orthologs to 266 human miRNA families across 402 vertebrate species reveals four bursts of miRNA acquisition, of which the most recent event occurred in the last common ancestor of higher primates. miRNA families are rarely modified or lost, but notable exceptions for both events exist. miRNA co-ortholog numbers faithfully indicate lineage-specific whole genome duplications, and miRNAs are powerful markers for phylogenomic analyses. Their exceptionally low genetic diversity makes them suitable to resolve clades where the phylogenetic signal is blurred by incomplete lineage sorting of ancestral alleles. In summary, ncOrtho allows miRNAs to be routinely considered in evolutionary analyses that were thus far reserved for protein-coding genes.
Affiliation(s)
- Felix Langschied
  - Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- Matthias S Leisegang
  - Institute for Cardiovascular Physiology, Goethe University, Frankfurt, Germany
  - German Center of Cardiovascular Research (DZHK), Partner site RheinMain, Frankfurt, Germany
- Ralf P Brandes
  - Institute for Cardiovascular Physiology, Goethe University, Frankfurt, Germany
  - German Center of Cardiovascular Research (DZHK), Partner site RheinMain, Frankfurt, Germany
- Ingo Ebersberger
  - Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
  - Senckenberg Biodiversity and Climate Research Centre (S-BIK-F), Frankfurt am Main, Germany
  - LOEWE Centre for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany

4. Moi D, Dessimoz C. Phylogenetic profiling in eukaryotes comes of age. Proc Natl Acad Sci U S A 2023; 120:e2305013120. [PMID: 37126713 PMCID: PMC10175774 DOI: 10.1073/pnas.2305013120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Affiliation(s)
- David Moi
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Christophe Dessimoz
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland

5. Dembech E, Malatesta M, De Rito C, Mori G, Cavazzini D, Secchi A, Morandin F, Percudani R. Identification of hidden associations among eukaryotic genes through statistical analysis of coevolutionary transitions. Proc Natl Acad Sci U S A 2023; 120:e2218329120. [PMID: 37043529 PMCID: PMC10120013 DOI: 10.1073/pnas.2218329120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Accepted: 03/10/2023] [Indexed: 04/13/2023]
Abstract
Coevolution at the gene level, as reflected by correlated events of gene loss or gain, can be revealed by phylogenetic profile analysis. The optimal method and metric for comparing phylogenetic profiles, especially in eukaryotic genomes, are not yet established. Here, we describe a procedure suitable for large-scale analysis, which can reveal coevolution based on the assessment of the statistical significance of correlated presence/absence transitions between gene pairs. This metric can identify coevolution in profiles with low overall similarities and is not affected by similarities lacking coevolutionary information. We applied the procedure to a large collection of 60,912 orthologous gene groups (orthogroups) in 1,264 eukaryotic genomes extracted from OrthoDB. We found significant cotransition scores for 7,825 orthogroups associated in 2,401 coevolving modules linking known and unknown genes in protein complexes and biological pathways. To demonstrate the ability of the method to predict hidden gene associations, we validated through experiments the involvement of vertebrate malate synthase-like genes in the conversion of (S)-ureidoglycolate into glyoxylate and urea, the last step of purine catabolism. This identification explains the presence of glyoxylate cycle genes in metazoa and suggests an anaplerotic role of purine degradation in early eukaryotes.
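To make the notion of "correlated presence/absence transitions" concrete, the sketch below counts gains and losses that two profiles share along an ordered list of genomes. It is a simplified illustration only: the genome ordering, the function names, and the normalized score are assumptions made here, and the abstract does not give the authors' actual cotransition statistic or its significance test.

```python
def transitions(profile):
    """Change between consecutive genomes: +1 gain, -1 loss, 0 no change."""
    return [b - a for a, b in zip(profile, profile[1:])]


def cotransition_counts(p, q):
    """Count shared transitions between two binary profiles of equal length.

    Returns (concordant, discordant, total_p, total_q), where a concordant
    transition is a gain or loss at the same position in the same direction.
    """
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    tp, tq = transitions(p), transitions(q)
    concordant = sum(1 for a, b in zip(tp, tq) if a != 0 and a == b)
    discordant = sum(1 for a, b in zip(tp, tq) if a != 0 and b != 0 and a != b)
    total_p = sum(1 for a in tp if a != 0)
    total_q = sum(1 for b in tq if b != 0)
    return concordant, discordant, total_p, total_q


def simple_score(p, q):
    """Illustrative score in [-1, 1]: (concordant - discordant) / positions with any transition."""
    c, d, total_p, total_q = cotransition_counts(p, q)
    union = total_p + total_q - c - d  # positions where at least one profile changes
    return (c - d) / union if union else 0.0


# Toy profiles over seven genomes in an assumed phylogeny-consistent order.
p = [1, 1, 0, 0, 1, 1, 0]
q = [1, 1, 0, 0, 1, 1, 1]  # shares the first loss and the regain with p
r = [0, 0, 1, 1, 0, 0, 1]  # changes at the same positions as p, in the opposite direction

counts_pq = cotransition_counts(p, q)
score_pq = simple_score(p, q)
score_pr = simple_score(p, r)
```

A transition-based measure of this kind ignores long runs of shared presence or absence, which is the property the abstract describes as being unaffected by similarities that lack coevolutionary information.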
Affiliation(s)
- Elena Dembech
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma 43124, Italy
- Marco Malatesta
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma 43124, Italy
- Carlo De Rito
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma 43124, Italy
- Giulia Mori
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma 43124, Italy
- Davide Cavazzini
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma 43124, Italy
- Andrea Secchi
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma 43124, Italy
- Francesco Morandin
  - Department of Mathematical, Physical and Computer Sciences, University of Parma, Parma 43124, Italy
- Riccardo Percudani
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma 43124, Italy

6. Rickert CA, Lieleg O. Machine learning approaches for biomolecular, biophysical, and biomaterials research. Biophysics Reviews 2022; 3:021306. [PMID: 38505413 PMCID: PMC10914139 DOI: 10.1063/5.0082179] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/13/2021] [Accepted: 05/12/2022] [Indexed: 03/21/2024]
Abstract
A fluent conversation with a virtual assistant, person-tailored news feeds, and deep-fake images created within seconds: all these things were unthinkable for a long time and are now a part of our everyday lives. What these examples have in common is that they are realized by different means of machine learning (ML), a technology that has fundamentally changed many aspects of the modern world. The possibility to process enormous amounts of data in multi-hierarchical, digital constructs has paved the way not only for creating intelligent systems but also for obtaining surprising new insight into many scientific problems. However, in the different areas of biosciences, which typically rely heavily on the collection of time-consuming experimental data, applying ML methods is a bit more challenging: here, difficulties can arise from small datasets and the inherent, broad variability and complexity associated with studying biological objects and phenomena. In this Review, we give an overview of commonly used ML algorithms (which are often referred to as "machines") and learning strategies as well as their applications in different bio-disciplines such as molecular biology, drug development, biophysics, and biomaterials science. We highlight how selected research questions from those fields were successfully translated into machine-readable formats, discuss typical problems that can arise in this context, and provide an overview of how to resolve those encountered difficulties.

7. Jiang Y, Luo J, Huang D, Liu Y, Li DD. Machine Learning Advances in Microbiology: A Review of Methods and Applications. Front Microbiol 2022; 13:925454. [PMID: 35711777 PMCID: PMC9196628 DOI: 10.3389/fmicb.2022.925454] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/21/2022] [Accepted: 05/09/2022] [Indexed: 12/18/2022]
Abstract
Microorganisms play an important role in natural material and elemental cycles. Many common and general biology research techniques rely on microorganisms. Machine learning has been gradually integrated with multiple fields of study. Machine learning, including deep learning, aims to use mathematical insights to optimize variational functions to aid microbiology using various types of available data to help humans organize and apply collective knowledge of various research objects in a systematic and scaled manner. Classification and prediction have become the main achievements in the development of microbial community research in the direction of computational biology. This review summarizes the application and development of machine learning and deep learning in the field of microbiology and shows and compares the advantages and disadvantages of different algorithm tools in four fields: microbiome and taxonomy, microbial ecology, pathogen and epidemiology, and drug discovery.

8. Ji F, Bonilla G, Krykbaev R, Ruvkun G, Tabach Y, Sadreyev RI. DEPCOD: a tool to detect and visualize co-evolution of protein domains. Nucleic Acids Res 2022; 50:W246-W253. [PMID: 35536332 PMCID: PMC9252791 DOI: 10.1093/nar/gkac349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2022] [Revised: 04/13/2022] [Accepted: 04/26/2022] [Indexed: 11/14/2022]
Abstract
Proteins with similar phylogenetic patterns of conservation or loss across evolutionary taxa are strong candidates to work in the same cellular pathways or engage in physical or functional interactions. Our previously published tools implemented our method of normalized phylogenetic sequence profiling to detect functional associations between non-homologous proteins. However, many proteins consist of multiple protein domains subjected to different selective pressures, so using protein domain as the unit of analysis improves the detection of similar phylogenetic patterns. Here we analyze sequence conservation patterns across the whole tree of life for every protein domain from a set of widely studied organisms. The resulting new interactive webserver, DEPCOD (DEtection of Phylogenetically COrrelated Domains), performs searches with either a selected pre-defined protein domain or a user-supplied sequence as a query to detect other domains from the same organism that have similar conservation patterns. Top similarities on two evolutionary scales (the whole tree of life or eukaryotic genomes) are displayed along with known protein interactions and shared complexes, pathway enrichment among the hits, and detailed visualization of sources of detected similarities. DEPCOD reveals functional relationships between often non-homologous domains that could not be detected using whole-protein sequences. The web server is accessible at http://genetics.mgh.harvard.edu/DEPCOD.
Affiliation(s)
- Fei Ji
  - Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Gracia Bonilla
  - Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Rustem Krykbaev
  - Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, USA
- Gary Ruvkun
  - Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Yuval Tabach
  - Department of Developmental Biology and Cancer Research, Faculty of Medicine, The Hebrew University of Jerusalem, Ein Kerem 9112102, Israel
- Ruslan I Sadreyev
  - Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, USA
  - Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA

9. Labes S, Stupp D, Wagner N, Bloch I, Lotem M, L Lahad E, Polak P, Pupko T, Tabach Y. Machine-learning of complex evolutionary signals improves classification of SNVs. NAR Genom Bioinform 2022; 4:lqac025. [PMID: 35402908 PMCID: PMC8988715 DOI: 10.1093/nargab/lqac025] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/24/2021] [Revised: 02/08/2022] [Accepted: 03/28/2022] [Indexed: 12/12/2022]
Abstract
Conservation is a strong predictor for the pathogenicity of single-nucleotide variants (SNVs). However, some positions that present complex conservation patterns across vertebrates stray from this paradigm. Here, we analyzed the association between complex conservation patterns and the pathogenicity of SNVs in the 115 disease-genes that had sufficient variant data. We show that conservation is not a one-rule-fits-all solution since its accuracy highly depends on the analyzed set of species and genes. For example, pairwise comparisons between the human and 99 vertebrate species showed that species differ in their ability to predict the clinical outcomes of variants among different genes using conservation. Furthermore, certain genes were less amenable for conservation-based variant prediction, while others demonstrated species that optimize prediction. These insights led to developing EvoDiagnostics, which uses the conservation against each species as a feature within a random-forest machine-learning classification algorithm. EvoDiagnostics outperformed traditional conservation algorithms, deep-learning based methods and most ensemble tools in every prediction-task, highlighting the strength of optimizing conservation analysis per-species and per-gene. Overall, we suggest a new and a more biologically relevant approach for analyzing conservation, which improves prediction of variant pathogenicity.
Affiliation(s)
- Sapir Labes
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Doron Stupp
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Naama Wagner
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Idit Bloch
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Michal Lotem
  - Sharett Institute of Oncology, Hadassah University Medical Center, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Ephrat L Lahad
  - Medical Genetics Institute, Shaare Zedek Medical Center, Jerusalem 9103102, Israel
- Paz Polak
  - Oncological Sciences, Icahn School of Medicine at Mount Sinai, NY 10029, USA
- Tal Pupko
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Yuval Tabach
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel

10. Sherill-Rofe D, Raban O, Findlay S, Rahat D, Unterman I, Samiei A, Yasmeen A, Kaiser Z, Kuasne H, Park M, Foulkes WD, Bloch I, Zick A, Gotlieb WH, Tabach Y, Orthwein A. Multi-omics data integration analysis identifies the spliceosome as a key regulator of DNA double-strand break repair. NAR Cancer 2022; 4:zcac013. [PMID: 35399185 PMCID: PMC8991968 DOI: 10.1093/narcan/zcac013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2021] [Revised: 02/25/2022] [Accepted: 03/23/2022] [Indexed: 11/14/2022]
Abstract
DNA repair by homologous recombination (HR) is critical for the maintenance of genome stability. Germline and somatic mutations in HR genes have been associated with an increased risk of developing breast (BC) and ovarian cancers (OvC). However, the extent of factors and pathways that are functionally linked to HR with clinical relevance for BC and OvC remains unclear. To gain a broader understanding of this pathway, we used multi-omics datasets coupled with machine learning to identify genes that are associated with HR and to predict their sub-function. Specifically, we integrated our phylogenetic-based co-evolution approach (CladePP) with 23 distinct genetic and proteomic screens that monitored, directly or indirectly, DNA repair by HR. This omics data integration analysis yielded a new database (HRbase) that contains a list of 464 predictions, including 76 gold standard HR genes. Interestingly, the spliceosome machinery emerged as one major pathway with significant cross-platform interactions with the HR pathway. We functionally validated 6 spliceosome factors, including the RNA helicase SNRNP200 and its co-factor SNW1. Importantly, their RNA expression correlated with BC/OvC patient outcome. Altogether, we identified novel clinically relevant DNA repair factors and delineated their specific sub-function by machine learning. Our results, supported by evolutionary and multi-omics analyses, suggest that the spliceosome machinery plays an important role during the repair of DNA double-strand breaks (DSBs).
Affiliation(s)
- Dana Sherill-Rofe
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem-Hadassah Medical School, Jerusalem 91120, Israel
- Oded Raban
  - Lady Davis Institute for Medical Research, Segal Cancer Centre, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1E2, Canada
- Steven Findlay
  - Lady Davis Institute for Medical Research, Segal Cancer Centre, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1E2, Canada
- Dolev Rahat
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem-Hadassah Medical School, Jerusalem 91120, Israel
- Irene Unterman
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem-Hadassah Medical School, Jerusalem 91120, Israel
- Arash Samiei
  - Lady Davis Institute for Medical Research, Segal Cancer Centre, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1E2, Canada
- Amber Yasmeen
  - Lady Davis Institute for Medical Research, Segal Cancer Centre, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1E2, Canada
- Zafir Kaiser
  - Department of Biochemistry, McGill University, Montreal, QC H3G 1Y6, Canada
- Hellen Kuasne
  - Department of Biochemistry, McGill University, Montreal, QC H3G 1Y6, Canada
- Morag Park
  - Department of Biochemistry, McGill University, Montreal, QC H3G 1Y6, Canada
- William D Foulkes
  - The Research Institute of the McGill University Health Centre, Montreal, QC H4A 3J1, Canada
- Idit Bloch
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem-Hadassah Medical School, Jerusalem 91120, Israel
- Aviad Zick
  - Department of Oncology, Hadassah Medical Center, Faculty of Medicine, Hebrew University of Jerusalem, Ein-Kerem, Jerusalem 91120, Israel
- Walter H Gotlieb
  - Division of Gynecology Oncology, Segal Cancer Center, Jewish General Hospital, McGill University, Montreal, QC H3T 1E2, Canada
- Yuval Tabach
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem-Hadassah Medical School, Jerusalem 91120, Israel
- Alexandre Orthwein
  - Lady Davis Institute for Medical Research, Segal Cancer Centre, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1E2, Canada

11. Takatsuka H, Fahmi M, Hamanishi K, Sakuratani T, Kubota Y, Ito M. In silico Analysis of SARS-CoV-2 ORF8-Binding Proteins Reveals the Involvement of ORF8 in Acquired-Immune and Innate-Immune Systems. Front Med (Lausanne) 2022; 9:824622. [PMID: 35178414 PMCID: PMC8844466 DOI: 10.3389/fmed.2022.824622] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/29/2021] [Accepted: 01/03/2022] [Indexed: 11/13/2022]
Abstract
SARS-CoV-2 is the causative agent of a new type of coronavirus infection, COVID-19, which has rapidly spread worldwide. The overall genome sequence homology between SARS-CoV-2 and SARS-CoV is 79%. However, the homology of the ORF8 protein between these two coronaviruses is low, at ~26%. It has previously been suggested that infection by the ORF8-deleted variant of SARS-CoV-2 results in less severe symptoms than infection by wild-type SARS-CoV-2. Although we found that ORF8 is involved in the proteasome autoimmunity system, the precise role of ORF8 in infection and pathology has not been fully clarified. In this study, we determined a new network of ORF8-interacting proteins by performing in silico analysis of the proteins that bind the 47 previously described ORF8-binding proteins. As a dataset, we used 431 human protein candidates from UniProt that physically interacted with the 47 ORF8-binding proteins, as identified using STRING. Homology and phylogenetic profile analyses of the protein dataset were performed on 446 eukaryotic species whose genome sequences were available in KEGG OC. Based on the phylogenetic profile results, clustering analysis was performed using Ward's method. Our phylogenetic profiling showed that the interactors of the ORF8-interacting proteins were clustered into three classes that were conserved across chordates (Class 1: 152 proteins), metazoans (Class 2: 163 proteins), and eukaryotes (Class 3: 114 proteins).
Following KEGG pathway analysis, classification of cellular localization, tissue-specific expression analysis, and a literature study on each class of the phylogenetic profiling cluster tree, we predicted the following: protein members in Class 1 could contribute to COVID-19 pathogenesis via complement and coagulation cascades and could promote sarcoidosis; the members of Classes 1 and 2 together may contribute to the downregulation of interferon-β; and Class 3 proteins are associated with endoplasmic reticulum stress and the degradation of human leukocyte antigen.
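The clustering step described in this abstract (binary presence/absence profiles grouped with Ward's method) can be illustrated with a minimal sketch. The code below is not the authors' pipeline: the protein names `P1` to `P6`, the nine-species toy profiles, and the function name `ward_clusters` are hypothetical, and the implementation is a plain-Python version of Ward's criterion (merge the pair of clusters whose union increases the within-cluster sum of squares the least).

```python
from itertools import combinations

def ward_clusters(profiles, k):
    """Agglomerative clustering with Ward's criterion until k clusters remain.

    profiles: dict mapping a protein name to a 0/1 presence vector over species.
    Returns a sorted list of sorted member-name lists.
    """
    # Each cluster is (member names, centroid vector).
    clusters = [([name], [float(x) for x in vec]) for name, vec in profiles.items()]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            (ma, ca), (mb, cb) = clusters[i], clusters[j]
            na, nb = len(ma), len(mb)
            dist2 = sum((a - b) ** 2 for a, b in zip(ca, cb))
            # Increase in within-cluster sum of squares if i and j are merged.
            cost = na * nb / (na + nb) * dist2
            if best is None or cost < best[0]:
                best = (cost, i, j)
        _, i, j = best
        (ma, ca), (mb, cb) = clusters[i], clusters[j]
        na, nb = len(ma), len(mb)
        centroid = [(na * a + nb * b) / (na + nb) for a, b in zip(ca, cb)]
        rest = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters = rest + [(ma + mb, centroid)]
    return sorted(sorted(members) for members, _ in clusters)

# Hypothetical toy data: columns are 3 chordates, 3 other metazoans,
# and 3 non-metazoan eukaryotes (1 = homolog present, 0 = absent).
profiles = {
    "P1": [1, 1, 1, 0, 0, 0, 0, 0, 0],  # chordate-restricted
    "P2": [1, 1, 0, 0, 0, 0, 0, 0, 0],
    "P3": [1, 1, 1, 1, 1, 1, 0, 0, 0],  # metazoan-wide
    "P4": [1, 1, 1, 1, 1, 0, 0, 0, 0],
    "P5": [1, 1, 1, 1, 1, 1, 1, 1, 1],  # eukaryote-wide
    "P6": [1, 1, 1, 1, 1, 1, 1, 1, 0],
}

classes = ward_clusters(profiles, 3)
print(classes)
```

On this toy input the three recovered groups correspond to chordate-restricted, metazoan-wide, and eukaryote-wide conservation patterns, which mirrors the kind of class structure the abstract reports. For real datasets, a library implementation of Ward linkage would be the more practical choice.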
Affiliation(s)
- Hisashi Takatsuka: Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, Kusatsu, Japan
- Muhamad Fahmi: Research Department, Research Institute for Humanity and Nature, Kyoto, Japan; Research Organization of Science and Technology, Ritsumeikan University, Kusatsu, Japan
- Kotono Hamanishi: Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, Kusatsu, Japan
- Takuya Sakuratani: Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, Kusatsu, Japan
- Yukihiko Kubota: Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, Kusatsu, Japan
- Masahiro Ito: Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, Kusatsu, Japan
|
12
|
Fukunaga T, Iwasaki W. Inverse Potts model improves accuracy of phylogenetic profiling. Bioinformatics 2022; 38:1794-1800. [PMID: 35060594 PMCID: PMC8963296 DOI: 10.1093/bioinformatics/btac034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2021] [Revised: 01/11/2022] [Accepted: 01/13/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION Phylogenetic profiling is a powerful computational method for revealing the functions of function-unknown genes. Although conventional similarity metrics in phylogenetic profiling achieved high prediction accuracy, they have two estimation biases: an evolutionary bias and a spurious correlation bias. While previous studies reduced the evolutionary bias by considering a phylogenetic tree, few studies have analyzed the spurious correlation bias. RESULTS To reduce the spurious correlation bias, we developed metrics based on the inverse Potts model (IPM) for phylogenetic profiling. We also developed a metric based on both the IPM and a phylogenetic tree. In an empirical dataset analysis, we demonstrated that these IPM-based metrics improved the prediction performance of phylogenetic profiling. In addition, we found that the integration of several metrics, including the IPM-based metrics, had superior performance to a single metric. AVAILABILITY AND IMPLEMENTATION The source code is freely available at https://github.com/fukunagatsu/Ipm. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
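For context on what a "conventional similarity metric" looks like, here is a minimal sketch of two pairwise measures that are commonly applied to binary phylogenetic profiles: the Jaccard index and mutual information. The abstract does not name the metrics it compares against, so this choice is an assumption, and the code is not the inverse Potts model or the authors' implementation. The gene names and toy profiles are hypothetical.

```python
from math import log2

def jaccard(p, q):
    """Jaccard index: species where both genes occur / species where either occurs."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0

def mutual_information(p, q):
    """Mutual information (in bits) between two 0/1 presence/absence profiles."""
    n = len(p)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            joint = sum(1 for x, y in zip(p, q) if x == a and y == b) / n
            pa = sum(1 for x in p if x == a) / n
            pb = sum(1 for y in q if y == b) / n
            if joint > 0:
                mi += joint * log2(joint / (pa * pb))
    return mi

# Hypothetical toy profiles over 8 species (1 = present, 0 = absent).
gene_a = [1, 1, 0, 0, 1, 0, 1, 0]
gene_b = [1, 1, 0, 0, 1, 0, 1, 0]  # identical pattern to gene_a
gene_c = [1, 0, 1, 0, 1, 0, 1, 0]  # partially overlapping pattern

print(jaccard(gene_a, gene_b), mutual_information(gene_a, gene_b))
print(jaccard(gene_a, gene_c), mutual_information(gene_a, gene_c))
```

Both measures score each gene pair in isolation, so a strong score between two genes may simply reflect their shared association with a third. That kind of indirect signal is one plausible reading of the spurious correlation bias the abstract sets out to reduce.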
Affiliation(s)
- Wataru Iwasaki: Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 2770882, Japan; Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo 1130032, Japan; Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 2770882, Japan; Atmosphere and Ocean Research Institute, The University of Tokyo, Chiba 2770882, Japan; Institute for Quantitative Biosciences, The University of Tokyo, Tokyo 1130032, Japan; Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Tokyo 1130032, Japan
|