1. Artificial intelligence in cancer target identification and drug discovery. Signal Transduct Target Ther 2022; 7:156. [PMID: 35538061 PMCID: PMC9090746 DOI: 10.1038/s41392-022-00994-0] [Citation(s) in RCA: 58] [Impact Index Per Article: 29.0] [Received: 07/04/2021] [Revised: 03/14/2022] [Accepted: 04/05/2022] [Indexed: 02/08/2023] Open
Abstract
Artificial intelligence is an advanced method to identify novel anticancer targets and discover novel drugs from biology networks because the networks can effectively preserve and quantify the interaction between components of cell systems underlying human diseases such as cancer. Here, we review and discuss how to employ artificial intelligence approaches to identify novel anticancer targets and discover drugs. First, we describe the scope of artificial intelligence biology analysis for novel anticancer target investigations. Second, we review and discuss the basic principles and theory of commonly used network-based and machine learning-based artificial intelligence algorithms. Finally, we showcase the applications of artificial intelligence approaches in cancer target identification and drug discovery. Taken together, the artificial intelligence models have provided us with a quantitative framework to study the relationship between network characteristics and cancer, thereby leading to the identification of potential anticancer targets and the discovery of novel drug candidates.

2. Ovens K, Eames BF, McQuillan I. Comparative Analyses of Gene Co-expression Networks: Implementations and Applications in the Study of Evolution. Front Genet 2021; 12:695399. [PMID: 34484293 PMCID: PMC8414652 DOI: 10.3389/fgene.2021.695399] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 04/14/2021] [Accepted: 07/19/2021] [Indexed: 11/13/2022] Open
Abstract
Similarities and differences in the associations of biological entities among species can provide us with a better understanding of evolutionary relationships. Often the evolution of new phenotypes results from changes to interactions in pre-existing biological networks and comparing networks across species can identify evidence of conservation or adaptation. Gene co-expression networks (GCNs), constructed from high-throughput gene expression data, can be used to understand evolution and the rise of new phenotypes. The increasing abundance of gene expression data makes GCNs a valuable tool for the study of evolution in non-model organisms. In this paper, we cover motivations for why comparing these networks across species can be valuable for the study of evolution. We also review techniques for comparing GCNs in the context of evolution, including local and global methods of graph alignment. While some protein-protein interaction (PPI) bioinformatic methods can be used to compare co-expression networks, they often disregard highly relevant properties, including the existence of continuous and negative values for edge weights. Also, the lack of comparative datasets in non-model organisms has hindered the study of evolution using PPI networks. We also discuss limitations and challenges associated with cross-species comparison using GCNs, and provide suggestions for utilizing co-expression network alignments as an indispensable tool for evolutionary studies going forward.
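The remark about continuous and negative edge weights can be made concrete with a small sketch. It assumes, purely for illustration, that edges are Pearson correlations between expression profiles and that pairs are kept when the absolute correlation reaches a cutoff; the gene names, the toy values, and the `min_abs_r` parameter are hypothetical and not taken from the review.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_network(expression, min_abs_r=0.8):
    """Return {(gene_a, gene_b): r} for gene pairs with |r| >= min_abs_r."""
    edges = {}
    for a, b in combinations(sorted(expression), 2):
        r = pearson(expression[a], expression[b])
        if abs(r) >= min_abs_r:
            edges[(a, b)] = r  # keep the signed, continuous weight
    return edges


# Hypothetical toy data: three genes measured across four samples.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
}
network = coexpression_network(toy)
```

Here the geneA/geneC edge carries a negative weight, which an unweighted, unsigned PPI-style representation would collapse into a plain "connected" flag.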
Affiliation(s)
- Katie Ovens
  - Augmented Intelligence & Precision Health Laboratory (AIPHL), Research Institute of the McGill University Health Centre, Montreal, QC, Canada
- B. Frank Eames
  - Department of Anatomy, Physiology, & Pharmacology, University of Saskatchewan, Saskatoon, SK, Canada
- Ian McQuillan
  - Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada

3. Galpert D, Fernández A, Herrera F, Antunes A, Molina-Ruiz R, Agüero-Chapin G. Surveying alignment-free features for Ortholog detection in related yeast proteomes by using supervised big data classifiers. BMC Bioinformatics 2018; 19:166. [PMID: 29724166 PMCID: PMC5934817 DOI: 10.1186/s12859-018-2148-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 02/13/2017] [Accepted: 04/04/2018] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND: The development of new ortholog detection algorithms and the improvement of existing ones are of major importance in functional genomics. We have previously introduced a successful supervised pairwise ortholog classification approach implemented in a big data platform that considered several pairwise protein features and the low ortholog pair ratios found between two annotated proteomes (Galpert, D et al., BioMed Research International, 2015). The supervised models were built and tested using a Saccharomycete yeast benchmark dataset proposed by Salichos and Rokas (2011). Although several pairwise protein features were combined in a supervised big data approach, they were all, to some extent, alignment-based features, and the proposed algorithms were evaluated on a single test set. Here, we aim to evaluate the impact of alignment-free features on the performance of supervised models implemented in the Spark big data platform for pairwise ortholog detection in several related yeast proteomes.
RESULTS: The Spark Random Forest and Decision Trees with oversampling and undersampling techniques, built with only alignment-based similarity measures or combined with several alignment-free pairwise protein features, showed the highest classification performance for ortholog detection in three yeast proteome pairs. Although such supervised approaches outperformed traditional methods, there were no significant differences between the exclusive use of alignment-based similarity measures and their combination with alignment-free features, even within the twilight zone of the studied proteomes. Only when alignment-based and alignment-free features were combined in Spark Decision Trees with imbalance management could a higher success rate (98.71%) within the twilight zone be achieved for a yeast proteome pair that underwent a whole genome duplication. The feature selection study showed that alignment-based features were top-ranked for the best classifiers while the runners-up were alignment-free features related to amino acid composition.
CONCLUSIONS: The incorporation of alignment-free features in supervised big data models did not significantly improve ortholog detection in yeast proteomes regarding the classification qualities achieved with just alignment-based similarity measures. However, the similarity of their classification performance to that of traditional ortholog detection methods encourages the evaluation of other alignment-free protein pair descriptors in future research.
Affiliation(s)
- Deborah Galpert
  - Departamento de Ciencia de la Computación, Universidad Central "Marta Abreu" de Las Villas (UCLV), 54830, Santa Clara, Cuba
- Alberto Fernández
  - Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071, Granada, Spain
- Francisco Herrera
  - Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071, Granada, Spain
- Agostinho Antunes
  - CIIMAR/CIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Matosinhos, Porto, Portugal
  - Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
- Reinaldo Molina-Ruiz
  - Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), 54830, Santa Clara, Cuba
- Guillermin Agüero-Chapin
  - CIIMAR/CIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Matosinhos, Porto, Portugal
  - Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
  - Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), 54830, Santa Clara, Cuba

4. Reyes PFL, Michoel T, Joshi A, Devailly G. Meta-analysis of Liver and Heart Transcriptomic Data for Functional Annotation Transfer in Mammalian Orthologs. Comput Struct Biotechnol J 2017; 15:425-432. [PMID: 29187960 PMCID: PMC5691612 DOI: 10.1016/j.csbj.2017.08.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/31/2017] [Revised: 08/10/2017] [Accepted: 08/11/2017] [Indexed: 11/30/2022] Open
Abstract
Functional annotation transfer across multi-gene family orthologs can lead to functional misannotations. We hypothesised that co-expression networks would help predict functional orthologs amongst complex homologous gene families. To explore the use of transcriptomic data available in the public domain to identify functionally equivalent genes among all predicted orthologs, we collected genome-wide expression data in mouse and rat liver from over 1500 experiments with varied treatments. We used a hyper-graph clustering method to identify clusters of orthologous genes co-expressed in both mouse and rat. We validated these clusters by analysing expression profiles in each species separately and demonstrating a high overlap. We then focused on genes in 18 homology groups with one-to-many or many-to-many relationships between the two species, to discriminate between functionally equivalent and non-equivalent orthologs. Finally, we further applied our method by collecting heart transcriptomic data (over 1400 experiments) in rat and mouse to validate the method in an independent tissue.
Affiliation(s)
- Tom Michoel
  - The Roslin Institute, The University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, Scotland, UK
- Anagha Joshi
  - The Roslin Institute, The University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, Scotland, UK
- Guillaume Devailly
  - The Roslin Institute, The University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, Scotland, UK

5. Tekaia F. Inferring Orthologs: Open Questions and Perspectives. Genomics Insights 2016; 9:17-28. [PMID: 26966373 PMCID: PMC4778853 DOI: 10.4137/gei.s37925] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 11/18/2015] [Revised: 12/30/2015] [Accepted: 01/02/2016] [Indexed: 01/25/2023]
Abstract
With the increasing number of sequenced genomes and their comparisons, the detection of orthologs is crucial for reliable functional annotation and evolutionary analyses of genes and species. Yet, the dynamic remodeling of genome content through gain, loss, transfer of genes, and segmental and whole-genome duplication hinders reliable orthology detection. Moreover, the lack of direct functional evidence and the questionable quality of some available genome sequences and annotations present additional difficulties to assess orthology. This article reviews the existing computational methods and their potential accuracy in the high-throughput era of genome sequencing and anticipates open questions in terms of methodology, reliability, and computation. Appropriate taxon sampling together with combination of methods based on similarity, phylogeny, synteny, and evolutionary knowledge that may help detecting speciation events appears to be the most accurate strategy. This review also raises perspectives on the potential determination of orthology throughout the whole species phylogeny.
Affiliation(s)
- Fredj Tekaia
  - Institut Pasteur, Unit of Structural Microbiology, CNRS URA 3528 and University Paris Diderot, Sorbonne Paris Cité, Paris, France

6. Campbell AR, Regan K, Bhave N, Pattanayak A, Parihar R, Stiff AR, Trikha P, Scoville SD, Liyanarachchi S, Kondadasula SV, Lele O, Davuluri R, Payne PRO, Carson WE. Gene expression profiling of the human natural killer cell response to Fc receptor activation: unique enhancement in the presence of interleukin-12. BMC Med Genomics 2015; 8:66. [PMID: 26470881 PMCID: PMC4608307 DOI: 10.1186/s12920-015-0142-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 03/20/2015] [Accepted: 10/07/2015] [Indexed: 01/23/2023] Open
Abstract
Background: Traditionally, the CD56dimCD16+ subset of Natural Killer (NK) cells has been thought to mediate cellular cytotoxicity with modest cytokine secretion capacity. However, studies have suggested that this subset may exert a more diverse array of immunological functions. There exists a lack of well-developed functional models to describe the behavior of activated NK cells, and the interactions between signaling pathways that facilitate effector functions are not well understood. In the present study, a combination of genome-wide microarray analyses and systems-level bioinformatics approaches was utilized to elucidate the transcriptional landscape of NK cells activated via interactions with antibody-coated targets in the presence of interleukin-12 (IL-12).
Methods: We conducted differential gene expression analysis of CD56dimCD16+ NK cells following FcR stimulation in the presence or absence of IL-12. Next, we functionally characterized gene sets according to patterns of gene expression and validated representative genes using RT-PCR. IPA was utilized for biological pathway analysis, and an enriched network of interacting genes was generated using GeneMANIA. Furthermore, PAJEK and the HITS algorithm were employed to identify important genes in the network according to betweenness centrality, hub, and authority node metrics.
Results: Analyses revealed that CD56dimCD16+ NK cells co-stimulated via the Fc receptor (FcR) and IL-12R led to the expression of a unique set of genes, including genes encoding cytotoxicity receptors, apoptotic proteins, intracellular signaling molecules, and cytokines that may mediate enhanced cytotoxicity and interactions with other immune cells within inflammatory tissues. Network analyses identified a novel set of connected key players, BATF, IRF4, TBX21, and IFNG, within an integrated network composed of differentially expressed genes in NK cells stimulated by various conditions (immobilized IgG, IL-12, or the combination of IgG and IL-12).
Conclusions: These results are the first to address the global mechanisms by which NK cells mediate their biological functions when encountering antibody-coated targets within inflammatory sites. Moreover, this study has identified a set of high-priority targets for subsequent investigation into strategies to combat cancer by enhancing the anti-tumor activity of CD56dimCD16+ NK cells.
Electronic supplementary material: The online version of this article (doi:10.1186/s12920-015-0142-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Amanda R Campbell
  - The Arthur G. James Comprehensive Cancer Center and Solove Research Institute, The Ohio State University, Columbus, OH, 43210, USA
  - Medical Scientist Training Program and Biomedical Sciences Graduate Program, The Ohio State University, Columbus, OH, 43210, USA
- Kelly Regan
  - Medical Scientist Training Program and Biomedical Sciences Graduate Program, The Ohio State University, Columbus, OH, 43210, USA
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Neela Bhave
  - The Arthur G. James Comprehensive Cancer Center and Solove Research Institute, The Ohio State University, Columbus, OH, 43210, USA
- Arka Pattanayak
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Robin Parihar
  - Department of Pediatrics, The Cleveland Clinic, Cleveland, OH, 44106, USA
- Andrew R Stiff
  - The Arthur G. James Comprehensive Cancer Center and Solove Research Institute, The Ohio State University, Columbus, OH, 43210, USA
  - Medical Scientist Training Program and Biomedical Sciences Graduate Program, The Ohio State University, Columbus, OH, 43210, USA
- Prashant Trikha
  - The Arthur G. James Comprehensive Cancer Center and Solove Research Institute, The Ohio State University, Columbus, OH, 43210, USA
- Steven D Scoville
  - The Arthur G. James Comprehensive Cancer Center and Solove Research Institute, The Ohio State University, Columbus, OH, 43210, USA
  - Medical Scientist Training Program and Biomedical Sciences Graduate Program, The Ohio State University, Columbus, OH, 43210, USA
- Sandya Liyanarachchi
  - Division of Human Cancer Genetics, The Ohio State University, Columbus, OH, 43210, USA
- Sri Vidya Kondadasula
  - Departments of Oncology and Medicine, Wayne State University and Barbara Ann Karmanos Cancer Institute, Detroit, MI, 48201, USA
- Omkar Lele
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Ramana Davuluri
  - Robert H. Lurie Comprehensive Cancer Center, Northwestern University, Chicago, IL, 60611, USA
- Philip R O Payne
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- William E Carson
  - The Arthur G. James Comprehensive Cancer Center and Solove Research Institute, The Ohio State University, Columbus, OH, 43210, USA
  - Department of Surgery, The Ohio State University, Columbus, OH, 43210, USA
  - The Ohio State University College of Medicine, N924 Doan Hall, 410 West 10th Ave., Columbus, OH, 43210, USA

7. Sekhwal MK, Sharma V, Sarin R. Identification of MFS proteins in sorghum using semantic similarity. Theory Biosci 2013; 132:105-13. [PMID: 23299296 DOI: 10.1007/s12064-012-0174-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 08/12/2012] [Accepted: 12/18/2012] [Indexed: 11/26/2022]
Abstract
Antiporters, uniporters and symporters are the functional classes of the major facilitator superfamily (MFS); they play a major role in ion homeostasis, regulation of pumps and channels, membrane structure, and transporter activity in tolerance to abiotic stresses. The MFS includes Na(+)/H(+) antiporters, which are considered sensors of molecule transport. A large number of MFS proteins have been identified in several plants, including rice, maize and Arabidopsis. However, the majority of these proteins in sorghum are still described as putative or uncharacterized, which suggests that the identification of MFS proteins in sorghum is far from saturated. Hence, we developed a method based on the semantic similarity of gene ontology (GO) terms, using the GOSemSim R package. As a result, a total of 2,568 orthologous proteins with high (100%) semantic similarity were obtained from 7 plant species. These data were used to predict the function of 257 putative uncharacterized proteins from 18 MFS families in sorghum. The identified proteins were associated with regulation of pumps and channels, membrane structure, transporter activity, ion homeostasis, transporter mechanisms and binding processes. These functions appear to reflect a distinct mechanism of salt-stress adaptation in plants. The proposed method will help in identifying further new proteins that can aid the development of agronomically and economically important plants.
Affiliation(s)
- Manoj Kumar Sekhwal
  - Department of Bioscience and Biotechnology, Banasthali University, P.O. Banasthali Vidyapith, 304022 Vanasthali, Rajasthan, India

8. Towfic F, Gupta S, Honavar V, Subramaniam S. B-cell ligand processing pathways detected by large-scale comparative analysis. Genomics Proteomics & Bioinformatics 2012; 10:142-52. [PMID: 22917187 PMCID: PMC5054497 DOI: 10.1016/j.gpb.2012.03.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/11/2011] [Revised: 03/05/2012] [Accepted: 03/07/2012] [Indexed: 11/03/2022]
Abstract
The initiation of B-cell ligand recognition is a critical step for the generation of an immune response against foreign bodies. We sought to identify the biochemical pathways involved in the B-cell ligand recognition cascade and sets of ligands that trigger similar immunological responses. We utilized several comparative approaches to analyze the gene coexpression networks generated from a set of microarray experiments spanning 33 different ligands. First, we compared the degree distributions of the generated networks. Second, we utilized a pairwise network alignment algorithm, BiNA, to align the networks based on the hubs in the networks. Third, we aligned the networks based on a set of KEGG pathways. We summarized our results by constructing a consensus hierarchy of pathways that are involved in B cell ligand recognition. The resulting pathways were further validated through literature for their common physiological responses. Collectively, the results based on our comparative analyses of degree distributions, alignment of hubs, and alignment based on KEGG pathways provide a basis for molecular characterization of the immune response states of B-cells and demonstrate the power of comparative approaches (e.g., gene coexpression network alignment algorithms) in elucidating biochemical pathways involved in complex signaling events in cells.
Affiliation(s)
- Fadi Towfic
  - Bioinformatics and Computational Biology Graduate Program, Iowa State University, Ames, IA 50010, USA

9. Song B, Wang F, Guo Y, Sang Q, Liu M, Li D, Fang W, Zhang D. Protein-protein interaction network-based detection of functionally similar proteins within species. Proteins 2012; 80:1736-43. [PMID: 22411607 DOI: 10.1002/prot.24066] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/12/2011] [Revised: 02/03/2012] [Accepted: 03/03/2012] [Indexed: 02/03/2023]
Abstract
Although functionally similar proteins across species have been widely studied, functionally similar proteins within species showing low sequence similarity have not been examined in detail. Identification of these proteins is of significant importance for understanding biological functions, the evolution of protein families, the progression of co-evolution, convergent evolution, and other questions that cannot be addressed by detecting functionally similar proteins across species. Here, we explored a method of detecting functionally similar proteins within species based on graph theory. After denoting protein-protein interaction networks using graphs, we split the graphs into subgraphs using the 1-hop method. Proteins with functional similarities in a species were detected using a modified shortest path method to compare these subgraphs and to find the eligible optimal results. Using seven protein-protein interaction networks and this method, some functionally similar proteins with low sequence similarity that cannot be detected by sequence alignment were identified. By analyzing the results, we found that, sometimes, it is difficult to separate homologous from convergent evolution. Evaluation of the performance of our method by gene ontology term overlap showed that the precision of our method was excellent.
Affiliation(s)
- Baoxing Song
  - MOA Key Laboratory of Animal Biotechnology of National Ministry of Agriculture, Institute of Veterinary Immunology, Division of Veterinary Microbiology & Virology, Department of Preventive Veterinary Medicine, College of Veterinary Medicine, and Investigation Group of Molecular Virology, Immunology, Oncology & Systems Biology, Center for Bioinformatics, Northwest A & F University, Yangling 712100, Xi'an City, Shaanxi Province, People's Republic of China

10. Van Hemert JL, Dickerson JA. Discriminating response groups in metabolic and regulatory pathway networks. Bioinformatics 2012; 28:947-54. [DOI: 10.1093/bioinformatics/bts039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open

11. Potential for modulation of the fas apoptotic pathway by epidermal growth factor in sarcomas. Sarcoma 2011; 2011:847409. [PMID: 22135505 PMCID: PMC3206362 DOI: 10.1155/2011/847409] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/24/2011] [Revised: 08/22/2011] [Accepted: 08/23/2011] [Indexed: 11/18/2022] Open
Abstract
One important mechanism by which cancer cells parasitize their host is by escaping apoptosis. Thus, selectively facilitating apoptosis is a therapeutic mechanism by which oncotherapy may prove highly advantageous. One major apoptotic pathway is mediated by Fas ligand (FasL). The death-inducing signaling complex (DISC) and subsequent death-domain aggregations are created when FasL is bound by its receptor, thereby enabling programmed cell death. Conceptually, if a better understanding of the Fas pathway can be garnered, an oncoselective prodeath therapeutic approach can be tailored. Herein, we propose that EGF and CTGF play essential roles in the regulation of the Fas apoptotic pathway in sarcomas. Tumor and in vitro data suggest viable cells counter the prodeath signal induced by FasL by activating EGF, which in turn induces prosurvival CTGF. The prosurvival attributes of CTGF ultimately predominate over the death-inducing FasL. Cells destined for elimination inhibit this prosurvival response via a presently undefined pathway. This scenario represents a novel role for EGF and CTGF as regulators of the Fas pathway in sarcomas.

12. Kristensen DM, Wolf YI, Mushegian AR, Koonin EV. Computational methods for Gene Orthology inference. Brief Bioinform 2011; 12:379-91. [PMID: 21690100 DOI: 10.1093/bib/bbr030] [Citation(s) in RCA: 150] [Impact Index Per Article: 11.5] [Indexed: 12/14/2022] Open
Abstract
Accurate inference of orthologous genes is a pre-requisite for most comparative genomics studies, and is also important for functional annotation of new genomes. Identification of orthologous gene sets typically involves phylogenetic tree analysis, heuristic algorithms based on sequence conservation, synteny analysis, or some combination of these approaches. The most direct tree-based methods typically rely on the comparison of an individual gene tree with a species tree. Once the two trees are accurately constructed, orthologs are straightforwardly identified by the definition of orthology as those homologs that are related by speciation, rather than gene duplication, at their most recent point of origin. Although ideal for the purpose of orthology identification in principle, phylogenetic trees are computationally expensive to construct for large numbers of genes and genomes, and they often contain errors, especially at large evolutionary distances. Moreover, in many organisms, in particular prokaryotes and viruses, evolution does not appear to have followed a simple 'tree-like' mode, which makes conventional tree reconciliation inapplicable. Other, heuristic methods identify probable orthologs as the closest homologous pairs or groups of genes in a set of organisms. These approaches are faster and easier to automate than tree-based methods, with efficient implementations provided by graph-theoretical algorithms enabling comparisons of thousands of genomes. Comparisons of these two approaches show that, despite conceptual differences, they produce similar sets of orthologs, especially at short evolutionary distances. Synteny also can aid in identification of orthologs. Often, tree-based, sequence similarity- and synteny-based approaches can be combined into flexible hybrid methods.
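The "closest homologous pairs" heuristic mentioned above can be illustrated with reciprocal best hits, one widely used heuristic of this kind. The sketch below is an assumption-laden illustration and not the procedure of any specific tool covered by the review: the gene identifiers and similarity scores are hypothetical, and in practice the scores would come from a sequence search run in both directions.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring hit in the other genome."""
    return {query: max(hits, key=hits.get) for query, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores (higher means more similar).
a_vs_b = {
    "a1": {"b1": 250.0, "b2": 90.0},
    "a2": {"b2": 180.0, "b1": 60.0},
    "a3": {"b2": 120.0},  # best hit is b2, but b2 prefers a2
}
b_vs_a = {
    "b1": {"a1": 245.0, "a2": 55.0},
    "b2": {"a2": 175.0, "a3": 118.0, "a1": 88.0},
}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

With these toy scores, `a3` is left without a partner because its best hit `b2` points back to `a2`; this is the kind of case (for example after a gene duplication) where tree-based or synteny-based evidence may be needed.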
Affiliation(s)
- David M Kristensen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

13. Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Muertter RN, Holko M, Ayanbule O, Yefanov A, Soboleva A. NCBI GEO: archive for functional genomics data sets--10 years on. Nucleic Acids Res 2010; 39:D1005-10. [PMID: 21097893 PMCID: PMC3013736 DOI: 10.1093/nar/gkq1184] [Citation(s) in RCA: 798] [Impact Index Per Article: 57.0] [Indexed: 12/19/2022] Open
Abstract
A decade ago, the Gene Expression Omnibus (GEO) database was established at the National Center for Biotechnology Information (NCBI). The original objective of GEO was to serve as a public repository for high-throughput gene expression data generated mostly by microarray technology. However, the research community quickly applied microarrays to non-gene-expression studies, including examination of genome copy number variation and genome-wide profiling of DNA-binding proteins. Because the GEO database was designed with a flexible structure, it was possible to quickly adapt the repository to store these data types. More recently, as the microarray community switches to next-generation sequencing technologies, GEO has again adapted to host these data sets. Today, GEO stores over 20,000 microarray- and sequence-based functional genomics studies, and continues to handle the majority of direct high-throughput data submissions from the research community. Multiple mechanisms are provided to help users effectively search, browse, download and visualize the data at the level of individual genes or entire studies. This paper describes recent database enhancements, including new search and data representation tools, as well as a brief review of how the community uses GEO data. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.
Affiliation(s)
- Tanya Barrett
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892, USA