Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Börnigen D, Tranchevent LC, Bonachela-Capdevila F, Devriendt K, De Moor B, De Causmaecker P, Moreau Y. An unbiased evaluation of gene prioritization tools. Bioinformatics 2012;28:3081-8. [PMID: 23047555 DOI: 10.1093/bioinformatics/bts581] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.7] [Indexed: 02/06/2023] Open

Cited by Other Article(s)

Molotkov I, Artomov M. Detecting biased validation of predictive models in the positive-unlabeled setting: disease gene prioritization case study. Bioinformatics Advances 2023;3:vbad128. [PMID: 37745001 PMCID: PMC10517638 DOI: 10.1093/bioadv/vbad128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 08/13/2023] [Accepted: 09/12/2023] [Indexed: 09/26/2023]

Boguslav MR, Salem NM, White EK, Sullivan KJ, Bada M, Hernandez TL, Leach SM, Hunter LE. Creating an ignorance-base: Exploring known unknowns in the scientific literature. J Biomed Inform 2023;143:104405. [PMID: 37270143 PMCID: PMC10528083 DOI: 10.1016/j.jbi.2023.104405] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/08/2022] [Revised: 05/18/2023] [Accepted: 05/21/2023] [Indexed: 06/05/2023]

Abstract

BACKGROUND

Scientific discovery progresses by exploring new and uncharted territory. More specifically, it advances by a process of transforming unknown unknowns first into known unknowns, and then into knowns. Over the last few decades, researchers have developed many knowledge bases to capture and connect the knowns, which has enabled topic exploration and contextualization of experimental results. But recognizing the unknowns is also critical for finding the most pertinent questions and their answers. Prior work on known unknowns has sought to understand them, annotate them, and automate their identification. However, no knowledge-bases yet exist to capture these unknowns, and little work has focused on how scientists might use them to trace a given topic or experimental result in search of open questions and new avenues for exploration. We show here that a knowledge base of unknowns can be connected to ontologically grounded biomedical knowledge to accelerate research in the field of prenatal nutrition.

RESULTS

We present the first ignorance-base, a knowledge-base created by combining classifiers to recognize ignorance statements (statements of missing or incomplete knowledge that imply a goal for knowledge) and biomedical concepts over the prenatal nutrition literature. This knowledge-base places biomedical concepts mentioned in the literature in context with the ignorance statements authors have made about them. Using our system, researchers interested in the topic of vitamin D and prenatal health were able to uncover three new avenues for exploration (immune system, respiratory system, and brain development) by searching for concepts enriched in ignorance statements. These were buried among the many standard enriched concepts. Additionally, we used the ignorance-base to enrich concepts connected to a gene list associated with vitamin D and spontaneous preterm birth and found an emerging topic of study (brain development) in an implied field (neuroscience). The researchers could look to the field of neuroscience for potential answers to the ignorance statements.

CONCLUSION

Our goal is to help students, researchers, funders, and publishers better understand the state of our collective scientific ignorance (known unknowns) in order to help accelerate research through the continued illumination of and focus on the known unknowns and their respective goals for scientific knowledge.

Rahaie Z, Rabiee HR, Alinejad-Rokny H. DeepGenePrior: A deep learning model for prioritizing genes affected by copy number variants. PLoS Comput Biol 2023;19:e1011249. [PMID: 37486921 PMCID: PMC10399873 DOI: 10.1371/journal.pcbi.1011249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 08/03/2023] [Accepted: 06/06/2023] [Indexed: 07/26/2023] Open

Henry OJ, Stödberg T, Båtelson S, Rasi C, Stranneheim H, Wedell A. Individualised human phenotype ontology gene panels improve clinical whole exome and genome sequencing analytical efficacy in a cohort of developmental and epileptic encephalopathies. Mol Genet Genomic Med 2023;11:e2167. [PMID: 36967109 PMCID: PMC10337286 DOI: 10.1002/mgg3.2167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 02/21/2023] [Accepted: 03/01/2023] [Indexed: 07/20/2023] Open

Zhang Y, Chen L, Li S. CIPHER-SC: Disease-Gene Association Inference Using Graph Convolution on a Context-Aware Network With Single-Cell Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022;19:819-829. [PMID: 32809944 DOI: 10.1109/tcbb.2020.3017547] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]

Lastra-Díaz JJ, Lara-Clares A, Garcia-Serrano A. HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey. BMC Bioinformatics 2022;23:23. [PMID: 34991460 PMCID: PMC8734250 DOI: 10.1186/s12859-021-04539-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2020] [Accepted: 12/15/2021] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Ontology-based semantic similarity measures based on SNOMED-CT, MeSH, and Gene Ontology are being extensively used in many applications in biomedical text mining and genomics respectively, which has encouraged the development of semantic measures libraries based on the aforementioned ontologies. However, current state-of-the-art semantic measures libraries have some performance and scalability drawbacks derived from their ontology representations based on relational databases, or naive in-memory graph representations. Likewise, a recent reproducible survey on word similarity shows that one hybrid IC-based measure which integrates a shortest-path computation sets the state of the art in the family of ontology-based semantic measures. However, the lack of an efficient shortest-path algorithm for their real-time computation prevents both their practical use in any application and the use of any other path-based semantic similarity measure.

RESULTS

To bridge the two aforementioned gaps, this work introduces, for the first time, an updated version of the HESML Java software library especially designed for the biomedical domain. It implements the most efficient and scalable ontology representation reported in the literature, together with a new method for approximating Dijkstra's algorithm on taxonomies, called Ancestors-based Shortest-Path Length (AncSPL), which allows the real-time computation of any path-based semantic similarity measure.

CONCLUSIONS

We introduce a set of reproducible benchmarks showing that HESML outperforms the current state-of-the-art libraries by several orders of magnitude on the three aforementioned biomedical ontologies, and demonstrating the real-time performance and approximation quality of the new AncSPL shortest-path algorithm. Likewise, we show that AncSPL scales linearly with the size of the common ancestor subgraph regardless of the ontology size. Path-based measures based on the new AncSPL algorithm are up to six orders of magnitude faster than their exact implementation in large ontologies like SNOMED-CT and GO. Finally, we provide a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.
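
The ancestor-restricted shortest-path idea named in this abstract can be illustrated with a small sketch. This is only one plausible reading of the abstract, not the HESML implementation: it assumes the search is confined to the subgraph induced by the ancestors of the two concepts, and the function names, the `TOY_PARENTS` taxonomy, and the unit edge weights are all hypothetical.

```python
import heapq

# Hypothetical toy taxonomy: each concept maps to its direct parents.
TOY_PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A", "B"],
    "E": ["D"],
}


def ancestors(node, parents):
    """Return the node together with all of its ancestors."""
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ancestor_restricted_path_length(a, b, parents, weight=lambda child, parent: 1.0):
    """Dijkstra search confined to the ancestors of a and b (assumed reading).

    Paths through concepts that are not ancestors of either endpoint are
    ignored, so the result can exceed the true shortest-path length.
    """
    allowed = ancestors(a, parents) | ancestors(b, parents)
    # Build an undirected adjacency list over the allowed concepts only.
    adjacency = {node: [] for node in allowed}
    for child in allowed:
        for parent in parents.get(child, ()):
            w = weight(child, parent)
            adjacency[child].append((parent, w))
            adjacency[parent].append((child, w))
    best = {a: 0.0}
    heap = [(0.0, a)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == b:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, w in adjacency[node]:
            candidate = dist + w
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return float("inf")
```

Because the search only visits the two ancestor sets, its cost depends on the size of that subgraph rather than on the size of the whole ontology, which is consistent with the scaling behaviour the abstract reports.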

Yue Z, Yan D, Guo G, Chen JY. Biological Network Mining. Methods Mol Biol 2021;2328:139-151. [PMID: 34251623 DOI: 10.1007/978-1-0716-1534-8_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]

Guerra C, Joshi S, Lu Y, Palini F, Ferraro Petrillo U, Rossignac J. Rank-Similarity Measures for Comparing Gene Prioritizations: A Case Study in Autism. J Comput Biol 2020;28:283-295. [PMID: 33103913 DOI: 10.1089/cmb.2020.0244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open

Paliwal S, de Giorgio A, Neil D, Michel JB, Lacoste AM. Preclinical validation of therapeutic targets predicted by tensor factorization on heterogeneous graphs. Sci Rep 2020;10:18250. [PMID: 33106501 PMCID: PMC7589557 DOI: 10.1038/s41598-020-74922-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/22/2020] [Accepted: 09/30/2020] [Indexed: 12/04/2022] Open

Jiang S, Zhang CY, Tang L, Zhao LX, Chen HZ, Qiu Y. Integrated Genomic Analysis Revealed Associated Genes for Alzheimer's Disease in APOE4 Non-Carriers. Curr Alzheimer Res 2020;16:753-763. [PMID: 31441725 DOI: 10.2174/1567205016666190823124724] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/22/2019] [Revised: 07/14/2019] [Accepted: 08/08/2019] [Indexed: 12/31/2022]

Hwang S, Kim CY, Yang S, Kim E, Hart T, Marcotte EM, Lee I. HumanNet v2: human gene networks for disease research. Nucleic Acids Res 2020;47:D573-D580. [PMID: 30418591 PMCID: PMC6323914 DOI: 10.1093/nar/gky1126] [Citation(s) in RCA: 114] [Impact Index Per Article: 28.5] [Received: 08/14/2018] [Accepted: 10/25/2018] [Indexed: 12/15/2022] Open

Cabrera-Andrade A, López-Cortés A, Jaramillo-Koupermann G, Paz-y-Miño C, Pérez-Castillo Y, Munteanu CR, González-Díaz H, Pazos A, Tejera E. Gene Prioritization through Consensus Strategy, Enrichment Methodologies Analysis, and Networking for Osteosarcoma Pathogenesis. Int J Mol Sci 2020;21:E1053. [PMID: 32033398 PMCID: PMC7038221 DOI: 10.3390/ijms21031053] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/10/2019] [Revised: 01/30/2020] [Accepted: 01/30/2020] [Indexed: 12/12/2022] Open

Affiliation(s)

Alejandro Cabrera-Andrade Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito 170125, Ecuador; Carrera de Enfermería, Facultad de Ciencias de la Salud, Universidad de Las Américas, Quito 170125, Ecuador; RNASA-IMEDIR, Computer Sciences Faculty, University of A Coruna, 15071 A Coruña, Spain
Andrés López-Cortés RNASA-IMEDIR, Computer Sciences Faculty, University of A Coruna, 15071 A Coruña, Spain; Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito 170129, Ecuador
Gabriela Jaramillo-Koupermann Laboratorio de Biología Molecular, Subproceso de Anatomía Patológica, Hospital de Especialidades Eugenio Espejo, Quito 170403, Ecuador
César Paz-y-Miño Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito 170129, Ecuador
Yunierkis Pérez-Castillo Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito 170125, Ecuador; Escuela de Ciencias Físicas y Matemáticas, Universidad de Las Américas, Quito 170125, Ecuador
Cristian R. Munteanu RNASA-IMEDIR, Computer Sciences Faculty, University of A Coruna, 15071 A Coruña, Spain; Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), 15006 A Coruña, Spain; Centro de Investigación en Tecnologías de la Información y las Comunicaciones (CITIC), Campus de Elviña s/n, 15071 A Coruña, Spain
Humberto González-Díaz Department of Organic Chemistry II, University of the Basque Country UPV/EHU, 48940 Leioa, Spain; IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain
Alejandro Pazos RNASA-IMEDIR, Computer Sciences Faculty, University of A Coruna, 15071 A Coruña, Spain; Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), 15006 A Coruña, Spain; Centro de Investigación en Tecnologías de la Información y las Comunicaciones (CITIC), Campus de Elviña s/n, 15071 A Coruña, Spain
Eduardo Tejera Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito 170125, Ecuador; Facultad de Ingeniería y Ciencias Agropecuarias, Universidad de Las Américas, Quito 170125, Ecuador

Tran VD, Sperduti A, Backofen R, Costa F. Heterogeneous networks integration for disease–gene prioritization with node kernels. Bioinformatics 2020;36:2649-2656. [DOI: 10.1093/bioinformatics/btaa008] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/22/2019] [Revised: 12/19/2019] [Accepted: 01/23/2020] [Indexed: 12/21/2022] Open

Zolotareva O, Kleine M. A Survey of Gene Prioritization Tools for Mendelian and Complex Human Diseases. J Integr Bioinform 2019;16:/j/jib.ahead-of-print/jib-2018-0069/jib-2018-0069.xml. [PMID: 31494632 PMCID: PMC7074139 DOI: 10.1515/jib-2018-0069] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 09/06/2018] [Accepted: 07/12/2019] [Indexed: 12/16/2022] Open

Yue Z, Willey CD, Hjelmeland AB, Chen JY. BEERE: a web server for biomedical entity expansion, ranking and explorations. Nucleic Acids Res 2019;47:W578-W586. [PMID: 31114876 PMCID: PMC6602520 DOI: 10.1093/nar/gkz428] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/14/2019] [Revised: 05/04/2019] [Accepted: 05/20/2019] [Indexed: 12/02/2022] Open

GPS: Identification of disease genes by rank aggregation of multi-genomic scoring schemes. Genomics 2019;111:612-618. [DOI: 10.1016/j.ygeno.2018.03.017] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/20/2018] [Revised: 03/16/2018] [Accepted: 03/21/2018] [Indexed: 12/19/2022]

Kiblawi S, Chasman D, Henning A, Park E, Poon H, Gould M, Ahlquist P, Craven M. Augmenting subnetwork inference with information extracted from the scientific literature. PLoS Comput Biol 2019;15:e1006758. [PMID: 31246951 PMCID: PMC6619809 DOI: 10.1371/journal.pcbi.1006758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2018] [Revised: 07/10/2019] [Accepted: 01/04/2019] [Indexed: 11/20/2022] Open

Abstract

Many biological studies involve either (i) manipulating some aspect of a cell or its environment and then simultaneously measuring the effect on thousands of genes, or (ii) systematically manipulating each gene and then measuring the effect on some response of interest. A common challenge that arises in these studies is to explain how genes identified as relevant in the given experiment are organized into a subnetwork that accounts for the response of interest. The task of inferring a subnetwork is typically dependent on the information available in publicly available, structured databases, which suffer from incompleteness. However, a wealth of potentially relevant information resides in the scientific literature, such as information about genes associated with certain concepts of interest, as well as interactions that occur among various biological entities. We contend that by exploiting this information, we can improve the explanatory power and accuracy of subnetwork inference in multiple applications. Here we propose and investigate several ways in which information extracted from the scientific literature can be used to augment subnetwork inference. We show that we can use literature-extracted information to (i) augment the set of entities identified as being relevant in a subnetwork inference task, (ii) augment the set of interactions used in the process, and (iii) support targeted browsing of a large inferred subnetwork by identifying entities and interactions that are closely related to concepts of interest. We use this approach to uncover the pathways involved in interactions between a virus and a host cell, and the pathways that are regulated by a transcription factor associated with breast cancer. Our experimental results demonstrate that these approaches can provide more accurate and more interpretable subnetworks. 
Integer program code, background network data, and pathfinding code are available at https://github.com/Craven-Biostat-Lab/subnetwork_inference

There is a multitude of publicly available databases that contain information about biological entities (i.e., genes, proteins, and other small molecules) as well as information about how these entities interact together. However, these databases are often incomplete. There is a wealth of information present in the text of the scientific literature that is not yet available in these databases. Using tools that mine the scientific literature we are able to extract some of this potentially relevant information. In this work we show how we can use publicly available databases in conjunction with the information extracted from the scientific literature to infer the networks that are involved in specific biological processes, such as viral replication and cancer tumor growth.

Fine RS, Pers TH, Amariuta T, Raychaudhuri S, Hirschhorn JN. Benchmarker: An Unbiased, Association-Data-Driven Strategy to Evaluate Gene Prioritization Algorithms. Am J Hum Genet 2019;104:1025-1039. [PMID: 31056107 PMCID: PMC6556976 DOI: 10.1016/j.ajhg.2019.03.027] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 12/14/2018] [Accepted: 03/28/2019] [Indexed: 01/17/2023] Open

Affiliation(s)

Rebecca S Fine Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Division of Endocrinology and Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Ph.D. Program in Biological and Biomedical Sciences, Graduate School of Arts and Sciences, Harvard University, Cambridge, MA 02138, USA
Tune H Pers The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark; Department of Epidemiology Research, Statens Serum Institut, 2300 Copenhagen, Denmark
Tiffany Amariuta Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA; Ph.D. Program in Bioinformatics and Integrative Genomics, Graduate School of Arts and Sciences, Harvard University, Cambridge, MA 02138, USA
Soumya Raychaudhuri Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA; Arthritis Research UK Centre for Genetics and Genomics, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, The University of Manchester, Manchester M13 9PL, UK
Joel N Hirschhorn Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Division of Endocrinology and Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA.

Tomar S, Sethi R, Lai PS. Specific phenotype semantics facilitate gene prioritization in clinical exome sequencing. Eur J Hum Genet 2019;27:1389-1397. [PMID: 31053788 DOI: 10.1038/s41431-019-0412-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/16/2018] [Revised: 02/21/2019] [Accepted: 04/15/2019] [Indexed: 12/13/2022] Open

Stacey D, Fauman EB, Ziemek D, Sun BB, Harshfield EL, Wood AM, Butterworth AS, Suhre K, Paul DS. ProGeM: a framework for the prioritization of candidate causal genes at molecular quantitative trait loci. Nucleic Acids Res 2019;47:e3. [PMID: 30239796 PMCID: PMC6326795 DOI: 10.1093/nar/gky837] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 12/06/2017] [Revised: 08/31/2018] [Accepted: 09/11/2018] [Indexed: 12/27/2022] Open

Luo P, Tian LP, Ruan J, Wu FX. Disease Gene Prediction by Integrating PPI Networks, Clinical RNA-Seq Data and OMIM Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019;16:222-232. [PMID: 29990218 DOI: 10.1109/tcbb.2017.2770120] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Indexed: 06/08/2023]

López-Cortés A, Paz-Y-Miño C, Cabrera-Andrade A, Barigye SJ, Munteanu CR, González-Díaz H, Pazos A, Pérez-Castillo Y, Tejera E. Gene prioritization, communality analysis, networking and metabolic integrated pathway to better understand breast cancer pathogenesis. Sci Rep 2018;8:16679. [PMID: 30420728 PMCID: PMC6232116 DOI: 10.1038/s41598-018-35149-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 04/13/2018] [Accepted: 10/16/2018] [Indexed: 12/30/2022] Open

Affiliation(s)

Andrés López-Cortés Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Mariscal Sucre Avenue, 170129, Quito, Ecuador; RNASA-IMEDIR, Computer Sciences Faculty, University of Coruna, 15071, Coruna, Spain
César Paz-Y-Miño Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Mariscal Sucre Avenue, 170129, Quito, Ecuador
Alejandro Cabrera-Andrade Carrera de Enfermería, Facultad de Ciencias de la Salud, Universidad de las Américas, Avenue de los Granados, 170125, Quito, Ecuador; Grupo de Bio-Quimioinformática, Universidad de las Américas, Avenue de los Granados, 170125, Quito, Ecuador
Stephen J Barigye Department of Chemistry, McGill University, 801 Sherbrooke Street West, Montreal, QC, H3A 0B8, Canada
Cristian R Munteanu RNASA-IMEDIR, Computer Sciences Faculty, University of Coruna, 15071, Coruna, Spain; INIBIC, Institute of Biomedical Research, CHUAC, UDC, 15006, Coruna, Spain
Humberto González-Díaz Department of Organic Chemistry II, University of the Basque Country UPV/EHU, 48940, Leioa, Biscay, Spain; IKERBASQUE, Basque Foundation for Science, 48011, Bilbao, Biscay, Spain
Alejandro Pazos RNASA-IMEDIR, Computer Sciences Faculty, University of Coruna, 15071, Coruna, Spain; INIBIC, Institute of Biomedical Research, CHUAC, UDC, 15006, Coruna, Spain
Yunierkis Pérez-Castillo Grupo de Bio-Quimioinformática, Universidad de las Américas, Avenue de los Granados, 170125, Quito, Ecuador; Escuela de Ciencias Físicas y Matemáticas, Universidad de las Américas, Avenue de los Granados, 170125, Quito, Ecuador
Eduardo Tejera Grupo de Bio-Quimioinformática, Universidad de las Américas, Avenue de los Granados, 170125, Quito, Ecuador; Facultad de Ingeniería y Ciencias Agropecuarias, Universidad de las Américas, Avenue de los Granados, 170125, Quito, Ecuador

Rosenthal SB, Len J, Webster M, Gary A, Birmingham A, Fisch KM. Interactive network visualization in Jupyter notebooks: visJS2jupyter. Bioinformatics 2018;34:126-128. [PMID: 28968701 DOI: 10.1093/bioinformatics/btx581] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 05/10/2017] [Accepted: 09/13/2017] [Indexed: 01/20/2023] Open

Zampieri G, Tran DV, Donini M, Navarin N, Aiolli F, Sperduti A, Valle G. Scuba: scalable kernel-based gene prioritization. BMC Bioinformatics 2018;19:23. [PMID: 29370760 PMCID: PMC5785908 DOI: 10.1186/s12859-018-2025-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 12/30/2016] [Accepted: 01/15/2018] [Indexed: 01/01/2023] Open

Bolgár B, Antal P. VB-MK-LMF: fusion of drugs, targets and interactions using variational Bayesian multiple kernel logistic matrix factorization. BMC Bioinformatics 2017;18:440. [PMID: 28978313 PMCID: PMC5628496 DOI: 10.1186/s12859-017-1845-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 06/30/2017] [Accepted: 09/21/2017] [Indexed: 12/20/2022] Open

Abstract

BACKGROUND

Computational fusion approaches to drug-target interaction (DTI) prediction, capable of utilizing multiple sources of background knowledge, were reported to achieve superior predictive performance in multiple studies. Other studies showed that specificities of the DTI task, such as weighting the observations and focusing the side information, are also vital for reaching top performance.

METHOD

We present Variational Bayesian Multiple Kernel Logistic Matrix Factorization (VB-MK-LMF), which unifies the advantages of (1) multiple kernel learning, (2) weighted observations, (3) graph Laplacian regularization, and (4) explicit modeling of probabilities of binary drug-target interactions.

RESULTS

VB-MK-LMF achieves significantly better predictive performance in standard benchmarks compared to state-of-the-art methods, which can be traced back to multiple factors. The systematic evaluation of the effect of multiple kernels confirms their benefits, but also highlights the limitations of linear kernel combinations, already recognized in other fields. The analysis of the effect of prior kernels using varying sample sizes sheds light on the balance of data and knowledge in DTI tasks and on the rate at which the effect of priors vanishes. This also shows the existence of "small sample size" regions where using side information offers significant gains. Alongside favorable predictive performance, a notable property of MF methods is that they provide a unified space for drugs and targets using latent representations. Compared to earlier studies, the dimensionality of this space proved to be surprisingly low, which makes the latent representations constructed by VB-MK-LMF especially well-suited for visual analytics. The probabilistic nature of the predictions allows the calculation of the expected values of hits in functionally relevant sets, which we demonstrate by predicting drug promiscuity. The variational Bayesian approximation is also implemented for general purpose graphics processing units, yielding significantly improved computational time.

CONCLUSION

In standard benchmarks, VB-MK-LMF shows significantly improved predictive performance in a wide range of settings. Beyond these benchmarks, another contribution of our work is highlighting and providing estimates for further pharmaceutically relevant quantities, such as promiscuity, druggability and total number of interactions.

Frasca M. Gene2DisCo: Gene to disease using disease commonalities. Artif Intell Med 2017;82:34-46. [DOI: 10.1016/j.artmed.2017.08.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 05/13/2017] [Revised: 07/24/2017] [Accepted: 08/13/2017] [Indexed: 01/10/2023]

Tejera E, Cruz-Monteagudo M, Burgos G, Sánchez ME, Sánchez-Rodríguez A, Pérez-Castillo Y, Borges F, Cordeiro MNDS, Paz-Y-Miño C, Rebelo I. Consensus strategy in genes prioritization and combined bioinformatics analysis for preeclampsia pathogenesis. BMC Med Genomics 2017;10:50. [PMID: 28789679 PMCID: PMC5549357 DOI: 10.1186/s12920-017-0286-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 02/08/2017] [Accepted: 07/28/2017] [Indexed: 12/19/2022] Open

Abstract

BACKGROUND

Preeclampsia is a multifactorial disease with unknown pathogenesis. Although recent studies have explored this disease using several bioinformatics tools, their main objective was not directed at pathogenesis. Additionally, consensus prioritization has proved to be highly efficient in the recognition of gene-disease associations. However, no information is available about the ability of consensus approaches to recognize early the genes directly involved in pathogenesis. Therefore, our aim in this study is to apply several theoretical approaches to explore preeclampsia, specifically those genes directly involved in its pathogenesis.

METHODS

We first evaluated the consensus among 12 prioritization strategies for early recognition of pathogenic genes related to preeclampsia. A communality analysis of the protein-protein interaction network of the previously selected genes was then performed, followed by enrichment analysis covering metabolic pathways and gene ontology. Microarray data were also collected and used either to confirm our results or to weight the previously enriched pathways.
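The abstract does not say how the 12 strategies were combined into a consensus. As a rough illustration only, the sketch below uses one simple and common aggregation scheme, the mean normalized rank. The function name `consensus_ranking`, the gene labels, and the treatment of missing genes are all assumptions made for this example, not the authors' method.

```python
from collections import defaultdict


def consensus_ranking(rankings):
    """Combine several ranked gene lists into one consensus list.

    Each input is a list of gene labels, best candidate first. A gene's
    score in one list is its position divided by the list length (0 = top).
    Genes missing from a list get the worst score, 1.0 (an assumption of
    this sketch). The consensus score is the mean over all lists, and
    lower is better.
    """
    genes = {g for ranking in rankings for g in ranking}
    totals = defaultdict(float)
    for ranking in rankings:
        position = {g: i / len(ranking) for i, g in enumerate(ranking)}
        for g in genes:
            totals[g] += position.get(g, 1.0)
    n = len(rankings)
    # Sort by mean score; break ties alphabetically so output is deterministic.
    return sorted(genes, key=lambda g: (totals[g] / n, g))


# Toy input: three hypothetical tools ranking placeholder genes G1..G4.
tool_a = ["G1", "G2", "G3", "G4"]
tool_b = ["G2", "G1", "G4", "G3"]
tool_c = ["G1", "G3", "G2"]  # this tool did not score G4
print(consensus_ranking([tool_a, tool_b, tool_c]))
```

Averaging normalized ranks rather than raw scores sidesteps the fact that different tools report scores on incomparable scales.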

RESULTS

The consensus prioritized gene list was rationally filtered to 476 genes using several criteria. The communality analysis showed an enrichment of communities connected with the VEGF-signaling pathway. This pathway is also enriched in the microarray data. Our results point to VEGF, FLT1 and KDR as relevant pathogenic genes, as well as those connected with NO metabolism.

CONCLUSION

Our results revealed that a consensus strategy improves the detection and initial enrichment of pathogenic genes, at least in preeclampsia. Moreover, combining the top 1% of the prioritized genes with the protein-protein interaction network, followed by communality analysis, reduces the gene space. This approach identifies well-known genes related to pathogenesis. However, genes like HSP90, PAK2, CD247 and others included in the top 1% of the prioritized list need to be further explored in preeclampsia pathogenesis through experimental approaches.

Hassani-Pak K, Rawlings C. Knowledge Discovery in Biological Databases for Revealing Candidate Genes Linked to Complex Phenotypes. J Integr Bioinform 2017;14:/j/jib.ahead-of-print/jib-2016-0002/jib-2016-0002.xml. [PMID: 28609292 PMCID: PMC6042805 DOI: 10.1515/jib-2016-0002] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 01/12/2017] [Accepted: 02/16/2017] [Indexed: 02/06/2023] Open

Guala D, Sonnhammer ELL. A large-scale benchmark of gene prioritization methods. Sci Rep 2017;7:46598. [PMID: 28429739 PMCID: PMC5399445 DOI: 10.1038/srep46598] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 06/09/2016] [Accepted: 03/22/2017] [Indexed: 11/16/2022] Open

GenePANDA-a novel network-based gene prioritizing tool for complex diseases. Sci Rep 2017;7:43258. [PMID: 28252032 PMCID: PMC5333103 DOI: 10.1038/srep43258] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 09/06/2016] [Accepted: 01/23/2017] [Indexed: 02/08/2023] Open

Liu JL, Zhao M. A PubMed-wide study of endometriosis. Genomics 2016;108:151-157. [DOI: 10.1016/j.ygeno.2016.10.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/14/2016] [Revised: 09/30/2016] [Accepted: 10/12/2016] [Indexed: 12/18/2022]

Li J, Lin X, Teng Y, Qi S, Xiao D, Zhang J, Kang Y. A Comprehensive Evaluation of Disease Phenotype Networks for Gene Prioritization. PLoS One 2016;11:e0159457. [PMID: 27415759 PMCID: PMC4944959 DOI: 10.1371/journal.pone.0159457] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 02/03/2016] [Accepted: 07/01/2016] [Indexed: 12/31/2022] Open

Börnigen D, Tyekucheva S, Wang X, Rider JR, Lee GS, Mucci LA, Sweeney C, Huttenhower C. Computational Reconstruction of NFκB Pathway Interaction Mechanisms during Prostate Cancer. PLoS Comput Biol 2016;12:e1004820. [PMID: 27078000 PMCID: PMC4831844 DOI: 10.1371/journal.pcbi.1004820] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 07/15/2015] [Accepted: 02/19/2016] [Indexed: 12/21/2022] Open

Abstract

Molecular research in cancer is one of the largest areas of bioinformatic investigation, but it remains a challenge to understand biomolecular mechanisms in cancer-related pathways from high-throughput genomic data. This includes the Nuclear-factor-kappa-B (NFκB) pathway, which is central to the inflammatory response and cell proliferation in prostate cancer development and progression. Despite close scrutiny and a deep understanding of many of its members’ biomolecular activities, the current list of pathway members and a systems-level understanding of their interactions remain incomplete. Here, we provide the first steps toward computational reconstruction of interaction mechanisms of the NFκB pathway in prostate cancer. We identified novel roles for ATF3, CXCL2, DUSP5, JUNB, NEDD9, SELE, TRIB1, and ZFP36 in this pathway, in addition to new mechanistic interactions between these genes and 10 known NFκB pathway members. A newly predicted interaction between NEDD9 and ZFP36 in particular was validated by co-immunoprecipitation, as was NEDD9's potential biological role in prostate cancer cell growth regulation. We combined 651 gene expression datasets with 1.4M gene product interactions to predict the inclusion of 40 additional genes in the pathway. Molecular mechanisms of interaction among pathway members were inferred using recent advances in Bayesian data integration to simultaneously provide information specific to biological contexts and individual biomolecular activities, resulting in a total of 112 interactions in the fully reconstructed NFκB pathway: 13 (11%) previously known, 29 (26%) supported by existing literature, and 70 (63%) novel. This method is generalizable to other tissue types, cancers, and organisms, and this new information about the NFκB pathway will allow us to further understand prostate cancer and to develop more effective prevention and treatment strategies.

In molecular cancer research, it remains challenging to uncover biomolecular mechanisms in cancer-related pathways, including the Nuclear-factor-kappa-B (NFκB) pathway, from high-throughput genomic data. Despite close scrutiny and a deep understanding of many of the NFκB pathway members’ biomolecular activities, the current list of pathway members and a systems-level understanding of their interactions remain incomplete. In this study, we provide the first steps toward computational reconstruction of interaction mechanisms of the NFκB pathway in prostate cancer. We identified novel roles for 8 genes in this pathway and new mechanistic interactions between these genes and 10 known pathway members. We combined 651 gene expression datasets with 1.4M interactions to predict the inclusion of 40 additional genes in the pathway. Molecular mechanisms of interaction were inferred using recent advances in Bayesian data integration to simultaneously provide information specific to biological contexts and individual biomolecular activities, resulting in 112 interactions in the fully reconstructed NFκB pathway. This method is generalizable, and this new information about the NFκB pathway will allow us to further understand prostate cancer.

Sung YJ, Pérusse L, Sarzynski MA, Fornage M, Sidney S, Sternfeld B, Rice T, Terry G, Jacobs DR, Katzmarzyk P, Curran JE, Carr JJ, Blangero J, Ghosh S, Després JP, Rankinen T, Rao D, Bouchard C. Genome-wide association studies suggest sex-specific loci associated with abdominal and visceral fat. Int J Obes (Lond) 2016;40:662-74. [PMID: 26480920 PMCID: PMC4821694 DOI: 10.1038/ijo.2015.217] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Received: 06/11/2015] [Revised: 10/05/2015] [Accepted: 10/06/2015] [Indexed: 12/21/2022]

Abstract

BACKGROUND

To identify loci associated with abdominal fat and replicate prior findings, we performed genome-wide association (GWA) studies of abdominal fat traits: subcutaneous adipose tissue (SAT); visceral adipose tissue (VAT); total adipose tissue (TAT) and visceral to subcutaneous adipose tissue ratio (VSR).

SUBJECTS AND METHODS

Sex-combined and sex-stratified analyses were performed on each trait with (TRAIT-BMI) or without (TRAIT) adjustment for body mass index (BMI), and cohort-specific results were combined via a fixed effects meta-analysis. A total of 2513 subjects of European descent were available for the discovery phase. For replication, 2171 European Americans and 772 African Americans were available.
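The abstract states that cohort results were pooled with a fixed effects meta-analysis but does not give the weighting. The sketch below assumes the usual inverse-variance weighting, in which each cohort's effect estimate is weighted by the reciprocal of its squared standard error. The function name and the input numbers are hypothetical and are not taken from the study.

```python
import math


def fixed_effects_meta(estimates, std_errors):
    """Inverse-variance weighted fixed-effects meta-analysis.

    estimates:  per-cohort effect sizes (e.g. regression betas for one SNP)
    std_errors: the matching per-cohort standard errors
    Returns (pooled effect, pooled standard error, z statistic).
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se, pooled / pooled_se


# Hypothetical effect sizes for one SNP in two cohorts of equal precision.
beta, se, z = fixed_effects_meta([0.10, 0.20], [0.05, 0.05])
print(f"pooled beta={beta:.3f}, SE={se:.4f}, z={z:.2f}")
```

With equal standard errors the pooled estimate is simply the average of the cohort estimates, and the pooled standard error shrinks by a factor of the square root of the number of cohorts.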

RESULTS

A total of 52 single-nucleotide polymorphisms (SNPs) encompassing 7 loci showed suggestive evidence of association (P < 1.0 × 10⁻⁶) with abdominal fat in the sex-combined analyses. The strongest evidence was found on chromosome 7p14.3 between a SNP near the BBS9 gene and VAT (rs12374818; P = 1.10 × 10⁻⁷), an association that was replicated (P = 0.02). For the BMI-adjusted trait, the strongest evidence of association was found between a SNP near CYCSP30 and VAT-BMI (rs10506943; P = 2.42 × 10⁻⁷). Our sex-specific analyses identified one genome-wide significant (P < 5.0 × 10⁻⁸) locus for SAT in women with 11 SNPs encompassing the MLLT10, DNAJC1 and EBLN1 genes on chromosome 10p12.31 (P = 3.97 × 10⁻⁸ to 1.13 × 10⁻⁸). The THNSL2 gene previously associated with VAT in women was also replicated (P = 0.006). The six genes/loci showing the strongest evidence of association with VAT or VAT-BMI were interrogated for their functional links with obesity and inflammation using the Biograph knowledge-mining software. Genes showing the closest functional links with obesity and inflammation were ADCY8 and KCNK9, respectively.

CONCLUSIONS

Our results provide evidence for new loci influencing abdominal visceral (BBS9, ADCY8, KCNK9) and subcutaneous (MLLT10/DNAJC1/EBLN1) fat, and confirmed a locus (THNSL2) previously reported to be associated with abdominal fat in women.

Affiliation(s)

Yun Ju Sung: Division of Biostatistics, Washington University School of Medicine, St. Louis, MO
Louis Pérusse: Department of Kinesiology, School of Medicine and Institute of Nutrition and Functional Foods, Laval University, Québec, QC
Mark A. Sarzynski: Human Genomics Laboratory, Pennington Biomedical Research Center, Baton Rouge, LA
Myriam Fornage: Center for Human Genetics, University of Texas Health Science Center, Houston, TX
Steve Sidney: Division of Research, Kaiser Permanente Northern California, Oakland, CA
Barbara Sternfeld: Division of Research, Kaiser Permanente Northern California, Oakland, CA
Treva Rice: Division of Biostatistics, Washington University School of Medicine, St. Louis, MO
Gregg Terry: Department of Radiology, School of Medicine, Vanderbilt University, Nashville, TN
David R. Jacobs: Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, MN
Peter Katzmarzyk: Human Genomics Laboratory, Pennington Biomedical Research Center, Baton Rouge, LA
Joanne E Curran: South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, TX
John Jeffrey Carr: Department of Radiology, School of Medicine, Vanderbilt University, Nashville, TN
John Blangero: South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, TX
Sujoy Ghosh: Cardiovascular and Metabolic Disorders Program and Center for Computational Biology, Duke-NUS Graduate Medical School, Singapore
Jean-Pierre Després: Department of Kinesiology, School of Medicine and Institute of Nutrition and Functional Foods, Laval University, Québec, QC; Centre de recherche de l’Institut universitaire de cardiologie et de pneumologie de Québec, Québec, QC
Tuomo Rankinen: Human Genomics Laboratory, Pennington Biomedical Research Center, Baton Rouge, LA
D.C. Rao: Division of Biostatistics, Washington University School of Medicine, St. Louis, MO
Claude Bouchard: Human Genomics Laboratory, Pennington Biomedical Research Center, Baton Rouge, LA

ElShal S, Tranchevent LC, Sifrim A, Ardeshirdavani A, Davis J, Moreau Y. Beegle: from literature mining to disease-gene discovery. Nucleic Acids Res 2016;44:e18. [PMID: 26384564 PMCID: PMC4737179 DOI: 10.1093/nar/gkv905] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 03/03/2015] [Revised: 08/25/2015] [Accepted: 08/29/2015] [Indexed: 01/06/2023] Open

Weichenberger CX, Blankenburg H, Palermo A, D'Elia Y, König E, Bernstein E, Domingues FS. Dintor: functional annotation of genomic and proteomic data. BMC Genomics 2015;16:1081. [PMID: 26691694 PMCID: PMC4687148 DOI: 10.1186/s12864-015-2279-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 05/20/2015] [Accepted: 12/08/2015] [Indexed: 11/16/2022] Open

Abstract

Background

During the last decade, a great number of extremely valuable large-scale genomics and proteomics datasets have become available to the research community. In addition, dropping costs for conducting high-throughput sequencing experiments and the option to outsource them considerably contribute to an increasing number of researchers becoming active in this field. Even though various computational approaches have been developed to analyze these data, it is still a laborious task involving prudent integration of many heterogeneous and frequently updated data sources, creating a barrier for interested scientists to accomplish their own analysis.

Results

We have implemented Dintor, a data integration framework that provides a set of over 30 tools to assist researchers in the exploration of genomics and proteomics datasets. Each of the tools solves a particular task and several tools can be combined into data processing pipelines. Dintor covers a wide range of frequently required functionalities, from gene identifier conversions and orthology mappings to functional annotation of proteins and genetic variants up to candidate gene prioritization and Gene Ontology-based gene set enrichment analysis. Since the tools operate on constantly changing datasets, we provide a mechanism to unambiguously link tools with different versions of archived datasets, which guarantees reproducible results for future tool invocations. We demonstrate a selection of Dintor’s capabilities by analyzing datasets from four representative publications. The open source software can be downloaded and installed on a local Unix machine. For reasons of data privacy it can be configured to retrieve local data only. In addition, the Dintor tools are available on our public Galaxy web service at http://dintor.eurac.edu.

Conclusions

Dintor is a computational annotation framework for the analysis of genomic and proteomic datasets, providing a rich set of tools that cover the most frequently encountered tasks. A major advantage is its capability to consistently handle multiple versions of tool-associated datasets, supporting the researcher in delivering reproducible results.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-2279-5) contains supplementary material, which is available to authorized users.

Verleyen W, Ballouz S, Gillis J. Positive and negative forms of replicability in gene network analysis. Bioinformatics 2015;32:1065-73. [PMID: 26668004 DOI: 10.1093/bioinformatics/btv734] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 07/13/2015] [Accepted: 12/09/2015] [Indexed: 02/07/2023] Open

Freytag S, Gagnon-Bartsch J, Speed TP, Bahlo M. Systematic noise degrades gene co-expression signals but can be corrected. BMC Bioinformatics 2015;16:309. [PMID: 26403471 PMCID: PMC4583191 DOI: 10.1186/s12859-015-0745-3] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 05/11/2015] [Accepted: 09/16/2015] [Indexed: 12/31/2022] Open

Abstract

Background

In the past decade, the identification of gene co-expression has become a routine part of the analysis of high-dimensional microarray data. Gene co-expression, which is mostly detected via the Pearson correlation coefficient, has played an important role in the discovery of molecular pathways and networks. Unfortunately, the presence of systematic noise in high-dimensional microarray datasets corrupts estimates of gene co-expression. Removing systematic noise from microarray data is therefore crucial. Many cleaning approaches for microarray data exist; however, these methods are aimed at improving differential expression analysis, and their performance has been tested primarily for this application. To our knowledge, the performance of these approaches has never been systematically compared in the context of gene co-expression estimation.
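To make the point about systematic noise concrete, the following sketch computes the Pearson correlation between two expression profiles before and after a shared offset is added to half of the samples. The profiles and the offset are made-up toy values chosen so that the uncontaminated correlation is exactly zero. The example only illustrates how a shared artifact can inflate co-expression and is not the RUV procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (norm_x * norm_y)


# Two toy genes whose true expression is uncorrelated across four samples.
gene_a = [1.0, 2.0, 1.0, 2.0]
gene_b = [2.0, 1.0, 1.0, 2.0]

# A hypothetical batch effect that shifts every gene in samples 3 and 4.
batch = [0.0, 0.0, 5.0, 5.0]
noisy_a = [v + s for v, s in zip(gene_a, batch)]
noisy_b = [v + s for v, s in zip(gene_b, batch)]

print("clean:", pearson(gene_a, gene_b))
print("with shared batch effect:", pearson(noisy_a, noisy_b))
```

Because both genes receive the same shift, the shared component dominates the variance and the correlation moves from zero to nearly one, even though the underlying signals are unrelated.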

Results

Using simulations, we demonstrate that standard cleaning procedures, such as background correction and quantile normalization, fail to adequately remove systematic noise that affects gene co-expression and at times further degrade true gene co-expression. Instead, we show that a global version of removal of unwanted variation (RUV), a data-driven approach, not only removes systematic noise but also allows estimation of the true underlying gene-gene correlations. We compare the performance of all noise removal methods when applied to five large published datasets on gene expression in the human brain. RUV not only retrieves the highest gene co-expression values for sets of genes known to interact but also provides the greatest consistency across all five datasets. We apply the method to prioritize epileptic encephalopathy candidate genes.

Conclusions

Our work raises serious concerns about the quality of many published gene co-expression analyses. RUV provides an efficient and flexible way to remove systematic noise from high-dimensional microarray datasets when the objective is gene co-expression analysis. The RUV method, as applied to gene-gene correlation estimation, is available as the Bioconductor package RUVcorr.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0745-3) contains supplementary material, which is available to authorized users.

NetRanker: A network-based gene ranking tool using protein-protein interaction and gene expression data. BIOCHIP JOURNAL 2015. [DOI: 10.1007/s13206-015-9407-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 02/07/2023]

Antanaviciute A, Watson CM, Harrison SM, Lascelles C, Crinnion L, Markham AF, Bonthron DT, Carr IM. OVA: integrating molecular and physical phenotype data from multiple biomedical domain ontologies with variant filtering for enhanced variant prioritization. Bioinformatics 2015;31:3822-9. [PMID: 26272982 PMCID: PMC4653395 DOI: 10.1093/bioinformatics/btv473] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 05/19/2015] [Accepted: 08/09/2015] [Indexed: 12/13/2022] Open

Antanaviciute A, Daly C, Crinnion LA, Markham AF, Watson CM, Bonthron DT, Carr IM. GeneTIER: prioritization of candidate disease genes using tissue-specific gene expression profiles. Bioinformatics 2015;31:2728-35. [PMID: 25861967 PMCID: PMC4528628 DOI: 10.1093/bioinformatics/btv196] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 08/18/2014] [Accepted: 04/01/2015] [Indexed: 12/12/2022] Open

Priedigkeit N, Wolfe N, Clark NL. Evolutionary signatures amongst disease genes permit novel methods for gene prioritization and construction of informative gene-based networks. PLoS Genet 2015;11:e1004967. [PMID: 25679399 PMCID: PMC4334549 DOI: 10.1371/journal.pgen.1004967] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 07/22/2014] [Accepted: 12/19/2014] [Indexed: 12/27/2022] Open

Abstract

Genes involved in the same function tend to have similar evolutionary histories, in that their rates of evolution covary over time. This coevolutionary signature, termed Evolutionary Rate Covariation (ERC), is calculated using only gene sequences from a set of closely related species and has demonstrated potential as a computational tool for inferring functional relationships between genes. To further define applications of ERC, we first established that roughly 55% of genetic diseases possess an ERC signature between their contributing genes. At a false discovery rate of 5% we report 40 such diseases including cancers, developmental disorders and mitochondrial diseases. Given these coevolutionary signatures between disease genes, we then assessed ERC's ability to prioritize known disease genes out of a list of unrelated candidates. We found that in the presence of an ERC signature, the true disease gene is effectively prioritized to the top 6% of candidates on average. We then apply this strategy to a melanoma-associated region on chromosome 1 and identify MCL1 as a potential causative gene. Furthermore, to gain global insight into disease mechanisms, we used ERC to predict molecular connections between 310 nominally distinct diseases. The resulting “disease map” network associates several diseases with related pathogenic mechanisms and unveils many novel relationships between clinically distinct diseases, such as between Hirschsprung's disease and melanoma. Taken together, these results demonstrate the utility of molecular evolution as a gene discovery platform and show that evolutionary signatures can be used to build informative gene-based networks.

Molecular evolution has informed our understanding of gene function; however, classical methods have largely been static in their implementation, focusing on single genes. Here, we present and prove the utility of a dynamic, network-based understanding of molecular evolution to infer relationships between genes associated with human diseases. We have shown previously that groups of genes within functional niches tend to share similar evolutionary histories. Exploiting the availability of whole genomes from multiple species, these histories can be numerically scored and dynamically compared to one another using a sequence-based signature termed Evolutionary Rate Covariation (ERC). To explore potential applications, we characterized ERC amongst disease genes and found that many diseases contain significant ERC signatures between their contributing genes. We show that ERC can also prioritize “true” disease genes amongst unrelated gene candidates. Lastly, these signatures can serve as a foundation for creating instructive gene-based networks, unveiling novel relationships between diseases thought to be clinically distinct. Our hope is that this study will add to the increasing evidence that advancing our understanding of molecular evolution can be a crucial asset in large-scale gene discovery pursuits (Link to our webserver that provides intuitive ERC analysis tools: http://csb.pitt.edu/erc_analysis/).

Soul J, Hardingham TE, Boot-Handford RP, Schwartz JM. PhenomeExpress: a refined network analysis of expression datasets by inclusion of known disease phenotypes. Sci Rep 2015;5:8117. [PMID: 25631385 PMCID: PMC4822650 DOI: 10.1038/srep08117] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 09/26/2014] [Accepted: 12/19/2014] [Indexed: 12/19/2022] Open

Nandal UK, Vlietstra WJ, Byrman C, Jeeninga RE, Ringrose JH, van Kampen AHC, Speijer D, Moerland PD. Candidate prioritization for low-abundant differentially expressed proteins in 2D-DIGE datasets. BMC Bioinformatics 2015;16:25. [PMID: 25627479 PMCID: PMC4384356 DOI: 10.1186/s12859-015-0455-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/25/2014] [Accepted: 01/09/2015] [Indexed: 01/17/2023] Open

Bargsten JW, Nap JP, Sanchez-Perez GF, van Dijk ADJ. Prioritization of candidate genes in QTL regions based on associations between traits and biological processes. BMC PLANT BIOLOGY 2014;14:330. [PMID: 25492368 PMCID: PMC4274756 DOI: 10.1186/s12870-014-0330-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 09/15/2014] [Accepted: 11/10/2014] [Indexed: 05/18/2023]

Joice R, Yasuda K, Shafquat A, Morgan XC, Huttenhower C. Determining microbial products and identifying molecular targets in the human microbiome. Cell Metab 2014;20:731-741. [PMID: 25440055 PMCID: PMC4254638 DOI: 10.1016/j.cmet.2014.10.003] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.8] [Indexed: 02/08/2023]

Natarajan N, Dhillon IS. Inductive matrix completion for predicting gene-disease associations. Bioinformatics 2014;30:i60-68. [PMID: 24932006 PMCID: PMC4058925 DOI: 10.1093/bioinformatics/btu269] [Citation(s) in RCA: 134] [Impact Index Per Article: 13.4] [Indexed: 02/07/2023] Open

Abstract

MOTIVATION

Most existing methods for predicting causal disease genes rely on a specific type of evidence and are therefore limited in applicability. More often than not, the type of evidence available for diseases varies: for example, we may know linked genes, keywords associated with the disease obtained by mining text, or co-occurrence of disease symptoms in patients. Similarly, the type of evidence available for genes varies: for example, specific microarray probes convey information only for certain sets of genes. In this article, we apply a novel matrix-completion method called Inductive Matrix Completion to the problem of predicting gene-disease associations; it combines multiple types of evidence (features) for diseases and genes to learn latent factors that explain the observed gene-disease associations. We construct features from different biological sources such as microarray expression data and disease-related textual data. A crucial advantage of the method is that it is inductive; it can be applied to diseases not seen at training time, unlike traditional matrix-completion approaches and network-based inference methods that are transductive.
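The abstract gives no formula, so the sketch below assumes the bilinear form commonly used to describe inductive matrix completion: the score for a gene with feature vector x and a disease with feature vector y is x^T W H^T y, where W and H are low-rank factor matrices. The symbols, function names, and all numbers are illustrative. In particular, W and H are filled in by hand here rather than learned from data.

```python
def project(features, factor):
    """Map a length-d feature vector into a length-k latent space.

    factor is a d x k matrix given as a list of d rows.
    """
    if len(features) != len(factor):
        raise ValueError("feature length must match the number of factor rows")
    k = len(factor[0])
    return [sum(f * row[c] for f, row in zip(features, factor)) for c in range(k)]


def imc_score(gene_features, disease_features, W, H):
    """Bilinear association score x^T W H^T y (assumed IMC-style form)."""
    gene_latent = project(gene_features, W)
    disease_latent = project(disease_features, H)
    return sum(g * d for g, d in zip(gene_latent, disease_latent))


# Hand-picked factors: 3 gene features and 2 disease features, rank k = 2.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
H = [[1.0, 0.0], [0.0, 2.0]]

gene = [1.0, 0.0, 1.0]
known_disease = [0.0, 1.0]
new_disease = [1.0, 1.0]  # no known gene associations, only features

print(imc_score(gene, known_disease, W, H))
print(imc_score(gene, new_disease, W, H))
```

The second call is the inductive case described in the abstract: a disease that contributed no associations at training time can still be scored, because the score depends only on its features and the shared factors.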

RESULTS

Comparison with state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database shows that the proposed approach is substantially better: it has close to a one-in-four chance of recovering a true association in the top 100 predictions, compared to the recently proposed Catapult method (second best), which has a <15% chance. We demonstrate that the inductive method is particularly effective for a query disease with no previously known gene associations, and for predicting novel genes, i.e. genes not previously linked to diseases. Thus the method is capable of predicting novel genes even for well-characterized diseases. We also validate the novelty of predictions by evaluating the method on recently reported OMIM associations and on associations recently reported in the literature.

AVAILABILITY

Source code and datasets can be downloaded from http://bigdata.ices.utexas.edu/project/gene-disease.

Jiang L, Edwards SM, Thomsen B, Workman CT, Guldbrandtsen B, Sørensen P. A random set scoring model for prioritization of disease candidate genes using protein complexes and data-mining of GeneRIF, OMIM and PubMed records. BMC Bioinformatics 2014;15:315. [PMID: 25253562 PMCID: PMC4181406 DOI: 10.1186/1471-2105-15-315] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 01/07/2013] [Accepted: 09/17/2014] [Indexed: 12/12/2022] Open

Abstract

BACKGROUND

Prioritizing genetic variants is a challenge because disease susceptibility loci are often located in genes of unknown function or the relationship with the corresponding phenotype is unclear. A global data-mining exercise on the biomedical literature can establish the phenotypic profile of genes with respect to their connection to disease phenotypes. The importance of protein-protein interaction networks in the genetic heterogeneity of common diseases or complex traits is becoming increasingly recognized. Thus, the development of a network-based approach combined with phenotypic profiling would be useful for disease gene prioritization.

RESULTS

We developed a random-set scoring model and implemented it to quantify phenotype relevance in a network-based disease gene-prioritization approach. We validated our approach based on different gene phenotypic profiles, which were generated from PubMed abstracts, OMIM, and GeneRIF records. We also investigated the validity of several vocabulary filters and different likelihood thresholds for predicted protein-protein interactions in terms of their effect on the network-based gene-prioritization approach, which relies on text-mining of the phenotype data. Our method demonstrated good precision and sensitivity compared with those of two alternative complex-based prioritization approaches. We then conducted a global ranking of all human genes according to their relevance to a range of human diseases. The resulting accurate ranking of known causal genes supported the reliability of our approach. Moreover, these data suggest many promising novel candidate genes for human disorders that have a complex mode of inheritance.

CONCLUSION

We have implemented and validated a network-based approach to prioritize genes for human diseases based on their phenotypic profile. We have devised a powerful and transparent tool to identify and rank candidate genes. Our global gene prioritization provides a unique resource for the biological interpretation of data from genome-wide association studies, and will help in the understanding of how the associated genetic variants influence disease or quantitative phenotypes.

Oellrich A, Koehler S, Washington N, Mungall C, Lewis S, Haendel M, Robinson PN, Smedley D. The influence of disease categories on gene candidate predictions from model organism phenotypes. J Biomed Semantics 2014;5:S4. [PMID: 25093073 PMCID: PMC4108905 DOI: 10.1186/2041-1480-5-s1-s4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 01/05/2023] Open

Guala D, Sjölund E, Sonnhammer ELL. MaxLink: network-based prioritization of genes tightly linked to a disease seed set. Bioinformatics 2014;30:2689-90. [DOI: 10.1093/bioinformatics/btu344] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 01/28/2023] Open