1
Antonazzo G, Gaudet P, Lovering RC, Attrill H. Representation of non-coding RNA-mediated regulation of gene expression using the Gene Ontology. RNA Biol 2024; 21:36-48. [PMID: 39374113 DOI: 10.1080/15476286.2024.2408523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Revised: 09/16/2024] [Accepted: 09/20/2024] [Indexed: 10/09/2024]
Abstract
Regulatory non-coding RNAs (ncRNAs) are increasingly recognized as integral to the control of biological processes. This is often through the targeted regulation of mRNA expression, but this is by no means the only mechanism through which regulatory ncRNAs act. The Gene Ontology (GO) has long been used for the systematic annotation of protein-coding and ncRNA gene function, but rapid progress in the understanding of ncRNAs meant that the ontology needed to be revised to accurately reflect current knowledge. Here, a targeted effort to revise GO terms used for the annotation of regulatory ncRNAs is described, focusing on microRNAs (miRNAs), long non-coding RNAs (lncRNAs), small interfering RNAs (siRNAs) and PIWI-interacting RNAs (piRNAs). This paper provides guidance to biocurators annotating ncRNA-mediated processes using the GO and serves as background for researchers wishing to make use of the GO in their studies of ncRNAs and the biological processes they regulate.
Affiliation(s)
- Giulia Antonazzo
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
- Pascale Gaudet
- SIB Swiss Institute of Bioinformatics, Swiss-Prot Group, Geneva, Switzerland
- Ruth C Lovering
- Functional Gene Annotation, Institute of Cardiovascular Science, University College London, London, UK
- Helen Attrill
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK

2
Chen J, Goudey B, Zobel J, Geard N, Verspoor K. Exploring automatic inconsistency detection for literature-based gene ontology annotation. Bioinformatics 2022; 38:i273-i281. [PMID: 35758780 PMCID: PMC9235499 DOI: 10.1093/bioinformatics/btac230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/08/2022] [Indexed: 11/12/2022]
Abstract
Motivation: Literature-based gene ontology annotations (GOA) are biological database records that use a controlled vocabulary to uniformly represent gene function information described in the primary literature. Assuring the quality of GOA is crucial for supporting biological research. However, a range of different kinds of inconsistency between the literature cited as evidence and the annotated GO terms can be identified; these have not been systematically studied at the record level. The existing manual-curation approach to GOA consistency assurance is inefficient and unable to keep pace with the rate of updates to gene function knowledge. Automatic tools are therefore needed to assist with GOA consistency assurance. This article presents an exploration of different GOA inconsistencies and an early feasibility study of automatic inconsistency detection. Results: We have created a reliable synthetic dataset to simulate four realistic types of GOA inconsistency in biological databases. Three automatic approaches are proposed. They provide reasonable performance on the task of distinguishing the four types of inconsistency and are directly applicable to detecting inconsistencies in real-world GOA database records. Major challenges resulting from such inconsistencies in the context of several specific application settings are reported. This is the first study to introduce automatic approaches designed to address the challenges in current GOA quality assurance workflows. The data underlying this article are available on GitHub at https://github.com/jiyuc/AutoGOAConsistency.
Affiliation(s)
- Jiyu Chen
- School of Computing and Information Systems, The University of Melbourne, Parkville, VIC 3010, Australia
- Benjamin Goudey
- School of Computing and Information Systems, The University of Melbourne, Parkville, VIC 3010, Australia
- Justin Zobel
- School of Computing and Information Systems, The University of Melbourne, Parkville, VIC 3010, Australia
- Nicholas Geard
- School of Computing and Information Systems, The University of Melbourne, Parkville, VIC 3010, Australia
- Karin Verspoor
- School of Computing and Information Systems, The University of Melbourne, Parkville, VIC 3010, Australia; School of Computer Technologies, RMIT University, Melbourne, VIC 3000, Australia

3
Zhang L, Wang F, Gao G, Yan X, Liu H, Liu Z, Wang Z, He L, Lv Q, Wang Z, Wang R, Zhang Y, Li J, Su R. Genome-Wide Association Study of Body Weight Traits in Inner Mongolia Cashmere Goats. Front Vet Sci 2021; 8:752746. [PMID: 34926636 PMCID: PMC8673091 DOI: 10.3389/fvets.2021.752746] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/03/2021] [Accepted: 09/27/2021] [Indexed: 11/13/2022]
Abstract
Objective: Body weight is an important economic trait in goats, as it greatly affects animal growth and survival. The purpose of this study was to identify genes associated with birth weight (BW), weaning weight (WW), and yearling weight (YW). Materials and Methods: A genome-wide association study (GWAS) of BW, WW, and YW was performed in 1,920 Inner Mongolia cashmere goats using the GGP_Goat_70K single-nucleotide polymorphism (SNP) chip. Results: We found 21 SNPs significantly associated with BW at the genome-wide level. These SNPs were located in 10 genes, e.g., Mitogen-Activated Protein Kinase 3 (MAPK3), LIM domain binding 2 (LDB2), and low-density lipoprotein receptor-related protein 1B (LRP1B), which may be related to muscle growth and development in Inner Mongolia cashmere goats. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis revealed that these genes were significantly enriched in pathways including regulation of the actin cytoskeleton and the phospholipase D signaling pathway. Conclusion: In summary, this study will support marker-assisted breeding of Inner Mongolia cashmere goats and improve understanding of the molecular mechanisms underlying important economic traits.
Affiliation(s)
- Lei Zhang
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China; Inner Mongolia Jinlai Livestock Technology Co., Ltd, Hohhot, China
- Fenghong Wang
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
- Gong Gao
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
- Xiaochun Yan
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
- Hongfu Liu
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
- Zhihong Liu
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China; Key Laboratory of Animal Genetics, Breeding and Reproduction, Hohhot, China; Key Laboratory of Mutton Sheep Genetics and Breeding, Ministry of Agriculture and Rural Affairs, Hohhot, China; Engineering Research Center for Goat Genetics and Breeding, Hohhot, China
- Zhixin Wang
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China; Key Laboratory of Animal Genetics, Breeding and Reproduction, Hohhot, China; Key Laboratory of Mutton Sheep Genetics and Breeding, Ministry of Agriculture and Rural Affairs, Hohhot, China
- Libing He
- Inner Mongolia Jinlai Livestock Technology Co., Ltd, Hohhot, China
- Qi Lv
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
- Zhiying Wang
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
- Ruijun Wang
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
- Yanjun Zhang
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China; Key Laboratory of Animal Genetics, Breeding and Reproduction, Hohhot, China; Key Laboratory of Mutton Sheep Genetics and Breeding, Ministry of Agriculture and Rural Affairs, Hohhot, China; Engineering Research Center for Goat Genetics and Breeding, Hohhot, China
- Jinquan Li
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China; Key Laboratory of Animal Genetics, Breeding and Reproduction, Hohhot, China; Key Laboratory of Mutton Sheep Genetics and Breeding, Ministry of Agriculture and Rural Affairs, Hohhot, China; Engineering Research Center for Goat Genetics and Breeding, Hohhot, China
- Rui Su
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China

4
Saverimuttu SCC, Kramarz B, Rodríguez-López M, Garmiri P, Attrill H, Thurlow KE, Makris M, de Miranda Pinheiro S, Orchard S, Lovering RC. Gene Ontology curation of the blood-brain barrier to improve the analysis of Alzheimer's and other neurological diseases. Database (Oxford) 2021; 2021:baab067. [PMID: 34697638 PMCID: PMC8546235 DOI: 10.1093/database/baab067] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/28/2021] [Revised: 09/07/2021] [Accepted: 10/06/2021] [Indexed: 01/08/2023]
Abstract
The role of the blood-brain barrier (BBB) in Alzheimer's and other neurodegenerative diseases is still the subject of many studies. However, those studies using high-throughput methods have been compromised by the lack of Gene Ontology (GO) annotations describing the role of proteins in the normal function of the BBB. The GO Consortium provides a gold-standard bioinformatics resource used for analysis and interpretation of large biomedical data sets. However, the GO is also used by other research communities and, therefore, must meet a variety of demands on the breadth and depth of information that is provided. To meet the needs of the Alzheimer's research community we have focused on the GO annotation of the BBB, with over 100 transport or junctional proteins prioritized for annotation. This project has led to a substantial increase in the number of human proteins associated with BBB-relevant GO terms as well as more comprehensive annotation of these proteins in many other processes. Furthermore, data describing the microRNAs that regulate the expression of these priority proteins have also been curated. Thus, this project has increased both the breadth and depth of annotation for these prioritized BBB proteins. Database URL: https://www.ebi.ac.uk/QuickGO/
Affiliation(s)
- Shirin C C Saverimuttu
- Functional Gene Annotation, Pre-clinical and Fundamental Science, Institute of Cardiovascular Science, University College London (UCL), Rayne Building, 5 University Street, London WC1E 6JF, UK
- European Molecular Biology Laboratory, Wellcome Genome Campus, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1ST, UK
- Barbara Kramarz
- Functional Gene Annotation, Pre-clinical and Fundamental Science, Institute of Cardiovascular Science, University College London (UCL), Rayne Building, 5 University Street, London WC1E 6JF, UK
- Milagros Rodríguez-López
- European Molecular Biology Laboratory, Wellcome Genome Campus, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1ST, UK
- Penelope Garmiri
- European Molecular Biology Laboratory, Wellcome Genome Campus, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1ST, UK
- Helen Attrill
- FlyBase, Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Katherine E Thurlow
- Functional Gene Annotation, Pre-clinical and Fundamental Science, Institute of Cardiovascular Science, University College London (UCL), Rayne Building, 5 University Street, London WC1E 6JF, UK
- Marios Makris
- Functional Gene Annotation, Pre-clinical and Fundamental Science, Institute of Cardiovascular Science, University College London (UCL), Rayne Building, 5 University Street, London WC1E 6JF, UK
- Sandra de Miranda Pinheiro
- Functional Gene Annotation, Pre-clinical and Fundamental Science, Institute of Cardiovascular Science, University College London (UCL), Rayne Building, 5 University Street, London WC1E 6JF, UK
- Sandra Orchard
- European Molecular Biology Laboratory, Wellcome Genome Campus, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1ST, UK
- Ruth C Lovering
- Functional Gene Annotation, Pre-clinical and Fundamental Science, Institute of Cardiovascular Science, University College London (UCL), Rayne Building, 5 University Street, London WC1E 6JF, UK

6
Bateman A, Martin MJ, Orchard S, Magrane M, Agivetova R, Ahmad S, Alpi E, Bowler-Barnett EH, Britto R, Bursteinas B, Bye-A-Jee H, Coetzee R, Cukura A, Da Silva A, Denny P, Dogan T, Ebenezer T, Fan J, Castro LG, Garmiri P, Georghiou G, Gonzales L, Hatton-Ellis E, Hussein A, Ignatchenko A, Insana G, Ishtiaq R, Jokinen P, Joshi V, Jyothi D, Lock A, Lopez R, Luciani A, Luo J, Lussi Y, MacDougall A, Madeira F, Mahmoudy M, Menchi M, Mishra A, Moulang K, Nightingale A, Oliveira CS, Pundir S, Qi G, Raj S, Rice D, Lopez MR, Saidi R, Sampson J, Sawford T, Speretta E, Turner E, Tyagi N, Vasudev P, Volynkin V, Warner K, Watkins X, Zaru R, Zellner H, Bridge A, Poux S, Redaschi N, Aimo L, Argoud-Puy G, Auchincloss A, Axelsen K, Bansal P, Baratin D, Blatter MC, Bolleman J, Boutet E, Breuza L, Casals-Casas C, de Castro E, Echioukh KC, Coudert E, Cuche B, Doche M, Dornevil D, Estreicher A, Famiglietti ML, Feuermann M, Gasteiger E, Gehant S, Gerritsen V, Gos A, Gruaz-Gumowski N, Hinz U, Hulo C, Hyka-Nouspikel N, Jungo F, Keller G, Kerhornou A, Lara V, Le Mercier P, Lieberherr D, Lombardot T, Martin X, Masson P, Morgat A, Neto TB, Paesano S, Pedruzzi I, Pilbout S, Pourcel L, Pozzato M, Pruess M, Rivoire C, Sigrist C, Sonesson K, Stutz A, Sundaram S, Tognolli M, Verbregue L, Wu CH, Arighi CN, Arminski L, Chen C, Chen Y, Garavelli JS, Huang H, Laiho K, McGarvey P, Natale DA, Ross K, Vinayaka CR, Wang Q, Wang Y, Yeh LS, Zhang J, Ruch P, Teodoro D. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res 2021; 49:D480-D489. [PMID: 33237286 PMCID: PMC7778908 DOI: 10.1093/nar/gkaa1100] [Citation(s) in RCA: 3821] [Impact Index Per Article: 1273.7] [Received: 09/15/2020] [Revised: 10/21/2020] [Accepted: 11/02/2020] [Indexed: 02/07/2023]
Abstract
The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over the last two years to the resource. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from the literature to add to reviewed entries and supplement these in unreviewed entries with annotations provided by automated systems such as the newly implemented Association-Rule-Based Annotator (ARBA). We have developed a credit-based publication submission interface to allow the community to contribute publications and annotations to UniProt entries. We describe how UniProtKB responded to the COVID-19 pandemic through expert curation of relevant entries that were rapidly made available to the research community through a dedicated portal. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.

7
Sweeney BA, Petrov AI, Ribas CE, Finn RD, Bateman A, Szymanski M, Karlowski WM, Seemann SE, Gorodkin J, Cannone JJ, Gutell RR, Kay S, Marygold S, dos Santos G, Frankish A, Mudge JM, Barshir R, Fishilevich S, Chan PP, Lowe TM, Seal R, Bruford E, Panni S, Porras P, Karagkouni D, Hatzigeorgiou AG, Ma L, Zhang Z, Volders PJ, Mestdagh P, Griffiths-Jones S, Fromm B, Peterson KJ, Kalvari I, Nawrocki EP, Petrov AS, Weng S, Bouchard-Bourelle P, Scott M, Lui LM, Hoksza D, Lovering RC, Kramarz B, Mani P, Ramachandran S, Weinberg Z. RNAcentral 2021: secondary structure integration, improved sequence search and new member databases. Nucleic Acids Res 2021; 49:D212-D220. [PMID: 33106848 PMCID: PMC7779037 DOI: 10.1093/nar/gkaa921] [Citation(s) in RCA: 143] [Impact Index Per Article: 47.7] [Received: 09/15/2020] [Accepted: 10/05/2020] [Indexed: 12/16/2022]
Abstract
RNAcentral is a comprehensive database of non-coding RNA (ncRNA) sequences that provides a single access point to 44 RNA resources and >18 million ncRNA sequences from a wide range of organisms and RNA types. RNAcentral now also includes secondary (2D) structure information for >13 million sequences, making RNAcentral the world's largest RNA 2D structure database. The 2D diagrams are displayed using R2DT, a new 2D structure visualization method that uses consistent, reproducible and recognizable layouts for related RNAs. The sequence similarity search has been updated with a faster interface featuring facets for filtering search results by RNA type, organism, source database or any keyword. This sequence search tool is available as a reusable web component, and has been integrated into several RNAcentral member databases, including Rfam, miRBase and snoDB. To allow for a more fine-grained assignment of RNA types and subtypes, all RNAcentral sequences have been annotated with Sequence Ontology terms. The RNAcentral database continues to grow and provide a central data resource for the RNA community. RNAcentral is freely available at https://rnacentral.org.

8
Wood V, Carbon S, Harris MA, Lock A, Engel SR, Hill DP, Van Auken K, Attrill H, Feuermann M, Gaudet P, Lovering RC, Poux S, Rutherford KM, Mungall CJ. Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns. Open Biol 2020; 10:200149. [PMID: 32875947 PMCID: PMC7536087 DOI: 10.1098/rsob.200149] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/22/2020] [Accepted: 08/06/2020] [Indexed: 12/11/2022]
Abstract
Biological processes are accomplished by the coordinated action of gene products. Gene products often participate in multiple processes, and can therefore be annotated to multiple Gene Ontology (GO) terms. Nevertheless, processes that are functionally, temporally and/or spatially distant may have few gene products in common, and co-annotation to unrelated processes probably reflects errors in literature curation, ontology structure or automated annotation pipelines. We have developed an annotation quality control workflow that uses rules based on mutually exclusive processes to detect annotation errors, based on and validated by case studies including the three we present here: fission yeast protein-coding gene annotations over time; annotations for cohesin complex subunits in human and model species; and annotations using a selected set of GO biological process terms in human and five model species. For each case study, we reviewed available GO annotations, identified pairs of biological processes which are unlikely to be correctly co-annotated to the same gene products (e.g. amino acid metabolism and cytokinesis), and traced erroneous annotations to their sources. To date we have generated 107 quality control rules, and corrected 289 manual annotations in eukaryotes and over 52 700 automatically propagated annotations across all taxa.
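The co-annotation check this abstract describes can be illustrated with a minimal Python sketch. This is not the Term Matrix implementation. The data model, the helper names `ancestors` and `find_violations`, the simplified parent map standing in for the GO hierarchy, and the gene names are all hypothetical. The term labels only echo the example pair given in the abstract. In this sketch, each annotation implies its ancestor terms, so a gene product annotated to a descendant of one process and also to a second process is flagged when the two processes form a mutually exclusive rule pair.

```python
"""Sketch of a co-annotation quality-control check (hypothetical data model)."""


def ancestors(term, parents):
    """Return `term` together with every ancestor reachable through `parents`.

    `parents` maps a term label to the labels of its direct parent terms
    (a simplified, hypothetical stand-in for the GO is_a hierarchy).
    """
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def find_violations(annotations, rules, parents):
    """List (gene, term_a, term_b) for each gene product annotated, directly
    or via a descendant term, to both terms of a mutually exclusive rule."""
    violations = []
    for gene, terms in sorted(annotations.items()):
        # Expand the direct annotations to every term they imply.
        implied = set()
        for term in terms:
            implied |= ancestors(term, parents)
        for term_a, term_b in rules:
            if term_a in implied and term_b in implied:
                violations.append((gene, term_a, term_b))
    return violations


# Illustrative inputs only: the hierarchy and gene names are made up.
parents = {"amino acid biosynthesis": ["amino acid metabolism"]}
rules = [("amino acid metabolism", "cytokinesis")]
annotations = {
    "geneA": {"amino acid biosynthesis", "cytokinesis"},  # should be flagged
    "geneB": {"cytokinesis"},
    "geneC": {"amino acid metabolism"},
}

print(find_violations(annotations, rules, parents))
# [('geneA', 'amino acid metabolism', 'cytokinesis')]
```

Expanding annotations to their ancestors before applying the rules lets a rule written against a broad process also catch annotations to its more specific descendants. This keeps the rule set small in the sketch.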
Affiliation(s)
- Valerie Wood
- Cambridge Systems Biology Centre, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
- Seth Carbon
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Midori A. Harris
- Cambridge Systems Biology Centre, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
- Antonia Lock
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6B, UK
- Stacia R. Engel
- Department of Genetics, Stanford University, Palo Alto, CA 94304-5477, USA
- David P. Hill
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Kimberly Van Auken
- Division of Biology and Biological Engineering, California Institute of Technology, 1200 East California Boulevard, Pasadena, CA 91125, USA
- Helen Attrill
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Marc Feuermann
- Swiss Institute of Bioinformatics, 1 Michel-Servet, 1204 Geneva, Switzerland
- Pascale Gaudet
- Swiss Institute of Bioinformatics, 1 Michel-Servet, 1204 Geneva, Switzerland
- Ruth C. Lovering
- Functional Gene Annotation, Preclinical and Fundamental Science, Institute of Cardiovascular Science, University College London, London WC1E 6JF, UK
- Sylvain Poux
- Swiss Institute of Bioinformatics, 1 Michel-Servet, 1204 Geneva, Switzerland
- Kim M. Rutherford
- Cambridge Systems Biology Centre, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
- Christopher J. Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

9
Kong Y, Qiao Z, Ren Y, Genchev GZ, Ge M, Xiao H, Zhao H, Lu H. Integrative Analysis of Membrane Proteome and MicroRNA Reveals Novel Lung Cancer Metastasis Biomarkers. Front Genet 2020; 11:1023. [PMID: 33005184 PMCID: PMC7483668 DOI: 10.3389/fgene.2020.01023] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/18/2020] [Accepted: 08/11/2020] [Indexed: 12/12/2022]
Abstract
Lung cancer is one of the most common human cancers in both incidence and mortality, with a particularly poor prognosis in metastatic cases. Metastasis in lung cancer is a multifarious process driven by a complex regulatory landscape involving many mechanisms, genes, and proteins. Membrane proteins play a crucial role in the metastatic journey, both in tumor cells and in the extracellular matrix, and are a viable focus of research with the potential to uncover biomarkers and drug targets. In this work we performed a membrane proteome analysis of highly and poorly metastatic lung cancer cells that integrated genomic, proteomic, and transcriptional data. A total of 1,762 membrane proteins were identified, of which 163 showed significant changes between the two cell lines. We applied the Tied Diffusion through Interacting Events (TieDIE) method to integrate the differentially expressed disease-related microRNAs with information on functionally dysregulated membrane proteins, to further explore the role of key membrane proteins and microRNAs in a multi-omics context. hsa-miR-137 was revealed as a key gene involved in the activity of membrane proteins by targeting MET and PXN, affecting membrane proteins through protein-protein interactions. Furthermore, we found that the membrane proteins CDH2, EGFR, ITGA3, ITGA5, ITGB1, and CALR may have a significant effect on cancer prognosis and outcomes, which was further validated in vitro. Our study provides a multi-omics-based network method for integrating microRNA and membrane proteome information, and uncovers differential molecular signatures of highly and poorly metastatic lung cancer cells; these molecules may serve as potential targets for the treatment and prognosis of giant-cell lung cancer metastasis.
Affiliation(s)
- Yan Kong
- SJTU-Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Zhi Qiao
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Yongyong Ren
- SJTU-Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Georgi Z Genchev
- SJTU-Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; Center for Biomedical Informatics, Shanghai Engineering Research Center for Big Data in Pediatric Precision Medicine, Shanghai Children's Hospital, Shanghai, China; Bulgarian Institute for Genomics and Precision Medicine, Sofia, Bulgaria
- Maolin Ge
- State Key Laboratory of Medical Genomics, Shanghai Institute of Hematology, Rui Jin Hospital, School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Hua Xiao
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Hongyu Zhao
- Department of Biostatistics, Yale University, New Haven, CT, United States
- Hui Lu
- SJTU-Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; Center for Biomedical Informatics, Shanghai Engineering Research Center for Big Data in Pediatric Precision Medicine, Shanghai Children's Hospital, Shanghai, China

10
Breuza L, Arighi CN, Argoud-Puy G, Casals-Casas C, Estreicher A, Famiglietti ML, Georghiou G, Gos A, Gruaz-Gumowski N, Hinz U, Hyka-Nouspikel N, Kramarz B, Lovering RC, Lussi Y, Magrane M, Masson P, Perfetto L, Poux S, Rodriguez-Lopez M, Stoeckert C, Sundaram S, Wang LS, Wu E, Orchard S. A Coordinated Approach by Public Domain Bioinformatics Resources to Aid the Fight Against Alzheimer's Disease Through Expert Curation of Key Protein Targets. J Alzheimers Dis 2020; 77:257-273. [PMID: 32716361 PMCID: PMC7592670 DOI: 10.3233/jad-200206] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Accepted: 06/05/2020] [Indexed: 01/08/2023]
Abstract
BACKGROUND: The analysis and interpretation of data generated from patient-derived clinical samples relies on access to high-quality bioinformatics resources. These are maintained and updated by expert curators, who extract knowledge from unstructured biological data described in free-text journal articles and convert it into more structured, computationally accessible forms. This enables analyses such as functional enrichment of sets of genes/proteins using the Gene Ontology, and makes searching the data more productive by managing issues such as gene/protein name synonyms, identifier mapping, and data quality. OBJECTIVE: To undertake a coordinated annotation update of key public-domain resources to better support Alzheimer's disease research. METHODS: We systematically identified target proteins critical to the disease process, in part by seeking informed input from the clinical research community. RESULTS: Data from 954 papers have been added to the UniProtKB, Gene Ontology, and International Molecular Exchange Consortium (IMEx) databases, with 299 human proteins and 279 orthologs updated in UniProtKB. In addition, 745 binary interactions were added to the IMEx human molecular interaction dataset. CONCLUSION: This represents a significant enhancement in the expert-curated data pertinent to Alzheimer's disease available in a number of biomedical databases. Relevant protein entries have been updated in UniProtKB and concomitantly in the Gene Ontology. Molecular interaction networks have been significantly extended in the IMEx Consortium dataset, and a set of reference protein complexes has been created. All the resources described are open-source and freely available to the research community, and we provide examples of how these data could be exploited by researchers.
Affiliation(s)
- Lionel Breuza
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Cecilia N. Arighi
- Protein Information Resource, Georgetown University Medical Center, Washington, DC, USA
- Protein Information Resource, University of Delaware, Newark, DE, USA
- Ghislaine Argoud-Puy
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Cristina Casals-Casas
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Anne Estreicher
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Maria Livia Famiglietti
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- George Georghiou
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK
- Arnaud Gos
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Nadine Gruaz-Gumowski
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Ursula Hinz
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Nevila Hyka-Nouspikel
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Barbara Kramarz
- Functional Gene Annotation, Preclinical and Fundamental Science, Institute of Cardiovascular Science, University College London (UCL), London, UK
- Ruth C. Lovering
- Functional Gene Annotation, Preclinical and Fundamental Science, Institute of Cardiovascular Science, University College London (UCL), London, UK
- Yvonne Lussi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK
- Michele Magrane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK
- Patrick Masson
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Livia Perfetto
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK
- Sylvain Poux
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Milagros Rodriguez-Lopez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK
- Christian Stoeckert
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Shyamala Sundaram
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Li-San Wang
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK
- IMEx Consortium, UniProt Consortium
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Protein Information Resource, Georgetown University Medical Center, Washington, DC, USA
- Protein Information Resource, University of Delaware, Newark, DE, USA
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK
- Functional Gene Annotation, Preclinical and Fundamental Science, Institute of Cardiovascular Science, University College London (UCL), London, UK
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Alzforum, Cambridge, MA, USA