1
Yousefi K, Abdullah SNA, Hatta MAM, Ling KL. Genomics and Transcriptomics Reveal Genetic Contribution to Population Diversity and Specific Traits in Coconut. Plants (Basel) 2023; 12:plants12091913. [PMID: 37176970 PMCID: PMC10181077 DOI: 10.3390/plants12091913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2023] [Revised: 04/10/2023] [Accepted: 04/11/2023] [Indexed: 05/15/2023]
Abstract
Coconut is an economically important palm species with a long history of human use. It has applications in various food, nutraceuticals, and cosmetic products, and there has been renewed interest in coconut in recent years due to its unique nutritional and medicinal properties. Unfortunately, the sustainable growth of the coconut industry has been hampered due to a shortage of good quality seedlings. Genetic improvement through the traditional breeding approach faced considerable obstacles due to its perennial nature, protracted juvenile period, and high heterozygosity. Molecular biotechnological tools, including molecular markers and next-generation sequencing (NGS), could expedite genetic improvement efforts in coconut. Researchers have employed various molecular markers to reveal genetic diversity among coconut populations and for the construction of a genetic map for exploitation in coconut breeding programs worldwide. Whole genome sequencing and transcriptomics on the different varieties have generated a massive amount of publicly accessible sequence data, substantially improving the ability to analyze and understand molecular mechanisms affecting crop performance. The production of high-yielding and disease-resilient coconuts and the deciphering of the complex coconut genome's structure can profit tremendously from these technologies. This paper aims to provide a comprehensive review of the progress of coconut research, using genomics, transcriptomics, and molecular markers initiatives.
Affiliation(s)
- Kobra Yousefi
  - Department of Agriculture Technology, Faculty of Agriculture, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia
- Siti Nor Akmar Abdullah
  - Department of Agriculture Technology, Faculty of Agriculture, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia
  - Laboratory of Sustainable Agronomy and Crop Protection, Institute of Plantation Studies, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia
- Muhammad Asyraf Md Hatta
  - Department of Agriculture Technology, Faculty of Agriculture, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia
- Kong Lih Ling
  - Laboratory of Sustainable Agronomy and Crop Protection, Institute of Plantation Studies, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia
2
D’Esposito D, Guadagno A, Amoroso CG, Cascone P, Cencetti G, Michelozzi M, Guerrieri E, Ercolano MR. Genomic and metabolic profiling of two tomato contrasting cultivars for tolerance to Tuta absoluta. Planta 2023; 257:47. [PMID: 36708391 PMCID: PMC9884263 DOI: 10.1007/s00425-023-04073-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2022] [Accepted: 01/11/2023] [Indexed: 06/18/2023]
Abstract
Dissimilar patterns of variants affecting genes involved in response to herbivory, including those leading to differences in VOC production, were identified in tomato lines with contrasting responses to Tuta absoluta. Tuta absoluta is one of the most destructive insect pests affecting tomato production, causing important yield losses both in open field and greenhouse. The selection of varieties tolerant to T. absoluta is one of the sustainable approaches to control this invasive leafminer. In this study, the genomic diversity of two tomato varieties, one tolerant and the other susceptible to T. absoluta infestation, was explored, allowing us to identify chromosome regions with highly dissimilar patterns. Genes affected by potential functional variants were involved in several processes, including response to herbivory and secondary metabolism. A metabolic analysis for volatile organic compounds (VOCs) was also performed, highlighting a difference in several classes of chemicals in the two genotypes. Taken together, these findings can aid tomato breeding programs aiming to develop plants tolerant to T. absoluta.
Affiliation(s)
- Daniela D’Esposito
  - Department of Agricultural Sciences, University of Naples Federico II, 80055 Portici, NA, Italy
- Anna Guadagno
  - Department of Agricultural Sciences, University of Naples Federico II, 80055 Portici, NA, Italy
- Ciro Gianmaria Amoroso
  - Department of Agricultural Sciences, University of Naples Federico II, 80055 Portici, NA, Italy
- Pasquale Cascone
  - Institute for Sustainable Plant Protection, National Research Council of Italy, 80055 Portici, NA, Italy
- Gabriele Cencetti
  - Institute of Biosciences and Bioresources, National Research Council of Italy, 50019 Sesto Fiorentino, FI, Italy
- Marco Michelozzi
  - Institute of Biosciences and Bioresources, National Research Council of Italy, 50019 Sesto Fiorentino, FI, Italy
- Emilio Guerrieri
  - Institute for Sustainable Plant Protection, National Research Council of Italy, 80055 Portici, NA, Italy
3
Rifat MH, Ahmed J, Ahmed M, Ahmed F, Gulshan A, Hasan M. Prediction and expression analysis of deleterious nonsynonymous SNPs of Arabidopsis ACD11 gene by combining computational algorithms and molecular docking approach. PLoS Comput Biol 2022; 18:e1009539. [PMID: 35709304 PMCID: PMC9242461 DOI: 10.1371/journal.pcbi.1009539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2021] [Revised: 06/29/2022] [Accepted: 05/09/2022] [Indexed: 11/18/2022]
Abstract
Accelerated cell death 11 (ACD11) is an autoimmune gene that suppresses pathogen infection in plants by preventing plant cells from becoming infected by any pathogen. This gene is widely known for growth inhibition, premature leaf chlorosis, and defense-related programmed cell death (PCD) in seedlings before flowering in Arabidopsis plants. Specific amino acid changes in the ACD11 protein’s highly conserved domains are linked to autoimmune symptoms including constitutive defensive responses and necrosis without pathogen awareness. The molecular aspect of the aberrant activity of the ACD11 protein is difficult to ascertain. The purpose of our study was to find the most deleterious mutation positions in the ACD11 protein and correlate them with their abnormal expression pattern. Using several computational methods, we discovered PCD vulnerable single nucleotide polymorphisms (SNPs) in ACD11. We analysed the RNA-Seq data, identified the detrimental nonsynonymous SNPs (nsSNPs), built genetically mutated protein structures and used molecular docking to assess the impact of mutation. Our results demonstrated that the A15T and A39D mutations in the GLTP domain were likely to be extremely detrimental mutations that inhibit the expression of the ACD11 protein domain by destabilizing its composition, as well as disrupt its catalytic effectiveness. When compared to the A15T mutant, the A39D mutant was more likely to destabilize the protein structure. In conclusion, these mutants can aid in the better understanding of the vast pool of PCD susceptibilities connected to ACD11 gene GLTP domain activation.
A nonsynonymous single nucleotide polymorphism (nsSNP) is a single-nucleotide change in the coding region of a gene that alters the amino acid sequence of the encoded protein. As a result, the protein structure, interactions and stability can be altered, which may have a negative impact on living organisms. Hence, to completely comprehend this biological process, we must first solve the unresolved mutational protein structure and mutated protein interactions. The major goal of our research is to identify the most harmful mutation in our target protein structure and how it interacts within cells. However, it was discovered that only a few alterations in residues had the largest negative impact on the protein’s internal structure and also on the protein-ligand interactions. We show that it is computationally feasible, based on the amino acid sequence of a protein, to discover mutational positions in the sequence and to generate mutant protein structures and their interactions with related ligands. Our findings show that the essential mechanisms underlying protein mutations generated by this process are identical. The capacity to correctly detect mutations from sequence allows the annotation and study of protein-ligand interactions throughout a whole organism, which might aid function prediction and gene expression.
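The distinction between synonymous and nonsynonymous SNPs described above can be illustrated with a short sketch. The Python snippet below is not the authors' pipeline: it only translates a reference codon and its single-base variant with the standard genetic code and reports whether the encoded amino acid changes. The function name `classify_snp` and the example codon `GCT` are hypothetical, chosen so that the substitutions resemble alanine-to-threonine and alanine-to-aspartate changes such as A15T and A39D; the actual ACD11 codons are not given in the abstract.

```python
from itertools import product

# Standard genetic code, laid out in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(ref_codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base change inside one codon (position is 0, 1 or 2).

    Changes to or from a stop codon ('*') are reported as nonsynonymous here;
    a fuller tool would label them separately.
    """
    alt_codon = ref_codon[:position] + alt_base + ref_codon[position + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return f"synonymous ({ref_aa})"
    return f"nonsynonymous ({ref_aa}->{alt_aa})"


# Hypothetical alanine codon, mutated at each of its three positions.
print(classify_snp("GCT", 2, "C"))  # third-base change, still alanine
print(classify_snp("GCT", 0, "A"))  # alanine to threonine
print(classify_snp("GCT", 1, "A"))  # alanine to aspartate
```

Only the nonsynonymous cases would be carried forward to structure modelling and docking in a workflow like the one the abstract describes.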
Affiliation(s)
- Jamil Ahmed
  - Department of Biochemistry and Chemistry, Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, Bangladesh
- Milad Ahmed
  - Department of Animal and Fish Biotechnology, Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, Bangladesh
- Foeaz Ahmed
  - Department of Molecular Biology and Genetic Engineering, Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, Bangladesh
- Airin Gulshan
  - Department of Pharmaceuticals and Industrial Biotechnology, Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, Bangladesh
- Mahmudul Hasan
  - Department of Pharmaceuticals and Industrial Biotechnology, Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, Bangladesh
4
The structure-based cancer-related single amino acid variation prediction. Sci Rep 2021; 11:13599. [PMID: 34193921 PMCID: PMC8245468 DOI: 10.1038/s41598-021-92793-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/18/2021] [Accepted: 06/16/2021] [Indexed: 11/09/2022]
Abstract
Single amino acid variation (SAV) is an amino acid substitution of the protein sequence that can potentially influence the entire protein structure or function, as well as its binding affinity. Protein destabilization is related to diseases, including several cancers, although using traditional experiments to clarify the relationship between SAVs and cancer uses much time and resources. Some SAV prediction methods use computational approaches, with most predicting SAV-induced changes in protein stability. In this investigation, all SAV characteristics generated from protein sequences, structures and the microenvironment were converted into feature vectors and fed into an integrated predicting system using a support vector machine and genetic algorithm. Critical features were used to estimate the relationship between their properties and cancers caused by SAVs. We describe how we developed a prediction system based on protein sequences and structure that is capable of distinguishing if the SAV is related to cancer or not. The five-fold cross-validation performance of our system is 89.73% for the accuracy, 0.74 for the Matthews correlation coefficient, and 0.81 for the F1 score. We have built an online prediction server, CanSavPre ( http://bioinfo.cmu.edu.tw/CanSavPre/ ), which is expected to become a useful, practical tool for cancer research and precision medicine.
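The three performance figures quoted above (accuracy, Matthews correlation coefficient, F1 score) are all functions of a binary confusion matrix. As a minimal sketch, assuming a two-class setup (cancer-related versus not), the snippet below computes them from raw counts. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Return accuracy, Matthews correlation coefficient and F1 from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total

    # MCC: correlation between predicted and true labels, ranging from -1 to 1.
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0

    # F1: harmonic mean of precision and recall, written in count form.
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0

    return {"accuracy": accuracy, "mcc": mcc, "f1": f1}


# Hypothetical counts for a single validation fold (illustrative only).
print(classification_metrics(tp=40, tn=45, fp=5, fn=10))
```

In a five-fold cross-validation, such metrics would typically be computed per fold and then averaged; whether the authors averaged per fold or pooled predictions is not stated in the abstract.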
5
Savojardo C, Babbi G, Martelli PL, Casadio R. Mapping OMIM Disease-Related Variations on Protein Domains Reveals an Association Among Variation Type, Pfam Models, and Disease Classes. Front Mol Biosci 2021; 8:617016. [PMID: 34026820 PMCID: PMC8138129 DOI: 10.3389/fmolb.2021.617016] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/13/2020] [Accepted: 04/09/2021] [Indexed: 12/23/2022]
Abstract
Human genome resequencing projects provide an unprecedented amount of data about single-nucleotide variations occurring in protein-coding regions and often leading to observable changes in the covalent structure of gene products. For many of these variations, links to Online Mendelian Inheritance in Man (OMIM) genetic diseases are available and are reported in many databases that are collecting human variation data such as Humsavar. However, the current knowledge on the molecular mechanisms that are leading to diseases is, in many cases, still limited. For understanding the complex mechanisms behind disease insurgence, the identification of putative models, when considering the protein structure and chemico-physical features of the variations, can be useful in many contexts, including early diagnosis and prognosis. In this study, we investigate the occurrence and distribution of human disease–related variations in the context of Pfam domains. The aim of this study is the identification and characterization of Pfam domains that are statistically more likely to be associated with disease-related variations. The study takes into consideration 2,513 human protein sequences with 22,763 disease-related variations. We describe patterns of disease-related variation types in biunivocal relation with Pfam domains, which are likely to be possible markers for linking Pfam domains to OMIM diseases. Furthermore, we take advantage of the specific association between disease-related variation types and Pfam domains for clustering diseases according to the Human Disease Ontology, and we establish a relation among variation types, Pfam domains, and disease classes. We find that Pfam models are specific markers of patterns of variation types and that they can serve to bridge genes, diseases, and disease classes. Data are available as Supplementary Material for 1,670 Pfam models, including 22,763 disease-related variations associated to 3,257 OMIM diseases.
Affiliation(s)
- Castrense Savojardo
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Giulia Babbi
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Bari, Italy
6
Bermúdez-Guzmán L, Jimenez-Huezo G, Arguedas A, Leal A. Mutational survivorship bias: The case of PNKP. PLoS One 2020; 15:e0237682. [PMID: 33332469 PMCID: PMC7746193 DOI: 10.1371/journal.pone.0237682] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/28/2020] [Accepted: 11/23/2020] [Indexed: 01/21/2023]
Abstract
The molecular function of a protein relies on its structure. Understanding how variants alter structure and function in multidomain proteins is key to elucidate the generation of a pathological phenotype. However, one may fall into the logical bias of assessing protein damage only based on the variants that are visible (survivorship bias), which can lead to partial conclusions. This is the case of PNKP, an important nuclear and mitochondrial DNA repair enzyme with both kinase and phosphatase function. Most variants in PNKP are confined to the kinase domain, leading to a pathological spectrum of three apparently distinct clinical entities. Since proteins and domains may have a different tolerability to variation, we evaluated whether variants in PNKP are under survivorship bias. Here, we provide the evidence that supports a higher tolerance in the kinase domain even when all variants reported are deleterious. Instead, the phosphatase domain is less tolerant due to its lower variant rates, a higher degree of sequence conservation, lower dN/dS ratios, and the presence of more disease-propensity hotspots. Together, our results support previous experimental evidence that demonstrated that the phosphatase domain is functionally more necessary and relevant for DNA repair, especially in the context of the development of the central nervous system. Finally, we propose the term "Wald’s domain" for future studies analyzing the possible survivorship bias in multidomain proteins.
Affiliation(s)
- Luis Bermúdez-Guzmán
  - Section of Genetics and Biotechnology, School of Biology, Universidad de Costa Rica, San Pedro, San José, Costa Rica
- Gabriel Jimenez-Huezo
  - Section of Genetics and Biotechnology, School of Biology, Universidad de Costa Rica, San Pedro, San José, Costa Rica
- Andrés Arguedas
  - School of Statistics, Universidad de Costa Rica, San Pedro, San José, Costa Rica
- Alejandro Leal
  - Section of Genetics and Biotechnology, School of Biology, Universidad de Costa Rica, San Pedro, San José, Costa Rica
7
Iqbal S, Pérez-Palma E, Jespersen JB, May P, Hoksza D, Heyne HO, Ahmed SS, Rifat ZT, Rahman MS, Lage K, Palotie A, Cottrell JR, Wagner FF, Daly MJ, Campbell AJ, Lal D. Comprehensive characterization of amino acid positions in protein structures reveals molecular effect of missense variants. Proc Natl Acad Sci U S A 2020; 117:28201-28211. [PMID: 33106425 PMCID: PMC7668189 DOI: 10.1073/pnas.2002660117] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Indexed: 12/19/2022]
Abstract
Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations' positions, we identify features that are generally associated with pathogenic and population missense variants. We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: We observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants' pathogenicity in terms of the perturbed molecular mechanisms.
Affiliation(s)
- Sumaiya Iqbal
  - Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA 02142
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114
- Eduardo Pérez-Palma
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195
- Jakob B Jespersen
  - Department of Bio and Health Informatics, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Patrick May
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 4365 Esch-sur-Alzette, Luxembourg
- David Hoksza
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 4365 Esch-sur-Alzette, Luxembourg
  - Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Prague 11636, Czech Republic
- Henrike O Heyne
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114
  - Institute for Molecular Medicine Finland, University of Helsinki, 00100 Helsinki, Finland
- Shehab S Ahmed
  - Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- Zaara T Rifat
  - Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- M Sohel Rahman
  - Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- Kasper Lage
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142
  - Department of Surgery, Massachusetts General Hospital, Boston, MA 02114
- Aarno Palotie
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142
  - Institute for Molecular Medicine Finland, University of Helsinki, 00100 Helsinki, Finland
- Jeffrey R Cottrell
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Florence F Wagner
  - Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA 02142
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Mark J Daly
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114
  - Institute for Molecular Medicine Finland, University of Helsinki, 00100 Helsinki, Finland
- Arthur J Campbell
  - Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA 02142
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Dennis Lal
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195
  - Cologne Center for Genomics, University of Cologne, 50931 Cologne, Germany
  - Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH 44195
8
Insights into changes in binding affinity caused by disease mutations in protein-protein complexes. Comput Biol Med 2020; 123:103829. [DOI: 10.1016/j.compbiomed.2020.103829] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/13/2020] [Revised: 05/20/2020] [Accepted: 05/20/2020] [Indexed: 01/11/2023]
9
Zhou J, Yang L, Yu J, Zhang K, Xu Z, Cao Z, Luan P, Li H, Zhang H. Association of PCSK1 gene polymorphisms with abdominal fat content in broilers. Anim Sci J 2020; 91:e13371. [PMID: 32285539 DOI: 10.1111/asj.13371] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/28/2019] [Revised: 03/03/2020] [Accepted: 03/10/2020] [Indexed: 12/29/2022]
Abstract
Proprotein convertases (PCs) are a family of Ca2+-dependent serine proteases whose main function is to cleave biologically inactive precursor proteins or peptide chains into active functional molecules. The proprotein convertase subtilisin/kexin type 1 (PCSK1) gene is mainly expressed in nerve and endocrine tissues. In this study, PCSK1 was selected as an important candidate gene for abdominal fat content in broilers. We cloned the exon region of the chicken PCSK1 gene and found six single-nucleotide polymorphisms (SNPs). Association analysis showed that these six SNPs were significantly associated with abdominal fat content in the G19 and G20 populations. Five of these SNPs were significantly associated with abdominal fat content in the combined G19 and G20 population. These five SNPs were significantly correlated with the abdominal fat content of AA broilers. Together, our study demonstrated that c.927T>C, c.1880C>T, c.*900G>A, and c.*1164C>T were significantly associated with abdominal fat content in the populations used in this study, which means that these SNPs in the PCSK1 gene could be used as candidate markers to select lean broiler lines.
Affiliation(s)
- Jiamei Zhou
  - Key Laboratory of Chicken Genetics and Breeding, Ministry of Agriculture and Rural Affairs; Key Laboratory of Animal Genetics, Breeding and Reproduction, Education Department of Heilongjiang Province; College of Animal Science and Technology, Northeast Agricultural University, Harbin, P. R. China
- Lili Yang
  - Key Laboratory of Chicken Genetics and Breeding, Ministry of Agriculture and Rural Affairs; Key Laboratory of Animal Genetics, Breeding and Reproduction, Education Department of Heilongjiang Province; College of Animal Science and Technology, Northeast Agricultural University, Harbin, P. R. China
- Jiaqiang Yu
  - Key Laboratory of Chicken Genetics and Breeding, Ministry of Agriculture and Rural Affairs; Key Laboratory of Animal Genetics, Breeding and Reproduction, Education Department of Heilongjiang Province; College of Animal Science and Technology, Northeast Agricultural University, Harbin, P. R. China
- Ke Zhang
  - Key Laboratory of Chicken Genetics and Breeding, Ministry of Agriculture and Rural Affairs; Key Laboratory of Animal Genetics, Breeding and Reproduction, Education Department of Heilongjiang Province; College of Animal Science and Technology, Northeast Agricultural University, Harbin, P. R. China
- Zichun Xu
  - Key Laboratory of Chicken Genetics and Breeding, Ministry of Agriculture and Rural Affairs; Key Laboratory of Animal Genetics, Breeding and Reproduction, Education Department of Heilongjiang Province; College of Animal Science and Technology, Northeast Agricultural University, Harbin, P. R. China
- Zhiping Cao
  - Key Laboratory of Chicken Genetics and Breeding, Ministry of Agriculture and Rural Affairs; Key Laboratory of Animal Genetics, Breeding and Reproduction, Education Department of Heilongjiang Province; College of Animal Science and Technology, Northeast Agricultural University, Harbin, P. R. China
- Peng Luan
  - Key Laboratory of Chicken Genetics and Breeding, Ministry of Agriculture and Rural Affairs; Key Laboratory of Animal Genetics, Breeding and Reproduction, Education Department of Heilongjiang Province; College of Animal Science and Technology, Northeast Agricultural University, Harbin, P. R. China
- Hui Li
  - Key Laboratory of Chicken Genetics and Breeding, Ministry of Agriculture and Rural Affairs; Key Laboratory of Animal Genetics, Breeding and Reproduction, Education Department of Heilongjiang Province; College of Animal Science and Technology, Northeast Agricultural University, Harbin, P. R. China
- Hui Zhang
  - Key Laboratory of Chicken Genetics and Breeding, Ministry of Agriculture and Rural Affairs; Key Laboratory of Animal Genetics, Breeding and Reproduction, Education Department of Heilongjiang Province; College of Animal Science and Technology, Northeast Agricultural University, Harbin, P. R. China
10
Iancu D, Ashton E. Inherited Renal Tubulopathies-Challenges and Controversies. Genes (Basel) 2020; 11:genes11030277. [PMID: 32150856 PMCID: PMC7140864 DOI: 10.3390/genes11030277] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/11/2020] [Revised: 02/29/2020] [Accepted: 02/29/2020] [Indexed: 12/23/2022]
Abstract
Electrolyte homeostasis is maintained by the kidney through a complex transport function mostly performed by specialized proteins distributed along the renal tubules. Pathogenic variants in the genes encoding these proteins impair this function and have consequences on the whole organism. Establishing a genetic diagnosis in patients with renal tubular dysfunction is a challenging task given the genetic and phenotypic heterogeneity, functional characteristics of the genes involved and the number of yet unknown causes. Part of these difficulties can be overcome by gathering large patient cohorts and applying high-throughput sequencing techniques combined with experimental work to prove functional impact. This approach has led to the identification of a number of genes but also generated controversies about proper interpretation of variants. In this article, we will highlight these challenges and controversies.
Affiliation(s)
- Daniela Iancu
  - UCL-Centre for Nephrology, Royal Free Campus, University College London, Rowland Hill Street, London NW3 2PF, UK
  - Correspondence: Tel.: +44-2381204172; Fax: +44-020-74726476
- Emma Ashton
  - Rare & Inherited Disease Laboratory, London North Genomic Laboratory Hub, Great Ormond Street Hospital for Children National Health Service Foundation Trust, Levels 4-6 Barclay House 37, Queen Square, London WC1N 3BH, UK
11
Guajardo V, Solís S, Almada R, Saski C, Gasic K, Moreno MÁ. Genome-wide SNP identification in Prunus rootstocks germplasm collections using Genotyping-by-Sequencing: phylogenetic analysis, distribution of SNPs and prediction of their effect on gene function. Sci Rep 2020; 10:1467. [PMID: 32001784 PMCID: PMC6992769 DOI: 10.1038/s41598-020-58271-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/20/2019] [Accepted: 12/15/2019] [Indexed: 01/09/2023]
Abstract
Genotyping-by-Sequencing (GBS) was applied in a set of 53 diploid Prunus rootstocks and five scion cultivars from three subgenera (Amygdalus, Prunus and Cerasus) for genome-wide SNP identification and to assess genetic diversity of both Chilean and Spanish germplasm collections. A group of 45,382 high quality SNPs (MAF >0.05; missing data <5%) were selected for analysis of this group of 58 accessions. These SNPs were distributed in genic and intergenic regions in the eight pseudomolecules of the peach genome (Peach v2.0), with an average of 53% located in exonic regions. The genetic diversity detected among the studied accessions divided them into three groups, which are in agreement with their current taxonomic classification. SNPs were classified based on their putative effect on annotated genes and KOG analysis was carried out to provide a deeper understanding of the function of 119 genes affected by high-impact SNPs. The results demonstrate the high utility of GBS for Prunus rootstock identification and for studies of diversity in Prunus species. Also, given the high number of SNPs identified in exonic regions, this strategy represents an important tool for finding candidate genes underlying traits of interest and potential functional markers for use in marker-assisted selection.
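The SNP quality thresholds quoted above (MAF > 0.05; missing data < 5%) can be made concrete with a small sketch. The snippet below is an illustration under stated assumptions, not the authors' pipeline: it assumes biallelic SNPs in diploid samples, with each genotype encoded as the count of alternate alleles (0, 1 or 2) and `None` for a missing call. The function name `passes_filter` and the toy genotype vectors are hypothetical.

```python
from typing import Optional, Sequence


def passes_filter(genotypes: Sequence[Optional[int]],
                  min_maf: float = 0.05,
                  max_missing: float = 0.05) -> bool:
    """Return True if a SNP has MAF > min_maf and missing rate < max_missing.

    genotypes: alternate-allele count per diploid sample (0, 1 or 2),
    or None where the genotype call is missing.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False  # no usable calls at this site

    missing_rate = (len(genotypes) - len(called)) / len(genotypes)

    # Allele frequency is computed over called chromosomes only (2 per sample).
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)

    return maf > min_maf and missing_rate < max_missing


# Toy SNPs across 20 hypothetical accessions.
common = [1, 1, 1] + [0] * 17             # MAF 0.075, no missing calls
rare = [1] + [0] * 19                     # MAF 0.025
gappy = [1, 1, 1, None, None] + [0] * 15  # 10% missing calls
print(passes_filter(common), passes_filter(rare), passes_filter(gappy))
```

Production workflows usually apply equivalent thresholds with a dedicated variant-filtering tool on VCF files; the abstract does not say which tool was used.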
Affiliation(s)
- Simón Solís
- Centro de Estudios Avanzados en Fruticultura (CEAF), Rengo, Chile
- Rubén Almada
- Centro de Estudios Avanzados en Fruticultura (CEAF), Rengo, Chile
- Christopher Saski
- Department of Plant and Environmental Sciences, Clemson University, Clemson, SC, 29634, USA
- Ksenija Gasic
- Department of Plant and Environmental Sciences, Clemson University, Clemson, SC, 29634, USA
- María Ángeles Moreno
- Department of Pomology, Estación Experimental de Aula Dei-CSIC, 50059, Zaragoza, Spain.
12
Quan L, Wu H, Lyu Q, Zhang Y. DAMpred: Recognizing Disease-Associated nsSNPs through Bayes-Guided Neural-Network Model Built on Low-Resolution Structure Prediction of Proteins and Protein-Protein Interactions. J Mol Biol 2019; 431:2449-2459. [PMID: 30796987 DOI: 10.1016/j.jmb.2019.02.017] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2018] [Revised: 02/09/2019] [Accepted: 02/11/2019] [Indexed: 02/05/2023]
Abstract
Nearly one-third of non-synonymous single-nucleotide polymorphisms (nsSNPs) are deleterious to human health, but recognition of the disease-associated mutations remains a significant unsolved problem. We proposed a new algorithm, DAMpred, to identify disease-causing nsSNPs through the coupling of evolutionary profiles with structure predictions of proteins and protein-protein interactions. The pipeline was trained by a novel Bayes-guided artificial neural network algorithm that incorporates posterior probabilities of distinct feature classifiers with the network training process. DAMpred was tested on a large-scale data set involving 10,635 nsSNPs from 2154 ORFs in the human genome and recognized disease-associated nsSNPs with an accuracy of 0.80 and a Matthews correlation coefficient of 0.601, which is 9.1% higher than the best of other state-of-the-art methods. In the blind test on the TP53 gene, DAMpred correctly recognized the mutations causative of Li-Fraumeni-like syndrome with a Matthews correlation coefficient that is 27% higher than the control methods. The study demonstrates an efficient avenue to quantitatively model the association of nsSNPs with human diseases from low-resolution protein structure prediction, which should prove useful in the diagnosis and treatment of genetic diseases.
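The abstract reports both accuracy and the Matthews correlation coefficient (MCC). The sketch below shows how the two metrics are computed from a binary confusion matrix. The counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 50 disease-associated and 50 neutral variants.
tp, tn, fp, fn = 40, 40, 10, 10
acc = accuracy(tp, tn, fp, fn)           # (40 + 40) / 100
mcc = matthews_corrcoef(tp, tn, fp, fn)  # (1600 - 100) / 2500
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are unbalanced.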
Affiliation(s)
- Lijun Quan
- School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215000, China; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Hongjie Wu
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215000, China
- Qiang Lyu
- School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215000, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, Jiangsu 215000, China.
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA.
13
Ashford P, Pang CSM, Moya-García AA, Adeyelu T, Orengo CA. A CATH domain functional family based approach to identify putative cancer driver genes and driver mutations. Sci Rep 2019; 9:263. [PMID: 30670742 PMCID: PMC6343001 DOI: 10.1038/s41598-018-36401-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2018] [Accepted: 11/13/2018] [Indexed: 12/31/2022] Open
Abstract
Tumour sequencing identifies highly recurrent point mutations in cancer driver genes, but rare functional mutations are hard to distinguish from large numbers of passengers. We developed a novel computational platform applying a multi-modal approach to filter out passengers and more robustly identify putative driver genes. The primary filter identifies enrichment of cancer mutations in CATH functional families (CATH-FunFams) – structurally and functionally coherent sets of evolutionarily related domains. Using structural representatives from CATH-FunFams, we subsequently seek enrichment of mutations in 3D and show that these mutation clusters have a very significant tendency to lie close to known functional sites or conserved sites predicted using CATH-FunFams. Our third filter identifies enrichment of putative driver genes in functionally coherent protein network modules confirmed by literature analysis to be cancer associated. Our approach is complementary to other domain enrichment approaches exploiting Pfam families, but benefits from more functionally coherent groupings of domains. Using a set of mutations from 22 cancers, we detect 151 putative cancer drivers, of which 79 are not listed in cancer resources and include the recently validated cancer-associated genes EPHA7, DCC netrin-1 receptor and zinc-finger protein ZNF479.
Affiliation(s)
- Paul Ashford
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Camilla S M Pang
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Aurelio A Moya-García
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK; Laboratorio de Biología Molecular del Cáncer, Centro de Investigaciones Médico-Sanitarias (CIMES), Universidad de Málaga, Málaga, Spain
- Tolulope Adeyelu
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Christine A Orengo
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK.
14
Cornish AJ, David A, Sternberg MJE. PhenoRank: reducing study bias in gene prioritization through simulation. Bioinformatics 2018; 34:2087-2095. [PMID: 29360927 PMCID: PMC5949213 DOI: 10.1093/bioinformatics/bty028] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2017] [Revised: 01/10/2018] [Accepted: 01/16/2018] [Indexed: 02/07/2023] Open
Abstract
Motivation: Genome-wide association studies have identified thousands of loci associated with human disease, but identifying the causal genes at these loci is often difficult. Several methods prioritize genes most likely to be disease causing through the integration of biological data, including protein-protein interaction and phenotypic data. Data availability is not the same for all genes, however, potentially influencing the performance of these methods. Results: We demonstrate that whilst disease genes tend to be associated with greater numbers of data, this may be at least partially a result of them being better studied. With this observation we develop PhenoRank, which prioritizes disease genes whilst avoiding being biased towards genes with more available data. Bias is avoided by comparing gene scores generated for the query disease against gene scores generated using simulated sets of phenotype terms, which ensures that differences in data availability do not affect the ranking of genes. We demonstrate that whilst existing prioritization methods are biased by data availability, PhenoRank is not similarly biased. Avoiding this bias allows PhenoRank to effectively prioritize genes with fewer available data and improves its overall performance. PhenoRank outperforms three available prioritization methods in cross-validation (PhenoRank area under receiver operating characteristic curve [AUC] = 0.89, DADA AUC = 0.87, EXOMISER AUC = 0.71, PRINCE AUC = 0.83, P < 2.2 × 10⁻¹⁶). Availability and implementation: PhenoRank is freely available for download at https://github.com/alexjcornish/PhenoRank. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alex J Cornish
- Department of Life Sciences, Center of Bioinformatics and Systems Biology, Imperial College London, London, UK
- Alessia David
- Department of Life Sciences, Center of Bioinformatics and Systems Biology, Imperial College London, London, UK
- Michael J E Sternberg
- Department of Life Sciences, Center of Bioinformatics and Systems Biology, Imperial College London, London, UK
15
Drug Target Protein-Protein Interaction Networks: A Systematic Perspective. BIOMED RESEARCH INTERNATIONAL 2017; 2017:1289259. [PMID: 28691014 PMCID: PMC5485489 DOI: 10.1155/2017/1289259] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/15/2017] [Revised: 03/09/2017] [Accepted: 05/10/2017] [Indexed: 01/17/2023]
Abstract
The identification and validation of drug targets are crucial in biomedical research, and many studies have analyzed drug target features to better understand the principles of their mechanisms. However, most of them are based on either strong biological hypotheses or the chemical and physical properties of those targets separately. In this paper, we investigated three main ways to understand the functional biomolecules based on the topological features of drug targets. There are no significant differences between targets and common proteins in the protein-protein interaction (PPI) network, indicating that drug targets are neither dominant hub proteins nor bridge proteins. In terms of some special topological structures of the drug targets, however, there are significant differences between known targets and other proteins. Furthermore, the drug targets mainly belong to three typical communities based on their modularity. These topological features help to explain how drug targets work in the PPI network. In particular, they offer an alternative way to predict potential targets or exclude nontargets, so that a new drug target can be tested efficiently and economically. In this way, a drug target homologue set containing 102 potential target proteins is predicted in the paper.
16
Single-Nucleotide Polymorphism of PPARγ, a Protein at the Crossroads of Physiological and Pathological Processes. Int J Mol Sci 2017; 18:ijms18020361. [PMID: 28208577 PMCID: PMC5343896 DOI: 10.3390/ijms18020361] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2016] [Revised: 01/24/2017] [Accepted: 02/01/2017] [Indexed: 01/28/2023] Open
Abstract
Genome polymorphisms are responsible for phenotypic differences between humans and for individual susceptibility to genetic diseases and therapeutic responses. Non-synonymous single-nucleotide polymorphisms (nsSNPs) lead to protein variants with a change in the amino acid sequence that may affect the structure and/or function of the protein and may be utilized as efficient structural and functional markers of association with complex diseases. This study focuses on nsSNP variants of the ligand binding domain of PPARγ, a nuclear receptor in the superfamily of ligand-inducible transcription factors that plays an important role in regulating lipid metabolism and in several processes ranging from cellular differentiation and development to carcinogenesis. Here we selected nine nsSNP variants of the PPARγ ligand binding domain, V290M, R357A, R397C, F360L, P467L, Q286P, R288H, E324K, and E460K, expressed in cancer tissues and/or associated with partial lipodystrophy and insulin resistance. The effects of a single amino acid change on the thermodynamic stability of PPARγ, its spectral properties, and molecular dynamics have been investigated. The nsSNP PPARγ variants show alterations of dynamics and tertiary contacts that impair the correct reciprocal positioning of helices 3 and 12, which is crucially important for PPARγ functioning.
17
Bhardwaj A, Dhar YV, Asif MH, Bag SK. In Silico identification of SNP diversity in cultivated and wild tomato species: insight from molecular simulations. Sci Rep 2016; 6:38715. [PMID: 27929054 PMCID: PMC5144076 DOI: 10.1038/srep38715] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2016] [Accepted: 11/15/2016] [Indexed: 12/12/2022] Open
Abstract
Single Nucleotide Polymorphisms (SNPs), an important source of genetic variation, are often used in crop improvement programmes. The present study presents a comprehensive in silico analysis of nucleotide polymorphisms in wild (Solanum habrochaites) and cultivated (Solanum lycopersicum) species of tomato to explore the consequences of substitutions at both the sequence and structure level. A total of 8978 SNPs with a Ts/Tv (Transition/Transversion) ratio of 1.75 were identified from the Expressed Sequence Tag (EST) and Next Generation Sequence (NGS) data of both species available in public databases. Of these, 1838 SNPs were non-synonymous and distributed in 988 protein-coding genes. Among these, 23 genes containing 96 SNPs were involved in traits markedly different between the two species. Furthermore, there were 28 deleterious SNPs distributed in 27 genes, and a few of these genes were involved in plant-pathogen interaction and plant hormone pathways. Molecular docking and simulations of several selected proteins showed the effect of SNPs in terms of compactness, conformation and interaction ability. Observed SNPs exhibited various types of motif binding effects due to nucleotide changes. SNPs that provide evidence of differential motif binding and interaction behaviour could be effectively used in crop improvement programmes.
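The Ts/Tv ratio mentioned in the abstract is the number of transitions (purine to purine, or pyrimidine to pyrimidine) divided by the number of transversions (purine to pyrimidine, or the reverse). A minimal Python sketch follows. The SNP list is hypothetical and the function names are illustrative; neither comes from the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for A<->G or C<->T substitutions; False for transversions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snps):
    """Compute the transition/transversion ratio.

    snps: iterable of (ref, alt) single-base pairs; identical pairs are skipped.
    Returns float('inf') if there are no transversions.
    """
    transitions = transversions = 0
    for ref, alt in snps:
        if ref.upper() == alt.upper():
            continue  # not a substitution
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        return float("inf")
    return transitions / transversions


# Hypothetical SNP calls: three transitions and one transversion.
example_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
ratio = ts_tv_ratio(example_snps)
```

With the hypothetical list above, the ratio is 3 transitions to 1 transversion.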
Affiliation(s)
- Archana Bhardwaj
- Academy of Scientific and Innovative Research (AcSIR), CSIR-NBRI Campus, Lucknow, India
- Council of Scientific and Industrial Research - National Botanical Research Institute (CSIR-NBRI), Rana Pratap Marg, Lucknow, 226001, India
- Yogeshwar Vikram Dhar
- Academy of Scientific and Innovative Research (AcSIR), CSIR-NBRI Campus, Lucknow, India
- Council of Scientific and Industrial Research - National Botanical Research Institute (CSIR-NBRI), Rana Pratap Marg, Lucknow, 226001, India
- Mehar Hasan Asif
- Council of Scientific and Industrial Research - National Botanical Research Institute (CSIR-NBRI), Rana Pratap Marg, Lucknow, 226001, India
- Sumit K Bag
- Academy of Scientific and Innovative Research (AcSIR), CSIR-NBRI Campus, Lucknow, India
- Council of Scientific and Industrial Research - National Botanical Research Institute (CSIR-NBRI), Rana Pratap Marg, Lucknow, 226001, India
18
Lu HC, Herrera Braga J, Fraternali F. PinSnps: structural and functional analysis of SNPs in the context of protein interaction networks. Bioinformatics 2016; 32:2534-6. [PMID: 27153707 PMCID: PMC4978923 DOI: 10.1093/bioinformatics/btw153] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2015] [Accepted: 03/15/2016] [Indexed: 12/24/2022]
Abstract
Summary: We present a practical computational pipeline to readily perform data analyses of protein–protein interaction networks by using genetic and functional information mapped onto protein structures. We provide a 3D representation of the available protein structure and its regions (surface, interface, core and disordered) for the selected genetic variants and/or SNPs, and a prediction of the mutants’ impact on the protein as measured by a range of methods. We have mapped in total 2587 genetic disorder-related SNPs from OMIM, 587 873 cancer-related variants from COSMIC, and 1 484 045 SNPs from dbSNP. All result data can be downloaded by the user together with an R-script to compute the enrichment of SNPs/variants in selected structural regions. Availability and Implementation: PinSnps is available as an open-access service at http://fraternalilab.kcl.ac.uk/PinSnps/ Contact: franca.fraternali@kcl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hui-Chun Lu
- Randall Division of Cell and Molecular Biophysics, King's College London, London SE1 1UL, UK
- Julián Herrera Braga
- Randall Division of Cell and Molecular Biophysics, King's College London, London SE1 1UL, UK
- Franca Fraternali
- Randall Division of Cell and Molecular Biophysics, King's College London, London SE1 1UL, UK
19
Pang E, Wu X, Lin K. Different evolutionary patterns of SNPs between domains and unassigned regions in human protein-coding sequences. Mol Genet Genomics 2016; 291:1127-36. [PMID: 26833483 PMCID: PMC4875946 DOI: 10.1007/s00438-016-1170-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2015] [Accepted: 01/18/2016] [Indexed: 11/30/2022]
Abstract
Protein evolution plays an important role in the evolution of each genome. Because of their functional nature, most parts or sites of proteins are under different selective constraints, particularly from purifying selection. Most previous studies on protein evolution considered individual proteins in their entirety or compared protein-coding sequences with non-coding sequences. Less attention has been paid to the evolution of different parts within each protein of a given genome. To this end, based on PfamA annotation of all human proteins, each protein sequence can be split into two parts: domains and unassigned regions. Using this rationale, single nucleotide polymorphisms (SNPs) in protein-coding sequences from the 1000 Genomes Project were mapped according to two classifications: SNPs occurring within protein domains and those within unassigned regions. With these classifications, we found that the density of synonymous SNPs within domains is significantly greater than that of synonymous SNPs within unassigned regions; however, the density of non-synonymous SNPs shows the opposite pattern. We also found signatures of purifying selection on both the domains and the unassigned regions. Furthermore, the selective strength on domains is significantly greater than that on unassigned regions. In addition, among all of the human protein sequences, there are 117 PfamA domains in which no SNPs are found. Our results highlight an important aspect of protein domains and may contribute to our understanding of protein evolution.
Affiliation(s)
- Erli Pang
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China.
- Xiaomei Wu
- College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, 310036, China
- Kui Lin
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
20
Abstract
Interleukin (IL-)23 is a central cytokine controlling TH17 development. Overshooting IL-23 signaling contributes to autoimmune diseases. Moreover, genome-wide association studies (GWAS) have identified several SNPs within the IL-23 receptor that are associated with autoimmune diseases. IL-23 is a member of the IL-12-type cytokine family and consists of IL-23p19 and p40. Within the IL-12 family, IL-12 and IL-23 share the p40 cytokine subunit and IL-12Rβ1 as one chain of the receptor complex. For signaling, IL-23 triggers heterodimerization of IL-12Rβ1 and the IL-23R. Subsequently, signal transduction pathways including JAK/STAT, MAPK and PI3K are activated. Most studies have investigated the biological relevance of IL-23 in the development of TH17 cells and autoimmunity, whereas less is known about the molecular context of IL-23 biology. Therefore, we focused on IL-23 receptor complex assembly, signal transduction and the functional relevance of IL-23R SNPs in the context of IL-23-inhibitory principles.
21
van den Berg BA, Reinders MJT, de Ridder D, de Beer TAP. Insight into neutral and disease-associated human genetic variants through interpretable predictors. PLoS One 2015; 10:e0120729. [PMID: 25826299 PMCID: PMC4380319 DOI: 10.1371/journal.pone.0120729] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2014] [Accepted: 01/14/2015] [Indexed: 11/30/2022] Open
Abstract
A variety of methods that predict human nonsynonymous single nucleotide polymorphisms (SNPs) to be neutral or disease-associated have been developed over the last decade. These methods are used for pinpointing disease-associated variants in the many variants obtained with next-generation sequencing technologies. The high performances of current sequence-based predictors indicate that sequence data contains valuable information about a variant being neutral or disease-associated. However, most predictors do not readily disclose this information, and so it remains unclear what sequence properties are most important. Here, we show how we can obtain insight into sequence characteristics of variants and their surroundings by interpreting predictors. We used an extensive range of features derived from the variant itself, its surrounding sequence, sequence conservation, and sequence annotation, and employed linear support vector machine classifiers to enable extracting feature importance from trained predictors. Our approach is useful for providing additional information about what features are most important for the predictions made. Furthermore, for large sets of known variants, it can provide insight into the mechanisms responsible for variants being disease-associated.
Affiliation(s)
- Bastiaan A. van den Berg
- Delft Bioinformatics Lab, Department of Intelligent Systems, Faculty Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628CD, Delft, The Netherlands
- Netherlands Bioinformatics Centre, Nijmegen, The Netherlands
- Kluyver Centre for Genomics of Industrial Fermentation, Delft, The Netherlands
- Marcel J. T. Reinders
- Delft Bioinformatics Lab, Department of Intelligent Systems, Faculty Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628CD, Delft, The Netherlands
- Netherlands Bioinformatics Centre, Nijmegen, The Netherlands
- Kluyver Centre for Genomics of Industrial Fermentation, Delft, The Netherlands
- Dick de Ridder
- Delft Bioinformatics Lab, Department of Intelligent Systems, Faculty Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628CD, Delft, The Netherlands
- Bioinformatics Group, Wageningen University, Droevendaalsesteeg 1, 6708PB, Wageningen, The Netherlands
- Netherlands Bioinformatics Centre, Nijmegen, The Netherlands
- Kluyver Centre for Genomics of Industrial Fermentation, Delft, The Netherlands
- Tjaart A. P. de Beer
- European Bioinformatics Institute (EMBL-EBI) European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Biozentrum, University of Basel, Basel 4056, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
22
Clowes C, Boylan MGS, Ridge LA, Barnes E, Wright JA, Hentges KE. The functional diversity of essential genes required for mammalian cardiac development. Genesis 2014; 52:713-37. [PMID: 24866031 PMCID: PMC4141749 DOI: 10.1002/dvg.22794] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2013] [Revised: 05/22/2014] [Accepted: 05/23/2014] [Indexed: 01/04/2023]
Abstract
Genes required for an organism to develop to maturity (for which no other gene can compensate) are considered essential. The continuing functional annotation of the mouse genome has enabled the identification of many essential genes required for specific developmental processes including cardiac development. Patterns are now emerging regarding the functional nature of genes required at specific points throughout gestation. Essential genes required for development beyond cardiac progenitor cell migration and induction include a small and functionally homogenous group encoding transcription factors, ligands and receptors. Actions of core cardiogenic transcription factors from the Gata, Nkx, Mef, Hand, and Tbx families trigger a marked expansion in the functional diversity of essential genes from midgestation onwards. As the embryo grows in size and complexity, genes required to maintain a functional heartbeat and to provide muscular strength and regulate blood flow are well represented. These essential genes regulate further specialization and polarization of cell types along with proliferative, migratory, adhesive, contractile, and structural processes. The identification of patterns regarding the functional nature of essential genes across numerous developmental systems may aid prediction of further essential genes and those important to development and/or progression of disease. genesis 52:713–737, 2014.
Affiliation(s)
- Christopher Clowes
- Faculty of Life Sciences, University of Manchester, Michael Smith Building, Oxford Road, Manchester, United Kingdom
23
Famiglietti ML, Estreicher A, Gos A, Bolleman J, Géhant S, Breuza L, Bridge A, Poux S, Redaschi N, Bougueleret L, Xenarios I. Genetic variations and diseases in UniProtKB/Swiss-Prot: the ins and outs of expert manual curation. Hum Mutat 2014; 35:927-35. [PMID: 24848695 PMCID: PMC4107114 DOI: 10.1002/humu.22594] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2014] [Accepted: 05/09/2014] [Indexed: 11/25/2022]
Abstract
During the last few years, next-generation sequencing (NGS) technologies have accelerated the detection of genetic variants resulting in the rapid discovery of new disease-associated genes. However, the wealth of variation data made available by NGS alone is not sufficient to understand the mechanisms underlying disease pathogenesis and manifestation. Multidisciplinary approaches combining sequence and clinical data with prior biological knowledge are needed to unravel the role of genetic variants in human health and disease. In this context, it is crucial that these data are linked, organized, and made readily available through reliable online resources. The Swiss-Prot section of the Universal Protein Knowledgebase (UniProtKB/Swiss-Prot) provides the scientific community with a collection of information on protein functions, interactions, biological pathways, as well as human genetic diseases and variants, all manually reviewed by experts. In this article, we present an overview of the information content of UniProtKB/Swiss-Prot to show how this knowledgebase can support researchers in the elucidation of the mechanisms leading from a molecular defect to a disease phenotype.
Affiliation(s)
- Maria Livia Famiglietti
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
24
Yates CM, Filippis I, Kelley LA, Sternberg MJE. SuSPect: enhanced prediction of single amino acid variant (SAV) phenotype using network features. J Mol Biol 2014; 426:2692-701. [PMID: 24810707 PMCID: PMC4087249 DOI: 10.1016/j.jmb.2014.04.026] [Citation(s) in RCA: 168] [Impact Index Per Article: 15.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2014] [Revised: 04/23/2014] [Accepted: 04/28/2014] [Indexed: 11/16/2022]
Abstract
Whole-genome and exome sequencing studies reveal many genetic variants between individuals, some of which are linked to disease. Many of these variants lead to single amino acid variants (SAVs), and accurate prediction of their phenotypic impact is important. Incorporating sequence conservation and network-level features, we have developed a method, SuSPect (Disease-Susceptibility-based SAV Phenotype Prediction), for predicting how likely SAVs are to be associated with disease. SuSPect performs significantly better than other available batch methods on the VariBench benchmarking dataset, with a balanced accuracy of 82%. SuSPect is available at www.sbg.bio.ic.ac.uk/suspect. The Web site has been implemented in Perl and SQLite and is compatible with modern browsers. An SQLite database of possible missense variants in the human proteome is available to download at www.sbg.bio.ic.ac.uk/suspect/download.html. Bioinformatics approaches are key for identification of disease-causing variants. SAV phenotype prediction can be improved using network information. A method including these features, SuSPect, outperforms tested methods. SuSPect is available to use at www.sbg.bio.ic.ac.uk/suspect.
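Balanced accuracy, the metric quoted for SuSPect on VariBench, is the mean of sensitivity and specificity. The sketch below shows the calculation in Python. The confusion-matrix counts are hypothetical and are not the benchmark's actual figures.

```python
def sensitivity(tp, fn):
    """True-positive rate: disease-associated variants correctly flagged."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True-negative rate: neutral variants correctly left unflagged."""
    return tn / (tn + fp)


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity."""
    return (sensitivity(tp, fn) + specificity(tn, fp)) / 2


# Hypothetical counts: 100 disease-associated and 100 neutral variants.
score = balanced_accuracy(tp=90, tn=74, fp=26, fn=10)  # (0.90 + 0.74) / 2
```

Because each class contributes equally, a predictor cannot raise this score simply by favouring the larger class.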
Affiliation(s)
- Christopher M Yates
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK.
- Ioannis Filippis
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
- Lawrence A Kelley
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
- Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
25
Gulati S, Cheng TMK, Bates PA. Cancer networks and beyond: interpreting mutations using the human interactome and protein structure. Semin Cancer Biol 2013; 23:219-26. [PMID: 23680723 DOI: 10.1016/j.semcancer.2013.05.002] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2013] [Revised: 04/30/2013] [Accepted: 05/03/2013] [Indexed: 01/08/2023]
Abstract
Over recent years, with the advances in next-generation sequencing, a large number of cancer mutations have been identified and accumulated in public repositories. Coupled to this is our increased ability to generate detailed interactome maps that help to enrich our knowledge of the biological implications of cancer mutations. As a result, network analysis approaches have become an invaluable tool to predict and interpret mutations that are associated with tumour survival and progression. Our understanding of cancer mechanisms is further enhanced by mapping protein structure information to such networks. Here we review the current methodologies for annotating the functional impacts of cancer mutations, which range from analysis of protein structures to protein-protein interaction network studies.
Affiliation(s)
- Sakshi Gulati
- Biomolecular Modelling Laboratory, Cancer Research UK London Research Institute, London, United Kingdom