1
Segura-Ortiz A, García-Nieto J, Aldana-Montes JF, Navas-Delgado I. Multi-objective context-guided consensus of a massive array of techniques for the inference of Gene Regulatory Networks. Comput Biol Med 2024; 179:108850. [PMID: 39013340 DOI: 10.1016/j.compbiomed.2024.108850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 07/03/2024] [Accepted: 07/03/2024] [Indexed: 07/18/2024]
Abstract
BACKGROUND AND OBJECTIVE Gene Regulatory Network (GRN) inference is a fundamental task in biology and medicine, as it enables a deeper understanding of the intricate mechanisms of gene expression present in organisms. This bioinformatics problem has been addressed in the literature through multiple computational approaches. Techniques developed for inferring GRNs from expression data have employed Bayesian networks, ordinary differential equations (ODEs), machine learning, information theory measures and neural networks, among others. The diversity of implementations and their respective customization have led to the emergence of many tools and multiple specialized domains derived from them, understood as subsets of networks with specific characteristics that are challenging to detect a priori. This specialization has introduced significant uncertainty when choosing the most appropriate technique for a particular dataset. This proposal, named MO-GENECI, builds upon the basic idea of the previous proposal GENECI and optimizes consensus among different inference techniques through a carefully refined multi-objective evolutionary algorithm guided by various objective functions linked to the biological context at hand. METHODS MO-GENECI has been tested on an extensive and diverse academic benchmark of 106 gene regulatory networks from multiple sources and of multiple sizes. The evaluation compared the performance of MO-GENECI with that of individual techniques using key metrics (AUROC and AUPR) for gene regulatory network inference. Friedman's statistical ranking provided an ordered classification, followed by non-parametric Holm tests to determine statistical significance. RESULTS MO-GENECI's Pareto front approximation facilitates easy selection of an appropriate solution based on generic input data characteristics. The best solution consistently emerged as the winner in all statistical tests, and in many cases, the median precision solution showed no statistically significant difference compared to the winner. CONCLUSIONS MO-GENECI has not only achieved more accurate results than individual techniques, but has also overcome the uncertainty associated with the initial choice thanks to its flexibility and adaptability. It is shown to intelligently select the most suitable techniques for each case. The source code is hosted in a public repository on GitHub under the MIT license: https://github.com/AdrianSeguraOrtiz/MO-GENECI. Moreover, to facilitate its installation and use, the software associated with this implementation has been encapsulated in a Python package available at PyPI: https://pypi.org/project/geneci/.
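The statistical comparison mentioned in the abstract (a Friedman-style ranking followed by Holm tests) can be illustrated with a minimal sketch. This is not the MO-GENECI code: the function names `average_ranks` and `holm_reject` are hypothetical, the scores would be per-dataset AUROC or AUPR values supplied by the reader, and the p-values passed to the Holm step are assumed to come from a separate post hoc test.

```python
from typing import Dict, List


def average_ranks(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean rank of each technique across datasets (rank 1 = highest score)."""
    names = list(scores)
    n_datasets = len(next(iter(scores.values())))
    totals = {name: 0.0 for name in names}
    for d in range(n_datasets):
        ordered = sorted(names, key=lambda n: scores[n][d], reverse=True)
        i = 0
        while i < len(ordered):
            # Tied techniques share the mean of the rank positions they occupy.
            j = i
            while j + 1 < len(ordered) and scores[ordered[j + 1]][d] == scores[ordered[i]][d]:
                j += 1
            shared = (i + j) / 2 + 1
            for k in range(i, j + 1):
                totals[ordered[k]] += shared
            i = j + 1
    return {name: totals[name] / n_datasets for name in names}


def holm_reject(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Holm step-down: compare the i-th smallest p-value with alpha / (m - i)."""
    m = len(p_values)
    decisions = {name: False for name in p_values}
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    for i, (name, p) in enumerate(ordered):
        if p > alpha / (m - i):
            break  # once one hypothesis is retained, all larger p-values are too
        decisions[name] = True
    return decisions
```

A lower average rank indicates a technique that scores higher across the benchmark; the Holm step then marks which pairwise differences against the best-ranked method remain significant after correcting for multiple comparisons.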
Affiliation(s)
- Adrián Segura-Ortiz
- Departamento de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain.
- José García-Nieto
- Departamento de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
- José F Aldana-Montes
- Departamento de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
- Ismael Navas-Delgado
- Departamento de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
2
Wang T, Tian S, Tikhonova EB, Karamyshev AL, Wang JJ, Zhang F, Wang D. The Enrichment of miRNA-Targeted mRNAs in Translationally Less Active over More Active Polysomes. BIOLOGY 2023; 12:1536. [PMID: 38132362 PMCID: PMC10741098 DOI: 10.3390/biology12121536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 12/03/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023]
Abstract
miRNAs moderately inhibit the translation and enhance the degradation of their target mRNAs via cognate binding sites located predominantly in the 3'-untranslated regions (UTR). Paradoxically, miRNA targets are also polysome-associated. We studied the polysome association by the comparative translationally less-active light- and more-active heavy-polysome profiling of a wild type (WT) human cell line and its isogenic mutant (MT) with a disrupted DICER1 gene and, thus, mature miRNA production. As expected, the open reading frame (ORF) length is a major determinant of light- to heavy-polysome mRNA abundance ratios, but is rendered less powerful in WT than in MT cells by miRNA-regulatory activities. We also observed that miRNAs tend to target mRNAs with longer ORFs, and that adjusting the mRNA abundance ratio with the ORF length improves its correlation with the 3'-UTR miRNA-binding-site count. In WT cells, miRNA-targeted mRNAs exhibit higher abundance in light relative to heavy polysomes, i.e., light-polysome enrichment. In MT cells, the DICER1 disruption not only significantly abrogated the light-polysome enrichment, but also narrowed the mRNA abundance ratio value range. Additionally, the abrogation of the enrichment due to the DICER1 gene disruption, i.e., the decreases of the ORF-length-adjusted mRNA abundance ratio from WT to MT cells, exhibits a nearly perfect linear correlation with the 3'-UTR binding-site count. Transcription factors and protein kinases are the top two most enriched mRNA groups. Taken together, the results provide evidence for the light-polysome enrichment of miRNA-targeted mRNAs to reconcile polysome association and moderate translation inhibition, and that ORF length is an important, though currently under-appreciated, transcriptome regulation parameter.
Affiliation(s)
- Tingzeng Wang
- Department of Environmental Toxicology, and The Institute of Environmental and Human Health (TIEHH), Texas Tech University, Lubbock, TX 79416, USA
- Shuangmei Tian
- Department of Environmental Toxicology, and The Institute of Environmental and Human Health (TIEHH), Texas Tech University, Lubbock, TX 79416, USA
- Elena B. Tikhonova
- Department of Cell Biology and Biochemistry, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA
- Andrey L. Karamyshev
- Department of Cell Biology and Biochemistry, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA
- Jing J. Wang
- Department of Cancer Biology and Genetics, James Comprehensive Cancer Center, Wexner Medical Center, The Ohio State University, Columbus, OH 43210, USA
- Fangyuan Zhang
- Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX 79416, USA
- Degeng Wang
- Department of Environmental Toxicology, and The Institute of Environmental and Human Health (TIEHH), Texas Tech University, Lubbock, TX 79416, USA
3
Sykes J, Holland BR, Charleston MA. A review of visualisations of protein fold networks and their relationship with sequence and function. Biol Rev Camb Philos Soc 2023; 98:243-262. [PMID: 36210328 PMCID: PMC10092621 DOI: 10.1111/brv.12905] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/30/2021] [Revised: 09/08/2022] [Accepted: 09/09/2022] [Indexed: 01/12/2023]
Abstract
Proteins form arguably the most significant link between genotype and phenotype. Understanding the relationship between protein sequence and structure, and applying this knowledge to predict function, is difficult. One way to investigate these relationships is by considering the space of protein folds and how one might move from fold to fold through similarity, or potential evolutionary relationships. The many individual characterisations of fold space presented in the literature can tell us a lot about how well the current Protein Data Bank represents protein fold space, how convergence and divergence may affect protein evolution, how proteins affect the whole of which they are part, and how proteins themselves function. A synthesis of these different approaches and viewpoints seems the most likely way to further our knowledge of protein structure evolution and thus, facilitate improved protein structure design and prediction.
Affiliation(s)
- Janan Sykes
- School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
- Barbara R Holland
- School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
- Michael A Charleston
- School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
4
Huang F, Tang X, Ye B, Wu S, Ding K. PSL-LCCL: a resource for subcellular protein localization in liver cancer cell line SK_HEP1. Database (Oxford) 2022; 2022:6521743. [PMID: 35134877 PMCID: PMC9248857 DOI: 10.1093/database/baab087] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/28/2021] [Revised: 12/09/2021] [Accepted: 12/31/2021] [Indexed: 12/19/2022]
Abstract
The characterization of subcellular protein localization provides a basis for further understanding cellular behaviors. A delineation of subcellular localization of proteins on cytosolic membrane-bound organelles in human liver cancer cell lines (hLCCLs) has yet to be performed. To obtain its proteome-wide view, we isolated and enriched six cytosolic membrane-bound organelles in one of the hLCCLs (SK_HEP1) and quantified their proteins using mass spectrometry. The vigorous selection of marker proteins and a machine-learning-based algorithm were implemented to localize proteins at cluster and neighborhood levels. We validated the performance of the proposed method by comparing the predicted subcellular protein localization with publicly available resources. The profiles enabled investigating the correlation of protein domains with their subcellular localization and colocalization of protein complex members. A subcellular proteome database for SK_HEP1, including (i) the subcellular protein localization and (ii) the subcellular locations of protein complex members and their interactions, was constructed. Our research provides resources for further research on hLCCLs proteomics. Database URL: http://www.igenetics.org.cn/project/PSL-LCCL/
Affiliation(s)
- Bo Ye
- Department of Bioinformatics, School of Basic Medicine, Chongqing Medical University, #1 Road Yixueyuan, Yuzhong District, Chongqing 400016, People’s Republic of China
- Songfeng Wu
- *Correspondence may also be addressed to Songfeng Wu. Tel: +8610-61777053; and Keyue Ding. Tel: +86371-87160116
- Keyue Ding
- *Correspondence may also be addressed to Songfeng Wu. Tel: +8610-61777053; and Keyue Ding. Tel: +86371-87160116
5
Murcia-Garzón J, Méndez-Tenorio A. Promiscuous Domains in Eukaryotes and HAT Proteins in FUNGI Have Followed Different Evolutionary Paths. J Mol Evol 2022; 90:124-138. [PMID: 35084521 DOI: 10.1007/s00239-021-10046-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2020] [Accepted: 12/27/2021] [Indexed: 10/19/2022]
Abstract
Diverse studies have shown that the content of genes present in sequenced genomes does not seem to correlate with the complexity of the organisms. However, various studies have shown that organism complexity and the size of the proteome are, indeed, significantly correlated. This characteristic allows us to postulate that some molecular mechanisms have granted some proteins a greater functional diversity, increasing their participation in the development of organisms with higher complexity. Among those mechanisms, domain promiscuity, defined as the ability of domains to organize in combination with other distinct domains, is of great importance for the evolution of organisms. Previous works have analyzed the degree of domain promiscuity of proteomes, showing how it seems to have paralleled the evolution of eukaryotic organisms. This has motivated the present study, where we analyzed domain promiscuity in a collection of 84 eukaryotic proteomes representative of all the taxonomic groups of the tree of life. Using a grammar definition approach, we determined the architecture of 1,223,227 proteins, comprising 2,296,371 domains, which established 839,184 bigram types. The phylogenetic reconstructions based on differences in the content of information from measures of proteome promiscuity confirm that the evolution of the promiscuity of domains in eukaryotic organisms resembles the evolutionary history of the species. However, a close analysis of the PHD and RING domains, the most promiscuous domains found in fungi and functional components of chromatin remodeling enzymes and important expression regulators, suggests an evolution according to their function.
Affiliation(s)
- Jazmín Murcia-Garzón
- Laboratorio de Biotecnología Vegetal, Centro de Biotecnología Genómica, Instituto Politécnico Nacional, Boulevard del Maestro S/N esq. Elías Piña, Col. Narciso Mendoza, 88710, Reynosa, Tamaulipas, Mexico
- Alfonso Méndez-Tenorio
- Laboratorio de Biotecnología y Bioinformática Genómica, Departamento de Bioquímica, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Prol. de Carpio y Plan de Ayala s/n, Col. Santo Tomás, 11340, Mexico City, Mexico.
6
Boolean function metrics can assist modelers to check and choose logical rules. J Theor Biol 2022; 538:111025. [DOI: 10.1016/j.jtbi.2022.111025] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/11/2021] [Revised: 12/07/2021] [Accepted: 01/10/2022] [Indexed: 12/25/2022]
7
Coyote-Maestas W, Nedrud D, Suma A, He Y, Matreyek KA, Fowler DM, Carnevale V, Myers CL, Schmidt D. Probing ion channel functional architecture and domain recombination compatibility by massively parallel domain insertion profiling. Nat Commun 2021; 12:7114. [PMID: 34880224 PMCID: PMC8654947 DOI: 10.1038/s41467-021-27342-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 05/28/2021] [Accepted: 11/16/2021] [Indexed: 11/10/2022]
Abstract
Protein domains are the basic units of protein structure and function. Comparative analysis of genomes and proteomes showed that domain recombination is a main driver of multidomain protein functional diversification and some of the constraining genomic mechanisms are known. Much less is known about biophysical mechanisms that determine whether protein domains can be combined into viable protein folds. Here, we use massively parallel insertional mutagenesis to determine compatibility of over 300,000 domain recombination variants of the Inward Rectifier K+ channel Kir2.1 with channel surface expression. Our data suggest that genomic and biophysical mechanisms acted in concert to favor gain of large, structured domains at protein termini during ion channel evolution. We use machine learning to build a quantitative biophysical model of domain compatibility in Kir2.1 that allows us to derive rudimentary rules for designing domain insertion variants that fold and traffic to the cell surface. Positional Kir2.1 responses to motif insertion cluster into distinct groups that correspond to contiguous structural regions of the channel with distinct biophysical properties tuned towards providing either folding stability or gating transitions. This suggests that insertional profiling is a high-throughput method to annotate function of ion channel structural regions.
Affiliation(s)
- Willow Coyote-Maestas
- Department of Biochemistry, Molecular Biology & Biophysics, University of Minnesota, Minneapolis, MN 55455 USA
- David Nedrud
- Department of Biochemistry, Molecular Biology & Biophysics, University of Minnesota, Minneapolis, MN 55455 USA
- Antonio Suma
- Department of Chemistry, Temple University, Philadelphia, PA 19122 USA
- Yungui He
- Department of Genetics, Cell Biology & Development, University of Minnesota, Minneapolis, MN 55455 USA
- Kenneth A. Matreyek
- Department of Pathology, Case Western Reserve University School of Medicine, Cleveland, OH 44106 USA
- Douglas M. Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA 98115 USA; Department of Bioengineering, University of Washington, Seattle, WA 98115 USA
- Vincenzo Carnevale
- Department of Chemistry, Temple University, Philadelphia, PA 19122 USA
- Chad L. Myers
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455 USA
- Daniel Schmidt
- Department of Genetics, Cell Biology & Development, University of Minnesota, Minneapolis, MN, 55455, USA.
8
Weiskittel TM, Ung CY, Correia C, Zhang C, Li H. De novo individualized disease modules reveal the synthetic penetrance of genes and inform personalized treatment regimens. Genome Res 2021; 32:124-134. [PMID: 34876496 PMCID: PMC8744682 DOI: 10.1101/gr.275889.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2021] [Accepted: 11/30/2021] [Indexed: 12/04/2022]
Abstract
Current understandings of individual disease etiology and therapeutics are limited despite great need. To fill the gap, we propose a novel computational pipeline that collects potent disease gene cooperative pathways to envision individualized disease etiology and therapies. Our algorithm constructs individualized disease modules de novo, which enables us to elucidate the importance of mutated genes in specific patients and to understand the synthetic penetrance of these genes across patients. We reveal that the importance of the notorious cancer drivers TP53 and PIK3CA fluctuates widely across breast cancers and peaks in tumors with distinct numbers of mutations, and that rarely mutated genes such as XPO1 and PLEKHA1 have high disease module importance in specific individuals. Furthermore, individualized module disruption enables us to devise customized singular and combinatorial target therapies that were highly varied across patients, showing the need for precision therapeutics pipelines. As the first analysis of de novo individualized disease modules, we illustrate the power of individualized disease modules for precision medicine by providing deep novel insights on the activity of diseased genes in individuals.
Affiliation(s)
- Taylor M Weiskittel
- Center for Individualized Medicine, Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, Minnesota 55905, USA
- Choong Y Ung
- Center for Individualized Medicine, Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, Minnesota 55905, USA
- Cristina Correia
- Center for Individualized Medicine, Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, Minnesota 55905, USA
- Cheng Zhang
- Center for Individualized Medicine, Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, Minnesota 55905, USA
- Hu Li
- Center for Individualized Medicine, Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, Minnesota 55905, USA
9
Caetano-Anollés G, Aziz MF, Mughal F, Caetano-Anollés D. Tracing protein and proteome history with chronologies and networks: folding recapitulates evolution. Expert Rev Proteomics 2021; 18:863-880. [PMID: 34628994 DOI: 10.1080/14789450.2021.1992277] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/24/2023]
Abstract
INTRODUCTION While the origin and evolution of proteins remain mysterious, advances in evolutionary genomics and systems biology are facilitating the historical exploration of the structure, function and organization of proteins and proteomes. Molecular chronologies are series of time events describing the history of biological systems and subsystems and the rise of biological innovations. Together with time-varying networks, these chronologies provide a window into the past. AREAS COVERED Here, we review molecular chronologies and networks built with modern methods of phylogeny reconstruction. We discuss how chronologies of structural domain families uncover the explosive emergence of metabolism, the late rise of translation, the co-evolution of ribosomal proteins and rRNA, and the late development of the ribosomal exit tunnel; events that coincided with a tendency to shorten folding time. Evolving networks described the early emergence of domains and a late 'big bang' of domain combinations. EXPERT OPINION Two processes, folding and recruitment, appear central to the evolutionary progression. The former increases protein persistence. The latter fosters diversity. Chronologically, protein evolution mirrors folding by combining supersecondary structures into domains, developing translation machinery to facilitate folding speed and stability, and enhancing structural complexity by establishing long-distance interactions in novel structural and architectural designs.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA; C. R. Woese Institute for Genomic Biology, University of Illinois, Urbana, Illinois, USA
- M Fayez Aziz
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
- Fizza Mughal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
- Derek Caetano-Anollés
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
10
Benigni B, Gallotti R, De Domenico M. Potential-driven random walks on interconnected systems. Phys Rev E 2021; 104:024120. [PMID: 34525567 DOI: 10.1103/physreve.104.024120] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/12/2021] [Accepted: 07/12/2021] [Indexed: 12/22/2022]
Abstract
Interconnected systems have to route information to function properly: At the lowest scale neural cells exchange electrochemical signals to communicate, while at larger scales animals and humans move between distinct spatial patches and machines exchange information via the Internet through communication protocols. Nontrivial patterns emerge from the analysis of information flows, which are not captured either by broadcasting, such as in random walks, or by geodesic routing, such as shortest paths. In fact, alternative models between those extreme protocols are still eluding us. Here we propose a class of stochastic processes, based on biased random walks, where agents are driven by a physical potential pervading the underlying network topology. By considering a generalized Coulomb dependence on the distance to the destination(s), we show that it is possible to interpolate between random walk and geodesic routing in a simple and effective way. We demonstrate that it is not possible to find a one-size-fits-all solution to efficient navigation and that network heterogeneity or modularity has measurable effects. We illustrate how our framework can describe the movements of animals and humans, capturing with a stylized model some measurable features of the latter. From a methodological perspective, our potential-driven random walks open the doors to a broad spectrum of analytical tools, ranging from random-walk centralities to geometry induced by potential-driven network processes.
Affiliation(s)
- Barbara Benigni
- Department of Information Engineering and Computer Science, University of Trento, Via Sommarive, 9, 38123 Povo, Trento, Italy and CoMuNe Lab, Fondazione Bruno Kessler, Via Sommarive 18, 38123 Povo, Trento, Italy
- Riccardo Gallotti
- CoMuNe Lab, Fondazione Bruno Kessler, Via Sommarive 18, 38123 Povo, Trento, Italy
- Manlio De Domenico
- CoMuNe Lab, Fondazione Bruno Kessler, Via Sommarive 18, 38123 Povo, Trento, Italy
11
Abstract
Domains are the structural, functional and evolutionary units of proteins. They combine to form multidomain proteins. The evolutionary history of this molecular combinatorics has been studied with phylogenomic methods. Here, we construct networks of domain organization and explore their evolution. A time series of networks revealed two ancient waves of structural novelty arising from ancient 'p-loop' and 'winged helix' domains and a massive 'big bang' of domain organization. The evolutionary recruitment of domains was highly modular, hierarchical and ongoing. Domain rearrangements elicited non-random and scale-free network structure. Comparative analyses of preferential attachment, randomness and modularity showed yin-and-yang complementary transition and biphasic patterns along the structural chronology. Remarkably, the evolving networks highlighted a central evolutionary role of cofactor-supporting structures of non-ribosomal peptide synthesis pathways, likely crucial to the early development of the genetic code. Some highly modular domains featured dual response regulation in two-component signal transduction systems with DNA-binding activity linked to transcriptional regulation of responses to environmental change. Interestingly, hub domains across the evolving networks shared the historical role of DNA binding and editing, an ancient protein function in molecular evolution. Our investigation unfolds historical source-sink patterns of evolutionary recruitment that further our understanding of protein architectures and functions.
12
Alon U, Mokryn O, Hershberg U. Using Domain Based Latent Personal Analysis of B Cell Clone Diversity Patterns to Identify Novel Relationships Between the B Cell Clone Populations in Different Tissues. Front Immunol 2021; 12:642673. [PMID: 33868278 PMCID: PMC8047331 DOI: 10.3389/fimmu.2021.642673] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/16/2020] [Accepted: 03/01/2021] [Indexed: 01/10/2023]
Abstract
The B cell population is highly diverse and very skewed. It is divided into clones (B cells with a common mother cell). It is thought that each clone represents an initial B cell receptor specificity. A few clones are very abundant, comprising hundreds or thousands of B cells, while the majority have only a few cells per clone. We suggest a novel method, domain-based latent personal analysis (LPA), for spectral exploration of entities in a domain, which can be used to find the spectral spread of sub repertoires within a person. LPA defines a domain-based spectral signature for each sub repertoire. LPA signatures consist of the elements (in our case, the clones) that most differentiate the sub repertoire from the person’s abundance of clones. They include both positive elements, which describe overabundant clones, and negative elements, which describe missing clones. The signatures can also be used to compare the sub repertoires they represent to each other. Applying LPA to compare the repertoires found in different tissues, we reiterated previous findings that showed that gut and blood tissues have separate repertoires. We further identify a third branch of clonal patterns typical of the lymphatic organs (Spleen, MLN, and bone marrow) separated from the other two categories. We developed a Python version of LPA that can easily be applied to compare clonal distributions: https://github.com/ScanLab-ossi/LPA. It could also be easily adapted to study other skewed sequence populations used in the analysis of B cell receptor populations, for instance, k-mers and V gene usage. These analysis types should allow for inter and intra-repertoire comparisons of diversity, which could revolutionize the way we understand repertoire changes and diversity.
Affiliation(s)
- Uri Alon
- Department of Human Biology, Faculty of Sciences, University of Haifa, Haifa, Israel
- Osnat Mokryn
- Department of Information Systems, Faculty of Social Sciences, University of Haifa, Haifa, Israel
- Uri Hershberg
- Department of Human Biology, Faculty of Sciences, University of Haifa, Haifa, Israel
13
Gullotto D. Fine tuned exploration of evolutionary relationships within the protein universe. Stat Appl Genet Mol Biol 2021; 20:17-36. [PMID: 33594839 DOI: 10.1515/sagmb-2019-0039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2019] [Accepted: 01/12/2021] [Indexed: 11/15/2022]
Abstract
In the regime of domain classifications, the protein universe unveils a discrete set of folds connected by hierarchical relationships. Instead, at sub-domain-size resolution and because of physical constraints not necessarily requiring evolution to shape polypeptide chains, networks of protein motifs depict a continuous view that lies beyond the extent of hierarchical classification schemes. A number of studies, however, suggest that universal sub-sequences could be the descendants of peptides that emerged in an ancient pre-biotic world. Should this be the case, evolutionary signals retained by structurally conserved motifs, along with hierarchical features of ancient domains, could sew relationships among folds that diverged beyond the point where homology is discernible. In view of the aforementioned, this paper provides a rationale where a network with hierarchical and continuous levels of the protein space, together with sequence profiles that probe the extent of sequence similarity and contacting residues that capture the transition from pre-biotic to domain world, has been used to explore relationships between ancient folds. Statistics of detected signals have been reported. As a result, an example of an emergent sub-network that makes sense from an evolutionary perspective, where conserved signals retrieved from the assessed protein space have been co-opted, has been discussed.
Affiliation(s)
- Danilo Gullotto
- Advanced Computational Biostructural Research Collaboratory, I-95019, Zafferana Etnea, Italy
|
14
|
Abstract
Background: Protein is the basic building block of the body. It is a complex system whose structure plays a key role in activation, catalysis, messaging and disease states. Careful investigation of protein structure is therefore necessary for the diagnosis of diseases and for drug design. Protein structures are described at different levels of complexity: primary (chain), secondary (helical), tertiary (3D), and quaternary structure. Analyzing the complex 3D structure of a protein is a difficult task, but the structure can be analyzed as a network of interconnections between its components, where amino acids are the nodes and the interconnections between them are the edges.
Objective: Many works in the literature have shown that the small-world network concept provides new opportunities to investigate networks of biological systems. The objective of this paper is to analyze protein structure using the small-world concept.
Methods: The protein is analyzed using the small-world network concept, specifically where the extreme condition is a degree distribution that follows a power law. To verify the proposed approach, a dataset of the Oncogene protein structure is analyzed using Python programming.
Results: The protein structure is plotted as a network of amino acids (a Residue Interaction Graph, RIG) using the distance matrix of the nodes with a given threshold. Various centrality measures (i.e., degree distribution, Degree-Betweenness correlation, and Betweenness-Closeness correlation) are then calculated for 1323 nodes, and graphs are plotted.
Conclusion: There exist hubs with higher degree centrality, although they are fewer in number, and they are expected to be robust toward harmful effects of mutations with new functions.
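The residue interaction graph described in the Results above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the toy coordinates, the 5.0 distance threshold, and the function names `residue_interaction_graph` and `degree_distribution` are all assumptions made for the example.

```python
import math
from itertools import combinations


def residue_interaction_graph(coords, threshold):
    """Connect residues i and j when their Euclidean distance is <= threshold.

    coords: list of (x, y, z) tuples, one per residue (hypothetical input).
    Returns an adjacency dict {residue_index: set(neighbour_indices)}.
    """
    graph = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def degree_distribution(graph):
    """Map each degree k to the number of nodes that have degree k."""
    counts = {}
    for neighbours in graph.values():
        k = len(neighbours)
        counts[k] = counts.get(k, 0) + 1
    return counts


# Toy coordinates (made up): four residues on a line, 4 units apart.
toy_coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
rig = residue_interaction_graph(toy_coords, threshold=5.0)
degrees = degree_distribution(rig)
```

On a real structure the coordinates would typically come from one representative atom per residue, and the choice of threshold strongly affects the resulting degree distribution.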
Affiliation(s)
- Neetu Kumari
- Department of Computer Science, Banaras Hindu University, Varanasi, India
- Anshul Verma
- Department of Computer Science, Banaras Hindu University, Varanasi, India
|
15
|
Urban Ecosystem Services: A Review of the Knowledge Components and Evolution in the 2010s. SUSTAINABILITY 2020. [DOI: 10.3390/su12239839] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
In an effort to reconnect urban populations to the biosphere, which is an urgent task to ensure human sustainability, the concept of urban ecosystem services (UES) has recently garnered scholarly and political attention. With an aim to examine the emerging research trends and gaps in UES, we present an up-to-date, computer-based meta-analysis of UES from 2010 to 2019 by implementing a keyword co-occurrence network (KCN) approach. A total of 10,247 author keywords were selected and used to analyze undirected and weighted networks of these keywords. Specifically, power-law distribution fitting was performed to identify overall UES keyword trends, and clusters of keywords were examined to understand micro-level knowledge trends. The knowledge components and structures of UES literature exhibited scale-free network characteristics, which implies that the KCN of the UES throughout the 2010s was dominated by a small number of keywords such as “urbanization”, “land use and land cover”, “urban green space” and “green infrastructure”. Finally, our findings indicate that knowledge of stakeholder involvement and qualitative aspects of UES are not as refined as spatial UES approaches. The implications of these knowledge components and trends are discussed in the context of urban sustainability and policy planning.
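A keyword co-occurrence network of the kind described above can be sketched as an undirected, weighted edge list. The following is only an illustration under stated assumptions: the three toy keyword lists are invented, and the function names `keyword_cooccurrence` and `weighted_degree` are hypothetical rather than taken from the paper.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(papers):
    """Count how often each unordered keyword pair appears in the same paper.

    papers: iterable of keyword lists, one list per paper (hypothetical input).
    Returns a Counter mapping (kw_a, kw_b), with kw_a < kw_b, to an edge weight.
    """
    weights = Counter()
    for keywords in papers:
        unique = sorted({k.lower() for k in keywords})
        for pair in combinations(unique, 2):
            weights[pair] += 1
    return weights


def weighted_degree(weights):
    """Sum of the edge weights incident to each keyword (node strength)."""
    strength = Counter()
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
    return strength


# Toy author-keyword lists (made up for the example).
toy_papers = [
    ["urbanization", "green infrastructure"],
    ["urbanization", "urban green space", "green infrastructure"],
    ["urbanization", "land use and land cover"],
]
edges = keyword_cooccurrence(toy_papers)
strength = weighted_degree(edges)
```

In a scale-free network, a few keywords would accumulate most of the strength, which is the pattern the abstract reports for terms such as "urbanization".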
|
16
|
Knowledge Structures and Components of Rural Resilience in the 2010s: Conceptual Development and Implications. SUSTAINABILITY 2020. [DOI: 10.3390/su12229769] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
Resilience is being widely adopted as a comprehensive analytical framework for understanding sustainability dynamics, despite the conceptual challenges in developing proxies and indicators for researchers and policy makers. In our study, we observed how the concept of resilience undergoes continued extension within the rural resilience literature. We comprehensively reviewed rural resilience literature using keyword co-occurrence network (KCN) analysis and a systematic review of shortlisted papers. We conducted the KCN analysis for 1186 papers to characterize the state of the rural resilience literature, and systematically reviewed 36 shortlisted papers to further examine how rural resilience analysis and its assessment tools are helping understand the complexity and interdependence of rural social-ecological systems, over three three-year periods from 2010 to 2018. The results show that the knowledge structure built by the high frequency of co-occurrence keywords remains similar over the three-year periods, including climate change, resilience, vulnerability, adaptation, and management, whereas the components of knowledge have greatly expanded, indicating an increased understanding of rural system dynamics. Through the systematic review, we found that developing resilience assessment tools is often designed as a process to strengthen adaptive capacity at the household or community level in response to global processes of climate change and economic globalization. Furthermore, community resilience is found to be an interesting knowledge component that has characterized rural resilience literature in the 2010s. Based on our study, we summarized conceptual characteristics of rural resilience and discussed the challenges and implications for researchers and policy makers.
|
17
|
Martino A, Rizzi A. (Hyper)graph Kernels over Simplicial Complexes. ENTROPY (BASEL, SWITZERLAND) 2020; 22:E1155. [PMID: 33286924 PMCID: PMC7597323 DOI: 10.3390/e22101155] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/18/2020] [Revised: 10/02/2020] [Accepted: 10/12/2020] [Indexed: 11/17/2022]
Abstract
Graph kernels are one of the mainstream approaches for measuring similarity between graphs, especially for pattern recognition and machine learning tasks. In turn, graphs have gained a lot of attention due to their modeling capabilities for several real-world phenomena ranging from bioinformatics to social network analysis. However, attention has recently moved towards hypergraphs, a generalization of plain graphs where multi-way relations (other than pairwise relations) can be considered. In this paper, four (hyper)graph kernels are proposed and their efficiency and effectiveness are compared in a twofold fashion: first, by inferring the simplicial complexes on top of underlying graphs and by performing a comparison among 18 benchmark datasets against state-of-the-art approaches; second, by facing a real-world case study (i.e., metabolic pathways classification) where input data are natively represented by hypergraphs. With this work, we aim at fostering the extension of graph kernels towards hypergraphs and, more generally, bridging the gap between structural pattern recognition and the domain of hypergraphs.
Affiliation(s)
- Alessio Martino
- Department of Information Engineering, Electronics and Telecommunications, University of Rome “La Sapienza”, Via Eudossiana 18, 00184 Rome, Italy
|
18
|
Martino A, De Santis E, Giuliani A, Rizzi A. Modelling and Recognition of Protein Contact Networks by Multiple Kernel Learning and Dissimilarity Representations. ENTROPY (BASEL, SWITZERLAND) 2020; 22:E794. [PMID: 33286565 PMCID: PMC7517365 DOI: 10.3390/e22070794] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/27/2020] [Revised: 07/13/2020] [Accepted: 07/17/2020] [Indexed: 11/26/2022]
Abstract
Multiple kernel learning is a paradigm which employs a properly constructed chain of kernel functions able to simultaneously analyse different data or different representations of the same data. In this paper, we propose a hybrid classification system based on a linear combination of multiple kernels defined over multiple dissimilarity spaces. The core of the training procedure is the joint optimisation of kernel weights and representative selection in the dissimilarity spaces. This equips the system with a two-fold knowledge discovery phase: by analysing the weights, it is possible to check which representations are more suitable for solving the classification problem, whereas the pivotal patterns selected as representatives can give further insights on the modelled system, possibly with the help of field-experts. The proposed classification system is tested on real proteomic data in order to predict proteins' functional role starting from their folded structure: specifically, a set of eight representations is drawn from the graph-based protein folded description. The proposed multiple kernel-based system has also been benchmarked against a clustering-based classification system that is likewise able to exploit multiple dissimilarities simultaneously. Computational results show remarkable classification capabilities and the knowledge discovery analysis is in line with current biological knowledge, suggesting the reliability of the proposed system.
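The "linear combination of multiple kernels" mentioned above can be illustrated with a small sketch. It assumes precomputed Gram matrices and non-negative weights that are normalised to sum to 1; the abstract does not state these constraints, and the function name `combine_kernels` and the toy matrices are hypothetical.

```python
def combine_kernels(kernels, weights):
    """Return the weighted sum of square Gram matrices (lists of lists).

    Assumption for this sketch: weights are non-negative and are normalised
    to sum to 1. A non-negative combination of valid kernels is again a
    valid kernel.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for kernel, w in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += (w / total) * kernel[i][j]
    return combined


# Two toy 2x2 Gram matrices, e.g. one per representation (made-up values).
k_first = [[1.0, 0.5], [0.5, 1.0]]
k_second = [[1.0, 0.0], [0.0, 1.0]]
k_mix = combine_kernels([k_first, k_second], weights=[3.0, 1.0])
```

Inspecting the learned weights afterwards is what the abstract refers to as checking which representations matter most; here the first representation would carry three quarters of the total weight.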
Affiliation(s)
- Alessio Martino
- Department of Information Engineering, Electronics and Telecommunications, University of Rome “La Sapienza”, Via Eudossiana 18, 00184 Rome, Italy
- Enrico De Santis
- Department of Information Engineering, Electronics and Telecommunications, University of Rome “La Sapienza”, Via Eudossiana 18, 00184 Rome, Italy
- Alessandro Giuliani
- Department of Environment and Health, Istituto Superiore di Sanità, Viale Regina Elena 299, 00161 Rome, Italy
- Antonello Rizzi
- Department of Information Engineering, Electronics and Telecommunications, University of Rome “La Sapienza”, Via Eudossiana 18, 00184 Rome, Italy
|
19
|
Ferruz N, Lobos F, Lemm D, Toledo-Patino S, Farías-Rico JA, Schmidt S, Höcker B. Identification and Analysis of Natural Building Blocks for Evolution-Guided Fragment-Based Protein Design. J Mol Biol 2020; 432:3898-3914. [PMID: 32330481 PMCID: PMC7322520 DOI: 10.1016/j.jmb.2020.04.013] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 12/23/2019] [Revised: 04/12/2020] [Accepted: 04/13/2020] [Indexed: 12/15/2022]
Abstract
Natural evolution has generated an impressively diverse protein universe via duplication and recombination from a set of protein fragments that served as building blocks. The application of these concepts to the design of new proteins using subdomain-sized fragments from different folds has proven to be experimentally successful. To better understand how evolution has shaped our protein universe, we performed an all-against-all comparison of protein domains representing all naturally existing folds and identified conserved homologous protein fragments. Overall, we found more than 1000 protein fragments of various lengths among different folds through similarity network analysis. These fragments are present in very different protein environments and represent versatile building blocks for protein design. These data are available in our web server called F(old P)uzzle (fuzzle.uni-bayreuth.de), which allows users to filter the dataset individually and create customized networks for folds of interest. We believe that our results serve as an invaluable resource for structural and evolutionary biologists and as raw material for the design of custom-made proteins.
Affiliation(s)
- Noelia Ferruz
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Francisco Lobos
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany; Max Planck Institute for Developmental Biology, Tübingen, Germany
- Dominik Lemm
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Saacnicteh Toledo-Patino
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany; Max Planck Institute for Developmental Biology, Tübingen, Germany
- Steffen Schmidt
- Max Planck Institute for Developmental Biology, Tübingen, Germany; Computational Biochemistry, University of Bayreuth, Bayreuth, Germany
- Birte Höcker
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany; Max Planck Institute for Developmental Biology, Tübingen, Germany
|
20
|
Hernandez-Guerrero R, Galán-Vásquez E, Pérez-Rueda E. The protein architecture in Bacteria and Archaea identifies a set of promiscuous and ancient domains. PLoS One 2019; 14:e0226604. [PMID: 31856202 PMCID: PMC6922389 DOI: 10.1371/journal.pone.0226604] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/02/2019] [Accepted: 11/29/2019] [Indexed: 11/19/2022] Open
Abstract
In this work, we describe a systematic comparative genomic analysis of promiscuous domains in genomes of Bacteria and Archaea. A quantitative measure of domain promiscuity, the weighted domain architecture score (WDAS), was used and applied to 1317 domains in 1320 genomes of Bacteria and Archaea. A functional analysis associated with the WDAS per genome showed that 18 of 50 functional categories were identified as significantly enriched in the promiscuous domains; in particular, small-molecule binding domains, transferase domains, DNA binding domains (transcription factors), and signal transduction domains were identified as promiscuous. In contrast, non-promiscuous domains were identified as associated with 6 of 50 functional categories, and the category Function unknown was enriched. In addition, the WDASs of 52 domains correlated with genome size, i.e., WDAS values decreased as the genome size increased, suggesting that the number of combinations at larger domains increases, including domains in the superfamilies Winged helix-turn-helix and P-loop-containing nucleoside triphosphate hydrolases. Finally, based on classification of the domains according to their ancestry, we determined that the set of 52 promiscuous domains are also ancient and abundant among all the genomes, in contrast to the non-promiscuous domains. In summary, we consider that the association between these two classes of protein domains (promiscuous and non-promiscuous) provides bacterial and archaeal cells with the ability to respond to diverse environmental challenges.
Affiliation(s)
- Rafael Hernandez-Guerrero
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica Yucatán, Mérida, Yucatán, México
- Edgardo Galán-Vásquez
- Departamento de Ingeniería de Sistemas Computacionales y Automatización, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Ciudad Universitaria, Universidad Nacional Autónoma de México, Ciudad de México, México
- Ernesto Pérez-Rueda
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica Yucatán, Mérida, Yucatán, México
- Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, Santiago, Chile
|
21
|
Martino A, Giuliani A, Todde V, Bizzarri M, Rizzi A. Metabolic networks classification and knowledge discovery by information granulation. Comput Biol Chem 2019; 84:107187. [PMID: 31923821 DOI: 10.1016/j.compbiolchem.2019.107187] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/21/2019] [Revised: 10/21/2019] [Accepted: 11/28/2019] [Indexed: 01/11/2023]
Abstract
Graphs are powerful structures able to capture topological and semantic information from data, hence suitable for modelling a plethora of real-world (complex) systems. For this reason, graph-based pattern recognition has gained a lot of attention in recent years. In this paper, a general-purpose classification system in the graphs domain is presented. When most of the information of the available patterns can be encoded in edge labels, an information granulation-based approach is highly discriminant and allows for the identification of semantically meaningful edges. The proposed classification system has been tested on the entire set of organisms (5299) for which metabolic networks are known, allowing for both a perfect mirroring of the underlying taxonomy and the identification of the most discriminant metabolic reactions and pathways. The widespread diffusion of graph (network) structures in biology makes the proposed pattern recognition approach potentially very useful in many different fields of application. More specifically, the possibility to have a reliable metric to compare different metabolic systems is instrumental in emerging fields like microbiome analysis and, more generally, for proposing metabolic networks as a universal phenotype spanning the entire tree of life and in direct contact with environmental cues.
Affiliation(s)
- Alessio Martino
- Department of Information Engineering, Electronics and Telecommunications, University of Rome "La Sapienza" - Via Eudossiana 18, 00184 Rome, Italy.
- Alessandro Giuliani
- Department of Environment and Health, Istituto Superiore di Sanità - Viale Regina Elena 299, 00161 Rome, Italy.
- Virginia Todde
- Department of Environment and Health, Istituto Superiore di Sanità - Viale Regina Elena 299, 00161 Rome, Italy.
- Mariano Bizzarri
- Department of Experimental Medicine, University of Rome "La Sapienza", Systems Biology Group Lab, Rome, Italy.
- Antonello Rizzi
- Department of Information Engineering, Electronics and Telecommunications, University of Rome "La Sapienza" - Via Eudossiana 18, 00184 Rome, Italy.
|
22
|
Abstract
This paper investigates a novel graph embedding procedure based on simplicial complexes. Inherited from algebraic topology, simplicial complexes are collections of increasing-order simplices (e.g., points, lines, triangles, tetrahedrons) which can be interpreted as possibly meaningful substructures (i.e., information granules) on top of which an embedding space can be built by means of symbolic histograms. In the embedding space, any Euclidean pattern recognition system can be used, possibly equipped with feature selection capabilities in order to select the most informative symbols. The selected symbols can be analysed by field-experts in order to extract further knowledge about the process to be modelled by the learning system, hence the proposed modelling strategy can be considered as a grey-box. The proposed embedding has been tested on thirty benchmark datasets for graph classification. Further, we propose two real-world applications, namely predicting proteins’ enzymatic function and solubility propensity starting from their 3D structure, in order to give an example of the knowledge discovery phase which can be carried out starting from the proposed embedding strategy.
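One way to picture the embedding described above is to enumerate the simplices of a node-labelled graph and count how often each "symbol" occurs. The sketch below makes several assumptions that the abstract does not confirm: simplices are taken to be cliques (a clique complex) of at most three nodes, a symbol is the sorted tuple of node labels in a simplex, and the alphabet is supplied by hand. The function names and the toy graph are hypothetical.

```python
from collections import Counter
from itertools import combinations


def clique_simplices(adjacency, max_size=3):
    """Enumerate cliques of up to max_size nodes and treat them as simplices.

    adjacency: dict {node: set(neighbours)} for an undirected graph.
    """
    nodes = sorted(adjacency)
    simplices = []
    for size in range(1, max_size + 1):
        for candidate in combinations(nodes, size):
            # A candidate is a simplex when every pair of its nodes is linked.
            if all(b in adjacency[a] for a, b in combinations(candidate, 2)):
                simplices.append(candidate)
    return simplices


def symbolic_histogram(adjacency, labels, alphabet, max_size=3):
    """Count how many simplices of the graph match each symbol in alphabet."""
    counts = Counter(
        tuple(sorted(labels[v] for v in simplex))
        for simplex in clique_simplices(adjacency, max_size)
    )
    return [counts[symbol] for symbol in alphabet]


# Toy graph (made up): a triangle a-b-c with a pendant node d attached to c.
toy_adjacency = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
toy_labels = {"a": "H", "b": "H", "c": "P", "d": "P"}
toy_alphabet = [("H",), ("P",), ("H", "H"), ("H", "P"), ("P", "P"), ("H", "H", "P")]
histogram = symbolic_histogram(toy_adjacency, toy_labels, toy_alphabet)
```

Each graph becomes a fixed-length vector of counts, so any Euclidean classifier can be applied afterwards, and the entries with the most influence point back to interpretable substructures.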
|
23
|
Hadriche A, Jmail N, Blanc JL, Pezard L. Using centrality measures to extract core pattern of brain dynamics during the resting state. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2019; 179:104985. [PMID: 31443863 DOI: 10.1016/j.cmpb.2019.104985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2018] [Revised: 07/10/2019] [Accepted: 07/13/2019] [Indexed: 06/10/2023]
Abstract
The patterns of brain dynamics were studied during resting state on a macroscopic scale for control subjects and multiple sclerosis (MS) patients. Macroscopic brain dynamics is defined after successive coarse-grainings and selection of significant patterns and transitions based on a Markov representation of brain activity. The resulting networks show that control dynamics is merely organized according to a single principal pattern, whereas patients' dynamics depict more variable patterns. Centrality measures are used to extract the core dynamical pattern in brain dynamics, and a classification technique allows MS dynamics to be defined with a relevant error rate.
Affiliation(s)
- Abir Hadriche
- Université de Sfax, ENIS, REGIM Lab, Sfax, Tunisie; Université de Gabes, ISIMG, Gabes, Tunisie; Université de Sfax, Centre de Recherche Numérique de Sfax, Sfax, Tunisie.
- Nawel Jmail
- Université de Sfax, Centre de Recherche Numérique de Sfax, Sfax, Tunisie; Université de Sfax, MIRACL, Sfax, Tunisie.
- Jean-Luc Blanc
- Aix-Marseille Université, CNRS, LNSC UMR 7260, 3 Place Victor Hugo, Marseille 13003, France.
- Laurent Pezard
- Aix-Marseille Université, CNRS, LNSC UMR 7260, 3 Place Victor Hugo, Marseille 13003, France.
|
24
|
Abstract
This chapter reviews current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this will directly impact which domain architectures can be formed in different species. Studies relating domain family size to occurrence have shown that they generally follow power law distributions, both within genomes and larger evolutionary groups. These findings were subsequently extended to multi-domain architectures. Genome evolution models that have been suggested to explain the shape of these distributions are reviewed, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we study the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architecture and the conclusions concerning domain architecture evolution mechanisms that can be drawn from these. Finally, we examine whether all known cases of a given domain architecture can be assumed to have a single common origin (monophyly) or have evolved convergently (polyphyly). We end with a discussion of some available tools for computational analysis or exploitation of protein domain architectures and their evolution.
|
25
|
Shinde SB, Kurhekar MP. Complex Biological Immune System Through the Eyes of Dual-Phase Evolution. J BIOL SYST 2018. [DOI: 10.1142/s0218339018500213] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
Dual-phase evolution (DPE) and network theory help to analyze prominent properties of complex adaptive systems (CASs), such as emergence and self-organization, that are caused by phase transitions. These transitions are observed because of the increase and decrease in the number of system components and their interactions. The immune system, which is one of the CASs, provides an adaptive response to foreign molecules. Prior to this response, the immune system is present in the circulation state and during the response, it moves into the growth state, where the number of immune cells and their cell–cell contacts increase rapidly. The phase transitions from the circulation state to the growth state and then back to the circulation state cause the emergence and self-organization of the immune system, respectively. There is a need to understand these complex cellular dynamics during the immune response. In this paper, we have proposed an integrated model of DPE, network theory, and the immune system that has helped to understand and analyze the phases and properties of the immune system. An analysis of the growth phase network is provided, and it is concluded that this network exhibits a scale-free nature, following a power law for the degree distribution of nodes.
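A claim that a degree distribution follows a power law is usually backed by estimating the scaling exponent. The sketch below uses the standard continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(x_i / x_min)); this is a swapped-in textbook technique, not the method of the paper, and it is only an approximation for integer degrees. The toy degree list, the choice of `x_min`, and the function name `power_law_alpha` are assumptions.

```python
import math


def power_law_alpha(values, x_min):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Only values >= x_min are used (the assumed start of the power-law tail).
    """
    if x_min <= 0:
        raise ValueError("x_min must be positive")
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no values at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail values equal x_min; exponent is undefined")
    return 1.0 + len(tail) / log_sum


# Toy node degrees (made up): many low-degree nodes and one hub.
toy_degrees = [1, 1, 2, 2, 4, 8]
alpha = power_law_alpha(toy_degrees, x_min=1)
```

An exponent estimate alone does not establish that a power law is the right model; a goodness-of-fit check against alternative distributions is normally needed as well.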
Affiliation(s)
- Snehal B. Shinde
- Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur, Maharashtra, India
- Manish P. Kurhekar
- Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur, Maharashtra, India
|
26
|
Abstract
The sequence space of five protein superfamilies was investigated by constructing sequence networks. The nodes represent individual sequences, and two nodes are connected by an edge if the global sequence identity of two sequences exceeds a threshold. The networks were characterized by their degree distribution (number of nodes with a given number of neighbors) and by their fractal network dimension. Although the five protein families differed in sequence length, fold, and domain arrangement, their network properties were similar. The fractal network dimension Df was distance-dependent: a high dimension for single and double mutants (Df = 4.0), which dropped to Df = 0.7-1.0 at 90% sequence identity, and increased to Df = 3.5-4.5 below 70% sequence identity. The distance dependency of the network dimension is consistent with evolutionary constraints for functional proteins. While random single and double mutations often result in a functional protein, the accumulation of more than ten mutations is dominated by epistasis. The networks of the five protein families were highly inhomogeneous with few highly connected communities ("hub sequences") and a large number of smaller and less connected communities. The degree distributions followed a power-law distribution with similar scaling exponents close to 1. Because the hub sequences have a large number of functional neighbors, they are expected to be robust toward possible deleterious effects of mutations. Because of their robustness, hub sequences have the potential of high innovability, with additional mutations readily inducing new functions. Therefore, they form hotspots of evolution and are promising candidates as starting points for directed evolution experiments in biotechnology.
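The sequence networks described above can be sketched as follows: nodes are sequences, and an edge is added when the pairwise identity exceeds a threshold. For brevity, this illustration assumes the sequences are already globally aligned to equal length without gaps, which sidesteps the alignment step a real analysis would need. The toy sequences, the 0.75 threshold, and the function names are hypothetical.

```python
from itertools import combinations


def sequence_identity(seq_a, seq_b):
    """Fraction of identical positions between two pre-aligned sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal (aligned) length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches / len(seq_a)


def sequence_network(sequences, threshold):
    """Link two sequences when their identity exceeds the threshold.

    sequences: dict {name: aligned_sequence}.
    Returns an adjacency dict {name: set(neighbour_names)}.
    """
    graph = {name: set() for name in sequences}
    for a, b in combinations(sequences, 2):
        if sequence_identity(sequences[a], sequences[b]) > threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_distribution(graph):
    """Map each degree k to the number of nodes with k neighbours."""
    counts = {}
    for neighbours in graph.values():
        counts[len(neighbours)] = counts.get(len(neighbours), 0) + 1
    return counts


# Toy aligned sequences (made up for the example).
toy_sequences = {
    "s1": "ACDEFGHIKL",
    "s2": "ACDEFGHIKV",
    "s3": "ACDEFGHAAV",
    "s4": "WWWWWWWWWW",
}
network = sequence_network(toy_sequences, threshold=0.75)
distribution = degree_distribution(network)
```

Here `s2` is the most connected node, a miniature version of the "hub sequences" discussed in the abstract; lowering the threshold would add the `s1`-`s3` edge and change the degree distribution.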
|
27
|
Functional Analysis of Human Hub Proteins and Their Interactors Involved in the Intrinsic Disorder-Enriched Interactions. Int J Mol Sci 2017; 18:ijms18122761. [PMID: 29257115 PMCID: PMC5751360 DOI: 10.3390/ijms18122761] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 11/01/2017] [Revised: 12/13/2017] [Accepted: 12/15/2017] [Indexed: 12/15/2022] Open
Abstract
Some of the intrinsically disordered proteins and protein regions are promiscuous interactors that are involved in one-to-many and many-to-one binding. Several studies have analyzed enrichment of intrinsic disorder among the promiscuous hub proteins. We extended these works by providing a detailed functional characterization of the disorder-enriched hub protein-protein interactions (PPIs), including both hubs and their interactors, and by analyzing their enrichment among disease-associated proteins. We focused on the human interactome, given its high degree of completeness and relevance to the analysis of the disease-linked proteins. We quantified and investigated numerous functional and structural characteristics of the disorder-enriched hub PPIs, including protein binding, structural stability, evolutionary conservation, several categories of functional sites, and presence of over twenty types of posttranslational modifications (PTMs). We showed that the disorder-enriched hub PPIs have a significantly enlarged number of disordered protein binding regions and long intrinsically disordered regions. They also include high numbers of targeting, catalytic, and many types of PTM sites. We empirically demonstrated that these hub PPIs are significantly enriched among 11 out of 18 considered classes of human diseases that are associated with at least 100 human proteins. Finally, we also illustrated how over a dozen specific human hubs utilize intrinsic disorder for their promiscuous PPIs.
|
28
|
Wang X, Zhang Q, Cai Z, Dai Y, Mou L. Identification of novel diagnostic biomarkers for thyroid carcinoma. Oncotarget 2017; 8:111551-111566. [PMID: 29340074 PMCID: PMC5762342 DOI: 10.18632/oncotarget.22873] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/27/2017] [Accepted: 11/19/2017] [Indexed: 12/30/2022] Open
Abstract
Thyroid carcinoma (THCA) is the most common endocrine malignancy worldwide. Unfortunately, a limited number of large-scale analyses have been performed to identify biomarkers for THCA. Here, we conducted a meta-analysis using 505 THCA patients and 59 normal controls from The Cancer Genome Atlas. After identifying differentially expressed long non-coding RNAs (lncRNA) and protein coding genes (PCG), we found vast differences in various lncRNA-PCG co-expressed pairs in THCA. A dysregulation network with scale-free topology was constructed. Four molecules (LA16c-380H5.2, RP11-203J24.8, MLF1 and SDC4) could potentially serve as diagnostic biomarkers of THCA with high sensitivity and specificity. We further present a diagnostic panel with expression cutoff values. Our results demonstrate the potential application of those four molecules as novel independent biomarkers for THCA diagnosis.
Affiliation(s)
- Xiliang Wang
- Shenzhen Xenotransplantation Medical Engineering Research and Development Center, Institute of Translational Medicine, Shenzhen Second People's Hospital, First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China; Department of Biochemistry in Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510080, China
- Qing Zhang
- Shenzhen Xenotransplantation Medical Engineering Research and Development Center, Institute of Translational Medicine, Shenzhen Second People's Hospital, First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China
- Zhiming Cai
- Shenzhen Xenotransplantation Medical Engineering Research and Development Center, Institute of Translational Medicine, Shenzhen Second People's Hospital, First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China
- Yifan Dai
- Jiangsu Key Laboratory of Xenotransplantation, Nanjing Medical University, Nanjing 210029, China
- Lisha Mou
- Shenzhen Xenotransplantation Medical Engineering Research and Development Center, Institute of Translational Medicine, Shenzhen Second People's Hospital, First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China
|
29
|
Abstract
The study of evolutionary relationships among protein sequences was one of the first applications of bioinformatics. Since then, and accompanying the wealth of biological data produced by genome sequencing and other high-throughput techniques, the use of bioinformatics in general and phylogenetics in particular has been gaining ground in the study of protein and proteome evolution. Nowadays, the use of phylogenetics is instrumental not only to infer the evolutionary relationships among species and their genome sequences, but also to reconstruct ancestral states of proteins and proteomes and hence trace the paths followed by evolution. Here I survey recent progress in the elucidation of mechanisms of protein and proteome evolution in which phylogenetics has played a determinant role.
Affiliation(s)
- Toni Gabaldón
- Bioinformatics Department, Centro de Investigación Principe Felipe
|
30
|
Complex evolutionary footprints revealed in an analysis of reused protein segments of diverse lengths. Proc Natl Acad Sci U S A 2017; 114:11703-11708. [PMID: 29078314 PMCID: PMC5676897 DOI: 10.1073/pnas.1707642114] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Indexed: 01/13/2023] Open
Abstract
We question a central paradigm: namely, that the protein domain is the “atomic unit” of evolution. In conflict with the current textbook view, our results unequivocally show that duplication of protein segments happens both above and below the domain level among amino acid segments of diverse lengths. Indeed, we show that significant evolutionary information is lost when the protein is approached as a string of domains. Our finer-grained approach reveals a far more complicated picture, where reused segments often intertwine and overlap with each other. Our results are consistent with a recursive model of evolution, in which segments of various lengths, typically smaller than domains, “hop” between environments. The fit segments remain, leaving traces that can still be detected. Proteins share similar segments with one another. Such “reused parts”—which have been successfully incorporated into other proteins—are likely to offer an evolutionary advantage over de novo evolved segments, as most of the latter will not even have the capacity to fold. To systematically explore the evolutionary traces of segment “reuse” across proteins, we developed an automated methodology that identifies reused segments from protein alignments. We search for “themes”—segments of at least 35 residues of similar sequence and structure—reused within representative sets of 15,016 domains [Evolutionary Classification of Protein Domains (ECOD) database] or 20,398 chains [Protein Data Bank (PDB)]. We observe that theme reuse is highly prevalent and that reuse is more extensive when the length threshold for identifying a theme is lower. Structural domains, the best characterized form of reuse in proteins, are just one of many complex and intertwined evolutionary traces. Others include long themes shared among a few proteins, which encompass and overlap with shorter themes that recur in numerous proteins. 
The observed complexity is consistent with evolution by duplication and divergence, and some of the themes might include descendants of ancestral segments. The observed recursive footprints, where the same amino acid can simultaneously participate in several intertwined themes, could be a useful concept for protein design. Data are available at http://trachel-srv.cs.haifa.ac.il/rachel/ppi/themes/.
|
31
|
Ecogenomics of virophages and their giant virus hosts assessed through time series metagenomics. Nat Commun 2017; 8:858. [PMID: 29021524 PMCID: PMC5636890 DOI: 10.1038/s41467-017-01086-2] [Citation(s) in RCA: 73] [Impact Index Per Article: 10.4] [Received: 02/08/2017] [Accepted: 08/16/2017] [Indexed: 11/30/2022] Open
Abstract
Virophages are small viruses that co-infect eukaryotic cells alongside giant viruses (Mimiviridae) and hijack their machinery to replicate. While two types of virophages have been isolated, their genomic diversity and ecology remain largely unknown. Here we use time series metagenomics to identify and study the dynamics of 25 uncultivated virophage populations, 17 of which are represented by complete or near-complete genomes, in two North American freshwater lakes. Taxonomic analysis suggests that these freshwater virophages represent at least three new candidate genera. Ecologically, virophage populations are repeatedly detected over years and are evolutionarily stable, yet their distinct abundance profiles and gene content suggest that virophage genera occupy different ecological niches. Co-occurrence analyses reveal 11 virophages strongly associated with uncultivated Mimiviridae, and three associated with eukaryotes among the Dinophyceae, Rhizaria, Alveolata, and Cryptophyceae groups. Together, these findings significantly augment virophage databases, help refine virophage taxonomy, and establish baseline ecological hypotheses and tools to study virophages in nature. Virophages are recently identified small viruses that infect larger viruses, yet their diversity and ecological roles are poorly understood. Here, Roux and colleagues present time series metagenomics data revealing new virophage genera and their putative ecological interactions in two freshwater lakes.
|
32
|
Hernandez C, Mella C, Navarro G, Olivera-Nappa A, Araya J. Protein complex prediction via dense subgraphs and false positive analysis. PLoS One 2017; 12:e0183460. [PMID: 28937982 PMCID: PMC5609739 DOI: 10.1371/journal.pone.0183460] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 10/14/2016] [Accepted: 08/04/2017] [Indexed: 01/04/2023] Open
Abstract
Many proteins work together with others in groups called complexes in order to achieve a specific function. Discovering protein complexes is important for understanding biological processes and predicting protein functions in living organisms. Large-scale and high-throughput techniques have made it possible to compile protein-protein interaction networks (PPI networks), which have been used in several computational approaches for detecting protein complexes. Those predictions might guide future biological experimental research. Some approaches are topology-based, where highly connected proteins are predicted to be complexes; some propose different clustering algorithms using partitioning or overlaps among clusters for networks modeled with unweighted or weighted graphs; and others use density of clusters and information based on protein functionality. However, some schemes still require much processing time or the quality of their results can be improved. Furthermore, most of the results obtained with computational tools are not accompanied by an analysis of false positives. We propose an effective and efficient mining algorithm for discovering highly connected subgraphs, which is our base for defining protein complexes. Our representation is based on transforming the PPI network into a directed acyclic graph that reduces the number of represented edges and the search space for discovering subgraphs. Our approach considers weighted and unweighted PPI networks. We compare our best alternative using PPI networks from Saccharomyces cerevisiae (yeast) and Homo sapiens (human) with state-of-the-art approaches in terms of clustering, biological metrics and execution times, as well as three gold standards for yeast and two for human. Furthermore, we analyze false positive predicted complexes searching the PDBe (Protein Data Bank in Europe) database in order to identify matching protein complexes that have been purified and structurally characterized. 
Our analysis shows that more than 50 yeast protein complexes and more than 300 human protein complexes found to be false positives according to our prediction method, i.e., not described in the gold standard complex databases, in fact contain protein complexes that have been characterized structurally and documented in PDBe. We also found that some of these protein complexes have recently been classified as part of a Periodic Table of Protein Complexes. The latest version of our software is publicly available at http://doi.org/10.6084/m9.figshare.5297314.v1.
Affiliation(s)
- Cecilia Hernandez
- Computer Science, University of Concepción, Concepción, Chile
- Center for Biotechnology and Bioengineering (CeBiB), Department of Computer Science, University of Chile, Santiago, Chile
- * E-mail:
| | - Carlos Mella
- Computer Science, University of Concepción, Concepción, Chile
| | - Gonzalo Navarro
- Center for Biotechnology and Bioengineering (CeBiB), Department of Computer Science, University of Chile, Santiago, Chile
| | - Alvaro Olivera-Nappa
- Center for Biotechnology and Bioengineering (CeBiB), Department of Chemical Engineering and Biotechnology, University of Chile, Santiago, Chile
| | - Jaime Araya
- Computer Science, University of Concepción, Concepción, Chile
|
33
|
Du X, Sun S, Hu C, Yao Y, Yan Y, Zhang Y. DeepPPI: Boosting Prediction of Protein-Protein Interactions with Deep Neural Networks. J Chem Inf Model 2017; 57:1499-1510. [PMID: 28514151 DOI: 10.1021/acs.jcim.7b00028] [Citation(s) in RCA: 124] [Impact Index Per Article: 17.7] [Indexed: 12/12/2022]
Abstract
The complex language of eukaryotic gene expression remains incompletely understood. Despite the importance suggested by many protein variants statistically associated with human disease, nearly all such variants have unknown mechanisms, for example, protein-protein interactions (PPIs). In this study, we address this challenge using a recent machine learning advance: deep neural networks (DNNs). We aim at improving the performance of PPI prediction and propose a method called DeepPPI (Deep neural networks for Protein-Protein Interactions prediction), which employs deep neural networks to effectively learn the representations of proteins from common protein descriptors. The experimental results indicate that DeepPPI achieves superior performance on the test data set, with an Accuracy of 92.50%, Precision of 94.38%, Recall of 90.56%, Specificity of 94.49%, Matthews Correlation Coefficient of 85.08% and Area Under the Curve of 97.43%. Extensive experiments show that DeepPPI can learn useful features of protein pairs by a layer-wise abstraction, and thus achieves better prediction performance than existing methods. The source code of our approach is available at http://ailab.ahu.edu.cn:8087/DeepPPI/index.html.
Affiliation(s)
- Xiuquan Du
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, and Center of Information Support & Assurance Technology, Anhui University, Hefei, 230601 Anhui, China
| | - Shiwei Sun
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, and Center of Information Support & Assurance Technology, Anhui University, Hefei, 230601 Anhui, China
| | - Changlin Hu
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, and Center of Information Support & Assurance Technology, Anhui University, Hefei, 230601 Anhui, China
| | - Yu Yao
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, and Center of Information Support & Assurance Technology, Anhui University, Hefei, 230601 Anhui, China
| | - Yuanting Yan
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, and Center of Information Support & Assurance Technology, Anhui University, Hefei, 230601 Anhui, China
| | - Yanping Zhang
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, and Center of Information Support & Assurance Technology, Anhui University, Hefei, 230601 Anhui, China
|
34
|
Srivastava A, Mazzocco G, Kel A, Wyrwicz LS, Plewczynski D. Detecting reliable non interacting proteins (NIPs) significantly enhancing the computational prediction of protein-protein interactions using machine learning methods. Molecular BioSystems 2016; 12:778-85. [PMID: 26738778 DOI: 10.1039/c5mb00672d] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/21/2022]
Abstract
Protein-protein interactions (PPIs) play a vital role in most biological processes. Hence their comprehension can promote a better understanding of the mechanisms underlying living systems. However, besides the cost and time limitations involved in the detection of experimentally validated PPIs, the noise in the data is still an important issue to overcome. In the last decade several in silico PPI prediction methods using both structural and genomic information were developed for this purpose. Here we introduce a unique validation approach aimed at collecting reliable non interacting proteins (NIPs). Thereafter the most relevant protein/protein-pair related features were selected. Finally, the prepared dataset was used for PPI classification, leveraging the prediction capabilities of well-established machine learning methods. Our best classification procedure displayed specificity and sensitivity values of 96.33% and 98.02%, respectively, surpassing the prediction capabilities of other methods, including those trained on gold standard datasets. We showed that the PPI/NIP predictive performances can be considerably improved by focusing on data preparation.
Affiliation(s)
- A Srivastava
- Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Warsaw, Poland
| | - G Mazzocco
- Centre of New Technologies, University of Warsaw, Banacha 2c Str., 02-097 Warsaw, Poland; and Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
| | - A Kel
- GeneXplain GmbH, Am Exer 10b, D-38302, Wolfenbüttel, Germany
| | - L S Wyrwicz
- Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Warsaw, Poland
| | - D Plewczynski
- Centre of New Technologies, University of Warsaw, Banacha 2c Str., 02-097 Warsaw, Poland.
|
35
|
Guo JL, Zhu XY, Suo Q, Forrest J. Non-uniform Evolving Hypergraphs and Weighted Evolving Hypergraphs. Sci Rep 2016; 6:36648. [PMID: 27845334 PMCID: PMC5109229 DOI: 10.1038/srep36648] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 02/18/2016] [Accepted: 10/19/2016] [Indexed: 11/09/2022] Open
Abstract
Firstly, this paper proposes a non-uniform evolving hypergraph model with nonlinear preferential attachment and an attractiveness. This model allows nodes to arrive in batches according to a Poisson process and to form hyperedges with existing batches of nodes. Both the number of arriving nodes and that of chosen existing nodes are random variables, so that the size of each hyperedge is non-uniform. This paper establishes the characteristic equation of hyperdegrees, calculates changes in the hyperdegree of each node, and obtains the stationary average hyperdegree distribution of the model by employing the Poisson process theory and the characteristic equation. Secondly, this paper constructs a model for weighted evolving hypergraphs that couples the establishment of new hyperedges and nodes with the dynamical evolution of the weights. Furthermore, the stationary average hyperdegree and hyperstrength distributions are obtained by using the hyperdegree distribution of the unweighted model established above, so that the weighted evolving hypergraph exhibits scale-free behavior for both the hyperdegree and hyperstrength distributions.
Affiliation(s)
- Jin-Li Guo
- Business School, University of Shanghai for Science and Technology, Shanghai 200093, PR China
| | - Xin-Yun Zhu
- Business School, University of Shanghai for Science and Technology, Shanghai 200093, PR China
| | - Qi Suo
- Business School, University of Shanghai for Science and Technology, Shanghai 200093, PR China
| | - Jeffrey Forrest
- School of Business, Slippery Rock University, Slippery Rock, PA 16057, USA
|
36
|
Aslan MS, Chen XW, Cheng H. Analyzing and learning sparse and scale-free networks using Gaussian graphical models. International Journal of Data Science and Analytics 2016. [DOI: 10.1007/s41060-016-0009-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]
|
37
|
Mohan A, De Ridder D, Vanneste S. Robustness and dynamicity of functional networks in phantom sound. Neuroimage 2016; 146:171-187. [PMID: 27103139 DOI: 10.1016/j.neuroimage.2016.04.033] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 12/14/2015] [Revised: 03/30/2016] [Accepted: 04/14/2016] [Indexed: 01/12/2023] Open
Abstract
Phantom sound perception is the perception of a sound in the absence of a corresponding external sound source. It is a common symptom for which no treatment exists. Gaining a better understanding of its pathophysiology by applying network science might help in identifying targets in the brain for neuromodulatory approaches to treat this elusive symptom. Brain networks are commonly organized as functional modules which have a densely connected core network coupled to a communally-organized peripheral network. The core network is called the rich club network and the peripheral network is divided into the feeder and local networks. In the current study, we investigate the effects of virtual lesions on the endogenous dynamics, complexity and robustness of the remaining brain. It is hypothesized that, depending on whether a node is functionally central to the network or not, the robustness and dynamics of the network change when a lesion is introduced. We therefore investigate the effect of introducing a virtual focal lesion randomly to different nodes in the tinnitus network and contrast it to the effect of specifically targeting the rich-club, feeder and local nodes in patients experiencing a phantom sound (i.e. tinnitus). The tinnitus and control networks were computed from the source-localized EEG of 311 tinnitus patients and 256 control subjects. The results of the current study indicate that both the tinnitus and control networks are robust to the attack on random and rich club nodes, but are drastically modified when attacked from the periphery, especially while targeting the feeder hubs. In both the tinnitus and control networks, feeder nodes were found to have a higher betweenness centrality value than the rich club nodes. This shows that the feeders have a larger influence on the information transmission through the brain than the rich club nodes, by transferring information from the peripheral communities to the core. 
Further, evidence for the theoretical model of a multimodal tinnitus network is also presented showing that the tinnitus network is divided into individual, separable modules each possibly encoding a different aspect of tinnitus. The current study alludes to the concept that the efficient modification of the tinnitus network is theoretically possible by disconnecting the individual communities from the core of the pathological network.
Affiliation(s)
- Anusha Mohan
- Lab for Clinical & Integrative Neuroscience, School of Behavioral and Brain Sciences, The University of Texas at Dallas, USA
| | - Dirk De Ridder
- Department of Surgical Sciences, Section of Neurosurgery, Dunedin School of Medicine, University of Otago, Dunedin, New Zealand
| | - Sven Vanneste
- Lab for Clinical & Integrative Neuroscience, School of Behavioral and Brain Sciences, The University of Texas at Dallas, USA.
|
38
|
Emerson AI, Andrews S, Ahmed I, Azis TK, Malek JA. K-core decomposition of a protein domain co-occurrence network reveals lower cancer mutation rates for interior cores. J Clin Bioinforma 2015; 5:1. [PMID: 25767694 PMCID: PMC4357223 DOI: 10.1186/s13336-015-0016-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 11/18/2014] [Accepted: 02/18/2015] [Indexed: 11/10/2022] Open
Abstract
Background Network biology currently focuses primarily on metabolic pathways, gene regulatory, and protein-protein interaction networks. While these approaches have yielded critical information, alternative approaches to network analysis will offer new perspectives on biological information. A little-explored area is the interactions between domains that can be captured using domain co-occurrence networks (DCN). A DCN can be used to study the function and interaction of proteins by representing protein domains and their co-existence in genes and by mapping cancer mutations to the individual protein domains to identify signals. Results The domain co-occurrence network was constructed for the human proteome based on PFAM domains in proteins. Highly connected domains in the central cores were identified using the k-core decomposition technique. Here we show that these domains were found to be more evolutionarily conserved than the peripheral domains. The somatic mutations for ovarian, breast and prostate cancer diseases were obtained from the TCGA database. We mapped the somatic mutations to the individual protein domains and the local false discovery rate was used to identify significantly mutated domains in each cancer type. Significantly mutated domains were found to be enriched in cancer disease pathways. However, we found that the inner cores of the DCN did not contain any of the significantly mutated domains. We observed that the inner core protein domains are highly conserved and these domains co-exist in large numbers with other protein domains. Conclusion Mutations and domain co-occurrence networks provide a framework for understanding hierarchical designs in protein function from a network perspective. This study provides evidence that a majority of protein domains in the inner core of the DCN have a lower mutation frequency and that protein domains present in the peripheral regions of the k-core contribute more heavily to the disease. 
These findings may contribute further to drug development. Electronic supplementary material The online version of this article (doi:10.1186/s13336-015-0016-6) contains supplementary material, which is available to authorized users.
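As background on the k-core decomposition mentioned in this abstract, here is a minimal sketch of the usual peeling procedure on an undirected graph: repeatedly remove the node of smallest remaining degree and record the largest degree seen so far as its core number. The function name `core_numbers` and the toy graph are illustrative assumptions; the paper's own pipeline and data are not reproduced here.

```python
def core_numbers(adjacency):
    """Return a dict mapping each node to its core number.

    `adjacency` maps each node to the set of its neighbours and is assumed
    to be symmetric (undirected graph, no self-loops). This is a simple
    quadratic-time version kept short for readability.
    """
    degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbour in adjacency[node]:
            if neighbour in remaining:
                degree[neighbour] -= 1
    return core


# Toy co-occurrence graph: a triangle of domains A, B, C plus a peripheral D.
toy_graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(core_numbers(toy_graph))
```

In this toy graph the triangle forms the inner 2-core, while D belongs only to the 1-core, which mirrors the inner versus peripheral distinction drawn in the abstract.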
Affiliation(s)
- Arnold I Emerson
- Department of Genetic Medicine, Weill Cornell Medical College, New York, NY USA; Genomic Core, Weill Cornell Medical College in Qatar, Qatar Foundation, Doha, 24144 Qatar
| | - Simeon Andrews
- Department of Genetic Medicine, Weill Cornell Medical College, New York, NY USA; Genomic Core, Weill Cornell Medical College in Qatar, Qatar Foundation, Doha, 24144 Qatar
| | - Ikhlak Ahmed
- Department of Genetic Medicine, Weill Cornell Medical College, New York, NY USA; Genomic Core, Weill Cornell Medical College in Qatar, Qatar Foundation, Doha, 24144 Qatar
| | - Thasni Ka Azis
- Department of Genetic Medicine, Weill Cornell Medical College, New York, NY USA; Genomic Core, Weill Cornell Medical College in Qatar, Qatar Foundation, Doha, 24144 Qatar
| | - Joel A Malek
- Department of Genetic Medicine, Weill Cornell Medical College, New York, NY USA; Genomic Core, Weill Cornell Medical College in Qatar, Qatar Foundation, Doha, 24144 Qatar
|
39
|
Shen X, Zhao Y, Li Y, He T, Yang J, Hu X. An efficient protein complex mining algorithm based on Multistage Kernel Extension. BMC Bioinformatics 2014; 15 Suppl 12:S7. [PMID: 25474367 PMCID: PMC4255745 DOI: 10.1186/1471-2105-15-s12-s7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/01/2023] Open
Abstract
Background In recent years, many protein complex mining algorithms, such as the classical clique percolation method (CPM) and the Markov clustering (MCL) algorithm, have been developed for protein-protein interaction networks. However, most of the available algorithms primarily concentrate on mining dense protein subgraphs as protein complexes, failing to take into account the inherent organizational structure within protein complexes. Thus, there is a critical need to study the possibility of mining protein complexes using the topological information hidden in edges. Moreover, recent massive experimental analyses reveal that protein complexes have their own intrinsic organization. Methods Inspired by the formation process of cliques in complex social networks and the centrality-lethality rule, we propose a new protein complex mining algorithm called the Multistage Kernel Extension (MKE) algorithm, integrating the idea of critical protein recognition in the protein-protein interaction (PPI) network. MKE first recognizes the nodes with high degree as the first level kernel of a protein complex, and then adds the weighted best neighbour node of the first level kernel into the current kernel to form the second level kernel of the protein complex. This process is repeated, extending the current kernel to form a protein complex. In the end, overlapped protein complexes are merged to form the final protein complex set. Results MKE has better accuracy compared with the classical clique percolation method and the Markov clustering algorithm. MKE also performs better than the classical clique percolation method both on Gene Ontology semantic similarity and co-localization enrichment and can effectively identify protein complexes with biological significance in the PPI network.
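The seed, extend and merge loop described in the Methods can be sketched roughly as follows. This is not the authors' MKE implementation: the abstract does not state how neighbours are weighted, when extension stops, or how overlap is measured, so the summed edge weight score, the `min_attach` stopping rule, the `min_overlap` merge rule and all function names below are assumptions chosen only to make the idea concrete.

```python
def grow_kernel(weights, seed, min_attach=0.5):
    """Greedily extend a kernel from `seed` by adding the best neighbour.

    `weights[u][v]` is the (symmetric) weight of edge u-v. A candidate's
    score is its summed edge weight into the current kernel (assumed rule).
    """
    kernel = {seed}
    while True:
        frontier = {u for v in kernel for u in weights[v]} - kernel
        if not frontier:
            break
        score = {u: sum(weights[u].get(v, 0.0) for v in kernel) for u in frontier}
        best = max(sorted(frontier), key=score.get)
        # Assumed stopping rule: average attachment to the kernel is too weak.
        if score[best] / len(kernel) < min_attach:
            break
        kernel.add(best)
    return kernel


def merge_overlapping(complexes, min_overlap=0.5):
    """Single-pass merge of complexes sharing a large fraction of members."""
    merged = []
    for candidate in complexes:
        for existing in merged:
            shared = len(candidate & existing)
            if shared / min(len(candidate), len(existing)) >= min_overlap:
                existing |= candidate
                break
        else:
            merged.append(set(candidate))
    return merged


def mine_complexes(weights, min_attach=0.5, min_overlap=0.5):
    """Seed from high weighted-degree nodes, grow kernels, merge overlaps."""
    strength = {v: sum(nbrs.values()) for v, nbrs in weights.items()}
    covered, found = set(), []
    for seed in sorted(weights, key=lambda v: (-strength[v], v)):
        if seed in covered:
            continue
        kernel = grow_kernel(weights, seed, min_attach)
        found.append(kernel)
        covered |= kernel
    return merge_overlapping(found, min_overlap)


# Toy PPI network: two tight triangles joined by one weak edge (c-d).
toy_ppi = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 0.25},
    "d": {"c": 0.25, "e": 1.0, "f": 1.0},
    "e": {"d": 1.0, "f": 1.0},
    "f": {"d": 1.0, "e": 1.0},
}
print(mine_complexes(toy_ppi))
```

On the toy network the weak bridge is never absorbed, so the two triangles come out as separate complexes.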
|
40
|
Ben-Tal N, Kolodny R. Representation of the Protein Universe using Classifications, Maps, and Networks. Isr J Chem 2014. [DOI: 10.1002/ijch.201400001] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 11/07/2022]
|
41
|
Abstract
To explore protein space from a global perspective, we consider 9,710 SCOP (Structural Classification of Proteins) domains with up to 70% sequence identity and present all similarities among them as networks: In the "domain network," nodes represent domains, and edges connect domains that share "motifs," i.e., significantly sized segments of similar sequence and structure. We explore the dependence of the network on the thresholds that define the evolutionary relatedness of the domains. At excessively strict thresholds the network falls apart completely; for very lax thresholds, there are network paths between virtually all domains. Interestingly, at intermediate thresholds the network constitutes two regions that can be described as "continuous" versus "discrete." The continuous region comprises a large connected component, dominated by domains with alternating alpha and beta elements, and the discrete region includes the rest of the domains in isolated islands, each generally corresponding to a fold. We also construct the "motif network," in which nodes represent recurring motifs, and edges connect motifs that appear in the same domain. This network also features a large and highly connected component of motifs that originate from domains with alternating alpha/beta elements (and some all-alpha domains), and smaller isolated islands. Indeed, the motif network suggests that nature reuses such motifs extensively. The networks suggest evolutionary paths between domains and give hints about protein evolution and the underlying biophysics. They provide natural means of organizing protein space, and could be useful for the development of strategies for protein search and design.
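The threshold dependence described in this abstract (a network that falls apart at strict thresholds and becomes almost fully connected at lax ones) can be illustrated with a small connected-components sketch. It assumes, purely for illustration, that each pair of domains carries a single numeric similarity score; the real study defines relatedness through shared motifs, and the names `component_sizes`, `domains` and `pairs` are hypothetical.

```python
def component_sizes(nodes, scored_edges, threshold):
    """Sizes of connected components, largest first, at a given threshold.

    An edge (a, b, score) is kept only if score >= threshold. Components
    are found with a small union-find structure.
    """
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the root, compressing the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b, score in scored_edges:
        if score >= threshold:
            parent[find(a)] = find(b)

    sizes = {}
    for node in nodes:
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


# Made-up domains and similarity scores.
domains = ["d1", "d2", "d3", "d4", "d5"]
pairs = [("d1", "d2", 0.9), ("d2", "d3", 0.6), ("d4", "d5", 0.3)]
for threshold in (0.95, 0.5, 0.2):
    print(threshold, component_sizes(domains, pairs, threshold))
```

Scanning the threshold from strict to lax shows isolated nodes first, then one larger component alongside small islands, and finally only a few large components.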
|
42
|
Guo Z, Jiang W, Lages N, Borcherds W, Wang D. Relationship between gene duplicability and diversifiability in the topology of biochemical networks. BMC Genomics 2014; 15:577. [PMID: 25005725 PMCID: PMC4129122 DOI: 10.1186/1471-2164-15-577] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 01/29/2014] [Accepted: 06/26/2014] [Indexed: 01/21/2023] Open
Abstract
Background Selective gene duplicability, the extensive expansion of a small number of gene families, is universal. Quantitatively, the number of genes (P(K)) with K duplicates in a genome decreases precipitously as K increases, and often follows a power law (P(K) ∝ K^(-α)). Functional diversification, either neo- or sub-functionalization, is a major evolution route for duplicate genes. Results Using three lines of genomic datasets, we studied the relationship between gene duplicability and diversifiability in the topology of biochemical networks. First, we explored a scenario where two pathways in the biochemical networks antagonize each other. Synthetic knockout of respective genes for the two pathways rescues the phenotypic defects of each individual knockout. We identified duplicate gene pairs with sufficient divergences that represent this antagonism relationship in the yeast S. cerevisiae. Such pairs overwhelmingly belong to large gene families, thus tend to have high duplicability. Second, we used distances between proteins of duplicate genes in the protein interaction network as a metric of their diversification. The higher a gene’s duplicate count, the further the proteins of this gene and its duplicates drift away from one another in the networks, which is especially true for genetically antagonizing duplicate genes. Third, we computed a sequence-homology-based clustering coefficient to quantify sequence diversifiability among duplicate genes – the lower the coefficient, the more the sequences have diverged. The duplicate count (K) of a gene is negatively correlated with the clustering coefficient of its duplicates, suggesting that gene duplicability is related to the extent of sequence divergence within the duplicate gene family. Conclusion Thus, a positive correlation exists between gene diversifiability and duplicability in the context of biochemical networks – an improvement of our understanding of gene duplicability.
Affiliation(s)
- Degeng Wang
- Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, 8403 Floyd Curl Drive, San Antonio, TX 78229-3900, USA.
|
43
|
Haggerty LS, Jachiet PA, Hanage WP, Fitzpatrick DA, Lopez P, O'Connell MJ, Pisani D, Wilkinson M, Bapteste E, McInerney JO. A pluralistic account of homology: adapting the models to the data. Mol Biol Evol 2013; 31:501-16. [PMID: 24273322 PMCID: PMC3935183 DOI: 10.1093/molbev/mst228] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Indexed: 12/13/2022] Open
Abstract
Defining homologous genes is important in many evolutionary studies but raises obvious issues. Some of these issues are conceptual and stem from our assumptions of how a gene evolves, others are practical, and depend on the algorithmic decisions implemented in existing software. Therefore, to make progress in the study of homology, both ontological and epistemological questions must be considered. In particular, defining homologous genes cannot be solely addressed under the classic assumptions of strong tree thinking, according to which genes evolve in a strictly tree-like fashion of vertical descent and divergence and the problems of homology detection are primarily methodological. Gene homology could also be considered under a different perspective where genes evolve as “public goods,” subjected to various introgressive processes. In this latter case, defining homologous genes becomes a matter of designing models suited to the actual complexity of the data and how such complexity arises, rather than trying to fit genetic data to some a priori tree-like evolutionary model, a practice that inevitably results in the loss of much information. Here we show how important aspects of the problems raised by homology detection methods can be overcome when even more fundamental roots of these problems are addressed by analyzing public goods thinking evolutionary processes through which genes have frequently originated. This kind of thinking acknowledges distinct types of homologs, characterized by distinct patterns, in phylogenetic and nonphylogenetic unrooted or multirooted networks. In addition, we define “family resemblances” to include genes that are related through intermediate relatives, thereby placing notions of homology in the broader context of evolutionary relationships. 
We conclude by presenting some payoffs of adopting such a pluralistic account of homology and family relationship, which expands the scope of evolutionary analyses beyond the traditional, yet relatively narrow focus allowed by a strong tree-thinking view on gene evolution.
Affiliation(s)
- Leanne S Haggerty
- Bioinformatics and Molecular Evolution Unit, Department of Biology, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland
|
44
|
Keller DB, Schultz J. Connectivity, not frequency, determines the fate of a morpheme. PLoS One 2013; 8:e69945. [PMID: 23922865 PMCID: PMC3726735 DOI: 10.1371/journal.pone.0069945] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/19/2013] [Accepted: 06/18/2013] [Indexed: 11/25/2022]
Abstract
Morphemes are the smallest meaningful parts of words and therefore represent a natural unit to study the evolution of words. To analyze the influence of language change on morphemes, we performed a large-scale analysis of German and English vocabulary covering the last 200 years. Using a network approach from bioinformatics, we examined the historical dynamics of morphemes, the fixation of new morphemes and the emergence of words containing existing morphemes. We found that these processes are driven mainly by the number of different direct neighbors of a morpheme in words (connectivity, an equivalent to family size or type frequency) and not its frequency of usage (equivalent to token frequency). This contrasts with words, whose survival is determined by their frequency of usage. We therefore identified features of morphemes which are not dictated by the statistical properties of words. As morphemes are also relevant for the mental representation of words, this result might enable establishing a link between an individual's perception of language and historical language change.
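To make the distinction between connectivity and frequency of usage concrete, the following minimal sketch computes both for a toy vocabulary. It assumes words are already segmented into morphemes and that "direct neighbors" means the morphemes immediately to the left or right within a word; the function name `connectivity_and_frequency` and the data are hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict


def connectivity_and_frequency(segmented_words):
    """segmented_words maps a tuple of morphemes (one word) to its usage count.

    Returns (connectivity, frequency), where connectivity[m] is the number of
    distinct morphemes directly adjacent to m in any word, and frequency[m]
    is the summed usage count of all words containing an occurrence of m.
    """
    neighbors = defaultdict(set)
    frequency = Counter()
    for morphemes, count in segmented_words.items():
        for i, morpheme in enumerate(morphemes):
            frequency[morpheme] += count
            if i > 0:
                neighbors[morpheme].add(morphemes[i - 1])
            if i + 1 < len(morphemes):
                neighbors[morpheme].add(morphemes[i + 1])
    connectivity = {m: len(neighbors[m]) for m in frequency}
    return connectivity, dict(frequency)


# Hypothetical toy vocabulary: segmented word -> number of usages.
vocabulary = {
    ("un", "do"): 5,
    ("un", "tie"): 1,
    ("do", "ing"): 10,
    ("the",): 100,
}
connectivity, frequency = connectivity_and_frequency(vocabulary)
```

In this toy data, "the" is by far the most frequent morpheme but has no neighbors, while "un" is rare yet connects to two different morphemes, which is the kind of contrast the abstract draws.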
Affiliation(s)
- Jörg Schultz
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
|
45
|
A novel function prediction approach using protein overlap networks. BMC Systems Biology 2013; 7:61. [PMID: 23866986 PMCID: PMC3720179 DOI: 10.1186/1752-0509-7-61] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 02/08/2013] [Accepted: 07/12/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Construction of a reliable network remains the bottleneck for network-based protein function prediction. We built an artificial network model called protein overlap network (PON) for the entire genomes of yeast, fly, worm, and human, respectively. Each node of the network represents a protein, and two proteins are connected if they share a domain according to the InterPro database. RESULTS: The function of a protein can be predicted by counting the occurrence frequency of GO (gene ontology) terms associated with domains of direct neighbors. The average success rate and coverage were 34.3% and 43.9%, respectively, for the test genomes, and were increased to 37.9% and 51.3% when a composite PON of the four species was used for the prediction. As a comparison, the success rate was 7.0% in the random control procedure. We also made predictions with GO term annotations of the second layer nodes using the composite network and obtained an impressive success rate (>30%) and coverage (>30%), even for small genomes. Further improvement was achieved by statistical analysis of manually annotated GO terms for each neighboring protein. CONCLUSIONS: The PONs are composed of dense modules accompanied by a few long distance connections. Based on the PONs, we developed multiple approaches effective for protein function prediction.
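As a rough illustration of the neighbor-counting idea described in this abstract, the sketch below builds a toy protein overlap network from a protein-to-domain mapping and ranks GO terms by how often they occur among the domains of a protein's direct neighbors. The data, the function names (`build_pon`, `predict_go_terms`), and the alphabetical tie-breaking are assumptions made for illustration; this is not the authors' implementation.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_pon(protein_domains):
    """Connect two proteins whenever they share at least one domain."""
    by_domain = defaultdict(set)
    for protein, domains in protein_domains.items():
        for domain in domains:
            by_domain[domain].add(protein)
    neighbors = {protein: set() for protein in protein_domains}
    for members in by_domain.values():
        for a, b in combinations(sorted(members), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def predict_go_terms(protein, neighbors, protein_domains, domain_go, top=3):
    """Rank GO terms by occurrence count over the domains of direct neighbors."""
    counts = Counter()
    for other in neighbors[protein]:
        for domain in protein_domains[other]:
            counts.update(domain_go.get(domain, ()))
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]


# Hypothetical toy data (not taken from InterPro or GO).
protein_domains = {
    "P1": {"D1"},
    "P2": {"D1", "D2"},
    "P3": {"D1", "D3"},
    "P4": {"D4"},
}
domain_go = {
    "D1": {"GO:a"},
    "D2": {"GO:b"},
    "D3": {"GO:b", "GO:c"},
}

pon = build_pon(protein_domains)
prediction = predict_go_terms("P1", pon, protein_domains, domain_go)
```

Here `P4` shares no domain with any other protein, so it has no neighbors and no prediction can be made for it, which is one way coverage can fall below 100%.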
|
46
|
Todor A, Dobra A, Kahveci T. Characterizing the topology of probabilistic biological networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:970-983. [PMID: 24334390 DOI: 10.1109/tcbb.2013.108] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/03/2023]
Abstract
Biological interactions are often uncertain events that may or may not take place with some probability. This uncertainty leads to a massive number of alternative interaction topologies for each such network. The existing studies analyze the degree distribution of biological networks by assuming that all the given interactions take place under all circumstances. This strong and often incorrect assumption can lead to misleading results. In this paper, we address this problem and develop a sound mathematical basis to characterize networks in the presence of uncertain interactions. Using our mathematical representation, we develop a method that can accurately describe the degree distribution of such networks. We also take one more step and extend our method to accurately compute the joint-degree distributions of node pairs connected by edges. The number of possible network topologies grows exponentially with the number of uncertain interactions. However, the mathematical model we develop allows us to compute these degree distributions in polynomial time in the number of interactions. Our method works quickly even for entire protein-protein interaction (PPI) networks. It also helps us find an adequate mathematical model using MLE. We perform a comparative study of node-degree and joint-degree distributions in two types of biological networks: the classical deterministic networks and the more flexible probabilistic networks. Our results confirm that power-law and log-normal models best describe degree distributions for both probabilistic and deterministic networks. Moreover, the inverse correlation of degrees of neighboring nodes shows that, in probabilistic networks, nodes with a large number of interactions prefer to interact with those with a small number of interactions more frequently than expected. We also show that probabilistic networks are more robust for node-degree distribution computation than the deterministic ones.
AVAILABILITY: All the data sets used, the software implemented and the alignments found in this paper are available at http://bioinformatics.cise.ufl.edu/projects/probNet/.
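To illustrate how a degree distribution can be obtained in polynomial time even though the number of possible topologies is exponential, the sketch below computes the degree distribution of a single node under the assumption that its incident interactions occur independently with known probabilities. It uses the standard Poisson binomial recurrence; this is a generic technique chosen for illustration, and the abstract does not confirm that it matches the paper's own formulation. The function name `degree_distribution` is hypothetical.

```python
def degree_distribution(edge_probs):
    """Return dist where dist[k] is P(degree == k) for one node whose incident
    edges exist independently with the given probabilities."""
    dist = [1.0]  # with no edges considered, the degree is 0 with certainty
    for p in edge_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # edge absent: degree stays k
            nxt[k + 1] += mass * p      # edge present: degree becomes k + 1
        dist = nxt
    return dist


# Hypothetical node with three uncertain interactions.
dist = degree_distribution([0.5, 0.5, 1.0])
expected_degree = sum(k * mass for k, mass in enumerate(dist))
```

For a node with n uncertain interactions this takes on the order of n² operations, instead of enumerating all 2ⁿ combinations of present and absent edges.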
|
47
|
Cloninger CR. Person-centered Health Promotion in Chronic Disease. International Journal of Person Centered Medicine 2013; 3:5-12. [PMID: 26339469 PMCID: PMC4556425 DOI: 10.5750/ijpcm.v3i1.379] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
Abstract
Health promotion must be person-centered, not organ- or disease-centered, in order to be effective because physical, mental, social, and spiritual aspects of human functioning are inextricably intertwined. Chronic medical disorders, such as heart disease, chronic obstructive pulmonary disease, diabetes, cancer, asthma, and arthritis, are strongly associated with immature personality, emotional instability, and social dysfunction. All indicators of physical, mental, and social well-being are strongly related to the level of maturity and integration of personality, so personality is a useful focus for the promotion of well-being. Assessment of personality also facilitates the awareness of the clinician and the patient about the patient's strengths, weaknesses, and goals, thereby contributing to an effective therapeutic alliance. Health, well-being, resilience, and recovery of function all involve increasing levels of the character traits of Self-directedness, Cooperativeness, and Self-transcendence. Person-centered programs that enhance self-regulation of functioning to achieve personally valued goals improve compliance with medical treatment and quality of life in people with chronic disease. Effective therapeutic approaches to health promotion activate a complex adaptive system of feedback interactions among functioning, plasticity, and virtuous ways of thinking and acting. The probability of personality change can be predicted by high levels of Self-transcendence, which give rise to an outlook of unity and connectedness, particularly when combined with the temperament traits of high Novelty Seeking and high Persistence. In summary, person-centered psychobiological treatments that facilitate the development of well-being and personality development are crucial in the prevention, treatment, and rehabilitation of chronic medical diseases.
Affiliation(s)
- C Robert Cloninger
- Wallace Renard Professor of Psychiatry, Genetics, & Psychology, Washington University in St. Louis, USA
|
48
|
Mohanty S, Purwar M, Srinivasan N, Rekha N. Tethering preferences of domain families co-occurring in multi-domain proteins. Molecular BioSystems 2013; 9:1708-25. [PMID: 23571467 DOI: 10.1039/c3mb25481j] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/25/2022]
Abstract
Genomic data of several organisms have revealed the presence of a vast repertoire of multi-domain proteins. The role played by individual domains in a multi-domain protein has a profound influence on the overall function of the protein. In the present analysis an attempt has been made to better understand the tethering preferences of domain families that occur in multi-domain proteins. The analysis has been carried out on an exhaustive dataset of 2 961 898 sequences of proteins from 930 organisms, where 741 274 proteins are comprised of at least two domain families. For every domain family, the number of other domain families with which it co-occurs within a protein in this dataset has been enumerated and is referred to as the tethering number of the domain family. It was found that, in the general dataset, the AAA ATPase family and the family of Ser/Thr kinases have the highest tethering numbers of 450 and 444, respectively. Further analysis reveals significant correlation between the number of members in a family and its tethering number. Positive correlation was also observed between the extent of sequence and functional diversity within a family and the tethering numbers of domain families. Domain families that are present ubiquitously in diverse organisms tend to have large tethering numbers, while organism/kingdom-specific families have low tethering numbers. Thus, the analysis uncovers how domain families recombine and evolve to give rise to multi-domain proteins.
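The tethering number defined in this abstract can be sketched in a few lines. The example below assumes each protein is given as a list of the domain families it contains; the function name `tethering_numbers` and the toy proteins are hypothetical and do not come from the study's dataset.

```python
from collections import defaultdict


def tethering_numbers(protein_families):
    """Map each domain family to the number of distinct other families
    it co-occurs with in at least one protein."""
    partners = defaultdict(set)
    for families in protein_families:
        unique = set(families)
        for family in unique:
            partners[family] |= unique - {family}
    return {family: len(others) for family, others in partners.items()}


# Hypothetical proteins, each listed as the domain families it contains.
proteins = [
    ["AAA", "Kinase"],
    ["AAA", "SH3", "Kinase"],
    ["AAA", "PDZ"],
    ["SH3"],
]
numbers = tethering_numbers(proteins)
```

Partner families are stored in sets, so a pair of families that co-occurs in many proteins is still counted once, and a single-domain protein such as the last one contributes nothing to any tethering number.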
Affiliation(s)
- Smita Mohanty
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
|
49
|
Kaushik S, Mutt E, Chellappan A, Sankaran S, Srinivasan N, Sowdhamini R. Improved detection of remote homologues using cascade PSI-BLAST: influence of neighbouring protein families on sequence coverage. PLoS One 2013; 8:e56449. [PMID: 23437136 PMCID: PMC3577913 DOI: 10.1371/journal.pone.0056449] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 07/16/2012] [Accepted: 01/13/2013] [Indexed: 12/31/2022]
Abstract
Background: Development of sensitive sequence search procedures for the detection of distant relationships between proteins at superfamily/fold level is still a big challenge. The intermediate sequence search approach is the most frequently employed manner of identifying remote homologues effectively. In this study, examination of serine proteases of prolyl oligopeptidase, rhomboid and subtilisin protein families was carried out using plant serine proteases as queries from two genomes, A. thaliana and O. sativa, and 13 other families of unrelated folds to identify the distant homologues which could not be obtained using PSI-BLAST. Methodology/Principal Findings: We have proposed to start with multiple queries of classical serine protease members to identify remote homologues in families, using a rigorous approach like Cascade PSI-BLAST. We found that classical sequence-based approaches, like PSI-BLAST, showed very low sequence coverage in identifying plant serine proteases. The algorithm was applied to an enriched sequence database of homologous domains, and we obtained overall average coverage of 88% at family and 77% at superfamily or fold level, along with specificity of ∼100% and a Matthews correlation coefficient of 0.91. A similar approach was also implemented on 13 other protein families representing every structural class in the SCOP database. Further investigation with statistical tests, like jackknifing, helped us to better understand the influence of neighbouring protein families. Conclusions/Significance: Our study suggests that employment of multiple queries of a family for the Cascade PSI-BLAST searches is useful for predicting distant relationships effectively even at superfamily level. We have proposed a generalized strategy to cover all the distant members of a particular family using multiple query sequences. Our findings reveal that prior selection of sequences as query and the presence of neighbouring families can be important for covering the search space effectively in minimal computational time. This study also provides an understanding of the ‘bridging’ role of related families.
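For readers unfamiliar with the evaluation metrics named in this abstract, the sketch below shows how coverage, specificity, and the Matthews correlation coefficient are conventionally computed from a confusion matrix. It assumes that "coverage" corresponds to sensitivity (the fraction of true family members recovered), which the abstract does not state explicitly, and the counts are hypothetical rather than taken from the study.

```python
import math


def coverage(tp, fn):
    """Fraction of true members recovered (assumed equal to sensitivity)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-members correctly rejected."""
    return tn / (tn + fp)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical search result: 90 of 100 true homologues found, no false hits
# among 100 unrelated sequences.
tp, fn, tn, fp = 90, 10, 100, 0
cov = coverage(tp, fn)
spec = specificity(tn, fp)
mcc = matthews_corrcoef(tp, tn, fp, fn)
```

Unlike coverage or specificity alone, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it stays informative when the numbers of true members and non-members are unbalanced.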
Affiliation(s)
- Swati Kaushik
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, Karnataka, India
- Eshita Mutt
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, Karnataka, India
- Ajithavalli Chellappan
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, Karnataka, India
- School of Chemical and Biotechnology, Shanmugha Arts, Science, Technology & Research Academy, Thanjavur, Tamil Nadu, India
- Sandhya Sankaran
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Bangalore, India
- Narayanaswamy Srinivasan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Bangalore, India
- Ramanathan Sowdhamini
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, Karnataka, India
|
50
|
Di Paola L, De Ruvo M, Paci P, Santoni D, Giuliani A. Protein Contact Networks: An Emerging Paradigm in Chemistry. Chem Rev 2012. [DOI: 10.1021/cr3002356] [Citation(s) in RCA: 173] [Impact Index Per Article: 14.4] [Indexed: 12/23/2022]
Affiliation(s)
- L. Di Paola
- Faculty of Engineering, Università CAMPUS BioMedico, Via A. del Portillo, 21, 00128 Roma, Italy
- D. Santoni
- BioMathLab, CNR-Institute of Systems Analysis and Computer Science (IASI), viale Manzoni 30, 00185 Roma, Italy
- A. Giuliani
- Environment and Health Department, Istituto Superiore di Sanità, Viale Regina Elena 299, 00161, Roma, Italy
|