1
Perin C, Cretin G, Gelly JC. Hierarchical Analysis of Protein Structures: From Secondary Structures to Protein Units and Domains. Methods Mol Biol 2025; 2870:357-370. [PMID: 39543044 DOI: 10.1007/978-1-0716-4213-9_18] [Indexed: 11/17/2024]
Abstract
The three-dimensional structure of proteins is traditionally organized into hierarchical levels, specifically secondary structures and domains. However, different studies suggest the existence of intermediate levels, such as Protein Units (PUs), which provide a refined understanding of protein architecture. PUs, characterized by their compactness and independence, serve as an intermediate organizational level, bridging the gap between secondary structures and domains. This new view not only enhances our comprehension of protein structure, folding, and evolutionary mechanisms but also provides a robust methodology for identifying and categorizing protein domains. Based on the concept of PUs, alternative structural partitioning solutions can be proposed that address the structural ambiguity of proteins, leading to more meaningful domain identification.
Affiliation(s)
- Charlotte Perin
  - TBI, Université de Toulouse, CNRS, INRAE, INSA, Toulouse, France
2
Cong J, Zhang S, Zhang Q, Yu X, Huang J, Wei X, Huang X, Qiu J, Zhou X. Conserved features and diversity attributes of chimeric RNAs across accessions in four plants. Plant Biotechnology Journal 2024; 22:3151-3163. [PMID: 39087631 PMCID: PMC11500992 DOI: 10.1111/pbi.14437] [Received: 09/10/2023] [Revised: 06/17/2024] [Accepted: 07/08/2024] [Indexed: 08/02/2024]
Abstract
As a non-collinear expression form of genetic information, chimeric RNAs increase the complexity of the transcriptome in diverse organisms. Although chimeric RNAs have been identified in plants, few common features have been revealed. Here, we systematically explored the landscape of chimeric RNAs across multiple accessions and tissues using pan-genome and transcriptome data of four plants: rice, maize, soybean, and Arabidopsis. Among the four species, conserved characteristics of breakpoints and parental genes were discovered. In each species, chimeric RNAs displayed a high level of diversity among accessions, and the clustering of accessions using chimeric events was generally concordant with clustering based on genomic variants, implying a general relationship between genetic variations and chimeric RNAs. Through mass spectrometry, we confirmed a fusion protein, OsNDC1-OsGID1L2, and observed its subcellular localization, which differed from that of the original proteins. Phenotypic cues in transgenic rice suggest the potential functions of OsNDC1-OsGID1L2. Moreover, an intriguing chimeric event, Os01g0216500-Os01g0216900, generated by a large deletion in basmati rice, also exists in another accession without the deletion, demonstrating its convergence in evolution. Our results illuminate the characteristics and hint at the evolutionary implications of plant chimeric RNAs, which serve as a supplement to genetic variations, thus expanding our understanding of genetic diversity.
Affiliation(s)
- Jia Cong
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Sinan Zhang
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
  - CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
- Qi Zhang
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Xiting Yu
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Jiazhi Huang
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Xin Wei
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Xuehui Huang
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Jie Qiu
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Xiaoyi Zhou
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
3
Rabenow M, Haar E, Schmidt K, Hänsch R, Mendel RR, Oliphant KD. Convergent evolution links molybdenum insertase domains with organism-specific sequences. Commun Biol 2024; 7:1352. [PMID: 39424966 PMCID: PMC11489736 DOI: 10.1038/s42003-024-07073-w] [Received: 06/14/2024] [Accepted: 10/14/2024] [Indexed: 10/21/2024] [Open access]
Abstract
In all domains of life, the biosynthesis of the pterin-based Molybdenum cofactor (Moco) is crucial. Molybdenum (Mo) becomes biologically active by integrating into a unique pyranopterin scaffold, forming Moco. The final two steps of Moco biosynthesis are catalyzed by the two-domain enzyme Mo insertase, linked by gene fusion in higher organisms. Despite well-understood Moco biosynthesis, the evolutionary significance of Mo insertase fusion remains unclear. Here, we present findings from Neurospora crassa that shed light on the critical role of Mo insertase fusion in eukaryotes. Substituting the linkage region with sequences from other species resulted in Moco deficiency, and separate expression of domains, as seen in lower organisms, failed to rescue deficient strains. Stepwise truncation and structural modeling revealed a crucial 20-amino acid sequence within the linkage region essential for fungal growth. Our findings highlight the evolutionary importance of gene fusion and specific sequence composition in eukaryotic Mo insertases.
Affiliation(s)
- Miriam Rabenow
  - Department of Plant Biology, Technische Universität Braunschweig, Braunschweig, Germany
- Eduard Haar
  - Department of Plant Biology, Technische Universität Braunschweig, Braunschweig, Germany
- Katharina Schmidt
  - Department of Plant Biology, Technische Universität Braunschweig, Braunschweig, Germany
- Robert Hänsch
  - Department of Plant Biology, Technische Universität Braunschweig, Braunschweig, Germany
- Ralf R Mendel
  - Department of Plant Biology, Technische Universität Braunschweig, Braunschweig, Germany
- Kevin D Oliphant
  - Department of Plant Biology, Technische Universität Braunschweig, Braunschweig, Germany
4
Szymborski J, Emad A. INTREPPPID-an orthologue-informed quintuplet network for cross-species prediction of protein-protein interaction. Brief Bioinform 2024; 25:bbae405. [PMID: 39171984 PMCID: PMC11339867 DOI: 10.1093/bib/bbae405] [Received: 03/02/2024] [Revised: 07/25/2024] [Accepted: 07/31/2024] [Indexed: 08/23/2024] [Open access]
Abstract
An overwhelming majority of protein-protein interaction (PPI) studies are conducted in a select few model organisms largely due to constraints in time and cost of the associated 'wet lab' experiments. In silico PPI inference methods are ideal tools to overcome these limitations, but often struggle with cross-species predictions. We present INTREPPPID, a method that incorporates orthology data using a new 'quintuplet' neural network, which is constructed with five parallel encoders with shared parameters. INTREPPPID incorporates both a PPI classification task and an orthologous locality task. The latter learns embeddings of orthologues that have small Euclidean distances between them and large distances between embeddings of all other proteins. INTREPPPID outperforms all other leading PPI inference methods tested on both the intraspecies and cross-species tasks using strict evaluation datasets. We show that INTREPPPID's orthologous locality loss increases performance because of the biological relevance of the orthologue data and not due to some other specious aspect of the architecture. Finally, we introduce PPI.bio and PPI Origami, a web server interface for INTREPPPID and a software tool for creating strict evaluation datasets, respectively. Together, these two initiatives aim to make both the use and development of PPI inference tools more accessible to the community.
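The orthologous locality idea described in this abstract (orthologue embeddings close together in Euclidean distance, all other proteins far away) can be illustrated with a triplet margin loss. This is only a sketch of one common way to express such an objective. Whether INTREPPPID uses exactly this form is an assumption, and the function names, the `margin` value, and the toy embeddings are all hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, orthologue, other, margin=1.0):
    """Hinge-style loss (an assumed form, not taken from the paper).

    The loss is zero once the unrelated protein is at least `margin`
    farther from the anchor than the orthologue is.
    """
    return max(0.0, euclidean(anchor, orthologue) - euclidean(anchor, other) + margin)


# Toy 2-D embeddings (hypothetical values).
anchor = (0.0, 0.0)
orthologue = (0.0, 1.0)   # distance 1.0 from the anchor
far_other = (3.0, 4.0)    # distance 5.0 from the anchor
near_other = (0.0, 1.5)   # distance 1.5 from the anchor

loss_far = triplet_margin_loss(anchor, orthologue, far_other)
loss_near = triplet_margin_loss(anchor, orthologue, near_other)
print(loss_far, loss_near)
```

A loss of this shape penalizes only the triplets in which an unrelated protein sits too close to the anchor, which matches the "small distances for orthologues, large distances for everything else" behaviour the abstract describes.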
Affiliation(s)
- Joseph Szymborski
  - Department of Electrical and Computer Engineering, McGill University, 845 Sherbrooke Street West, Montréal, QC H3A 0G4, Canada
  - Mila, Quebec AI Institute, 6666 St-Urbain Street #200, Montréal, QC H2S 3H1, Canada
- Amin Emad
  - Department of Electrical and Computer Engineering, McGill University, 845 Sherbrooke Street West, Montréal, QC H3A 0G4, Canada
  - Mila, Quebec AI Institute, 6666 St-Urbain Street #200, Montréal, QC H2S 3H1, Canada
  - The Rosalind and Morris Goodman Cancer Institute, 1160 Pine Avenue, Montréal, QC H3A 1A3, Canada
5
Hayford RK, Haley OC, Cannon EK, Portwood JL, Gardiner JM, Andorf CM, Woodhouse MR. Functional annotation and meta-analysis of maize transcriptomes reveal genes involved in biotic and abiotic stress. BMC Genomics 2024; 25:533. [PMID: 38816789 PMCID: PMC11137889 DOI: 10.1186/s12864-024-10443-7] [Received: 11/30/2023] [Accepted: 05/22/2024] [Indexed: 06/01/2024] [Open access]
Abstract
BACKGROUND: Environmental stress factors, such as biotic and abiotic stress, are becoming more common due to climate variability, significantly affecting global maize yield. Transcriptome profiling studies provide insights into the molecular mechanisms underlying stress response in maize, though the functions of many genes are still unknown. To enhance the functional annotation of maize-specific genes, MaizeGDB has outlined a data-driven approach with an emphasis on identifying genes and traits related to biotic and abiotic stress.
RESULTS: We mapped high-quality RNA-Seq expression reads from 24 different publicly available datasets (17 abiotic and seven biotic studies) generated from the B73 cultivar to the recent version of the reference genome B73 (B73v5) and deduced stress-related functional annotation of maize gene models. We conducted a robust meta-analysis of the transcriptome profiles from the datasets to identify maize loci responsive to stress, identifying 3,230 differentially expressed genes (DEGs): 2,555 DEGs regulated in response to abiotic stress, 408 DEGs regulated during biotic stress, and 267 common DEGs (co-DEGs) that overlap between abiotic and biotic stress. We discovered hub genes from network analyses, and among the hub genes of the co-DEGs we identified a putative NAC domain transcription factor superfamily protein (Zm00001eb369060), IDP275, which was previously reported to respond to herbivory and drought stress. IDP275 was up-regulated in our analysis in response to eight different abiotic and four different biotic stresses. A gene set enrichment analysis and a pathway analysis of hub genes of the co-DEGs revealed hormone-mediated signaling processes and phenylpropanoid biosynthesis pathways, respectively. Using phylostratigraphic analysis, we also demonstrated how abiotic and biotic stress genes differentially evolve to adapt to changing environments.
CONCLUSIONS: These results will help facilitate the functional annotation of multiple stress response gene models in maize. Data can be accessed and downloaded at the Maize Genetics and Genomics Database (MaizeGDB).
Affiliation(s)
- Rita K Hayford
  - Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, 50011, USA
- Olivia C Haley
  - Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, 50011, USA
- Ethalinda K Cannon
  - Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, 50011, USA
- John L Portwood
  - Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, 50011, USA
- Jack M Gardiner
  - Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA
- Carson M Andorf
  - Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, 50011, USA
  - Department of Computer Science, Iowa State University, Ames, IA, 50011, USA
6
Zou HT, Ji BY, Xie XL. A multi-source molecular network representation model for protein-protein interactions prediction. Sci Rep 2024; 14:6184. [PMID: 38485942 PMCID: PMC10940665 DOI: 10.1038/s41598-024-56286-w] [Received: 11/07/2023] [Accepted: 03/05/2024] [Indexed: 03/18/2024] [Open access]
Abstract
The prediction of potential protein-protein interactions (PPIs) is a critical step in decoding diseases and understanding cellular mechanisms. Traditional biological experiments have identified plenty of potential PPIs in recent years, but this problem is still far from being solved. Hence, there is an urgent need to develop computational models with good performance and high efficiency to predict potential PPIs. In this study, we propose a multi-source molecular network representation learning model (called MultiPPIs) to predict potential protein-protein interactions. Specifically, we first extract the protein sequence features according to the physicochemical properties of amino acids by utilizing the auto covariance method. Second, a multi-source association network is constructed by integrating the known associations among miRNAs, proteins, lncRNAs, drugs, and diseases. The graph representation learning method DeepWalk is adopted to extract the multi-source association information of proteins with other biomolecules. In this way, the known protein-protein interaction pairs can be represented as a concatenation of the protein sequence features and the multi-source association features of proteins. Finally, the Random Forest classifier with its corresponding optimal parameters is used for training and prediction. MultiPPIs obtains an average prediction accuracy of 86.03%, a sensitivity of 82.69%, and an AUC of 93.03% under five-fold cross-validation. The experimental results indicate that MultiPPIs has good prediction performance and provides valuable insights into the prediction of potential protein-protein interactions. MultiPPIs is freely available at https://github.com/jiboyalab/multiPPIs.
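The auto covariance step mentioned in this abstract can be sketched as follows. The abstract does not say which physicochemical properties, normalization, or maximum lag MultiPPIs uses, so this example makes its own assumptions: a single property (the Kyte-Doolittle hydropathy scale), no normalization, and the commonly used definition AC(lag) = 1/(n - lag) × Σ (x_i - mean)(x_{i+lag} - mean). The function name `auto_covariance` is hypothetical.

```python
# Kyte-Doolittle hydropathy values; using this scale is an assumption,
# the abstract only says "physicochemical properties of amino acids".
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(sequence, scale, max_lag):
    """Return [AC(1), ..., AC(max_lag)] for one property scale.

    AC(lag) averages the product of mean-centred property values of
    residues that are `lag` positions apart along the sequence.
    """
    values = [scale[aa] for aa in sequence]
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mean = sum(values) / n
    centred = [v - mean for v in values]
    return [
        sum(centred[i] * centred[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


# Toy sequences: a constant one and an alternating one.
ac_constant = auto_covariance("AAAA", KYTE_DOOLITTLE, max_lag=2)
ac_alternating = auto_covariance("AIAI", KYTE_DOOLITTLE, max_lag=2)
print(ac_constant, ac_alternating)
```

Because the output length depends only on `max_lag` and the number of property scales, proteins of different lengths map to feature vectors of the same size, which is what makes such descriptors convenient inputs for a classifier like Random Forest.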
Affiliation(s)
- Hai-Tao Zou
  - College of Information Science and Engineering, Guilin University of Technology, Guilin, 541000, China
- Bo-Ya Ji
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410000, China
- Xiao-Lan Xie
  - College of Information Science and Engineering, Guilin University of Technology, Guilin, 541000, China
7
Zheng Y, Cabassa-Hourton C, Eubel H, Chevreux G, Lignieres L, Crilat E, Braun HP, Lebreton S, Savouré A. Pyrroline-5-carboxylate metabolism protein complex detected in Arabidopsis thaliana leaf mitochondria. Journal of Experimental Botany 2024; 75:917-934. [PMID: 37843921 DOI: 10.1093/jxb/erad406] [Received: 07/26/2023] [Accepted: 10/14/2023] [Indexed: 10/18/2023]
Abstract
Proline dehydrogenase (ProDH) and pyrroline-5-carboxylate (P5C) dehydrogenase (P5CDH) catalyse the oxidation of proline into glutamate via the intermediates P5C and glutamate-semialdehyde (GSA), which spontaneously interconvert. P5C and GSA are also intermediates in the production of glutamate from ornithine and α-ketoglutarate catalysed by ornithine δ-aminotransferase (OAT). ProDH and P5CDH form a fused bifunctional PutA enzyme in Gram-negative bacteria and are associated in a bifunctional substrate-channelling complex in Thermus thermophilus; however, the physical proximity of ProDH and P5CDH in eukaryotes has not been described. Here, we report evidence of physical proximity and interactions between Arabidopsis ProDH, P5CDH, and OAT in the mitochondria of plants during dark-induced leaf senescence when all three enzymes are expressed. Pairwise interactions and localization of the three enzymes were investigated using bimolecular fluorescence complementation with confocal microscopy in tobacco and sub-mitochondrial fractionation in Arabidopsis. Evidence for a complex composed of ProDH, P5CDH, and OAT was revealed by co-migration of the proteins in native conditions upon gel electrophoresis. Co-immunoprecipitation coupled with mass spectrometry analysis confirmed the presence of the P5C metabolism complex in Arabidopsis. Pull-down assays further demonstrated a direct interaction between ProDH1 and P5CDH. P5C metabolism complexes might channel P5C among the constituent enzymes and directly provide electrons to the respiratory electron chain via ProDH.
Affiliation(s)
- Yao Zheng
  - Sorbonne Université, UPEC, CNRS, IRD, INRAE Institute of Ecology and Environmental Sciences of Paris (iEES), 75005 Paris, France
- Cécile Cabassa-Hourton
  - Sorbonne Université, UPEC, CNRS, IRD, INRAE Institute of Ecology and Environmental Sciences of Paris (iEES), 75005 Paris, France
- Holger Eubel
  - Institute of Plant Genetics, Leibniz Universität Hannover, Germany
- Guillaume Chevreux
  - Université Paris Cité, CNRS, Institut Jacques Monod, F-75013 Paris, France
- Laurent Lignieres
  - Université Paris Cité, CNRS, Institut Jacques Monod, F-75013 Paris, France
- Emilie Crilat
  - Sorbonne Université, UPEC, CNRS, IRD, INRAE Institute of Ecology and Environmental Sciences of Paris (iEES), 75005 Paris, France
- Hans-Peter Braun
  - Institute of Plant Genetics, Leibniz Universität Hannover, Germany
- Sandrine Lebreton
  - Sorbonne Université, UPEC, CNRS, IRD, INRAE Institute of Ecology and Environmental Sciences of Paris (iEES), 75005 Paris, France
- Arnould Savouré
  - Sorbonne Université, UPEC, CNRS, IRD, INRAE Institute of Ecology and Environmental Sciences of Paris (iEES), 75005 Paris, France
8
Ishii C, Asatani K, Sakata I. Detecting possible pairs of materials for composites using a material word co-occurrence network. PLoS One 2024; 19:e0297361. [PMID: 38277416 PMCID: PMC10817182 DOI: 10.1371/journal.pone.0297361] [Received: 02/24/2023] [Accepted: 01/02/2024] [Indexed: 01/28/2024] [Open access]
Abstract
Composite materials are popular because of their high performance capabilities, but new material development is time-consuming. To accelerate this process, researchers studying material informatics, an academic discipline combining computational science and material science, have developed less time-consuming approaches for predicting possible material combinations. However, these processes remain problematic because some materials are not suited for them. The limitations of specific candidates for new composites may cause potential new material pairs to be overlooked. To solve this problem, we developed a new method to predict possible composite material pairs by considering more materials than previous techniques. We predicted possible material pairs by conducting link predictions of material word co-occurrence networks while assuming that co-occurring material word pairs in scientific papers on composites were reported as composite materials. As a result, we succeeded in predicting the co-occurrence of material words with high specificity. Nodes tended to link to many other words, generating new links in the created co-occurrence material word network; notably, the number of material words co-occurring with graphene increased rapidly. This phenomenon confirmed that graphene is an attractive composite component. We expect our method to contribute to the accelerated development of new composite materials.
Affiliation(s)
- Chika Ishii
  - Customer Experience Department, Cisco Systems G.K., Minato-ku, Tokyo, Japan
- Kimitaka Asatani
  - Department of Technology Management for Innovation, Graduate School of Engineering, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Ichiro Sakata
  - Department of Technology Management for Innovation, Graduate School of Engineering, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
9
Nithya C, Kiran M, Nagarajaram HA. Hubs and Bottlenecks in Protein-Protein Interaction Networks. Methods Mol Biol 2024; 2719:227-248. [PMID: 37803121 DOI: 10.1007/978-1-0716-3461-5_13] [Indexed: 10/08/2023]
Abstract
Protein-protein interaction networks (PPINs) represent the physical interactions among proteins in a cell. These interactions are critical in all cellular processes, including signal transduction, metabolic regulation, and gene expression. In PPINs, centrality measures are widely used to identify the most critical nodes. The two most commonly used centrality measures in networks are degree and betweenness centralities. Degree centrality is the number of connections a node has in the network, and betweenness centrality is the measure of the extent to which a node lies on the shortest paths between pairs of other nodes in the network. In PPINs, proteins with high degree and betweenness centrality are referred to as hubs and bottlenecks respectively. Hubs and bottlenecks are topologically and functionally essential proteins that play crucial roles in maintaining the network's structure and function. This article comprehensively reviews essential literature on hubs and bottlenecks, including their properties and functions.
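The two centrality measures defined in this abstract can be made concrete on a toy network. The sketch below uses only the Python standard library: degree is the raw neighbour count, and betweenness is computed with Brandes' algorithm for unweighted, undirected graphs (unnormalized). The network itself is invented for illustration: two fully connected four-protein clusters joined through a single protein `X`. All names are hypothetical and nothing here is taken from the reviewed chapter.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree = number of direct interaction partners (raw count)."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def betweenness_centrality(adj):
    """Unnormalized betweenness via Brandes' algorithm (BFS-based)."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        order = []                       # nodes in order of discovery
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        paths = {v: 0 for v in adj}      # number of shortest paths from source
        dist = {v: -1 for v in adj}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:          # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):        # accumulate from the farthest nodes back
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted from both endpoints.
    return {v: s / 2 for v, s in score.items()}


cluster_a = ["A1", "A2", "A3", "A4"]
cluster_b = ["B1", "B2", "B3", "B4"]
edges = list(combinations(cluster_a, 2)) + list(combinations(cluster_b, 2))
edges += [("A1", "X"), ("X", "B1")]      # X bridges the two clusters

adj = build_adjacency(edges)
degree = degree_centrality(adj)
betweenness = betweenness_centrality(adj)
print(degree["A1"], degree["X"], betweenness["A1"], betweenness["X"])
```

In this toy network `X` has the fewest partners but lies on every shortest path between the two clusters, so it behaves as a bottleneck, whereas `A1` and `B1` have the most partners and behave as hubs.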
Affiliation(s)
- Chandramohan Nithya
  - Department of Biotechnology and Bioinformatics, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
- Manjari Kiran
  - Department of Systems and Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
10
Xia Y, Zhao K, Liu D, Zhou X, Zhang G. Multi-domain and complex protein structure prediction using inter-domain interactions from deep learning. Commun Biol 2023; 6:1221. [PMID: 38040847 PMCID: PMC10692239 DOI: 10.1038/s42003-023-05610-7] [Received: 05/19/2023] [Accepted: 11/20/2023] [Indexed: 12/03/2023] [Open access]
Abstract
Accurately capturing domain-domain interactions is key to understanding protein function and designing structure-based drugs. Although AlphaFold2 has made a breakthrough in single-domain structure prediction, structure modeling for multi-domain proteins and complexes remains a challenge. In this study, we developed a multi-domain and complex structure assembly protocol, named DeepAssembly, based on domain segmentation and single-domain modeling algorithms. First, DeepAssembly uses a population-based evolutionary algorithm to assemble multi-domain proteins using inter-domain interactions inferred from a purpose-built deep learning network. Second, protein complexes are assembled from domains rather than chains using DeepAssembly. Experimental results show that on 219 multi-domain proteins, the average inter-domain distance precision of DeepAssembly is 22.7% higher than that of AlphaFold2. Moreover, DeepAssembly improves accuracy by 13.1% for 164 multi-domain structures with low confidence deposited in the AlphaFold database. We apply DeepAssembly to the prediction of 247 heterodimers and find that it successfully predicts the interface (DockQ ≥ 0.23) for 32.4% of the dimers, suggesting a lighter way to assemble complex structures by treating domains as assembly units and using inter-domain interactions learned from monomer structures.
Affiliation(s)
- Yuhao Xia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Kailong Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Dong Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Xiaogen Zhou
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
11
Gollapalli P, Rudrappa S, Kumar V, Santosh Kumar HS. Domain Architecture Based Methods for Comparative Functional Genomics Toward Therapeutic Drug Target Discovery. J Mol Evol 2023; 91:598-615. [PMID: 37626222 DOI: 10.1007/s00239-023-10129-w] [Received: 08/13/2022] [Accepted: 08/06/2023] [Indexed: 08/27/2023]
Abstract
New genes arise through duplication, mutation, recombination, fusion, fission, or de novo formation, and novel functions emerge during evolution as a result. Researchers have tried to quantify the causes of these molecular diversification processes to understand how they increase molecular complexity over time, for instance in protein domain organization. In contrast to global sequence similarity, protein domain architectures can capture key structural and functional characteristics, making them better proxies for describing functional equivalence. In both prokaryotes and eukaryotes, domain designs have been shown to be retained over significant evolutionary distances. Protein domain architectures are now being utilized to categorize and distinguish evolutionarily related proteins and to find homologs among species that are evolutionarily distant from one another. Additionally, structural information stored in domain structures has accelerated homology identification and sequence search methods. Tools for functional protein annotation have been developed to identify protein domain content, domain order, domain recurrence, and domain position, all of which contribute to the accuracy of protein function prediction. This review summarises facts and speculations regarding the use of protein domain architecture and modularity to identify possible therapeutic targets among cellular activities, based on an understanding of their linked biological processes.
Affiliation(s)
- Pavan Gollapalli
  - Center for Bioinformatics and Biostatistics, Nitte (Deemed to be University), Mangalore, Karnataka, 575018, India
- Sushmitha Rudrappa
  - Department of Biotechnology and Bioinformatics, Jnana Sahyadri Campus, Kuvempu University, Shankaraghatta, Shivamogga, Karnataka, 577451, India
- Vadlapudi Kumar
  - Department of Biochemistry, Davangere University, Shivagangothri, Davangere, Karnataka, 577007, India
- Hulikal Shivashankara Santosh Kumar
  - Department of Biotechnology and Bioinformatics, Jnana Sahyadri Campus, Kuvempu University, Shankaraghatta, Shivamogga, Karnataka, 577451, India
12
Kumar S, Sega S, Lynn-Barbe JK, Harris DL, Koehn JT, Crans DC, Crick DC. Proline Dehydrogenase and Pyrroline 5 Carboxylate Dehydrogenase from Mycobacterium tuberculosis: Evidence for Substrate Channeling. Pathogens 2023; 12:1171. [PMID: 37764979 PMCID: PMC10537722 DOI: 10.3390/pathogens12091171] [Received: 07/25/2023] [Revised: 08/25/2023] [Accepted: 09/08/2023] [Indexed: 09/29/2023] [Open access]
Abstract
In Mycobacterium tuberculosis, proline dehydrogenase (PruB) and ∆1-pyrroline-5-carboxylate (P5C) dehydrogenase (PruA) are monofunctional enzymes that catalyze proline oxidation to glutamate via the intermediates P5C and L-glutamate-γ-semialdehyde. Both enzymes are essential for the replication of pathogenic M. tuberculosis. Highly active enzymes were expressed and purified using a Mycobacterium smegmatis expression system. The purified enzymes were characterized using natural substrates and chemically synthesized analogs. The structural requirements of the quinone electron acceptor were examined. PruB displayed activity with all tested lipoquinone analogs (naphthoquinone or benzoquinone). In PruB assays utilizing analogs of the native naphthoquinone [MK-9 (II-H2)], the specificity constants kcat/Km were an order of magnitude greater for the menaquinone analogs than for the benzoquinone analogs. In addition, mycobacterial PruA was enzymatically characterized for the first time using exogenous chemically synthesized P5C. A Km value of 120 ± 0.015 µM was determined for P5C, while the Km value for NAD+ was shown to be 33 ± 4.3 µM. Furthermore, proline competitively inhibited PruA activity, and coupled enzyme assays suggest that the recombinant purified monofunctional PruB and PruA enzymes of M. tuberculosis channel substrate, likely increasing metabolic flux and protecting the bacterium from methylglyoxal toxicity.
Affiliation(s)
- Santosh Kumar
  - Mycobacteria Research Laboratories, Department of Microbiology, Immunology and Pathology, Colorado State University, Fort Collins, CO 80523-1682, USA
- Steven Sega
  - Mycobacteria Research Laboratories, Department of Microbiology, Immunology and Pathology, Colorado State University, Fort Collins, CO 80523-1682, USA
- Jamie K. Lynn-Barbe
  - Mycobacteria Research Laboratories, Department of Microbiology, Immunology and Pathology, Colorado State University, Fort Collins, CO 80523-1682, USA
- Dannika L. Harris
  - Mycobacteria Research Laboratories, Department of Microbiology, Immunology and Pathology, Colorado State University, Fort Collins, CO 80523-1682, USA
- Jordan T. Koehn
  - Department of Chemistry, University of North Carolina, Chapel Hill, NC 27599-3290, USA
- Debbie C. Crans
  - Chemistry Department, Colorado State University, Fort Collins, CO 80523-1682, USA
- Dean C. Crick
  - Mycobacteria Research Laboratories, Department of Microbiology, Immunology and Pathology, Colorado State University, Fort Collins, CO 80523-1682, USA
13
Lin ZJ, Huang BX, Su LF, Zhu SY, He JW, Chen GZ, Lin PX. Sub-region analysis of DMD gene in cases with idiopathic generalized epilepsy. Neurogenetics 2023; 24:161-169. [PMID: 37022522 DOI: 10.1007/s10048-023-00715-x] [Received: 01/27/2023] [Accepted: 03/24/2023] [Indexed: 04/07/2023]
Abstract
The protein domain encoded by a gene sub-region is the basic unit of protein structure and function. The DMD gene is the largest coding gene in humans, and its phenotype is relevant to idiopathic generalized epilepsy. We hypothesized that variants cluster in sub-regions of idiopathic generalized epilepsy genes and investigated the relationship between the DMD gene and idiopathic generalized epilepsy. Whole exome sequencing was performed in 106 individuals with idiopathic generalized epilepsy. DMD variants were filtered by variant type, allele frequency, in silico prediction, hemizygous or homozygous status in the population, inheritance mode, and domain location. Variants located in the sub-regions were selected with the subRVIS software. The pathogenicity of variants was evaluated by the American College of Medical Genetics and Genomics criteria. Articles on functional studies related to epilepsy were reviewed for the protein domains in which variants clustered. In sub-regions of the DMD gene, two variants were identified in two unrelated cases with juvenile absence epilepsy or juvenile myoclonic epilepsy. Both variants were classified as being of uncertain significance. The allele frequency of both variants in probands with idiopathic generalized epilepsy reached statistical significance compared with the population (Fisher's test, p = 2.02 × 10⁻⁶, adjusted α = 4.52 × 10⁻⁶). The variants clustered in the spectrin domain of dystrophin, which binds to glycoprotein complexes and indirectly affects ion channels contributing to epileptogenesis. Gene sub-region analysis suggests a weak association between the DMD gene and idiopathic generalized epilepsy. Functional analysis of gene sub-regions helps infer the pathogenesis of idiopathic generalized epilepsy.
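The significance test named in this abstract (Fisher's test compared against an adjusted α) can be illustrated with a minimal sketch. The 2×2 counts below are hypothetical and are not taken from the study; the function name `fisher_exact_two_sided` is likewise an illustrative choice. Only the threshold 4.52 × 10⁻⁶ comes from the abstract.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Hypothetical allele counts (variant vs. reference) in cases and in a
# population sample; these numbers are assumptions for illustration only.
ADJUSTED_ALPHA = 4.52e-6  # threshold quoted in the abstract
p_value = fisher_exact_two_sided(2, 210, 3, 250000)
print(p_value, p_value < ADJUSTED_ALPHA)
```

The exact enumeration is used here because the expected counts for rare variants are far too small for a chi-squared approximation.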
Affiliation(s)
- Zhi-Jian Lin
- Department of Neurology, The Affiliated Hospital of Putian University, Brain Science Institute of Putian University, 999 Dongzhen East Road, Licheng District, Putian, 351100, China
| | - Bi-Xia Huang
- Department of Neurology, The Affiliated Hospital of Putian University, Brain Science Institute of Putian University, 999 Dongzhen East Road, Licheng District, Putian, 351100, China
| | - Li-Fang Su
- Department of Neurology, The Affiliated Hospital of Putian University, Brain Science Institute of Putian University, 999 Dongzhen East Road, Licheng District, Putian, 351100, China
| | - Sheng-Yin Zhu
- Department of Neurology, The Affiliated Hospital of Putian University, Brain Science Institute of Putian University, 999 Dongzhen East Road, Licheng District, Putian, 351100, China
| | - Jun-Wei He
- Department of Neurology, The Affiliated Hospital of Putian University, Brain Science Institute of Putian University, 999 Dongzhen East Road, Licheng District, Putian, 351100, China
| | - Guo-Zhang Chen
- Department of Neurology, The Affiliated Hospital of Putian University, Brain Science Institute of Putian University, 999 Dongzhen East Road, Licheng District, Putian, 351100, China
| | - Peng-Xing Lin
- Department of Neurology, The Affiliated Hospital of Putian University, Brain Science Institute of Putian University, 999 Dongzhen East Road, Licheng District, Putian, 351100, China
| |
|
14
|
Wang Z, Deng Z, Zhang W, Lou Q, Choi KS, Wei Z, Wang L, Wu J. MMSMAPlus: a multi-view multi-scale multi-attention embedding model for protein function prediction. Brief Bioinform 2023:7187109. [PMID: 37258453 DOI: 10.1093/bib/bbad201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Revised: 04/16/2023] [Accepted: 05/08/2023] [Indexed: 06/02/2023] Open
Abstract
Protein is the most important component in organisms and plays an indispensable role in life activities. In recent years, a large number of intelligent methods have been proposed to predict protein function. These methods obtain different types of protein information, including sequence, structure and interaction network. Among them, protein sequences have gained significant attention where methods are investigated to extract the information from different views of features. However, how to fully exploit the views for effective protein sequence analysis remains a challenge. In this regard, we propose a multi-view, multi-scale and multi-attention deep neural model (MMSMA) for protein function prediction. First, MMSMA extracts multi-view features from protein sequences, including one-hot encoding features, evolutionary information features, deep semantic features and overlapping property features based on physiochemistry. Second, a specific multi-scale multi-attention deep network model (MSMA) is built for each view to realize the deep feature learning and preliminary classification. In MSMA, both multi-scale local patterns and long-range dependence from protein sequences can be captured. Third, a multi-view adaptive decision mechanism is developed to make a comprehensive decision based on the classification results of all the views. To further improve the prediction performance, an extended version of MMSMA, MMSMAPlus, is proposed to integrate homology-based protein prediction under the framework of multi-view deep neural model. Experimental results show that the MMSMAPlus has promising performance and is significantly superior to the state-of-the-art methods. The source code can be found at https://github.com/wzy-2020/MMSMAPlus.
Affiliation(s)
- Zhongyu Wang
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
| | - Zhaohong Deng
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
| | - Wei Zhang
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
| | - Qiongdan Lou
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
| | | | - Zhisheng Wei
- National Key Laboratory of Food Science and Resource Mining, Jiangnan University, Wuxi, China
| | - Lei Wang
- National Key Laboratory of Food Science and Resource Mining, Jiangnan University, Wuxi, China
| | - Jing Wu
- National Key Laboratory of Food Science and Resource Mining, Jiangnan University, Wuxi, China
| |
|
15
|
Laval F, Coppin G, Twizere JC, Vidal M. Homo cerevisiae-Leveraging Yeast for Investigating Protein-Protein Interactions and Their Role in Human Disease. Int J Mol Sci 2023; 24:9179. [PMID: 37298131 PMCID: PMC10252790 DOI: 10.3390/ijms24119179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Revised: 05/20/2023] [Accepted: 05/22/2023] [Indexed: 06/12/2023] Open
Abstract
Understanding how genetic variation affects phenotypes represents a major challenge, particularly in the context of human disease. Although numerous disease-associated genes have been identified, the clinical significance of most human variants remains unknown. Despite unparalleled advances in genomics, functional assays often lack sufficient throughput, hindering efficient variant functionalization. There is a critical need for the development of more potent, high-throughput methods for characterizing human genetic variants. Here, we review how yeast helps tackle this challenge, both as a valuable model organism and as an experimental tool for investigating the molecular basis of phenotypic perturbation upon genetic variation. In systems biology, yeast has played a pivotal role as a highly scalable platform which has allowed us to gain extensive genetic and molecular knowledge, including the construction of comprehensive interactome maps at the proteome scale for various organisms. By leveraging interactome networks, one can view biology from a systems perspective, unravel the molecular mechanisms underlying genetic diseases, and identify therapeutic targets. The use of yeast to assess the molecular impacts of genetic variants, including those associated with viral interactions, cancer, and rare and complex diseases, has the potential to bridge the gap between genotype and phenotype, opening the door for precision medicine approaches and therapeutic development.
Affiliation(s)
- Florent Laval
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- TERRA Teaching and Research Centre, University of Liège, 5030 Gembloux, Belgium
- Laboratory of Viral Interactomes, GIGA Institute, University of Liège, 4000 Liège, Belgium
| | - Georges Coppin
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Laboratory of Viral Interactomes, GIGA Institute, University of Liège, 4000 Liège, Belgium
| | - Jean-Claude Twizere
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA
- TERRA Teaching and Research Centre, University of Liège, 5030 Gembloux, Belgium
- Laboratory of Viral Interactomes, GIGA Institute, University of Liège, 4000 Liège, Belgium
- Division of Science and Math, New York University Abu Dhabi, Abu Dhabi P.O. Box 129188, United Arab Emirates
| | - Marc Vidal
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
| |
|
16
|
The Rosetta Stone Hypothesis-Based Interaction of the Tumor Suppressor Proteins Nit1 and Fhit. Cells 2023; 12:353. [PMID: 36766695 PMCID: PMC9913352 DOI: 10.3390/cells12030353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Revised: 01/11/2023] [Accepted: 01/14/2023] [Indexed: 01/20/2023] Open
Abstract
In previous studies, we have identified the tumor suppressor proteins Fhit (fragile histidine triad) and Nit1 (Nitrilase1) as interaction partners of β-catenin both acting as repressors of the canonical Wnt pathway. Interestingly, in D. melanogaster and C. elegans these proteins are expressed as NitFhit fusion proteins. According to the Rosetta Stone hypothesis, if proteins are expressed as fusion proteins in one organism and as single proteins in others, the latter should interact physically and show common signaling function. Here, we tested this hypothesis and provide the first biochemical evidence for a direct association between Nit1 and Fhit. In addition, size exclusion chromatography of purified recombinant human Nit1 showed a tetrameric structure as also previously observed for the NitFhit Rosetta Stone fusion protein Nft-1 in C. elegans. Finally, in line with the Rosetta Stone hypothesis we identified Hsp60 and Ubc9 as other common interaction partners of Nit1 and Fhit. The interaction of Nit1 and Fhit may affect their enzymatic activities as well as interaction with other binding partners.
|
17
|
Esch L, Kirsch C, Vogel L, Kelm J, Huwa N, Schmitz M, Classen T, Schaffrath U. Pathogen Resistance Depending on Jacalin-Dirigent Chimeric Proteins Is Common among Poaceae but Absent in the Dicot Arabidopsis as Evidenced by Analysis of Homologous Single-Domain Proteins. Plants (Basel, Switzerland) 2022; 12:67. [PMID: 36616196 PMCID: PMC9824508 DOI: 10.3390/plants12010067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Revised: 12/16/2022] [Accepted: 12/19/2022] [Indexed: 06/17/2023]
Abstract
MonocotJRLs are Poaceae-specific two-domain proteins that consist of a jacalin-related lectin (JRL) and a dirigent (DIR) domain which participate in multiple developmental processes, including disease resistance. For OsJAC1, a monocotJRL from rice, it has been confirmed that constitutive expression in transgenic rice or barley plants facilitates broad-spectrum disease resistance. In this process, both domains of OsJAC1 act cooperatively, as evidenced from experiments with artificially separated JRL- or DIR-domain-containing proteins. Interestingly, these chimeric proteins did not evolve in dicotyledonous plants. Instead, proteins with a single JRL domain, multiple JRL domains or JRL domains fused to domains other than DIR domains are present. In this study, we wanted to test if the cooperative function of JRL and DIR proteins leading to pathogen resistance was conserved in the dicotyledonous plant Arabidopsis thaliana. In Arabidopsis, we identified 50 JRL and 24 DIR proteins, respectively, from which seven single-domain JRL and two single-domain DIR candidates were selected. A single-cell transient gene expression assay in barley revealed that specific combinations of the Arabidopsis JRL and DIR candidates reduced the penetration success of barley powdery mildew. Strikingly, one of these pairs, AtJAX1 and AtDIR19, is encoded by genes located next to each other on chromosome one. However, when using natural variation and analyzing Arabidopsis ecotypes that express full-length or truncated versions of AtJAX1, the presence/absence of the full-length AtJAX1 protein could not be correlated with resistance to the powdery mildew fungus Golovinomyces orontii. Furthermore, an analysis of the additional JRL and DIR candidates in a bi-fluorescence complementation assay in Nicotiana benthamiana revealed no direct interaction of these JRL/DIR pairs. Since transgenic Arabidopsis plants expressing OsJAC1-GFP also did not show increased resistance to G. orontii, it was concluded that the resistance mediated by the synergistic activities of DIR and JRL proteins is specific for members of the Poaceae, at least regarding the resistance against powdery mildew. Arabidopsis lacks the essential components of the DIR-JRL-dependent resistance pathway.
Affiliation(s)
- Lara Esch
- Department of Plant Physiology, RWTH Aachen University, 52056 Aachen, Germany
| | - Christian Kirsch
- Department of Plant Physiology, RWTH Aachen University, 52056 Aachen, Germany
| | - Lara Vogel
- Department of Plant Physiology, RWTH Aachen University, 52056 Aachen, Germany
| | - Jana Kelm
- Department of Plant Physiology, RWTH Aachen University, 52056 Aachen, Germany
| | - Nikolai Huwa
- Institute for Bioorganic Chemistry, Heinrich Heine University Düsseldorf, 52425 Jülich, Germany
| | - Maike Schmitz
- Department of Plant Physiology, RWTH Aachen University, 52056 Aachen, Germany
| | - Thomas Classen
- Institute for Bio- and Geosciences 1: Bioorganic Chemistry, Forschungszentrum Jülich, 52425 Jülich, Germany
| | - Ulrich Schaffrath
- Department of Plant Physiology, RWTH Aachen University, 52056 Aachen, Germany
| |
|
18
|
Bolotin E, Melamed D, Livnat A. Genes that are Used Together are More Likely to be Fused Together in Evolution by Mutational Mechanisms: A Bioinformatic Test of the Used-Fused Hypothesis. Evol Biol 2022; 50:30-55. [PMID: 36816837 PMCID: PMC9925542 DOI: 10.1007/s11692-022-09579-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/04/2021] [Accepted: 09/11/2022] [Indexed: 12/05/2022]
Abstract
Cases of parallel or recurrent gene fusions in evolution as well as in genetic disease and cancer are difficult to explain, because unlike point mutations, they can require the repetition of a similar configuration of multiple breakpoints rather than the repetition of a single point mutation. The used-together-fused-together hypothesis holds that genes that are used together repeatedly and persistently in a specific context are more likely to undergo fusion mutation in the course of evolution for mechanistic reasons. This hypothesis offers to explain gene fusion in both evolution and disease under one umbrella. Using bioinformatic data, we tested this hypothesis against alternatives, including that all gene pairs can fuse by random mutation, but among pairs thus fused, those that had interacted previously are more likely to be favored by selection. Results show that across multiple measures of gene interaction, human genes whose orthologs are fused in one or more species are more likely to interact with each other than random pairs of genes of the same genomic distance between pair members; that an overlap exists between genes that fused in the course of evolution in non-human species and genes that undergo fusion in human cancers; and that across six primate species studied, fusions predominate over fissions and exhibit substantial evolutionary parallelism. Together, these results support the used-together-fused-together hypothesis over its alternatives. Multiple implications are discussed, including the relevance of mutational mechanisms to the evolution of genome organization, to the distribution of fitness effects of mutation, to evolutionary parallelism and more. Supplementary Information The online version contains supplementary material available at 10.1007/s11692-022-09579-9.
Affiliation(s)
- Evgeni Bolotin
- Department of Evolutionary and Environmental Biology, University of Haifa, 3498838 Haifa, Israel
- Institute of Evolution, University of Haifa, Haifa, 3498838 Israel
| | - Daniel Melamed
- Department of Evolutionary and Environmental Biology, University of Haifa, 3498838 Haifa, Israel
- Institute of Evolution, University of Haifa, Haifa, 3498838 Israel
| | - Adi Livnat
- Department of Evolutionary and Environmental Biology, University of Haifa, 3498838 Haifa, Israel
- Institute of Evolution, University of Haifa, Haifa, 3498838 Israel
| |
|
19
|
Suria AM, Smith S, Speare L, Chen Y, Chien I, Clark EG, Krueger M, Warwick AM, Wilkins H, Septer AN. Prevalence and diversity of type VI secretion systems in a model beneficial symbiosis. Front Microbiol 2022; 13:988044. [PMID: 36187973 PMCID: PMC9515649 DOI: 10.3389/fmicb.2022.988044] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/06/2022] [Accepted: 08/24/2022] [Indexed: 11/13/2022] Open
Abstract
The type VI secretion system (T6SS) is widely distributed in diverse bacterial species and habitats where it is required for interbacterial competition and interactions with eukaryotic cells. Previous work described the role of a T6SS in the beneficial symbiont, Vibrio fischeri, during colonization of the light organ of Euprymna scolopes squid. However, the prevalence and diversity of T6SSs found within the distinct symbiotic structures of this model host have not yet been determined. Here, we analyzed 73 genomes of isolates from squid light organs and accessory nidamental glands (ANGs) and 178 reference genomes. We found that the majority of these bacterial symbionts encode diverse T6SSs from four distinct classes, and most share homology with T6SSs from more distantly related species, including pathogens of animals and humans. These findings indicate that T6SSs with shared evolutionary histories can be integrated into the cellular systems of host-associated bacteria with different effects on host health. Furthermore, we found that one T6SS in V. fischeri is located within a genomic island with high genomic plasticity. Five distinct genomic island genotypes were identified, suggesting this region encodes diverse functional potential that natural selection can act on. Finally, analysis of newly described T6SSs in roseobacter clade ANG isolates revealed a novel predicted protein that appears to be a fusion of the TssB-TssC sheath components. This work underscores the importance of studying T6SSs in diverse organisms and natural habitats to better understand how T6SSs promote the propagation of bacterial populations and impact host health.
Affiliation(s)
- Andrea M. Suria
- Department of Earth, Marine and Environmental Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Stephanie Smith
- Department of Earth, Marine and Environmental Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Lauren Speare
- Department of Earth, Marine and Environmental Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Department of Microbiology, Oregon State University, Corvallis, OR, United States
| | - Yuzhou Chen
- Department of Earth, Marine and Environmental Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Iris Chien
- Department of Earth, Marine and Environmental Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Emily Grace Clark
- Department of Earth, Marine and Environmental Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Madelyn Krueger
- Department of Earth, Marine and Environmental Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Alexander M. Warwick
- Department of Earth, Marine and Environmental Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Hannah Wilkins
- Department of Earth, Marine and Environmental Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Alecia N. Septer
- Department of Earth, Marine and Environmental Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States (corresponding author)
| |
|
20
|
Baranwal M, Magner A, Saldinger J, Turali-Emre ES, Elvati P, Kozarekar S, VanEpps JS, Kotov NA, Violi A, Hero AO. Struct2Graph: a graph attention network for structure based predictions of protein-protein interactions. BMC Bioinformatics 2022; 23:370. [PMID: 36088285 PMCID: PMC9464414 DOI: 10.1186/s12859-022-04910-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 06/09/2022] [Accepted: 08/26/2022] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND: Development of new methods for analysis of protein-protein interactions (PPIs) at molecular and nanometer scales gives insights into intracellular signaling pathways and will improve understanding of protein functions, as well as other nanoscale structures of biological and abiological origins. Recent advances in computational tools, particularly the ones involving modern deep learning algorithms, have been shown to complement experimental approaches for describing and rationalizing PPIs. However, most of the existing works on PPI predictions use protein-sequence information, and thus have difficulties in accounting for the three-dimensional organization of the protein chains. RESULTS: In this study, we address this problem and describe a PPI analysis based on a graph attention network, named Struct2Graph, for identifying PPIs directly from the structural data of folded protein globules. Our method is capable of predicting the PPI with an accuracy of 98.89% on the balanced set consisting of an equal number of positive and negative pairs. On the unbalanced set with the ratio of 1:10 between positive and negative pairs, Struct2Graph achieves a fivefold cross validation average accuracy of 99.42%. Moreover, Struct2Graph can potentially identify residues that likely contribute to the formation of the protein-protein complex. The identification of important residues is tested for two different interaction types: (a) Proteins with multiple ligands competing for the same binding area, (b) Dynamic protein-protein adhesion interaction. Struct2Graph identifies interacting residues with 30% sensitivity, 89% specificity, and 87% accuracy. CONCLUSIONS: In this manuscript, we address the problem of prediction of PPIs using a first of its kind, 3D-structure-based graph attention network (code available at https://github.com/baranwa2/Struct2Graph). Furthermore, the novel mutual attention mechanism provides insights into likely interaction sites through its unsupervised knowledge selection process. This study demonstrates that a relatively low-dimensional feature embedding learned from graph structures of individual proteins outperforms other modern machine learning classifiers based on global protein features. In addition, through the analysis of single amino acid variations, the attention mechanism shows preference for disease-causing residue variations over benign polymorphisms, demonstrating that it is not limited to interface residues.
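The abstract reports accuracy, sensitivity, and specificity, and mentions a 1:10 positive-to-negative test set. As a reading aid, the sketch below shows how these three metrics follow from confusion-matrix counts and why accuracy alone can look high on an imbalanced set. All counts are hypothetical and chosen only for illustration; they are not Struct2Graph results, and `binary_metrics` is an assumed helper name.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of true positives recovered
    specificity = tn / (tn + fp)   # fraction of true negatives recovered
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical 1:10 set: 10 positives and 100 negatives.
# A classifier that finds 3 of 10 positives and 89 of 100 negatives:
print(binary_metrics(tp=3, fp=11, tn=89, fn=7))

# A trivial classifier that labels everything negative still scores
# about 91% accuracy on the same set, with zero sensitivity:
print(binary_metrics(tp=0, fp=0, tn=100, fn=10))
```

Reporting sensitivity and specificity alongside accuracy, as the abstract does for residue identification, avoids the ambiguity shown by the second call.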
Affiliation(s)
- Mayank Baranwal
- Division of Data and Decision Sciences, Tata Consultancy Services Research, Mumbai, India
- Systems and Control Engineering Group, Indian Institute of Technology, Bombay, India
| | - Abram Magner
- Department of Computer Science, University of Albany, SUNY, Albany, USA
| | - Jacob Saldinger
- Department of Chemical Engineering, University of Michigan, Ann Arbor, USA
| | | | - Paolo Elvati
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, USA
| | - Shivani Kozarekar
- Department of Chemical Engineering, University of Michigan, Ann Arbor, USA
| | - J. Scott VanEpps
- Department of Biomedical Engineering, University of Michigan, Ann Arbor, USA
- Department of Emergency Medicine, University of Michigan, Ann Arbor, USA
- Biointerfaces Institute, University of Michigan, Ann Arbor, USA
| | - Nicholas A. Kotov
- Department of Chemical Engineering, University of Michigan, Ann Arbor, USA
- Department of Biomedical Engineering, University of Michigan, Ann Arbor, USA
- Biointerfaces Institute, University of Michigan, Ann Arbor, USA
- Department of Materials Science and Engineering, University of Michigan, Ann Arbor, USA
| | - Angela Violi
- Department of Chemical Engineering, University of Michigan, Ann Arbor, USA
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, USA
- Biophysics Program, University of Michigan, Ann Arbor, USA
| | - Alfred O. Hero
- Department of Biomedical Engineering, University of Michigan, Ann Arbor, USA
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, USA
- Department of Statistics, University of Michigan, Ann Arbor, USA
- Program in Applied Interdisciplinary Mathematics, University of Michigan, Ann Arbor, USA
- Program in Bioinformatics, University of Michigan, Ann Arbor, USA
| |
|
21
|
Robin V, Bodein A, Scott-Boyer MP, Leclercq M, Périn O, Droit A. Overview of methods for characterization and visualization of a protein-protein interaction network in a multi-omics integration context. Front Mol Biosci 2022; 9:962799. [PMID: 36158572 PMCID: PMC9494275 DOI: 10.3389/fmolb.2022.962799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 08/16/2022] [Indexed: 11/26/2022] Open
Abstract
Protein-protein interactions (PPIs) lie at the heart of the cellular machinery and play a significant role in regulating cellular functions. PPIs can be analyzed with network approaches: taken together, all PPIs form a network, and constructing that network requires predicting the interactions. Biases such as lack of data, recurrence of information, and false interactions make the network unstable. Integrated strategies help address these challenges and have shown encouraging results for understanding molecular mechanisms and drug action mechanisms and for identifying target genes. To weight an interaction, it is evaluated with different confidence scores. These scores allow the network to be filtered, which in turn facilitates its representation; both are essential steps toward identifying and understanding molecular mechanisms. In this review, we discuss the main computational methods for predicting PPIs, including those that confirm an interaction, as well as the integration of PPIs into a network, and we discuss the visualization of these complex data.
Affiliation(s)
- Vivian Robin
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
| | - Antoine Bodein
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
| | - Marie-Pier Scott-Boyer
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
| | - Mickaël Leclercq
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
| | - Olivier Périn
- Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
| | - Arnaud Droit
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
| |
|
22
|
Sen N, Madhusudhan MS. A structural database of chain–chain and domain–domain interfaces of proteins. Protein Sci 2022. [DOI: 10.1002/pro.4406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Affiliation(s)
- Neeladri Sen
- Indian Institute of Science Education and Research, Pune, India
- Institute of Structural and Molecular Biology, University College London, London, UK
| | | |
|
23
|
Escudeiro P, Henry CS, Dias RP. Functional characterization of prokaryotic dark matter: the road so far and what lies ahead. Current Research in Microbial Sciences 2022; 3:100159. [PMID: 36561390 PMCID: PMC9764257 DOI: 10.1016/j.crmicr.2022.100159] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/11/2022] [Revised: 07/18/2022] [Accepted: 08/05/2022] [Indexed: 12/25/2022] Open
Abstract
Eight-hundred thousand to one trillion prokaryotic species may inhabit our planet. Yet, fewer than two-hundred thousand prokaryotic species have been described. This uncharted fraction of microbial diversity, and its undisclosed coding potential, is known as the "microbial dark matter" (MDM). Next-generation sequencing has made it possible to collect a massive amount of genome sequence data, leading to unprecedented advances in the field of genomics. Still, harnessing new functional information from the genomes of uncultured prokaryotes is often limited by standard classification methods. These methods often rely on sequence similarity searches against reference genomes from cultured species. This hinders the discovery of unique genetic elements that are missing from the cultivated realm. It also contributes to the accumulation of prokaryotic gene products of unknown function in public sequence data repositories, highlighting the need for new approaches to sequencing data analysis and classification. Increasing evidence indicates that these proteins of unknown function might be a treasure trove of biotechnological potential. Here, we outline the challenges, opportunities, and potential hidden within the functional dark matter (FDM) of prokaryotes. We also discuss the pitfalls surrounding molecular and computational approaches currently used to probe these uncharted waters, and discuss future opportunities for research and applications.
Affiliation(s)
- Pedro Escudeiro
  - BioISI - Instituto de Biosistemas e Ciências Integrativas, Faculdade de Ciências, Universidade de Lisboa, Lisboa 1749-016, Portugal
- Christopher S. Henry
  - Argonne National Laboratory, Lemont, Illinois, USA
  - University of Chicago, Chicago, Illinois, USA
- Ricardo P.M. Dias
  - BioISI - Instituto de Biosistemas e Ciências Integrativas, Faculdade de Ciências, Universidade de Lisboa, Lisboa 1749-016, Portugal
  - iXLab - Innovation for National Biological Resilience, Faculdade de Ciências, Universidade de Lisboa, Lisboa 1749-016, Portugal
|
24
|
Cui X, Xue Y, McCormack C, Garces A, Rachman TW, Yi Y, Stolzer M, Durand D. Simulating domain architecture evolution. Bioinformatics 2022; 38:i134-i142. [PMID: 35758772 PMCID: PMC9236583 DOI: 10.1093/bioinformatics/btac242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Motivation: Simulation is an essential technique for generating biomolecular data with a ‘known’ history for use in validating phylogenetic inference and other evolutionary methods. On longer time scales, simulation supports investigations of equilibrium behavior and provides a formal framework for testing competing evolutionary hypotheses. Twenty years of molecular evolution research have produced a rich repertoire of simulation methods. However, current models do not capture the stringent constraints acting on the domain insertions, duplications, and deletions by which multidomain architectures evolve. Although these processes have the potential to generate any combination of domains, only a tiny fraction of possible domain combinations are observed in nature. Modeling these stringent constraints on domain order and co-occurrence is a fundamental challenge in domain architecture simulation that does not arise with sequence and gene family simulation.
Results: Here, we introduce a stochastic model of domain architecture evolution to simulate evolutionary trajectories that reflect the constraints on domain order and co-occurrence observed in nature. This framework is implemented in a novel domain architecture simulator, DomArchov, using the Metropolis–Hastings algorithm with data-driven transition probabilities. The use of a data-driven event module enables quick and easy redeployment of the simulator for use in different taxonomic and protein function contexts. Using empirical evaluation with metazoan datasets, we demonstrate that domain architectures simulated by DomArchov recapitulate properties of genuine domain architectures that reflect the constraints on domain order and adjacency seen in nature. This work expands the realm of evolutionary processes that are amenable to simulation.
Availability and implementation: DomArchov is written in Python 3 and is available at http://www.cs.cmu.edu/~durand/DomArchov. The data underlying this article are available via the same link.
Supplementary information: Supplementary data are available at Bioinformatics online.
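As a rough illustration of the Metropolis–Hastings algorithm named in the abstract above, the sketch below samples from a toy discrete target distribution. It is not DomArchov's model: the states here are just integers on a ring, whereas the abstract describes states as domain architectures changed by insertions, duplications, and deletions. The function name, the `weights` target, and the symmetric ±1 proposal are all assumptions made for this example.

```python
import random

def metropolis_hastings(weights, n_steps, seed=0):
    """Sample state indices with probability proportional to `weights`.

    Hypothetical toy setup: states are arranged on a ring and each proposal
    moves one step left or right, so the proposal is symmetric and the
    acceptance ratio reduces to weights[proposal] / weights[state].
    All weights must be strictly positive.
    """
    rng = random.Random(seed)
    n = len(weights)
    state = 0
    samples = []
    for _ in range(n_steps):
        proposal = (state + rng.choice((-1, 1))) % n
        accept_prob = min(1.0, weights[proposal] / weights[state])
        if rng.random() < accept_prob:
            state = proposal          # accept the move
        samples.append(state)         # a rejected move repeats the current state
    return samples

# Toy target in which state 4 is 20 times more probable than each other state.
samples = metropolis_hastings([1, 1, 1, 1, 20], n_steps=20000)
```

With a long enough chain, the fraction of samples in each state approaches that state's share of the total weight. A simulator in the spirit of the abstract would replace the toy weights with data-driven transition probabilities.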
Affiliation(s)
- Xiaoyue Cui
  - Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Yifan Xue
  - Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Collin McCormack
  - Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Alejandro Garces
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Thomas W Rachman
  - Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Yang Yi
  - Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Maureen Stolzer
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Dannie Durand
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
|
25
|
Maphosa MN, Steenkamp ET, Kanzi AM, van Wyk S, De Vos L, Santana QC, Duong TA, Wingfield BD. Intra-Species Genomic Variation in the Pine Pathogen Fusarium circinatum. J Fungi (Basel) 2022; 8:jof8070657. [PMID: 35887414 PMCID: PMC9316270 DOI: 10.3390/jof8070657] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/28/2022] [Revised: 06/02/2022] [Accepted: 06/08/2022] [Indexed: 12/10/2022]
Abstract
Fusarium circinatum is an important global pathogen of pine trees. Genome plasticity has been observed in different isolates of the fungus, but no genome comparisons are available. To address this gap, we sequenced five isolates of F. circinatum and assembled them to chromosome level. These genomes were analysed together with previously published genomes of F. circinatum isolates, FSP34 and KS17. Multi-sample variant calling identified a total of 461,683 micro variants (SNPs and small indels) and a total of 1828 macro structural variants, of which 1717 were copy number variants and 111 were inversions. The variant density was higher on the sub-telomeric regions of chromosomes. Variant annotation revealed that genes involved in transcription, transport, metabolism and transmembrane proteins were overrepresented in gene sets that were affected by high impact variants. A core genome representing genomic elements that were conserved in all the isolates and a non-redundant pangenome representing all genomic elements are presented. Whole genome alignments showed that an average of 93% of the genomic elements were present in all isolates. The results of this study reveal that some genomic elements are not conserved within the isolates and some variants are high impact. The described genome-scale variations will help to inform novel disease management strategies against the pathogen.
|
26
|
Guo K, Buehler MJ. Rapid prediction of protein natural frequencies using graph neural networks. DIGITAL DISCOVERY 2022; 1:277-285. [PMID: 35769204 PMCID: PMC9189858 DOI: 10.1039/d1dd00007a] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/01/2021] [Accepted: 03/28/2022] [Indexed: 11/21/2022]
Abstract
Natural vibrational frequencies of proteins help to correlate functional shifts with sequence or geometric variations that lead to negligible changes in protein structures, such as point mutations related to disease lethality or medication effectiveness. Normal mode analysis is a well-known approach to accurately obtain protein natural frequencies. However, it is not feasible when high-resolution protein structures are unavailable or time-consuming to obtain. Here we provide a machine learning model to directly predict protein frequencies from primary amino acid sequences and low-resolution structural features such as contact or distance maps. We utilize a graph neural network called principal neighborhood aggregation, trained with the structural graphs and normal mode frequencies of more than 34 000 proteins from the Protein Data Bank. Combined with existing contact/distance map prediction tools, this approach enables an end-to-end prediction of the frequency spectrum of a protein given its primary sequence.
Affiliation(s)
- Kai Guo
  - Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology 77 Massachusetts Ave. 1-165 Cambridge Massachusetts 02139 USA +1 617 452 2750
  - Institute of High Performance Computing, ASTAR Singapore 138632 Singapore
- Markus J Buehler
  - Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology 77 Massachusetts Ave. 1-165 Cambridge Massachusetts 02139 USA +1 617 452 2750
  - Center for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology 77 Massachusetts Ave. Cambridge Massachusetts 02139 USA
  - Center for Materials Science and Engineering 77 Massachusetts Ave Cambridge Massachusetts 02139 USA
|
27
|
In silico Methods for Identification of Potential Therapeutic Targets. Interdiscip Sci 2022; 14:285-310. [PMID: 34826045 PMCID: PMC8616973 DOI: 10.1007/s12539-021-00491-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/19/2021] [Revised: 10/19/2021] [Accepted: 11/01/2021] [Indexed: 11/01/2022]
Abstract
At the initial stage of drug discovery, identifying novel targets with maximal efficacy and minimal side effects can improve the success rate and portfolio value of drug discovery projects while simultaneously reducing cycle time and cost. However, harnessing the full potential of big data to narrow the range of plausible targets through existing computational methods remains a key issue in this field. This paper reviews two categories of in silico methods, comparative genomics and network-based methods, for finding potential therapeutic targets among cellular functions based on understanding their related biological processes. In addition to describing the principles, databases, software, and applications, we discuss some recent studies and prospects of the methods. While comparative genomics is mostly applied to infectious diseases, network-based methods can be applied to infectious and non-infectious diseases. Nonetheless, the methods often complement each other in their advantages and disadvantages. The information reported here can guide improvements in the application of big data-driven computational methods for therapeutic target discovery.
|
28
|
Botas J, Rodríguez Del Río Á, Giner-Lamia J, Huerta-Cepas J. GeCoViz: genomic context visualisation of prokaryotic genes from a functional and evolutionary perspective. Nucleic Acids Res 2022; 50:W352-W357. [PMID: 35639770 PMCID: PMC9252766 DOI: 10.1093/nar/gkac367] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/22/2022] [Revised: 04/14/2022] [Accepted: 05/05/2022] [Indexed: 11/14/2022]
Abstract
Synteny conservation analysis is a well-established methodology to investigate the potential functional role of unknown prokaryotic genes. However, bioinformatic tools to reconstruct and visualise genomic contexts usually depend on slow computations, are restricted to narrow taxonomic ranges, and/or do not allow for the functional and interactive exploration of neighbouring genes across different species. Here, we present GeCoViz, an online resource built upon 12 221 reference prokaryotic genomes that provides fast and interactive visualisation of custom genomic regions anchored by any target gene, which can be sought by either name, orthologous group (KEGGs, eggNOGs), protein domain (PFAM) or sequence. To facilitate functional and evolutionary interpretation, GeCoViz allows users to customise the taxonomic scope of each analysis and provides comprehensive annotations of the neighbouring genes. Interactive visualisation options include, among others, scaled representations of gene lengths and genomic distances, and on-the-fly calculation of synteny conservation of neighbouring genes, which can be highlighted based on custom thresholds. The resulting plots can be downloaded as high-quality images for publishing purposes. Overall, GeCoViz offers an easy-to-use, comprehensive, fast and interactive web-based tool for investigating the genomic context of prokaryotic genes, and is freely available at https://gecoviz.cgmlab.org.
Affiliation(s)
- Jorge Botas
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Madrid, 28223, Spain
- Álvaro Rodríguez Del Río
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Madrid, 28223, Spain
- Joaquín Giner-Lamia
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Madrid, 28223, Spain
  - Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid (UPM), Madrid, 28040, Spain
- Jaime Huerta-Cepas
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Madrid, 28223, Spain
|
29
|
Sciolino N, Liu A, Breindel L, Burz DS, Sulchek T, Shekhtman A. Microfluidics delivery of DARPP-32 into HeLa cells maintains viability for in-cell NMR spectroscopy. Commun Biol 2022; 5:451. [PMID: 35551287 PMCID: PMC9098904 DOI: 10.1038/s42003-022-03412-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/04/2020] [Accepted: 04/26/2022] [Indexed: 11/09/2022]
Abstract
High-resolution structural studies of proteins and protein complexes in a native eukaryotic environment present a challenge to structural biology. In-cell NMR can characterize atomic resolution structures but requires high concentrations of labeled proteins in intact cells. Most exogenous delivery techniques are limited to specific cell types or are too destructive to preserve cellular physiology. The feasibility of microfluidics transfection or volume exchange for convective transfer, VECT, as a means to deliver labeled target proteins to HeLa cells for in-cell NMR experiments is demonstrated. VECT delivery does not require optimization or impede cell viability; cells are immediately available for long-term eukaryotic in-cell NMR experiments. In-cell NMR-based drug screening using VECT was demonstrated by collecting spectra of the sensor molecule DARPP-32 in response to exogenous administration of forskolin.
Affiliation(s)
- Nicholas Sciolino
  - University at Albany, Department of Chemistry, Albany, NY, 12222, USA
- Anna Liu
  - Georgia Tech, School of Mechanical Engineering, Atlanta, GA, 30332, USA
- Leonard Breindel
  - University at Albany, Department of Chemistry, Albany, NY, 12222, USA
- David S Burz
  - University at Albany, Department of Chemistry, Albany, NY, 12222, USA
- Todd Sulchek
  - Georgia Tech, School of Mechanical Engineering, Atlanta, GA, 30332, USA
|
30
|
Abstract
The hypervariable residues that compose the major part of proteins’ surfaces are generally considered outside evolutionary control. Yet, these “nonconserved” residues determine the outcome of stochastic encounters in crowded cells. It has recently become apparent that these encounters are not as random as one might imagine, but carefully orchestrated by the intracellular electrostatics to optimize protein diffusion, interactivity, and partner search. The most influential factor here is the protein surface-charge density, which takes different optimal values across organisms with different intracellular conditions. In this study, we examine how far the net-charge density and other physicochemical properties of proteomes will take us in terms of distinguishing organisms in general. The results show that these global proteome properties not only follow the established taxonomical hierarchy, but also provide clues to functional adaptation. In many cases, the proteome–property divergence is even resolved at species level. Accordingly, the variable parts of the genes are not as free to drift as they seem in sequence alignment, but present a complementary tool for functional, taxonomic, and evolutionary assignment.
|
31
|
Kaundal R, Loaiza CD, Duhan N, Flann N. deepHPI: a comprehensive deep learning platform for accurate prediction and visualization of host-pathogen protein-protein interactions. Brief Bioinform 2022; 23:6576450. [PMID: 35511057 DOI: 10.1093/bib/bbac125] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/12/2021] [Revised: 02/07/2022] [Accepted: 03/15/2022] [Indexed: 01/06/2023]
Abstract
Host-pathogen protein interactions (HPIs) play vital roles in many biological processes and are directly involved in infectious diseases. With pandemics becoming more frequent in the last couple of decades, such as the recent outbreak of COVID-19 that caused millions of deaths, it has become more critical to develop advanced methods to accurately predict pathogen interactions with their respective hosts. During the last decade, experimental methods to identify HPIs have been used to decipher host-pathogen systems, with the caveat that those techniques are labor-intensive, expensive and time-consuming. Alternatively, accurate prediction of HPIs can be performed by the use of data-driven machine learning. To provide a more robust and accurate solution for the HPI prediction problem, we have developed a deepHPI tool based on deep learning. The web server delivers four host-pathogen model types: plant-pathogen, human-bacteria, human-virus and animal-pathogen, extending its applicability to a wide range of analyses and use cases. The deepHPI web tool is the first to use convolutional neural network models for HPI prediction. These models have been selected based on a comprehensive evaluation of protein features and neural network architectures. The best prediction models have been tested on independent validation datasets, on which they achieved an overall Matthews correlation coefficient of 0.87 for animal-pathogen using the combined pseudo-amino acid composition and conjoint triad (PAAC_CT) features, 0.75 for human-bacteria using the combined pseudo-amino acid composition, conjoint triad and normalized Moreau-Broto feature (PAAC_CT_NMBroto), 0.96 for human-virus using PAAC_CT_NMBroto and 0.94 for plant-pathogen interactions using the combined pseudo-amino acid composition, composition and transition feature (PAAC_CTDC_CTDT).
Our server running deepHPI is deployed on a high-performance computing cluster that handles large and multiple user requests, and it provides more information about the interactions discovered. It presents an enriched visualization of the resulting host-pathogen networks that is augmented with external links to various protein annotation resources. We believe that the deepHPI web server will be very useful to researchers, particularly those working on infectious diseases. Additionally, many novel and known host-pathogen systems can be further investigated to significantly advance our understanding of complex disease-causing agents. The developed models are established on a web server, which is freely accessible at http://bioinfo.usu.edu/deepHPI/.
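The Matthews correlation coefficient reported in the abstract above can be computed from the four cells of a binary confusion matrix. The sketch below shows the standard formula only; the counts are hypothetical and are not taken from the deepHPI evaluation, and returning 0.0 for a zero denominator is a common convention assumed here.

```python
import math

def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

# Hypothetical counts for an interacting / non-interacting protein-pair classifier.
mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
```

Unlike plain accuracy, the coefficient uses all four cells, which is why it is often preferred when interacting and non-interacting pairs are imbalanced.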
Affiliation(s)
- Rakesh Kaundal
  - Bioinformatics Facility, Center for Integrated BioSystems, College of Agriculture and Applied Sciences
  - Department of Plants, Soils, and Climate, College of Agriculture and Applied Sciences
  - Department of Computer Science, College of Science; Utah State University, Logan, 84322 USA
- Cristian D Loaiza
  - Bioinformatics Facility, Center for Integrated BioSystems, College of Agriculture and Applied Sciences
  - Department of Plants, Soils, and Climate, College of Agriculture and Applied Sciences
- Naveen Duhan
  - Bioinformatics Facility, Center for Integrated BioSystems, College of Agriculture and Applied Sciences
  - Department of Plants, Soils, and Climate, College of Agriculture and Applied Sciences
- Nicholas Flann
  - Department of Computer Science, College of Science; Utah State University, Logan, 84322 USA
|
32
|
Gao M, Nakajima An D, Parks JM, Skolnick J. AF2Complex predicts direct physical interactions in multimeric proteins with deep learning. Nat Commun 2022; 13:1744. [PMID: 35365655 PMCID: PMC8975832 DOI: 10.1038/s41467-022-29394-2] [Citation(s) in RCA: 128] [Impact Index Per Article: 42.7] [Received: 11/08/2021] [Accepted: 03/15/2022] [Indexed: 12/20/2022]
Abstract
Accurate descriptions of protein-protein interactions are essential for understanding biological systems. Remarkably accurate atomic structures have been recently computed for individual proteins by AlphaFold2 (AF2). Here, we demonstrate that the same neural network models from AF2 developed for single protein sequences can be adapted to predict the structures of multimeric protein complexes without retraining. In contrast to common approaches, our method, AF2Complex, does not require paired multiple sequence alignments. It achieves higher accuracy than some complex protein-protein docking strategies and provides a significant improvement over AF-Multimer, a development of AlphaFold for multimeric proteins. Moreover, we introduce metrics for predicting direct protein-protein interactions between arbitrary protein pairs and validate AF2Complex on some challenging benchmark sets and the E. coli proteome. Lastly, using the cytochrome c biogenesis system I as an example, we present high-confidence models of three sought-after assemblies formed by eight members of this system.
Affiliation(s)
- Mu Gao
  - Center for the Study of Systems Biology, School of Biological Sciences, Atlanta, GA, USA
- Davi Nakajima An
  - School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
- Jerry M Parks
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Jeffrey Skolnick
  - Center for the Study of Systems Biology, School of Biological Sciences, Atlanta, GA, USA
|
33
|
Abstract
Since the large-scale experimental characterization of protein–protein interactions (PPIs) is not possible for all species, several computational PPI prediction methods have been developed that harness existing data from other species. While PPI network prediction has been extensively used in eukaryotes, microbial network inference has lagged behind. However, bacterial interactomes can be built using the same principles and techniques; in fact, several methods are better suited to bacterial genomes. These predicted networks allow systems-level analyses in species that lack experimental interaction data. This review describes the current network inference and analysis techniques and summarizes the use of computationally-predicted microbial interactomes to date.
|
34
|
Evolutionary genomic relationships and coupling in MK-STYX and STYX pseudophosphatases. Sci Rep 2022; 12:4139. [PMID: 35264672 PMCID: PMC8907265 DOI: 10.1038/s41598-022-07943-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/25/2021] [Accepted: 02/28/2022] [Indexed: 11/08/2022]
Abstract
The dual specificity phosphatase (DUSP) family has catalytically inactive members, called pseudophosphatases. They have mutations in their catalytic motifs that render them enzymatically inactive. This study analyzes the significance of two pseudophosphatases, MK-STYX [MAPK (mitogen-activated protein kinase) phosphoserine/threonine/tyrosine-binding protein] and STYX (serine/threonine/tyrosine-interacting protein), throughout their evolution and provides measurements and comparison of their evolutionary conservation. Phylogenetic trees were constructed to show any deviation from the evolutionary paths of the various species. Data were collected on a large set of proteins that have either one of the two domains of MK-STYX, the DUSP domain or the cdc-25 homology (CH2)/rhodanese-like domain. The distance between species pairs for MK-STYX or STYX and the Ka/Ks ratio were calculated. In addition, both pseudophosphatases were ranked among a large set of related proteins, including the active homologs of MK-STYX, MKP (MAPK phosphatase)-1 and MKP-3. MK-STYX had one of the highest species-species protein distances and was under weaker purifying selection pressure than most proteins with its domains. In contrast, the protein distances of STYX were lower than those of 82% of the DUSP-containing proteins, and STYX was under one of the strongest purifying selection pressures. However, there was similar selection pressure on the N-terminal sequences of MK-STYX, STYX, MKP-1, and MKP-3. We next performed statistical coupling analysis, a process that reveals interconnected regions within the proteins. We found that while MKP-1, MKP-3, and STYX all have two functional units (sectors), MK-STYX has only one, and that MK-STYX is similar to MKP-3 in the evolutionary coupling of the active site and KIM domain. Within those two domains, the mean coupling is also most similar for MK-STYX and MKP-3.
This study reveals striking distinctions between the evolutionary patterns of MK-STYX and STYX, suggesting a very specific role for each pseudophosphatase and further highlighting the relevance of these atypical members of the DUSP family as signaling regulators. Therefore, our study provides computational evidence and evolutionary reasons to further explore the properties of pseudophosphatases, in particular MK-STYX and STYX.
|
35
|
Murcia-Garzón J, Méndez-Tenorio A. Promiscuous Domains in Eukaryotes and HAT Proteins in FUNGI Have Followed Different Evolutionary Paths. J Mol Evol 2022; 90:124-138. [PMID: 35084521 DOI: 10.1007/s00239-021-10046-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2020] [Accepted: 12/27/2021] [Indexed: 10/19/2022]
Abstract
Diverse studies have shown that the content of genes present in sequenced genomes does not seem to correlate with the complexity of the organisms. However, various studies have shown that organism complexity and the size of the proteome are, indeed, significantly correlated. This characteristic allows us to postulate that some molecular mechanisms have given some proteins greater functional diversity, increasing their participation in the development of organisms with higher complexity. Among those mechanisms, domain promiscuity, defined as the ability of domains to organize in combination with other distinct domains, is of great importance for the evolution of organisms. Previous works have analyzed the degree of domain promiscuity of proteomes, showing how it seems to have paralleled the evolution of eukaryotic organisms. This has motivated the present study, in which we analyzed domain promiscuity in a collection of 84 eukaryotic proteomes representative of all the taxonomic groups of the tree of life. Using a grammar definition approach, we determined the architecture of 1,223,227 proteins, composed of 2,296,371 domains, which established 839,184 bigram types. The phylogenetic reconstructions based on differences in the content of information from measures of proteome promiscuity confirm that the evolution of the promiscuity of domains in eukaryotic organisms resembles the evolutionary history of the species. However, a close analysis of the PHD and RING domains, the most promiscuous domains found in fungi and functional components of chromatin remodeling enzymes and important expression regulators, suggests an evolution according to their function.
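To make the notion of domain bigrams concrete, the sketch below extracts adjacent domain pairs from toy architectures and counts, for each domain, how many distinct other domains it neighbours. This neighbour count is only a simple proxy assumed for illustration; it is not the information-content measure used in the study, and the toy architectures and function names are hypothetical.

```python
from collections import Counter, defaultdict

def domain_bigrams(architecture):
    """Adjacent domain pairs (bigrams), read from N- to C-terminus."""
    return list(zip(architecture, architecture[1:]))

def promiscuity(architectures):
    """Number of distinct other domains found next to each domain."""
    partners = defaultdict(set)
    for arch in architectures:
        for left, right in domain_bigrams(arch):
            if left != right:  # a tandem repeat adds no new partner
                partners[left].add(right)
                partners[right].add(left)
    return {domain: len(p) for domain, p in partners.items()}

# Hypothetical toy proteome; each list is one protein's domain architecture.
toy = [["PHD", "Bromo"], ["PHD", "RING"], ["SET", "PHD", "PHD"], ["RING", "zf-C2H2"]]
bigram_counts = Counter(b for arch in toy for b in domain_bigrams(arch))
scores = promiscuity(toy)
```

In this toy set, `PHD` neighbours three different domains and `RING` two, so `PHD` would rank as the more promiscuous domain under this proxy.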
Affiliation(s)
- Jazmín Murcia-Garzón
  - Laboratorio de Biotecnología Vegetal, Centro de Biotecnología Genómica, Instituto Politécnico Nacional, Boulevard del Maestro S/N esq. Elías Piña, Col. Narciso Mendoza, 88710, Reynosa, Tamaulipas, Mexico
- Alfonso Méndez-Tenorio
  - Laboratorio de Biotecnología y Bioinformática Genómica, Departamento de Bioquímica, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Prol. de Carpio y Plan de Ayala s/n, Col. Santo Tomás, 11340, Mexico City, Mexico
|
36
|
Mansoor M, Nauman M, Ur Rehman H, Benso A. Gene Ontology GAN (GOGAN): a novel architecture for protein function prediction. Soft comput 2022. [DOI: 10.1007/s00500-021-06707-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022]
|
37
|
Song B, Luo X, Luo X, Liu Y, Niu Z, Zeng X. Learning spatial structures of proteins improves protein-protein interaction prediction. Brief Bioinform 2022; 23:6501351. [PMID: 35018418 DOI: 10.1093/bib/bbab558] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 08/27/2021] [Revised: 12/07/2021] [Accepted: 12/07/2021] [Indexed: 01/09/2023]
Abstract
Spatial structures of proteins are closely related to protein functions. Integrating protein structures improves the performance of protein-protein interaction (PPI) prediction. However, the limited quantity of known protein structures restricts the application of structure-based prediction methods. Utilizing the predicted protein structure information is a promising method to improve the performance of sequence-based prediction methods. We propose a novel end-to-end framework, TAGPPI, to predict PPIs using protein sequence alone. TAGPPI extracts multi-dimensional features by employing 1D convolution operation on protein sequences and graph learning method on contact maps constructed from AlphaFold. A contact map contains abundant spatial structure information, which is difficult to obtain from 1D sequence data directly. We further demonstrate that the spatial information learned from contact maps improves the ability of TAGPPI in PPI prediction tasks. We compare the performance of TAGPPI with those of nine state-of-the-art sequence-based methods, and TAGPPI outperforms such methods in all metrics. To the best of our knowledge, this is the first method to use the predicted protein topology structure graph for sequence-based PPI prediction. More importantly, our proposed architecture could be extended to other prediction tasks related to proteins.
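A contact map of the kind mentioned in the abstract above can be derived from residue coordinates by thresholding pairwise distances. The sketch below is a generic illustration, not TAGPPI's pipeline: the 8 Å cutoff, the use of one representative atom per residue, and the toy coordinates are all assumptions made for this example.

```python
import math

def contact_map(coords, cutoff=8.0):
    """Binary residue-residue contact map from 3D coordinates.

    `coords` holds one (x, y, z) point per residue, for example a C-alpha
    position. Two residues are in contact when their distance is at most
    `cutoff`; the diagonal is left at 0 so self-contacts are excluded.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1  # the map is symmetric
    return cmap

# Hypothetical coordinates: three residues close together, one far away.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(toy_coords)
# Edge list for graph learning: each contact becomes an undirected edge.
edges = [(i, j) for i in range(len(cmap)) for j in range(i + 1, len(cmap)) if cmap[i][j]]
```

Treating the map as an adjacency matrix is what lets a graph learning method operate on the predicted structure alongside the 1D sequence features.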
Collapse
Affiliation(s)
- Bosheng Song
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410012, Hunan, China
| | - Xiaoyan Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410012, Hunan, China.,MindRank AI ltd., Hangzhou, 311113, Zhejiang, China
| | - Xiaoli Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410012, Hunan, China.,BioMap, Haidian, 100089, Beijing, China
| | - Yuansheng Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410012, Hunan, China
| | | | - Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410012, Hunan, China
| |
|
38
|
OUP accepted manuscript. Brief Funct Genomics 2022; 21:243-269. [DOI: 10.1093/bfgp/elac007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/02/2021] [Revised: 03/17/2022] [Accepted: 03/18/2022] [Indexed: 11/14/2022] Open
|
39
|
Watson AK, Lopez P, Bapteste E. Hundreds of out-of-frame remodelled gene families in the E. coli pangenome. Mol Biol Evol 2021; 39:6430988. [PMID: 34792602 PMCID: PMC8788219 DOI: 10.1093/molbev/msab329] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023] Open
Abstract
All genomes include gene families with very limited taxonomic distributions that potentially represent new genes and innovations in protein-coding sequence, raising questions on the origins of such genes. Some of these genes are hypothesized to have formed de novo, from noncoding sequences, and recent work has begun to elucidate the processes by which de novo gene formation can occur. A special case of de novo gene formation, overprinting, describes the origin of new genes from noncoding alternative reading frames of existing open reading frames (ORFs). We argue that additionally, out-of-frame gene fission/fusion events of alternative reading frames of ORFs and out-of-frame lateral gene transfers could contribute to the origin of new gene families. To demonstrate this, we developed an original pattern-search in sequence similarity networks, enhancing the use of these graphs, commonly used to detect in-frame remodeled genes. We applied this approach to gene families in 524 complete genomes of Escherichia coli. We identified 767 gene families whose evolutionary history likely included at least one out-of-frame remodeling event. These genes with out-of-frame components represent ∼2.5% of all genes in the E. coli pangenome, suggesting that alternative reading frames of existing ORFs can contribute to a significant proportion of de novo genes in bacteria.
Affiliation(s)
- Andrew K Watson
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France
| | - Philippe Lopez
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France
| | - Eric Bapteste
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France
| |
|
40
|
Filho JAF, Rosolen RR, Almeida DA, de Azevedo PHC, Motta MLL, Aono AH, dos Santos CA, Horta MAC, de Souza AP. Trends in biological data integration for the selection of enzymes and transcription factors related to cellulose and hemicellulose degradation in fungi. 3 Biotech 2021; 11:475. [PMID: 34777932 PMCID: PMC8548487 DOI: 10.1007/s13205-021-03032-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/11/2021] [Accepted: 10/15/2021] [Indexed: 12/13/2022] Open
Abstract
Fungi are key players in biotechnological applications. Although several studies focusing on fungal diversity and genetics have been performed, many details of fungal biology remain unknown, including how cellulolytic enzymes are modulated within these organisms to allow changes in main plant cell wall compounds, cellulose and hemicellulose, and subsequent biomass conversion. With the advent and consolidation of DNA/RNA sequencing technology, different types of information can be generated at the genomic, structural and functional levels, including the gene expression profiles and regulatory mechanisms of these organisms, during degradation-induced conditions. This increase in data generation made rapid computational development necessary to deal with the large amounts of data generated. In this context, the origination of bioinformatics, a hybrid science integrating biological data with various techniques for information storage, distribution and analysis, was a fundamental step toward the current state-of-the-art in the postgenomic era. The possibility of integrating biological big data has facilitated exciting discoveries, including identifying novel mechanisms and more efficient enzymes, increasing yields, reducing costs and expanding opportunities in the bioprocess field. In this review, we summarize the current status and trends of the integration of different types of biological data through bioinformatics approaches for biological data analysis and enzyme selection.
Affiliation(s)
- Jaire A. Ferreira Filho
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
| | - Rafaela R. Rosolen
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
| | - Deborah A. Almeida
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
| | - Paulo Henrique C. de Azevedo
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
| | - Maria Lorenza L. Motta
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
| | - Alexandre H. Aono
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
| | - Clelton A. dos Santos
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
- Brazilian Biorenewables National Laboratory (LNBR), Brazilian Center for Research in Energy and Materials (CNPEM), Campinas, SP Brazil
| | - Maria Augusta C. Horta
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
- Faculty of Pharmaceutical Sciences of Ribeirão Preto, University of São Paulo, Ribeirão Preto, SP Brazil
| | - Anete P. de Souza
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
- Department of Plant Biology, Institute of Biology, UNICAMP, Universidade Estadual de Campinas, Campinas, SP 13083-875 Brazil
| |
|
41
|
Huang LC, Taujale R, Gravel N, Venkat A, Yeung W, Byrne DP, Eyers PA, Kannan N. KinOrtho: a method for mapping human kinase orthologs across the tree of life and illuminating understudied kinases. BMC Bioinformatics 2021; 22:446. [PMID: 34537014 PMCID: PMC8449880 DOI: 10.1186/s12859-021-04358-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/10/2021] [Accepted: 09/06/2021] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND: Protein kinases are among the largest druggable family of signaling proteins, involved in various human diseases, including cancers and neurodegenerative disorders. Despite their clinical relevance, nearly 30% of the 545 human protein kinases remain highly understudied. Comparative genomics is a powerful approach for predicting and investigating the functions of understudied kinases. However, an incomplete knowledge of kinase orthologs across fully sequenced kinomes severely limits the application of comparative genomics approaches for illuminating understudied kinases. Here, we introduce KinOrtho, a query- and graph-based orthology inference method that combines full-length and domain-based approaches to map one-to-one kinase orthologs across 17 thousand species.
RESULTS: Using multiple metrics, we show that KinOrtho performed better than existing methods in identifying kinase orthologs across evolutionarily divergent species and eliminated potential false positives by flagging sequences without a proper kinase domain for further evaluation. We demonstrate the advantage of using domain-based approaches for identifying domain fusion events, highlighting a case between an understudied serine/threonine kinase TAOK1 and a metabolic kinase PIK3C2A with high co-expression in human cells. We also identify evolutionary fission events involving the understudied OBSCN kinase domains, further highlighting the value of domain-based orthology inference approaches. Using KinOrtho-defined orthologs, Gene Ontology annotations, and machine learning, we propose putative biological functions of several understudied kinases, including the role of TP53RK in cell cycle checkpoint(s), the involvement of TSSK3 and TSSK6 in acrosomal vesicle localization, and potential functions for the ULK4 pseudokinase in neuronal development.
CONCLUSIONS: In sum, KinOrtho presents a novel query-based tool to identify one-to-one orthologous relationships across thousands of proteomes that can be applied to any protein family of interest. We exploit KinOrtho here to identify kinase orthologs and show that its well-curated kinome ortholog set can serve as a valuable resource for illuminating understudied kinases, and the KinOrtho framework can be extended to any protein-family of interest.
Affiliation(s)
- Liang-Chin Huang
- Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
| | - Rahil Taujale
- Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
| | - Nathan Gravel
- PREP@UGA, University of Georgia, 500 D.W. Brooks Drive, Athens, GA 30602 USA
| | - Aarya Venkat
- Department of Biochemistry and Molecular Biology, University of Georgia, 120 Green St., Athens, GA 30602 USA
| | - Wayland Yeung
- Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
| | - Dominic P. Byrne
- Department of Biochemistry and Systems Biology, University of Liverpool, Crown St, Liverpool, UK
| | - Patrick A. Eyers
- Department of Biochemistry and Systems Biology, University of Liverpool, Crown St, Liverpool, UK
| | - Natarajan Kannan
- Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
- Department of Biochemistry and Molecular Biology, University of Georgia, 120 Green St., Athens, GA 30602 USA
| |
|
42
|
Ren J, Chen S, Ye F, Gong X, Lu Y, Cai Q, Chen Y. Exploration of differentially-expressed exosomal mRNAs, lncRNAs and circRNAs from serum samples of gallbladder cancer and xantho-granulomatous cholecystitis patients. Bioengineered 2021; 12:6134-6143. [PMID: 34486489 PMCID: PMC8806659 DOI: 10.1080/21655979.2021.1972780] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 02/05/2023] Open
Abstract
Gallbladder cancer (GBC) is the most common biliary tract malignancy worldwide. Although a growing number of studies have explored the mechanism of GBC, thus far, few molecules have been discovered that can be utilized as specific biomarkers for the early diagnosis and therapeutic treatment of GBC. Recent studies have shown that exosomes not only participate in the progression of tumors, but also carry specific information that can define multiple cancer types. The present study investigated the expression profiles of coding (or messenger) ribonucleic acids (mRNAs) and non-coding RNAs (ncRNAs, including long non-coding RNAs [lncRNAs] and circular RNAs [circRNAs]) in plasma-derived exosomes from GBC patients. Using high-throughput RNA sequencing and subsequent bioinformatic analysis, a number of differentially expressed (DE) mRNAs, lncRNAs, and circRNAs were identified in GBC exosomes, compared to their expression in xantho-granulomatous cholecystitis (XGC) exosomes. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were then conducted to investigate the potential functions of these DE RNAs. Furthermore, the interaction networks and competing endogenous RNA networks of these DE RNAs and their target genes were investigated, revealing a complex regulatory network among mRNAs and ncRNAs. In summary, this study demonstrates the diagnostic value of plasma-derived exosomes in GBC and provides a new perspective on the mechanism of GBC.
Affiliation(s)
- Jiajun Ren
- Department of General Surgery, Shanghai Institute of Digestive Surgery, Ruijin Hospital, Shanghai Jiao Tong University, School of Medicine, Shanghai, China
| | - Sheng Chen
- Department of General Surgery, Shanghai Institute of Digestive Surgery, Ruijin Hospital, Shanghai Jiao Tong University, School of Medicine, Shanghai, China
| | - Feng Ye
- Department of General Surgery, Shanghai Institute of Digestive Surgery, Ruijin Hospital, Shanghai Jiao Tong University, School of Medicine, Shanghai, China
| | - Xiaoyong Gong
- Department of General Surgery, Shanghai Institute of Digestive Surgery, Ruijin Hospital, Shanghai Jiao Tong University, School of Medicine, Shanghai, China
| | - Ye Lu
- Department of General Surgery, Shanghai Institute of Digestive Surgery, Ruijin Hospital, Shanghai Jiao Tong University, School of Medicine, Shanghai, China
| | - Qiang Cai
- Department of General Surgery, Shanghai Institute of Digestive Surgery, Ruijin Hospital, Shanghai Jiao Tong University, School of Medicine, Shanghai, China
| | - Yongjun Chen
- Department of General Surgery, Shanghai Institute of Digestive Surgery, Ruijin Hospital, Shanghai Jiao Tong University, School of Medicine, Shanghai, China
| |
|
43
|
Vicedomini R, Blachon C, Oteri F, Carbone A. MyCLADE: a multi-source domain annotation server for sequence functional exploration. Nucleic Acids Res 2021; 49:W452-W458. [PMID: 34023906 PMCID: PMC8262732 DOI: 10.1093/nar/gkab395] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/08/2021] [Revised: 04/27/2021] [Accepted: 04/29/2021] [Indexed: 11/13/2022] Open
Abstract
The ever-increasing number of genomic and metagenomic sequences accumulating in our databases requires accurate approaches to explore their content against specific domain targets. MyCLADE is a user-friendly webserver designed for targeted functional profiling of genomic and metagenomic sequences based on a database of a few million probabilistic models of Pfam domains. It uses the MetaCLADE multi-source domain annotation strategy, modelling domains based on multiple probabilistic profiles. MyCLADE takes a list of protein sequences and possibly a target set of domains/clans as input and, for each sequence, it provides a domain architecture built from the targeted domains or from all Pfam domains. It is linked to the Pfam and QuickGO databases in multiple ways for easy retrieval of domain and clan information. E-value, bit-score, domain-dependent probability scores and logos representing the match of the model with the sequence are provided to help the user to assess the quality of each annotation. Availability and implementation: MyCLADE is freely available at http://www.lcqb.upmc.fr/myclade.
Affiliation(s)
- Riccardo Vicedomini
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), UMR 7238, Paris 75005, France
- Sorbonne Université, CNRS, Institut des Sciences du Calcul et des Données (ISCD), France
| | - Clémence Blachon
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), UMR 7238, Paris 75005, France
| | - Francesco Oteri
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), UMR 7238, Paris 75005, France
| | - Alessandra Carbone
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), UMR 7238, Paris 75005, France
| |
|
44
|
Abstract
TYW1 is a radical S-adenosyl-l-methionine (SAM) enzyme that catalyzes the condensation of pyruvate and N-methylguanosine-containing tRNAPhe, forming 4-demethylwyosine-containing tRNAPhe. Homologues of TYW1 are found in both archaea and eukarya; archaeal homologues consist of a single domain, while eukaryal homologues contain a flavin binding domain in addition to the radical SAM domain shared with archaeal homologues. In this study, TYW1 from Saccharomyces cerevisiae (ScTYW1) was heterologously expressed in Escherichia coli and purified to homogeneity. ScTYW1 is purified with 0.54 ± 0.07 and 4.2 ± 1.9 equiv of flavin mononucleotide (FMN) and iron, respectively, per mole of protein, suggesting the protein is ∼50% replete with Fe–S clusters and FMN. While both NADPH and NADH are sufficient for activity, significantly more product is observed when used in combination with flavin nucleotides. ScTYW1 is the first example of a radical SAM flavoenzyme that is active with NAD(P)H alone.
Affiliation(s)
- Anthony P Young
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
| | - Vahe Bandarian
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
| |
|
45
|
Makita Y, Suzuki S, Fushimi K, Shimada S, Suehisa A, Hirata M, Kuriyama T, Kurihara Y, Hamasaki H, Okubo-Kurihara E, Yoshitake K, Watanabe T, Sakuta M, Gojobori T, Sakami T, Narikawa R, Yamaguchi H, Kawachi M, Matsui M. Identification of a dual orange/far-red and blue light photoreceptor from an oceanic green picoplankton. Nat Commun 2021; 12:3593. [PMID: 34135337 PMCID: PMC8209157 DOI: 10.1038/s41467-021-23741-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/05/2020] [Accepted: 05/11/2021] [Indexed: 11/09/2022] Open
Abstract
Photoreceptors are conserved from green algae to land plants and regulate various developmental stages. In the ocean, blue light penetrates deeper than red light, and blue-light sensing is key to adapting to marine environments. Here, a search for blue-light photoreceptors in the marine metagenome uncovers a chimeric gene composed of a phytochrome and a cryptochrome (Dualchrome1, DUC1) in a prasinophyte, Pycnococcus provasolii. DUC1 detects light within the orange/far-red and blue spectra, and acts as a dual photoreceptor. Analyses of its genome reveal the possible mechanisms of light adaptation. Genes for the light-harvesting complex (LHC) are duplicated and transcriptionally regulated under monochromatic orange/blue light, suggesting P. provasolii has acquired environmental adaptability to a wide range of light spectra and intensities.
Affiliation(s)
- Yuko Makita
- Synthetic Genomics Research Group, RIKEN Center for Sustainable Resource Science, Yokohama, Japan
| | - Shigekatsu Suzuki
- Biodiversity Division, National Institute for Environmental Studies, Tsukuba, Japan
| | - Keiji Fushimi
- Graduate School of Integrated Science and Technology, Shizuoka University, Shizuoka, Japan
- Research Institute of Green Science and Technology, Shizuoka University, Shizuoka, Japan
- Core Research for Evolutional Science and Technology, Japan Science and Technology Agency, Saitama, Japan
| | - Setsuko Shimada
- Synthetic Genomics Research Group, RIKEN Center for Sustainable Resource Science, Yokohama, Japan
| | - Aya Suehisa
- Synthetic Genomics Research Group, RIKEN Center for Sustainable Resource Science, Yokohama, Japan
| | - Manami Hirata
- Synthetic Genomics Research Group, RIKEN Center for Sustainable Resource Science, Yokohama, Japan
| | - Tomoko Kuriyama
- Synthetic Genomics Research Group, RIKEN Center for Sustainable Resource Science, Yokohama, Japan
| | - Yukio Kurihara
- Synthetic Genomics Research Group, RIKEN Center for Sustainable Resource Science, Yokohama, Japan
| | - Hidefumi Hamasaki
- Synthetic Genomics Research Group, RIKEN Center for Sustainable Resource Science, Yokohama, Japan
- Yokohama City University, Kihara Institute for Biological Research, Yokohama, Japan
| | - Emiko Okubo-Kurihara
- Synthetic Genomics Research Group, RIKEN Center for Sustainable Resource Science, Yokohama, Japan
| | - Kazutoshi Yoshitake
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
| | - Tsuyoshi Watanabe
- Fisheries Resources Institute, Japan Fisheries Research and Education Agency, Kushiro, Hokkaido, Japan
| | - Masaaki Sakuta
- Department of Biological Sciences, Ochanomizu University, Tokyo, Japan
| | - Takashi Gojobori
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
| | - Tomoko Sakami
- Fisheries Resources Institute, Japan Fisheries Research and Education Agency, Minami-ise, Mie, Japan
| | - Rei Narikawa
- Graduate School of Integrated Science and Technology, Shizuoka University, Shizuoka, Japan
- Research Institute of Green Science and Technology, Shizuoka University, Shizuoka, Japan
- Core Research for Evolutional Science and Technology, Japan Science and Technology Agency, Saitama, Japan
- Department of Biological Sciences, Graduate School of Science, Tokyo Metropolitan University, Tokyo, Japan
| | - Haruyo Yamaguchi
- Biodiversity Division, National Institute for Environmental Studies, Tsukuba, Japan
| | - Masanobu Kawachi
- Biodiversity Division, National Institute for Environmental Studies, Tsukuba, Japan
| | - Minami Matsui
- Synthetic Genomics Research Group, RIKEN Center for Sustainable Resource Science, Yokohama, Japan.
- Yokohama City University, Kihara Institute for Biological Research, Yokohama, Japan.
| |
|
46
|
Wang X, Zhang Y, Yu B, Salhi A, Chen R, Wang L, Liu Z. Prediction of protein-protein interaction sites through eXtreme gradient boosting with kernel principal component analysis. Comput Biol Med 2021; 134:104516. [PMID: 34119922 DOI: 10.1016/j.compbiomed.2021.104516] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 12/29/2020] [Revised: 05/24/2021] [Accepted: 05/24/2021] [Indexed: 12/22/2022]
Abstract
Predicting protein-protein interaction sites (PPI sites) can provide important clues for understanding biological activity. Using machine learning to predict PPI sites can mitigate the cost of running expensive and time-consuming biological experiments. Here we propose PPISP-XGBoost, a novel PPI sites prediction method based on eXtreme gradient boosting (XGBoost). First, the characteristic information of protein is extracted through the pseudo-position specific scoring matrix (PsePSSM), pseudo-amino acid composition (PseAAC), hydropathy index and solvent accessible surface area (ASA) under the sliding window. Next, these raw features are preprocessed to obtain more optimal representations in order to achieve better prediction. In particular, the synthetic minority oversampling technique (SMOTE) is used to circumvent class imbalance, and the kernel principal component analysis (KPCA) is applied to remove redundant characteristics. Finally, these optimal features are fed to the XGBoost classifier to identify PPI sites. Using PPISP-XGBoost, the prediction accuracy on the training dataset Dset186 reaches 85.4%, and the accuracy on the independent validation datasets Dtestset72, PDBtestset164, Dset_448 and Dset_355 reaches 85.3%, 83.9%, 85.8% and 85.4%, respectively, which all show an increase in accuracy against existing PPI sites prediction methods. These results demonstrate that the PPISP-XGBoost method can further enhance the prediction of PPI sites.
Affiliation(s)
- Xue Wang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Yaqun Zhang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China; Key Laboratory of Computational Science and Application of Hainan Province, Haikou, 571158, China.
| | - Adil Salhi
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Ruixin Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Lin Wang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Zengfeng Liu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| |
|
47
|
Czibula G, Albu AI, Bocicor MI, Chira C. AutoPPI: An Ensemble of Deep Autoencoders for Protein-Protein Interaction Prediction. ENTROPY 2021; 23:e23060643. [PMID: 34064042 PMCID: PMC8223997 DOI: 10.3390/e23060643] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/15/2021] [Revised: 05/08/2021] [Accepted: 05/19/2021] [Indexed: 01/06/2023]
Abstract
Proteins are essential molecules that must correctly perform their roles for the good health of living organisms. The majority of proteins operate in complexes, and the way they interact has a pivotal influence on the proper functioning of such organisms. In this study we address the problem of protein–protein interaction prediction, and we propose and investigate a method based on an ensemble of autoencoders. Our approach, entitled AutoPPI, adopts a strategy based on two autoencoders, one for each type of interaction (positive and negative), and we advance three types of neural network architectures for the autoencoders. Experiments were performed on several data sets comprising proteins from four different species. The results indicate good performance of our proposed model, with accuracy and AUC values of over 0.97 in all cases. The best performing model relies on a Siamese architecture in both the encoder and the decoder, which advantageously captures common features in protein pairs. Comparisons with other machine learning techniques applied to the same problem show that AutoPPI outperforms most of its contenders on the considered data sets.
|
48
|
Creamer KE, Kudo Y, Moore BS, Jensen PR. Phylogenetic analysis of the salinipostin γ-butyrolactone gene cluster uncovers new potential for bacterial signalling-molecule diversity. Microb Genom 2021; 7:000568. [PMID: 33979276 PMCID: PMC8209734 DOI: 10.1099/mgen.0.000568] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/28/2021] [Accepted: 03/24/2021] [Indexed: 12/19/2022] Open
Abstract
Bacteria communicate by small-molecule chemicals that facilitate intra- and inter-species interactions. These extracellular signalling molecules mediate diverse processes including virulence, bioluminescence, biofilm formation, motility and specialized metabolism. The signalling molecules produced by members of the phylum Actinobacteria generally comprise γ-butyrolactones, γ-butenolides and furans. The best-known actinomycete γ-butyrolactone is A-factor, which triggers specialized metabolism and morphological differentiation in the genus Streptomyces. Salinipostins A–K are unique γ-butyrolactone molecules with rare phosphotriester moieties that were recently characterized from the marine actinomycete genus Salinispora. The production of these compounds has been linked to the nine-gene biosynthetic gene cluster (BGC) spt. Critical to salinipostin assembly is the γ-butyrolactone synthase encoded by spt9. Here, we report the surprising distribution of spt9 homologues across 12 bacterial phyla, the majority of which are not known to produce γ-butyrolactones. Further analyses uncovered a large group of spt-like gene clusters outside of the genus Salinispora, suggesting the production of new salinipostin-like diversity. These gene clusters show evidence of horizontal transfer and location-specific recombination among Salinispora strains. The results suggest that γ-butyrolactone production may be more widespread than previously recognized. The identification of new γ-butyrolactone BGCs is the first step towards understanding the regulatory roles of the encoded small molecules in Actinobacteria.
Affiliation(s)
- Kaitlin E. Creamer
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
| | - Yuta Kudo
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Present address: Frontier Research Institute for Interdisciplinary Sciences, Japan Graduate School of Agricultural Science, Tohoku University, Sendai, Miyagi, Japan
| | - Bradley S. Moore
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Paul R. Jensen
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
| |
|
49
|
Fluorescence resonance energy transfer in revealing protein-protein interactions in living cells. Emerg Top Life Sci 2021; 5:49-59. [PMID: 33856021 DOI: 10.1042/etls20200337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2020] [Revised: 02/22/2021] [Accepted: 03/04/2021] [Indexed: 11/17/2022]
Abstract
Genes are expressed as proteins for a wide variety of fundamental biological processes at the cellular and organismal levels. However, a protein rarely functions alone, but rather acts through interactions with other proteins to maintain normal cellular and organismal functions. Therefore, it is important to analyze protein-protein interactions to determine the functional mechanisms of proteins, which can also guide the development of therapeutic targets for diseases caused by altered protein-protein interactions that lead to cellular or organismal dysfunction. A large number of methodologies exist to study protein interactions in vitro, in vivo and in silico; these have led to the development of many protein interaction databases and have thus enriched our knowledge of protein-protein interactions and functions. However, many of these interactions were identified in vitro and need to be validated in living cells. Furthermore, it is unclear whether these interactions are direct or mediated via other proteins. Moreover, these interactions represent cell- and time-averages rather than a single cell in real time. Therefore, it is crucial to detect direct protein-protein interactions in a single cell during biological processes in vivo, towards understanding the functional mechanisms of proteins in living cells. Importantly, a fluorescence resonance energy transfer (FRET)-based methodology has emerged as a powerful technique to decipher direct protein-protein interactions at single-cell resolution in living cells, and it is briefly described in this mini-review.
|
50
|
An integrated deep learning and dynamic programming method for predicting tumor suppressor genes, oncogenes, and fusion from PDB structures. Comput Biol Med 2021; 133:104323. [PMID: 33934067 DOI: 10.1016/j.compbiomed.2021.104323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2020] [Revised: 02/18/2021] [Accepted: 03/07/2021] [Indexed: 11/20/2022]
Abstract
Mutations in proto-oncogenes (ONGO) and the loss of regulatory function of tumor suppressor genes (TSG) are common underlying mechanisms of uncontrolled tumor growth. While cancer is a heterogeneous complex of distinct diseases, determining through computational studies whether a gene's functionality is related to ONGO or TSG can help develop drugs that target the disease. This paper proposes a classification method that starts with a preprocessing stage to extract the feature map sets from the input 3D protein structural information. The next stage is a deep convolutional neural network (DCNN) stage that outputs the probability of functional classification of genes. We explored and tested two approaches: in Approach 1, all filtered and cleaned 3D protein structures (PDB) are pooled together, whereas in Approach 2, the primary structures and their corresponding PDBs are separated according to the genes' primary structural information. Following the DCNN stage, a dynamic programming-based method is used to determine the final prediction of the primary structures' functionality. We validated our proposed method using the COSMIC online database. For the ONGO vs TSG classification problem, the AUROCs of the DCNN stage for Approach 1 and Approach 2 are 0.978 and 0.765, respectively. The AUROCs of the final classification of the genes' primary structure functionality for Approach 1 and Approach 2 are 0.989 and 0.879, respectively. For comparison, the current state-of-the-art reported AUROC is 0.924. Our results warrant further study to apply the deep learning models to human (GRCh38) genes, for predicting their corresponding probabilities of functionality as cancer drivers.
|