1
Reinar WB, Krabberød AK, Lalun VO, Butenko MA, Jakobsen KS. Short tandem repeats delineate gene bodies across eukaryotes. Nat Commun 2024; 15:10902. [PMID: 39738068 DOI: 10.1038/s41467-024-55276-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Accepted: 12/05/2024] [Indexed: 01/01/2025]
Abstract
Short tandem repeats (STRs) have emerged as important and hypermutable sites where genetic variation correlates with gene expression in plant and animal systems. Recently, it has been shown that a broad range of transcription factors (TFs) are affected by STRs near or in the DNA target binding site. Despite this, the distribution of STR motif repetitiveness in eukaryote genomes is still largely unknown. Here, we identify monomer and dimer STR motif repetitiveness in 5.1 billion 10-bp windows upstream of translation starts and downstream of translation stops in 25 million genes spanning 1270 species across the eukaryotic Tree of Life. We report that all surveyed genomes have gene-proximal shifts in motif repetitiveness. Within genomes, variation in gene-proximal repetitiveness landscapes correlated to the function of genes; genes with housekeeping functions were depleted in upstream and downstream repetitiveness. Furthermore, the repetitiveness landscapes correlated with TF binding sites, indicating that gene function has evolved in conjunction with cis-regulatory STRs and TFs that recognize repetitive sites. These results suggest that the hypermutability inherent to STRs is canalized along the genome sequence and contributes to regulatory and eco-evolutionary dynamics in all eukaryotes.
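The abstract does not spell out how motif repetitiveness is scored. As a rough illustration only, the sketch below assumes a simple score: the largest number of back-to-back copies of each monomer or dimer motif inside a 10-bp window, computed for non-overlapping windows tiled upstream of a translation start. The scoring rule and all function names are hypothetical and are not the authors' pipeline.

```python
from __future__ import annotations

from itertools import product

# Hypothetical motif set: the 4 monomers plus the 16 dimers.
MONOMERS = list("ACGT")
DIMERS = ["".join(p) for p in product("ACGT", repeat=2)]
MOTIFS = MONOMERS + DIMERS


def longest_tandem_copies(window: str, motif: str) -> int:
    """Largest number of consecutive copies of `motif` anywhere in `window`."""
    k = len(motif)
    best = 0
    for start in range(len(window)):
        copies = 0
        pos = start
        # Step forward one motif length at a time while the motif keeps repeating.
        while window[pos:pos + k] == motif:
            copies += 1
            pos += k
        best = max(best, copies)
    return best


def upstream_windows(seq: str, start_codon: int, n_windows: int, width: int = 10) -> list[str]:
    """Non-overlapping windows upstream of a 0-based start-codon index, nearest first."""
    windows = []
    for i in range(1, n_windows + 1):
        begin = start_codon - i * width
        if begin < 0:
            break  # ran off the beginning of the sequence
        windows.append(seq[begin:begin + width])
    return windows


def repetitiveness_profile(windows: list[str]) -> dict[str, list[int]]:
    """Per-motif repetitiveness score for every window."""
    return {m: [longest_tandem_copies(w.upper(), m) for w in windows] for m in MOTIFS}
```

For example, `longest_tandem_copies("ATATATGGCC", "AT")` returns 3. Downstream windows after a stop codon could be tiled the same way in the opposite direction.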
Affiliation(s)
- William B Reinar
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway.
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, Oslo, Norway.
- Anders K Krabberød
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, Oslo, Norway
- Vilde O Lalun
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, Oslo, Norway
- Melinka A Butenko
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, Oslo, Norway
- Kjetill S Jakobsen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway.
2
Zhao H, Shao X, Guo M, Xing Y, Wang J, Luo L, Cai L. Competitive Chemical Reaction Kinetic Model of Nucleosome Assembly Using the Histone Variant H2A.Z and H2A In Vitro. Int J Mol Sci 2023; 24:15846. [PMID: 37958827 PMCID: PMC10647764 DOI: 10.3390/ijms242115846] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 10/19/2023] [Accepted: 10/25/2023] [Indexed: 11/15/2023]
Abstract
Nucleosomes not only serve as the basic building blocks for eukaryotic chromatin but also regulate many biological processes, such as DNA replication, repair, and recombination. To modulate gene expression in vivo, the histone variant H2A.Z can be dynamically incorporated into the nucleosome. However, the assembly dynamics of H2A.Z-containing nucleosomes remain elusive. Here, we demonstrate that our previous chemical kinetic model for nucleosome assembly can be extended to H2A.Z-containing nucleosome assembly processes. The efficiency of H2A.Z-containing nucleosome assembly, like that of canonical nucleosome assembly, was also positively correlated with the total histone octamer concentration, reaction rate constant, and reaction time. We expanded the kinetic model to represent the competitive dynamics of H2A and H2A.Z in nucleosome assembly, thus providing a novel method through which to assess the competitive ability of histones to assemble nucleosomes. Based on this model, we confirmed that histone H2A has a higher competitive ability to assemble nucleosomes in vitro than histone H2A.Z. Our competitive kinetic model and experimental results also confirmed that in vitro H2A.Z-containing nucleosome assembly is governed by chemical kinetic principles.
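To illustrate what competitive assembly kinetics can look like, the sketch below assumes the simplest mass-action scheme, which is not necessarily the authors' model: free DNA D is captured irreversibly either by an H2A-containing octamer A (rate constant k_a) or by an H2A.Z-containing octamer Z (rate constant k_z), that is D + A → N_A and D + Z → N_Z, integrated with a forward-Euler step. Function names, units and the numbers used in any example are hypothetical.

```python
def simulate_competition(d0, a0, z0, k_a, k_z, t_end, dt=1e-3):
    """Forward-Euler integration of D + A -> N_A (k_a) and D + Z -> N_Z (k_z).

    Returns the amounts of H2A- and H2A.Z-containing nucleosomes at t_end.
    All quantities are in arbitrary but consistent units (hypothetical scheme).
    """
    d, a, z = d0, a0, z0
    n_a = n_z = 0.0
    for _ in range(int(round(t_end / dt))):
        # Mass-action rates of the two competing second-order reactions.
        r_a = k_a * d * a
        r_z = k_z * d * z
        d -= (r_a + r_z) * dt  # DNA is shared by both reactions
        a -= r_a * dt
        z -= r_z * dt
        n_a += r_a * dt
        n_z += r_z * dt
    return n_a, n_z


def assembly_efficiency(d0, a0, z0, k_a, k_z, t_end, dt=1e-3):
    """Fraction of the input DNA assembled into either nucleosome type."""
    n_a, n_z = simulate_competition(d0, a0, z0, k_a, k_z, t_end, dt)
    return (n_a + n_z) / d0
```

Under this scheme, with both octamers in excess at equal input concentrations, the yield ratio N_A/N_Z approaches k_a/k_z, which is one simple way a ratio of yields could serve as a read-out of relative competitive ability.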
Affiliation(s)
- Hongyu Zhao
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Xueqin Shao
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Mingxin Guo
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Yongqiang Xing
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Jingyan Wang
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Liaofu Luo
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Lu Cai
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou 014010, China
3
Sievers A, Sauer L, Bisch M, Sprengel J, Hausmann M, Hildenbrand G. Moderation of Structural DNA Properties by Coupled Dinucleotide Contents in Eukaryotes. Genes (Basel) 2023; 14:genes14030755. [PMID: 36981025 PMCID: PMC10048725 DOI: 10.3390/genes14030755] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/23/2023] [Revised: 03/16/2023] [Accepted: 03/18/2023] [Indexed: 03/30/2023]
Abstract
Dinucleotides are known determinants of various structural and physicochemical properties of DNA and of the binding affinities of proteins to DNA. These properties (e.g., stiffness) and bound proteins (e.g., transcription factors) are known to influence important biological functions, such as transcription regulation and 3D chromatin organization. This raises the question of how eukaryotic chromosomes, despite considerable variation in their dinucleotide contents, can still provide consistent DNA properties resulting in similar functions and 3D conformations. In this work, we investigate the hypothesis that coupled dinucleotide contents influence DNA properties in opposite directions, thereby moderating each other's influences. Analyzing all 2478 chromosomes of 155 eukaryotic species, and considering bias from coding sequences and enhancers, we found sets of correlated and anti-correlated dinucleotide contents. Using computational models, we estimated the changes in DNA properties resulting from this coupling. We found that pure A/T dinucleotides (AA, TT, AT, TA), which are known to influence histone positioning, and AC/GT contents are especially relevant moderators, and that the Roll property, for example, which is known to influence the histone affinity of DNA, is preferentially moderated. We conclude that dinucleotide contents might indirectly influence transcription and chromatin 3D conformation via regulation of histone occupancy and/or other mechanisms.
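As a minimal sketch of the kind of measurement behind "correlated and anti-correlated dinucleotide contents", the code below computes overlapping dinucleotide frequencies per chromosome and a Pearson correlation between two dinucleotides across chromosomes. It assumes plain A/C/G/T strings (other characters are skipped) and does not model the coding-sequence or enhancer bias corrections mentioned in the abstract; the function names are hypothetical.

```python
from itertools import product

DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_content(seq: str) -> dict:
    """Relative frequency of each overlapping dinucleotide; pairs with N etc. are skipped."""
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return {d: (c / total if total else 0.0) for d, c in counts.items()}


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (vx * vy) ** 0.5


def coupling(chromosomes: list, d1: str, d2: str) -> float:
    """Correlation of the contents of d1 and d2 across a set of chromosome sequences."""
    contents = [dinucleotide_content(s) for s in chromosomes]
    return pearson([c[d1] for c in contents], [c[d2] for c in contents])
```

A negative `coupling(chromosomes, "AA", "AC")` would indicate anti-correlated contents; whether such a pair moderates a given DNA property would still require the structural models the authors describe.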
Affiliation(s)
- Aaron Sievers
- Kirchhoff Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany
- Institute for Human Genetics, University Hospital Heidelberg, INF 366, 69117 Heidelberg, Germany
- Liane Sauer
- Kirchhoff Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany
- Institute for Human Genetics, University Hospital Heidelberg, INF 366, 69117 Heidelberg, Germany
- Marc Bisch
- Kirchhoff Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany
- Jan Sprengel
- Kirchhoff Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany
- Michael Hausmann
- Kirchhoff Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany
- Georg Hildenbrand
- Kirchhoff Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany
- Faculty of Engineering, University of Applied Science Aschaffenburg, Würzburger Str. 45, 63743 Aschaffenburg, Germany
4
Zhao H, Guo M, Zhang F, Shao X, Liu G, Xing Y, Zhao X, Luo L, Cai L. Nucleosome Assembly and Disassembly in vitro Are Governed by Chemical Kinetic Principles. Front Cell Dev Biol 2021; 9:762571. [PMID: 34692710 PMCID: PMC8529108 DOI: 10.3389/fcell.2021.762571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2021] [Accepted: 09/17/2021] [Indexed: 12/05/2022]
Abstract
As the elementary units of eukaryotic chromatin, nucleosomes in vivo are highly dynamic in many biological processes, such as DNA replication, repair, recombination, and transcription, allowing the necessary factors to gain access to their substrates. The dynamic mechanism of nucleosome assembly and disassembly has not been well described thus far. We proposed a chemical kinetic model of nucleosome assembly and disassembly in vitro. In the model, the efficiency of nucleosome assembly was positively correlated with the total concentration of histone octamer, the reaction rate constant, and the reaction time. All corollaries of the model were verified for the Widom 601 sequence and six artificially synthesized DNA sequences, named CS1–CS6, using the salt dialysis method in vitro. The reaction rate constant in the model may be used as a new parameter to evaluate the nucleosome reconstitution ability of DNAs. Nucleosome disassembly experiments for the Widom 601 sequence, monitored by Förster resonance energy transfer (FRET) and fluorescence thermal shift (FTS) assays, demonstrated that nucleosome disassembly is the inverse process of assembly and can be described as three distinct stages: an opening phase of the (H2A–H2B) dimer/(H3–H4)2 tetramer interface, a release phase of the H2A–H2B dimers from the (H3–H4)2 tetramer/DNA, and a removal phase of the (H3–H4)2 tetramer from DNA. Our kinetic model of nucleosome assembly and disassembly allows us to confirm that nucleosome assembly and disassembly in vitro are governed by chemical kinetic principles.
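One way to see why assembly efficiency should rise with octamer concentration, rate constant and time is a pseudo-first-order derivation. This is an illustrative sketch under stated assumptions, not the authors' published equations: binding D + H → N is treated as irreversible, the histone octamer H is assumed to be in large excess so that its concentration stays close to its total C_H, and D_0, C_H and k are symbols introduced here for illustration.

```latex
% Assumptions (hypothetical): irreversible D + H -> N, octamer in excess so [H](t) ~ C_H.
\frac{d[N]}{dt} = k\,[D][H] \approx k\,C_H\,\bigl(D_0 - [N]\bigr), \qquad [N](0) = 0
\;\Longrightarrow\;
\eta(t) \equiv \frac{[N](t)}{D_0} = 1 - e^{-k C_H t},
\\[4pt]
\frac{\partial \eta}{\partial C_H} = k t\,e^{-k C_H t} > 0, \qquad
\frac{\partial \eta}{\partial k} = C_H t\,e^{-k C_H t} > 0, \qquad
\frac{\partial \eta}{\partial t} = k C_H\,e^{-k C_H t} > 0.
```

All three partial derivatives are positive for positive k, C_H and t, which matches the qualitative corollary stated in the abstract; the full model would also need the reverse (disassembly) reactions.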
Affiliation(s)
- Hongyu Zhao
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
- Mingxin Guo
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
- Fenghui Zhang
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
- Xueqin Shao
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
- Guoqing Liu
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
- Yongqiang Xing
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
- Xiujuan Zhao
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
- Liaofu Luo
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
- Lu Cai
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
5
Khatun MS, Alam MA, Shoombuatong W, Mollah MNH, Kurata H, Hasan MM. Recent development of bioinformatics tools for microRNA target prediction. Curr Med Chem 2021; 29:865-880. [PMID: 34348604 DOI: 10.2174/0929867328666210804090224] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/11/2021] [Revised: 06/10/2021] [Accepted: 06/15/2021] [Indexed: 11/22/2022]
Abstract
MicroRNAs (miRNAs) are central players in the post-transcriptional regulation of gene expression. Binding of miRNAs to target mRNAs can repress gene expression by inducing degradation of the target mRNAs or by inhibiting their translation. High-throughput experimental approaches for miRNA target identification are costly and time-consuming, depending on various factors. It is therefore vitally important to develop bioinformatics methods for accurately predicting miRNA targets. With the increase in available RNA sequences in the post-genomic era, bioinformatics methods are being developed for miRNA studies, especially for miRNA target prediction. This review summarizes the current development of state-of-the-art bioinformatics tools for miRNA target prediction and outlines the progress and limitations of the available miRNA databases and their working principles. Finally, we discuss the caveats and perspectives of next-generation algorithms for the prediction of miRNA targets.
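Many target-prediction tools start from seed complementarity. As a minimal sketch of that one idea, and not of any specific tool covered by the review, the code below builds the canonical 7mer-m8 site (the reverse complement of miRNA nucleotides 2 to 8) and scans a 3' UTR for exact matches. It ignores site context, conservation, G:U wobble and other seed types; the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Upper-case the sequence and convert DNA letters (T) to RNA (U)."""
    return seq.upper().replace("T", "U")


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def seed_site(mirna: str) -> str:
    """7mer-m8 site: reverse complement of miRNA positions 2-8 (1-based)."""
    return reverse_complement_rna(to_rna(mirna)[1:8])


def find_seed_matches(mirna: str, utr: str) -> list:
    """0-based start positions of exact 7mer-m8 seed matches in the UTR."""
    site = seed_site(mirna)
    utr = to_rna(utr)
    k = len(site)
    return [i for i in range(len(utr) - k + 1) if utr[i:i + k] == site]
```

For let-7a (`UGAGGUAGUAGGUUGUAUAGUU`), `seed_site` gives `CUACCUC`, the familiar let-7 seed match. Real predictors layer scoring, conservation or machine-learned features on top of such candidate sites.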
Affiliation(s)
- Mst Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Md Ashad Alam
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, United States
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Md Nurul Haque Mollah
- Laboratory of Bioinformatics, Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh
- Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan
- Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
6
Liu G, Song S, Zhang Q, Dong B, Sun Y, Liu G, Zhao X. Epigenetic Marks and Variation of Sequence-Based Information Along Genomic Regions Are Predictive of Recombination Hot/Cold Spots in Saccharomyces cerevisiae. Front Genet 2021; 12:705038. [PMID: 34267784 PMCID: PMC8276760 DOI: 10.3389/fgene.2021.705038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2021] [Accepted: 06/07/2021] [Indexed: 11/16/2022]
Abstract
Characterization and identification of recombination hotspots provide important insights into the mechanism of recombination and genome evolution. In contrast with existing sequence-based models for predicting recombination hotspots, which were defined in an ORF-based manner, we first defined recombination hot/cold spots based on public high-resolution Spo11-oligo-seq data, then characterized them in terms of DNA sequence and epigenetic marks, and finally presented classifiers to identify hotspots. We found that, in addition to some previously discovered DNA-based features such as GC-skew, recombination hotspots in yeast can also be characterized by some remarkable features associated with DNA physical properties and shape. More importantly, using DNA-based features and several epigenetic marks, we built several classifiers to discriminate hotspots from coldspots and found that the SVM classifier performed best, with an accuracy of ∼92%, the highest among the models compared. Feature importance analysis combined with prediction results showed that epigenetic marks and the variation of sequence-based features along the hotspots contribute dominantly to hotspot identification. Using an incremental feature selection method, we obtained an optimal feature subset consisting of far fewer features without sacrificing prediction accuracy.
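GC-skew, one of the DNA-based features mentioned above, is conventionally defined as (G − C)/(G + C). The sketch below computes it in sliding windows and summarizes its variation along a region with the population standard deviation. The window size, step and the choice of standard deviation as the "variation" statistic are assumptions for illustration, not the paper's feature definitions.

```python
import statistics


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C); 0.0 when the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def windowed_gc_skew(seq: str, window: int, step: int) -> list:
    """GC-skew of each full-length window, starting every `step` bases."""
    return [gc_skew(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]


def skew_variation(seq: str, window: int, step: int) -> float:
    """Hypothetical 'variation along the region' feature: std. dev. of windowed skews."""
    values = windowed_gc_skew(seq, window, step)
    return statistics.pstdev(values) if values else 0.0
```

Features like these, computed for each candidate region alongside epigenetic signals, could then be fed to any standard classifier such as an SVM.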
Affiliation(s)
- Guoqing Liu
- School of Life Sciences and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Inner Mongolia Key Laboratory of Functional Genomics and Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
- Shuangjian Song
- School of Life Sciences and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Qiguo Zhang
- School of Life Sciences and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Biyu Dong
- School of Life Sciences and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Yu Sun
- School of Life Sciences, Inner Mongolia University, Hohhot, China
- Guojun Liu
- School of Life Sciences and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Inner Mongolia Key Laboratory of Functional Genomics and Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
- Xiujuan Zhao
- School of Life Sciences and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Inner Mongolia Key Laboratory of Functional Genomics and Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
7
Liu G, Zhao H, Meng H, Xing Y, Cai L. A deformation energy model reveals sequence-dependent property of nucleosome positioning. Chromosoma 2021; 130:27-40. [PMID: 33452566 PMCID: PMC7889546 DOI: 10.1007/s00412-020-00750-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/04/2020] [Revised: 12/24/2020] [Accepted: 12/29/2020] [Indexed: 11/18/2022]
Abstract
We present a deformation energy model for predicting nucleosome positioning, in which a position-dependent structural parameter set derived from crystal structures of nucleosomes was used to calculate the DNA deformation energy. The model is successful in predicting nucleosome occupancy genome-wide in budding yeast, nucleosome free energy, and rotational positioning of nucleosomes. Our model also indicates that the genomic regions underlying the MNase-sensitive nucleosomes in budding yeast have high deformation energy and, consequently, low nucleosome-forming ability, while the MNase-sensitive non-histone particles are characterized by much lower DNA deformation energy and high nucleosome preference. In addition, we also revealed that remodelers, SNF2 and RSC8, are likely to act in chromatin remodeling by binding to broad nucleosome-depleted regions that are intrinsically favorable for nucleosome positioning. Our data support the important role of position-dependent physical properties of DNA in nucleosome positioning.
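Deformation energy models of this kind are commonly built on a harmonic approximation, in which each base-pair step costs E = ½ Δθᵀ F Δθ, with Δθ = θ − θ⁰. In this sketch θ is the position-dependent target conformation of a step in the nucleosome, θ⁰ the intrinsic equilibrium of that dinucleotide step, and F its stiffness matrix. This formulation is an assumption here, not a transcription of the authors' parameter set, and the toy two-parameter values in the checks are made up.

```python
def step_energy(theta: list, theta0: list, stiffness: list) -> float:
    """Harmonic energy 0.5 * d^T F d of one base-pair step, with d = theta - theta0."""
    d = [t - t0 for t, t0 in zip(theta, theta0)]
    n = len(d)
    return 0.5 * sum(d[j] * stiffness[j][k] * d[k] for j in range(n) for k in range(n))


def deformation_energy(steps: list) -> float:
    """Total energy of a DNA segment given (theta, theta0, F) for each step.

    `theta` would come from the position-dependent nucleosomal conformation and
    `theta0`/`F` from the dinucleotide at that position (all hypothetical inputs).
    """
    return sum(step_energy(theta, theta0, f) for theta, theta0, f in steps)
```

Sliding a nucleosome-length window along a sequence and evaluating `deformation_energy` at each offset would give an energy profile in which low values suggest favorable positions; the real model uses six step parameters and empirically derived stiffness values.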
Affiliation(s)
- Guoqing Liu
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China.
- Inner Mongolia Key Lab of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, 014010, China.
- Hongyu Zhao
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China
- Inner Mongolia Key Lab of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, 014010, China
- Hu Meng
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China
- Inner Mongolia Key Lab of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, 014010, China
- Yongqiang Xing
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China
- Inner Mongolia Key Lab of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, 014010, China
- Lu Cai
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China
- Inner Mongolia Key Lab of Functional Genome Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, 014010, China
8
Liu G, Zhao H, Meng H, Xing Y, Yang H, Lin H. Deform-nu: A DNA Deformation Energy-Based Predictor for Nucleosome Positioning. Front Cell Dev Biol 2021; 8:596341. [PMID: 33425904 PMCID: PMC7785812 DOI: 10.3389/fcell.2020.596341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2020] [Accepted: 11/26/2020] [Indexed: 12/01/2022]
Abstract
The structure and function of chromatin can be regulated through the positioning patterns of nucleosomes, and DNA-based processes are regulated via nucleosomes. It is therefore important to determine nucleosome positions in DNA-based processes. In our previous study, a deformation energy model was proposed to predict nucleosome positions. In this study, we established a free web server based on the model (http://lin-group.cn/server/deform-nu/) to estimate the occupancy and rotational positioning of nucleosomes. The performance of the model was then verified with several examples. The results indicated that nucleosome positioning relies on physical properties of DNA, such as deformation energy.
Affiliation(s)
- Guoqing Liu
- School of Life Sciences and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Hongyu Zhao
- School of Life Sciences and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Hu Meng
- School of Life Sciences and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Yongqiang Xing
- School of Life Sciences and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Hui Yang
- School of Life Sciences and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Hao Lin
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
9
Li Z, Guan Y, Yuan X, Zheng P, Zhu H. Prediction of Sphingosine protein-coding regions with a self adaptive spectral rotation method. PLoS One 2019; 14:e0214442. [PMID: 30943219 PMCID: PMC6447165 DOI: 10.1371/journal.pone.0214442] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/08/2019] [Accepted: 03/13/2019] [Indexed: 01/08/2023]
Abstract
Identifying protein-coding regions in DNA sequences by computational methods is an active research topic. Welan gum produced by Sphingomonas sp. WG has great application potential in oil recovery and the concrete construction industry. Predicting the coding regions in the Sphingomonas sp. WG genome and elucidating the mechanism of Welan gum synthesis are important issues at present. In this study, we apply a self-adaptive spectral rotation (SASR) method, which is based on the triplet periodicity property, to predict the coding regions of the whole-genome data of Sphingomonas sp. WG without any prior training, and obtain 1115 suspected gene fragments. The suspected gene fragments were subjected to a similarity search against the NCBI non-redundant protein sequences (nr) database with blastx, and 762 of them have been labeled as genes in the nr database.
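The triplet periodicity that SASR builds on can be illustrated with the classic DFT measure: map a DNA string to four binary indicator sequences, take each one's Fourier coefficient at frequency 1/3 (k = N/3), and sum the squared magnitudes. This sketch shows only that underlying period-3 signal, not the SASR method itself. It assumes the sequence contains only A/C/G/T, in which case Parseval's theorem makes the average spectral power over all frequencies equal to N, so dividing by N gives a simple signal-to-noise ratio.

```python
import cmath


def period3_power(seq: str) -> float:
    """S(N/3) = sum over bases b of |U_b(N/3)|^2, with u_b the indicator of base b."""
    seq = seq.upper()
    omega = cmath.exp(-2j * cmath.pi / 3)  # DFT kernel at frequency 1/3
    total = 0.0
    for base in "ACGT":
        u = sum(omega ** n for n, b in enumerate(seq) if b == base)
        total += abs(u) ** 2
    return total


def period3_snr(seq: str) -> float:
    """S(N/3) divided by the mean spectral power, which equals N for pure ACGT input."""
    return period3_power(seq) / len(seq)
```

A strongly codon-like string such as `"ATG" * 30` gives a ratio of 30, while a homopolymer whose length is a multiple of 3 gives essentially 0; in practice the measure is computed in sliding windows and a threshold separates coding-like from non-coding-like stretches.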
Affiliation(s)
- Zhongwei Li
- College of Computer and Communication Engineering, China University of Petroleum, Qingdao, Shandong, China
- Yanan Guan
- College of Computer and Communication Engineering, China University of Petroleum, Qingdao, Shandong, China
- Xiang Yuan
- College of Computer and Communication Engineering, China University of Petroleum, Qingdao, Shandong, China
- Pan Zheng
- Department of Accounting and Information Systems, University of Canterbury, Christchurch, New Zealand
- Hu Zhu
- College of Chemistry and Materials, Fujian Normal University, Fuzhou, China
10
Wang J, Li J, Yang B, Xie R, Marquez-Lago TT, Leier A, Hayashida M, Akutsu T, Zhang Y, Chou KC, Selkrig J, Zhou T, Song J, Lithgow T. Bastion3: a two-layer ensemble predictor of type III secreted effectors. Bioinformatics 2018; 35:2017-2028. [PMID: 30388198 PMCID: PMC7963071 DOI: 10.1093/bioinformatics/bty914] [Citation(s) in RCA: 60] [Impact Index Per Article: 8.6] [Received: 08/31/2018] [Revised: 10/15/2018] [Accepted: 10/31/2018] [Indexed: 01/31/2023]
Abstract
MOTIVATION Type III secreted effectors (T3SEs) can be injected into host cell cytoplasm via type III secretion systems (T3SSs) to modulate interactions between Gram-negative bacterial pathogens and their hosts. Due to their relevance in pathogen-host interactions, significant computational efforts have been put toward the identification of T3SEs, and these in turn have stimulated new T3SE discoveries. However, as T3SEs with new characteristics are discovered, these existing computational tools reveal important limitations: (i) most of the trained machine learning models are based on the N-terminus (or also incorporate the C-terminus) instead of the proteins' complete sequences, and (ii) the underlying models (trained with classic algorithms) employed only a few features, most of which were extracted from sequence information alone. To achieve better T3SE prediction, we must identify more powerful, informative features and investigate how to effectively integrate these into a comprehensive model. RESULTS In this work, we present Bastion3, a two-layer ensemble predictor developed to accurately identify type III secreted effectors from protein sequence data. In contrast with existing methods that employ single models with few features, Bastion3 explores a wide range of features of various types, trains single models based on these features, and finally integrates these models through ensemble learning. We trained the models using a new gradient boosting machine, LightGBM, and further boosted the models' performance through a novel genetic algorithm (GA)-based two-step parameter optimization strategy. Our benchmark test demonstrates that Bastion3 achieves much better performance than commonly used methods, with an ACC value of 0.959, F-value of 0.958, MCC value of 0.917 and AUC value of 0.956, comprehensively outperforming all other toolkits by more than 5.6% in ACC value, 5.7% in F-value, 12.4% in MCC value and 5.8% in AUC value.
Based on our proposed two-layer ensemble model, we further developed a user-friendly online toolkit, maximizing convenience for experimental scientists toward T3SE prediction. With its design to ease future discoveries of novel T3SEs and improved performance, Bastion3 is poised to become a widely used, state-of-the-art toolkit for T3SE prediction. AVAILABILITY AND IMPLEMENTATION http://bastion3.erc.monash.edu/. CONTACT selkrig@embl.de or wyztli@163.com or or trevor.lithgow@monash.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
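The benchmark figures quoted above (ACC, F-value, MCC) are standard confusion-matrix metrics. The sketch below uses the usual definitions, assuming that "F-value" means the F1 score (harmonic mean of precision and recall); AUC needs ranked prediction scores rather than a single confusion matrix and is not covered. The function name is hypothetical.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, F1 ('F-value', assumed) and Matthews correlation coefficient."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_value = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # MCC uses all four cells, so it stays informative when classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "F": f_value, "MCC": mcc}
```

With 45 true positives, 45 true negatives, 5 false positives and 5 false negatives, this gives ACC = 0.9, F = 0.9 and MCC = 0.8, which illustrates why MCC gaps between tools tend to look larger than ACC gaps.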
Affiliation(s)
- Jiawei Wang
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, Australia
- Jiahui Li
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, Australia
- Department of Clinical Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Bingjiao Yang
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
- Ruopeng Xie
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
- Tatiana T Marquez-Lago
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- André Leier
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Morihiro Hayashida
- National Institute of Technology, Matsue College, Matsue, Shimane, Japan
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Yanju Zhang
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA, USA
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia
- Joel Selkrig
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Tieli Zhou
- Department of Clinical Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Trevor Lithgow
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, Australia