1
Rane RV, Oakeshott JG, Nguyen T, Hoffmann AA, Lee SF. Orthonome - a new pipeline for predicting high quality orthologue gene sets applicable to complete and draft genomes. BMC Genomics 2017; 18:673. [PMID: 28859620 PMCID: PMC5580312 DOI: 10.1186/s12864-017-4079-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 01/15/2017] [Accepted: 08/21/2017] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Distinguishing orthologous and paralogous relationships between genes across multiple species is essential for comparative genomic analyses. Various computational approaches have been developed to resolve these evolutionary relationships, but strong trade-offs between precision and recall of orthologue prediction remain an ongoing challenge. RESULTS Here we present Orthonome, an orthologue prediction pipeline designed to reduce the trade-off between orthologue capture rates (recall) and accuracy of multi-species orthologue prediction. The pipeline compares sequence domains and then forms sequence-similar clusters before using phylogenetic comparisons to identify inparalogues. It then corrects sequence similarity metrics for fragment and gene length bias using a novel scoring metric capturing relationships between full-length as well as fragmented genes. The remaining genes are then brought together for the identification of orthologues within a phylogenetic framework. The orthologue predictions are further calibrated along with inparalogues and gene births, using synteny, to identify novel orthologous relationships. We use 12 high quality Drosophila genomes to show that, compared to other orthologue prediction pipelines, Orthonome provides orthogroups with minimal error but high recall. Furthermore, Orthonome is resilient to suboptimal assembly/annotation quality, with the inclusion of draft genomes from eight additional Drosophila species still providing >6500 1:1 orthologues across all twenty species while retaining a better combination of accuracy and recall than other pipelines. Orthonome is implemented as a searchable database and query tool along with multiple-sequence alignment browsers for all sets of orthologues. The underlying documentation and database are accessible at http://www.orthonome.com.
CONCLUSION We demonstrate that Orthonome provides a superior combination of orthologue capture rates and accuracy on complete and draft drosophilid genomes when tested alongside previously published pipelines. The study also highlights a greater degree of evolutionary conservation across drosophilid species than previously thought.
Affiliation(s)
- Rahul V Rane
- Bio21 Institute, School of Biosciences, The University of Melbourne, Melbourne, Victoria, Australia
- CSIRO, Canberra, Australian Capital Territory, Australia
- Thu Nguyen
- Bio21 Institute, School of Biosciences, The University of Melbourne, Melbourne, Victoria, Australia
- Ary A Hoffmann
- Bio21 Institute, School of Biosciences, The University of Melbourne, Melbourne, Victoria, Australia
- Siu F Lee
- CSIRO, Canberra, Australian Capital Territory, Australia
- Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia
2
Xu J, Li Y, Wang Y, Liu X, Zhu XG. Altered expression profiles of microRNA families during de-etiolation of maize and rice leaves. BMC Res Notes 2017; 10:108. [PMID: 28235420 PMCID: PMC5324284 DOI: 10.1186/s13104-016-2367-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/03/2016] [Accepted: 12/28/2016] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND MicroRNAs (miRNAs) are highly conserved small non-coding RNAs that play important regulatory roles in plants. Although many miRNA families are sequentially and functionally conserved across plant kingdoms (Dezulian et al. in Genome Biol 13, 2005), they still differ in many aspects such as family size, average length, and genomic loci (Unver et al. in Int J Plant Genomics, 2009). RESULTS In this study, we investigated changes in miRNA expression profiles during the greening process of etiolated seedlings of Oryza sativa (C3) and Zea mays (C4) to explore conserved and species-specific characteristics of miRNAs between these two species. Furthermore, we predicted 47 and 42 candidate novel miRNAs in maize and rice, respectively, using a parameterized, monocot-specific miRDeep2 pipeline. Potential targets of miRNAs, comprising both mRNA and long non-coding RNA (lncRNA), were examined to clarify potential regulation of photosynthesis. Based on our results, two putative positive Kranz regulators reported by Wang et al. (2010) were predicted as potential targets of miR156. A few photosynthesis-related genes, such as sulfate adenylyltransferase (APS3) and chlorophyll a/b binding family protein, were suggested to be regulated by miRNAs. However, no C4 shuttle genes were predicted to be direct targets of either known or candidate novel miRNAs. CONCLUSIONS This study provides a comprehensive list of miRNAs that showed altered expression during the de-etiolation process and a number of candidate miRNAs that might play regulatory roles in C3 and C4 photosynthesis.
Affiliation(s)
- Jiajia Xu
- Key Laboratory of Computational Biology and Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China
- Yuanyuan Li
- Key Laboratory of Computational Biology and Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China
- Yaling Wang
- Key Laboratory of Computational Biology and Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China
- Xinyu Liu
- Key Laboratory of Computational Biology and Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China
- Xin-Guang Zhu
- Key Laboratory of Computational Biology and Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China
- State Key Laboratory of Hybrid Rice Research, Shanghai Institute of Biological Sciences, Chinese Academy of Sciences, Shanghai, China
3
Xu J, Bräutigam A, Weber APM, Zhu XG. Systems analysis of cis-regulatory motifs in C4 photosynthesis genes using maize and rice leaf transcriptomic data during a process of de-etiolation. JOURNAL OF EXPERIMENTAL BOTANY 2016; 67:5105-17. [PMID: 27436282 PMCID: PMC5014158 DOI: 10.1093/jxb/erw275] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 05/23/2023]
Abstract
Identification of potential cis-regulatory motifs controlling the development of C4 photosynthesis is a major focus of current research. In this study, we used time-series RNA-seq data collected from etiolated maize and rice leaf tissues sampled during a de-etiolation process to systematically characterize the expression patterns of C4-related genes and to further identify potential cis elements in five different genomic regions (i.e. promoter, 5'UTR, 3'UTR, intron, and coding sequence) of C4 orthologous genes. The results demonstrate that although most of the C4 genes show similar expression patterns, a number of them, including chloroplast dicarboxylate transporter 1, aspartate aminotransferase, and triose phosphate transporter, show shifted expression patterns compared with their C3 counterparts. A number of conserved short DNA motifs between maize C4 genes and their rice orthologous genes were identified not only in the promoter, 5'UTR, 3'UTR, and coding sequences, but also in the introns of core C4 genes. We also identified cis-regulatory motifs that exist in maize C4 genes and also in genes showing similar expression patterns as maize C4 genes but that do not exist in rice C3 orthologs, suggesting a possible recruitment of pre-existing cis-elements from genes unrelated to C4 photosynthesis into C4 photosynthesis genes during C4 evolution.
Affiliation(s)
- Jiajia Xu
- CAS Key Laboratory of Computational Biology and State Key Laboratory for Hybrid Rice, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Andrea Bräutigam
- Institute of Plant Biochemistry, Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich-Heine University, 40225 Düsseldorf, Germany
- Network Analysis and Modeling, IPK Gatersleben, Correnstrasse 3, D-06466 Stadt Seeland, Germany
- Andreas P M Weber
- Institute of Plant Biochemistry, Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich-Heine University, 40225 Düsseldorf, Germany
- Xin-Guang Zhu
- CAS Key Laboratory of Computational Biology and State Key Laboratory for Hybrid Rice, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
4
PoplarGene: poplar gene network and resource for mining functional information for genes from woody plants. Sci Rep 2016; 6:31356. [PMID: 27515999 PMCID: PMC4981870 DOI: 10.1038/srep31356] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 04/21/2016] [Accepted: 07/18/2016] [Indexed: 01/05/2023] Open
Abstract
Poplar is not only an important resource for the production of paper, timber and other wood-based products, but it has also emerged as an ideal model system for studying woody plants. To better understand the biological processes underlying various traits in poplar, e.g., wood development, a comprehensive functional gene interaction network is needed. Here, we constructed a genome-wide functional gene network for poplar (covering ~70% of the 41,335 poplar genes) and created the network web service PoplarGene, offering comprehensive functional interactions and extensive poplar gene functional annotations. PoplarGene incorporates two network-based gene prioritization algorithms, neighborhood-based prioritization and context-based prioritization, which can be used to perform gene prioritization in a complementary manner. Furthermore, the co-functional information in PoplarGene can be applied to other woody plant proteomes with high efficiency via orthology transfer. In addition to poplar gene sequences, the webserver also accepts Arabidopsis reference genes as input to guide the search for novel candidate functional genes in PoplarGene. We believe that PoplarGene (http://bioinformatics.caf.ac.cn/PoplarGene and http://124.127.201.25/PoplarGene) will greatly benefit the research community, facilitating studies of poplar and other woody plants.
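The abstract does not describe how neighborhood-based prioritization is computed, so the following is only a generic guilt-by-association sketch: each candidate gene is scored by the summed weight of its network edges to a set of seed genes. The function name `prioritize`, the edge-weight dictionary, and the gene labels are all hypothetical and are not taken from PoplarGene.

```python
from typing import Dict, Iterable, List, Tuple


def prioritize(
    network: Dict[Tuple[str, str], float],
    seeds: Iterable[str],
) -> List[Tuple[str, float]]:
    """Rank non-seed genes by total edge weight to seed genes.

    `network` maps an undirected gene pair to a functional-link weight.
    This is an assumed scoring rule, not the published PoplarGene algorithm.
    """
    seed_set = set(seeds)
    scores: Dict[str, float] = {}
    for (u, v), weight in network.items():
        # Only edges that join one seed gene to one non-seed gene contribute.
        if u in seed_set and v not in seed_set:
            scores[v] = scores.get(v, 0.0) + weight
        elif v in seed_set and u not in seed_set:
            scores[u] = scores.get(u, 0.0) + weight
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy network: genes A and B are the seeds.
toy_network = {
    ("A", "B"): 0.75,
    ("A", "C"): 0.25,
    ("B", "C"): 0.5,
    ("C", "D"): 1.0,
    ("A", "E"): 1.0,
}
ranking = prioritize(toy_network, ["A", "B"])
```

Gene D is absent from the ranking because it has no direct link to a seed; a context-based method would presumably look beyond direct neighbors, which is one way the two approaches could complement each other.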
5
Wang L, Czedik-Eysenberg A, Mertz RA, Si Y, Tohge T, Nunes-Nesi A, Arrivault S, Dedow LK, Bryant DW, Zhou W, Xu J, Weissmann S, Studer A, Li P, Zhang C, LaRue T, Shao Y, Ding Z, Sun Q, Patel RV, Turgeon R, Zhu X, Provart NJ, Mockler TC, Fernie AR, Stitt M, Liu P, Brutnell TP. Comparative analyses of C4 and C3 photosynthesis in developing leaves of maize and rice. Nat Biotechnol 2014; 32:1158-65. [DOI: 10.1038/nbt.3019] [Citation(s) in RCA: 190] [Impact Index Per Article: 19.0] [Received: 04/12/2014] [Accepted: 08/14/2014] [Indexed: 01/29/2023]
6
Zhou H, Gao S, Nguyen NN, Fan M, Jin J, Liu B, Zhao L, Xiong G, Tan M, Li S, Wong L. Stringent homology-based prediction of H. sapiens-M. tuberculosis H37Rv protein-protein interactions. Biol Direct 2014; 9:5. [PMID: 24708540 PMCID: PMC4022245 DOI: 10.1186/1745-6150-9-5] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Received: 11/01/2013] [Accepted: 03/26/2014] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND H. sapiens-M. tuberculosis H37Rv protein-protein interaction (PPI) data are essential for understanding the infection mechanism of the formidable pathogen M. tuberculosis H37Rv. Computational prediction is an important strategy to fill the gap in experimental H. sapiens-M. tuberculosis H37Rv PPI data. Homology-based prediction is frequently used in predicting both intra-species and inter-species PPIs. However, some limitations are not properly resolved in several published works that predict eukaryote-prokaryote inter-species PPIs using intra-species template PPIs. RESULTS We develop a stringent homology-based prediction approach by taking into account (i) differences between eukaryotic and prokaryotic proteins and (ii) differences between inter-species and intra-species PPI interfaces. We compare our stringent homology-based approach to a conventional homology-based approach for predicting host-pathogen PPIs, based on cellular compartment distribution analysis, disease gene list enrichment analysis, pathway enrichment analysis and functional category enrichment analysis. These analyses support the validity of our prediction result, and clearly show that our approach has better performance in predicting H. sapiens-M. tuberculosis H37Rv PPIs. Using our stringent homology-based approach, we have predicted a set of highly plausible H. sapiens-M. tuberculosis H37Rv PPIs which might be useful for many related studies. Based on our analysis of the H. sapiens-M. tuberculosis H37Rv PPI network predicted by our stringent homology-based approach, we have discovered several interesting properties which are reported here for the first time. We find that both host proteins and pathogen proteins involved in the host-pathogen PPIs tend to be hubs in their own intra-species PPI network. Also, both host and pathogen proteins involved in host-pathogen PPIs tend to have longer primary sequences, more domains, and greater hydrophilicity. In addition, the protein domains from both host and pathogen proteins involved in host-pathogen PPIs tend to have lower charge and to be more hydrophilic. CONCLUSIONS Our stringent homology-based prediction approach provides a better strategy in predicting PPIs between eukaryotic hosts and prokaryotic pathogens than a conventional homology-based approach. The properties we have observed from the predicted H. sapiens-M. tuberculosis H37Rv PPI network are useful for understanding inter-species host-pathogen PPI networks and provide novel insights for host-pathogen interaction studies.
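To make the idea of conventional homology-based (interolog) prediction concrete, here is a minimal sketch: a template interaction between proteins T1 and T2 is transferred to every host homolog of one partner paired with every pathogen homolog of the other. The function name, the dictionary layout, and all identifiers are hypothetical. The stringent filters the abstract mentions (eukaryote versus prokaryote differences, interface differences) are not specified there and are deliberately not modeled.

```python
from typing import Dict, Iterable, List, Set, Tuple


def transfer_interologs(
    template_ppis: Iterable[Tuple[str, str]],
    host_homologs: Dict[str, List[str]],
    pathogen_homologs: Dict[str, List[str]],
) -> Set[Tuple[str, str]]:
    """Predict (host, pathogen) pairs from template PPIs via homology maps.

    `host_homologs[t]` lists host proteins homologous to template protein t;
    `pathogen_homologs[t]` does the same for pathogen proteins.
    """
    predicted: Set[Tuple[str, str]] = set()
    for a, b in template_ppis:
        # A template PPI is undirected, so try both orientations.
        for host_side, pathogen_side in ((a, b), (b, a)):
            for host in host_homologs.get(host_side, ()):
                for pathogen in pathogen_homologs.get(pathogen_side, ()):
                    predicted.add((host, pathogen))
    return predicted


# Hypothetical identifiers for illustration only.
predictions = transfer_interologs(
    template_ppis=[("T1", "T2"), ("T3", "T4")],
    host_homologs={"T1": ["H1"], "T4": ["H2"]},
    pathogen_homologs={"T2": ["P1", "P2"], "T3": ["P3"]},
)
```

A stricter variant would add a filtering step on each candidate pair before it is accepted; where such a filter belongs is at the `predicted.add` call.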
Affiliation(s)
- Hufeng Zhou
- NUS Graduate School for Integrative Sciences & Engineering, National University of Singapore, Singapore, Singapore
- School of Computing, National University of Singapore, Singapore, Singapore
- Department of Medicine, Brigham and Women’s Hospital, Boston, USA
- Department of Microbiology and Immunobiology, Harvard University, Cambridge, USA
- Shangzhi Gao
- Department of Environmental Health, Harvard School of Public Health, Harvard University, Cambridge, USA
- Nam Ninh Nguyen
- School of Computing, National University of Singapore, Singapore, Singapore
- Mengyuan Fan
- NUS Graduate School for Integrative Sciences & Engineering, National University of Singapore, Singapore, Singapore
- School of Computing, National University of Singapore, Singapore, Singapore
- Jingjing Jin
- School of Computing, National University of Singapore, Singapore, Singapore
- Bing Liu
- Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Liang Zhao
- Bioinformatics Research Center, & School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
- Geng Xiong
- Department of Medicine, Brigham and Women’s Hospital, Boston, USA
- Min Tan
- Department of Medicine, Brigham and Women’s Hospital, Boston, USA
- Department of Microbiology and Immunobiology, Harvard University, Cambridge, USA
- Shijun Li
- Department of Medicine, Brigham and Women’s Hospital, Boston, USA
- Department of Microbiology and Immunobiology, Harvard University, Cambridge, USA
- Limsoon Wong
- School of Computing, National University of Singapore, Singapore, Singapore
7
Chen YL, Chen CM, Pai TW, Leong HW, Chong KF. Homologous synteny block detection based on suffix tree algorithms. J Bioinform Comput Biol 2014; 11:1343004. [PMID: 24372033 DOI: 10.1142/s021972001343004x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
A synteny block represents a set of contiguous genes located within the same chromosome and well conserved among various species. Through long evolutionary processes and genome rearrangement events, large numbers of synteny blocks remain highly conserved across multiple species. Understanding the distribution of conserved gene blocks helps evolutionary biologists trace the diversity of life, and it also plays an important role in orthologous gene detection and gene annotation in the genomic era. In this work, we focus on collinear synteny detection, in which the order of genes is required to be well conserved among multiple species. To achieve this goal, suffix-tree-based algorithms for efficiently identifying homologous synteny blocks are proposed. The traditional suffix tree algorithm was modified by treating a chromosome as a string in which each gene is encoded as a symbol character. Hence, a suffix tree can be built for different query chromosomes from various species. We can then efficiently search for conserved synteny blocks, which are modeled as overlapped contiguous edges in our suffix tree. In addition, we defined a novel Synteny Block Conserved Index (SBCI) to evaluate the relationship of synteny block distribution between two species, which could be applied as an evolutionary indicator for constructing a phylogenetic tree from multiple species without the large computational cost of whole-genome sequence alignment.
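The encoding described above (one chromosome as a string, one gene as a symbol) can be illustrated with a small sketch. Note that this is not the suffix tree method of the paper: for brevity it swaps in a quadratic dynamic-programming scan for maximal shared contiguous runs, which finds the same kind of collinear blocks on small inputs but lacks the efficiency a suffix tree provides. The function name `collinear_blocks`, the `min_len` parameter, and the gene labels are assumptions.

```python
from typing import List, Sequence, Tuple


def collinear_blocks(
    chrom_a: Sequence[str],
    chrom_b: Sequence[str],
    min_len: int = 2,
) -> List[Tuple[int, int, int]]:
    """Return maximal shared contiguous gene runs as (start_a, start_b, length).

    Each chromosome is a sequence of gene symbols, where orthologous genes
    are assumed to share the same symbol.
    """
    blocks: List[Tuple[int, int, int]] = []
    # prev[j] holds the length of the common run ending at
    # chrom_a[i - 2] and chrom_b[j - 1] (the previous row of the DP table).
    prev = [0] * (len(chrom_b) + 1)
    for i in range(1, len(chrom_a) + 1):
        curr = [0] * (len(chrom_b) + 1)
        for j in range(1, len(chrom_b) + 1):
            if chrom_a[i - 1] != chrom_b[j - 1]:
                continue
            curr[j] = prev[j - 1] + 1
            # The run is maximal when it cannot be extended to the right.
            at_end = i == len(chrom_a) or j == len(chrom_b)
            if (at_end or chrom_a[i] != chrom_b[j]) and curr[j] >= min_len:
                blocks.append((i - curr[j], j - curr[j], curr[j]))
        prev = curr
    return blocks


# Hypothetical gene orders on one chromosome from each of two species.
species_a = ["g1", "g2", "g3", "g4", "g5"]
species_b = ["g9", "g2", "g3", "g4", "g1", "g5"]
blocks = collinear_blocks(species_a, species_b)
```

Here the only block of at least two genes is g2-g3-g4, starting at index 1 in both chromosomes. The scan costs time proportional to the product of the two chromosome lengths, which is the kind of cost a suffix tree index is meant to avoid.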
Affiliation(s)
- Yu-Lun Chen
- Department of Computer Science and Engineering and Center of Excellence for the Oceans, National Taiwan Ocean University, No. 2 Peining Road, Keelung, Taiwan 20224, Republic of China
8
Antimicrobial peptides design by evolutionary multiobjective optimization. PLoS Comput Biol 2013; 9:e1003212. [PMID: 24039565 PMCID: PMC3764005 DOI: 10.1371/journal.pcbi.1003212] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.1] [Received: 03/28/2013] [Accepted: 07/23/2013] [Indexed: 02/03/2023] Open
Abstract
Antimicrobial peptides (AMPs) are an abundant and wide class of molecules produced by many tissues and cell types in a variety of mammals, plant and animal species. Linear alpha-helical antimicrobial peptides are among the most widespread membrane-disruptive AMPs in nature, representing a particularly successful structural arrangement in innate defense. Recently, AMPs have received increasing attention as potential therapeutic agents, owing to their broad activity spectrum and their reduced tendency to induce resistance. The introduction of non-natural amino acids will be a key requisite in order to counter host resistance and extend the compound's lifetime. In this work, the design of novel AMP sequences with non-natural amino acids was achieved through a flexible computational approach based on chemophysical profiles of peptide sequences. Quantitative structure-activity relationship (QSAR) descriptors were employed to code each peptide and train two statistical models in order to account for structural and functional properties of alpha-helical amphipathic AMPs. These models were then used as fitness functions for a multi-objective evolutionary algorithm, together with a set of constraints for the design of a series of candidate AMPs. Two ab-initio natural peptides were synthesized and experimentally validated for antimicrobial activity, together with a series of control peptides. Furthermore, a well-known Cecropin-Melittin alpha-helical antimicrobial hybrid (CM18) was optimized by shortening its amino acid sequence while maintaining its activity, and a peptide with non-natural amino acids was designed and tested, demonstrating the higher activity achievable with artificial residues.
In recent years, the increasing and rapid spread of pathogenic microorganisms resistant to conventional antibiotics, especially in hospital settings, has spurred research into the identification of novel molecules endowed with antimicrobial activities and new mechanisms of action. Antimicrobial peptides (AMPs) have received increasing attention as potential therapeutic agents because of their wide spectrum of activity and low rate of inducing bacterial resistance. Currently, research is focused on the design and optimization of novel AMPs to improve their antimicrobial activity, minimize cytotoxicity, and reduce proteolytic degradation, including in biological fluids. To this end, the introduction of non-natural amino acids will be a key requisite in order to counter host resistance and extend the compound's lifetime. However, extending the amino acid alphabet to non-natural elements makes a systematic approach to AMP design unfeasible. A rational in-silico approach can drastically reduce the number of compounds to be tested and consequently the production costs and the time required for evaluation of activity and toxicity. In this article, in-silico design of AMPs with non-natural amino acids was performed and a series of candidates were tested in order to demonstrate the potential of this approach.
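When two model outputs serve as fitness functions at once, candidates are usually compared by Pareto dominance rather than by a single score. The sketch below shows only that selection step, under the assumption that both objectives are to be maximized; the score tuples are made up, and nothing here reproduces the paper's actual models, constraints, or evolutionary operators.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` everywhere and better somewhere.

    All objectives are assumed to be maximized.
    """
    return all(x >= y for x, y in zip(a, b)) and any(
        x > y for x, y in zip(a, b)
    )


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, candidate in enumerate(scores)
        if not any(
            dominates(other, candidate)
            for j, other in enumerate(scores)
            if j != i
        )
    ]


# Hypothetical (objective_1, objective_2) scores for four candidate peptides.
candidate_scores = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.2, 0.9)]
front = pareto_front(candidate_scores)
```

The third candidate drops out because the second is better on both objectives; the remaining three represent different trade-offs, and an evolutionary algorithm would typically breed the next generation from such a non-dominated set.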