1
Bernot JP, Owen CL, Wolfe JM, Meland K, Olesen J, Crandall KA. Major Revisions in Pancrustacean Phylogeny and Evidence of Sensitivity to Taxon Sampling. Mol Biol Evol 2023; 40:msad175. [PMID: 37552897 PMCID: PMC10414812 DOI: 10.1093/molbev/msad175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2022] [Revised: 06/14/2023] [Accepted: 06/19/2023] [Indexed: 08/10/2023]
Abstract
The clade Pancrustacea, comprising crustaceans and hexapods, is the most diverse group of animals on earth, containing over 80% of animal species and half of animal biomass. It has been the subject of several recent phylogenomic analyses, yet relationships within Pancrustacea show a notable lack of stability. Here, the phylogeny is estimated with expanded taxon sampling, particularly of malacostracans. We show that small changes in taxon sampling have large impacts on phylogenetic estimation. By analyzing identical orthologs between two slightly different taxon sets, we show that the differences in the resulting topologies are due primarily to the effects of taxon sampling on the phylogenetic reconstruction method. We compare trees resulting from our phylogenomic analyses with those from the literature to explore the large tree space of pancrustacean phylogenetic hypotheses and find that statistical topology tests reject the previously published trees in favor of the maximum likelihood trees produced here. Our results reject several clades including Caridoida, Eucarida, Multicrustacea, Vericrustacea, and Syncarida. Notably, we find Copepoda nested within Allotriocarida with high support and recover a novel relationship between decapods, euphausiids, and syncarids that we refer to as the Syneucarida. With denser taxon sampling, we find Stomatopoda sister to this latter clade, which we collectively name Stomatocarida, dividing Malacostraca into three clades: Leptostraca, Peracarida, and Stomatocarida. A new Bayesian divergence time estimation is conducted using 13 vetted fossils. We review our results in the context of other pancrustacean phylogenetic hypotheses and highlight 15 key taxa to sample in future studies.
Affiliation(s)
- James P Bernot
  - Department of Invertebrate Zoology, US National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
- Christopher L Owen
  - Systematic Entomology Laboratory, USDA-ARS, ℅ National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Joanna M Wolfe
  - Museum of Comparative Zoology and Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Kenneth Meland
  - Department of Biology, University of Bergen, Bergen, Norway
- Jørgen Olesen
  - Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
- Keith A Crandall
  - Department of Invertebrate Zoology, US National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
  - Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
2
Tessler M, Neumann JS, Kamm K, Osigus HJ, Eshel G, Narechania A, Burns JA, DeSalle R, Schierwater B. Phylogenomics and the first higher taxonomy of Placozoa, an ancient and enigmatic animal phylum. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.1016357] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
Abstract
Placozoa is an ancient phylum of extraordinarily unusual animals: minuscule, ameboid creatures that lack most fundamental animal features. Despite high genetic diversity, only recently have the second and third species been named. While prior genomic studies suffer from incomplete placozoan taxon sampling, we more than double the count with protein sequences from seven key genomes and produce the first nuclear phylogenomic reconstruction of all major placozoan lineages. This leads us to the first complete Linnaean taxonomic classification of Placozoa, over a century after its discovery: this may be the only time in the 21st century when an entire higher taxonomy for a whole animal phylum is formalized. Our classification establishes 2 new classes, 4 new orders, 3 new families, 1 new genus, and 1 new species, namely classes Polyplacotomia and Uniplacotomia; orders Polyplacotomea, Trichoplacea, Cladhexea, and Hoilungea; families Polyplacotomidae, Cladtertiidae, and Hoilungidae; and genus Cladtertia with species Cladtertia collaboinventa, nov. Our likelihood and gene content tree topologies refine the relationships determined in previous studies. Adding morphological data into our phylogenomic matrices suggests sponges (Porifera) as the sister to other animals, indicating that modest data addition shifts this node away from comb jellies (Ctenophora). Furthermore, by adding the first genomic protein data of the exceptionally distinct and branching Polyplacotoma mediterranea, we solidify its position as sister to all other placozoans, a divergence we estimate to be over 400 million years old. Yet even this deep split sits on a long branch to other animals, suggesting a bottleneck event followed by diversification. Ancestral state reconstructions indicate large shifts in gene content within Placozoa, with Hoilungia hongkongensis and its closest relatives having the most unique genetics.
3
Christian RW, Hewitt SL, Roalson EH, Dhingra A. Genome-Scale Characterization of Predicted Plastid-Targeted Proteomes in Higher Plants. Sci Rep 2020; 10:8281. [PMID: 32427841 PMCID: PMC7237471 DOI: 10.1038/s41598-020-64670-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/21/2019] [Accepted: 04/20/2020] [Indexed: 12/20/2022]
Abstract
Plastids are morphologically and functionally diverse organelles that are dependent on nuclear-encoded, plastid-targeted proteins for all biochemical and regulatory functions. However, how plastid proteomes vary temporally, spatially, and taxonomically has been historically difficult to analyze at a genome-wide scale using experimental methods. A bioinformatics workflow was developed and evaluated using a combination of fast and user-friendly subcellular prediction programs to maximize performance and accuracy for chloroplast transit peptides, and the technique was demonstrated on the predicted proteomes of 15 sequenced plant genomes. Gene family grouping was then performed in parallel using modified approaches of reciprocal best BLAST hits (RBH) and UCLUST. A total of 628 protein families were found to have conserved plastid targeting across angiosperm species using RBH, and 828 using UCLUST. However, thousands of clusters were also detected where only one species had predicted plastid targeting, most notably in Panicum virgatum, which had 1,458 proteins with species-unique targeting. An average of 45% overlap was found in plastid-targeted protein-coding gene families compared with Arabidopsis, but an additional 20% of proteins matched against the full Arabidopsis proteome, indicating a unique evolution of plastid targeting. Neofunctionalization through subcellular relocalization is known to impart novel biological functions but has not been described before on a genome-wide scale for the plastid proteome. Further work to correlate these predicted novel plastid-targeted proteins with transcript abundance and high-throughput proteomics will uncover unique aspects of plastid biology and shed light on how the plastid proteome has evolved to influence plastid morphology and biochemistry.
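The reciprocal best BLAST hit (RBH) criterion used for gene family grouping above can be illustrated with a short Python sketch. It is not the authors' modified RBH procedure: it assumes BLAST results for each search direction have already been parsed into hypothetical (query, subject, bitscore) tuples, and it keeps a pair only when each sequence is the other's top-scoring hit.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed input shape: (query_id, subject_id, bitscore) tuples, e.g. parsed
# from BLAST tabular output. All IDs below are hypothetical.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first one wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


a_vs_b = [("At1", "Pv1", 250.0), ("At1", "Pv2", 90.0), ("At2", "Pv2", 180.0)]
b_vs_a = [("Pv1", "At1", 245.0), ("Pv2", "At1", 200.0), ("Pv2", "At2", 170.0)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('At1', 'Pv1')]
```

Here At2 is not paired with Pv2 because Pv2's best hit back in the first proteome is At1; this mutual requirement is what makes RBH a conservative grouping criterion.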
Affiliation(s)
- Ryan W Christian
  - Department of Horticulture, Washington State University, Pullman, WA, USA
  - Molecular Plant Sciences Program, Washington State University, Pullman, WA, USA
- Seanna L Hewitt
  - Department of Horticulture, Washington State University, Pullman, WA, USA
  - Molecular Plant Sciences Program, Washington State University, Pullman, WA, USA
- Eric H Roalson
  - Molecular Plant Sciences Program, Washington State University, Pullman, WA, USA
  - School of Biological Sciences, Washington State University, Pullman, WA, USA
- Amit Dhingra
  - Department of Horticulture, Washington State University, Pullman, WA, USA
  - Molecular Plant Sciences Program, Washington State University, Pullman, WA, USA
4
Galpert D, Fernández A, Herrera F, Antunes A, Molina-Ruiz R, Agüero-Chapin G. Surveying alignment-free features for Ortholog detection in related yeast proteomes by using supervised big data classifiers. BMC Bioinformatics 2018; 19:166. [PMID: 29724166 PMCID: PMC5934817 DOI: 10.1186/s12859-018-2148-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 02/13/2017] [Accepted: 04/04/2018] [Indexed: 12/24/2022]
Abstract
BACKGROUND The development of new ortholog detection algorithms and the improvement of existing ones are of major importance in functional genomics. We have previously introduced a successful supervised pairwise ortholog classification approach implemented in a big data platform that considered several pairwise protein features and the low ortholog pair ratios found between two annotated proteomes (Galpert, D et al., BioMed Research International, 2015). The supervised models were built and tested using a Saccharomycete yeast benchmark dataset proposed by Salichos and Rokas (2011). Although several pairwise protein features were combined in that supervised big data approach, they were all, to some extent, alignment-based features, and the proposed algorithms were evaluated on a single test set. Here, we aim to evaluate the impact of alignment-free features on the performance of supervised models implemented in the Spark big data platform for pairwise ortholog detection in several related yeast proteomes. RESULTS The Spark Random Forest and Decision Trees with oversampling and undersampling techniques, built either with alignment-based similarity measures alone or with these measures combined with several alignment-free pairwise protein features, showed the highest classification performance for ortholog detection in three yeast proteome pairs. Although such supervised approaches outperformed traditional methods, there were no significant differences between the exclusive use of alignment-based similarity measures and their combination with alignment-free features, even within the twilight zone of the studied proteomes. Only when alignment-based and alignment-free features were combined in Spark Decision Trees with imbalance management could a higher success rate (98.71%) within the twilight zone be achieved for a yeast proteome pair that underwent a whole genome duplication. The feature selection study showed that alignment-based features were top-ranked for the best classifiers, while the runners-up were alignment-free features related to amino acid composition. CONCLUSIONS The incorporation of alignment-free features in supervised big data models did not significantly improve ortholog detection in yeast proteomes relative to the classification quality achieved with alignment-based similarity measures alone. However, the similarity of their classification performance to that of traditional ortholog detection methods encourages the evaluation of other alignment-free protein pair descriptors in future research.
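As an illustration of the kind of alignment-free descriptor the abstract mentions (features "related to amino acid composition"), the Python sketch below turns each protein into a 20-dimensional composition vector and scores a protein pair by the Euclidean distance between the vectors. The descriptor, the distance, and the function names are assumptions chosen for clarity, not the feature set used in the study; in a supervised setup such a value would be one column of a pair's feature vector alongside alignment-based scores.

```python
import math
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq: str) -> Dict[str, float]:
    """Relative frequency of each standard amino acid; other symbols are ignored."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in seq.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: n / total for aa, n in counts.items()}


def composition_distance(seq_a: str, seq_b: str) -> float:
    """Euclidean distance between composition vectors: one alignment-free pair feature."""
    ca, cb = composition(seq_a), composition(seq_b)
    return math.sqrt(sum((ca[aa] - cb[aa]) ** 2 for aa in AMINO_ACIDS))


print(composition_distance("MKVLA", "MKVLA"))  # 0.0
print(round(composition_distance("AAAA", "GGGG"), 4))  # 1.4142
```

No alignment is computed at any point, which is what makes such features cheap; the trade-off is that composition ignores residue order entirely.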
Affiliation(s)
- Deborah Galpert
  - Departamento de Ciencia de la Computación, Universidad Central "Marta Abreu" de Las Villas (UCLV), 54830, Santa Clara, Cuba
- Alberto Fernández
  - Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071, Granada, Spain
- Francisco Herrera
  - Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071, Granada, Spain
- Agostinho Antunes
  - CIIMAR/CIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Matosinhos, Porto, Portugal
  - Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
- Reinaldo Molina-Ruiz
  - Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), 54830, Santa Clara, Cuba
- Guillermin Agüero-Chapin
  - CIIMAR/CIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Matosinhos, Porto, Portugal
  - Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
  - Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), 54830, Santa Clara, Cuba
5
Kaur G, Guruprasad K, Temple BRS, Shirvanyants DG, Dokholyan NV, Pati PK. Structural complexity and functional diversity of plant NADPH oxidases. Amino Acids 2018; 50:79-94. [PMID: 29071531 PMCID: PMC6492275 DOI: 10.1007/s00726-017-2491-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 06/29/2017] [Accepted: 09/11/2017] [Indexed: 10/18/2022]
Abstract
Plant NADPH oxidases, also known as respiratory burst oxidase homologs (Rbohs), are a family of membrane-bound enzymes that play diverse roles in the defense response and morphogenetic processes via regulated generation of reactive oxygen species. Rbohs are associated with a variety of functions, although the reason for this is not clear. To evaluate, using bioinformatics, the possible mechanisms for the observed functional diversity within the plant kingdom, 127 Rboh protein sequences representing 26 plant species were analyzed. Multiple clusters with gene duplications were identified, some dicot-specific and some monocot-specific. The N-terminal sequences were observed to be highly variable. Conservation of the cysteine equivalent to Cys890 in the C-terminus of AtRbohD suggested that redox-based modifications such as S-nitrosylation may regulate the activity of other Rbohs. Three-dimensional models of the N-terminal domain of Rbohs from Arabidopsis thaliana and Oryza sativa were constructed, and molecular dynamics studies were carried out to study the role of Ca2+ in the folding of Rboh proteins. Certain mutations were identified that may affect the structure and function of plant NADPH oxidases, providing a rationale for further experimental validation.
Affiliation(s)
- Gurpreet Kaur
  - Department of Biotechnology, Guru Nanak Dev University, Amritsar, India
  - Bioinformatics, Centre for Cellular and Molecular Biology, Uppal Road, Hyderabad, India
  - Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, NC, USA
  - Max Planck Institute for Developmental Biology, Tuebingen, Germany
- Kunchur Guruprasad
  - Bioinformatics, Centre for Cellular and Molecular Biology, Uppal Road, Hyderabad, India
- Brenda R S Temple
  - R. L. Juliano Structural Bioinformatics Core Facility, School of Medicine, University of North Carolina, Chapel Hill, NC, USA
- David G Shirvanyants
  - Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, NC, USA
- Nikolay V Dokholyan
  - Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, NC, USA
- Pratap Kumar Pati
  - Department of Biotechnology, Guru Nanak Dev University, Amritsar, India
6
Battenberg K, Lee EK, Chiu JC, Berry AM, Potter D. OrthoReD: a rapid and accurate orthology prediction tool with low computational requirement. BMC Bioinformatics 2017. [PMID: 28633662 PMCID: PMC5479036 DOI: 10.1186/s12859-017-1726-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
Background Identifying orthologous genes is an initial step required for phylogenetics, and it is also a common strategy employed in functional genetics to find candidates for functionally equivalent genes across multiple species. At the same time, in silico orthology prediction tools often require large computational resources only available on computing clusters. Here we present OrthoReD, an open-source orthology prediction tool with accuracy comparable to published tools that requires only a desktop computer. The low computational resource requirement of OrthoReD is achieved by repeating orthology searches on one gene of interest at a time, thereby generating a reduced dataset to limit the scope of orthology search for each gene of interest. Results The output of OrthoReD was highly similar to the outputs of two other published orthology prediction tools, OrthologID and/or OrthoDB, for the three datasets tested, which represented three phyla with different ranges of species diversity and different numbers of genomes included. Median CPU time per gene for ortholog prediction by OrthoReD executed on a desktop computer was <15 min even for the largest dataset tested, which included all coding sequences of 100 bacterial species. Conclusions With high-throughput sequencing, unprecedented numbers of genes from non-model organisms are available, with an increasing need for clear information about their orthologies and/or functional equivalents in model organisms. OrthoReD is not only fast and accurate as an orthology prediction tool, but also gives researchers flexibility in the number of genes analyzed at a time, without requiring a high-performance computing cluster. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1726-5) contains supplementary material, which is available to authorized users.
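The one-gene-at-a-time strategy can be sketched as follows. The abstract does not say how OrthoReD builds each reduced dataset, so this Python sketch simply assumes a precomputed similarity score table and keeps the k best-scoring candidates for the gene of interest. `reduced_dataset`, `search_one_at_a_time`, `count_pairs`, and the cutoff `k` are hypothetical names, and `count_pairs` stands in for whatever expensive analysis would run on each subset.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical precomputed similarity scores: scores[gene][other_gene] -> score.
Scores = Dict[str, Dict[str, float]]


def reduced_dataset(gene: str, scores: Scores, k: int) -> List[str]:
    """The gene of interest plus its k highest-scoring candidates (ties broken by name)."""
    ranked = sorted(scores.get(gene, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene] + [other for other, _ in ranked[:k]]


def search_one_at_a_time(
    genes: Iterable[str],
    scores: Scores,
    k: int,
    analyse: Callable[[str, List[str]], int],
) -> Iterator[Tuple[str, int]]:
    """Run the (expensive) analysis on a small per-gene subset, one gene at a time."""
    for gene in genes:
        yield gene, analyse(gene, reduced_dataset(gene, scores, k))


def count_pairs(gene: str, subset: List[str]) -> int:
    """Stand-in for the real analysis: number of pairwise comparisons in the subset."""
    return len(subset) * (len(subset) - 1) // 2


scores = {"g1": {"g2": 0.9, "g3": 0.4, "g4": 0.8}, "g2": {"g1": 0.9, "g4": 0.2}}
print(reduced_dataset("g1", scores, 2))  # ['g1', 'g2', 'g4']
print(dict(search_one_at_a_time(["g1", "g2"], scores, 2, count_pairs)))  # {'g1': 3, 'g2': 3}
```

In this sketch, the cost of each call to the analysis step depends on k rather than on the total number of genes, which illustrates why limiting the search scope per gene reduces memory and CPU requirements.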
Affiliation(s)
- Kai Battenberg
  - Department of Plant Sciences, University of California, Davis, CA, USA
- Ernest K Lee
  - Department of Entomology and Nematology, University of California, Davis, CA, USA
- Joanna C Chiu
  - Department of Entomology and Nematology, University of California, Davis, CA, USA
- Alison M Berry
  - Department of Plant Sciences, University of California, Davis, CA, USA
- Daniel Potter
  - Department of Plant Sciences, University of California, Davis, CA, USA
7
Clusterflock: a flocking algorithm for isolating congruent phylogenomic datasets. Gigascience 2016; 5:44. [PMID: 27776538 PMCID: PMC5078944 DOI: 10.1186/s13742-016-0152-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 07/17/2015] [Accepted: 10/12/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND Collective animal behavior, such as the flocking of birds or the shoaling of fish, has inspired a class of algorithms designed to optimize distance-based clusters in various applications, including document analysis and DNA microarrays. In a flocking model, individual agents respond only to their immediate environment and move according to a few simple rules. After several iterations the agents self-organize, and clusters emerge without the need for partitional seeds. In addition to its unsupervised nature, flocking offers several computational advantages, including the potential to reduce the number of required comparisons. FINDINGS In the tool presented here, Clusterflock, we have implemented a flocking algorithm designed to locate groups (flocks) of orthologous gene families (OGFs) that share an evolutionary history. Pairwise distances that measure phylogenetic incongruence between OGFs guide flock formation. We tested this approach on several simulated datasets by varying the number of underlying topologies, the proportion of missing data, and evolutionary rates, and show that in datasets containing high levels of missing data and rate heterogeneity, Clusterflock outperforms other well-established clustering techniques. We also verified its utility on a known, large-scale recombination event in Staphylococcus aureus. By isolating sets of OGFs with divergent phylogenetic signals, we were able to pinpoint the recombined region without forcing a pre-determined number of groupings or defining a pre-determined incongruence threshold. CONCLUSIONS Clusterflock is an open-source tool that can be used to discover horizontally transferred genes, recombined areas of chromosomes, and the phylogenetic 'core' of a genome. Although we used it here in an evolutionary context, it is generalizable to any clustering problem. Users can write extensions to calculate any distance metric on the unit interval, and can use these distances to 'flock' any type of data.
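Clusterflock accepts any distance on the unit interval. One common measure of phylogenetic incongruence between two gene family trees on the same taxon set is the normalized Robinson-Foulds distance: the number of internal bipartitions found in only one of the two trees, divided by the total number of internal bipartitions in both. The Python sketch below computes it from trees given as lists of splits; it is an illustrative metric, not Clusterflock's built-in distance or its extension interface, and the tree encoding (one side of each internal edge as a set of leaf names) is an assumption.

```python
from typing import FrozenSet, Iterable

Split = FrozenSet[str]


def canonical(side: Iterable[str], taxa: FrozenSet[str]) -> Split:
    """Encode an unrooted split by whichever side excludes the alphabetically first taxon."""
    part = frozenset(side)
    return taxa - part if min(taxa) in part else part


def normalized_rf(
    splits_a: Iterable[Iterable[str]], splits_b: Iterable[Iterable[str]], taxa: FrozenSet[str]
) -> float:
    """Robinson-Foulds distance scaled to [0, 1]: |A xor B| / (|A| + |B|)."""
    a = {canonical(s, taxa) for s in splits_a}
    b = {canonical(s, taxa) for s in splits_b}
    if not a and not b:
        return 0.0
    return len(a ^ b) / (len(a) + len(b))


taxa = frozenset("ABCDE")
tree_1 = [{"A", "B"}, {"D", "E"}]  # ((A,B),C,(D,E)): internal splits only
tree_2 = [{"A", "C"}, {"D", "E"}]  # ((A,C),B,(D,E))
print(normalized_rf(tree_1, tree_1, taxa))  # 0.0
print(normalized_rf(tree_1, tree_2, taxa))  # 0.5
```

Canonicalizing each split to one fixed side matters because an unrooted edge can be written from either end; without it, identical trees could appear to disagree.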
8
Planet PJ, Narechania A, Chen L, Mathema B, Boundy S, Archer G, Kreiswirth B. Architecture of a Species: Phylogenomics of Staphylococcus aureus. Trends Microbiol 2016; 25:153-166. [PMID: 27751626 DOI: 10.1016/j.tim.2016.09.009] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 05/27/2016] [Revised: 09/07/2016] [Accepted: 09/22/2016] [Indexed: 12/11/2022]
Abstract
A deluge of whole-genome sequencing has begun to give insights into the patterns and processes of microbial evolution, but genome sequences have accrued in a haphazard manner, with biased sampling of natural variation that is driven largely by medical and epidemiological priorities. For instance, there is a strong bias for sequencing epidemic lineages of methicillin-resistant Staphylococcus aureus (MRSA) over sensitive isolates (methicillin-sensitive S. aureus: MSSA). As more diverse genomes are sequenced the emerging picture is of a highly subdivided species with a handful of relatively clonal groups (complexes) that, at any given moment, dominate in particular geographical regions. The establishment of hegemony of particular clones appears to be a dynamic process of successive waves of replacement of the previously dominant clone. Here we review the phylogenomic structure of a diverse range of S. aureus, including both MRSA and MSSA. We consider the utility of the concept of the 'core' genome and the impact of recombination and horizontal transfer. We argue that whole-genome surveillance of S. aureus populations could lead to better forecasting of antibiotic resistance and virulence of emerging clones, and a better understanding of the elusive biological factors that determine repeated strain replacement.
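The 'core' genome discussed above is commonly operationalized as the set of gene families present in every genome of a sample, with the remainder of the pan-genome called accessory. A minimal Python sketch on toy presence/absence sets follows; the isolate and family names are hypothetical and no real S. aureus data are implied.

```python
from typing import Dict, Set, Tuple


def core_and_accessory(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Core = families present in every genome; accessory = pan-genome minus core."""
    if not genomes:
        return set(), set()
    families = list(genomes.values())
    pan = set().union(*families)
    core = set.intersection(*families)
    return core, pan - core


# Toy presence/absence data; isolate and family names are hypothetical.
genomes = {
    "isolate_1": {"famA", "famB", "famC"},
    "isolate_2": {"famA", "famB"},
    "isolate_3": {"famA", "famB", "famD"},
}
core, accessory = core_and_accessory(genomes)
print(sorted(core), sorted(accessory))  # ['famA', 'famB'] ['famC', 'famD']
```

Because the core is an intersection, adding genomes can only keep it the same size or shrink it, so its apparent size depends on which lineages have been sequenced; this is one reason the sampling bias described above matters for the concept.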
Affiliation(s)
- Paul J Planet
  - Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY, USA
  - Department of Pediatrics, Division of Pediatric Infectious Diseases, Children's Hospital of Philadelphia & University of Pennsylvania, Philadelphia, PA, USA
- Apurva Narechania
  - Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY, USA
- Liang Chen
  - Public Health Research Institute Center, New Jersey Medical School, Rutgers, Newark, NJ, USA
- Barun Mathema
  - Public Health Research Institute Center, New Jersey Medical School, Rutgers, Newark, NJ, USA
  - Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, NY, USA
- Sam Boundy
  - Department of Internal Medicine, Virginia Commonwealth University School of Medicine, Richmond, VA, USA
- Gordon Archer
  - Department of Internal Medicine, Virginia Commonwealth University School of Medicine, Richmond, VA, USA
- Barry Kreiswirth
  - Public Health Research Institute Center, New Jersey Medical School, Rutgers, Newark, NJ, USA
9
Ballesteros JA, Hormiga G. A New Orthology Assessment Method for Phylogenomic Data: Unrooted Phylogenetic Orthology. Mol Biol Evol 2016; 33:2117-34. [PMID: 27189539 DOI: 10.1093/molbev/msw069] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Indexed: 12/12/2022]
Abstract
Current sequencing technologies are making available unprecedented amounts of genetic data for a large variety of species, including nonmodel organisms. Although many phylogenomic surveys spend considerable time finding orthologs in the wealth of sequence data, these results do not transcend the original study, and after being processed for specific phylogenetic purposes these orthologs do not become stable orthology hypotheses. We describe a procedure to detect and document the phylogenetic distribution of orthologs, allowing researchers to use this information to guide the selection of loci best suited to test specific evolutionary questions. At the core of this pipeline is a new phylogenetic orthology method that is neither affected by the position of the root nor requires explicit assignment of outgroups. We discuss the properties of this new orthology assessment method and exemplify its utility for phylogenomics using a small insect dataset. In addition, we exemplify the pipeline by identifying and documenting stable orthologs for the group of orb-weaving spiders (Araneoidea) using RNAseq data. The scripts used in this study, along with sample files and additional documentation, are available at https://github.com/ballesterus/UPhO.
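The abstract states that the method is unaffected by root position and needs no outgroups. One way to see why this is possible is sketched below in Python, under stated assumptions: each internal edge of an unrooted gene tree is treated as a split of the sequences into two sides, a side is accepted as a candidate orthogroup when no species appears in it twice, and only maximal candidates are kept. This is a simplified reading of the idea, not the UPhO scripts at the URL above; the `species|gene` ID convention and all function names are hypothetical, and the released code may handle details such as in-paralogs differently.

```python
from typing import FrozenSet, Iterable, List, Set


def species_of(seq_id: str) -> str:
    """Hypothetical ID convention: '<species>|<gene>'."""
    return seq_id.split("|", 1)[0]


def orthogroups_from_splits(
    splits: Iterable[Iterable[str]], leaves: FrozenSet[str], min_size: int = 2
) -> List[FrozenSet[str]]:
    """Keep split sides in which no species occurs twice, then retain the maximal ones."""
    candidates: Set[FrozenSet[str]] = set()
    for split in splits:
        side = frozenset(split)
        # Unrooted tree: both sides of every split are examined, so no root is needed.
        for part in (side, leaves - side):
            species = [species_of(seq) for seq in part]
            if len(part) >= min_size and len(species) == len(set(species)):
                candidates.add(part)
    maximal = [c for c in candidates if not any(c < other for other in candidates)]
    return sorted(maximal, key=sorted)


# Gene tree with one duplication: ((1a,2a),3a) vs ((1b,2b),3b), given as internal splits.
leaves = frozenset({"sp1|a", "sp2|a", "sp3|a", "sp1|b", "sp2|b", "sp3|b"})
splits = [{"sp1|a", "sp2|a", "sp3|a"}, {"sp1|a", "sp2|a"}, {"sp1|b", "sp2|b"}]
for group in orthogroups_from_splits(splits, leaves):
    print(sorted(group))
# ['sp1|a', 'sp2|a', 'sp3|a']
# ['sp1|b', 'sp2|b', 'sp3|b']
```

In the example, the duplication separates the a and b copies into two orthogroups without the tree ever being rooted, because both sides of every split are examined.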
Affiliation(s)
- Gustavo Hormiga
  - Department of Biological Sciences, The George Washington University
10
Baker RH, Narechania A, DeSalle R, Johns PM, Reinhardt JA, Wilkinson GS. Spermatogenesis Drives Rapid Gene Creation and Masculinization of the X Chromosome in Stalk-Eyed Flies (Diopsidae). Genome Biol Evol 2016; 8:896-914. [PMID: 26951781 PMCID: PMC4824122 DOI: 10.1093/gbe/evw043] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 12/26/2022]
Abstract
Throughout their evolutionary history, genomes acquire new genetic material that facilitates phenotypic innovation and diversification. Developmental processes associated with reproduction are particularly likely to involve novel genes. Abundant gene creation impacts the evolution of chromosomal gene content and general regulatory mechanisms such as dosage compensation. Numerous studies in model organisms have found complex and, at times, contradictory relationships among these genomic attributes, highlighting the need to examine these patterns in other systems characterized by abundant sexual selection. Therefore, we examined the association among novel gene creation, tissue-specific gene expression, and chromosomal gene content within stalk-eyed flies. Flies in this family are characterized by strong sexual selection and the presence of a newly evolved X chromosome. We generated RNA-seq transcriptome data from the testes for three species within the family and from seven additional tissues in the highly dimorphic species Teleopsis dalmanni. Analysis of dipteran gene orthology reveals dramatic testes-specific gene creation in stalk-eyed flies, involving numerous gene families that are highly conserved in other insect groups. Identification of X-linked genes for the three species indicates that the X chromosome arose prior to the diversification of the family. The most striking feature of this X chromosome is that it is highly masculinized, containing nearly twice as many testes-specific genes as expected based on its size. All the major processes that may drive differential sex chromosome gene content (creation of genes with male-specific expression, development of male-specific expression from pre-existing genes, and movement of genes with male-specific expression) are elevated on the X chromosome of T. dalmanni. This masculinization occurs despite evidence that testes-expressed genes do not achieve the same levels of gene expression on the X chromosome as they do on the autosomes.
Affiliation(s)
- Richard H Baker
  - Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY
- Apurva Narechania
  - Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY
- Rob DeSalle
  - Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY
- Philip M Johns
  - Life Sciences Department, Yale-NUS College, Singapore, Singapore
11
Cibrián-Jaramillo A, Barona-Gómez F. Increasing Metagenomic Resolution of Microbiome Interactions Through Functional Phylogenomics and Bacterial Sub-Communities. Front Genet 2016; 7:4. [PMID: 26904093 PMCID: PMC4748306 DOI: 10.3389/fgene.2016.00004] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 06/30/2015] [Accepted: 01/17/2016] [Indexed: 11/13/2022]
Abstract
The genomic composition of the microbiome and its relationship with the environment is an exciting open question in biology. Metagenomics is a useful tool in the discovery of previously unknown taxa, but its use to understand the functional and ecological capacities of the microbiome is limited until taxonomy and function are understood in the context of the community. We suggest that this can be achieved using a combined functional phylogenomics and co-culture-based experimental strategy that can increase our capacity to measure sub-community interactions. Functional phylogenomics can identify and partition the genome such that hidden gene functions and gene clusters with unique evolutionary signals are revealed. We can test these phylogenomic predictions using an experimental model based on sub-community populations that represent a subset of the diversity directly obtained from environmental samples. These populations increase the detection of mechanisms that drive functional forces in the assembly of the microbiome, in particular the role of metabolites from key taxa in community interactions. Our combined approach leverages the potential of metagenomics to address biological questions from ecological systems.
Affiliation(s)
- Angélica Cibrián-Jaramillo
  - Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Unidad de Genómica Avanzada, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav) Irapuato, Mexico
- Francisco Barona-Gómez
  - Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Unidad de Genómica Avanzada, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav) Irapuato, Mexico
12
Schierwater B, Holland PWH, Miller DJ, Stadler PF, Wiegmann BM, Wörheide G, Wray GA, DeSalle R. Never Ending Analysis of a Century Old Evolutionary Debate: “Unringing” the Urmetazoon Bell. Front Ecol Evol 2016. [DOI: 10.3389/fevo.2016.00005] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 11/13/2022]
|
13
Borowiec ML, Lee EK, Chiu JC, Plachetzki DC. Extracting phylogenetic signal and accounting for bias in whole-genome data sets supports the Ctenophora as sister to remaining Metazoa. BMC Genomics 2015; 16:987. [PMID: 26596625 PMCID: PMC4657218 DOI: 10.1186/s12864-015-2146-4] [Citation(s) in RCA: 87] [Impact Index Per Article: 9.7] [Received: 06/05/2015] [Accepted: 10/26/2015] [Indexed: 01/25/2023]
Abstract
BACKGROUND Understanding the phylogenetic relationships among major lineages of multicellular animals (the Metazoa) is a prerequisite for studying the evolution of complex traits such as nervous systems, muscle tissue, or sensory organs. Transcriptome-based phylogenies have dramatically improved our understanding of metazoan relationships in recent years, although several important questions remain. The branching order near the base of the tree, in particular the placement of the poriferan (sponges, phylum Porifera) and ctenophore (comb jellies, phylum Ctenophora) lineages, is one outstanding issue. Recent analyses have suggested that the comb jellies are sister to all remaining metazoan phyla, including sponges. This finding is surprising because it suggests that neurons and other complex traits, present in ctenophores and eumetazoans but absent in sponges or placozoans, either evolved twice in Metazoa or were independently, secondarily lost in the lineages leading to sponges and placozoans. RESULTS To address the question of basal metazoan relationships, we assembled a novel dataset comprising 1080 orthologous loci derived from 36 publicly available genomes representing major lineages of animals. From this large dataset we procured an optimized set of partitions with high phylogenetic signal for resolving metazoan relationships. This optimized data set is amenable to the most appropriate and computationally intensive analyses using site-heterogeneous models of sequence evolution. We also employed several strategies to examine the potential for long-branch attraction to bias our inferences. Our analyses strongly support the Ctenophora as the sister lineage to other Metazoa. We find no support for the traditional view uniting the ctenophores and Cnidaria. Our findings are supported by Bayesian comparisons of topological hypotheses, and we find no evidence that they are biased by long-branch attraction.
CONCLUSIONS Our study further clarifies relationships among early branching metazoan lineages. Our phylogeny supports the still-controversial position of ctenophores as sister group to all other metazoans. This study also provides a workflow and computational tools for minimizing systematic bias in genome-based phylogenetic analyses. Future studies of metazoan phylogeny will benefit from ongoing efforts to sequence the genomes of additional invertebrate taxa that will continue to inform our view of the relationships among the major lineages of animals.
Affiliation(s)
- Marek L Borowiec
- Department of Entomology and Nematology, University of California, Davis, USA.
- Ernest K Lee
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, USA.
- Joanna C Chiu
- Department of Entomology and Nematology, University of California, Davis, USA.
- David C Plachetzki
- Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, USA.
14
Galpert D, del Río S, Herrera F, Ancede-Gallardo E, Antunes A, Agüero-Chapin G. An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species. Biomed Res Int 2015; 2015:748681. [PMID: 26605337 PMCID: PMC4641943 DOI: 10.1155/2015/748681] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 04/07/2015] [Revised: 07/26/2015] [Accepted: 08/20/2015] [Indexed: 11/17/2022]
Abstract
Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership in conserved regions, and physicochemical profiles) is combined in a supervised pairwise ortholog detection approach to improve effectiveness given the low ratio of orthologs to all possible pairwise comparisons between two genomes. In this scenario, big data supervised classifiers that manage the imbalance between the ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with the RBH, RSD, and OMA algorithms using the following yeast genome pairs as benchmark datasets: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe. Because of the large amount of imbalanced data, building and testing the supervised model was only possible using big data supervised classifiers that manage imbalance. Evaluation metrics that take low ortholog ratios into account were applied. In terms of effectiveness, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because it considers gene pair features beyond alignment similarity and benefits from advances in big data supervised classification.
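The random-oversampling idea behind the best-performing pipeline above can be illustrated on a single machine. The sketch below is a simplification under stated assumptions, not the authors' MapReduce or Spark code: it duplicates randomly chosen examples of each under-represented class (here, ortholog pairs) until every class is as large as the largest one. The function name `random_oversample` and the two-number feature tuples are hypothetical.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Randomly duplicate minority-class examples until all classes are equal in size.

    features: list of per-pair feature vectors (any objects)
    labels:   list of class labels, e.g. 1 = ortholog pair, 0 = non-ortholog pair
    Returns new (features, labels) lists; the original examples come first.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest (majority) class
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        members = [x for x, lab in zip(features, labels) if lab == cls]
        # Draw with replacement from this class until it reaches the target size.
        for _ in range(target - n):
            out_x.append(rng.choice(members))
            out_y.append(cls)
    return out_x, out_y


# Hypothetical features: (normalized alignment score, length ratio)
X = [(0.9, 1.0), (0.1, 0.5), (0.2, 0.4), (0.15, 0.6), (0.05, 0.3)]
y = [1, 0, 0, 0, 0]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

With one ortholog pair and four non-ortholog pairs, the ortholog pair is resampled three times so that both classes contain four examples. In a real pipeline, oversampling is applied only to the training split, so that duplicated pairs do not leak into the evaluation data.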
Affiliation(s)
- Deborah Galpert
- Departamento de Ciencias de la Computación, Universidad Central “Marta Abreu” de Las Villas (UCLV), 54830 Santa Clara, Cuba
- Sara del Río
- Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071 Granada, Spain
- Francisco Herrera
- Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071 Granada, Spain
- Evys Ancede-Gallardo
- Centro de Bioactivos Químicos, Universidad Central “Marta Abreu” de Las Villas (UCLV), 54830 Santa Clara, Cuba
- Agostinho Antunes
- Centro Interdisciplinar de Investigação Marinha e Ambiental (CIMAR/CIIMAR), Universidade do Porto, Rua dos Bragas 177, 4050-123 Porto, Portugal
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Guillermin Agüero-Chapin
- Centro de Bioactivos Químicos, Universidad Central “Marta Abreu” de Las Villas (UCLV), 54830 Santa Clara, Cuba
- Centro Interdisciplinar de Investigação Marinha e Ambiental (CIMAR/CIIMAR), Universidade do Porto, Rua dos Bragas 177, 4050-123 Porto, Portugal
15
Li L, Ji G, Ye C, Shu C, Zhang J, Liang C. PlantOrDB: a genome-wide ortholog database for land plants and green algae. BMC Plant Biol 2015; 15:161. [PMID: 26112452 PMCID: PMC4481079 DOI: 10.1186/s12870-015-0531-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/01/2015] [Accepted: 05/21/2015] [Indexed: 05/07/2023]
Abstract
BACKGROUND Genes with different functions originally arise from ancestral genes through gene duplication, mutation, and functional recombination. It is widely accepted that orthologs are homologous genes that evolved through speciation events, whereas paralogs are homologous genes that resulted from gene duplication events. With the rapid increase of genomic data, identifying and distinguishing these genes among different species is becoming an important part of functional genomics research. DESCRIPTION Using 35 plant and 6 green algal genomes from Phytozome v9, we clustered 1,291,670 peptide sequences into 49,355 homologous gene families based on sequence similarity. For each gene family, we generated a peptide sequence alignment and phylogenetic tree, and identified the speciation/duplication events for every node within the tree. For each node, we also identified and highlighted diagnostic characters that facilitate appropriate addition of a new query sequence into the existing phylogenetic tree and sequence alignment of its best matched gene family. Based on a desired species or subgroup of all species, users can selectively view the phylogenetic tree, sequence alignment, and diagnostic characters for a given gene family. PlantOrDB not only allows users to identify orthologs or paralogs from phylogenetic trees, but also provides all orthologs identified with the Reciprocal Best Hit (RBH) pairwise alignment method. Users can upload their own sequences to find the best matched gene families, and visualize their query sequences within the relevant phylogenetic trees and sequence alignments. CONCLUSION PlantOrDB ( http://bioinfolab.miamioh.edu/plantordb ) is a genome-wide ortholog database for land plants and green algae. PlantOrDB offers highly interactive visualization, accurate query classification, and powerful search functions useful for functional genomic research.
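Reciprocal Best Hit (RBH), which PlantOrDB uses for its pairwise ortholog set, calls genes a and b orthologs when b is a's highest-scoring hit in the other genome and a is, in turn, b's highest-scoring hit. The following is a minimal Python sketch, assuming similarity scores (for example, BLAST bit scores) have already been computed in both search directions; the function names and the `(query, subject) -> score` table layout are hypothetical and not PlantOrDB's implementation.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: dict mapping (query, subject) -> similarity score (higher is better).
    Ties keep the first pair encountered.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)  # best B gene for each A gene
    reverse = best_hits(b_vs_a)  # best A gene for each B gene
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


# Hypothetical bit scores for two tiny genomes A and B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b1"): 180, ("a3", "b3"): 50}
b_vs_a = {("b1", "a1"): 198, ("b1", "a2"): 181, ("b2", "a1"): 149, ("b3", "a3"): 52}
print(sorted(reciprocal_best_hits(a_vs_b, b_vs_a)))  # [('a1', 'b1'), ('a3', 'b3')]
```

Two directional tables are used because searching A against B and B against A need not give identical scores. Here a2 is excluded: its best hit b1 prefers a1. Ties are resolved in favor of the first hit encountered, a simplification that real tools handle more carefully.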
Affiliation(s)
- Lei Li
- Department of Automation, Xiamen University, Fujian, 361005, China.
- Department of Biology, Miami University, Oxford, OH, 45056, USA.
- Guoli Ji
- Department of Automation, Xiamen University, Fujian, 361005, China.
- Innovation Center for Cell Signaling Network, Xiamen University, Xiamen, Fujian, 361005, China.
- Congting Ye
- Department of Automation, Xiamen University, Fujian, 361005, China.
- Department of Biology, Miami University, Oxford, OH, 45056, USA.
- Changlong Shu
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China.
- Jie Zhang
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China.
- Chun Liang
- Department of Biology, Miami University, Oxford, OH, 45056, USA.
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China.
16
Planet PJ, Diaz L, Kolokotronis SO, Narechania A, Reyes J, Xing G, Rincon S, Smith H, Panesso D, Ryan C, Smith DP, Guzman M, Zurita J, Sebra R, Deikus G, Nolan RL, Tenover FC, Weinstock GM, Robinson DA, Arias CA. Parallel Epidemics of Community-Associated Methicillin-Resistant Staphylococcus aureus USA300 Infection in North and South America. J Infect Dis 2015; 212:1874-82. [PMID: 26048971 DOI: 10.1093/infdis/jiv320] [Citation(s) in RCA: 87] [Impact Index Per Article: 9.7] [Received: 12/16/2014] [Accepted: 05/13/2015] [Indexed: 01/24/2023]
Abstract
BACKGROUND The community-associated methicillin-resistant Staphylococcus aureus (CA-MRSA) epidemic in the United States is attributed to the spread of the USA300 clone. An epidemic of CA-MRSA closely related to USA300 has occurred in northern South America (USA300 Latin-American variant, USA300-LV). Using phylogenomic analysis, we aimed to understand the relationships between these 2 epidemics. METHODS We sequenced the genomes of 51 MRSA clinical isolates collected between 1999 and 2012 from the United States, Colombia, Venezuela, and Ecuador. Phylogenetic analysis was used to infer the relationships and times since the divergence of the major clades. RESULTS Phylogenetic analyses revealed 2 dominant clades that segregated by geographical region, shared a putative common ancestor dated to 1975, and originated in 1989 in North America and in 1985 in South America. Emergence of these parallel epidemics coincides with the independent acquisition of the arginine catabolic mobile element (ACME) in North American isolates and of a novel copper and mercury resistance (COMER) mobile element in South American isolates. CONCLUSIONS Our results reveal the existence of 2 parallel USA300 epidemics that shared a recent common ancestor. The simultaneous rapid dissemination of these 2 epidemic clades suggests the presence of shared, potentially convergent adaptations that enhance fitness and ability to spread.
Affiliation(s)
- Paul J Planet
- Division of Pediatric Infectious Diseases, Department of Pediatrics, Columbia University, College of Physicians and Surgeons
- Sackler Institute for Comparative Genomics, American Museum of Natural History
- Lorena Diaz
- Molecular Genetics and Antimicrobial Resistance Unit, International Center for Microbial Genomics, Universidad El Bosque, Bogotá, Colombia
- Sergios-Orestis Kolokotronis
- Sackler Institute for Comparative Genomics, American Museum of Natural History
- Department of Biological Sciences, Fordham University, Bronx, New York
- Apurva Narechania
- Sackler Institute for Comparative Genomics, American Museum of Natural History
- Jinnethe Reyes
- Molecular Genetics and Antimicrobial Resistance Unit, International Center for Microbial Genomics, Universidad El Bosque, Bogotá, Colombia
- Galen Xing
- Division of Pediatric Infectious Diseases, Department of Pediatrics, Columbia University, College of Physicians and Surgeons
- Sandra Rincon
- Molecular Genetics and Antimicrobial Resistance Unit, International Center for Microbial Genomics, Universidad El Bosque, Bogotá, Colombia
- Hannah Smith
- Division of Pediatric Infectious Diseases, Department of Pediatrics, Columbia University, College of Physicians and Surgeons
- Diana Panesso
- Division of Infectious Diseases, Department of Internal Medicine
- Molecular Genetics and Antimicrobial Resistance Unit, International Center for Microbial Genomics, Universidad El Bosque, Bogotá, Colombia
- Chanelle Ryan
- Division of Pediatric Infectious Diseases, Department of Pediatrics, Columbia University, College of Physicians and Surgeons
- Dylan P Smith
- Molecular Genetics and Antimicrobial Resistance Unit, International Center for Microbial Genomics, Universidad El Bosque, Bogotá, Colombia
- Jeannete Zurita
- Hospital Vozandes, Pontificia Universidad Catolica, Quito, Ecuador
- Robert Sebra
- Genome Center, Mount Sinai Hospital, New York City
- Rathel L Nolan
- Division of Infectious Diseases, Department of Internal Medicine
- D Ashley Robinson
- Division of Infectious Diseases, Department of Microbiology, University of Mississippi Medical Center, Jackson
- Cesar A Arias
- Division of Infectious Diseases, Department of Internal Medicine
- Department of Microbiology and Molecular Genetics, University of Texas Medical School at Houston
- Molecular Genetics and Antimicrobial Resistance Unit, International Center for Microbial Genomics, Universidad El Bosque, Bogotá, Colombia
17
Murphy KA, Unruh TR, Zhou LM, Zalom FG, Shearer PW, Beers EH, Walton VM, Miller B, Chiu JC. Using comparative genomics to develop a molecular diagnostic for the identification of an emerging pest Drosophila suzukii. Bull Entomol Res 2015; 105:364-72. [PMID: 25804294 DOI: 10.1017/s0007485315000218] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 05/26/2023]
Abstract
Drosophila suzukii (Spotted Wing Drosophila) has recently become a serious invasive pest of fruit crops in the USA, Canada, and Europe, leading to substantial economic losses. D. suzukii is a direct pest, ovipositing directly into ripe or ripening fruits; in contrast, other Drosophilids utilize decaying or blemished fruits and are nuisance pests at worst. Immature stages of D. suzukii are difficult to differentiate from those of other Drosophilids, posing problems for research and for meeting quarantine restrictions designed to prevent the spread of this pest in fruit exports. Here we used a combined phylogenetic and bioinformatic approach to discover genetic markers suitable for a species diagnostic protocol for this agricultural pest. We describe a molecular diagnostic for rapid identification of a single D. suzukii larva using multiplex polymerase chain reaction (PCR). Our molecular diagnostic was validated for specificity using nine different Drosophila species, and for utility within the species using 19 D. suzukii populations from different geographical regions.
Affiliation(s)
- K A Murphy
- Department of Entomology and Nematology, College of Agricultural and Environmental Sciences, University of California, Davis, CA 95616, USA
- L M Zhou
- Department of Entomology and Nematology, College of Agricultural and Environmental Sciences, University of California, Davis, CA 95616, USA
- F G Zalom
- Department of Entomology and Nematology, College of Agricultural and Environmental Sciences, University of California, Davis, CA 95616, USA
- P W Shearer
- Mid-Columbia Agricultural and Extension Center, Oregon State University, Hood River, OR 97031, USA
- E H Beers
- Tree Fruit Research and Extension Center, Washington State University, Wenatchee, WA 98801, USA
- V M Walton
- Department of Horticulture, Oregon State University, Corvallis, OR 97331, USA
- B Miller
- Department of Horticulture, Oregon State University, Corvallis, OR 97331, USA
- J C Chiu
- Department of Entomology and Nematology, College of Agricultural and Environmental Sciences, University of California, Davis, CA 95616, USA
18
Yang Y, Smith SA. Orthology inference in nonmodel organisms using transcriptomes and low-coverage genomes: improving accuracy and matrix occupancy for phylogenomics. Mol Biol Evol 2014; 31:3081-92. [PMID: 25158799 PMCID: PMC4209138 DOI: 10.1093/molbev/msu245] [Citation(s) in RCA: 182] [Impact Index Per Article: 18.2] [Indexed: 12/23/2022]
Abstract
Orthology inference is central to phylogenomic analyses. Phylogenomic data sets commonly include transcriptomes and low-coverage genomes that are incomplete and contain errors and isoforms. These properties can severely violate the underlying assumptions of orthology inference with existing heuristics. We present a procedure that uses phylogenies for both homology and orthology assignment. The procedure first uses similarity scores to infer putative homologs that are then aligned, used to build phylogenies, and pruned of spurious branches caused by deep paralogs, misassembly, frameshifts, or recombination. These final homologs are then used to identify orthologs. We explore four alternative tree-based orthology inference approaches, two of which are new. These accommodate gene and genome duplications as well as gene tree discordance. We demonstrate these methods on three published data sets (the grape family, Hymenoptera, and millipedes) with divergence times ranging from approximately 100 to over 400 Ma. The procedure significantly increased the completeness and accuracy of the inferred homologs and orthologs. We also found that data sets that are more recently diverged and/or include more high-coverage genomes had more complete sets of orthologs. To explicitly evaluate sources of conflicting phylogenetic signal, we applied serial jackknife analyses of gene regions, keeping each locus intact. The methods described here can scale to over 100 taxa. They have been implemented in Python with independent scripts for each step, making it easy to modify them or incorporate them into existing pipelines. All scripts are available from https://bitbucket.org/yangya/phylogenomic_dataset_construction.
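The central move in tree-based orthology inference is to cut a homolog tree wherever a taxon appears more than once, since a repeated taxon signals gene duplication. The Python sketch below illustrates only that idea and is not one of the four procedures implemented in the authors' scripts. It assumes a rooted tree written as nested tuples and leaf labels of the hypothetical form `taxon@gene_id`, and it returns every maximal clade in which each taxon occurs at most once.

```python
def leaf_taxa(tree):
    """List the taxon of every leaf; leaves are 'taxon@gene_id' strings."""
    if isinstance(tree, str):
        return [tree.split("@", 1)[0]]
    return [taxon for child in tree for taxon in leaf_taxa(child)]


def paralog_free_clades(tree, min_taxa=2):
    """Return the maximal clades in which no taxon occurs more than once.

    tree: a leaf string or a tuple of child subtrees (rooted, nested tuples).
    Clades with fewer than min_taxa leaves are discarded.
    """
    taxa = leaf_taxa(tree)
    if len(taxa) == len(set(taxa)):
        # No duplicated taxon anywhere below this node: keep the whole clade.
        return [tree] if len(taxa) >= min_taxa else []
    # A taxon repeats, implying a duplication: recurse into each child.
    clades = []
    for child in tree:
        clades.extend(paralog_free_clades(child, min_taxa))
    return clades


# Hypothetical homolog tree: a duplication produced two A+B copies.
homolog_tree = ((("A@1", "B@1"), ("A@2", "B@2")), "C@1")
print(paralog_free_clades(homolog_tree))  # [('A@1', 'B@1'), ('A@2', 'B@2')]
```

In the example, the root contains two copies each of A and B, so it is split; the two duplicated A+B clades are returned as separate ortholog groups, while the lone C leaf falls below the minimum size. A working method must also handle rooting, outgroups, and the spurious branches that the procedure described above prunes.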
Affiliation(s)
- Ya Yang
- Department of Ecology & Evolutionary Biology, University of Michigan, Ann Arbor
- Stephen A Smith
- Department of Ecology & Evolutionary Biology, University of Michigan, Ann Arbor
19
Alexeyenko A, Lindberg J, Pérez-Bercoff A, Sonnhammer ELL. Overview and comparison of ortholog databases. Drug Discov Today Technol 2014; 3:137-43. [PMID: 24980400 DOI: 10.1016/j.ddtec.2006.06.002] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 11/19/2022]
Abstract
Orthologs are an indispensable bridge to transfer biological knowledge between species, from protein annotations to sophisticated disease models. However, orthology assignment is not trivial. A large number of resources now exist, each with its own idiosyncrasies. The goal of this review is to compare their contents and clarify which database is best suited for a given task.
Affiliation(s)
- Andrey Alexeyenko
- Stockholm Bioinformatics Center, Albanova, Stockholm University, SE-106 91, Stockholm, Sweden
- Julia Lindberg
- Stockholm Bioinformatics Center, Albanova, Stockholm University, SE-106 91, Stockholm, Sweden
- Asa Pérez-Bercoff
- Stockholm Bioinformatics Center, Albanova, Stockholm University, SE-106 91, Stockholm, Sweden
- Erik L L Sonnhammer
- Stockholm Bioinformatics Center, Albanova, Stockholm University, SE-106 91, Stockholm, Sweden
20
Parker D, Planet PJ, Soong G, Narechania A, Prince A. Induction of type I interferon signaling determines the relative pathogenicity of Staphylococcus aureus strains. PLoS Pathog 2014; 10:e1003951. [PMID: 24586160 PMCID: PMC3930619 DOI: 10.1371/journal.ppat.1003951] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.7] [Received: 09/03/2013] [Accepted: 01/10/2014] [Indexed: 12/31/2022]
Abstract
The tremendous success of S. aureus as a human pathogen has been explained primarily by its array of virulence factors that enable the organism to evade host immunity. Perhaps equally important, but less well understood, is the role of the intensity of the host response in determining the extent of pathology induced by S. aureus infection, particularly in the pathogenesis of pneumonia. We compared the pathogenesis of infection caused by two phylogenetically and epidemiologically distinct strains of S. aureus whose behavior in humans has been well characterized. Induction of the type I IFN cascade by strain 502A, through a NOD2-IRF5 pathway, was the major factor in causing severe pneumonia and death in a murine model of pneumonia and was associated with autolysis and release of peptidoglycan. In contrast to USA300, 502A was readily eliminated from epithelial surfaces in vitro. Nonetheless, 502A caused significantly more tissue damage because the organisms that did invade systemically triggered type I IFN responses, and this damage was ameliorated in Ifnar⁻/⁻ mice. The ability of USA300 to cause invasive infection appears to depend on its resistance to eradication from epithelial surfaces rather than on production of specific toxins. Our studies illustrate the important and highly variable role of type I IFN signaling within a species and suggest that targeted immunomodulation of specific innate immune signaling cascades may be useful to prevent the excessive morbidity associated with S. aureus pneumonia.
Affiliation(s)
- Dane Parker
- Department of Pediatrics, Columbia University, New York, New York, United States of America
- Paul J. Planet
- Department of Pediatrics, Columbia University, New York, New York, United States of America
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, United States of America
- Grace Soong
- Department of Pediatrics, Columbia University, New York, New York, United States of America
- Apurva Narechania
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, United States of America
- Alice Prince
- Department of Pediatrics, Columbia University, New York, New York, United States of America
21
Abstract
Drosophila suzukii Matsumura (spotted wing drosophila) has recently become a serious pest of a wide variety of fruit crops in the United States as well as in Europe, leading to substantial yearly crop losses. To enable basic and applied research of this important pest, we sequenced the D. suzukii genome to obtain a high-quality reference sequence. Here, we discuss the basic properties of the genome and transcriptome and describe patterns of genome evolution in D. suzukii and its close relatives. Our analyses and genome annotations are presented in a web portal, SpottedWingFlyBase, to facilitate public access.
22
Christin PA, Spriggs E, Osborne CP, Stromberg CAE, Salamin N, Edwards EJ. Molecular Dating, Evolutionary Rates, and the Age of the Grasses. Syst Biol 2013; 63:153-65. [DOI: 10.1093/sysbio/syt072] [Citation(s) in RCA: 137] [Impact Index Per Article: 12.5] [Indexed: 02/07/2023]
23
Johnson B, Borowiec M, Chiu J, Lee E, Atallah J, Ward P. Phylogenomics Resolves Evolutionary Relationships among Ants, Bees, and Wasps. Curr Biol 2013; 23:2058-62. [DOI: 10.1016/j.cub.2013.08.050] [Citation(s) in RCA: 120] [Impact Index Per Article: 10.9] [Received: 06/26/2013] [Revised: 08/01/2013] [Accepted: 08/21/2013] [Indexed: 12/30/2022]
24
Singh R, Ong-Abdullah M, Low ETL, Manaf MAA, Rosli R, Nookiah R, Ooi LCL, Ooi SE, Chan KL, Halim MA, Azizi N, Nagappan J, Bacher B, Lakey N, Smith SW, He D, Hogan M, Budiman MA, Lee EK, DeSalle R, Kudrna D, Goicoechea JL, Wing RA, Wilson RK, Fulton RS, Ordway JM, Martienssen RA, Sambanthamurthi R. Oil palm genome sequence reveals divergence of interfertile species in Old and New worlds. Nature 2013; 500:335-9. [PMID: 23883927 PMCID: PMC3929164 DOI: 10.1038/nature12309] [Citation(s) in RCA: 272] [Impact Index Per Article: 24.7] [Received: 09/30/2012] [Accepted: 05/16/2013] [Indexed: 11/09/2022]
Abstract
Oil palm is the most productive oil-bearing crop. Planted on only 5% of the total vegetable oil acreage, palm oil accounts for 33% of vegetable oil and 45% of edible oil worldwide, but increased cultivation competes with dwindling rainforest reserves. We report the 1.8 gigabase (Gb) genome sequence of the African oil palm Elaeis guineensis, the predominant source of worldwide oil production. A total of 1.535 Gb of assembled sequence and transcriptome data from 30 tissue types were used to predict at least 34,802 genes, including oil biosynthesis genes and homologues of WRINKLED1 (WRI1) and other transcriptional regulators, which are highly expressed in the kernel. We also report the draft sequence of the South American oil palm Elaeis oleifera, which has the same number of chromosomes (2n=32) and produces fertile interspecific hybrids with E. guineensis, but appears to have diverged in the New World. Segmental duplications of chromosome arms define the palaeotetraploid origin of palm trees. The oil palm sequence enables the discovery of genes for important traits as well as somaclonal epigenetic alterations that restrict the use of clones in commercial plantings, and thus helps achieve sustainability for biofuels and edible oils, reducing the rainforest footprint of this tropical plantation crop.
Affiliation(s)
- Rajinder Singh
- Malaysian Palm Oil Board, 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia.
25
Abstract
BACKGROUND The comparison of relative gene orders between two genomes offers deep insights into functional correlations of genes and the evolutionary relationships between the corresponding organisms. Methods for gene order analyses often require prior knowledge of homologies between all genes of the genomic dataset. Since such information is hard to obtain, it is common to predict homologous groups based on sequence similarity. These hypothetical groups of homologous genes are called gene families. RESULTS This manuscript promotes a new branch of gene order studies in which prior assignment of gene families is not required. As a case study, we present a new similarity measure between pairs of genomes that is related to the breakpoint distance. We propose an exact and a heuristic algorithm for its computation. We evaluate our methods on a dataset comprising 12 γ-proteobacteria from the literature. CONCLUSIONS In evaluating our algorithms, we show that the exact algorithm is suitable for computations on small genomes. Moreover, the results of our heuristic are close to those of the exact algorithm. In general, we demonstrate that gene order studies can be improved by direct, gene family assignment-free comparisons.
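For contrast with the family-free measure proposed above, the classical breakpoint distance presupposes exactly the gene-family assignment the authors avoid: every gene must have a known one-to-one counterpart in the other genome. The following is a minimal Python sketch, assuming unsigned, linear gene orders with identical gene content; the function names are hypothetical.

```python
def adjacencies(order):
    """Unordered pairs of neighboring genes in a linear, unsigned gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def breakpoint_distance(genome_a, genome_b):
    """Count adjacencies of genome_a that are absent from genome_b.

    Assumes both genomes contain the same genes, each exactly once, i.e. that
    gene families (one-to-one correspondences) are already known.
    """
    if sorted(genome_a) != sorted(genome_b):
        raise ValueError("genomes must have identical gene content")
    return len(adjacencies(genome_a) - adjacencies(genome_b))


print(breakpoint_distance([1, 2, 3, 4, 5], [1, 3, 2, 4, 5]))  # 2
```

Here the two orders differ by one inversion of the segment (2, 3), which breaks the adjacencies {1, 2} and {3, 4}, giving a distance of 2. Reading a gene order in reverse leaves the distance at 0, because adjacencies are stored as unordered pairs.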
Affiliation(s)
- Daniel Doerr
- Genome Informatics, Faculty of Technology, Center for Biotechnology (CeBiTec), Bielefeld University, Germany
- Institute for Bioinformatics, Center for Biotechnology (CeBiTec), Bielefeld University, Germany
- Annelyse Thévenin
- Genome Informatics, Faculty of Technology, Center for Biotechnology (CeBiTec), Bielefeld University, Germany
- Institute for Bioinformatics, Center for Biotechnology (CeBiTec), Bielefeld University, Germany
- Jens Stoye
- Genome Informatics, Faculty of Technology, Center for Biotechnology (CeBiTec), Bielefeld University, Germany
- Institute for Bioinformatics, Center for Biotechnology (CeBiTec), Bielefeld University, Germany
26
Fusari CM, Di Rienzo JA, Troglia C, Nishinakamasu V, Moreno MV, Maringolo C, Quiroz F, Álvarez D, Escande A, Hopp E, Heinz R, Lia VV, Paniego NB. Association mapping in sunflower for Sclerotinia Head Rot resistance. BMC Plant Biol 2012; 12:93. [PMID: 22708963 PMCID: PMC3778846 DOI: 10.1186/1471-2229-12-93] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 11/30/2011] [Accepted: 05/21/2012] [Indexed: 05/04/2023]
Abstract
BACKGROUND Sclerotinia Head Rot (SHR) is one of the most damaging diseases of sunflower in Europe, Argentina, and the USA, causing average yield reductions of 10 to 20% and total production loss under environmental conditions favorable to the pathogen. Association Mapping (AM) is a promising choice for Quantitative Trait Locus (QTL) mapping, as it detects relationships between phenotypic variation and gene polymorphisms in existing germplasm without the development of mapping populations. This article reports the identification of QTL for resistance to SHR based on candidate gene AM. RESULTS A collection of 94 sunflower inbred lines was tested for SHR under field conditions using assisted inoculation with the fungal pathogen Sclerotinia sclerotiorum. Given that no biological mechanisms or biochemical pathways have been clearly identified for SHR, 43 candidate genes were selected based on previous transcript profiling studies in sunflower and Brassica napus infected with S. sclerotiorum. Associations between SHR incidence and haplotype polymorphisms in 16 candidate genes were tested using Mixed Linear Models (MLM) that account for population structure and kinship relationships. This approach allowed detection of a significant association between the candidate gene HaRIC_B and SHR incidence (P < 0.01), accounting for an SHR incidence reduction of about 20%. CONCLUSIONS These results suggest that AM will be useful in dissecting other complex traits in sunflower, thus providing a valuable tool to assist in crop breeding.
Affiliation(s)
- Corina M Fusari
- Instituto de Biotecnología, Centro Investigación en Ciencias Veterinarias y Agronómicas (CICVyA), Instituto Nacional de Tecnología Agropecuaria (INTA), 1686, Hurlingham, Buenos Aires, Argentina
- Julio A Di Rienzo
- Cátedra de Estadística y Biometría, Facultad de Ciencias Agropecuarias, Universidad Nacional de Córdoba, 5000, Córdoba, Argentina
- Carolina Troglia
- Estación Experimental Agropecuaria Balcarce, INTA, 7620, Balcarce, Buenos Aires, Argentina
- Verónica Nishinakamasu
- Instituto de Biotecnología, Centro Investigación en Ciencias Veterinarias y Agronómicas (CICVyA), Instituto Nacional de Tecnología Agropecuaria (INTA), 1686, Hurlingham, Buenos Aires, Argentina
- María Valeria Moreno
- Estación Experimental Agropecuaria Manfredi, INTA, 5988, Manfredi, Córdoba, Argentina
- Carla Maringolo
- Estación Experimental Agropecuaria Balcarce, INTA, 7620, Balcarce, Buenos Aires, Argentina
- Facundo Quiroz
- Estación Experimental Agropecuaria Balcarce, INTA, 7620, Balcarce, Buenos Aires, Argentina
- Daniel Álvarez
- Estación Experimental Agropecuaria Manfredi, INTA, 5988, Manfredi, Córdoba, Argentina
- Alberto Escande
- Estación Experimental Agropecuaria Balcarce, INTA, 7620, Balcarce, Buenos Aires, Argentina
- Esteban Hopp
- Instituto de Biotecnología, Centro Investigación en Ciencias Veterinarias y Agronómicas (CICVyA), Instituto Nacional de Tecnología Agropecuaria (INTA), 1686, Hurlingham, Buenos Aires, Argentina
- Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
- Ruth Heinz
- Instituto de Biotecnología, Centro Investigación en Ciencias Veterinarias y Agronómicas (CICVyA), Instituto Nacional de Tecnología Agropecuaria (INTA), 1686, Hurlingham, Buenos Aires, Argentina
- Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
- Verónica V Lia
- Instituto de Biotecnología, Centro Investigación en Ciencias Veterinarias y Agronómicas (CICVyA), Instituto Nacional de Tecnología Agropecuaria (INTA), 1686, Hurlingham, Buenos Aires, Argentina
- Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
- Norma B Paniego
- Instituto de Biotecnología, Centro Investigación en Ciencias Veterinarias y Agronómicas (CICVyA), Instituto Nacional de Tecnología Agropecuaria (INTA), 1686, Hurlingham, Buenos Aires, Argentina
27
Song G, Riemer C, Dickins B, Kim HL, Zhang L, Zhang Y, Hsu CH, Hardison RC, NISC Comparative Sequencing Program, Green ED, Miller W. Revealing mammalian evolutionary relationships by comparative analysis of gene clusters. Genome Biol Evol 2012; 4:586-601. [PMID: 22454131 PMCID: PMC3342878 DOI: 10.1093/gbe/evs032] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Accepted: 03/19/2012] [Indexed: 12/13/2022]
Abstract
Many software tools for comparative analysis of genomic sequence data have been released in recent decades. Despite this, it remains challenging to determine evolutionary relationships in gene clusters due to their complex histories involving duplications, deletions, inversions, and conversions. One concept describing these relationships is orthology. Orthologs derive from a common ancestor by speciation, in contrast to paralogs, which derive from duplication. Discriminating orthologs from paralogs is a necessary step in most multispecies sequence analyses, but doing so accurately is impeded by the occurrence of gene conversion events. We propose a refined method of orthology assignment based on two paradigms for interpreting its definition: by genomic context or by sequence content. X-orthology (based on context) traces orthology resulting from speciation and duplication only, while N-orthology (based on content) includes the influence of conversion events. We developed a computational method for automatically mapping both types of orthology on a per-nucleotide basis in gene cluster regions studied by comparative sequencing, and we make this mapping accessible by visualizing the output. All of these steps are incorporated into our newly extended CHAP 2 package. We evaluate our method using both simulated data and real gene clusters (including the well-characterized α-globin and β-globin clusters). We also illustrate use of CHAP 2 by analyzing four more loci: CCL (chemokine ligand), IFN (interferon), CYP2abf (part of cytochrome P450 family 2), and KIR (killer cell immunoglobulin-like receptors). These new methods facilitate and extend our understanding of evolution at these and other loci by adding automated accurate evolutionary inference to the biologist's toolkit. The CHAP 2 package is freely available from http://www.bx.psu.edu/miller_lab.
Affiliation(s)
- Giltae Song
- Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, PA, USA.
28
Pentony MM, Winters P, Penfold-Brown D, Drew K, Narechania A, DeSalle R, Bonneau R, Purugganan MD. The plant proteome folding project: structure and positive selection in plant protein families. Genome Biol Evol 2012; 4:360-71. [PMID: 22345424 PMCID: PMC3318447 DOI: 10.1093/gbe/evs015] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/16/2022]
Abstract
Despite its importance, relatively little is known about the relationship between the structure, function, and evolution of proteins, particularly in land plant species. We have developed a database with predicted protein domains for five plant proteomes (http://pfp.bio.nyu.edu) and used both protein structural fold recognition and de novo Rosetta-based protein structure prediction to predict protein structure for Arabidopsis and rice proteins. Based on sequence similarity, we have identified ∼15,000 orthologous/paralogous protein family clusters among these species and used codon-based models to predict positive selection in protein evolution within 175 of these sequence clusters. Our results show that codons that display positive selection appear to be less frequent in helical and strand regions and are overrepresented in amino acid residues that are associated with a change in protein secondary structure. As in other organisms, disordered protein regions also appear to have more selected sites. Structural information provides new functional insights into specific plant proteins and allows us to map positively selected amino acid sites onto protein structures and view these sites in a structural and functional context.
Affiliation(s)
- M M Pentony
- Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
29
Sarkar IN. A vector space model approach to identify genetically related diseases. J Am Med Inform Assoc 2012; 19:249-54. [PMID: 22227640 DOI: 10.1136/amiajnl-2011-000480] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 11/04/2022]
Abstract
OBJECTIVE The relationship between diseases and their causative genes can be complex, especially in the case of polygenic diseases. A further challenge is that many genes may be causally related to multiple diseases. This study explored the relationship between diseases by adapting an approach pioneered in information retrieval: vector space models. MATERIALS AND METHODS A vector space model approach was developed that bridges gene-disease knowledge inferred across three knowledge bases: Online Mendelian Inheritance in Man, GenBank, and Medline. The approach was then used to identify potentially related diseases for two target diseases: Alzheimer disease and Prader-Willi syndrome. RESULTS For both Alzheimer disease and Prader-Willi syndrome, a set of plausible diseases was identified that may warrant further exploration. DISCUSSION This study furthers seminal work by Swanson et al. that demonstrated the potential of mining literature for putative correlations. Using a vector space modeling approach, information from both the biomedical literature and genomic resources (like GenBank) can be combined to identify putative correlations of interest. To this end, the relevance of the diseases predicted in this study was validated against supporting literature. CONCLUSION The results of this study suggest that a vector space model approach may be a useful means to identify potential relationships between complex diseases, and thereby enable the coordination of gene-based findings across multiple complex diseases.
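The central comparison in a vector space model can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: it assumes each disease is represented as a sparse vector of gene weights, and the gene symbols, weights and the helper names `cosine` and `rank_related` are hypothetical. Other diseases are ranked by cosine similarity to a target disease.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors keyed by gene symbol."""
    dot = sum(w * v[g] for g, w in u.items() if g in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_related(target: str, profiles: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank every other disease by cosine similarity to the target's gene profile."""
    t = profiles[target]
    scores = [(name, cosine(t, vec)) for name, vec in profiles.items() if name != target]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy, made-up gene-weight profiles (illustration only, not data from the study).
profiles = {
    "disease_A": {"GENE1": 2.0, "GENE2": 1.0},
    "disease_B": {"GENE1": 1.0, "GENE2": 1.0},
    "disease_C": {"GENE3": 3.0},
}
print(rank_related("disease_A", profiles))
```

Disease B shares both weighted genes with disease A and ranks first; disease C shares none and scores 0. How the weights are derived from OMIM, GenBank and Medline is not specified here and would determine which diseases surface.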
Affiliation(s)
- Indra Neil Sarkar
- Center for Clinical and Translational Science, University of Vermont, Burlington, Vermont 05405, USA.
30
Abstract
BACKGROUND A phylogenetic tree, showing ancestral relations among organisms, is commonly represented as a rooted tree with sets of bifurcating branches (dichotomies) for simplicity, although polytomies (multifurcating branches) may reflect evolutionary relationships more accurately. To represent the true evolutionary relationships, it is important to systematically identify the polytomies in a bifurcating tree and generate a taxonomy-compatible multifurcating tree. For this purpose we propose a novel approach, "PolyPhy", which classifies the bifurcating branches of a phylogenetic tree into dichotomies and polytomies by considering distances among genomes and tree topological properties. RESULTS PolyPhy employs a machine learning technique, a BLR (Bayesian logistic regression) classifier, to identify which bifurcating subtrees in the trees produced by ComPhy are possible polytomies. In addition to considering genome-scale distances between all pairs of species, PolyPhy also takes into account the different tree-topology properties of dichotomies and polytomies, such as long-branch retraction and short-branch contraction, and quantifies these properties into comparable rates among different sub-branches. We extract three tree topological features, 'LR' (Leaf rate), 'IntraR' (Intra-subset branch rate) and 'InterR' (Inter-subset branch rate), all of which are calculated from bifurcating tree branch sets for classification. We achieved an F-measure (the harmonic mean of precision and recall) of 81% with an area under the ROC curve (AUC) of about 0.9. CONCLUSIONS PolyPhy is a fast and robust method to identify polytomies from phylogenetic trees based on genome-wide inference of evolutionary relationships among genomes. The software package and test data can be downloaded from http://digbio.missouri.edu/ComPhy/phyloTreeBiNonBi-1.0.zip.
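The F-measure reported for PolyPhy combines precision and recall as F = 2PR/(P + R). A small Python sketch with made-up counts makes the arithmetic concrete; the function name and the numbers are illustrative assumptions, not PolyPhy's code or results.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure (F1): harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for branches classified as polytomies (not PolyPhy's data).
p, r, f = precision_recall_f1(tp=45, fp=5, fn=15)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a classifier cannot reach a high F-measure by trading recall away for precision or vice versa.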
Affiliation(s)
- Guan Ning Lin
- Department of Computer Science and C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Department of Psychiatry, University of California, San Diego, CA 92093, USA
- Chao Zhang
- Department of Computer Science and C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Dong Xu
- Department of Computer Science and C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
31
Lee EK, Cibrian-Jaramillo A, Kolokotronis SO, Katari MS, Stamatakis A, Ott M, Chiu JC, Little DP, Stevenson DW, McCombie WR, Martienssen RA, Coruzzi G, DeSalle R. A functional phylogenomic view of the seed plants. PLoS Genet 2011; 7:e1002411. [PMID: 22194700 PMCID: PMC3240601 DOI: 10.1371/journal.pgen.1002411] [Citation(s) in RCA: 123] [Impact Index Per Article: 9.5] [Received: 10/19/2010] [Accepted: 10/21/2011] [Indexed: 12/01/2022]
Abstract
A novel result of the current research is the development and implementation of a unique functional phylogenomic approach that explores the genomic origins of seed plant diversification. We first use 22,833 sets of orthologs from the nuclear genomes of 101 genera across land plants to reconstruct their phylogenetic relationships. One of the more salient results is the resolution of some enigmatic relationships in seed plant phylogeny, such as the placement of Gnetales as sister to the rest of the gymnosperms. In using this novel phylogenomic approach, we were also able to identify overrepresented functional gene ontology categories in genes that provide positive branch support for major nodes, prompting new hypotheses for genes associated with the diversification of angiosperms. For example, RNA interference (RNAi) has played a significant role in the divergence of monocots from other angiosperms, which has experimental support in Arabidopsis and rice. This analysis also implied that the second largest subunit of RNA polymerase IV and V (NRPD2) played a prominent role in the divergence of gymnosperms. This hypothesis is supported by the lack of 24-nt siRNA in conifers, the maternal control of small RNA in the seeds of flowering plants, and the emergence of double fertilization in angiosperms. Our approach takes advantage of genomic data to define orthologs, reconstruct relationships, and narrow down candidate genes involved in plant evolution within a phylogenomic view of species' diversification.
Affiliation(s)
- Ernest K. Lee
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, United States of America
- Angelica Cibrian-Jaramillo
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, United States of America
- Cullman Program in Molecular Systematics, The New York Botanical Garden, Bronx, New York, United States of America
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York, United States of America
- Sergios-Orestis Kolokotronis
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, United States of America
- Manpreet S. Katari
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York, United States of America
- Michael Ott
- Department of Computer Science, Technische Universität München, Munich, Germany
- Joanna C. Chiu
- Department of Entomology, University of California Davis, Davis, California, United States of America
- Damon P. Little
- Cullman Program in Molecular Systematics, The New York Botanical Garden, Bronx, New York, United States of America
- Dennis Wm. Stevenson
- Cullman Program in Molecular Systematics, The New York Botanical Garden, Bronx, New York, United States of America
- W. Richard McCombie
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Robert A. Martienssen
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Gloria Coruzzi
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York, United States of America
- Rob DeSalle
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, United States of America
32
Kvist S, Narechania A, Oceguera-Figueroa A, Fuks B, Siddall ME. Phylogenomics of Reichenowia parasitica, an alphaproteobacterial endosymbiont of the freshwater leech Placobdella parasitica. PLoS One 2011; 6:e28192. [PMID: 22132238 PMCID: PMC3223239 DOI: 10.1371/journal.pone.0028192] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 07/10/2011] [Accepted: 11/02/2011] [Indexed: 01/30/2023]
Abstract
Although several commensal alphaproteobacteria form close relationships with plant hosts, where they aid in processes such as nitrogen fixation and nodulation, only a few inhabit animal hosts. Among these, Reichenowia picta, R. ornata and R. parasitica are currently the only known mutualistic alphaproteobacterial endosymbionts to inhabit leeches. These bacteria are harbored in the epithelial cells of the mycetomal structures of their freshwater leech hosts, Placobdella spp., and these structures have no obvious function other than housing bacterial symbionts. However, the function of the bacterial symbionts has remained unclear. Here, we focused both on exploring the genomic makeup of R. parasitica and on performing a robust phylogenetic analysis, based on more data than previous hypotheses, to test its position among related bacteria. We sequenced a combined pool of host and symbiont DNA from 36 pairs of mycetomes and performed an in silico separation of the different DNA pools through subtractive scaffolding. The bacterial contigs were compared to 50 annotated bacterial genomes and the genome of the freshwater leech Helobdella robusta using a BLASTn protocol. Further, amino acid sequences inferred from the contigs were used as queries against the 50 bacterial genomes to establish orthology. A total of 358 orthologous genes were used for the phylogenetic analyses. In part, the results suggest that R. parasitica possesses genes coding for proteins related to nitrogen fixation, iron/vitamin B translocation and plasmid survival. Our results also indicate that R. parasitica interacts with its host in part by transmembrane signaling and that several of its genes show orthology across Rhizobiaceae. The phylogenetic analyses support the nesting of R. parasitica within the Rhizobiaceae, as sister to a group containing Agrobacterium and Rhizobium species.
Affiliation(s)
- Sebastian Kvist
- Richard Gilder Graduate School, American Museum of Natural History, New York, New York, United States of America
- Division of Invertebrate Zoology, American Museum of Natural History, New York, New York, United States of America
- Apurva Narechania
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, United States of America
- Alejandro Oceguera-Figueroa
- Division of Invertebrate Zoology, American Museum of Natural History, New York, New York, United States of America
- Department of Biology, The Graduate Center, The City University of New York, New York, New York, United States of America
- Bella Fuks
- Long Island University Brooklyn Campus, Brooklyn, New York, United States of America
- Mark E. Siddall
- Division of Invertebrate Zoology, American Museum of Natural History, New York, New York, United States of America
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, United States of America
33
Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Mitros T, Dirks W, Hellsten U, Putnam N, Rokhsar DS. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res 2011; 40:D1178-86. [PMID: 22110026 PMCID: PMC3245001 DOI: 10.1093/nar/gkr944] [Citation(s) in RCA: 2965] [Impact Index Per Article: 228.1] [Indexed: 12/14/2022]
Abstract
The number of sequenced plant genomes and associated genomic resources is growing rapidly with the advent of both an increased focus on plant genomics from funding agencies, and the application of inexpensive next generation sequencing. To interact with this increasing body of data, we have developed Phytozome (http://www.phytozome.net), a comparative hub for plant genome and gene family data and analysis. Phytozome provides a view of the evolutionary history of every plant gene at the level of sequence, gene structure, gene family and genome organization, while at the same time providing access to the sequences and functional annotations of a growing number (currently 25) of complete plant genomes, including all the land plants and selected algae sequenced at the Joint Genome Institute, as well as selected species sequenced elsewhere. Through a comprehensive plant genome database and web portal, these data and analyses are available to the broader plant science research community, providing powerful comparative genomics tools that help to link model systems with other plants of economic and ecological importance.
Affiliation(s)
- David M Goodstein
- US Department of Energy, Joint Genome Institute, Walnut Creek, CA 94598, USA.
34
Salichos L, Rokas A. Evaluating ortholog prediction algorithms in a yeast model clade. PLoS One 2011; 6:e18755. [PMID: 21533202 PMCID: PMC3076445 DOI: 10.1371/journal.pone.0018755] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Received: 11/10/2010] [Accepted: 03/15/2011] [Indexed: 11/18/2022]
Abstract
BACKGROUND Accurate identification of orthologs is crucial for evolutionary studies and for functional annotation. Several algorithms have been developed for ortholog delineation, but so far, manually curated genome-scale biological databases of orthologous genes for algorithm evaluation have been lacking. We evaluated four popular ortholog prediction algorithms (MultiParanoid; OrthoMCL; RBH, Reciprocal Best Hit; and RSD, Reciprocal Smallest Distance; the last two were extended into the clustering algorithms cRBH and cRSD, respectively, so that they can predict orthologs across multiple taxa) against a set of 2,723 groups of high-quality curated orthologs from 6 Saccharomycete yeasts in the Yeast Gene Order Browser. RESULTS Examination of sensitivity [TP/(TP+FN)], specificity [TN/(TN+FP)], and accuracy [(TP+TN)/(TP+TN+FP+FN)] across a broad parameter range showed that cRBH was the most accurate and specific algorithm, whereas OrthoMCL was the most sensitive. Evaluation of the algorithms across a varying number of species showed that cRBH had the highest accuracy and lowest false discovery rate [FP/(FP+TP)], followed by cRSD. Of the six species in our set, three descended from an ancestor that underwent whole genome duplication. Subsequent differential duplicate loss events in the three descendants resulted in distinct classes of gene loss patterns, including cases where the genes retained in the three descendants are paralogs, constituting 'traps' for ortholog prediction algorithms. We found that the false discovery rate of all algorithms dramatically increased in these traps. CONCLUSIONS These results suggest that simple algorithms, like cRBH, may be better ortholog predictors than more complex ones (e.g., OrthoMCL and MultiParanoid) for evolutionary and functional genomics studies where the objective is the accurate inference of single-copy orthologs (e.g., molecular phylogenetics), but that all algorithms fail to accurately predict orthologs when paralogy is rampant.
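The four evaluation measures defined in this abstract are simple ratios of confusion-matrix counts. The Python sketch below encodes them directly; the class name and the counts are hypothetical and not taken from the study, and how TP, FP, TN and FN are counted for ortholog groups depends on the benchmark's own definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing predicted ortholog assignments with a curated reference set."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)  # TP/(TP+FN)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)  # TN/(TN+FP)

    def accuracy(self) -> float:
        total = self.tp + self.tn + self.fp + self.fn
        return (self.tp + self.tn) / total  # (TP+TN)/(TP+TN+FP+FN)

    def false_discovery_rate(self) -> float:
        return self.fp / (self.fp + self.tp)  # FP/(FP+TP)


# Made-up counts for a hypothetical ortholog predictor.
c = Confusion(tp=90, fp=10, tn=880, fn=20)
print(c.sensitivity(), c.specificity(), c.accuracy(), c.false_discovery_rate())
```

With many true negatives, as is typical when most gene pairs are not orthologs, accuracy and specificity stay high even when the false discovery rate is noticeable, which is why the FDR is reported separately.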
Affiliation(s)
- Leonidas Salichos
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, United States of America
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, United States of America
35
Robbertse B, Yoder RJ, Boyd A, Reeves J, Spatafora JW. Hal: an automated pipeline for phylogenetic analyses of genomic data. PLoS Curr 2011; 3:RRN1213. [PMID: 21327165 PMCID: PMC3038436 DOI: 10.1371/currents.rrn1213] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Accepted: 02/07/2011] [Indexed: 11/21/2022]
Abstract
The rapid increase in genomic and genome-scale data is resulting in unprecedented levels of discrete sequence data available for phylogenetic analyses. Major analytical impasses exist, however, prior to analyzing these data with existing phylogenetic software. Obstacles include the management of large data sets without standardized naming conventions, the identification and filtering of orthologous clusters of proteins or genes, and the assembly of alignments of orthologous sequence data into individual and concatenated super alignments. Here we report an automated pipeline, Hal, which produces multiple alignments and trees from genomic data. These alignments can be produced by a choice of four alignment programs and analyzed by a variety of phylogenetic programs. In short, the Hal pipeline connects the programs BLASTP, MCL, user-specified alignment programs, Gblocks, ProtTest and user-specified phylogenetic programs to produce species trees. The script is available at SourceForge (http://sourceforge.net/projects/bio-hal/). The results from an example analysis of Kingdom Fungi are briefly discussed.
Affiliation(s)
- Barbara Robbertse
- National Center for Biotechnology Information, Bethesda, Maryland; Peace Corps; Oregon State University and Bonzi Software Development
36
Trost B, Haakensen M, Pittet V, Ziola B, Kusalik A. Analysis and comparison of the pan-genomic properties of sixteen well-characterized bacterial genera. BMC Microbiol 2010; 10:258. [PMID: 20942950 PMCID: PMC3020658 DOI: 10.1186/1471-2180-10-258] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/20/2010] [Accepted: 10/13/2010] [Indexed: 11/10/2022]
Abstract
Background The increasing availability of whole genome sequences allows the gene or protein content of different organisms to be compared, leading to burgeoning interest in the relatively new subfield of pan-genomics. However, while several studies have analyzed protein content relationships in specific groups of bacteria, there has yet to be a study that provides a general characterization of protein content relationships in a broad range of bacteria. Results A variation on reciprocal BLAST hits was used to infer relationships among proteins in several groups of bacteria, and data regarding protein conservation and uniqueness in different bacterial genera are reported in terms of "core proteomes", "unique proteomes", and "singlets". We also analyzed the relationship between protein content similarity and the percent identity of the 16S rRNA gene in pairs of bacterial isolates from the same genus, and found that the strength of this relationship varied substantially depending on the genus, perhaps reflecting different rates of genome evolution and/or horizontal gene transfer. Finally, core proteomes and unique proteomes were used to study the proteomic cohesiveness of several bacterial species, revealing that some bacterial species had little cohesiveness in their protein content, with some having fewer proteins unique to that species than randomly-chosen sets of isolates from the same genus. Conclusions The results described in this study aid our understanding of protein content relationships in different bacterial groups, allowing us to make further inferences regarding genome-environment relationships, genome evolution, and the soundness of existing taxonomic classifications.
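The reciprocal-hit idea can be illustrated with a minimal Python sketch of the plain reciprocal best hit (RBH) criterion. The study used a variation on this approach whose details are not given in the abstract, so this is only an assumption-laden illustration: it takes precomputed similarity scores (for example BLAST bit scores) as dictionaries keyed by (query, subject) pairs, and the protein identifiers and scores are hypothetical.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Map each query to its highest-scoring subject (first one seen wins on ties)."""
    best: dict[str, tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    a_vs_b: dict[tuple[str, str], float],
    b_vs_a: dict[tuple[str, str], float],
) -> set[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if backward.get(b) == a}


# Hypothetical bit scores between proteins of isolate A and isolate B.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0, ("a2", "b2"): 180.0, ("a3", "b2"): 60.0}
b_vs_a = {("b1", "a1"): 245.0, ("b2", "a2"): 175.0, ("b2", "a3"): 58.0}
print(sorted(reciprocal_best_hits(a_vs_b, b_vs_a)))
```

Here the best hit of a3 is b2, but the best hit of b2 is a2, so a3 has no reciprocal partner in isolate B. Proteins left unpaired in this way are the raw material for notions such as "unique proteomes" and "singlets".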
Affiliation(s)
- Brett Trost
- Department of Computer Science, University of Saskatchewan, 176 Thorvaldson Building, 110 Science Place, Saskatoon, Saskatchewan, S7N 5C9, Canada.
37
Morescalchi MA, Barucca M, Stingo V, Capriglione T. Polypteridae (Actinopterygii: Cladistia) and DANA-SINEs insertions. Mar Genomics 2010; 3:79-84. [PMID: 21798200 DOI: 10.1016/j.margen.2010.06.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/21/2009] [Revised: 06/07/2010] [Accepted: 06/15/2010] [Indexed: 01/09/2023]
Abstract
SINE sequences are interspersed throughout virtually all eukaryotic genomes and greatly outnumber the other repetitive elements. These sequences are of increasing interest for phylogenetic studies because, once properly characterized, they have diagnostic power for establishing common ancestry among taxa. We identified and characterized a peculiar family of composite tRNA-derived short interspersed elements (SINEs), the DANA-SINEs, associated with mutational activity in Danio rerio, in a group of species belonging to one of the most basal bony fish families, the Polypteridae, in order to investigate the phylogenetic relationships within this family. DANA sequences were identified, sequenced and then localized by fluorescent in situ hybridization (FISH) in six Polypteridae species (Polypterus delhezi, P. ornatipinnis, P. palmas, P. buettikoferi, P. senegalus and Erpetoichthys calabaricus). After cloning, the sequences obtained were aligned for phylogenetic analysis and compared with those of three Dipnoan lungfish species (Protopterus annectens, P. aethiopicus, Lepidosiren paradoxa), with Lethenteron reissneri (Petromyzontidae) used as the outgroup. The congruent MP, ML and NJ trees separated the species into the two taxonomically distinct Osteichthyan groups, the Polypteridae on one side and the Protopteridae on the other, with the monotypic genus Erpetoichthys more distantly related to the genus Polypterus, which comprised three distinct groups: P. palmas and P. buettikoferi; P. delhezi and P. ornatipinnis; and P. senegalus. In situ hybridization with DANA probes labelled the entire chromosome arms in the metaphases of all the Polypteridae species examined.
Affiliation(s)
- Maria Alessandra Morescalchi
- Dipartimento di Scienze della Vita, Seconda Università degli Studi di Napoli, via Vivaldi 43, 81100, Caserta, Italy.
38
Cibrián-Jaramillo A, De la Torre-Bárcena JE, Lee EK, Katari MS, Little DP, Stevenson DW, Martienssen R, Coruzzi GM, DeSalle R. Using phylogenomic patterns and gene ontology to identify proteins of importance in plant evolution. Genome Biol Evol 2010; 2:225-39. [PMID: 20624728 PMCID: PMC2997538 DOI: 10.1093/gbe/evq012] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Accepted: 03/14/2010] [Indexed: 01/01/2023]
Abstract
We use measures of congruence on a combined expressed sequence tag (EST) and genome phylogeny to identify proteins that have potential significance in the evolution of seed plants. Relevant proteins are identified based on the direction of partitioned branch support and hidden support in a 16-species tree constructed from 2,557 concatenated orthologous genes. We provide a general method for detecting genes or groups of genes that may be under selection in directions that agree with the phylogenetic pattern. Gene partitioning methods and estimates of the degree and direction of support of individual gene partitions to the overall data set are used. Using this approach, we correlate positive branch support of specific genes with key branches in the seed plant phylogeny. In addition to genes for basic metabolic functions, such as photosynthesis or hormones, genes involved in posttranscriptional regulation by small RNAs were significantly overrepresented in key nodes of the phylogeny of seed plants. Two genes in our matrix are of critical importance because they are involved in RNA-dependent regulation, which is essential during embryo and leaf development: Argonaute and RNA-dependent RNA polymerase 6, both found to be overrepresented in the angiosperm clade. We use these genes as examples of our phylogenomic approach and show that identifying partitions or genes in this way provides a platform to explain some of the more interesting organismal differences among species, in particular in the evolution of plants.
Affiliation(s)
- Angélica Cibrián-Jaramillo
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, USA.
|
39
|
Bonaventura MPD, Lee EK, DeSalle R, Planet PJ. A whole-genome phylogeny of the family Pasteurellaceae. Mol Phylogenet Evol 2010; 54:950-6. [DOI: 10.1016/j.ympev.2009.08.010] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 06/01/2009] [Revised: 08/05/2009] [Accepted: 08/11/2009] [Indexed: 11/16/2022]
|
40
|
Vedhagiri K, Natarajaseenivasan K, Chellapandi P, Prabhakaran SG, Selvin J, Sharma S, Vijayachari P. Evolutionary implication of outer membrane lipoprotein-encoding genes ompL1, lipL32 and lipL41 of pathogenic Leptospira species. Genomics Proteomics Bioinformatics 2010; 7:96-106. [PMID: 19944382 PMCID: PMC5054405 DOI: 10.1016/s1672-0229(08)60038-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/02/2022]
Abstract
Leptospirosis is recognized as the most widespread zoonosis, with a global distribution. In this study, antigenic variation in Leptospira interrogans and Leptospira borgpetersenii isolated from human urine and field rat kidney was preliminarily confirmed by the microscopic agglutination test using monoclonal antibodies, and the isolates were further subjected to amplification and identification of outer membrane lipoproteins with structural gene variation. Sequence similarity analysis revealed that these protein sequences, namely OmpL1, LipL32 and LipL41, showed no homology to outer membrane lipoproteins of non-pathogenic Leptospira and other closely related Spirochetes, but showed strong identity within L. interrogans, suggesting intraspecific phylogenetic lineages that may have arisen from a common pathogenic leptospiral origin. Moreover, the ompL1 gene showed more antigenic variation than lipL32 and lipL41 owing to less conservation of secondary structure within closely related species. Phylogenetically, ompL1 and lipL41 of these strains were closely related to those of L. weilii and L. santarosai. The ompL1 gene of L. interrogans clustered distinctly from other pathogenic and non-pathogenic leptospiral species. The diversity of ompL genes was analyzed, and it is envisaged that sequence-specific variations at antigenic determinant sites would result in slow evolutionary changes along with the origination of new serovars within closely related species. Thus, crucial work on effective recombinant vaccine development and engineered antibodies will hopefully help to meet these therapeutic challenges.
Affiliation(s)
- K Vedhagiri
- Division of Medical Microbiology, Department of Microbiology, School of Life Sciences, Bharathidasan University, Tiruchirappalli 620024, Tamilnadu, India
|
41
|
Camp E, Sánchez-Sánchez AV, García-España A, DeSalle R, Odqvist L, Enrique O'Connor J, Mullor JL. Nanog regulates proliferation during early fish development. Stem Cells 2009; 27:2081-91. [PMID: 19544407 DOI: 10.1002/stem.133] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.3] [Indexed: 11/08/2022]
Abstract
Nanog is involved in controlling pluripotency and differentiation of stem cells in vitro. However, its function in vivo has been studied only in mouse embryos and various reports suggest that Nanog may not be required for the regulation of differentiation. To better understand endogenous Nanog function, more animal models should be introduced to complement the murine model. Here, we have identified the homolog of the mammalian Nanog gene in teleost fish and describe the endogenous expression of Ol-Nanog mRNA and protein during medaka (Oryzias latipes) embryonic development and in the adult gonads. Using medaka fish as a vertebrate model to study Nanog function, we demonstrate that Ol-Nanog is necessary for S-phase transition and proliferation in the developing embryo. Moreover, inhibition or overexpression of Ol-Nanog does not affect gene expression of various pluripotency and differentiation markers, suggesting that this transcription factor may not play a direct role in embryonic germ layer differentiation.
Affiliation(s)
- Esther Camp
- Department of Regenerative Medicine, Centro de Investigación Príncipe Felipe, Valencia, Spain
|
42
|
Proost S, Van Bel M, Sterck L, Billiau K, Van Parys T, Van de Peer Y, Vandepoele K. PLAZA: a comparative genomics resource to study gene and genome evolution in plants. Plant Cell 2009; 21:3718-31. [PMID: 20040540 PMCID: PMC2814516 DOI: 10.1105/tpc.109.071506] [Citation(s) in RCA: 193] [Impact Index Per Article: 12.9] [Received: 09/22/2009] [Revised: 12/04/2009] [Accepted: 12/10/2009] [Indexed: 05/17/2023]
Abstract
The number of sequenced genomes of representatives within the green lineage is rapidly increasing. Consequently, comparative sequence analysis has significantly altered our view on the complexity of genome organization, gene function, and regulatory pathways. To explore all this genome information, a centralized infrastructure is required where all data generated by different sequencing initiatives is integrated and combined with advanced methods for data mining. Here, we describe PLAZA, an online platform for plant comparative genomics (http://bioinformatics.psb.ugent.be/plaza/). This resource integrates structural and functional annotation of published plant genomes together with a large set of interactive tools to study gene function and gene and genome evolution. Precomputed data sets cover homologous gene families, multiple sequence alignments, phylogenetic trees, intraspecies whole-genome dot plots, and genomic colinearity between species. Through the integration of high confidence Gene Ontology annotations and tree-based orthology between related species, thousands of genes lacking any functional description are functionally annotated. Advanced query systems, as well as multiple interactive visualization tools, are available through a user-friendly and intuitive Web interface. In addition, detailed documentation and tutorials introduce the different tools, while the workbench provides an efficient means to analyze user-defined gene sets through PLAZA's interface. In conclusion, PLAZA provides a comprehensible and up-to-date research environment to aid researchers in the exploration of genome information within the green plant lineage.
Affiliation(s)
- Sebastian Proost
- Department of Plant Systems Biology, Flanders Institute for Biotechnology, B-9052 Ghent, Belgium.
|
43
|
|
44
|
de la Torre-Bárcena JE, Kolokotronis SO, Lee EK, Stevenson DW, Brenner ED, Katari MS, Coruzzi GM, DeSalle R. The impact of outgroup choice and missing data on major seed plant phylogenetics using genome-wide EST data. PLoS One 2009; 4:e5764. [PMID: 19503618 PMCID: PMC2685480 DOI: 10.1371/journal.pone.0005764] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Received: 09/24/2008] [Accepted: 04/16/2009] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND Genome-level analyses have enhanced our view of phylogenetics in many areas of the tree of life. With the production of whole-genome DNA sequences of hundreds of organisms and large-scale EST databases, a large number of candidate genes for inclusion in phylogenetic analysis have become available. In this work, we exploit the burgeoning genomic data being generated for plant genomes to address one of the more important plant phylogenetic questions, concerning the hierarchical relationships of the major seed plant lineages (angiosperms, Cycadales, Ginkgoales, Gnetales, and Coniferales), which remains a work in progress despite numerous studies using single, few or several genes and morphology datasets. Although most recent studies support the notion that gymnosperms and angiosperms are monophyletic and sister groups, they differ on the topological arrangements within each major group. METHODOLOGY We exploited the EST database to construct a supermatrix of DNA sequences (over 1,200 concatenated orthologous gene partitions for 17 taxa) to examine non-flowering seed plant relationships. This analysis employed programs that offer rapid and robust orthology determination of novel, short sequences from plant ESTs based on reference seed plant genomes. Our phylogenetic analysis retrieved an unbiased (with respect to gene choice), well-resolved and highly supported phylogenetic hypothesis that was robust to various outgroup combinations. CONCLUSIONS We evaluated character support and the relative contribution of numerous variables (e.g. gene number, missing data, partitioning schemes, taxon sampling and outgroup choice) to tree topology, stability and support metrics. Our results indicate that while missing characters and the order in which genes are added to an analysis do not influence branch support, inadequate taxon sampling and limited choice of outgroup(s) can lead to spurious inference of phylogeny when dealing with phylogenomic-scale data sets. As expected, support and resolution increase significantly as more informative characters are added, until a threshold is reached, beyond which support metrics stabilize and the effect of adding conflicting characters is minimized.
Affiliation(s)
- Jose Eduardo de la Torre-Bárcena
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York, United States of America
- Cullman Molecular Systematics Laboratory and Genomics Laboratory, The New York Botanical Garden, Bronx, New York, United States of America
- Sergios-Orestis Kolokotronis
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, United States of America
- Ernest K. Lee
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, United States of America
- Dennis Wm. Stevenson
- Cullman Molecular Systematics Laboratory and Genomics Laboratory, The New York Botanical Garden, Bronx, New York, United States of America
- Eric D. Brenner
- Cullman Molecular Systematics Laboratory and Genomics Laboratory, The New York Botanical Garden, Bronx, New York, United States of America
- Manpreet S. Katari
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York, United States of America
- Gloria M. Coruzzi
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York, United States of America
- Rob DeSalle
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, United States of America
|
45
|
Sato N. Gclust: trans-kingdom classification of proteins using automatic individual threshold setting. Bioinformatics 2009; 25:599-605. [DOI: 10.1093/bioinformatics/btp047] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Indexed: 11/15/2022] Open
|
46
|
Abstract
OrthologID (http://nypg.bio.nyu.edu/orthologid/) allows for the rapid and accurate identification of gene orthology within a character-based phylogenetic framework. The Web application has two functions: an orthologous group search and a query orthology classification. The former determines orthologous gene sets for complete genomes and identifies diagnostic characters that define each orthologous gene set; the latter allows for the classification of unknown query sequences into orthology groups. The first module of the Web application, the gene family generator, uses an E-value-based approach to sort genes into gene families. An alignment constructor then aligns the members of each gene family, and the resulting gene family alignments are submitted to the tree builder to obtain gene family guide trees. Finally, the diagnostics generator extracts diagnostic characters from the guide trees, and these diagnostics are used to determine gene orthology for query sequences.
Affiliation(s)
- Mary Egan
- Department of Biology, Montclair State University, Montclair, NJ, USA
|
47
|
Rautenberg A, Filatov D, Svennblad B, Heidari N, Oxelman B. Conflicting phylogenetic signals in the SlX1/Y1 gene in Silene. BMC Evol Biol 2008; 8:299. [PMID: 18973668 PMCID: PMC2636791 DOI: 10.1186/1471-2148-8-299] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 05/16/2008] [Accepted: 10/30/2008] [Indexed: 11/19/2022] Open
Abstract
Background Increasing evidence from DNA sequence data has revealed that phylogenies based on different genes may drastically differ from each other. This may be due to either inter- or intralineage processes, or to methodological or stochastic errors. Here we investigate a spectacular case where two parts of the same gene (SlX1/Y1) show conflicting phylogenies within Silene (Caryophyllaceae). SlX1 and SlY1 are sex-linked genes on the sex chromosomes of dioecious members of Silene sect. Elisanthe. Results We sequenced the homologues of the SlX1/Y1 genes in several Sileneae species. We demonstrate that different parts of the SlX1/Y1 region give different phylogenetic signals. The major discrepancy is that Silene vulgaris and S. sect. Conoimorpha (S. conica and relatives) exchange positions. To determine whether gene duplication followed by recombination (an intralineage process) may explain the phylogenetic conflict in the Silene SlX1/Y1 gene, we use a novel probabilistic, multiple primer-pair PCR approach. We did not find any evidence supporting gene duplication/loss as an explanation for the phylogenetic conflict. Conclusion The phylogenetic conflict in the Silene SlX1/Y1 gene cannot be explained by paralogy or artefacts, such as in vitro recombination during PCR. The support for the conflict is strong enough to exclude methodological or stochastic errors as likely sources. Instead, the phylogenetic incongruence may have been caused by recombination of two divergent alleles following ancient interspecific hybridization or incomplete lineage sorting. These events probably took place several million years ago. This example clearly demonstrates that different parts of the genome may have different evolutionary histories and stresses the importance of using multiple genes in reconstruction of taxonomic relationships.
Affiliation(s)
- Anja Rautenberg
- Department of Systematic Biology, EBC, Uppsala University, Sweden.
|
48
|
Abstract
Automated use of phylogenetic trees to deduce orthology relationships in proteins. Reliable orthology prediction is central to comparative genomics. Although orthology is defined by phylogenetic criteria, most automated prediction methods are based on pairwise sequence comparisons. Recently, automated phylogeny-based orthology prediction has emerged as a feasible alternative for genome-wide studies.
Affiliation(s)
- Toni Gabaldón
- Bioinformatics and Genomics Program, Center for Genomic Regulation, Doctor Aiguader 88, Barcelona, Spain.
|
49
|
Fu Z, Jiang T. Clustering of main orthologs for multiple genomes. J Bioinform Comput Biol 2008; 6:573-84. [PMID: 18574863 DOI: 10.1142/s0219720008003540] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 08/01/2007] [Revised: 12/01/2007] [Accepted: 01/03/2008] [Indexed: 11/18/2022]
Abstract
The identification of orthologous genes shared by multiple genomes is critical for both functional and evolutionary studies in comparative genomics. While it is usually done by sequence similarity search and reconciled tree construction in practice, recently a new combinatorial approach and high-throughput system MSOAR for ortholog identification between closely related genomes based on genome rearrangement and gene duplication has been proposed in Fu et al. MSOAR assumes that orthologous genes correspond to each other in the most parsimonious evolutionary scenario, minimizing the number of genome rearrangement and (postspeciation) gene duplication events. However, the parsimony approach used by MSOAR limits it to pairwise genome comparisons. In this paper, we extend MSOAR to multiple (closely related) genomes and propose an ortholog clustering method, called MultiMSOAR, to infer main orthologs in multiple genomes. As a preliminary experiment, we apply MultiMSOAR to rat, mouse, and human genomes, and validate our results using gene annotations and gene function classifications in the public databases. We further compare our results to the ortholog clusters predicted by MultiParanoid, which is an extension of the well-known program InParanoid for pairwise genome comparisons. The comparison reveals that MultiMSOAR gives more detailed and accurate orthology information, since it can effectively distinguish main orthologs from inparalogs.
Affiliation(s)
- Zheng Fu
- Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA 92521, USA.
|
50
|
Wu H, Mao F, Olman V, Xu Y. On application of directons to functional classification of genes in prokaryotes. Comput Biol Chem 2008; 32:176-84. [PMID: 18440870 DOI: 10.1016/j.compbiolchem.2008.02.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/24/2007] [Accepted: 02/15/2008] [Indexed: 11/30/2022]
Abstract
Functional classification of genes represents one of the most basic problems in genome analysis and annotation. Our analysis of some of the popular methods for functional classification of genes shows that these methods are not always consistent with each other and may not be specific enough for high-resolution gene functional annotations. We have developed a method to integrate genomic neighborhood information of genes with their sequence similarity information for the functional classification of prokaryotic genes. The application of our method to 93 proteobacterial genomes has shown that (i) the genomic neighborhoods are much more conserved across prokaryotic genomes than expected by chance, and such conservation can be utilized to improve functional classification of genes; (ii) while our method is consistent with the existing popular schemes as much as they are among themselves, it does provide functional classification at higher resolution and hence allows functional assignments of (new) genes at a more specific level; and (iii) our method is fairly stable when being applied to different genomes.
Affiliation(s)
- Hongwei Wu
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Savannah, GA 31407, USA
|