1. Brieuc MSO, Naish KA. Detecting signatures of positive selection in partial sequences generated on a large scale: pitfalls, procedures and resources. Mol Ecol Resour 2011; 11 Suppl 1:172-83. [PMID: 21429173 DOI: 10.1111/j.1755-0998.2010.02948.x] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 01/24/2023]
Abstract
Studying the actions of selection provides insight into adaptation, population divergence and gene function. Next-generation sequencing produces large amounts of partial sequences, potentially facilitating efforts to detect signatures of selection based on comparisons between synonymous (d(S)) and nonsynonymous (d(N)) substitutions, and single nucleotide polymorphism assays placed in selected genes would improve the ability to study adaptation in population surveys. However, sequences generated by these technologies are typically short. In nonmodel organisms that are a focus of evolutionary studies, the lack of a reference genome that facilitates the assembly of short sequences has limited surveys of positive selection in large numbers of genes. Here, we describe a series of steps to facilitate these surveys. We provide Perl scripts to assist data analysis, and describe the use of commonly available programs. We demonstrate these approaches in six salmon species, which have partially duplicated genomes. We recommend using multiway BLAST to optimize the number of alignments between partial coding sequences. Reading frames should be manually detected after alignment with sequences in GenBank using the BLASTX program. We encourage the use of a phylogenetic approach to separate orthologs from paralogs in duplicated genomes. Simple simulations on a gene known to have undergone selection in salmon species, transferrin, showed that the ability to detect selection in short sequences (<600 bp) depended on the proportion of codons under selection (1-2%) within that sequence. This relationship was less relevant in longer sequences. In this exploratory study, we detected 11 genes showing evidence of positive selection.
Affiliation(s)
- Marine S O Brieuc
- School of Aquatic and Fishery Sciences, University of Washington, Seattle, 98195, USA.
2. Steinway SN, Dannenfelser R, Laucius CD, Hayes JE, Nayak S. JCoDA: a tool for detecting evolutionary selection. BMC Bioinformatics 2010; 11:284. [PMID: 20507581 PMCID: PMC2887424 DOI: 10.1186/1471-2105-11-284] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Received: 01/12/2010] [Accepted: 05/27/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND The incorporation of annotated sequence information from multiple related species in commonly used databases (Ensembl, Flybase, Saccharomyces Genome Database, Wormbase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluation of evolutionary relationships. To aid in the process, we have developed JCoDA (Java Codon Delimited Alignment) as a simple-to-use visualization tool for the detection of site specific and regional positive/negative evolutionary selection amongst homologous coding sequences. RESULTS JCoDA accepts user-inputted unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and determines the dN/dS calculations using PAML (Phylogenetic Analysis Using Maximum Likelihood, yn00 and codeml) in order to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for Phylip (Phylogeny Inference Package) to generate phylogenetic trees, manages formatting of all required file types, and streamlines passage of information between underlying programs. The raw data are output to user configurable graphs with sliding window options for straightforward visualization of pairwise or gene family comparisons. Additionally, codon-delimited alignments are output in a variety of common formats and all dN/dS calculations can be output in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that are facilitated by JCoDA, we have taken advantage of the well studied sex determination pathway in nematodes as well as the extensive sequence information available to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway. 
CONCLUSIONS JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda.
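The sliding-window dN/dS view mentioned in this abstract can be illustrated with a minimal Python sketch. This is not JCoDA's code and does not call PAML: the function name `sliding_window_omega`, the per-codon `dn` and `ds` input lists, and the `window` and `step` parameters are all assumptions made for illustration. It only shows how a windowed ratio might be derived once per-codon estimates are available from some other tool.

```python
from typing import List, Optional, Sequence


def sliding_window_omega(
    dn: Sequence[float],
    ds: Sequence[float],
    window: int = 3,
    step: int = 1,
) -> List[Optional[float]]:
    """Return dN/dS for each window of codons.

    dn and ds are hypothetical per-codon nonsynonymous and synonymous
    estimates (assumed inputs, not produced here). A window whose summed
    dS is zero yields None, because the ratio is undefined there.
    """
    if len(dn) != len(ds):
        raise ValueError("dn and ds must have the same length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")

    ratios: List[Optional[float]] = []
    # Slide over the alignment, one window start every `step` codons.
    for start in range(0, len(dn) - window + 1, step):
        dn_sum = sum(dn[start:start + window])
        ds_sum = sum(ds[start:start + window])
        ratios.append(dn_sum / ds_sum if ds_sum > 0 else None)
    return ratios
```

Windows with a ratio above 1 would be candidates for regional positive selection and those well below 1 for purifying selection, though any such reading depends on the statistical model used to produce the inputs.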
Affiliation(s)
- Steven N Steinway
- Department of Biology, The College of New Jersey, 2000 Pennington Road, Ewing, NJ 08628, USA
3. Sherwood CC, Raghanti MA, Stimpson CD, Spocter MA, Uddin M, Boddy AM, Wildman DE, Bonar CJ, Lewandowski AH, Phillips KA, Erwin JM, Hof PR. Inhibitory interneurons of the human prefrontal cortex display conserved evolution of the phenotype and related genes. Proc Biol Sci 2009; 277:1011-20. [PMID: 19955152 DOI: 10.1098/rspb.2009.1831] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Indexed: 12/12/2022]
Abstract
Inhibitory interneurons participate in local processing circuits, playing a central role in executive cognitive functions of the prefrontal cortex. Although humans differ from other primates in a number of cognitive domains, it is not currently known whether the interneuron system has changed in the course of primate evolution leading to our species. In this study, we examined the distribution of different interneuron subtypes in the prefrontal cortex of anthropoid primates as revealed by immunohistochemistry against the calcium-binding proteins calbindin, calretinin and parvalbumin. In addition, we tested whether genes involved in the specification, differentiation and migration of interneurons show evidence of positive selection in the evolution of humans. Our findings demonstrate that cellular distributions of interneuron subtypes in human prefrontal cortex are similar to other anthropoid primates and can be explained by general scaling rules. Furthermore, genes underlying interneuron development are highly conserved at the amino acid level in primate evolution. Taken together, these results suggest that the prefrontal cortex in humans retains a similar inhibitory circuitry to that in closely related primates, even though it performs functional operations that are unique to our species. Thus, it is likely that other significant modifications to the connectivity and molecular biology of the prefrontal cortex were overlaid on this conserved interneuron architecture in the course of human evolution.
Affiliation(s)
- Chet C Sherwood
- Department of Anthropology, The George Washington University, Washington, DC 20052, USA.
4. Phylogenomic analyses reveal convergent patterns of adaptive evolution in elephant and human ancestries. Proc Natl Acad Sci U S A 2009; 106:20824-9. [PMID: 19926857 DOI: 10.1073/pnas.0911239106] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.5] [Indexed: 11/18/2022]
Abstract
Specific sets of brain-expressed genes, such as aerobic energy metabolism genes, evolved adaptively in the ancestry of humans and may have evolved adaptively in the ancestry of other large-brained mammals. The recent addition of genomes from two afrotherians (elephant and tenrec) to the expanding set of publicly available sequenced mammalian genomes provided an opportunity to test this hypothesis. Elephants resemble humans by having large brains and long life spans; tenrecs, in contrast, have small brains and short life spans. Thus, we investigated whether the phylogenomic patterns of adaptive evolution are more similar between elephant and human than between either elephant and tenrec lineages or human and mouse lineages, and whether aerobic energy metabolism genes are especially well represented in the elephant and human patterns. Our analyses encompassed approximately 6,000 genes in each of these lineages with each gene yielding extensive coding sequence matches in interordinal comparisons. Each gene's nonsynonymous and synonymous nucleotide substitution rates and dN/dS ratios were determined. Then, from gene ontology information on genes with the higher dN/dS ratios, we identified the more prevalent sets of genes that belong to specific functional categories and that evolved adaptively. Elephant and human lineages showed much slower nucleotide substitution rates than tenrec and mouse lineages but more adaptively evolved genes. In correlation with absolute brain size and brain oxygen consumption being largest in elephants and next largest in humans, adaptively evolved aerobic energy metabolism genes were most evident in the elephant lineage and next most evident in the human lineage.
5. Hou ZC, Romero R, Wildman DE. Phylogeny of the Ferungulata (Mammalia: Laurasiatheria) as determined from phylogenomic data. Mol Phylogenet Evol 2009; 52:660-4. [PMID: 19435603 DOI: 10.1016/j.ympev.2009.05.002] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 09/30/2008] [Revised: 04/17/2009] [Accepted: 05/04/2009] [Indexed: 11/17/2022]
Abstract
Great progress has been made toward resolving the evolutionary relationships among extant mammals, yet there are still areas of disagreement. The relationships among ferungulates that have high quality draft genome sequences available (i.e. dog, cow, horse) are unresolved, and thus we examined their phylogeny using currently known mammalian 1:1 orthologs. This dataset consists of 40 million base pairs from 2705 protein-coding genes. Maximum likelihood and Bayesian analyses of the combined and individual gene phylogenies strongly support a sister grouping of cow and horse to the exclusion of dog although topology tests could not rule out a horse and dog sister group relationship.
Affiliation(s)
- Zhuo-Cheng Hou
- Perinatology Research Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development/NIH/DHHS, Detroit, MI 48201, USA
6. Toleno DM, Renaud G, Wolfsberg TG, Islam M, Wildman DE, Siegmund KD, Hacia JG. Development and evaluation of new mask protocols for gene expression profiling in humans and chimpanzees. BMC Bioinformatics 2009; 10:77. [PMID: 19265541 PMCID: PMC2660304 DOI: 10.1186/1471-2105-10-77] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 11/07/2008] [Accepted: 03/05/2009] [Indexed: 12/02/2022]
Abstract
Background Cross-species gene expression analyses using oligonucleotide microarrays designed to evaluate a single species can provide spurious results due to mismatches between the interrogated transcriptome and arrayed probes. Based on the most recent human and chimpanzee genome assemblies, we developed updated and accessible probe masking methods that allow human Affymetrix oligonucleotide microarrays to be used for robust genome-wide expression analyses in both species. In this process, only data from oligonucleotide probes predicted to have robust hybridization sensitivity and specificity for both transcriptomes are retained for analysis. Results To characterize the utility of this resource, we applied our mask protocols to existing expression data from brains, livers, hearts, testes, and kidneys derived from both species and determined the effects probe numbers have on expression scores of specific transcripts. In all five tissues, probe sets with decreasing numbers of probes showed non-linear trends towards increased variation in expression scores. The relationships between expression variation and probe number in brain data closely matched those observed in simulated expression data sets subjected to random probe masking. However, there is evidence that additional factors affect the observed relationships between gene expression scores and probe number in tissues such as liver and kidney. In parallel, we observed that decreasing the number of probes within probe sets led to linear increases in both gained and lost inferences of differential cross-species expression in all five tissues, which will affect the interpretation of expression data subject to masking. Conclusion We introduce a readily implemented and updated resource for human and chimpanzee transcriptome analysis through a commonly used microarray platform. Based on empirical observations derived from the analysis of five distinct data sets, we provide novel guidelines for the interpretation of masked data that take the number of probes present in a given probe set into consideration. These guidelines are applicable to other customized applications that involve masking data from specific subsets of probes.
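The probe-masking idea in this abstract, retaining only probes judged reliable for both species and then tracking how many probes remain in each probe set, can be sketched in Python as follows. This is a simplified illustration, not the authors' protocol: the names `apply_mask` and `summarize`, the nested-dictionary data layout, the `min_probes` threshold, and the use of a plain mean as the expression score are all assumptions. Real Affymetrix pipelines use dedicated summarization methods.

```python
from statistics import mean
from typing import Dict, Set, Tuple

# Hypothetical layout: probe set ID -> {probe ID -> intensity}
ProbeSets = Dict[str, Dict[str, float]]


def apply_mask(probe_sets: ProbeSets, mask: Set[str], min_probes: int = 1) -> ProbeSets:
    """Keep only probes listed in `mask` (assumed reliable in both species).

    Probe sets left with fewer than `min_probes` probes are dropped.
    """
    masked: ProbeSets = {}
    for set_id, probes in probe_sets.items():
        kept = {pid: value for pid, value in probes.items() if pid in mask}
        if len(kept) >= min_probes:
            masked[set_id] = kept
    return masked


def summarize(masked: ProbeSets) -> Dict[str, Tuple[int, float]]:
    """Return (retained probe count, mean intensity) per probe set.

    The count is reported alongside the score because, per the abstract,
    scores from probe sets with few probes tend to be more variable.
    """
    return {
        set_id: (len(probes), mean(probes.values()))
        for set_id, probes in masked.items()
        if probes  # skip empty sets, which can occur when min_probes is 0
    }
```

Reporting the retained probe count next to each score lets a downstream analysis weight or filter probe sets by how heavily they were masked.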
Affiliation(s)
- Donna M Toleno
- Department of Biochemistry and Molecular Biology, University of Southern California, Los Angeles, CA 90089, USA.
7. Wong P, Althammer S, Hildebrand A, Kirschner A, Pagel P, Geissler B, Smialowski P, Blöchl F, Oesterheld M, Schmidt T, Strack N, Theis FJ, Ruepp A, Frishman D. An evolutionary and structural characterization of mammalian protein complex organization. BMC Genomics 2008; 9:629. [PMID: 19108706 PMCID: PMC2645396 DOI: 10.1186/1471-2164-9-629] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 07/17/2008] [Accepted: 12/23/2008] [Indexed: 12/25/2022]
Abstract
Background We have recently released a comprehensive, manually curated database of mammalian protein complexes called CORUM. Combining CORUM with other resources, we assembled a dataset of over 2700 mammalian complexes. The availability of a rich information resource allows us to search for organizational properties concerning these complexes. Results As the complexity of a protein complex in terms of the number of unique subunits increases, we observed that the number of such complexes and the mean non-synonymous to synonymous substitution ratio of associated genes tend to decrease. Similarly, as the number of different complexes a given protein participates in increases, the number of such proteins and the substitution ratio of the associated gene also tend to decrease. These observations provide evidence relating natural selection and the organization of mammalian complexes. We also observed greater homogeneity in terms of predicted protein isoelectric points, secondary structure and substitution ratio in annotated versus randomly generated complexes. A large proportion of the protein content and interactions in the complexes could be predicted from known binary protein-protein and domain-domain interactions. In particular, we found that large proteins interact preferentially with much smaller proteins. Conclusion We observed similar trends in yeast and other data. Our results support the existence of conserved relations associated with the mammalian protein complexes.
Affiliation(s)
- Philip Wong
- Helmholtz Center Munich-German Research Center for Environmental Health (GmbH), Institute of Bioinformatics and Systems Biology, Ingolstädter Landstrasse 1, Neuherberg, Germany.
8. Hou Z, Romero R, Uddin M, Than NG, Wildman DE. Adaptive history of single copy genes highly expressed in the term human placenta. Genomics 2008; 93:33-41. [PMID: 18848617 DOI: 10.1016/j.ygeno.2008.09.005] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Received: 05/30/2008] [Revised: 08/06/2008] [Accepted: 09/05/2008] [Indexed: 11/25/2022]
Abstract
The chorioallantoic placenta is a shared derived feature of "placental" mammals essential for the success of eutherian reproduction. Identifying the genes involved in the emergence of the placenta may provide clues for understanding the biology of this organ. Here we identify among 4960 single copy genes in mammals, 222 that show high expression levels in human placentas at term. Further, we present evidence that 94 of these 222 genes evolved adaptively during human evolutionary history since the time of the last common ancestor of eutherian mammals. Remarkably, the majority of positive selection occurred on the eutherian stem lineage suggesting that ancient adaptations have been retained in the human placenta. Of these positively selected genes, 28 have been shown to play a role in human pregnancy and placental biology, and at least 26 have important pregnancy-related phenotypes in mice. Adaptations in genes highly expressed in human placenta are attractive candidates for functional and clinical studies.
Affiliation(s)
- Zhuocheng Hou
- Perinatology Research Branch, NICHD/NIH/DHHS Wayne State University, Detroit, MI 48201, USA
9. Distinct genomic signatures of adaptation in pre- and postnatal environments during human evolution. Proc Natl Acad Sci U S A 2008; 105:3215-20. [PMID: 18305157 DOI: 10.1073/pnas.0712400105] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Indexed: 01/10/2023]
Abstract
The human genome evolution project seeks to reveal the genetic underpinnings of key phenotypic features that are distinctive of humans, such as a greatly enlarged cerebral cortex, slow development, and long life spans. This project has focused predominantly on genotypic changes during the 6-million-year descent from the last common ancestor (LCA) of humans and chimpanzees. Here, we argue that adaptive genotypic changes during earlier periods of evolutionary history also helped shape the distinctive human phenotype. Using comparative genome sequence data from 10 vertebrate species, we find a signature of human ancestry-specific adaptive evolution in 1,240 genes during their descent from the LCA with rodents. We also find that the signature of adaptive evolution is significantly different for highly expressed genes in human fetal and adult-stage tissues. Functional annotation clustering shows that on the ape stem lineage, an especially evident adaptively evolved biological pathway contains genes that function in mitochondria, are crucially involved in aerobic energy production, and are highly expressed in two energy-demanding tissues, heart and brain. Also, on this ape stem lineage, there was adaptive evolution among genes associated with human autoimmune and aging-related diseases. During more recent human descent, the adaptively evolving, highly expressed genes in fetal brain are involved in mediating neuronal connectivity. Comparing adaptively evolving genes from pre- and postnatal-stage tissues suggests that different selective pressures act on the development vs. the maintenance of the human phenotype.
10. Wildman DE, Uddin M, Opazo JC, Liu G, Lefort V, Guindon S, Gascuel O, Grossman LI, Romero R, Goodman M. Genomics, biogeography, and the diversification of placental mammals. Proc Natl Acad Sci U S A 2007; 104:14395-400. [PMID: 17728403 PMCID: PMC1958817 DOI: 10.1073/pnas.0704342104] [Citation(s) in RCA: 143] [Impact Index Per Article: 8.4] [Received: 05/04/2007] [Indexed: 11/18/2022]
Abstract
Previous molecular analyses of mammalian evolutionary relationships involving a wide range of placental mammalian taxa have been restricted in size from one to two dozen gene loci and have not decisively resolved the basal branching order within Placentalia. Here, on extracting from thousands of gene loci both their coding nucleotide sequences and translated amino acid sequences, we attempt to resolve key uncertainties about the ancient branching pattern of crown placental mammals. Focusing on approximately 1,700 conserved gene loci, those that have the more slowly evolving coding sequences, and using maximum-likelihood, Bayesian inference, maximum parsimony, and neighbor-joining (NJ) phylogenetic tree reconstruction methods, we find from almost all results that a clade (the southern Atlantogenata) composed of Afrotheria and Xenarthra is the sister group of all other (the northern Boreoeutheria) crown placental mammals; among boreoeutherians, Rodentia groups with Lagomorpha, and the resultant Glires is close to Primates. Only the NJ tree for nucleotide sequences separates Rodentia (murids) first and then Lagomorpha (rabbit) from the other placental mammals. However, this nucleotide NJ tree still depicts Atlantogenata and Boreoeutheria but minus Rodentia and Lagomorpha. Moreover, the NJ tree for amino acid sequences does depict the basal separation to be between Atlantogenata and a Boreoeutheria that includes Rodentia and Lagomorpha. Crown placental mammalian diversification appears to be largely the result of ancient plate tectonic events that allowed time for convergent phenotypes to evolve in the descendant clades.
Affiliation(s)
- Derek E. Wildman
- Perinatology Research Branch, National Institute of Child Health and Human Development/National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20892
- Center For Molecular Medicine and Genetics, and
- Departments of Obstetrics and Gynecology and
- Juan C. Opazo
- Center For Molecular Medicine and Genetics, and
- School of Biological Sciences, University of Nebraska, Lincoln, NE 68588; and
- Guozhen Liu
- Center For Molecular Medicine and Genetics, and
- Vincent Lefort
- Laboratory of Computer Science, Robotics, and Microelectronics, Centre National de la Recherche Scientifique, Université Montpellier II, 161 Rue Ada, 34392 Montpellier, France
- Stephane Guindon
- Laboratory of Computer Science, Robotics, and Microelectronics, Centre National de la Recherche Scientifique, Université Montpellier II, 161 Rue Ada, 34392 Montpellier, France
- Olivier Gascuel
- Laboratory of Computer Science, Robotics, and Microelectronics, Centre National de la Recherche Scientifique, Université Montpellier II, 161 Rue Ada, 34392 Montpellier, France
- Roberto Romero
- Perinatology Research Branch, National Institute of Child Health and Human Development/National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20892
- Morris Goodman
- Center For Molecular Medicine and Genetics, and
- Anatomy and Cell Biology, Wayne State University, Detroit, MI 48201