1. The Structure of Evolutionary Model Space for Proteins across the Tree of Life. Biology (Basel) 2023; 12:282. [PMID: 36829559 PMCID: PMC9952988 DOI: 10.3390/biology12020282] [Received: 12/20/2022] [Revised: 02/04/2023] [Accepted: 02/08/2023]
Abstract
The factors that determine the relative rates of amino acid substitution during protein evolution are complex and known to vary among taxa. We estimated relative exchangeabilities for pairs of amino acids from clades spread across the tree of life and assessed the historical signal in the distances among these clade-specific models. We separately trained these models on collections of arbitrarily selected protein alignments and on ribosomal protein alignments. In both cases, we found a clear separation between the models trained using multiple sequence alignments from bacterial clades and the models trained on archaeal and eukaryotic data. We assessed the predictive power of our novel clade-specific models of sequence evolution by asking whether fit to the models could be used to identify the source of multiple sequence alignments. Model fit was generally able to correctly classify protein alignments at the level of domain (bacterial versus archaeal), but the accuracy of classification at finer scales was much lower. The only exceptions to this were the relatively high classification accuracy for two archaeal lineages: Halobacteriaceae and Thermoprotei. Genomic GC content had a modest impact on relative exchangeabilities despite having a large impact on amino acid frequencies. Relative exchangeabilities involving aromatic residues exhibited the largest differences among models. There were a small number of exchangeabilities that exhibited large differences in comparisons among major clades and between generalized models and ribosomal protein models. Taken as a whole, these results reveal that a small number of relative exchangeabilities are responsible for much of the structure of the "model space" for protein sequence evolution. The clade-specific models we generated may be useful tools for protein phylogenetics, and the structure of evolutionary model space that they revealed has implications for phylogenomic inference across the tree of life.
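The abstract compares clade-specific models by the distances among their relative exchangeabilities and notes that a few exchangeabilities account for most of the differences. The Python sketch below is a hypothetical illustration of that kind of comparison, not the authors' metric. It assumes each model is a dict mapping amino-acid pairs to exchangeabilities, rescales each model to a geometric mean of 1 (relative exchangeabilities have no absolute scale), and measures Euclidean distance between log-exchangeabilities. The toy values are invented.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # unordered amino-acid pair, e.g. ("F", "Y")


def normalize(model: Dict[Pair, float]) -> Dict[Pair, float]:
    """Rescale so the geometric mean exchangeability is 1 (removes the arbitrary scale)."""
    log_mean = sum(math.log(v) for v in model.values()) / len(model)
    return {pair: v / math.exp(log_mean) for pair, v in model.items()}


def log_differences(a: Dict[Pair, float], b: Dict[Pair, float]) -> Dict[Pair, float]:
    """Per-pair absolute difference of log relative exchangeabilities."""
    if a.keys() != b.keys():
        raise ValueError("models must cover the same amino-acid pairs")
    na, nb = normalize(a), normalize(b)
    return {pair: abs(math.log(na[pair]) - math.log(nb[pair])) for pair in na}


def model_distance(a: Dict[Pair, float], b: Dict[Pair, float]) -> float:
    """Euclidean distance between two models in log-exchangeability space."""
    return math.sqrt(sum(d * d for d in log_differences(a, b).values()))


def largest_differences(a, b, n: int = 3) -> List[Tuple[Pair, float]]:
    """The n pairs whose exchangeabilities differ most between two models."""
    diffs = log_differences(a, b)
    return sorted(diffs.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Toy three-pair "models" (invented numbers, not estimates from the paper).
model_a = {("F", "Y"): 4.0, ("A", "S"): 2.0, ("I", "L"): 3.0}
model_b = {("F", "Y"): 8.0, ("A", "S"): 2.0, ("I", "L"): 3.0}
print(model_distance(model_a, model_b))
print(largest_differences(model_a, model_b, n=1))
```

Working in log space makes a twofold increase and a twofold decrease count equally, which suits ratio-like quantities; that is a design choice of this sketch, not something stated in the abstract.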
2. Garcia PS, Duchemin W, Flandrois JP, Gribaldo S, Grangeasse C, Brochier-Armanet C. A Comprehensive Evolutionary Scenario of Cell Division and Associated Processes in the Firmicutes. Mol Biol Evol 2021; 38:2396-2412. [PMID: 33533884 PMCID: PMC8136486 DOI: 10.1093/molbev/msab034]
Abstract
The cell cycle is a fundamental process that has been extensively studied in bacteria. However, many of its components and their interactions with machineries involved in other cellular processes are poorly understood. Furthermore, most knowledge relies on the study of a few models, while the real diversity of the cell division apparatus and its evolution are largely unknown. Here, we present a massive in-silico analysis of cell division and associated processes in around 1,000 genomes of the Firmicutes, a major bacterial phylum encompassing model organisms (e.g., Bacillus subtilis, Streptococcus pneumoniae, and Staphylococcus aureus) as well as many important pathogens. We analyzed over 160 proteins using an original approach combining phylogenetic reconciliation, phylogenetic profiles, and gene cluster surveys. Our results reveal substantial differences among clades and pinpoint a number of evolutionary hotspots. In particular, the emergence of Bacilli coincides with an expansion of the gene repertoires involved in cell wall synthesis and remodeling. We also highlight major genomic rearrangements at the emergence of Streptococcaceae. We establish a functional network in Firmicutes that allows the identification of new functional links within the same process, such as between FtsW (peptidoglycan polymerase) and a previously undescribed Penicillin-Binding Protein, or between different processes, such as replication and cell wall synthesis. Finally, we identify new candidates involved in sporulation and cell wall synthesis. Our results provide a previously undescribed view of the diversity of the bacterial cell cycle, testable hypotheses for further experimental studies, and a methodological framework for the analysis of any other biological system.
Affiliation(s)
- Pierre S Garcia: Université de Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918 Villeurbanne F-69622, France; Molecular Microbiology and Structural Biochemistry, UMR 5086, Université Claude Bernard Lyon 1, CNRS, Lyon, France; Department of Microbiology, Unit "Evolutionary Biology of the Microbial Cell", Institut Pasteur, Paris, France
- Wandrille Duchemin: Université de Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918 Villeurbanne F-69622, France
- Jean-Pierre Flandrois: Université de Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918 Villeurbanne F-69622, France
- Simonetta Gribaldo: Department of Microbiology, Unit "Evolutionary Biology of the Microbial Cell", Institut Pasteur, Paris, France
- Christophe Grangeasse: Molecular Microbiology and Structural Biochemistry, UMR 5086, Université Claude Bernard Lyon 1, CNRS, Lyon, France
- Céline Brochier-Armanet: Université de Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918 Villeurbanne F-69622, France
3. Norn C, André I, Theobald DL. A thermodynamic model of protein structure evolution explains empirical amino acid substitution matrices. Protein Sci 2021; 30:2057-2068. [PMID: 34218472 PMCID: PMC8442976 DOI: 10.1002/pro.4155] [Received: 03/17/2021] [Revised: 06/25/2021] [Accepted: 06/29/2021]
Abstract
Proteins evolve under a myriad of biophysical selection pressures that collectively control the patterns of amino acid substitutions. These evolutionary pressures are sufficiently consistent over time and across protein families to produce substitution patterns, summarized in global amino acid substitution matrices such as BLOSUM, JTT, WAG, and LG, which can be used to successfully detect homologs, infer phylogenies, and reconstruct ancestral sequences. Although the factors that govern the variation of amino acid substitution rates have received much attention, the influence of thermodynamic stability constraints remains unresolved. Here we develop a simple model to calculate amino acid substitution matrices from evolutionary dynamics controlled by a fitness function that reports on the thermodynamic effects of amino acid mutations in protein structures. This hybrid biophysical and evolutionary model accounts for nucleotide transition/transversion rate bias, multi‐nucleotide codon changes, the number of codons per amino acid, and thermodynamic protein stability. We find that our theoretical model accurately recapitulates the complex yet universal pattern observed in common global amino acid substitution matrices used in phylogenetics. These results suggest that selection for thermodynamically stable proteins, coupled with nucleotide mutation bias filtered by the structure of the genetic code, is the primary driver behind the global amino acid substitution patterns observed in proteins throughout the tree of life.
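The core idea in this abstract, substitution rates set by mutation multiplied by the probability that a stability-altering mutation fixes, can be sketched in Python. This is a simplified illustration under stated assumptions, not Norn et al.'s model. Fitness is taken as the two-state fraction folded computed from a folding free energy (negative ΔG means stable). Fixation uses the standard Moran-process probability for a single mutant in a haploid population of size N. Mutation rates are supplied directly per amino-acid pair rather than derived from codons, transition/transversion bias, or multi-nucleotide changes. The ΔG values, mutation rates, and N below are hypothetical.

```python
import math
from typing import Dict, Tuple

RT = 0.593  # kcal/mol, approximately R * 298 K


def fraction_folded(dg_fold: float) -> float:
    """Two-state folded fraction; dg_fold is the folding free energy in kcal/mol (negative = stable)."""
    return 1.0 / (1.0 + math.exp(dg_fold / RT))


def fixation_probability(f_old: float, f_new: float, pop_size: int) -> float:
    """Moran-process fixation probability of one mutant (fitness f_new) among residents (f_old)."""
    if math.isclose(f_old, f_new, rel_tol=1e-12):
        return 1.0 / pop_size  # neutral limit
    log_r = math.log(f_old / f_new)
    x = pop_size * log_r
    if x > 700.0:  # strongly deleterious: r**N would overflow, probability is effectively zero
        return 0.0
    # (1 - r) / (1 - r**N), written with expm1 for accuracy when r is close to 1
    return math.expm1(log_r) / math.expm1(x)


def rate_matrix(dg: Dict[str, float], mu: Dict[Tuple[str, str], float], pop_size: int):
    """Q[i, j] = N * mu[i, j] * P_fix(i -> j); diagonal entries make each row sum to zero."""
    q: Dict[Tuple[str, str], float] = {}
    for i in dg:
        total = 0.0
        for j in dg:
            if i == j:
                continue
            p_fix = fixation_probability(fraction_folded(dg[i]), fraction_folded(dg[j]), pop_size)
            q[(i, j)] = pop_size * mu[(i, j)] * p_fix
            total += q[(i, j)]
        q[(i, i)] = -total
    return q


# Hypothetical site where Ala yields a more stable protein than Val.
dg = {"A": -3.0, "V": -2.0}
mu = {("A", "V"): 1e-3, ("V", "A"): 1e-3}
Q = rate_matrix(dg, mu, pop_size=100)
print(Q[("V", "A")] > Q[("A", "V")])  # prints True: the stabilizing V->A substitution is faster
```

In the neutral limit (equal ΔG) each off-diagonal rate reduces to the mutation rate, the usual population-genetics sanity check for this kind of mutation-selection construction.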
Affiliation(s)
- Christoffer Norn: Biochemistry and Structural Biology, Lund University, Lund, Sweden
- Ingemar André: Biochemistry and Structural Biology, Lund University, Lund, Sweden
- Douglas L Theobald: Biochemistry Department, Brandeis University, Waltham, Massachusetts, USA
4. Chang H, Nie Y, Zhang N, Zhang X, Sun H, Mao Y, Qiu Z, Huang Y. MtOrt: an empirical mitochondrial amino acid substitution model for evolutionary studies of Orthoptera insects. BMC Evol Biol 2020; 20:57. [PMID: 32429841 PMCID: PMC7236349 DOI: 10.1186/s12862-020-01623-6] [Received: 01/08/2020] [Accepted: 05/05/2020]
Abstract
BACKGROUND Amino acid substitution models play an important role in inferring phylogenies from proteins. Although different amino acid substitution models have been proposed, only a few have been estimated from mitochondrial protein sequences for specific taxa, such as the mtArt model for Arthropoda. The growing amount of mitochondrial genome data from a broad range of Orthoptera taxa provides an opportunity to estimate an Orthoptera-specific empirical mitochondrial amino acid model. RESULTS We sequenced the complete mitochondrial genomes of 54 Orthoptera species and estimated an amino acid substitution model (named mtOrt) by maximum likelihood, based on the 283 complete mitochondrial genomes currently available. The results indicated that there are obvious differences between mtOrt and the existing models, and that the new model fits Orthoptera mitochondrial protein datasets better. Moreover, the topologies of trees constructed using mtOrt and existing models are frequently different. MtOrt does indeed have an impact on likelihood improvement as well as on tree topologies. Comparisons between the topologies of trees constructed using mtOrt and existing models show that the new model outperforms the existing models in inferring phylogenies from Orthoptera mitochondrial protein data. CONCLUSIONS The new mitochondrial amino acid substitution model of Orthoptera shows obvious differences from the existing models and outperforms them in inferring phylogenies from Orthoptera mitochondrial protein sequences.
Affiliation(s)
- Huihui Chang: College of Life Sciences, Shaanxi Normal University, No. 620, West Chang'an Avenue, Xi'an, 710119, Shaanxi, China
- Yimeng Nie: College of Life Sciences, Shaanxi Normal University, No. 620, West Chang'an Avenue, Xi'an, 710119, Shaanxi, China
- Nan Zhang: College of Life Sciences, Shaanxi Normal University, No. 620, West Chang'an Avenue, Xi'an, 710119, Shaanxi, China
- Xue Zhang: College of Life Sciences, Shaanxi Normal University, No. 620, West Chang'an Avenue, Xi'an, 710119, Shaanxi, China
- Huimin Sun: College of Life Sciences, Shaanxi Normal University, No. 620, West Chang'an Avenue, Xi'an, 710119, Shaanxi, China
- Ying Mao: College of Life Sciences, Shaanxi Normal University, No. 620, West Chang'an Avenue, Xi'an, 710119, Shaanxi, China
- Zhongying Qiu: School of Basic Medical Sciences & Shaanxi Key Laboratory of Brain Disorders, Xi'an Medical University, Xi'an, 710021, China
- Yuan Huang: College of Life Sciences, Shaanxi Normal University, No. 620, West Chang'an Avenue, Xi'an, 710119, Shaanxi, China
5. FLAVI: An Amino Acid Substitution Model for Flaviviruses. J Mol Evol 2020; 88:445-452. [PMID: 32356020 DOI: 10.1007/s00239-020-09943-3] [Received: 11/11/2019] [Accepted: 04/15/2020]
Abstract
Amino acid substitution models represent substitution rates among amino acids during evolution. The models play an important role in analyzing protein sequences, especially in inferring phylogenies. The rapid evolution of flaviviruses is increasing the threat they pose to public health. A number of models have been estimated for some viruses; however, they do not properly represent the amino acid substitution patterns of flaviviruses. In this study, we collected protein sequences from the flavivirus genus to estimate an amino acid substitution model specific to flaviviruses, called FLAVI. Experiments showed that the collected dataset was sufficient to estimate a stable model. More importantly, the FLAVI model was remarkably better than other existing models in analyzing flavivirus protein sequences. We recommend that researchers use the FLAVI model when studying protein sequences of flaviviruses or closely related viruses.
6. Flandrois JP, Brochier-Armanet C, Briolay J, Abrouk D, Schwob G, Normand P, Fernandez MP. Taxonomic assignment of uncultured prokaryotes with long range PCR targeting the spectinomycin operon. Res Microbiol 2019; 170:280-287. [PMID: 31279085 DOI: 10.1016/j.resmic.2019.06.005] [Received: 10/29/2018] [Revised: 05/02/2019] [Accepted: 06/25/2019]
Abstract
The taxonomic assignment of uncultured prokaryotes to known taxa is a major challenge in microbial systematics. It usually relies on the phylogenetic analysis of the ribosomal small subunit RNA or a few housekeeping genes. Recent works have shown ribosomal proteins to be valuable markers for systematics and, owing to the boom in complete genome sequencing, their use has become widespread. Yet, for uncultured strains, for which complete genome sequences cannot be easily obtained, sequencing many markers is complicated and time-consuming. Taking advantage of the organization of ribosomal protein coding genes in large gene clusters, we used long range PCR to amplify a 32 kb conserved region encompassing the spectinomycin (spc) operon from isolated and from uncultured nodular endophytic Frankia strains. The phylogenetic analysis of the 27 ribosomal protein genes contained in this region provided a robust phylogenetic tree consistent with phylogenies based on larger sets of markers, indicating that this subset of ribosomal proteins contains enough phylogenetic signal to address systematic issues. This work shows that long range PCR could break down the barrier preventing the use of ribosomal proteins as phylogenetic markers when complete genome sequences cannot be easily obtained.
Affiliation(s)
- Jean-Pierre Flandrois: Université de Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, F-69622, Villeurbanne, France
- Céline Brochier-Armanet: Université de Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, F-69622, Villeurbanne, France
- Jérôme Briolay: Université de Lyon, Université Lyon 1, DTAMB, Villeurbanne, France
- Danis Abrouk: Université de Lyon, Université Lyon 1, CNRS, UMR5557, INRA, UMR1418, Laboratoire d'Écologie Microbienne, Villeurbanne, France
- Guillaume Schwob: Université de Lyon, Université Lyon 1, CNRS, UMR5557, INRA, UMR1418, Laboratoire d'Écologie Microbienne, Villeurbanne, France
- Philippe Normand: Université de Lyon, Université Lyon 1, CNRS, UMR5557, INRA, UMR1418, Laboratoire d'Écologie Microbienne, Villeurbanne, France
- Maria P Fernandez: Université de Lyon, Université Lyon 1, CNRS, UMR5557, INRA, UMR1418, Laboratoire d'Écologie Microbienne, Villeurbanne, France
7. Fernandes NM, Schrago CG. A multigene timescale and diversification dynamics of Ciliophora evolution. Mol Phylogenet Evol 2019; 139:106521. [PMID: 31152779 DOI: 10.1016/j.ympev.2019.106521] [Received: 01/19/2019] [Revised: 05/24/2019] [Accepted: 05/28/2019]
Abstract
Ciliophora is one of the most diverse lineages of unicellular eukaryotes. Nevertheless, a robust timescale that includes all main lineages and employs properly identified ciliate fossils as primary calibrations is lacking. Here, we inferred a time-calibrated multigene phylogeny of Ciliophora and used this timetree to investigate the rates and patterns of lineage diversification through time. We implemented a two-step analytical approach that favored both gene and taxon sampling, reducing the uncertainty of time estimates and yielding narrower credibility intervals on the ribosomal-derived chronogram. We estimate the origin of Ciliophora at 1143 Ma, substantially younger than previously proposed ages, and find that the major diversity explosion occurred during the Paleozoic. Among the groups currently recognized as classes, Spirotrichea diverged earliest, with its origin dated at ca. 850 Ma, and Protocruziea was the youngest class, with its crown age estimated at 56 Ma. Macroevolutionary analysis detected a significant rate shift in diversification dynamics in the spirotrichean clade Hypotrichia + Oligotrichia + Choreotrichia, whose speciation rate accelerated ca. 570 Ma, during the Ediacaran-Cambrian transition. For all crown lineages investigated, speciation rates declined through time, whereas extinction rates remained low and relatively constant throughout the evolutionary history of ciliates.
Affiliation(s)
- Noemi Mendes Fernandes: Laboratório de Protistologia, Departamento de Zoologia, Universidade Federal do Rio de Janeiro, Brazil
- Carlos G Schrago: Laboratório de Biologia Evolutiva Teórica e Aplicada, Departamento de Genética, Universidade Federal do Rio de Janeiro, Brazil
8. Maestri E, Pavlicevic M, Montorsi M, Marmiroli N. Meta-Analysis for Correlating Structure of Bioactive Peptides in Foods of Animal Origin with Regard to Effect and Stability. Compr Rev Food Sci Food Saf 2018; 18:3-30. [PMID: 33337011 DOI: 10.1111/1541-4337.12402] [Received: 05/13/2018] [Revised: 09/28/2018] [Accepted: 09/29/2018]
Abstract
Amino acid (AA) sequences of 807 bioactive peptides from foods of animal origin were examined in order to correlate peptide structure with activity (antihypertensive, antioxidative, immunomodulatory, antimicrobial, hypolipidemic, antithrombotic, and opioid) and stability in vivo. Food sources, such as milk, meat, eggs, and marine products, show different frequencies of bioactive peptides exhibiting specific effects. There is a correlation of peptide structure and effect, depending on type and position of AA. Opioid peptides contain a high percentage of aromatic AA residues, while antimicrobial peptides show an excess of positively charged AAs. AA residue position is significant, with those in the first and penultimate positions having the biggest effects on peptide activity. Peptides that have activity in vivo contain a high percentage (67%) of proline residues, but the positions of proline in the sequence depend on the length of the peptide. We also discuss the influence of processing on activity of these peptides, as well as methods for predicting release from the source protein and activity of peptides.
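The structural features this abstract correlates with activity (share of aromatic residues, positively charged residues, proline content, and the residues at the first and penultimate positions) are simple to compute per peptide. The Python sketch below is a hypothetical helper, not the authors' pipeline. It assumes one-letter codes, treats Phe, Trp, and Tyr as aromatic, and counts only Lys and Arg as positively charged (His is ignored, an assumption about neutral pH).

```python
from typing import Dict, Union

AROMATIC = set("FWY")
POSITIVE = set("KR")  # His excluded: assumed mostly uncharged at neutral pH
NEGATIVE = set("DE")


def peptide_features(peptide: str) -> Dict[str, Union[float, int, str]]:
    """Composition and terminal-position features for a one-letter peptide sequence."""
    seq = peptide.strip().upper()
    if len(seq) < 2:
        raise ValueError("need at least two residues")
    n = len(seq)
    return {
        "aromatic_pct": 100.0 * sum(aa in AROMATIC for aa in seq) / n,
        "proline_pct": 100.0 * seq.count("P") / n,
        "net_charge": sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq),
        "first_residue": seq[0],
        "penultimate_residue": seq[-2],
    }


print(peptide_features("YGGFL"))  # Leu-enkephalin, an opioid peptide
print(peptide_features("VPP"))    # Val-Pro-Pro, a milk-derived antihypertensive tripeptide
```

Leu-enkephalin shows the aromatic enrichment typical of opioid peptides (40% of its residues are Tyr or Phe), while VPP illustrates a proline-rich sequence.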
Affiliation(s)
- Elena Maestri: Dept. of Chemistry, Life Sciences and Environmental Sustainability, Univ. of Parma, Parco Area delle Scienze 11/A, 43124, Parma, Italy; Interdepartmental Centre for Food Safety, Technologies and Innovation for Agri-food (SITEIA.PARMA), Univ. of Parma, Parco Area delle Scienze, 43124, Parma, Italy
- Milica Pavlicevic: Inst. for Food Technology and Biochemistry, Faculty of Agriculture, Univ. of Belgrade, Belgrade, Serbia
- Michela Montorsi: Dept. of Human Sciences and Promotion of the Quality of Life, San Raffaele Roma Open Univ., Via F. Daverio 7, 20122, Milan, Italy; Consorzio Italbiotec, Via Fantoli, 16/15, 20138, Milano, Italy; Inst. of Bioimaging and Molecular Physiology, National Council of Research (CNR), Via Fratelli Cervi 93, 20090, Segrate, Italy
- Nelson Marmiroli: Dept. of Chemistry, Life Sciences and Environmental Sustainability, Univ. of Parma, Parco Area delle Scienze 11/A, 43124, Parma, Italy; Interdepartmental Centre for Food Safety, Technologies and Innovation for Agri-food (SITEIA.PARMA), Univ. of Parma, Parco Area delle Scienze, 43124, Parma, Italy; Consorzio Italbiotec, Via Fantoli, 16/15, 20138, Milano, Italy
9. Wu J, Yonezawa T, Kishino H. Rates of Molecular Evolution Suggest Natural History of Life History Traits and a Post-K-Pg Nocturnal Bottleneck of Placentals. Curr Biol 2017; 27:3025-3033.e5. [DOI: 10.1016/j.cub.2017.08.043] [Received: 03/13/2017] [Revised: 06/12/2017] [Accepted: 08/17/2017]
10. Origin of the HIV-1 group O epidemic in western lowland gorillas. Proc Natl Acad Sci U S A 2015; 112:E1343-52. [PMID: 25733890 DOI: 10.1073/pnas.1502022112]
Abstract
HIV-1, the cause of AIDS, is composed of four phylogenetic lineages, groups M, N, O, and P, each of which resulted from an independent cross-species transmission event of simian immunodeficiency viruses (SIVs) infecting African apes. Although groups M and N have been traced to geographically distinct chimpanzee communities in southern Cameroon, the reservoirs of groups O and P remain unknown. Here, we screened fecal samples from western lowland (n = 2,611), eastern lowland (n = 103), and mountain (n = 218) gorillas for gorilla SIV (SIVgor) antibodies and nucleic acids. Despite testing wild troops throughout southern Cameroon (n = 14), northern Gabon (n = 16), the Democratic Republic of Congo (n = 2), and Uganda (n = 1), SIVgor was identified at only four sites in southern Cameroon, with prevalences ranging from 0.8-22%. Amplification of partial and full-length SIVgor sequences revealed extensive genetic diversity, but all SIVgor strains were derived from a single lineage within the chimpanzee SIV (SIVcpz) radiation. Two fully sequenced gorilla viruses from southwestern Cameroon were very closely related to, and likely represent the source population of, HIV-1 group P. Most of the genome of a third SIVgor strain, from central Cameroon, was very closely related to HIV-1 group O, again pointing to gorillas as the immediate source. Functional analyses identified the cytidine deaminase APOBEC3G as a barrier for chimpanzee-to-gorilla, but not gorilla-to-human, virus transmission. These data indicate that HIV-1 group O, which spreads epidemically in west central Africa and is estimated to have infected around 100,000 people, originated by cross-species transmission from western lowland gorillas.
11. Dang CC, Le VS, Gascuel O, Hazes B, Le QS. FastMG: a simple, fast, and accurate maximum likelihood procedure to estimate amino acid replacement rate matrices from large data sets. BMC Bioinformatics 2014; 15:341. [PMID: 25344302 PMCID: PMC4287512 DOI: 10.1186/1471-2105-15-341] [Received: 05/13/2014] [Accepted: 09/29/2014]
Abstract
Background Amino acid replacement rate matrices are a crucial component of many protein analysis systems such as sequence similarity search, sequence alignment, and phylogenetic inference. Ideally, the rate matrix reflects the mutational behavior of the actual data under study; however, estimating amino acid replacement rate matrices requires large protein alignments and is computationally expensive and complex. As a compromise, sub-optimal pre-calculated generic matrices are typically used for protein-based phylogeny. Sequence availability has now grown to a point where problem-specific rate matrices can often be calculated if the computational cost can be controlled. Results The most time-consuming step in estimating rate matrices by maximum likelihood is building maximum likelihood phylogenetic trees from protein alignments. We propose a new procedure, called FastMG, to overcome this obstacle. The key innovation is the alignment-splitting algorithm that splits alignments with many sequences into non-overlapping sub-alignments prior to estimating amino acid replacement rates. Experiments with different large data sets showed that the FastMG procedure was an order of magnitude faster than the same procedure without splitting. Importantly, there was no apparent loss in matrix quality when an appropriate splitting procedure was used. Conclusions FastMG is a simple, fast, and accurate procedure to estimate amino acid replacement rate matrices from large data sets. It enables researchers to study the evolutionary relationships of specific groups of proteins or taxa with optimized, data-specific amino acid replacement rate matrices. The programs, data sets, and the new mammalian mitochondrial protein rate matrix are available at http://fastmg.codeplex.com.
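The key step described in this abstract, splitting an alignment with many sequences into non-overlapping sub-alignments before estimating replacement rates, can be illustrated with a naive Python sketch. This is not FastMG's splitting algorithm: the abstract stresses that matrix quality depends on an appropriate splitting procedure, whereas this hypothetical `split_alignment` function simply partitions sequence names, in sorted order, into nearly equal groups of at most `max_seqs` sequences. Each sequence lands in exactly one sub-alignment, and columns are left intact.

```python
import math
from typing import Dict, List


def split_alignment(alignment: Dict[str, str], max_seqs: int) -> List[Dict[str, str]]:
    """Partition an alignment (name -> aligned sequence) into disjoint sub-alignments."""
    if max_seqs < 2:
        raise ValueError("max_seqs must be at least 2")
    if not alignment:
        return []
    names = sorted(alignment)
    n_parts = math.ceil(len(names) / max_seqs)
    base, extra = divmod(len(names), n_parts)  # spread sequences as evenly as possible
    parts: List[Dict[str, str]] = []
    start = 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)
        parts.append({name: alignment[name] for name in names[start:start + size]})
        start += size
    return parts


# Ten toy aligned sequences (invented data).
toy = {f"seq{i:02d}": "MKV-LA" for i in range(10)}
for part in split_alignment(toy, max_seqs=4):
    print(sorted(part))
```

A practical splitter would more likely group related sequences, for example by their position in a guide tree, than by name; the abstract does not say how FastMG forms its groups, so that choice is left open here.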
Affiliation(s)
- Quang Si Le: The Wellcome Trust Center for Human Genetics, Oxford University, Oxford, UK
12. Tourasse NJ, Stabell FB, Kolstø AB. Survey of chimeric IStron elements in bacterial genomes: multiple molecular symbioses between group I intron ribozymes and DNA transposons. Nucleic Acids Res 2014; 42:12333-51. [PMID: 25324310 PMCID: PMC4227781 DOI: 10.1093/nar/gku939]
Abstract
IStrons are chimeric genetic elements composed of a group I intron associated with an insertion sequence (IS). The group I intron is a catalytic RNA that provides the IStron with self-splicing ability, which renders IStron insertions harmless to the host genome. The IS element is a DNA transposon that confers mobility, allowing the IStron to spread in genomes. IStrons are therefore a striking example of a molecular symbiosis between unrelated genetic elements endowed with different functions. In this study, we conducted the first comprehensive survey of IStrons in sequenced genomes, providing insights into the distribution, diversity, origin and evolution of IStrons. We show that IStrons have a restricted phylogenetic distribution limited to two bacterial phyla, the Firmicutes and the Fusobacteria. Nevertheless, diverse IStrons representing two major groups targeting different insertion site motifs were identified. This, taken together with the finding that the intron components of all IStrons belong to the same structural class but are fused to different IS families, indicates that multiple intron–IS symbioses have occurred during evolution. In addition, introns and IS elements related to those at the origin of IStrons were also identified.
Affiliation(s)
- Nicolas J Tourasse: Laboratory for Microbial Dynamics (LaMDa), Department of Pharmaceutical Biosciences, University of Oslo, Oslo, Norway; Institut de Biologie Physico-Chimique, UMR CNRS 7141, Université Pierre et Marie Curie, Paris, France
- Fredrik B Stabell: Laboratory for Microbial Dynamics (LaMDa), Department of Pharmaceutical Biosciences, University of Oslo, Oslo, Norway
- Anne-Brit Kolstø: Laboratory for Microbial Dynamics (LaMDa), Department of Pharmaceutical Biosciences, University of Oslo, Oslo, Norway
13. Liu Y, Cox CJ, Wang W, Goffinet B. Mitochondrial phylogenomics of early land plants: mitigating the effects of saturation, compositional heterogeneity, and codon-usage bias. Syst Biol 2014; 63:862-78. [PMID: 25070972 DOI: 10.1093/sysbio/syu049]
Abstract
Phylogenetic analyses using concatenation of genomic-scale data have been seen as the panacea for resolving the incongruences among inferences from few or single genes. However, phylogenomics may also suffer from systematic errors, due to the, perhaps cumulative, effects of saturation, among-taxa compositional (GC content) heterogeneity, or codon-usage bias plaguing the individual nucleotide loci that are concatenated. Here, we provide an example of how these factors affect the inferences of the phylogeny of early land plants based on mitochondrial genomic data. Mitochondrial sequences evolve slowly in plants and hence are thought to be suitable for resolving deep relationships. We newly assembled mitochondrial genomes from 20 bryophytes, complemented these with 40 other streptophytes (land plants plus algal outgroups), compiling a data matrix of 60 taxa and 41 mitochondrial genes. Homogeneous analyses of the concatenated nucleotide data resolve mosses as sister-group to the remaining land plants. However, the corresponding translated amino acid data support the liverwort lineage in this position. Both results receive weak to moderate support in maximum-likelihood analyses, but strong support in Bayesian inferences. Tests of alternative hypotheses using either nucleotide or amino acid data provide implicit support for their respective optimal topologies, and clearly reject the hypotheses that bryophytes are monophyletic, liverworts and mosses share a unique common ancestor, or hornworts are sister to the remaining land plants. We determined that land plant lineages differ in their nucleotide composition, and in their usage of synonymous codon variants. 
Composition-heterogeneous Bayesian analyses employing a nonstationary model that accounts for variation in among-lineage composition, and inferences from degenerated nucleotide data that avoid the effects of the synonymous substitutions underlying codon-usage bias, again recovered liverworts as sister to the remaining land plants, but without support. These analyses indicate that the inference of an early-branching moss lineage from the nucleotide data is caused by convergent compositional biases. Accommodating among-site amino acid compositional heterogeneity (CAT-model) yields no support for the optimal resolution of liverworts as sister to the rest of land plants, suggesting that the robust inference of the liverwort position in homogeneous analyses may be due in part to compositional biases among sites. All analyses support paraphyletic bryophytes, with hornworts composing the sister group to tracheophytes. We conclude that while genomic data may generate highly supported phylogenetic trees, these inferences may be artifacts. We suggest that phylogenomic analyses should assess the possible impact of potential biases by comparing protein-coding gene data with their amino acid translations, and by evaluating the impact of substitutional saturation, synonymous substitutions, and compositional biases through data deletion strategies and analyses under heterogeneous composition models. We caution against relying on any one presentation of the data (nucleotide or amino acid) or any one type of analysis, even for large-scale data sets and no matter how well supported, without fully exploring the effects of substitution models.
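The among-lineage compositional heterogeneity discussed in this abstract is often screened with a contingency-table chi-square statistic over per-taxon nucleotide counts, alongside per-taxon GC content. The Python sketch below is a generic screen assumed for illustration, not the authors' analysis. It counts only unambiguous A, C, G, and T, and because it treats taxa as independent samples it ignores their shared phylogenetic history; that limitation is one reason nonstationary models such as those used in the study are preferred for inference itself.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"


def base_counts(seq: str) -> List[int]:
    """Counts of unambiguous A, C, G, T; gaps and ambiguity codes are ignored."""
    s = seq.upper()
    return [s.count(b) for b in BASES]


def gc_content(seq: str) -> float:
    """Fraction of G + C among unambiguous bases (NaN if there are none)."""
    a, c, g, t = base_counts(seq)
    total = a + c + g + t
    return (g + c) / total if total else float("nan")


def composition_chi2(alignment: Dict[str, str]) -> Tuple[float, int]:
    """Pearson chi-square statistic for the taxa x bases count table, and its degrees of freedom."""
    rows = [base_counts(seq) for seq in alignment.values()]
    col_totals = [sum(row[k] for row in rows) for k in range(len(BASES))]
    grand = sum(col_totals)
    if grand == 0:
        raise ValueError("no unambiguous bases to test")
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for k, observed in enumerate(row):
            expected = row_total * col_totals[k] / grand
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(BASES) - 1)
    return stat, df


toy = {"taxonA": "GGGGCCCC", "taxonB": "AAAATTTT"}  # invented, deliberately extreme
print({name: gc_content(seq) for name, seq in toy.items()})  # {'taxonA': 1.0, 'taxonB': 0.0}
print(composition_chi2(toy))  # (16.0, 3)
```

The statistic would normally be compared against a chi-square distribution with `df` degrees of freedom; that step is omitted to keep the sketch dependency-free.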
Affiliation(s)
- Yang Liu: Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA; Centro de Ciências do Mar, Universidade do Algarve, Gambelas, 8005-319 Faro, Portugal; and State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Cymon J Cox: Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA; Centro de Ciências do Mar, Universidade do Algarve, Gambelas, 8005-319 Faro, Portugal; and State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Wei Wang: Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA; Centro de Ciências do Mar, Universidade do Algarve, Gambelas, 8005-319 Faro, Portugal; and State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Bernard Goffinet: Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA; Centro de Ciências do Mar, Universidade do Algarve, Gambelas, 8005-319 Faro, Portugal; and State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
14. Nardi F, Liò P, Carapelli A, Frati F. MtPAN(3): site-class specific amino acid replacement matrices for mitochondrial proteins of Pancrustacea and Collembola. Mol Phylogenet Evol 2014; 75:239-44. [PMID: 24525199 DOI: 10.1016/j.ympev.2014.02.001] [Received: 11/29/2013] [Revised: 01/30/2014] [Accepted: 02/02/2014]
Abstract
Phylogenetic analyses of Pancrustacea have generally relied on empirical models of amino acid substitution estimated from large reference datasets and applied to the entire alignment. More recently, following the observation that different sites, or groups of sites, may evolve under different evolutionary constraints, methods have been developed to handle site- or site-class-specific models. Here, a set of three matrices was developed from an alignment of complete pancrustacean mitochondrial genomes, partitioned using an unsupervised clustering procedure acting on per-site physicochemical properties. The performance of the proposed matrix set, named MtPAN(3), was compared to relevant single-matrix models (MtZOA, MtART, MtPAN) under maximum likelihood (ML) and Bayesian inference (BI). While the new model does not solve some of the topological problems frequently encountered in pancrustacean mitogenomic phylogenetic analyses, MtPAN(3) largely outperforms its competitors based on AIC and Bayes factors, indicating a significantly improved fit to the empirical data. The applicability of the new model, and of multiple-matrix models in general, is discussed, and an R/BioPerl script that implements the procedure is provided.
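The site-partitioning idea behind MtPAN(3), clustering alignment columns by physicochemical properties and then fitting a separate matrix to each class, can be sketched in Python. The specific choices here are assumptions, not the published procedure: each column is summarized only by its mean Kyte-Doolittle hydropathy (gaps and unknown characters skipped), and columns are grouped with plain one-dimensional k-means. The resulting labels would define the site classes for which separate matrices are estimated.

```python
from statistics import mean
from typing import List, Tuple

# Kyte-Doolittle hydropathy index.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_hydropathy(alignment: List[str]) -> List[float]:
    """Mean hydropathy of each alignment column, ignoring gaps and unknown characters."""
    profile = []
    for i in range(len(alignment[0])):
        values = [KD[seq[i]] for seq in alignment if seq[i] in KD]
        profile.append(mean(values) if values else 0.0)
    return profile


def kmeans_1d(values: List[float], k: int, iters: int = 100) -> Tuple[List[int], List[float]]:
    """Lloyd's k-means in one dimension, started from evenly spaced quantiles."""
    if not 1 <= k <= len(values):
        raise ValueError("need 1 <= k <= number of values")
    ordered = sorted(values)
    centers = [ordered[(2 * c + 1) * len(ordered) // (2 * k)] for c in range(k)]

    def assign(cs: List[float]) -> List[int]:
        # Label each value with the index of its nearest center.
        return [min(range(k), key=lambda c: abs(v - cs[c])) for v in values]

    for _ in range(iters):
        labels = assign(centers)
        new_centers = [
            mean([v for v, lab in zip(values, labels) if lab == c]) if c in labels else centers[c]
            for c in range(k)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return assign(centers), centers


# Two toy sequences: two hydrophobic columns followed by two charged columns.
labels, centers = kmeans_1d(column_hydropathy(["IVKR", "LIRK"]), k=2)
print(labels)  # [1, 1, 0, 0]
```

With real data one would likely use several properties per site (for example volume and polarity as well) and multidimensional clustering; the single-property version is kept only to show the structure of the procedure.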
Affiliation(s)
- Francesco Nardi: Department of Life Sciences, University of Siena, Via Aldo Moro 2, 53100 Siena, Italy
- Pietro Liò: Computer Laboratory, University of Cambridge, William Gates Building, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
- Antonio Carapelli: Department of Life Sciences, University of Siena, Via Aldo Moro 2, 53100 Siena, Italy
- Francesco Frati: Department of Life Sciences, University of Siena, Via Aldo Moro 2, 53100 Siena, Italy