1
Azouri D, Granit O, Alburquerque M, Mansour Y, Pupko T, Mayrose I. The Tree Reconstruction Game: Phylogenetic Reconstruction Using Reinforcement Learning. Mol Biol Evol 2024; 41:msae105. [PMID: 38829798 PMCID: PMC11180600 DOI: 10.1093/molbev/msae105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2023] [Revised: 05/17/2024] [Accepted: 05/28/2024] [Indexed: 06/05/2024]
Abstract
The computational search for the maximum-likelihood phylogenetic tree is an NP-hard problem. As such, current tree search algorithms might return a tree that is a local optimum rather than the global one. Here, we introduce a paradigm shift for predicting the maximum-likelihood tree, by approximating long-term gains of likelihood rather than maximizing likelihood gain at each step of the search. Our proposed approach harnesses the power of reinforcement learning to learn an optimal search strategy, aiming at the global optimum of the search space. We show that when analyzing empirical data containing dozens of sequences, the log-likelihood improvement from the starting tree obtained by the reinforcement learning-based agent was at least 0.969 of that achieved by current state-of-the-art techniques. Notably, this performance is attained without the need to perform costly likelihood optimizations apart from the training process, thus potentially allowing for an exponential reduction in runtime. We exemplify this for data sets containing 15 sequences of length 18,000 bp and demonstrate that the reinforcement learning-based method is roughly three times faster than the state-of-the-art software. This study illustrates the potential of reinforcement learning in addressing the challenges of phylogenetic tree reconstruction.
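The contrast drawn in this abstract, between maximizing the likelihood gain at each step and approximating long-term gains, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the four "trees" A to D, their log-likelihoods, the neighbor graph, the horizon and the discount factor are invented, and the long-term values are computed exactly by finite-horizon lookahead rather than learned by a reinforcement learning agent as in the paper.

```python
# Toy search space (hypothetical): four candidate trees with made-up log-likelihoods.
LOG_LIK = {"A": -100.0, "B": -95.0, "C": -98.0, "D": -90.0}
# Hypothetical neighborhood: which trees are one rearrangement apart.
NEIGHBORS = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}


def gain(src, dst):
    """Immediate log-likelihood gain of moving from tree src to tree dst."""
    return LOG_LIK[dst] - LOG_LIK[src]


def greedy_search(start):
    """Hill climbing: always take the best immediate gain, stop when none is positive."""
    current = start
    while True:
        best = max(NEIGHBORS[current], key=lambda n: gain(current, n))
        if gain(current, best) <= 0:
            return current
        current = best


def value(state, horizon, gamma):
    """Best discounted cumulative gain reachable from state within horizon moves."""
    if horizon == 0:
        return 0.0
    best_move = max(q_value(state, n, horizon, gamma) for n in NEIGHBORS[state])
    return max(0.0, best_move)  # 0.0 corresponds to stopping here


def q_value(state, nxt, horizon, gamma):
    """Immediate gain of the move plus the discounted value of what follows."""
    return gain(state, nxt) + gamma * value(nxt, horizon - 1, gamma)


def lookahead_search(start, horizon=2, gamma=0.9):
    """Follow the move with the best long-term value instead of the best immediate gain."""
    current = start
    for remaining in range(horizon, 0, -1):
        best = max(NEIGHBORS[current],
                   key=lambda n: q_value(current, n, remaining, gamma))
        if q_value(current, best, remaining, gamma) <= 0:
            break
        current = best
    return current


print(greedy_search("A"))     # stops at the local optimum B
print(lookahead_search("A"))  # accepts the smaller first gain via C and reaches D
```

In this toy graph the greedy search is trapped at B because the only move from B loses likelihood, whereas the lookahead search accepts a smaller first gain to reach the better tree D.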
Affiliation(s)
- Dana Azouri
- School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel
- The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel
- Oz Granit
- Blavatnik School of Computer Science, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel
- Michael Alburquerque
- The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel
- Yishay Mansour
- Blavatnik School of Computer Science, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel
- Itay Mayrose
- School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel
2
Roestel JA, Wiersema JH, Jansen RK, Borsch T, Gruenstaeudl M. On the importance of sequence alignment inspections in plastid phylogenomics - an example from revisiting the relationships of the water-lilies. Cladistics 2024. [PMID: 38761095 DOI: 10.1111/cla.12584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 04/27/2024] [Accepted: 04/29/2024] [Indexed: 05/20/2024]
Abstract
The water-lily clade represents the second earliest-diverging branch of angiosperms. Most of its species belong to Nymphaeaceae, of which the "core Nymphaeaceae" (comprising the genera Euryale, Nymphaea and Victoria) is the most diverse clade. Despite previous molecular phylogenetic studies on the core Nymphaeaceae, various aspects of their evolutionary relationships have remained unresolved. The length-variable introns and intergenic spacers are known to contain most of the sequence variability within the water-lily plastomes. Despite the challenges with multiple sequence alignment, any new molecular phylogenetic investigation on the core Nymphaeaceae should focus on these noncoding plastome regions. For example, a new plastid phylogenomic study on the core Nymphaeaceae should generate DNA sequence alignments of all plastid introns and intergenic spacers based on the principle of conserved sequence motifs. In this investigation, we revisit the phylogenetic history of the core Nymphaeaceae by employing such an approach. Specifically, we use a plastid phylogenomic analysis strategy in which all coding and noncoding partitions are separated and then undergo software-driven DNA sequence alignment, followed by a motif-based alignment inspection and adjustment. This approach allows us to increase the reliability of the character base compared to the default practice of aligning complete plastomes through software algorithms alone. Our approach produces significantly different phylogenetic tree reconstructions for several of the plastome regions under study. The results of these reconstructions underscore that Nymphaea is paraphyletic in its current circumscription, that each of the five subgenera of Nymphaea is monophyletic, and that the subgenus Nymphaea is sister to all other subgenera of Nymphaea. Our results also clarify many evolutionary relationships within the Nymphaea subgenera Brachyceras, Hydrocallis and Nymphaea.
In closing, we discuss whether the phylogenetic reconstructions obtained through our motif-based alignment adjustments are in line with morphological evidence on water-lily evolution.
Affiliation(s)
- Jessica A Roestel
- Institut für Biologie, Systematische Botanik und Pflanzengeographie, Freie Universität Berlin, Berlin, 14195, Germany
- John H Wiersema
- Department of Botany, National Museum of Natural History - Smithsonian Institution, Washington, DC, 37012, USA
- Robert K Jansen
- Department of Integrative Biology, University of Texas at Austin, Austin, TX, 78712, USA
- Thomas Borsch
- Institut für Biologie, Systematische Botanik und Pflanzengeographie, Freie Universität Berlin, Berlin, 14195, Germany
- Botanischer Garten und Botanisches Museum Berlin, Freie Universität Berlin, 14195, Berlin, Germany
- Michael Gruenstaeudl
- Institut für Biologie, Systematische Botanik und Pflanzengeographie, Freie Universität Berlin, Berlin, 14195, Germany
- Department of Biological Sciences, Fort Hays State University, Hays, KS, 67601, USA
3
Gahtori R, Tripathi AH, Chand G, Pande A, Joshi P, Rai RC, Upadhyay SK. Phytochemical Screening of Nyctanthes arbor-tristis Plant Extracts and Their Antioxidant and Antibacterial Activity Analysis. Appl Biochem Biotechnol 2024; 196:436-456. [PMID: 37140779 DOI: 10.1007/s12010-023-04552-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 04/18/2023] [Indexed: 05/05/2023]
Abstract
Nyctanthes arbor-tristis, alias "Vishnu Parijat," is a medicinal plant used to treat various inflammation-associated ailments and to combat innumerable infections in the traditional system of medicine. In the present study, we collected samples of N. arbor-tristis from the lower Himalayan region of Uttarakhand, India, and carried out their molecular identification through DNA barcoding. To examine the antioxidant and antibacterial activities, we prepared ethanolic and aqueous extracts (from flowers and leaves) and performed their phytochemical analysis using different qualitative and quantitative approaches. The phytoextracts showed marked antioxidant potential, as revealed by a comprehensive set of assays. The ethanolic leaf extract showed marked antioxidant potential towards DPPH, ABTS, and NO scavenging (IC50 = 30.75 ± 0.006, 30.83 ± 0.002, and 51.23 ± 0.009 μg/mL, respectively). We used a TLC-bioautography assay to characterize different antioxidant constituents (based on their Rf values) in the chromatograms run under different mobile phases. For one of the prominent antioxidant spots in TLC bioautography, GC-MS analysis identified cis-9-hexadecenal and n-hexadecanoic acid as the major constituents. Furthermore, in the antibacterial study, the ethanolic leaf extract showed marked activity against Aeromonas salmonicida (113.40 mg/mL of extract was equivalent to 100 μg/mL of kanamycin). In contrast, the ethanolic flower extract showed considerable antibacterial activity against Pseudomonas aeruginosa (125.85 mg/mL of extract ≡ 100 μg/mL of kanamycin). This study presents the phylogenetic account and unravels the antioxidant-related properties and antibacterial potential of N. arbor-tristis.
Affiliation(s)
- Rekha Gahtori
- Department of Biotechnology, Kumaun University, Bhimtal Campus, Nainital, Uttarakhand, 263136, India
- Ankita H Tripathi
- Department of Biotechnology, Kumaun University, Bhimtal Campus, Nainital, Uttarakhand, 263136, India
- Garima Chand
- Department of Chemistry, Kumaun University, DSB Campus, Nainital, Uttarakhand, 263001, India
- Amit Pande
- ICAR-Directorate Coldwater Fisheries Research, Bhimtal, Uttarakhand, 263136, India
- Penny Joshi
- Department of Chemistry, Kumaun University, DSB Campus, Nainital, Uttarakhand, 263001, India
- Ramesh Chandra Rai
- Translational Health Science and Technology Institute (THSTI), Faridabad, Haryana, 121001, India
- Santosh K Upadhyay
- Department of Biotechnology, Kumaun University, Bhimtal Campus, Nainital, Uttarakhand, 263136, India
4
Bowman J, Enard D, Lynch VJ. Phylogenomics reveals an almost perfect polytomy among the almost ungulates (Paenungulata). bioRxiv 2023:2023.12.07.570590. [PMID: 38106080 PMCID: PMC10723481 DOI: 10.1101/2023.12.07.570590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
Phylogenetic studies have resolved most relationships among Eutherian Orders. However, the branching order of elephants (Proboscidea), hyraxes (Hyracoidea), and sea cows (Sirenia) (i.e., the Paenungulata) has remained uncertain since at least 1758, when Linnaeus grouped elephants and manatees into a single Order (Bruta) to the exclusion of hyraxes. Subsequent morphological, molecular, and large-scale phylogenomic datasets have reached conflicting conclusions on the branching order within Paenungulates. We use a phylogenomic dataset of alignments from 13,388 protein-coding genes across 261 Eutherian mammals to infer phylogenetic relationships within Paenungulates. We find that gene trees almost equally support the three alternative resolutions of Paenungulate relationships and that despite strong support for a Proboscidea+Hyracoidea split in the multispecies coalescent (MSC) tree, there is significant evidence for gene tree uncertainty, incomplete lineage sorting, and introgression among Proboscidea, Hyracoidea, and Sirenia. Indeed, only 8-10% of genes have statistically significant phylogenetic signal to reject the hypothesis of a Paenungulate polytomy. These data indicate little support for any resolution of the branching order of Proboscidea, Hyracoidea, and Sirenia within Paenungulata and suggest that Paenungulata may be as close to a real, or at least unresolvable, polytomy as possible.
Affiliation(s)
- Jacob Bowman
- Department of Biological Sciences, University at Buffalo, SUNY, 551 Cooke Hall, Buffalo, NY, USA
- David Enard
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Vincent J. Lynch
- Department of Biological Sciences, University at Buffalo, SUNY, 551 Cooke Hall, Buffalo, NY, USA
5
Steenwyk JL, Li Y, Zhou X, Shen XX, Rokas A. Incongruence in the phylogenomics era. Nat Rev Genet 2023; 24:834-850. [PMID: 37369847 DOI: 10.1038/s41576-023-00620-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Accepted: 05/19/2023] [Indexed: 06/29/2023]
Abstract
Genome-scale data and the development of novel statistical phylogenetic approaches have greatly aided the reconstruction of a broad sketch of the tree of life and resolved many of its branches. However, incongruence (the inference of conflicting evolutionary histories) remains pervasive in phylogenomic data, hampering our ability to reconstruct and interpret the tree of life. Biological factors, such as incomplete lineage sorting, horizontal gene transfer, hybridization, introgression, recombination and convergent molecular evolution, can lead to gene phylogenies that differ from the species tree. In addition, analytical factors, including stochastic, systematic and treatment errors, can drive incongruence. Here, we review these factors, discuss methodological advances to identify and handle incongruence, and highlight avenues for future research.
Affiliation(s)
- Jacob L Steenwyk
- Howard Hughes Medical Institute and the Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN, USA
- Yuanning Li
- Institute of Marine Science and Technology, Shandong University, Qingdao, China
- Xiaofan Zhou
- Guangdong Laboratory for Lingnan Modern Agriculture, Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou, China
- Xing-Xing Shen
- Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN, USA
- Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
6
Bastolla U, Abia D, Piette O. PC_ali: a tool for improved multiple alignments and evolutionary inference based on a hybrid protein sequence and structure similarity score. Bioinformatics 2023; 39:btad630. [PMID: 37847775 PMCID: PMC10628387 DOI: 10.1093/bioinformatics/btad630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Revised: 08/01/2023] [Accepted: 10/17/2023] [Indexed: 10/19/2023]
Abstract
MOTIVATION: Evolutionary inference depends crucially on the quality of multiple sequence alignments (MSA), which is problematic for distantly related proteins. Since protein structure is more conserved than sequence, it seems natural to use structure alignments for distant homologs. However, structure alignments may not be suitable for inferring evolutionary relationships. RESULTS: Here we examined four protein similarity measures that depend on sequence and structure (fraction of aligned residues, sequence identity, fraction of superimposed residues, and contact overlap), finding that they are intimately correlated but none of them provides a complete and unbiased picture of conservation in proteins. Therefore, we propose the new hybrid protein sequence and structure similarity score PC_sim based on their main principal component. The corresponding divergence measure PC_div shows the strongest correlation with divergences obtained from individual similarities, suggesting that it infers accurate evolutionary divergences. We developed the program PC_ali that constructs protein MSAs either de novo or modifying an input MSA, using a similarity matrix based on PC_sim. The program constructs a starting MSA based on the maximal cliques of the graph of pairwise alignments (PAs) and refines it through progressive alignments along the tree reconstructed with PC_div. Compared with eight state-of-the-art multiple structure or sequence alignment tools, PC_ali achieves higher or equal aligned fraction and structural scores, sequence identity higher than structure aligners although lower than sequence aligners, the highest PC_sim score, and the highest similarity with the MSAs produced by other tools and with the reference MSA Balibase. AVAILABILITY AND IMPLEMENTATION: https://github.com/ugobas/PC_ali.
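The idea of building one hybrid score from the main principal component of four correlated similarity measures can be sketched as follows. This is a minimal illustration, not the PC_ali implementation: the synthetic data, the standardization step, the power-iteration solver and all function names are assumptions, and the abstract does not state how PC_sim is normalized.

```python
import math
import random


def standardize(rows):
    """Center each column and scale it to unit variance."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
            for j in range(m)]
    return [[(r[j] - means[j]) / stds[j] for j in range(m)] for r in rows]


def first_principal_component(z, iterations=200):
    """Dominant eigenvector of the correlation matrix, found by power iteration."""
    n, m = len(z), len(z[0])
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(m)] for i in range(m)]
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def hybrid_scores(rows):
    """Project each protein pair onto the first principal component."""
    z = standardize(rows)
    loadings = first_principal_component(z)
    scores = [sum(a * b for a, b in zip(r, loadings)) for r in z]
    return scores, loadings


# Synthetic stand-in for four correlated similarity measures per protein pair
# (aligned fraction, sequence identity, superimposed fraction, contact overlap).
rng = random.Random(0)
toy = []
for _ in range(50):
    shared = rng.uniform(0.2, 0.9)
    toy.append([shared + rng.gauss(0.0, 0.03) for _ in range(4)])

scores, loadings = hybrid_scores(toy)
print([round(x, 2) for x in loadings])
```

Because the four synthetic measures are strongly correlated, the loadings come out nearly equal, so the combined score behaves like an average of the standardized measures. With real data the loadings would reflect how much each measure contributes to the shared signal.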
Affiliation(s)
- Ugo Bastolla
- Centro de Biologia Molecular “Severo Ochoa” (CBMSO), CSIC-UAM Cantoblanco, 28049 Madrid, Spain
- David Abia
- Bioinformatics Facility CBMSO, CSIC-UAM Cantoblanco, 28049 Madrid, Spain
- Oscar Piette
- Centro de Biologia Molecular “Severo Ochoa” (CBMSO), CSIC-UAM Cantoblanco, 28049 Madrid, Spain
7
Streicher MB, Johnson SD, Willows‐Munro S. Effect of fuchsin fixation of pollen on DNA barcode recovery. Ecol Evol 2023; 13:e10475. [PMID: 37664513 PMCID: PMC10468989 DOI: 10.1002/ece3.10475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Revised: 07/12/2023] [Accepted: 07/21/2023] [Indexed: 09/05/2023]
Abstract
Pollen grains attached to insects are a valuable source of ecological information which can be used to reconstruct visitation networks. Morphological pollen identification relies on light microscopy with pollen usually stained and mounted in fuchsin jelly, which is also used to remove pollen from the bodies of insects. Pollen embedded in fuchsin jelly could potentially be used for DNA barcoding and metabarcoding (large-scale taxonomic identification of complex mixed samples) and thus provide additional information for pollination networks. In this study, we determine whether fuchsin-embedded pollen can be used for downstream molecular applications. We evaluate the quality of plant barcode (ITS) sequences amplified from DNA extracted from both fresh (untreated) pollen, and pollen which had been embedded in fuchsin jelly. We show that the addition of fuchsin to DNA extraction does not impact DNA barcode sequence quality during short-term storage. DNA extractions from both untreated and fuchsin-treated pollen produced reliable barcode sequences of high quality. Our findings suggest that pollen which has been collected, stained, and embedded in fuchsin jelly for preliminary microscopy work can be used within several days for downstream genetic analysis, though the quality of DNA from pollen stored in fuchsin jelly for extended periods is yet to be established.
Affiliation(s)
- Melanie B. Streicher
- Centre for Functional Biodiversity, School of Life Sciences, University of KwaZulu‐Natal, Scottsville, South Africa
- Steven D. Johnson
- Centre for Functional Biodiversity, School of Life Sciences, University of KwaZulu‐Natal, Scottsville, South Africa
8
Vargas-Castro I, Crespo-Picazo JL, Jiménez Martínez MÁ, Marco-Cabedo V, Muñoz-Baquero M, García-Párraga D, Sánchez-Vizcaíno JM. First description of a lesion in the upper digestive mucosa associated with a novel gammaherpesvirus in a striped dolphin (Stenella coeruleoalba) stranded in the Western Mediterranean Sea. BMC Vet Res 2023; 19:118. [PMID: 37563731 PMCID: PMC10413511 DOI: 10.1186/s12917-023-03677-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Accepted: 07/25/2023] [Indexed: 08/12/2023]
Abstract
BACKGROUND: A wide variety of lesions have been associated with herpesvirus in cetaceans. However, descriptions of herpesvirus infections in the digestive system of cetaceans are scarce. CASE REPORT: A young female striped dolphin stranded in the Valencian Community (Spain) on 6 August 2021. The animal showed external macroscopic lesions suggestive of an aggressive interaction with bottlenose dolphins (rake marks in the epidermis). Internally, the main findings included congestion of the central nervous system and multiple, well-defined, whitish, irregularly shaped, proliferative lesions on the oropharyngeal and laryngopharyngeal mucosa. Histopathology revealed lymphoplasmacytic and histiocytic meningoencephalitis, consistent with neurobrucellosis. The oropharyngeal and laryngopharyngeal plaques were comprised histologically of focally extensive epithelial hyperplasia. As part of the health surveillance program, tissue samples were tested for cetacean morbillivirus using a real-time reverse transcription-PCR, for Brucella spp. using a real-time PCR, and for herpesvirus using a conventional nested PCR. All samples were negative for cetacean morbillivirus; molecular positivity for Brucella spp. was obtained in pharyngeal tonsils and cerebrospinal fluid; herpesvirus was detected in a proliferative lesion in the upper digestive mucosa. Phylogenetic analysis showed that the herpesvirus sequence was included in the Gammaherpesvirinae subfamily. This novel sequence showed the greatest identity with other herpesvirus sequences detected in skin, pharyngeal and genital lesions in five different species. CONCLUSIONS: To the best of the authors' knowledge, this is the first report of a proliferative lesion in the upper digestive mucosa associated with gammaherpesvirus positivity in a striped dolphin (Stenella coeruleoalba).
Affiliation(s)
- Ignacio Vargas-Castro
- VISAVET Center and Animal Health Department, Veterinary School, Complutense University of Madrid, Madrid, 28040, Spain
- José Luis Crespo-Picazo
- Research Department, Fundación Oceanogràfic de la Comunidad Valenciana, 46013, Valencia, Spain
- Mª Ángeles Jiménez Martínez
- Department of Animal Medicine and Surgery, Veterinary Faculty, Complutense University of Madrid, Madrid, 28040, Spain
- Vicente Marco-Cabedo
- Research Department, Fundación Oceanogràfic de la Comunidad Valenciana, 46013, Valencia, Spain
- Marta Muñoz-Baquero
- Research Department, Fundación Oceanogràfic de la Comunidad Valenciana, 46013, Valencia, Spain
- Daniel García-Párraga
- Research Department, Fundación Oceanogràfic de la Comunidad Valenciana, 46013, Valencia, Spain
- Biology Department, Oceanogràfic, Ciudad de las Artes y las Ciencias, 46013, Valencia, Spain
- José Manuel Sánchez-Vizcaíno
- VISAVET Center and Animal Health Department, Veterinary School, Complutense University of Madrid, Madrid, 28040, Spain
9
Vargas-Castro I, Peletto S, Mattioda V, Goria M, Serracca L, Varello K, Sánchez-Vizcaíno JM, Puleio R, Nocera FD, Lucifora G, Acutis P, Casalone C, Grattarola C, Giorda F. Epidemiological and genetic analysis of Cetacean Morbillivirus circulating on the Italian coast between 2018 and 2021. Front Vet Sci 2023; 10:1216838. [PMID: 37583469 PMCID: PMC10424449 DOI: 10.3389/fvets.2023.1216838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 07/04/2023] [Indexed: 08/17/2023]
Abstract
Cetacean morbillivirus (CeMV) has caused several outbreaks, unusual mortality events, and interepidemic single-lethal disease episodes in the Mediterranean Sea. Since 2012, a new strain with a northeast (NE) Atlantic origin has been circulating among Mediterranean cetaceans, causing numerous deaths. The objective of this study was to determine the prevalence of CeMV in cetaceans stranded in Italy between 2018 and 2021 and characterize the strain of CeMV circulating. Out of the 354 stranded cetaceans along the Italian coastlines, 113 were CeMV-positive. This prevalence (31.9%) is one of the highest reported without an associated outbreak. All marine sectors along the Italian coastlines, except for the northern Adriatic coast, reported a positive molecular diagnosis of CeMV. In one-third of the CeMV-positive cetaceans submitted to a histological evaluation, a chronic form of the infection (detectable viral antigen, the absence of associated lesions, and concomitant coinfections) was suspected. Tissues from 24 animals were used to characterize the strain, obtaining 57 sequences from phosphoprotein, nucleocapsid, and fusion protein genes, which were submitted to GenBank. Our sequences showed the highest identity with NE-Atlantic strain sequences, and in the phylogenetic study, they clustered together with them. Regarding age and species, most of these individuals were adults (17/24, 70.83%) and striped dolphins (19/24, 79.16%). This study improves our understanding of the NE-Atlantic CeMV strain in Italian waters, supporting the hypothesis of an endemic circulation of the virus in this area; however, additional studies are necessary to better understand the epidemiology of this strain in the Mediterranean Sea.
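The proportions quoted in this abstract can be re-derived from the counts it gives. The short sketch below does that arithmetic and, purely as an illustration, adds a Wilson 95% interval around the prevalence. The interval is not reported in the abstract, and the function names are hypothetical.

```python
import math


def percent(count, total, digits=2):
    """Proportion expressed as a rounded percentage."""
    return round(100.0 * count / total, digits)


def wilson_interval(count, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = count / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


prevalence = percent(113, 354, digits=1)   # CeMV-positive among stranded cetaceans
adults = percent(17, 24)                   # adults among the 24 characterized animals
striped = percent(19, 24)                  # striped dolphins among the same 24
low, high = wilson_interval(113, 354)

print(prevalence, adults, striped)
print(round(100 * low, 1), round(100 * high, 1))
```

The first two values match the abstract (31.9% and 70.83%). The third, 19/24, rounds to 79.17%, so the 79.16% in the abstract is consistent with truncation rather than rounding.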
Affiliation(s)
- Ignacio Vargas-Castro
- VISAVET Center and Animal Health Department, Veterinary School, Complutense University of Madrid, Madrid, Spain
- Simone Peletto
- Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta - WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Virginia Mattioda
- Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta - WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Maria Goria
- Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta - WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Laura Serracca
- Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta - WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Katia Varello
- Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta - WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Roberto Puleio
- Istituto Zooprofilattico Sperimentale della Sicilia, Palermo, Italy
- Fabio Di Nocera
- Istituto Zooprofilattico Sperimentale del Mezzogiorno, Naples, Italy
- Giuseppe Lucifora
- Istituto Zooprofilattico Sperimentale del Mezzogiorno, Naples, Italy
- Pierluigi Acutis
- Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta - WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Cristina Casalone
- Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta - WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Carla Grattarola
- Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta - WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Federica Giorda
- Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta - WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
10
Xiang C, Gao F, Jakovlić I, Lei H, Hu Y, Zhang H, Zou H, Wang G, Zhang D. Using PhyloSuite for molecular phylogeny and tree-based analyses. iMeta 2023; 2:e87. [PMID: 38868339 PMCID: PMC10989932 DOI: 10.1002/imt2.87] [Citation(s) in RCA: 49] [Impact Index Per Article: 49.0] [Received: 11/18/2022] [Revised: 01/04/2023] [Accepted: 01/15/2023] [Indexed: 06/14/2024]
Abstract
Phylogenetic analysis has entered the genomics (multilocus) era. For less experienced researchers, conquering the large number of software programs required for a multilocus-based phylogenetic reconstruction can be somewhat daunting and time-consuming. PhyloSuite, a software package with a user-friendly GUI, was designed to make this process more accessible by integrating multiple software programs needed for multilocus and single-gene phylogenies and further streamlining the whole process. In this protocol, we aim to explain how to conduct each step of the phylogenetic pipeline and tree-based analyses in PhyloSuite. We also present a new version of PhyloSuite (v1.2.3), wherein we fixed some bugs, made some optimizations, and introduced some new functions, including a number of tree-based analyses, such as signal-to-noise calculation, saturation analysis, spurious species identification, etc. The step-by-step protocol includes background information (i.e., what the step does), reasons (i.e., why the step is done), and operations (i.e., how to do it). This protocol will help researchers quick-start their way through the multilocus phylogenetic analysis, especially those interested in conducting organelle-based analyses.
Affiliation(s)
- Chuan‐Yu Xiang
- State Key Laboratory of Grassland Agro‐Ecosystems, and College of Ecology, Lanzhou University, Lanzhou, China
- Fangluan Gao
- Institute of Plant Virology, Fujian Agriculture and Forestry University, Fuzhou, China
- Ivan Jakovlić
- State Key Laboratory of Grassland Agro‐Ecosystems, and College of Ecology, Lanzhou University, Lanzhou, China
- Hong‐Peng Lei
- State Key Laboratory of Grassland Agro‐Ecosystems, and College of Ecology, Lanzhou University, Lanzhou, China
- Ye Hu
- State Key Laboratory of Grassland Agro‐Ecosystems, and College of Ecology, Lanzhou University, Lanzhou, China
- Hong Zhang
- State Key Laboratory of Grassland Agro‐Ecosystems, and College of Ecology, Lanzhou University, Lanzhou, China
- Hong Zou
- Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture, and State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- Gui‐Tang Wang
- Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture, and State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- Dong Zhang
- State Key Laboratory of Grassland Agro‐Ecosystems, and College of Ecology, Lanzhou University, Lanzhou, China
11
Combrink L, Humphreys IR, Washburn Q, Arnold HK, Stagaman K, Kasschau KD, Jolles AE, Beechler BR, Sharpton TJ. Best practice for wildlife gut microbiome research: A comprehensive review of methodology for 16S rRNA gene investigations. Front Microbiol 2023; 14:1092216. [PMID: 36910202 PMCID: PMC9992432 DOI: 10.3389/fmicb.2023.1092216] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/20/2022] [Accepted: 01/18/2023] [Indexed: 02/24/2023]
Abstract
Extensive research in well-studied animal models underscores the importance of commensal gastrointestinal (gut) microbes to animal physiology. Gut microbes have been shown to impact dietary digestion, mediate infection, and even modify behavior and cognition. Given the large physiological and pathophysiological contribution microbes provide their host, it is reasonable to assume that the vertebrate gut microbiome may also impact the fitness, health and ecology of wildlife. In accordance with this expectation, an increasing number of investigations have considered the role of the gut microbiome in wildlife ecology, health, and conservation. To help promote the development of this nascent field, we need to dissolve the technical barriers prohibitive to performing wildlife microbiome research. The present review discusses the 16S rRNA gene microbiome research landscape, clarifying best practices in microbiome data generation and analysis, with particular emphasis on unique situations that arise during wildlife investigations. Special consideration is given to topics relevant for microbiome wildlife research from sample collection to molecular techniques for data generation, to data analysis strategies. Our hope is that this article not only calls for greater integration of microbiome analyses into wildlife ecology and health studies but provides researchers with the technical framework needed to successfully conduct such investigations.
Affiliation(s)
- Leigh Combrink
- Department of Microbiology, Oregon State University, Corvallis, OR, United States; Department of Biomedical Sciences, Carlson College of Veterinary Medicine, Oregon State University, Corvallis, OR, United States; School of Natural Resources and the Environment, University of Arizona, Tucson, AZ, United States
- Ian R Humphreys
- Department of Microbiology, Oregon State University, Corvallis, OR, United States
- Quinn Washburn
- Department of Microbiology, Oregon State University, Corvallis, OR, United States
- Holly K Arnold
- Department of Microbiology, Oregon State University, Corvallis, OR, United States; Department of Biomedical Sciences, Carlson College of Veterinary Medicine, Oregon State University, Corvallis, OR, United States
- Keaton Stagaman
- Department of Microbiology, Oregon State University, Corvallis, OR, United States
- Kristin D Kasschau
- Department of Microbiology, Oregon State University, Corvallis, OR, United States
- Anna E Jolles
- Department of Biomedical Sciences, Carlson College of Veterinary Medicine, Oregon State University, Corvallis, OR, United States; Department of Integrative Biology, Oregon State University, Corvallis, OR, United States
- Brianna R Beechler
- Department of Biomedical Sciences, Carlson College of Veterinary Medicine, Oregon State University, Corvallis, OR, United States
- Thomas J Sharpton
- Department of Microbiology, Oregon State University, Corvallis, OR, United States; Department of Statistics, Oregon State University, Corvallis, OR, United States
12
Kuang M, Zhang Y, Lam TW, Ting HF. MLProbs: A Data-Centric Pipeline for Better Multiple Sequence Alignment. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:524-533. [PMID: 35120007 DOI: 10.1109/tcbb.2022.3148382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
In this paper, we explore using the data-centric approach to tackle the Multiple Sequence Alignment (MSA) construction problem. Unlike the algorithm-centric approach, which reduces the construction problem to a combinatorial optimization problem based on an abstract mathematical model, the data-centric approach explores using classification models trained from existing benchmark data to guide the construction. We identified two simple classifications to help us choose a better alignment tool and determine whether and how much to carry out realignment. We show that shallow machine-learning algorithms suffice to train sensitive models for these classifications. Based on these models, we implemented a new multiple sequence alignment pipeline, called MLProbs. Compared with 10 other popular alignment tools over four benchmark databases (namely, BAliBASE, OXBench, OXBench-X and SABMark), MLProbs consistently gives the highest TC score. More importantly, MLProbs shows non-trivial improvement for protein families with low similarity; in particular, when evaluated against the 1,356 protein families with similarity ≤ 50%, MLProbs achieves a TC score of 56.93, while the next best three tools are in the range of [55.41, 55.91] (increased by more than 1.8%). We also compared the performance of MLProbs and other MSA tools in two real-life applications - Phylogenetic Tree Construction Analysis and Protein Secondary Structure Prediction - and MLProbs also had the best performance. In our study, we used only shallow machine-learning algorithms to train our models. It would be interesting to study whether deep-learning methods can help make further improvements, so we suggest some possible research directions in the conclusion section.
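The "more than 1.8%" figure quoted above reads as a relative improvement over the best competing TC score. A minimal arithmetic check, assuming the range [55.41, 55.91] bounds the next three tools and that the percentage is measured relative to each competitor's score (the function and variable names are illustrative, not from the paper):

```python
def relative_gain_percent(new_score: float, baseline: float) -> float:
    """Relative improvement of new_score over baseline, in percent."""
    return (new_score - baseline) / baseline * 100.0


# Values quoted in the abstract for the 1,356 low-similarity families.
mlprobs_tc = 56.93
next_best_range = (55.41, 55.91)  # lowest and highest of the next three tools

# Gain over each end of the range; the smallest gain is against the best competitor.
gains = [relative_gain_percent(mlprobs_tc, b) for b in next_best_range]
smallest_gain = min(gains)

print(f"smallest relative gain: {smallest_gain:.2f}%")
```

The smallest gain is the one measured against the upper end of the range, which is the figure the abstract's lower bound corresponds to.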
13
The Quality of Sequence Data Affects Biodiversity and Conservation Perspectives in the Neotropical Damselfly Megaloprepus caerulatus. Diversity 2022. [DOI: 10.3390/d14121056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
Abstract
Ideally, the footprint of the evolutionary history of a species is drawn from integrative studies including quantitative and qualitative taxonomy, biogeography, ecology, and molecular genetics. In today's research, species delimitation and the identification of conservation units are often accompanied by a set of at least two sequence markers appropriate for the systematic level under investigation. Two such studies re-evaluated the species status in the world's largest odonate, the Neotropical damselfly Megaloprepus caerulatus. The species status of the genus Megaloprepus has long been debated. Despite applying a highly similar set of sequence markers, the two studies reached different conclusions concerning species status and population genetic relationships. In this study, we took the unique opportunity to compare the two datasets and analyzed the reasons for those incongruences. The two DNA sequence markers used (16S rDNA and CO1) were re-aligned using a strict conservative approach and the analyses used in both studies were repeated. Going step by step back to the first line of data handling, we show that a high number of unresolved characters in the sequence alignments as well as internal gaps are responsible for the different outcomes in terms of species delimitations and population genetic relationships. Overall, this study shows that high quality raw sequence data are an indispensable requirement, not only in odonate research.
14
Balaban M, Bristy NA, Faisal A, Bayzid MS, Mirarab S. Genome-wide alignment-free phylogenetic distance estimation under a no strand-bias model. Bioinformatics Advances 2022; 2:vbac055. [PMID: 35992043 PMCID: PMC9383262 DOI: 10.1093/bioadv/vbac055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/21/2022] [Accepted: 08/09/2022] [Indexed: 01/27/2023]
Abstract
While alignment has been the dominant approach for determining homology prior to phylogenetic inference, alignment-free methods can simplify the analysis, especially when analyzing genome-wide data. Furthermore, alignment-free methods present the only option for emerging forms of data, such as genome skims, which do not permit assembly. Despite the appeal, alignment-free methods have not been competitive with alignment-based methods in terms of accuracy. One limitation of alignment-free methods is their reliance on simplified models of sequence evolution such as Jukes-Cantor. If we can estimate frequencies of base substitutions in an alignment-free setting, we can compute pairwise distances under more complex models. However, since the strand of DNA sequences is unknown for many forms of genome-wide data, which arguably present the best use case for alignment-free methods, the most complex models that one can use are the so-called no strand-bias models. We show how to calculate distances under a four-parameter no strand-bias model called TK4 without relying on alignments or assemblies. The main idea is to replace letters in the input sequences and recompute Jaccard indices between k-mer sets. However, on larger genomes, we also need to compute the number of k-mer mismatches after replacement due to random chance as opposed to homology. We show in simulation that alignment-free distances can be highly accurate when genomes evolve under the assumed models and study the accuracy on assembled and unassembled biological data. Availability and implementation: Our software is available open source at https://github.com/nishatbristy007/NSB. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
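The building block named in the abstract, a Jaccard index between k-mer sets recomputed after replacing letters, can be sketched as follows. This is only an illustration of that one step, not the TK4 distance estimator or the authors' software: the toy sequences, the choice of k, and the letter merge in `merge_ag` are all hypothetical.

```python
def kmer_set(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def replace_letters(seq: str, mapping: dict[str, str]) -> str:
    """Rewrite seq by substituting single letters according to mapping."""
    return seq.upper().translate(str.maketrans(mapping))


# Toy inputs (hypothetical); real genome skims would use much larger k.
seq_x = "ACGTACGTGA"
seq_y = "ACGTTCGTGA"
k = 3

j_raw = jaccard(kmer_set(seq_x, k), kmer_set(seq_y, k))

# Hypothetical replacement: collapse G into A, then recompute the index.
merge_ag = {"G": "A"}
j_merged = jaccard(
    kmer_set(replace_letters(seq_x, merge_ag), k),
    kmer_set(replace_letters(seq_y, merge_ag), k),
)

print(f"Jaccard before replacement: {j_raw:.3f}")
print(f"Jaccard after replacement:  {j_merged:.3f}")
```

Comparing the index before and after a replacement is what carries information about which substitution types separate the two sequences; how those indices are turned into model parameters is described in the paper and is not reproduced here.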
Affiliation(s)
- Ahnaf Faisal
- Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
- Md Shamsuzzoha Bayzid
- Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
15
Shen C, Park M, Warnow T. WITCH: Improved Multiple Sequence Alignment Through Weighted Consensus Hidden Markov Model Alignment. J Comput Biol 2022; 29:782-801. [PMID: 35575747 DOI: 10.1089/cmb.2021.0585] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/19/2023] Open
Abstract
Accurate multiple sequence alignment is challenging on many data sets, including those that are large, evolve under high rates of evolution, or have sequence length heterogeneity. While substantial progress has been made over the last decade in addressing the first two challenges, sequence length heterogeneity remains a significant issue for many data sets. Sequence length heterogeneity occurs for biological and technological reasons, including large insertions or deletions (indels) that occurred in the evolutionary history relating the sequences, or the inclusion of sequences that are not fully assembled. Ultra-large alignments using Phylogeny-Aware Profiles (UPP) (Nguyen et al. 2015) is one of the most accurate approaches for aligning data sets that exhibit sequence length heterogeneity: it constructs an alignment on the subset of sequences it considers "full-length," represents this "backbone alignment" using an ensemble of hidden Markov models (HMMs), and then adds each remaining sequence into the backbone alignment based on an HMM selected for that sequence from the ensemble. Our new method, WeIghTed Consensus Hmm alignment (WITCH), improves on UPP in three important ways: first, it uses a statistically principled technique to weight and rank the HMMs; second, it uses k>1 HMMs from the ensemble rather than a single HMM; and third, it combines the alignments for each of the selected HMMs using a consensus algorithm that takes the weights into account. We show that this approach provides improved alignment accuracy compared with UPP and other leading alignment methods, as well as improved accuracy for maximum likelihood trees based on these alignments.
Affiliation(s)
- Chengze Shen
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Minhyuk Park
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
16
Forni D, Cagliani R, Molteni C, Arrigoni F, Mozzi A, Clerici M, De Gioia L, Sironi M. Homology-based classification of accessory proteins in coronavirus genomes uncovers extremely dynamic evolution of gene content. Mol Ecol 2022; 31:3672-3692. [PMID: 35575901 PMCID: PMC9328142 DOI: 10.1111/mec.16531] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/19/2022] [Revised: 04/21/2022] [Accepted: 05/12/2022] [Indexed: 11/30/2022]
Abstract
Coronaviruses (CoVs) have complex genomes that encode a fixed array of structural and nonstructural components, as well as a variety of accessory proteins that differ even among closely related viruses. Accessory proteins often play a role in the suppression of immune responses and may represent virulence factors. Despite their relevance for CoV phenotypic variability, information on accessory proteins is fragmentary. We applied a systematic approach based on homology detection to create a comprehensive catalogue of accessory proteins encoded by CoVs. Our analyses grouped accessory proteins into 379 orthogroups and 12 super‐groups. No orthogroup was shared by the four CoV genera and very few were present in all or most viruses in the same genus, reflecting the dynamic evolution of CoV genomes. We observed differences in the distribution of accessory proteins in CoV genera. Alphacoronaviruses harboured the largest diversity of accessory open reading frames (ORFs), deltacoronaviruses the smallest. However, the average number of accessory proteins per genome was highest in betacoronaviruses. Analysis of the evolutionary history of some orthogroups indicated that the different CoV genera adopted similar evolutionary strategies. Thus, alphacoronaviruses and betacoronaviruses acquired phosphodiesterases and spike‐like accessory proteins independently, whereas horizontal gene transfer from reoviruses endowed betacoronaviruses and deltacoronaviruses with fusion‐associated small transmembrane (FAST) proteins. Finally, analysis of accessory ORFs in annotated CoV genomes indicated ambiguity in their naming. This complicates cross‐communication among researchers and hinders automated searches of large data sets (e.g., PubMed, GenBank). We suggest that orthogroup membership is used together with a naming system to provide information on protein function.
Affiliation(s)
- Diego Forni
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
- Rachele Cagliani
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
- Cristian Molteni
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
- Federica Arrigoni
- Department of Biotechnology and Biosciences, University of Milan-Bicocca, Milan, Italy
- Alessandra Mozzi
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
- Mario Clerici
- Department of Physiopathology and Transplantation, University of Milan, Milan, Italy; Don C. Gnocchi Foundation ONLUS, IRCCS, Milan, Italy
- Luca De Gioia
- Department of Biotechnology and Biosciences, University of Milan-Bicocca, Milan, Italy
- Manuela Sironi
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
17
Aledo JC. Phylogenies from unaligned proteomes using sequence environments of amino acid residues. Sci Rep 2022; 12:7497. [PMID: 35523825 PMCID: PMC9076898 DOI: 10.1038/s41598-022-11370-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2022] [Accepted: 04/21/2022] [Indexed: 11/09/2022] Open
Abstract
Alignment-free methods for sequence comparison and phylogeny inference have attracted a great deal of attention in recent years. Several algorithms have been implemented in diverse software packages. Despite the great number of existing methods, most of them are based on word statistics. Although they propose different filtering and weighting strategies and explore different metrics, their performance may be limited by the phylogenetic signal preserved in these words. Herein, we present a different approach based on the species-specific amino acid neighborhood preferences. These differential preferences can be assessed in the context of vector spaces. In this way, a distance-based method to build phylogenies has been developed and implemented into an easy-to-use R package. Tests run on real-world datasets show that this method can reconstruct phylogenetic relationships with high accuracy, and often outperforms other alignment-free approaches. Furthermore, we present evidence that the new method can perform reliably on datasets formed by non-orthologous protein sequences, that is, the method not only does not require the identification of orthologous proteins, but also does not require their presence in the analyzed dataset. These results suggest that the neighborhood preference of amino acids conveys a phylogenetic signal that may be of great utility in phylogenomics.
Affiliation(s)
- Juan Carlos Aledo
- Department of Molecular Biology and Biochemistry, University of Málaga, 29071, Málaga, Spain
18
Genomic Fishing and Data Processing for Molecular Evolution Research. Methods Protoc 2022; 5:mps5020026. [PMID: 35314663 PMCID: PMC8938851 DOI: 10.3390/mps5020026] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/31/2022] [Revised: 03/01/2022] [Accepted: 03/04/2022] [Indexed: 11/19/2022] Open
Abstract
Molecular evolution analyses, such as detection of adaptive/purifying selection or ancestral protein reconstruction, typically require three inputs for a target gene (or gene family) in a particular group of organisms: sequence alignment, model of evolution, and phylogenetic tree. While modern advances in high-throughput sequencing techniques have led to rapid accumulation of genomic-scale data in public repositories and databases, mining such a vast amount of information often remains a challenging enterprise. Here, we describe a comprehensive, versatile workflow aimed at the preparation of genome-extracted datasets readily available for molecular evolution research. The workflow involves: (1) fishing (searching and capturing) specific gene sequences of interest from taxonomically diverse genomic data available in databases at variable levels of annotation, (2) processing and depuration of retrieved sequences, (3) production of a multiple sequence alignment, (4) selection of the best-fit model of evolution, and (5) solid reconstruction of a phylogenetic tree.
19
Özçelik E, Kuru N, Adebali O. Phylostat: a web-based tool to analyze paralogous clade divergence in phylogenetic trees. Turk J Biol 2022; 45:667-673. [PMID: 35068947 PMCID: PMC8733950 DOI: 10.3906/biy-2105-18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2021] [Accepted: 09/29/2021] [Indexed: 11/03/2022] Open
Abstract
Phylogenetic trees are useful tools to infer evolutionary relationships between genetic entities. Phylogenetics enables not only evolution-based gene clustering but also the assignment of gene duplication and deletion events to the nodes when coupled with statistical approaches such as bootstrapping. However, extensive gene duplication and deletion events bring along a challenge in interpreting phylogenetic trees and require manual inference. In particular, there has been no robust method of determining whether one of the paralog clades systematically shows higher divergence following the gene duplication event as a sign of functional divergence. Here, we provide Phylostat, a graphical user interface that enables clade divergence analysis, visually and statistically. Phylostat is a web-based tool built on phylo.io to allow comparative clade divergence analysis, which is available at https://phylostat.adebalilab.org under an MIT open-source licence.
Affiliation(s)
- Elif Özçelik
- Molecular Biology, Genetic and Bioengineering, Faculty of Engineering and Natural Sciences, Sabancı University, İstanbul, Turkey
- Nurdan Kuru
- Molecular Biology, Genetic and Bioengineering, Faculty of Engineering and Natural Sciences, Sabancı University, İstanbul, Turkey
- Ogün Adebali
- Molecular Biology, Genetic and Bioengineering, Faculty of Engineering and Natural Sciences, Sabancı University, İstanbul, Turkey
20
Baloch AA, Kakar KU, Nawaz Z, Mushtaq M, Abro A, Khan S, Latif A. Comparative genomics and evolutionary analysis of plant CNGCs. Biol Methods Protoc 2022; 7:bpac018. [PMID: 36032330 PMCID: PMC9400807 DOI: 10.1093/biomethods/bpac018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Revised: 07/26/2022] [Indexed: 12/04/2022] Open
Abstract
Comparative genomics and computational biology offer powerful research tools for studying the evolutionary mechanisms of organisms and for identifying and characterizing conserved/distant genes and gene families. The plant CNGC gene family encodes evolutionarily conserved ion channel proteins involved in important signaling pathways and biological functions. This article discusses the fundamental ideas and standard procedures for genome-wide identification and evolutionary analysis of plant cyclic nucleotide-gated ion channels using various software, tools, and online servers. In particular, the method focuses on practical procedures for the comparative analysis of paralogs and orthologs of CNGC genes in different plant species at different levels, including phylogenetic analysis, nomenclature and classification, gene structure, molecular protein evolution, and duplication events as mechanisms of gene family expansion and synteny.
Affiliation(s)
- Akram Ali Baloch
- Department of Biotechnology, Faculty of Life Sciences, Balochistan University of Information Technology, Engineering and Management Sciences (BUITEMS), Quetta, Pakistan
- Kaleem U Kakar
- Department of Microbiology, Faculty of Life Sciences, Balochistan University of Information Technology, Engineering and Management Sciences (BUITEMS), Quetta, Pakistan
- Zarqa Nawaz
- Department of Botany, University of Central Punjab, Rawalpindi, Pakistan
- Muhammad Mushtaq
- Department of Biotechnology, Faculty of Life Sciences, Balochistan University of Information Technology, Engineering and Management Sciences (BUITEMS), Quetta, Pakistan
- Asma Abro
- Department of Biotechnology, Faculty of Life Sciences, Balochistan University of Information Technology, Engineering and Management Sciences (BUITEMS), Quetta, Pakistan
- Samiullah Khan
- Department of Biotechnology, Faculty of Life Sciences, Balochistan University of Information Technology, Engineering and Management Sciences (BUITEMS), Quetta, Pakistan
- Abdul Latif
- Department of Microbiology, Faculty of Life Sciences, Balochistan University of Information Technology, Engineering and Management Sciences (BUITEMS), Quetta, Pakistan
21
Ma D, Xin Y, Guo Z, Shi Y, Zhang L, Li Y, Gu Z, Ding Z, Shi G. Ancestral sequence reconstruction and spatial structure analysis guided alteration of longer-chain substrate catalysis for Thermomicrobium roseum lipase. Enzyme Microb Technol 2022; 156:109989. [PMID: 35134708 DOI: 10.1016/j.enzmictec.2022.109989] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/29/2021] [Revised: 12/17/2021] [Accepted: 01/04/2022] [Indexed: 01/10/2023]
Affiliation(s)
- Danlei Ma
- National Engineering Research Center for Cereal Fermentation and Food Biomanufacturing, Jiangnan University, Wuxi 214122, Jiangsu, PR China
- Yu Xin
- National Engineering Research Center for Cereal Fermentation and Food Biomanufacturing, Jiangnan University, Wuxi 214122, Jiangsu, PR China
- Zitao Guo
- National Engineering Research Center for Cereal Fermentation and Food Biomanufacturing, Jiangnan University, Wuxi 214122, Jiangsu, PR China
- Yi Shi
- National Engineering Research Center for Cereal Fermentation and Food Biomanufacturing, Jiangnan University, Wuxi 214122, Jiangsu, PR China
- Liang Zhang
- National Engineering Research Center for Cereal Fermentation and Food Biomanufacturing, Jiangnan University, Wuxi 214122, Jiangsu, PR China
- Youran Li
- National Engineering Research Center for Cereal Fermentation and Food Biomanufacturing, Jiangnan University, Wuxi 214122, Jiangsu, PR China
- Zhenghua Gu
- National Engineering Research Center for Cereal Fermentation and Food Biomanufacturing, Jiangnan University, Wuxi 214122, Jiangsu, PR China
- Zhongyang Ding
- National Engineering Research Center for Cereal Fermentation and Food Biomanufacturing, Jiangnan University, Wuxi 214122, Jiangsu, PR China
- Guiyang Shi
- National Engineering Research Center for Cereal Fermentation and Food Biomanufacturing, Jiangnan University, Wuxi 214122, Jiangsu, PR China
22
Vargas-Castro I, Melero M, Crespo-Picazo JL, Jiménez MDLÁ, Sierra E, Rubio-Guerri C, Arbelo M, Fernández A, García-Párraga D, Sánchez-Vizcaíno JM. Systematic Determination of Herpesvirus in Free-Ranging Cetaceans Stranded in the Western Mediterranean: Tissue Tropism and Associated Lesions. Viruses 2021; 13:v13112180. [PMID: 34834986 PMCID: PMC8621769 DOI: 10.3390/v13112180] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/27/2021] [Revised: 10/22/2021] [Accepted: 10/25/2021] [Indexed: 11/16/2022] Open
Abstract
The monitoring of herpesvirus infection provides useful information when assessing marine mammals’ health. This paper shows the prevalence of herpesvirus infection (80.85%) in 47 cetaceans stranded on the coast of the Valencian Community, Spain. Of the 966 tissues evaluated, 121 tested positive when employing nested-PCR (12.53%). The largest proportion of herpesvirus-positive tissue samples was in the reproductive system, nervous system, and tegument. Herpesvirus was more prevalent in females, juveniles, and calves. More than half the DNA PCR positive tissues contained herpesvirus RNA, indicating the presence of actively replicating virus. This RNA was most frequently found in neonates. Fourteen unique sequences were identified. Most amplified sequences belonged to the Gammaherpesvirinae subfamily, but a greater variation was found in Alphaherpesvirinae sequences. This is the first report of systematic herpesvirus DNA and RNA determination in free-ranging cetaceans. Nine (19.14%) were infected with cetacean morbillivirus and all of them (100%) were coinfected with herpesvirus. Lesions similar to those caused by herpesvirus in other species were observed, mainly in the skin, upper digestive tract, genitalia, and central nervous system. Other lesions were also attributable to concomitant etiologies or were nonspecific. It is necessary to investigate the possible role of herpesvirus infection in those cases.
Affiliation(s)
- Ignacio Vargas-Castro
- VISAVET Health Surveillance Centre and Animal Health Department, Veterinary School, Complutense University of Madrid, 28040 Madrid, Spain
- Mar Melero
- VISAVET Health Surveillance Centre and Animal Health Department, Veterinary School, Complutense University of Madrid, 28040 Madrid, Spain
- Division of External Health, Government Delegation in the Community of Madrid, Ministry of Territorial Policy, 28071 Madrid, Spain
- José Luis Crespo-Picazo
- Research Department, Fundación Oceanogràfic de la Comunitat Valenciana, 46013 Valencia, Spain
- María de los Ángeles Jiménez
- Department of Animal Medicine and Surgery, Veterinary Faculty, Complutense University of Madrid, 28040 Madrid, Spain
- Eva Sierra
- Division of Veterinary Histology and Pathology, Institute for Animal Health, Veterinary School, University of Las Palmas de Gran Canaria, 35416 Canary Islands, Spain
- Consuelo Rubio-Guerri
- VISAVET Health Surveillance Centre and Animal Health Department, Veterinary School, Complutense University of Madrid, 28040 Madrid, Spain
- Department of Pharmacy, Facultad de CC de la Salud, UCH-CEU University, 46113 Valencia, Spain
- Manuel Arbelo
- Division of Veterinary Histology and Pathology, Institute for Animal Health, Veterinary School, University of Las Palmas de Gran Canaria, 35416 Canary Islands, Spain
- Antonio Fernández
- Division of Veterinary Histology and Pathology, Institute for Animal Health, Veterinary School, University of Las Palmas de Gran Canaria, 35416 Canary Islands, Spain
- Daniel García-Párraga
- Research Department, Fundación Oceanogràfic de la Comunitat Valenciana, 46013 Valencia, Spain
- José Manuel Sánchez-Vizcaíno
- VISAVET Health Surveillance Centre and Animal Health Department, Veterinary School, Complutense University of Madrid, 28040 Madrid, Spain
23
Echevarría LY, De la Riva I, Venegas PJ, Rojas-Runjaic FJM, R Dias I, Castroviejo-Fisher S. Total evidence and sensitivity phylogenetic analyses of egg-brooding frogs (Anura: Hemiphractidae). Cladistics 2021; 37:375-401. [PMID: 34478194 DOI: 10.1111/cla.12447] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Accepted: 11/10/2020] [Indexed: 01/06/2023] Open
Abstract
We study the phylogenetic relationships of egg-brooding frogs, a group of 118 neotropical species, unique among anurans by having embryos with large bell-shaped gills and females carrying their eggs on the dorsum, exposed or inside a pouch. We assembled a total evidence dataset of published and newly generated data containing 51 phenotypic characters and DNA sequences of 20 loci for 143 hemiphractids and 127 outgroup terminals. We performed six analytical strategies combining different optimality criteria (parsimony and maximum likelihood), alignment methods (tree- and similarity-alignment), and three different indel coding schemes (fifth character state, unknown nucleotide, and presence/absence characters matrix). Furthermore, we analyzed a subset of the total evidence dataset to evaluate the impact of phenotypic characters on hemiphractid phylogenetic relationships. Our main results include: (i) monophyly of Hemiphractidae and its six genera for all our analyses, novel relationships among hemiphractid genera, and non-monophyly of Hemiphractinae according to our preferred phylogenetic hypothesis; (ii) non-monophyly of current supraspecific taxonomies of Gastrotheca, an updated taxonomy is provided; (iii) previous differences among studies were mainly caused by differences in analytical factors, not by differences in character/taxon sampling; (iv) optimality criteria, alignment method, and indel coding caused differences among optimal topologies, in that order of degree; (v) in most cases, parsimony analyses are more sensitive to the addition of phenotypic data than maximum likelihood analyses; (vi) adding phenotypic data resulted in an increase of shared clades for most analyses.
Collapse
Affiliation(s)
- Lourdes Y Echevarría
- Laboratório de Sistemática de Vertebrados, Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Av. Ipiranga 6681, Porto Alegre, RS, 90619-900, Brazil.,División de Herpetología-Centro de Ornitología y Biodiversidad (CORBIDI), Urb. Huertos de San Antonio, Santa Rita No. 105 Of. 202, Surco, Lima, Perú
| | - Ignacio De la Riva
- Museo Nacional de Ciencias Naturales-CSIC, C/José Gutiérrez Abascal 2, Madrid, 28006, Spain
| | - Pablo J Venegas
- División de Herpetología-Centro de Ornitología y Biodiversidad (CORBIDI), Urb. Huertos de San Antonio, Santa Rita No. 105 Of. 202, Surco, Lima, Perú
| | | | - Iuri R Dias
- Graduate Program in Zoology, Universidade Estadual de Santa Cruz, Rodovia Jorge Amado, km 16, Ilhéus, Bahia, 45662-900, Brazil
| | - Santiago Castroviejo-Fisher
- Laboratório de Sistemática de Vertebrados, Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Av. Ipiranga 6681, Porto Alegre, RS, 90619-900, Brazil.,Department of Herpetology, American Museum of Natural History, New York, NY, 10024, USA
| |
|
24
|
Zhang C, Zhao Y, Braun EL, Mirarab S. TAPER: Pinpointing errors in multiple sequence alignments despite varying rates of evolution. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13696] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
Affiliation(s)
- Chao Zhang
- Bioinformatics and Systems Biology Program, University of California San Diego, CA, USA
| | - Yiming Zhao
- Electrical and Computer Engineering Department, University of California San Diego, CA, USA
| | - Edward L. Braun
- Department of Biology and Genetics Institute, University of Florida, Gainesville, FL, USA
| | - Siavash Mirarab
- Electrical and Computer Engineering Department, University of California San Diego, CA, USA
| |
|
25
|
Aguilar-Vega C, Rivera B, Lucientes J, Gutiérrez-Boada I, Sánchez-Vizcaíno JM. A study of the composition of the Obsoletus complex and genetic diversity of Culicoides obsoletus populations in Spain. Parasit Vectors 2021; 14:351. [PMID: 34217330 PMCID: PMC8254917 DOI: 10.1186/s13071-021-04841-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2021] [Accepted: 06/11/2021] [Indexed: 11/10/2022] Open
Abstract
Background: The Culicoides obsoletus species complex (henceforth ‘Obsoletus complex’) is implicated in the transmission of several arboviruses that can cause severe disease in livestock, such as bluetongue, African horse sickness, epizootic hemorrhagic disease and Schmallenberg disease. Thus, this study aimed to increase our knowledge of the composition and genetic diversity of the Obsoletus complex by partial sequencing of the cytochrome c oxidase I (cox1) gene in poorly studied areas of Spain. Methods: A study of C. obsoletus populations was carried out using a single-tube multiplex polymerase chain reaction (PCR) assay that was designed to differentiate the Obsoletus complex sibling species Culicoides obsoletus and Culicoides scoticus, based on the partial amplification of the cox1 gene, as well as cox1 georeferenced sequences from Spain available at GenBank. We sampled 117 insects of the Obsoletus complex from six locations and used a total of 238 sequences of C. obsoletus (ss) individuals (sampled here, and from GenBank) from 14 sites in mainland Spain, the Balearic Islands and the Canary Islands for genetic diversity and phylogenetic analyses. Results: We identified 90 C. obsoletus (ss), 19 Culicoides scoticus and five Culicoides montanus midges from the six collection sites sampled, and found that the genetic diversity of C. obsoletus (ss) was higher in mainland Spain than in the Canary Islands. The multiplex PCR had limitations in terms of specificity, and no cryptic species within the Obsoletus complex were identified. Conclusions: Within the Obsoletus complex, C. obsoletus (ss) was the predominant species in the analyzed sites of mainland Spain. Information about the species composition of the Obsoletus complex could be of relevance for future epidemiological studies when specific aspects of the vector competence and capacity of each species have been identified. Our results indicate that the intraspecific divergence is higher in C. obsoletus (ss) northern populations, and demonstrate the isolation of C. obsoletus (ss) populations of the Canary Islands.
Supplementary Information The online version contains supplementary material available at 10.1186/s13071-021-04841-z.
Affiliation(s)
- Cecilia Aguilar-Vega
- Animal Health Department, Faculty of Veterinary Medicine, VISAVET Health Surveillance Centre, Complutense University of Madrid, Madrid, Spain.
| | - Belén Rivera
- Animal Health Department, Faculty of Veterinary Medicine, VISAVET Health Surveillance Centre, Complutense University of Madrid, Madrid, Spain
| | - Javier Lucientes
- Department of Animal Pathology (Animal Health), Faculty of Veterinary Medicine, AgriFood Institute of Aragón IA2, University of Zaragoza, Zaragoza, Spain
| | - Isabel Gutiérrez-Boada
- Animal Health Department, Faculty of Veterinary Medicine, VISAVET Health Surveillance Centre, Complutense University of Madrid, Madrid, Spain
| | - José Manuel Sánchez-Vizcaíno
- Animal Health Department, Faculty of Veterinary Medicine, VISAVET Health Surveillance Centre, Complutense University of Madrid, Madrid, Spain
| |
|
26
|
Abstract
Multiple sequence alignment is a core first step in many bioinformatics analyses, and errors in these alignments can have negative consequences for scientific studies. In this article, we review some of the recent literature evaluating multiple sequence alignment methods and identify specific challenges that arise when performing these evaluations. In particular, we discuss the different trends observed in simulation studies and when using biological benchmarks. Overall, we find that multiple sequence alignment, far from being a "solved problem," would benefit from new attention.
Affiliation(s)
- Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
| |
|
27
|
EntroPhylo: An entropy-based tool to select phylogenetic informative regions and primer design. INFECTION GENETICS AND EVOLUTION 2021; 92:104857. [PMID: 33838312 DOI: 10.1016/j.meegid.2021.104857] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/24/2020] [Revised: 03/18/2021] [Accepted: 04/05/2021] [Indexed: 11/24/2022]
Abstract
We present a novel entropy-based computational tool that selects phylogenetic informative genomic regions associated with degenerate primer design. This tool identifies proper phylogenetic markers and proposes suitable degenerate primers to amplify and sequence them. The algorithm calculates the entropy value per site, and the selected region is used for primer design. In order to evaluate the tool, sequences of bovine papillomavirus L1 gene were obtained. Once the molecular region was selected, the primers were designed by the software and used in a PCR reaction for viral detection. Three positive samples were tested with four different concentrations, and it was possible to detect the virus in all samples. The results show the applicability of a tool that can select informative regions for phylogenetic analysis and design primers to amplify and sequence these regions, becoming relevant for several studies focusing on pathogen detection, as well as phylogenetic and genetics studies of populations.
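The per-site entropy calculation mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the EntroPhylo implementation: the function names `site_entropies` and `best_window`, the treatment of gap and ambiguity characters, and the rule of picking the window with the highest mean entropy are all assumptions made for illustration only.

```python
import math
from collections import Counter


def site_entropies(alignment, ignored="-N?"):
    """Shannon entropy (in bits) of every column of an alignment.

    `alignment` is a list of equal-length strings. Characters listed in
    `ignored` (gaps, ambiguity codes) are skipped; this is an assumption.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    entropies = []
    for column in zip(*alignment):
        counts = Counter(c.upper() for c in column if c.upper() not in ignored)
        total = sum(counts.values())
        if total == 0:
            entropies.append(0.0)  # column holds only ignored characters
            continue
        entropies.append(
            sum(-(n / total) * math.log2(n / total) for n in counts.values())
        )
    return entropies


def best_window(entropies, width):
    """Start index of the window with the highest mean entropy.

    Hypothetical selection rule: the most variable window is treated as
    the most informative region. Ties keep the leftmost window.
    """
    if width <= 0 or width > len(entropies):
        raise ValueError("width must be between 1 and the alignment length")
    best_start, best_mean = 0, -1.0
    for start in range(len(entropies) - width + 1):
        mean = sum(entropies[start:start + width]) / width
        if mean > best_mean:
            best_start, best_mean = start, mean
    return best_start
```

A real primer-design workflow would also need conserved flanking sites for the primers themselves, which this sketch does not attempt.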
|
28
|
Azouri D, Abadi S, Mansour Y, Mayrose I, Pupko T. Harnessing machine learning to guide phylogenetic-tree search algorithms. Nat Commun 2021; 12:1983. [PMID: 33790270 PMCID: PMC8012635 DOI: 10.1038/s41467-021-22073-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2020] [Accepted: 02/26/2021] [Indexed: 02/01/2023] Open
Abstract
Inferring a phylogenetic tree is a fundamental challenge in evolutionary studies. Current paradigms for phylogenetic tree reconstruction rely on performing costly likelihood optimizations. With the aim of making tree inference feasible for problems involving more than a handful of sequences, inference under the maximum-likelihood paradigm integrates heuristic approaches to evaluate only a subset of all potential trees. Consequently, existing methods suffer from the known tradeoff between accuracy and running time. In this proof-of-concept study, we train a machine-learning algorithm over an extensive cohort of empirical data to predict the neighboring trees that increase the likelihood, without actually computing their likelihood. This provides means to safely discard a large set of the search space, thus potentially accelerating heuristic tree searches without losing accuracy. Our analyses suggest that machine learning can guide tree-search methodologies towards the most promising candidate trees.
Affiliation(s)
- Dana Azouri
- School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Tel-Aviv, Israel
- The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Ramat Aviv, Tel-Aviv, Israel
| | - Shiran Abadi
- School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Tel-Aviv, Israel
| | - Yishay Mansour
- Blavatnik School of Computer Science, Tel-Aviv University, Ramat Aviv, Tel-Aviv, Israel
| | - Itay Mayrose
- School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Tel-Aviv, Israel.
| | - Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Ramat Aviv, Tel-Aviv, Israel.
| |
|
29
|
Cheng W, Ji T, Zhou S, Shi Y, Jiang L, Zhang Y, Yan D, Yang Q, Song Y, Cai R, Xu W. Molecular epidemiological characteristics of echovirus 6 in mainland China: extensive circulation of genotype F from 2007 to 2018. Arch Virol 2021; 166:1305-1312. [PMID: 33638089 PMCID: PMC8036204 DOI: 10.1007/s00705-020-04934-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2020] [Accepted: 11/04/2020] [Indexed: 11/26/2022]
Abstract
Echovirus 6 (E6) is associated with various clinical diseases and is frequently detected in environmental sewage. Despite its high prevalence in humans and the environment, little is known about its molecular phylogeography in mainland China. In this study, 114 of 21,539 (0.53%) clinical specimens from hand, foot, and mouth disease (HFMD) cases collected between 2007 and 2018 were positive for E6. The complete VP1 sequences of 87 representative E6 strains, including 24 strains from this study, were used to investigate the evolutionary genetic characteristics and geographical spread of E6 strains. Phylogenetic analysis based on VP1 nucleotide sequence divergence showed that, globally, E6 strains can be grouped into six genotypes, designated A to F. Chinese E6 strains collected between 1988 and 2018 were found to belong to genotypes C, E, and F, with genotype F being predominant from 2007 to 2018. There was no significant difference in the geographical distribution of each genotype. The evolutionary rate of E6 was estimated to be 3.631 × 10⁻³ substitutions site⁻¹ year⁻¹ (95% highest posterior density [HPD]: 3.2406 × 10⁻³ to 4.031 × 10⁻³ substitutions site⁻¹ year⁻¹) by Bayesian MCMC analysis. The most recent common ancestor of the E6 genotypes was traced back to 1863, whereas their common ancestor in China was traced back to around 1962. A small genetic shift was detected in the Chinese E6 population size in 2009 according to Bayesian skyline analysis, which indicated that there might have been an epidemic around that year.
Affiliation(s)
- Wenjun Cheng
- Medical School, Anhui University of Science and Technology, Huainan, 232001, Anhui, People's Republic of China
- NHC Key Laboratory of Medical Virology and Viral Diseases, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, People's Republic of China
| | - Tianjiao Ji
- NHC Key Laboratory of Medical Virology and Viral Diseases, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, People's Republic of China
| | - Shuaifeng Zhou
- Hunan Provincial Centers for Disease Control and Prevention, Changsha, People's Republic of China
| | - Yong Shi
- Jiangxi Provincial Centers for Disease Control and Prevention, Nanchang, People's Republic of China
| | - Lili Jiang
- Yunnan Provincial Centers for Disease Control and Prevention, Kunming, People's Republic of China
| | - Yong Zhang
- NHC Key Laboratory of Medical Virology and Viral Diseases, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, People's Republic of China
| | - Dongmei Yan
- NHC Key Laboratory of Medical Virology and Viral Diseases, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, People's Republic of China
| | - Qian Yang
- NHC Key Laboratory of Medical Virology and Viral Diseases, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, People's Republic of China
| | - Yang Song
- NHC Key Laboratory of Medical Virology and Viral Diseases, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, People's Republic of China
| | - Ru Cai
- Medical School, Anhui University of Science and Technology, Huainan, 232001, Anhui, People's Republic of China.
| | - Wenbo Xu
- Medical School, Anhui University of Science and Technology, Huainan, 232001, Anhui, People's Republic of China.
- NHC Key Laboratory of Medical Virology and Viral Diseases, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, People's Republic of China.
| |
|
30
|
|
31
|
Ji X, Zhang Z, Holbrook A, Nishimura A, Baele G, Rambaut A, Lemey P, Suchard MA. Gradients Do Grow on Trees: A Linear-Time O(N)-Dimensional Gradient for Statistical Phylogenetics. Mol Biol Evol 2020; 37:3047-3060. [PMID: 32458974 PMCID: PMC7530611 DOI: 10.1093/molbev/msaa130] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023] Open
Abstract
Calculation of the log-likelihood stands as the computational bottleneck for many statistical phylogenetic algorithms. Even worse is its gradient evaluation, often used to target regions of high probability. Order O(N)-dimensional gradient calculations based on the standard pruning algorithm require O(N²) operations, where N is the number of sampled molecular sequences. With the advent of high-throughput sequencing, recent phylogenetic studies have analyzed hundreds to thousands of sequences, with an apparent trend toward even larger data sets as a result of advancing technology. Such large-scale analyses challenge phylogenetic reconstruction by requiring inference on larger sets of process parameters to model the increasing data heterogeneity. To make these analyses tractable, we present a linear-time algorithm for O(N)-dimensional gradient evaluation and apply it to general continuous-time Markov processes of sequence substitution on a phylogenetic tree without a need to assume either stationarity or reversibility. We apply this approach to learn the branch-specific evolutionary rates of three pathogenic viruses: West Nile virus, Dengue virus, and Lassa virus. Our proposed algorithm significantly improves inference efficiency with a 126- to 234-fold increase in maximum-likelihood optimization and a 16- to 33-fold computational performance increase in a Bayesian framework.
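For readers unfamiliar with the pruning algorithm this abstract refers to, the following Python sketch computes a log-likelihood by post-order pruning on a tiny tree. It is not the linear-time gradient algorithm of the paper. The Jukes-Cantor (JC69) substitution model, the nested-list tree representation, and the names `jc69_prob`, `partials` and `log_likelihood` are assumptions chosen to keep the example short.

```python
import math

STATES = "ACGT"


def jc69_prob(same, t):
    """JC69 transition probability along a branch of length t
    (expected substitutions per site); `same` says whether the
    start and end states are identical."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def partials(node, site):
    """Partial likelihoods at `node` for one alignment column.

    Hypothetical tree encoding: a leaf is its sequence string, an
    internal node is a list of (child, branch_length) pairs.
    Returns P(data below node | node is in state s) for s in STATES.
    """
    if isinstance(node, str):
        return [1.0 if s == node[site] else 0.0 for s in STATES]
    result = [1.0] * len(STATES)
    for child, t in node:
        below = partials(child, site)
        for i in range(len(STATES)):
            result[i] *= sum(
                jc69_prob(i == j, t) * below[j] for j in range(len(STATES))
            )
    return result


def log_likelihood(tree, n_sites):
    """Sum of per-site log-likelihoods with uniform root frequencies."""
    total = 0.0
    for site in range(n_sites):
        root = partials(tree, site)
        total += math.log(sum(0.25 * p for p in root))
    return total
```

For example, `log_likelihood([("AC", 0.1), ("AG", 0.2)], 2)` evaluates a two-leaf tree over two sites. Each call visits every node once per site, which is the cost that repeated gradient evaluations multiply.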
Affiliation(s)
- Xiang Ji
- Department of Biomathematics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
- Department of Mathematics, School of Science & Engineering, Tulane University, New Orleans, LA
| | - Zhenyu Zhang
- Department of Biostatistics, Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA
| | - Andrew Holbrook
- Department of Biostatistics, Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA
| | - Akihiko Nishimura
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
| | - Guy Baele
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
| | - Andrew Rambaut
- Institute of Evolutionary Biology, Centre for Immunology, Infection and Evolution, University of Edinburgh, Edinburgh, United Kingdom
| | - Philippe Lemey
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
| | - Marc A Suchard
- Department of Biomathematics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
- Department of Biostatistics, Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
| |
|
32
|
Smith SA, Walker-Hale N, Walker JF. Intragenic Conflict in Phylogenomic Data Sets. Mol Biol Evol 2020; 37:3380-3388. [DOI: 10.1093/molbev/msaa170] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open
Abstract
Most phylogenetic analyses assume that a single evolutionary history underlies one gene. However, both biological processes and errors can cause intragenic conflict. The extent to which this conflict is present in empirical data sets is not well documented, but if common, could have far-reaching implications for phylogenetic analyses. We examined several large phylogenomic data sets from diverse taxa using a fast and simple method to identify well-supported intragenic conflict. We found conflict to be highly variable between data sets, from 1% to >92% of genes investigated. We analyzed four exemplar genes in detail and analyzed simulated data under several scenarios. Our results suggest that alignment error may be one major source of conflict, but other conflicts remain unexplained and may represent biological signal or other errors. Whether as part of data analysis pipelines or to explore biological processes, analyses of within-gene phylogenetic signal should become common.
Affiliation(s)
- Stephen A Smith
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
| | | | - Joseph F Walker
- The Sainsbury Laboratory (SLCU), University of Cambridge, Cambridge, United Kingdom
| |
|
33
|
Portik DM, Wiens JJ. Do Alignment and Trimming Methods Matter for Phylogenomic (UCE) Analyses? Syst Biol 2020; 70:440-462. [PMID: 32797207 DOI: 10.1093/sysbio/syaa064] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2019] [Revised: 08/02/2020] [Accepted: 08/03/2020] [Indexed: 11/14/2022] Open
Abstract
Alignment is a crucial issue in molecular phylogenetics because different alignment methods can potentially yield very different topologies for individual genes. But it is unclear if the choice of alignment methods remains important in phylogenomic analyses, which incorporate data from hundreds or thousands of genes. For example, problematic biases in alignment might be multiplied across many loci, whereas alignment errors in individual genes might become irrelevant. The issue of alignment trimming (i.e., removing poorly aligned regions or missing data from individual genes) is also poorly explored. Here, we test the impact of 12 different combinations of alignment and trimming methods on phylogenomic analyses. We compare these methods using published phylogenomic data from ultraconserved elements (UCEs) from squamate reptiles (lizards and snakes), birds, and tetrapods. We compare the properties of alignments generated by different alignment and trimming methods (e.g., length, informative sites, missing data). We also test whether these data sets can recover well-established clades when analyzed with concatenated (RAxML) and species-tree methods (ASTRAL-III), using the full data (~5000 loci) and subsampled data sets (10% and 1% of loci). We show that different alignment and trimming methods can significantly impact various aspects of phylogenomic data sets (e.g., length, informative sites). However, these different methods generally had little impact on the recovery and support values for well-established clades, even across very different numbers of loci. Nevertheless, our results suggest several "best practices" for alignment and trimming. Intriguingly, the choice of phylogenetic methods impacted the phylogenetic results most strongly, with concatenated analyses recovering significantly more well-established clades (with stronger support) than the species-tree analyses.
[Alignment; concatenated analysis; phylogenomics; sequence length heterogeneity; species-tree analysis; trimming].
Affiliation(s)
- Daniel M Portik
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA; California Academy of Sciences, San Francisco, CA 94118, USA
| | - John J Wiens
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
| |
|
34
|
Vargas-Castro I, Crespo-Picazo JL, Rivera-Arroyo B, Sánchez R, Marco-Cabedo V, Jiménez-Martínez MÁ, Fayos M, Serdio Á, García-Párraga D, Sánchez-Vizcaíno JM. Alpha- and gammaherpesviruses in stranded striped dolphins (Stenella coeruleoalba) from Spain: first molecular detection of gammaherpesvirus infection in central nervous system of odontocetes. BMC Vet Res 2020; 16:288. [PMID: 32787898 PMCID: PMC7425534 DOI: 10.1186/s12917-020-02511-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2020] [Accepted: 08/06/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Herpesvirus infections in cetaceans have always been attributed to the Alphaherpesvirinae and Gammaherpesvirinae subfamilies. To date, gammaherpesviruses have not been reported in the central nervous system of odontocetes. CASE PRESENTATION A mass stranding of 14 striped dolphins (Stenella coeruleoalba) occurred in Cantabria (Spain) on 18th May 2019. Tissue samples were collected and tested for herpesvirus using nested polymerase chain reaction (PCR), and for cetacean morbillivirus using reverse transcription-PCR. Cetacean morbillivirus was not detected in any of the animals, while gammaherpesvirus was detected in nine male and one female dolphins. Three of these males were coinfected by alphaherpesviruses. Alphaherpesvirus sequences were detected in the cerebrum, spinal cord and tracheobronchial lymph node, while gammaherpesvirus sequences were detected in the cerebrum, cerebellum, spinal cord, pharyngeal tonsils, mesenteric lymph node, tracheobronchial lymph node, lung, skin and penile mucosa. Macroscopic and histopathological post-mortem examinations did not unveil the potential cause of the mass stranding event or any evidence of severe infectious disease in the dolphins. The only observed lesions that may be associated with herpesvirus were three cases of balanitis and one penile papilloma. CONCLUSIONS To the authors' knowledge, this is the first report of gammaherpesvirus infection in the central nervous system of odontocete cetaceans. This raises new questions for future studies about how gammaherpesviruses reach the central nervous system and how infection manifests clinically.
Affiliation(s)
- Ignacio Vargas-Castro
- VISAVET Center and Animal Health Department, Veterinary School, Complutense University of Madrid, 28040, Madrid, Spain.
| | | | - Belén Rivera-Arroyo
- VISAVET Center and Animal Health Department, Veterinary School, Complutense University of Madrid, 28040, Madrid, Spain
| | - Rocío Sánchez
- VISAVET Center and Animal Health Department, Veterinary School, Complutense University of Madrid, 28040, Madrid, Spain
| | | | | | - Manena Fayos
- Centro de Recuperación de Fauna Silvestre de Cantabria, 39690, Santander, Spain; Tragsatec, 39005, Santander, Spain
| | - Ángel Serdio
- Dirección General de Biodiversidad, Medio Ambiente y Cambio Climático, 39011, Santander, Spain
| | | | - José Manuel Sánchez-Vizcaíno
- VISAVET Center and Animal Health Department, Veterinary School, Complutense University of Madrid, 28040, Madrid, Spain
| |
|
35
|
Abstract
Knowing phylogenetic relationships among species is fundamental for many studies in biology. An accurate phylogenetic tree underpins our understanding of the major transitions in evolution, such as the emergence of new body plans or metabolism, and is key to inferring the origin of new genes, detecting molecular adaptation, understanding morphological character evolution and reconstructing demographic changes in recently diverged species. Although data are ever more plentiful and powerful analysis methods are available, there remain many challenges to reliable tree building. Here, we discuss the major steps of phylogenetic analysis, including identification of orthologous genes or proteins, multiple sequence alignment, and choice of substitution models and inference methodologies. Understanding the different sources of errors and the strategies to mitigate them is essential for assembling an accurate tree of life.
|
36
|
Systems Biology Analysis Reveals Eight SLC22 Transporter Subgroups, Including OATs, OCTs, and OCTNs. Int J Mol Sci 2020; 21:ijms21051791. [PMID: 32150922 PMCID: PMC7084758 DOI: 10.3390/ijms21051791] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2020] [Revised: 03/02/2020] [Accepted: 03/03/2020] [Indexed: 02/07/2023] Open
Abstract
The SLC22 family of OATs, OCTs, and OCTNs is emerging as a central hub of endogenous physiology. Despite often being referred to as “drug” transporters, they facilitate the movement of metabolites and key signaling molecules. An in-depth reanalysis supports a reassignment of these proteins into eight functional subgroups, with four new subgroups arising from the previously defined OAT subclade: OATS1 (SLC22A6, SLC22A8, and SLC22A20), OATS2 (SLC22A7), OATS3 (SLC22A11, SLC22A12, and Slc22a22), and OATS4 (SLC22A9, SLC22A10, SLC22A24, and SLC22A25). We propose merging the OCTN (SLC22A4, SLC22A5, and Slc22a21) and OCT-related (SLC22A15 and SLC22A16) subclades into the OCTN/OCTN-related subgroup. Using data from GWAS, in vivo models, and in vitro assays, we developed an SLC22 transporter-metabolite network and similar subgroup networks, which suggest how multiple SLC22 transporters with mono-, oligo-, and multi-specific substrate specificity interact to regulate metabolites. Subgroup associations include: OATS1 with signaling molecules, uremic toxins, and odorants, OATS2 with cyclic nucleotides, OATS3 with uric acid, OATS4 with conjugated sex hormones, particularly etiocholanolone glucuronide, OCT with neurotransmitters, and OCTN/OCTN-related with ergothioneine and carnitine derivatives. Our data suggest that the SLC22 family can work among itself, as well as with other ADME genes, to optimize levels of numerous metabolites and signaling molecules, involved in organ crosstalk and inter-organismal communication, as proposed by the remote sensing and signaling theory.
|
37
|
McNaughton AL, Revill PA, Littlejohn M, Matthews PC, Ansari MA. Analysis of genomic-length HBV sequences to determine genotype and subgenotype reference sequences. J Gen Virol 2020; 101:271-283. [PMID: 32134374 PMCID: PMC7416611 DOI: 10.1099/jgv.0.001387] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2019] [Accepted: 01/08/2020] [Indexed: 12/11/2022] Open
Abstract
Hepatitis B virus (HBV) is a diverse, partially double-stranded DNA virus, with 9 genotypes (A-I), and a putative 10th genotype (J), characterized thus far. Given the broadening interest in HBV sequencing, there is an increasing requirement for a consistent, unified approach to HBV genotype and subgenotype classification. We set out to generate an updated resource of reference sequences using the diversity of all genomic-length HBV sequences available in public databases. We collated and aligned genomic-length HBV sequences from public databases and used maximum-likelihood phylogenetic analysis to identify genotype clusters. Within each genotype, we examined the phylogenetic support for currently defined subgenotypes, as well as identifying well-supported clades and deriving reference sequences for them. Based on the phylogenies generated, we present a comprehensive set of HBV reference sequences at the genotype and subgenotype level. All of the generated data, including the alignments, phylogenies and chosen reference sequences, are available online (https://doi.org/10.6084/m9.figshare.8851946) as a simple open-access resource.
Affiliation(s)
- Anna L. McNaughton
- Nuffield Department of Medicine, Peter Medawar Building for Pathogen Research, South Parks Road, Oxford OX1 3SY, UK
| | - Peter A. Revill
- Victorian Infectious Diseases Reference Laboratory, Royal Melbourne Hospital at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Department of Microbiology and Immunology, University of Melbourne, Melbourne, Australia
| | - Margaret Littlejohn
- Victorian Infectious Diseases Reference Laboratory, Royal Melbourne Hospital at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Department of Microbiology and Immunology, University of Melbourne, Melbourne, Australia
| | - Philippa C. Matthews
- Nuffield Department of Medicine, Peter Medawar Building for Pathogen Research, South Parks Road, Oxford OX1 3SY, UK
- Department of Infectious Diseases and Microbiology, Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
- Oxford NIHR Biomedical Research Centre, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
| | - M. Azim Ansari
- Wellcome Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK
| |
|
38
|
Khaledian E, Brayton KA, Broschat SL. A Systematic Approach to Bacterial Phylogeny Using Order Level Sampling and Identification of HGT Using Network Science. Microorganisms 2020; 8:microorganisms8020312. [PMID: 32102454 PMCID: PMC7074868 DOI: 10.3390/microorganisms8020312] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2019] [Revised: 02/12/2020] [Accepted: 02/20/2020] [Indexed: 11/16/2022] Open
Abstract
Reconstructing and visualizing phylogenetic relationships among living organisms is a fundamental challenge because not all organisms share the same genes. As a result, the first phylogenetic visualizations employed a single gene, e.g., rRNA genes, sufficiently conserved to be present in all organisms but divergent enough to provide discrimination between groups. As more genome data became available, researchers began concatenating different combinations of genes or proteins to construct phylogenetic trees believed to be more robust because they incorporated more information. However, the genes or proteins chosen were based on ad hoc approaches. The large number of complete genome sequences available today allows the use of whole genomes to analyze relationships among organisms rather than using an ad hoc set of genes. We present a systematic approach for constructing a phylogenetic tree based on simultaneously clustering the complete proteomes of 360 bacterial species. From the homologous clusters, we identify 49 protein sequences shared by 99% of the organisms to build a tree. Of the 49 sequences, 47 have homologous sequences in both archaea and eukarya. The clusters are also used to create a network from which bacterial species with horizontally-transferred genes from other phyla are identified.
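The step of keeping only clusters shared by nearly all organisms can be sketched as a simple presence filter. This is an illustrative Python sketch and not the authors' pipeline: the input mapping, the function name `core_clusters`, and counting each organism at most once per cluster are assumptions.

```python
def core_clusters(cluster_members, n_organisms, min_fraction=0.99):
    """Return the ids of clusters present in at least `min_fraction`
    of the organisms.

    `cluster_members` is a hypothetical mapping from a cluster id to the
    organism ids that contribute at least one protein to that cluster.
    Paralogs from the same organism are counted once.
    """
    if n_organisms <= 0:
        raise ValueError("n_organisms must be positive")
    core = []
    for cluster_id, organisms in cluster_members.items():
        present = len(set(organisms))  # distinct organisms in this cluster
        if present / n_organisms >= min_fraction:
            core.append(cluster_id)
    return sorted(core)
```

With 360 species and the default threshold of 0.99, a cluster would need members from at least 357 species to be retained under this rule.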
Affiliation(s)
- Ehdieh Khaledian
- School of Electrical Engineering and Computer Science, Washington State University, P.O. Box 642752, Pullman, WA 99164, USA
| | - Kelly A. Brayton
- School of Electrical Engineering and Computer Science, Washington State University, P.O. Box 642752, Pullman, WA 99164, USA
- Department of Veterinary Microbiology and Pathology, Washington State University, P.O. Box 647040, Pullman, WA 99164, USA
- Paul G. Allen School for Global Animal Health, Washington State University, P.O. Box 647090, Pullman, WA 99164, USA
| | - Shira L. Broschat
- School of Electrical Engineering and Computer Science, Washington State University, P.O. Box 642752, Pullman, WA 99164, USA
- Department of Veterinary Microbiology and Pathology, Washington State University, P.O. Box 647040, Pullman, WA 99164, USA
- Paul G. Allen School for Global Animal Health, Washington State University, P.O. Box 647090, Pullman, WA 99164, USA
| |
|
39
|
Modi V, Dunbrack RL. A Structurally-Validated Multiple Sequence Alignment of 497 Human Protein Kinase Domains. Sci Rep 2019; 9:19790. [PMID: 31875044 PMCID: PMC6930252 DOI: 10.1038/s41598-019-56499-4] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 10/02/2019] [Accepted: 11/14/2019] [Indexed: 12/21/2022] Open
Abstract
Studies on the structures and functions of individual kinases have been used to understand the biological properties of other kinases that do not yet have experimental structures. The key factor in accurate inference by homology is an accurate sequence alignment. We present a parsimonious, structure-based multiple sequence alignment (MSA) of 497 human protein kinase domains excluding atypical kinases. The alignment is arranged in 17 blocks of conserved regions and unaligned blocks in between that contain insertions of varying lengths present in only a subset of kinases. The aligned blocks contain well-conserved elements of secondary structure and well-known functional motifs, such as the DFG and HRD motifs. From pairwise, all-against-all alignment of 272 human kinase structures, we estimate the accuracy of our MSA to be 97%. The remaining inaccuracy comes from a few structures with shifted elements of secondary structure, and from the boundaries of aligned and unaligned regions, where compromises need to be made to encompass the majority of kinases. A new phylogeny of the protein kinase domains in the human genome based on our alignment indicates that ten kinases previously labeled as "OTHER" can be confidently placed into the CAMK group. These kinases comprise the Aurora kinases, Polo kinases, and calcium/calmodulin-dependent kinase kinases.
Affiliation(s)
- Vivek Modi
- Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA, 19111, USA
| | - Roland L Dunbrack
- Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA, 19111, USA.
| |
|
40
|
Du Y, Wu S, Edwards SV, Liu L. The effect of alignment uncertainty, substitution models and priors in building and dating the mammal tree of life. BMC Evol Biol 2019; 19:203. [PMID: 31694538 PMCID: PMC6833305 DOI: 10.1186/s12862-019-1534-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/07/2019] [Accepted: 10/21/2019] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND The flood of genomic data to help build and date the tree of life requires automation at several critical junctures, most importantly during sequence assembly and alignment. It is widely appreciated that automated alignment protocols can yield inaccuracies, but the relative impact of various sources of error on phylogenomic analysis is not yet known. This study employs an updated mammal data set of 5162 coding loci sampled from 90 species to evaluate the effects of alignment uncertainty, substitution models, and fossil priors on gene tree, species tree, and divergence time estimation. Additionally, a novel coalescent likelihood ratio test is introduced for comparing competing species trees against a given set of gene trees. RESULTS The aligned DNA sequences of 5162 loci from 90 species were trimmed and filtered using trimAL and two filtering protocols. The final dataset contains 4 sets of alignments - before trimming, after trimming, filtered by a recently proposed pipeline, and further filtered by comparing ML gene trees for each locus with the concatenation tree. Our analyses suggest that the average discordance among the coalescent trees is significantly smaller than that among the concatenation trees estimated from the 4 sets of alignments or with different substitution models. There is no significant difference among the divergence times estimated with different substitution models. However, the divergence dates estimated from the alignments after trimming are more recent than those estimated from the alignments before trimming. CONCLUSIONS Our results highlight that alignment uncertainty of the updated mammal data set and the choice of substitution models have little impact on tree topologies yielded by coalescent methods for species tree estimation, whereas they are more influential on the trees made by concatenation. Given the choice of calibration scheme and clock models, divergence time estimates are robust to the choice of substitution models, but removing alignments deemed problematic by trimming algorithms can lead to more recent dates. Although the fossil prior is important in divergence time estimation, Bayesian estimates of divergence times in this data set are driven primarily by the sequence data.
Affiliation(s)
- Yan Du
- Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30606 USA
| | - Shaoyuan Wu
- Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, School of Life Sciences, Jiangsu Normal University, Xuzhou, Jiangsu 221116 People’s Republic of China
| | - Scott V. Edwards
- Department of Organismic & Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138 USA
| | - Liang Liu
- Department of Statistics and Institute of Bioinformatics, University of Georgia, 310 Herty Drive, Athens, GA 30606 USA
| |
|
41
|
Sloutsky R, Naegle KM. ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models. eLife 2019; 8:e47676. [PMID: 31621582 PMCID: PMC6797483 DOI: 10.7554/elife.47676] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/13/2019] [Accepted: 09/19/2019] [Indexed: 12/27/2022] Open
Abstract
Evolutionary reconstruction algorithms produce models of the evolutionary history of proteins or species. Such algorithms are highly sensitive to their inputs: the sequences used and their alignments. Here, we asked whether the variance introduced by selecting different input sequences could be used to better identify accurate evolutionary models. We subsampled from available ortholog sequences and measured the distribution of observed relationships between paralogs produced across hundreds of models inferred from the subsamples. We observed two important phenomena. First, the reproducibility of an all-sequence, single-alignment reconstruction, measured by comparing topologies inferred from 90% subsamples, directly correlates with the accuracy of that single-alignment reconstruction, producing a measurable value for something that has been traditionally unknowable. Second, topologies that are most consistent with the observations made in the ensemble are more accurate and we present a meta algorithm that exploits this property to improve model accuracy.
Affiliation(s)
- Roman Sloutsky
- Program in Computational and Systems Biology, Washington University, St. Louis, United States
- Department for Biomedical Engineering, Washington University, St. Louis, United States
- Department of Biochemistry and Molecular Biology, University of Massachusetts, Amherst, United States
- Center for Biological Systems Engineering, Washington University, St. Louis, United States
| | - Kristen M Naegle
- Department for Biomedical Engineering, Washington University, St. Louis, United States
- Center for Biological Systems Engineering, Washington University, St. Louis, United States
- Department of Biomedical Engineering, University of Virginia, Charlottesville, United States
- Center for Public Health Genomics, University of Virginia, Charlottesville, United States
| |
|
42
|
Ali RH, Bogusz M, Whelan S. Identifying Clusters of High Confidence Homologies in Multiple Sequence Alignments. Mol Biol Evol 2019; 36:2340-2351. [PMID: 31209473 PMCID: PMC6933875 DOI: 10.1093/molbev/msz142] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Indexed: 12/16/2022] Open
Abstract
Multiple sequence alignment (MSA) is ubiquitous in evolution and bioinformatics. MSAs are usually taken to be a known and fixed quantity on which to perform downstream analysis despite extensive evidence that MSA accuracy and uncertainty affect results. These errors are known to cause a wide range of problems for downstream evolutionary inference, ranging from false inference of positive selection to long branch attraction artifacts. The most popular approach to dealing with this problem is to remove (filter) specific columns in the MSA that are thought to be prone to error. Although popular, this approach has had mixed success and several studies have even suggested that filtering might be detrimental to phylogenetic studies. We present a graph-based clustering method to address MSA uncertainty and error in the software Divvier (available at https://github.com/simonwhelan/Divvier), which uses a probabilistic model to identify clusters of characters that have strong statistical evidence of shared homology. These clusters can then be used to either filter characters from the MSA (partial filtering) or represent each of the clusters in a new column (divvying). We validate Divvier through its performance on real and simulated benchmarks, finding Divvier substantially outperforms existing filtering software by retaining more true pairwise homology calls and removing more false positive pairwise homologies. We also find that Divvier, in contrast to other filtering tools, can alleviate long branch attraction artifacts induced by MSA and reduces the variation in tree estimates caused by MSA uncertainty.
Affiliation(s)
- Raja Hashim Ali
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Faculty of Computer Science and Engineering, Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Topi, Pakistan
| | - Marcin Bogusz
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - Simon Whelan
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| |
|
43
|
Matsui M, Iwasaki W. Graph Splitting: A Graph-Based Approach for Superfamily-Scale Phylogenetic Tree Reconstruction. Syst Biol 2019; 69:265-279. [DOI: 10.1093/sysbio/syz049] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/07/2018] [Revised: 07/09/2019] [Accepted: 07/20/2019] [Indexed: 11/12/2022] Open
Abstract
A protein superfamily contains distantly related proteins that have acquired diverse biological functions through a long evolutionary history. Phylogenetic analysis of the early evolution of protein superfamilies is a key challenge because existing phylogenetic methods show poor performance when protein sequences are too diverged to construct an informative multiple sequence alignment (MSA). Here, we propose the Graph Splitting (GS) method, which rapidly reconstructs a protein superfamily-scale phylogenetic tree using a graph-based approach. Evolutionary simulation showed that the GS method can accurately reconstruct phylogenetic trees and be robust to major problems in phylogenetic estimation, such as biased taxon sampling, heterogeneous evolutionary rates, and long-branch attraction when sequences are substantially diverged. Its application to an empirical data set of the triosephosphate isomerase (TIM)-barrel superfamily suggests rapid evolution of protein-mediated pyrimidine biosynthesis, likely taking place after the RNA world. Furthermore, the GS method can also substantially improve performance of widely used MSA methods by providing accurate guide trees.
Affiliation(s)
- Motomu Matsui
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan
| | - Wataru Iwasaki
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8568, Japan
- Atmosphere and Ocean Research Institute, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8564, Japan
| |
|
44
|
Lambert MÈ, Arsenault J, Delisle B, Audet P, Poljak Z, D'Allaire S. Impact of alignment algorithm on the estimation of pairwise genetic similarity of porcine reproductive and respiratory syndrome virus (PRRSV). BMC Vet Res 2019; 15:135. [PMID: 31068211 PMCID: PMC6505299 DOI: 10.1186/s12917-019-1890-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/09/2018] [Accepted: 04/29/2019] [Indexed: 12/19/2022] Open
Abstract
Background Porcine reproductive and respiratory syndrome (PRRS) is a major threat to the swine industry. It is caused by the PRRS virus (PRRSV). Determination and comparison of the nucleotide sequences of PRRSV strains provides useful information in support of control initiatives or epidemiological studies on transmission patterns. The alignment of sequences is the first step in analyzing sequence data, with multiple algorithms being available, but little is known about the impact of this methodological choice. Here, a study was conducted to evaluate the impact of different alignment algorithms on the resulting aligned sequence dataset and on practical issues when applied to a large field database of PRRSV open reading frame (ORF) 5 sequences collected in Quebec, Canada, from 2010 to 2014. Five multiple sequence alignment programs were compared: Clustal W, Clustal Omega, Muscle, T-Coffee and MAFFT. Results The resulting alignments showed very similar results in terms of average pairwise genetic similarity, proportion of pairwise comparisons having ≥97.5% genetic similarity and sum of pairs (SP) score, except for T-Coffee, where increased length of aligned datasets as well as limitations in handling large datasets were observed. Conclusions Based on efficiency at minimizing the number of gaps in different dataset sizes with default open gap values as well as the capability to handle a large number of sequences in a timely manner, the use of Clustal Omega might be recommended for the management of extensive PRRSV databases for both research and surveillance purposes.
Affiliation(s)
- Marie-Ève Lambert
- Laboratoire d'épidémiologie et de médecine porcine (LEMP), Faculty of Veterinary Medicine, Université de Montréal, St. Hyacinthe, Quebec, Canada; Swine and Poultry Infectious Diseases Research Center (CRIPA), Faculty of Veterinary Medicine, Université de Montréal, St. Hyacinthe, Quebec, Canada
| | - Julie Arsenault
- Laboratoire d'épidémiologie et de médecine porcine (LEMP), Faculty of Veterinary Medicine, Université de Montréal, St. Hyacinthe, Quebec, Canada; Swine and Poultry Infectious Diseases Research Center (CRIPA), Faculty of Veterinary Medicine, Université de Montréal, St. Hyacinthe, Quebec, Canada
| | - Benjamin Delisle
- Laboratoire d'épidémiologie et de médecine porcine (LEMP), Faculty of Veterinary Medicine, Université de Montréal, St. Hyacinthe, Quebec, Canada; Swine and Poultry Infectious Diseases Research Center (CRIPA), Faculty of Veterinary Medicine, Université de Montréal, St. Hyacinthe, Quebec, Canada
| | - Pascal Audet
- Laboratoire d'épidémiologie et de médecine porcine (LEMP), Faculty of Veterinary Medicine, Université de Montréal, St. Hyacinthe, Quebec, Canada; Swine and Poultry Infectious Diseases Research Center (CRIPA), Faculty of Veterinary Medicine, Université de Montréal, St. Hyacinthe, Quebec, Canada
| | - Zvonimir Poljak
- Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, Ontario, Canada
| | - Sylvie D'Allaire
- Laboratoire d'épidémiologie et de médecine porcine (LEMP), Faculty of Veterinary Medicine, Université de Montréal, St. Hyacinthe, Quebec, Canada; Swine and Poultry Infectious Diseases Research Center (CRIPA), Faculty of Veterinary Medicine, Université de Montréal, St. Hyacinthe, Quebec, Canada
| |
|
45
|
Plastid phylogenomic insights into the evolution of Caryophyllales. Mol Phylogenet Evol 2019; 134:74-86. [DOI: 10.1016/j.ympev.2018.12.023] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 04/08/2018] [Revised: 12/17/2018] [Accepted: 12/19/2018] [Indexed: 11/22/2022]
|
46
|
Nute M, Saleh E, Warnow T. Evaluating Statistical Multiple Sequence Alignment in Comparison to Other Alignment Methods on Protein Data Sets. Syst Biol 2019; 68:396-411. [PMID: 30329135 PMCID: PMC6472439 DOI: 10.1093/sysbio/syy068] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 04/18/2018] [Revised: 09/27/2018] [Accepted: 10/11/2018] [Indexed: 01/15/2023] Open
Abstract
The estimation of multiple sequence alignments of protein sequences is a basic step in many bioinformatics pipelines, including protein structure prediction, protein family identification, and phylogeny estimation. Statistical coestimation of alignments and trees under stochastic models of sequence evolution has long been considered the most rigorous technique for estimating alignments and trees, but little is known about the accuracy of such methods on biological benchmarks. We report the results of an extensive study evaluating the most popular protein alignment methods as well as the statistical coestimation method BAli-Phy on 1192 protein data sets from established benchmarks as well as on 120 simulated data sets. Our study (which used more than 230 CPU years for the BAli-Phy analyses alone) shows that BAli-Phy has better precision and recall (with respect to the true alignments) than the other alignment methods on the simulated data sets but has consistently lower recall on the biological benchmarks (with respect to the reference alignments) than many of the other methods. In other words, we find that BAli-Phy systematically underaligns when operating on biological sequence data but shows no sign of this on simulated data. There are several potential causes for this change in performance, including model misspecification, errors in the reference alignments, and conflicts between structural alignment and evolutionary alignments, and future research is needed to determine the most likely explanation. We conclude with a discussion of the potential ramifications for each of these possibilities. [BAli-Phy; homology; multiple sequence alignment; protein sequences; structural alignment.]
Affiliation(s)
- Michael Nute
- Department of Statistics, University of Illinois at Urbana-Champaign, 725 S Wright St #101, Champaign, IL 61820, USA
| | - Ehsan Saleh
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Ave, Urbana, IL 61801, USA
| | - Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Ave, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, 1205 W. Clark St., Urbana, IL 61801, USA; National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| |
|
47
|
He J, Zhao H, Cheng Z, Ke Y, Liu J, Ma H. Evolution Analysis of the Fasciclin-Like Arabinogalactan Proteins in Plants Shows Variable Fasciclin-AGP Domain Constitutions. Int J Mol Sci 2019; 20:E1945. [PMID: 31010036 PMCID: PMC6514703 DOI: 10.3390/ijms20081945] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 02/22/2019] [Revised: 04/17/2019] [Accepted: 04/19/2019] [Indexed: 01/03/2023] Open
Abstract
The fasciclin-like arabinogalactan proteins (FLAs) play important roles in plant development and adaptation to the environment. FLAs contain both fasciclin domains and arabinogalactan protein (AGP) regions, which have been identified in several plants. The evolutionary history of this gene family in plants is still undiscovered. In this study, we identified the FLA gene family in 13 plant species covering major lineages of plants using bioinformatics methods. A total of 246 FLA genes are identified with gene copy numbers ranging from one (Chondrus crispus) to 49 (Populus trichocarpa). These FLAs are classified into seven groups, mainly based on the phylogenetic analysis of plant FLAs. All FLAs in land plants contain one or two fasciclin domains, while in algae, several FLAs contain four or six fasciclin domains. It has been proposed that there was a divergence event, represented by the reduced number of fasciclin domains from algae to land plants in evolutionary history. Furthermore, introns in FLA genes are lost during plant evolution, especially from green algae to land plants. Moreover, it is found that gene duplication events, including segmental and tandem duplications are essential for the expansion of FLA gene families. The duplicated gene pairs in FLA gene family mainly evolve under purifying selection. Our findings give insight into the origin and expansion of the FLA gene family and help us understand their functions during the process of evolution.
Affiliation(s)
- Jiadai He
- College of Agronomy, Northwest A&F University, Xianyang 712100, Shaanxi, China.
| | - Hua Zhao
- College of Agronomy, Northwest A&F University, Xianyang 712100, Shaanxi, China.
| | - Zhilu Cheng
- College of Landscape Architecture and Arts, Northwest A&F University, Xianyang 712100, Shaanxi, China.
| | - Yuwei Ke
- College of Life Sciences, Northwest A&F University, Xianyang 712100, Shaanxi, China.
| | - Jiaxi Liu
- College of Agronomy, Northwest A&F University, Xianyang 712100, Shaanxi, China.
| | - Haoli Ma
- College of Agronomy, Northwest A&F University, Xianyang 712100, Shaanxi, China.
| |
|
48
|
Six Impossible Things before Breakfast: Assumptions, Models, and Belief in Molecular Dating. Trends Ecol Evol 2019; 34:474-486. [PMID: 30904189 DOI: 10.1016/j.tree.2019.01.017] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 12/18/2018] [Revised: 01/29/2019] [Accepted: 01/31/2019] [Indexed: 01/16/2023]
Abstract
Confidence in molecular dating analyses has grown with the increasing sophistication of the methods. Some problematic cases where molecular dates disagreed with paleontological estimates appear to have been resolved with a growing agreement between molecules and fossils. But we cannot relax just yet. The growing analytical sophistication of many molecular dating methods relies on an increasingly large number of assumptions about evolutionary history and processes. Many of these assumptions are based on statistical tractability rather than being informed by improved understanding of molecular evolution, yet changing the assumptions can influence molecular dates. How can we tell if the answers we get are driven more by the assumptions we make than by the molecular data being analyzed?
|
49
|
High-Throughput Reconstruction of Ancestral Protein Sequence, Structure, and Molecular Function. Methods Mol Biol 2019; 1851:135-170. [PMID: 30298396 DOI: 10.1007/978-1-4939-8736-8_8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/12/2022]
Abstract
Ancestral protein sequence reconstruction is a powerful technique for explicitly testing hypotheses about the evolution of molecular function, allowing researchers to meticulously dissect how historical changes in protein sequence impacted functional repertoire by altering the protein's 3D structure. These techniques have provided concrete, experimentally validated insights into ancient evolutionary processes and help illuminate the complex relationship between protein sequence, structure, and function. Inferring the protein family phylogenies on which ancestral sequence reconstruction depends and reconstructing the sequences themselves are amenable to high-throughput computational analysis. However, determining the structures of ancestral-reconstructed proteins and characterizing their functions typically rely on time-consuming and expensive laboratory analyses, limiting most current studies to examining a relatively small number of specific hypotheses. For this reason, we have little detailed, unbiased information about how molecular function evolves across large protein family phylogenies. Here we describe a generalized protocol that integrates ancestral sequence reconstruction with structural homology modeling and structure-based molecular affinity prediction to characterize historical changes in protein function across families with thousands of individual sequences. We highlight key steps in the analysis protocol requiring particularly careful attention to avoid introducing potential errors as well as steps for which computationally efficient subroutines can be substituted for more intensive approaches, allowing researchers to scale the analysis up or down, depending on available resources and requirements for reproducibility and scientific rigor. In our view, this approach provides a compelling complement to more laboratory-intensive procedures, generating important contextual information that can help guide detailed experiments.
|
50
|
Herman JL. Enhancing Statistical Multiple Sequence Alignment and Tree Inference Using Structural Information. Methods Mol Biol 2019; 1851:183-214. [PMID: 30298398 DOI: 10.1007/978-1-4939-8736-8_10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/08/2023]
Abstract
For highly divergent sequences, there is often insufficient information to reliably construct alignments and phylogenetic trees. Since protein structure may be strongly conserved despite large divergences in sequence, structural information can be used to help identify homology in such cases. While there exist well-studied models of sequence evolution, structurally informed alignment methods have typically made use of geometric measures of deviation that do not take into account the underlying mutational processes. In order to integrate structural information into sequence-based evolutionary models, we recently developed a stochastic model of structural evolution on a phylogenetic tree and implemented this as the StructAlign plugin for the StatAlign statistical alignment package. In this chapter, we will outline the types of analyses that can be carried out using StructAlign, illustrating how the inclusion of structural information can be used to inform joint estimation of alignments and trees. StructAlign can also be used to infer branch-specific rates of structural evolution, and analysis of an example globin dataset highlights strong variation in the inferred rate across the tree. While structure is more highly conserved within clades, the rate of structural divergence as a function of sequence variation is larger between functionally divergent proteins. Allowing for the rate of structural divergence to vary over the tree results in an improved fit to the empirically observed pairwise RMSD values.
Affiliation(s)
- Joseph L Herman
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
| |
|