Reference Citation Analysis

Total Articles: 87 (from Reference Citation Analysis)
Article PDFs: 52
Cited by > 0: 60
Searched Name: phylogenetic trees

Number	Citation Analysis
1	The k-Robinson-Foulds Dissimilarity Measures for Comparison of Labeled Trees. J Comput Biol 2024;31:328-344. [PMID: 38271573 PMCID: PMC11057537 DOI: 10.1089/cmb.2023.0312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2024]
Abstract: Understanding the mutational history of tumor cells is a critical endeavor in unraveling the mechanisms that drive the onset and progression of cancer. Modeling tumor cell evolution with labeled trees motivates researchers to develop different measures to compare labeled trees. Although the Robinson-Foulds (RF) distance is widely used for comparing species trees, its applicability to labeled trees reveals certain limitations. This study introduces the k-RF dissimilarity measures, tailored to address the challenges of labeled tree comparison. The RF distance is succinctly expressed as n-RF in the space of labeled trees with n nodes. Like the RF distance, the k-RF is a pseudometric for multiset-labeled trees and becomes a metric in the space of 1-labeled trees. By setting k to a small value, the k-RF dissimilarity can capture analogous local regions in two labeled trees with different size or different labels.
Key Words: k-Robinson–Foulds dissimilarity; labeled trees; phylogenetic trees
MeSH Headings: Humans; Algorithms; Neoplasms/genetics; Mutation; Computational Biology/methods; Phylogeny
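For readers unfamiliar with the Robinson-Foulds distance mentioned in entry 1, the following is a minimal sketch of the classical RF distance on rooted, leaf-labeled trees. It is not the k-RF measure of the cited paper, which applies to trees with labeled internal nodes. The nested-tuple tree encoding and the function names `clusters` and `rf_distance` are assumptions made for illustration.

```python
def clusters(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of leaf names.

    A tree is a nested tuple; a leaf is a plain string (illustrative encoding).
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):              # a leaf contributes only itself
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)                       # record the clade under this node
        return below

    all_leaves = leaves(tree)
    found.discard(all_leaves)                  # the root clade is common to all trees
    return found


def rf_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    return len(clusters(tree_a) ^ clusters(tree_b))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(t1, t2))   # the trees disagree on {A,B} versus {A,C}
```

Both trees must be built over the same leaf set for the comparison to be meaningful; some references halve this count, so conventions should be checked before comparing values across tools.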
2	TreeViewer: Flexible, modular software to visualise and manipulate phylogenetic trees. Ecol Evol 2024;14:e10873. [PMID: 38314311 PMCID: PMC10834882 DOI: 10.1002/ece3.10873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Accepted: 12/04/2023] [Indexed: 02/06/2024]
Abstract: Phylogenetic trees illustrate evolutionary relationships between taxa or genes. Tree figures are crucial when presenting results and data, and by creating clear and effective plots, researchers can describe many kinds of evolutionary patterns. However, producing tree plots can be a time-consuming task, especially as multiple different programs are often needed to adjust and illustrate all data associated with a tree. We present TreeViewer, a new software to draw phylogenetic trees. TreeViewer is flexible, modular, and user-friendly. Plots are produced as the result of a user-defined pipeline, which can be finely customised and easily applied to different trees. Every feature of the program is documented and easily accessible, either in the online manual or within the program's interface. We show how TreeViewer can be used to produce publication-ready figures, saving time by not requiring additional graphical post-processing tools. TreeViewer is freely available for Windows, macOS, and Linux operating systems and distributed under an AGPLv3 licence from https://treeviewer.org. It has a graphical user interface (GUI), as well as a command-line interface, which is useful to work with very large trees and for automated pipelines. A detailed user manual with examples and tutorials is also available. TreeViewer is mainly aimed at users wishing to produce highly customised, publication-quality tree figures using a single GUI software tool. Compared to other GUI tools, TreeViewer offers a richer feature set and a finer degree of customisation. Compared to command-line-based tools and software libraries, TreeViewer's graphical interface is more accessible. The flexibility of TreeViewer's approach to phylogenetic tree plotting enables the program to produce a wide variety of publication-ready figures. Users are encouraged to create their own custom modules to expand the functionalities of the program. This sets the scene for an ever-expanding and ever-adapting software framework that can easily adjust to respond to new challenges.
Key Words: NEXUS; Newick; figures; graphical interface; phylogenetic trees; phylogenetics
Grants: University of Bristol; Royal Society
3	Molecular Characterization and Pathogenicity of an Infectious cDNA Clone of Youcai Mosaic Virus on Solanum nigrum. Int J Mol Sci 2024;25:1620. [PMID: 38338897 PMCID: PMC10855738 DOI: 10.3390/ijms25031620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 01/02/2024] [Accepted: 01/26/2024] [Indexed: 02/12/2024]
Abstract: Virus infections cause devastating economic losses for various plant species, and early diagnosis and prevention are the most effective strategies to avoid the losses. Exploring virus genomic evolution and constructing virus infectious cDNA clones is essential to achieve a deeper understanding of the interaction between host plant and virus. Therefore, this work aims to guide people to better prevent, control, and utilize the youcai mosaic virus (YoMV). Here, YoMV was found to infect Solanum nigrum under natural conditions. Then, an infectious cDNA clone of YoMV was successfully constructed using triple-shuttling vector-based yeast recombination. Furthermore, we established phylogenetic trees based on the complete genomic sequences, the replicase gene, movement protein gene, and coat protein gene using the corresponding deposited sequences in NCBI. Simultaneously, the evolutionary relationship of the YoMV discovered on S. nigrum to others was determined and analyzed. Moreover, the constructed cDNA infectious clone of YoMV from S. nigrum could systematically infect Nicotiana benthamiana and S. nigrum by agrobacterium-mediated infiltration. Our investigation supplied a reverse genetic tool for YoMV study, which will also contribute to in-depth study and profound understanding of the interaction between YoMV and host plant.
Key Words: Solanum nigrum L.; infectious cDNA clone; phylogenetic trees; youcai mosaic virus
MeSH Headings: Humans; Virulence; Solanum nigrum/genetics; DNA, Complementary/genetics; Phylogeny; Tobamovirus/genetics; Plant Diseases
4	Robustness of Felsenstein's Versus Transfer Bootstrap Supports With Respect to Taxon Sampling. Syst Biol 2023;72:1280-1295. [PMID: 37756489 PMCID: PMC10939309 DOI: 10.1093/sysbio/syad052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 06/26/2023] [Accepted: 08/09/2023] [Indexed: 09/29/2023]
Abstract: The bootstrap method is based on resampling sequence alignments and re-estimating trees. Felsenstein's bootstrap proportions (FBP) are the most common approach to assess the reliability and robustness of sequence-based phylogenies. However, when increasing taxon sampling (i.e., the number of sequences) to hundreds or thousands of taxa, FBP tend to return low support for deep branches. The transfer bootstrap expectation (TBE) has been recently suggested as an alternative to FBP. TBE is measured using a continuous transfer index in [0,1] for each bootstrap tree, instead of the binary {0,1} index used in FBP to measure the presence/absence of the branch of interest. TBE has been shown to yield higher and more informative supports while inducing a very low number of falsely supported branches. Nonetheless, it has been argued that TBE must be used with care due to sampling issues, especially in datasets with a high number of closely related taxa. In this study, we conduct multiple experiments by varying taxon sampling and comparing FBP and TBE support values on different phylogenetic depths, using empirical datasets. Our results show that the main critique of TBE stands in extreme cases with shallow branches and highly unbalanced sampling among clades, but that TBE is still robust in most cases, while FBP is inescapably negatively impacted by high taxon sampling. We suggest guidelines and good practices in TBE (and FBP) computing and interpretation.
Key Words: Felsenstein’s bootstrap; phylogenetic trees; support robustness; taxon sampling; transfer bootstrap
MeSH Headings: Phylogeny; Reproducibility of Results
Grants: ANR-19-P3IA-001, Paris Artificial Intelligence Research Institute
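The contrast in entry 4 between the binary index of FBP and the continuous transfer index of TBE can be sketched in a few lines. The sketch assumes each tree is already reduced to a list of bipartitions (each a frozenset holding one side of the split), and it uses the usual normalisation by p − 1, where p is the size of the smaller side of the reference branch. The function names and the toy data are hypothetical; this is not the authors' implementation.

```python
def transfer_distance(b, other, taxa):
    """Fewest taxa that must change sides for bipartition `other` to match `b`."""
    mismatches = len(b ^ other)                      # taxa on different sides
    return min(mismatches, len(taxa) - mismatches)   # the two sides are unlabeled


def fbp(b, boot_trees, taxa):
    """Fraction of bootstrap trees that contain branch `b` exactly (binary index)."""
    complement = taxa - b
    hits = sum(1 for tree in boot_trees if b in tree or complement in tree)
    return hits / len(boot_trees)


def tbe(b, boot_trees, taxa):
    """One minus the mean normalised transfer distance (continuous index)."""
    p = min(len(b), len(taxa) - len(b))              # size of the smaller side, p >= 2
    total = 0
    for tree in boot_trees:
        # A single-taxon branch is always at distance p - 1, so cap there.
        total += min([transfer_distance(b, o, taxa) for o in tree] + [p - 1])
    return 1 - total / (len(boot_trees) * (p - 1))


taxa = frozenset("ABCDEF")
branch = frozenset("ABC")
boot_trees = [
    [frozenset("ABC"), frozenset("AB"), frozenset("DE")],   # branch recovered exactly
    [frozenset("AB"), frozenset("ABD"), frozenset("EF")],   # branch off by one taxon
]
print(fbp(branch, boot_trees, taxa), tbe(branch, boot_trees, taxa))
```

In the toy data the second bootstrap tree misses the branch by a single taxon, so it counts as a complete miss for FBP but only a partial miss for TBE.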
5	Predicting Impacts of Contact Tracing on Epidemiological Inference from Phylogenetic Data. bioRxiv: The Preprint Server for Biology 2023:2023.11.30.567148. [PMID: 38076930 PMCID: PMC10705478 DOI: 10.1101/2023.11.30.567148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract: Robust sampling methods are foundational to many inference problems in the phylodynamic field, yet the impact of using contact tracing, a type of non-uniform sampling used in public health applications, is not well understood. To investigate and quantify how this non-uniform sampling method influences recovered phylogenetic tree structure, we developed a new simulation tool called SEEPS (Sequence Evolution and Epidemiological Process Simulator) that allows for the simulation of contact tracing and the resulting transmission tree, pathogen phylogeny, and corresponding virus genetic sequences. Importantly, SEEPS takes within-host evolution into account when generating pathogen phylogenies and sequences from transmission histories. Using SEEPS, we demonstrate that contact tracing can significantly impact the structure of the resulting tree as described by popular tree statistics. Contact tracing generates phylogenies that are less balanced than the underlying transmission process, less representative of the larger epidemiological process, and affects the internal/external branch length ratios that characterize specific epidemiological scenarios. We also examine a 2007-2008 Swedish HIV-1 outbreak and the broader 1998-2010 European HIV-1 epidemic to highlight the differences in contact tracing and expected phylogenies. Aided by SEEPS, we show that the Swedish outbreak was strongly influenced by contact tracing even after downsampling, while the broader European Union epidemic showed little evidence of universal contact tracing, agreeing with the known epidemiological information about sampling and spread. SEEPS is available at github.com/MolEvolEpid/SEEPS.
Key Words: Contact tracing; HIV-1; phylodynamics; phylogenetic inference; phylogenetic trees
Grants: R01 AI087520, NIAID NIH HHS
6	An Unsupervised Classifier for Whole-Genome Phylogenies, the Maxwell© Tool. Int J Mol Sci 2023;24:16278. [PMID: 38003468 PMCID: PMC10671764 DOI: 10.3390/ijms242216278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 10/20/2023] [Accepted: 11/02/2023] [Indexed: 11/26/2023]
Abstract: The development of phylogenetic trees based on RNA or DNA sequences generally requires a precise and limited choice of important RNAs, e.g., messenger RNAs of essential proteins or ribosomal RNAs (like 16S), but rarely complete genomes, making it possible to explain evolution and speciation. In this article, we propose revisiting a classic phylogeny of archaea from only the information on the succession of nucleotides of their entire genome. For this purpose, we use a new tool, the unsupervised classifier Maxwell, whose principle lies in the Burrows-Wheeler compression transform, and we show its efficiency in clustering whole archaeal genomes.
Key Words: Burrows–Wheeler compression transform; Vitányi distance; maxwell classifier; normalized compression distance (NCD); phylogenetic trees; unsupervised classifier
MeSH Headings: Phylogeny; Genome; Archaea/genetics; RNA, Ribosomal; Base Sequence
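Entry 6 lists the normalized compression distance (NCD) among its key words. A minimal sketch of the standard NCD formula, NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), is given below. It uses Python's `bz2` module as the compressor C because bzip2 is built on the Burrows-Wheeler transform; this is an illustrative stand-in and not the Maxwell tool itself. The random sequences, the mutation rate, and all function names are assumptions.

```python
import bz2
import random


def compressed_size(data: bytes) -> int:
    """Length in bytes of the bzip2-compressed input."""
    return len(bz2.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


rng = random.Random(0)
seq_a = bytes(rng.choice(b"ACGT") for _ in range(5000))   # toy "genome"
seq_b = bytes(rng.choice(b"ACGT") for _ in range(5000))   # unrelated sequence

# A close relative of seq_a: about 1% of positions are redrawn at random.
seq_c = bytes(rng.choice(b"ACGT") if rng.random() < 0.01 else base for base in seq_a)

print("a vs relative :", round(ncd(seq_a, seq_c), 3))
print("a vs unrelated:", round(ncd(seq_a, seq_b), 3))
```

A matrix of pairwise NCD values computed this way could be passed to any distance-based clustering or tree-building method; real compressors add fixed overhead, so values for very short inputs are unreliable.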
7	Contribution to the Knowledge of Gastrointestinal Nematodes in Roe Deer (Capreolus capreolus) from the Province of León, Spain: An Epidemiological and Molecular Study. Animals (Basel) 2023;13:3117. [PMID: 37835723 PMCID: PMC10571729 DOI: 10.3390/ani13193117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 09/14/2023] [Accepted: 09/20/2023] [Indexed: 10/15/2023]
Abstract: A study of gastrointestinal nematodes in roe deer was carried out in the regional hunting reserves of Riaño and Mampodre, Province of León, Spain, to provide information on their prevalence and intensity of infection in relation to the sampling areas, age of the animals, and body weight. Regulated necropsy showed that all of the animals harbored gastrointestinal nematodes in their digestive tract, with a mean intensity of parasitism of 638 ± 646.1 nematodes/infected animal. Eleven genera were found and 18 species of gastrointestinal nematodes were identified, three of them polymorphic: Trichostrongylus axei, Trichostrongylus vitrinus, Trichostrongylus capricola, Trichostrongylus colubriformis, Haemonchus contortus, Spiculopteragia spiculoptera/Spiculopteragia mathevossiani, Ostertagia leptospicularis/Ostertagia kolchida, Ostertagia (Grosspiculopteragia) occidentalis, Teladorsagia circumcincta/Teladorsagia trifurcata, Marshallagia marshalli, Nematodirus europaeus, Cooperia oncophora, Capillaria bovis, Oesophagostomum venulosum, and Trichuris ovis. All of them have already been cited in roe deer in Europe, but Marshallagia marshalli, Capillaria bovis, and Ostertagia (Grosspiculopteragia) occidentalis are reported for the first time in Spain in this host. The abomasum was the section of the digestive tract where the prevalence (98.9%) and mean intensity (x̄ = 370.7 ± 374.4 worms/roe deer; range 3-1762) were significantly higher, but no statistically significant differences were found when comparing the sampling areas and age of animals. The animals with lower body weight had a higher parasite load than those in better physical condition, and in this case the difference was statistically significant (p = 0.0020). Seven genera and 14 species were identified. In the small intestine, 88% of the animals examined presented gastrointestinal nematodes, with an average intensity of x̄ = 131.7 ± 225.6 parasites/infected animal, ranging between 4 and 1254 worms. No statistically significant differences were found when the three parameters studied were compared. Four genera and seven species were identified. In the large intestine/cecum, 78.3% of the examined roe deer presented adult worms, with an average intensity of 6.3 ± 5.5 worms/infected animal; range 1-26 worms. Statistically significant differences were observed only between the mean intensity of parasitism and the sampling area (p = 0.0093). Two genera and two species were identified. Several of the species found in the study were studied molecularly; the sequences obtained were compared with those deposited in GenBank, and phylogenetic trees were prepared to determine their taxonomic status. Using coprological techniques, the correlation between the shedding of gastrointestinal nematode eggs in roe deer and that in semi-extensive sheep farms in the same study area was investigated to verify the existence of cross-transmission of these parasites between wild and domestic animals. The high values found in the studied parameters show that northern Spain is an area of high-intensity infection for roe deer.
Key Words: gastrointestinal nematodes; intensity; molecular studies; phylogenetic trees; prevalence; roe deer; species
Grants: Research Project 2004-02580-C03-03/BOS, Spanish "Ministerio de Ciencia y Tecnología"; Grant FIS PI16/00002, Instituto de Salud Carlos III, cofunded by European Union ERDF/ESF, "Investing in your future"
8	The complete chloroplast genome of Aegle marmelos and its phylogenetic analysis. Mitochondrial DNA B Resour 2023;8:787-790. [PMID: 37521904 PMCID: PMC10375917 DOI: 10.1080/23802359.2023.2238934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2022] [Accepted: 07/15/2023] [Indexed: 08/01/2023]
Abstract: Aegle marmelos (L.) Correa 1800, a plant belonging to the Rutaceae family, is extensively used in Tibetan medicine. We employed Illumina HiSeq reads to assemble the complete chloroplast (cp) genome of A. marmelos, which spans 144,538 bp. The genome comprises 114 genes, including 75 protein-coding genes, 31 tRNA genes, and 8 rRNA genes. It is characterized by four regions: the large single-copy (LSC) region (74,253 bp), the inverted repeat A (IRa) region (26,015 bp), the small single-copy (SSC) region (18,255 bp), and the inverted repeat B (IRb) region (26,015 bp). Phylogenomic analysis demonstrated a close relationship between A. marmelos and Citrus. The assembly of the cp genome in this study serves as a foundation for conservation efforts and phylogenetic investigations of A. marmelos, paving the way for future experimentation.
Key Words: Aegle marmelos; Rutaceae; chloroplast genome; phylogenetic trees
9	MAMMLE: A Framework for Phylogeny Estimation Based on Multiobjective Application-aware Multiple Sequence Alignment and Maximum Likelihood Ensemble. J Comput Biol 2023;30:245-249. [PMID: 36706434 DOI: 10.1089/cmb.2021.0533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Abstract: Motivation: Phylogenetic trees are often inferred from a multiple sequence alignment (MSA) where the tree accuracy is heavily impacted by the nature of estimated alignment. Carefully equipping an MSA tool with multiple application-aware objectives positively impacts its capability to yield better trees. Results: We introduce Multiobjective Application-aware Multiple Sequence Alignment and Maximum Likelihood Ensemble (MAMMLE), a framework for inferring better phylogenetic trees from unaligned sequences by hybridizing two MSA tools [i.e., Multiple Sequence Comparison by Log-Expectation (MUSCLE) and Multiple Alignment using Fast Fourier Transform (MAFFT)] with multiobjective optimization strategy and leveraging multiple maximum likelihood hypotheses. In our experiments, MAMMLE exhibits 5.57% (4.77%) median improvement (deterioration) over MUSCLE on 50.34% (37.41%) of instances.
Key Words: multiple alignment; phylogenetic trees; software framework
10	The whole-genome sequencing of prevalent DENV-1 strains during the largest dengue virus outbreak in Xishuangbanna Dai autonomous prefecture in 2019. J Med Virol 2023;95:e28115. [PMID: 36059257 DOI: 10.1002/jmv.28115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Revised: 08/25/2022] [Accepted: 08/30/2022] [Indexed: 01/11/2023]
Abstract: In 2019, a serious dengue virus (DENV) infection broke out in the Xishuangbanna Dai Autonomous Prefecture, China. Therefore, we conducted a molecular epidemiological analysis in people that contracted DENV serotype 1 (DENV-1) during this year. We analyzed the molecular epidemiology of six DENV-1 epidemic strains in 2019 by full-length genome sequencing, amino acid mutation site analysis, evolutionary tree analysis, and recombination site comparison analysis. Through the analysis of amino acid mutation sites, it was found that DENV-1 strain (MW386867) was different from the other five epidemic DENV-1 strains in Xishuangbanna in 2019. MW386867 had unique mutation sites at six loci. The six epidemic DENV-1 strains in Xishuangbanna in 2019 were divided into two clusters. MW386867 was highly similar to MG679800 (Myanmar 2017), MG679801 (Myanmar 2017), and KC172834 (Laos 2008), and the other five strains were highly similar to JQ045660 (Vietnam 2011) and FJ176780 (Guangdong 2006). Genetic recombination analysis revealed that there was no recombination signal in the six epidemic DENV-1 strains in Xishuangbanna in 2019. We speculate that the DENV-1 epidemic in 2019 has a co-epidemic of local strains and cross-border strains.
Key Words: amino acid mutation sites; dengue virus; genetic recombination; phylogenetic trees; whole-genome sequencing
11	Sequencing and analysis of the complete mitochondrial genome of Eothenomys eleusis Thomas 1911 from China and its phylogenetic analysis. Mitochondrial DNA B Resour 2023;8:493-496. [PMID: 37057130 PMCID: PMC10088922 DOI: 10.1080/23802359.2023.2197087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/15/2023]
Abstract: The complete mitogenome sequence of Eothenomys eleusis Thomas 1911 was determined using PCR. A circular double-stranded structure makes up the mitochondrial genome of E. eleusis. The complete length of the mitochondrial genome is 16,419 bp. The mitochondrial genome of E. eleusis included 13 protein-coding genes, 1 control region, 22 tRNA genes, 2 rRNA genes and 1 origin of L strand replication. The total base composition of E. eleusis mitochondrial genome was A (32.6%), T (26.3%), G (13.6%) and C (27.5%). We found significant A-T skew in base composition, especially in control regions and protein-coding genes. E. eleusis was supported by bootstrap values of 100%. This study verifies the evolutionary status of E. eleusis in Myodini tribe of Cricetidae at the molecular level. The mitochondrial genome would be a significant supplement for the E. eleusis genetic background.
Key Words: Eothenomys eleusis; Mitogenome; phylogenetic trees
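The A-T skew reported in entry 11 can be reproduced from the stated base composition. The abstract does not give a formula, so the sketch below assumes the commonly used definition, AT skew = (A − T) / (A + T), with the analogous GC skew = (G − C) / (G + C). The helper names and the toy sequence are hypothetical; only the four percentages come from the abstract.

```python
from collections import Counter


def skew(x, y):
    """Strand skew between two base frequencies: (x - y) / (x + y)."""
    return (x - y) / (x + y)


def composition(sequence):
    """Percentage of A, C, G and T in a nucleotide string (other symbols ignored)."""
    counts = Counter(sequence.upper())
    total = sum(counts[base] for base in "ACGT")
    return {base: 100 * counts[base] / total for base in "ACGT"}


# Base composition reported for the E. eleusis mitogenome (percent).
reported = {"A": 32.6, "T": 26.3, "G": 13.6, "C": 27.5}
at_skew = skew(reported["A"], reported["T"])   # positive: more A than T
gc_skew = skew(reported["G"], reported["C"])   # negative: more C than G
print(round(at_skew, 3), round(gc_skew, 3))

# The same calculation starting from a raw sequence (toy example).
toy = composition("ATGCATTAAC")
print(round(skew(toy["A"], toy["T"]), 3))
```

With these figures the AT skew is positive and the GC skew is negative; the same two functions apply to the compositions reported in the other mitogenome entries of this listing.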
12	The scaling of diversification rates with age is likely explained by sampling bias. Evolution 2022;76:1625-1637. [PMID: 35567800 DOI: 10.1111/evo.14515] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/19/2021] [Accepted: 04/07/2022] [Indexed: 01/22/2023]
Abstract: Numerous phylogenetic studies reported the existence of a pervasive scaling relationship between the ages of extant eukaryotic clades and their estimated diversification rates. The causes of this age-rate-scaling (ARS), whether biological and/or artifactual, remain unresolved. Here we fit diversification models to thousands of eukaryotic time-calibrated phylogenies to explore multiple potential causes of the ARS including parameter non-identifiability, model inadequacy, biases in taxonomic practice, and an important and ubiquitous form of sampling bias: preferentially analyzing larger extant clades. We distinguish between two mechanisms by which such sampling biases can cause an ARS: first, by favoring clades that happen to be unusually large merely by chance (i.e., due to the stochastic nature of the cladogenic process), thus leading to rate overestimation, and second, by favoring clades that have truly higher diversification rates. We find that, of the proposed explanations, only sampling biases are likely to contribute to the observed ARS. We develop methods for fully correcting for sampling bias mechanism 1, and find that despite these corrections a substantial ARS remains. We then confirm using simulations that preferring trees with truly higher rates (mechanism 2) likely explains this residual ARS. Since we do not have a completely unbiased sample of clades, including extinct ones, for phylogenetic analyses, it is difficult to demonstrate unambiguously that sampling biases are the sole cause of the ARS. Sampling biases are, however, a parsimonious and plausible explanation for this widely observed macroevolutionary pattern, and this has implications for how we interpret the distribution of diversification rate estimates in extant clades.
Key Words: Diversification; identifiability; macroevolution; phylogenetic trees
13	Distribution of hepatitis B virus genotypes and subgenotypes: A meta-analysis. Medicine (Baltimore) 2021;100:e27941. [PMID: 34918643 PMCID: PMC8678021 DOI: 10.1097/md.0000000000027941] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 11/04/2019] [Accepted: 11/04/2021] [Indexed: 02/07/2023]
Abstract: Hepatitis B virus (HBV) genotypes and subgenotypes have distinct geographical distributions and influence a number of clinical disease features and responses to treatment. There are many reports on the distribution of HBV genotypes, but great differences are present between studies. What's more, a meta-analysis of HBV genotype- and subgenotype-distribution by country is lacking. A comprehensive literature search was performed in PubMed and a systematic search of full-length HBV sequences and S gene sequences was conducted in the GenBank database. HBV genotypes were checked and subgenotypes were determined by phylogenetic comparison of full-length HBV sequences or S gene sequences. STATA 12.0 was used for the analysis for countries with multiple datasets. BEAST 2.5.2 was used for Bayesian phylogenetic analysis to infer the evolutionary time scales of HBV. This study includes 309 datasets from 110 countries, including 188 relevant studies, 58 full-length gene datasets, and 63 S gene datasets. The meta-analysis was performed on 274 datasets from 75 countries. The distribution of genotypes is more detailed than those described by previous studies. While the overall genotype distribution is similar to that reported in previous studies, some notable aspects were different. The main genotypes present in south-eastern Africa, North Africa, and West Africa are genotypes A, D, and E, respectively. Genotypes G and H are mainly distributed in Mexico. Genotype F is mainly distributed in central and South America, but genotypes A and D are also common in Brazil, Cuba, and Haiti. This study provides a more accurate description of the distribution of HBV genotypes and subgenotypes in different countries and suggests that the differences in genotype distribution may be related to ethnicity and human migration.
Key Words: distribution; genotype; hepatitis b virus; meta-analysis; phylogenetic trees; subgenotype; timetree
MeSH Headings: Bayes Theorem; DNA, Viral; Genotype; Hepatitis B/epidemiology; Hepatitis B virus/genetics; Hepatitis B virus/isolation & purification; Humans; Phylogeny
14	Why extinction estimates from extant phylogenies are so often zero. Curr Biol 2021;31:3168-3173.e4. [PMID: 34019824 DOI: 10.1016/j.cub.2021.04.066] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 12/31/2020] [Revised: 03/11/2021] [Accepted: 04/26/2021] [Indexed: 12/18/2022]
Abstract: Time-calibrated phylogenies of extant species ("extant timetrees") are widely used to estimate historical speciation and extinction rates by fitting stochastic birth-death models.[1] These approaches have long been controversial, as many phylogenetic studies report zero extinction in many taxa, contradicting the high extinction rates seen in the fossil record and the fact that the majority of species ever to have existed are now extinct.[2-9] To date, the causes of this discrepancy remain unresolved. Here, we provide a novel and simple explanation for these "zero-inflated" extinction estimates, based on the recent discovery that there exist many alternative "congruent" diversification scenarios that cannot be distinguished based solely on extant timetrees.[10] Due to such congruencies, estimation methods tend to converge to some scenario congruent to (i.e., statistically indistinguishable from) the true diversification scenario, but not necessarily to the true diversification scenario itself. This congruent scenario may exhibit negative extinction rates, a biologically meaningless but mathematically feasible situation, in which case estimators will tend to stick to the boundary of zero extinction. Based on this explanation, we make multiple testable predictions, which we confirm using analyses of simulated trees and 121 empirical trees. In contrast to other proposed mechanisms for erroneous extinction rate estimates,[5,11-14] our proposed mechanism specifically explains the zero inflation of previous extinction rate estimates in the absence of detectable model violations, even for large trees. Not only do our results likely resolve a long-standing mystery in phylogenetics, they demonstrate that model congruencies can have severe consequences in practice.
Key Words: birth-death model; congruent; extinction; fossil record; identifiability; macroevolution; phylogenetic trees
15	Conserved C-terminal motifs in odorant receptors instruct their cell surface expression and cAMP signaling. FASEB J 2021;35:e21274. [PMID: 33464692 DOI: 10.1096/fj.202000182rr] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/24/2020] [Revised: 10/30/2020] [Accepted: 11/30/2020] [Indexed: 11/11/2022]
Abstract: The highly individual plasma membrane expression and cAMP signaling of odorant receptors have hampered their ligand assignment and functional characterization in test cell systems. Chaperones have been identified to support the cell surface expression of only a portion of odorant receptors, with mechanisms remaining unclear. The presence of amino acid motifs that might be responsible for odorant receptors' individual intracellular retention or cell surface expression, and thus, for cAMP signaling, is under debate: so far, no such protein motifs have been suggested. Here, we demonstrate the existence of highly conserved C-terminal amino acid motifs, which discriminate at least between class-I and class-II odorant receptors, with their numbers of motifs increasing during evolution, by comparing C-terminal protein sequences from 4808 receptors across eight species. Truncation experiments and mutation analysis of C-terminal motifs, largely overlapping with helix 8, revealed single amino acids and their combinations to have differential impact on the cell surface expression and on stimulus-dependent cAMP signaling of odorant receptors in NxG 108CC15 cells. Our results demonstrate class-specific and individual C-terminal motif equipment of odorant receptors, which instruct their functional expression in a test cell system, and in situ may regulate their individual cell surface expression and intracellular cAMP signaling.
Key Words: GPCR; helix 8; intracellular transport; luciferase assay; phylogenetic trees
16	Sequencing and analysis of the complete mitochondrial genome of the lesser bandicoot rat (Bandicota bengalensis) from China and its phylogenetic analysis. Mitochondrial DNA B Resour 2021;6:2063-2065. [PMID: 34212099 PMCID: PMC8218843 DOI: 10.1080/23802359.2021.1942273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2021] [Accepted: 06/07/2021] [Indexed: 11/13/2022] Open Abstract The complete mitogenome sequence of the lesser bandicoot rat (Bandicota bengalensis Gray and Hardwicke, 1833) was determined using long PCR. The genome was 16,327 bp in length and contained 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes, 1 origin of L strand replication and 1 control region. The overall base composition of the heavy strand is A (34.2%), C (24.9%), T (28.5%) and G (12.4%). The base composition shows a clear A-T skew, which is most obvious in the control region and protein-coding genes. Mitochondrial genome analyses based on MP, ML, NJ and Bayesian methods yielded identical phylogenetic trees. This study verifies the evolutionary status of Bandicota bengalensis in Muridae at the molecular level. The mitochondrial genome is a significant supplement to the genetic background of Bandicota bengalensis. The two Bandicota species formed a monophyletic group with a high bootstrap value (100%) in all analyses. Key Words: Bandicota bengalensis Control region mitogenome phylogenetic trees
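The abstract for entry 16 reports the base composition and an A-T skew but does not say how the skew was quantified. As a rough illustration only, the sketch below assumes the commonly used definitions AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C) and applies them to the heavy-strand percentages quoted in that abstract. The function name `skew` is hypothetical, and the authors may have used a different measure.

```python
def skew(x: float, y: float) -> float:
    """Return the compositional skew (x - y) / (x + y) for two base frequencies."""
    return (x - y) / (x + y)


# Heavy-strand base composition (%) as quoted in the abstract for entry 16.
composition = {"A": 34.2, "C": 24.9, "T": 28.5, "G": 12.4}

# Assumed (conventional) definitions; not stated in the abstract itself.
at_skew = skew(composition["A"], composition["T"])
gc_skew = skew(composition["G"], composition["C"])
at_content = composition["A"] + composition["T"]

print(f"A+T content: {at_content:.1f}%")
print(f"AT skew: {at_skew:+.3f}")
print(f"GC skew: {gc_skew:+.3f}")
```

Under these assumed definitions the quoted percentages give a positive AT skew (more A than T) and a negative GC skew (more C than G) on the heavy strand.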
17	Sequencing and analysis of the complete mitochondrial genome of Ochotona hyperborea from China and its phylogenetic analysis. Mitochondrial DNA B Resour 2021;6:1805-1807. [PMID: 34124354 PMCID: PMC8174475 DOI: 10.1080/23802359.2021.1934137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2020] [Accepted: 05/15/2021] [Indexed: 12/05/2022] Open Abstract The complete mitogenome sequence of Ochotona hyperborea was determined using long PCR. The genome was 17,063 bp in length and contained 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, one origin of L strand replication, and one control region. The overall base composition of the heavy strand is A (31.1%), C (28.7%), T (26.3%), and G (13.9%). The base composition shows a clear A-T skew, which is most obvious in the control region and protein-coding genes. Mitochondrial genome analyses based on MP, ML, NJ, and Bayesian methods yielded identical phylogenetic trees. This study verifies the evolutionary status of Ochotona hyperborea in Ochotonidae at the molecular level. The mitochondrial genome is a significant supplement to the genetic background of Ochotona hyperborea. The eight Ochotona species formed a monophyletic group with a high bootstrap value (100%) in all analyses. Key Words: Control region Ochotona hyperborea mitogenome phylogenetic trees
18	Sequencing and analysis of the complete mitochondrial genome of Micromys erythrotis from China and its phylogenetic analysis. Mitochondrial DNA B Resour 2021;6:1617-1620. [PMID: 34027072 PMCID: PMC8118456 DOI: 10.1080/23802359.2021.1926353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2020] [Accepted: 05/02/2021] [Indexed: 11/04/2022] Open Abstract The complete mitogenome sequence of Micromys erythrotis was determined using long PCR. The genome was 16,238 bp in length and contained 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes, 1 origin of L strand replication and 1 control region. The overall base composition of the heavy strand is A (33.7%), C (24.8%), T (29.1%) and G (12.4%). The base composition shows a clear A-T skew, which is most obvious in the control region and protein-coding genes. Mitochondrial genome analyses based on MP, ML, NJ and Bayesian methods yielded identical phylogenetic trees. This study verifies the evolutionary status of Micromys erythrotis in Muridae at the molecular level. The mitochondrial genome is a significant supplement to the genetic background of Micromys erythrotis. Key Words: Control region Micromys erythrotis mitogenome phylogenetic trees
19	Parameter Identifiability for a Profile Mixture Model of Protein Evolution. J Comput Biol 2021;28:570-586. [PMID: 33960831 DOI: 10.1089/cmb.2020.0315] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022] Open Abstract A profile mixture (PM) model is a model of protein evolution, describing sequence data in which sites are assumed to follow many related substitution processes on a single evolutionary tree. The processes depend, in part, on different amino acid distributions, or profiles, varying over sites in aligned sequences. A fundamental question for any stochastic model, which must be answered positively to justify model-based inference, is whether the parameters are identifiable from the probability distribution they determine. Here, using algebraic methods, we show that a PM model has identifiable parameters under circumstances in which it is likely to be used for empirical analyses. In particular, for a tree relating 9 or more taxa, both the tree topology and all numerical parameters are generically identifiable when the number of profiles is less than 74. Key Words: parameter identifiability phylogenetic trees profile mixture model
20	Coevolutionary and Phylogenetic Analysis of Mimiviral Replication Machinery Suggest the Cellular Origin of Mimiviruses. Mol Biol Evol 2021;38:2014-2029. [PMID: 33570580 PMCID: PMC8097291 DOI: 10.1093/molbev/msab003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022] Open Abstract Mimivirus is one of the most complex and largest viruses known. The origin and evolution of Mimivirus and other giant viruses have been a subject of intense study in the last two decades. The two prevailing hypotheses on the origin of Mimivirus and other viruses are the reduction hypothesis, which posits that viruses emerged from modern unicellular organisms, and the virus-first hypothesis, which proposes viruses as relics of precellular forms of life. In this study, to gain insights into the origin of Mimivirus, we have carried out extensive phylogenetic, correlation, and multidimensional scaling analyses of the putative proteins involved in the replication of its 1.2-Mb large genome. Correlation analysis and multidimensional scaling methods were validated using bacteriophage, bacteria, archaea, and eukaryotic replication proteins before being applied to Mimivirus. We show that a large fraction of mimiviral replication proteins, including polymerase B, clamp, and clamp loaders are of eukaryotic origin and are coevolving. Although phylogenetic analysis places some components along the lineages of phage and bacteria, we show that all the replication-related genes have been homogenized and are under purifying selection. Collectively our analysis supports the idea that Mimivirus originated from a complex cellular ancestor. We hypothesize that Mimivirus has largely retained complex replication machinery reminiscent of its progenitor while losing most of the other genes related to processes such as metabolism and translation. Key Words: DNA replication HGT LUCA LUCELLA MDS Mimivirus NCLDVs coevolution correlation analysis evolution evolutionary selection giant viruses phylogenetic phylogenetic trees purifying selection. MESH Headings: Biological Coevolution Gene Transfer, Horizontal Mimiviridae/genetics Multidimensional Scaling Analysis Phylogeny Selection, Genetic Viral Proteins/genetics Virus Replication/genetics. Grants: Department of Science and Technology IIT Bombay Research Fellowship
21	Antimicrobial Resistance, FlaA Sequencing, and Phylogenetic Analysis of Campylobacter Isolates from Broiler Chicken Flocks in Greece. Vet Sci 2021;8:vetsci8050068. [PMID: 33919370 PMCID: PMC8143292 DOI: 10.3390/vetsci8050068] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/17/2021] [Revised: 04/17/2021] [Accepted: 04/19/2021] [Indexed: 01/22/2023] Open Abstract Human campylobacteriosis caused by thermophilic Campylobacter species is the most commonly reported foodborne zoonosis. Consumption of contaminated poultry meat is regarded as the main source of human infection. This study was undertaken to determine the antimicrobial susceptibility and the molecular epidemiology of 205 Campylobacter isolates derived from Greek flocks slaughtered in three different slaughterhouses over a 14-month period. A total of 98.5% of the isolates were resistant to at least one antimicrobial agent. In terms of multidrug resistance, 11.7% of isolates were resistant to three or more groups of antimicrobials. Extremely high resistance to fluoroquinolones (89%), very high resistance to tetracycline (69%), and low resistance to macrolides (7%) were detected. FlaA sequencing was performed for the subtyping of 64 C. jejuni and 58 C. coli isolates. No prevalence of a specific flaA type was observed, indicating the genetic diversity of the isolates, while some flaA types were found to share similar antimicrobial resistance patterns. Phylogenetic trees were constructed using the neighbor-joining method. Seven clusters of the C. jejuni phylogenetic tree and three clusters of the C. coli tree were considered significant with bootstrap values >75%. Some isolates that clustered together originated from the same or adjacent farms, indicating transmission via personnel or shared equipment. These results are important and help further the understanding of the molecular epidemiology and antimicrobial resistance of Campylobacter spp. derived from poultry in Greece. Key Words: Campylobacter spp. Greece antimicrobial resistance flaA typing phylogenetic trees poultry
22	The complete chloroplast genome of a medical herb, Potentilla parvifolia Fisch. (Rosaceae), from Qinghai-Tibet Plateau in China. Mitochondrial DNA B Resour 2021;6:349-350. [PMID: 33659674 PMCID: PMC7872539 DOI: 10.1080/23802359.2020.1866447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022] Abstract Potentilla parvifolia Fisch. (Rosaceae) is one of the genuine medicinal materials in Qinghai-Tibet Plateau, China. Here we report the first chloroplast (cp) genome of P. parvifolia using the Illumina NovaSeq 6000 platform. The length of its complete cp genome is 152,898 bp, containing four sub-regions: a large single copy region (LSC) of 84,160 bp and a small single copy region (SSC) of 18,128 bp are separated by a pair of inverted repeat regions (IRs) of 25,305 bp each. The complete cp genome of P. parvifolia contains 130 genes, including 85 protein-coding genes, 37 tRNA genes, and 8 rRNA genes. The overall GC content of the cp genome is 37.2%. The phylogenetic analysis, based on 17 cp genomes, suggested that P. parvifolia is closely related to P. fruticosa L. and Fragaria species. Key Words: Potentilla parvifolia Qinghai-Tibet Plateau Rosaceae chloroplast genome phylogenetic trees
23	Evolution of Microbial Genomics: Conceptual Shifts over a Quarter Century. Trends Microbiol 2021;29:582-592. [PMID: 33541841 DOI: 10.1016/j.tim.2021.01.005] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 12/03/2020] [Revised: 01/07/2021] [Accepted: 01/08/2021] [Indexed: 12/20/2022] Abstract Prokaryote genomics started in earnest in 1995, with the complete sequences of two small bacterial genomes, those of Haemophilus influenzae and Mycoplasma genitalium. During the next quarter century, the prokaryote genome database has been growing exponentially, with no saturation in sight. For most of these 25 years, genome sequencing remained limited to cultivable microbes. Together with next-generation sequencing methods, advances in metagenomics and single-cell genomics have lifted this limitation, providing for an increasingly unbiased characterization of the global prokaryote diversity. Advances in computational genomics followed the progress of genome sequencing, even if occasionally lagging behind. Several major new branches of bacteria and archaea were discovered, including Asgard archaea, the apparent closest relatives of eukaryotes and expansive groups of bacteria and archaea with small genomes thought to be symbionts of other prokaryotes. Comparative analysis of numerous prokaryote genomes spanning a wide range of evolutionary distances changed the conceptual foundations of microbiology, supplanting the notion of species genomes with fixed gene sets with that of dynamic pangenomes and the notion of a single Tree of Life (ToL) with a statistical tree-like trend among individual gene trees. Strides were also made towards a theory and quantitative laws of prokaryote genome evolution. Key Words: gene gain and loss metagenomics pangenome phylogenetic trees prokaryote genome evolution single-cell genomics
24	In silico Phage Hunting: Bioinformatics Exercises to Identify and Explore Bacteriophage Genomes. Front Microbiol 2020;11:577634. [PMID: 33072043 PMCID: PMC7533560 DOI: 10.3389/fmicb.2020.577634] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/29/2020] [Accepted: 08/26/2020] [Indexed: 12/24/2022] Open Abstract Bioinformatics skills are increasingly relevant to research in most areas of the life sciences. The availability of genome sequences and large data sets provides unique opportunities to incorporate bioinformatics exercises into undergraduate microbiology courses. The goal of this project was to develop a teaching module to investigate the abundance and phylogenetic relationships amongst bacteriophages using a set of freely available bioinformatics tools. Computational identification and examination of bacteriophage genomes, followed by phylogenetic analyses, provides opportunities to incorporate core bioinformatics competencies in microbiology courses and enhance students’ bioinformatics skills. The first activity consisted of using PHASTER (PHAge Search Tool Enhanced Release), a bioinformatics tool that identifies bacteriophage sequences within bacterial chromosomes. Further computational analyses were conducted to align bacteriophage proteins and genomes and to determine phylogenetic relationships amongst these viruses. This part of the project was carried out using the Clustal Omega, MAFFT (Multiple Alignment using Fast Fourier Transform), and Interactive Tree of Life (iTOL) programs for sequence alignments and phylogenetic analyses. The laboratory activities were field tested in undergraduate directed research and microbiology classes. The learning objectives were assessed by comparing the scores of pre- and post-tests and grading final presentations. Post-test scores were higher than pre-test scores (p ≤ 0.002). The data suggest in silico phage hunting improves students’ ability to search databases, interpret phylogenetic trees, and use bioinformatics tools to examine genome structure. This activity allows instructors to integrate key bioinformatic concepts in their curriculums and gives students the opportunity to participate in a research-directed learning environment in the classroom. Key Words: bacteriophages bioinformatics experimental design (study designs) genomes phylogenetic trees research project
25	Genetic Diversity of Porcine Reproductive and Respiratory Syndrome Virus (PRRSV) From 1996 to 2017 in China. Front Microbiol 2020;11:618. [PMID: 32390968 PMCID: PMC7193098 DOI: 10.3389/fmicb.2020.00618] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 01/09/2020] [Accepted: 03/19/2020] [Indexed: 11/13/2022] Open Abstract Porcine reproductive and respiratory syndrome (PRRS) is one of the most devastating diseases of the global swine industry. The causative agent porcine reproductive and respiratory syndrome virus (PRRSV) was first isolated in China in 1996 and has evolved quickly during the last two decades. To fully understand virus diversity and the epidemic situation in the field, and to make future predictions, a total of 365 PRRSV strains were used for evolution and genome analysis, of which 353 strains were isolated from mainland China. The results showed that high diversity was found among PRRSV isolates. Total PRRSV isolates could be divided into eight subgroups. Among these subgroups, Original HP-PRRSV, NADC30-like, and Intermediate PRRSV were the major epidemic PRRSV strains circulating in the field and would play a major role in PRRS epidemics in the future. Deletions, insertions, and recombinations have occurred frequently in the PRRSV genome. Deletions were the main driving force of viral evolution before 2006 and may also contribute further to the virus' evolution in relatively closed or low-strain-diversity settings. The recombinant strains could be divided into three groups: the Inner group, Extensional group, and Propagating group. The evolutionary directions of the isolates in the Extensional and Propagating groups have changed, and the routes of recombination in the Propagating group were analyzed and sorted into three types. The increases in recombinant strains and high rates of recombination in recent years indicate that recombination has played a very important role in the virus' evolution. Isolates that incorporate the advantages of their parental strains will influence PRRSV evolution and adversely affect PRRS control in the future. Key Words: evolution genetic diversity phylogenetic trees porcine reproductive and respiratory syndrome virus (PRRSV) recombination
26	An extended model for phylogenetic maximum likelihood based on discrete morphological characters. Stat Appl Genet Mol Biol 2020;19:/j/sagmb.ahead-of-print/sagmb-2019-0029/sagmb-2019-0029.xml. [PMID: 32078576 DOI: 10.1515/sagmb-2019-0029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022] Abstract Maximum likelihood is a common method of estimating a phylogenetic tree based on a set of genetic data. However, models of evolution for certain types of genetic data are highly flawed in their specification, and this misspecification can have an adverse impact on phylogenetic inference. Our attention here is focused on extending an existing class of models for estimating phylogenetic trees from discrete morphological characters. The main advance of this work is a model that allows unequal equilibrium frequencies in the estimation of phylogenetic trees from discrete morphological character data using likelihood methods. Possible extensions of the proposed model will also be discussed. Key Words: Markov model maximum likelihood phylogenetic trees
27	Sequencing and analysis of the complete mitochondrial genome of Blarinella griselda from China and its phylogenetic analysis. Mitochondrial DNA B Resour 2020;5:965-967. [PMID: 33366829 PMCID: PMC7748876 DOI: 10.1080/23802359.2020.1715305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2019] [Accepted: 01/07/2020] [Indexed: 11/05/2022] Open Abstract The complete mitogenome sequence of Blarinella griselda was determined using long PCR. The genome was 16,947 bp in length and contained 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes, 1 origin of L strand replication and 1 control region. The overall base composition of the heavy strand is A (33.1%), C (22.6%), T (31.6%) and G (12.7%). The base composition shows a clear A-T skew, which is most obvious in the control region and protein-coding genes. Mitochondrial genome analyses based on MP, ML, NJ and Bayesian methods yielded identical phylogenetic trees. This study verifies the evolutionary status of Blarinella griselda in Soricidae at the molecular level. The mitochondrial genome is a significant supplement to the genetic background of Blarinella griselda. The three Blarinella species formed a monophyletic group with a high bootstrap value (100%) in all analyses. Key Words: Blarinella griselda Control region mitogenome phylogenetic trees
28	The Reproducibility of an Inferred Tree and the Diploidization of Gene Segregation after Genome Duplication. Genome Biol Evol 2020;12:3792-3796. [PMID: 31950994 PMCID: PMC7012300 DOI: 10.1093/gbe/evz272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2019] [Indexed: 12/03/2022] Open Abstract We previously introduced a numerical quantity called the stability (Ps) of an inferred tree and showed that for the tree to be reliable this stability as well as the reliability of the tree, which is usually computed as the bootstrap probability (Pb), must be high. However, if genome duplication occurs in a species, a gene family of the genome also duplicates, and for this reason alone some Ps values can be high in a tree of the duplicated gene families. In addition, the topology of the duplicated gene family can be similar to that of the original gene family if such gene families are identifiable. After genome duplication, however, the gene families are often partially deleted or partially duplicated, and the duplicated gene family may not show the same topology as that of the original family. It is therefore necessary to compute the similarity of the topologies of the duplicated and the original gene families. In this paper, we introduce another quantity called the reproducibility (Pr) for measuring the similarity of the two gene families. To show how to compute the Pr values, we first compute the Pb and Ps values for each of the MHC class II α and β chain gene families, which were apparently generated by genome duplication. We then compute the Pr values for the MHC class II α and β chain gene families. The Pr values for the α and β chain gene families are now low, and this suggests that the diploidization of gene segregation has occurred after the genome duplication. Currently higher animals, defined as animals with complex phenotypic characters, generally have a larger genome size, and this increase in genome size appears to have been caused by genome duplication and diploidization of gene segregation after genome duplication. Key Words: MHC class II α and β chain gene families computer program RESTA diploidization gene segregation genome duplication phylogenetic trees reliability reproducibility stability
29	Unique k-mers as Strain-Specific Barcodes for Phylogenetic Analysis and Natural Microbiome Profiling. Int J Mol Sci 2020;21:ijms21030944. [PMID: 32023871 PMCID: PMC7037511 DOI: 10.3390/ijms21030944] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/29/2019] [Revised: 01/21/2020] [Accepted: 01/28/2020] [Indexed: 02/07/2023] Open Abstract The need for a comparative analysis of natural metagenomes stimulated the development of new methods for their taxonomic profiling. Alignment-free approaches based on the search for marker k-mers turned out to be capable of identifying not only species, but also strains of microorganisms with known genomes. Here, we evaluated the ability of genus-specific k-mers to distinguish eight phylogroups of Escherichia coli (A, B1, C, E, D, F, G, B2) and assessed the presence of their unique 22-mers in clinical samples from microbiomes of four healthy people and four patients with Crohn's disease. We found that a phylogenetic tree inferred from the pairwise distance matrix for unique 18-mers and 22-mers of 124 genomes was fully consistent with the topology of the tree, obtained with concatenated aligned sequences of orthologous genes. Therefore, we propose strain-specific "barcodes" for rapid phylotyping. Using unique 22-mers for taxonomic analysis, we detected microbes of all groups in human microbiomes; however, their presence in the five samples was significantly different. Pointing to the intraspecies heterogeneity of E. coli in the natural microflora, this also indicates the feasibility of further studies of the role of this heterogeneity in maintaining population homeostasis. Key Words: alignment-free algorithms bacterial genomes genome barcodes human microbiome k-mers metagenomes phylogenetic trees phylotyping taxonomic profiling. MESH Headings: Algorithms Case-Control Studies Computational Biology Crohn Disease/genetics Crohn Disease/microbiology DNA Barcoding, Taxonomic/methods Escherichia coli/classification Escherichia coli/genetics Escherichia coli/isolation & purification Escherichia coli Infections/genetics Escherichia coli Infections/microbiology Genes, Bacterial Genome, Bacterial Humans Metagenome Microbiota
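The idea in entry 29 of using unique k-mers as strain-specific barcodes can be illustrated with a toy sketch. This is not the authors' pipeline: the sequences, the strain names, the helper names `kmers` and `unique_kmers`, and the choice of k = 4 are all hypothetical (the abstract reports 18-mers and 22-mers on real genomes), and reverse complements are ignored for brevity.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(genomes: dict[str, str], k: int) -> dict[str, set[str]]:
    """For each genome, return the k-mers that occur in no other genome."""
    owners: dict[str, set[str]] = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)

    unique: dict[str, set[str]] = {name: set() for name in genomes}
    for kmer, names in owners.items():
        if len(names) == 1:  # seen in exactly one genome, so usable as a barcode
            unique[next(iter(names))].add(kmer)
    return unique


# Hypothetical toy "genomes"; real analyses would use full assemblies.
toy = {
    "strain_A": "ACGTACGGA",
    "strain_B": "ACGTACGTT",
    "strain_C": "TTGTACGGA",
}
barcodes = unique_kmers(toy, k=4)
for name, marks in barcodes.items():
    print(name, sorted(marks))
```

In this toy set, strain_A ends up with no unique 4-mers because each of its k-mers is shared with another strain. That is one reason longer k-mers are needed on real genomes.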
30	The complete chloroplast genome of a medical herb, Halenia elliptica D.Don (Gentianaceae), from Qinghai-Tibet Plateau in China. Mitochondrial DNA B Resour 2019;4:3381-3382. [PMID: 33366003 PMCID: PMC7707369 DOI: 10.1080/23802359.2019.1674202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022] Abstract Halenia elliptica D.Don (Gentianaceae) is one of the genuine medicinal species in Qinghai-Tibet Plateau, China. Here we report the first chloroplast (cp) genome of H. elliptica using the Illumina HiSeq X Ten platform. The length of its complete cp genome is 153,341 bp, containing four sub-regions: a large single-copy region (LSC) of 82,811 bp and a small single-copy region (SSC) of 18,278 bp, which are separated by a pair of inverted repeat regions (IRs) of 26,126 bp each. The complete cp genome of H. elliptica contains 129 genes, including 84 protein-coding genes, 37 tRNA genes, and 8 rRNA genes. The overall GC content of the cp genome is 38.1%. The phylogenetic analysis, based on 15 cp genomes, suggested that H. elliptica is closely related to Halenia corniculata (L.) Cornaz and Swertia species. Key Words: Gentianaceae Halenia elliptica Qinghai-Tibet Plateau chloroplast genome phylogenetic trees
31	Genomic Analyses Identify Novel Molecular Signatures Specific for the Caenorhabditis and other Nematode Taxa Providing Novel Means for Genetic and Biochemical Studies. Genes (Basel) 2019;10:genes10100739. [PMID: 31554175 PMCID: PMC6826867 DOI: 10.3390/genes10100739] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/26/2019] [Revised: 09/06/2019] [Accepted: 09/17/2019] [Indexed: 11/20/2022] Open Abstract The phylum Nematoda encompasses numerous free-living as well as parasitic members, including the widely used animal model Caenorhabditis elegans, with significant impact on human health, agriculture, and environment. In view of the importance of nematodes, it is of much interest to identify novel molecular characteristics that are distinctive features of this phylum, or specific taxonomic groups/clades within it, thereby providing innovative means for diagnostics as well as genetic and biochemical studies. Using genome sequences for 52 available nematodes, a robust phylogenetic tree was constructed based on concatenated sequences of 17 conserved proteins. The branching of species in this tree provides important insights into the evolutionary relationships among the studied nematode species. In parallel, detailed comparative analyses on protein sequences from nematode (Caenorhabditis) species reported here have identified 52 novel molecular signatures (or synapomorphies) consisting of conserved signature indels (CSIs) in different proteins, which are uniquely shared by the homologs from either all genome-sequenced Caenorhabditis species or a number of higher taxonomic clades of nematodes encompassing this genus. Of these molecular signatures, 39 CSIs in proteins involved in diverse functions are uniquely present in all Caenorhabditis species providing reliable means for distinguishing this group of nematodes in molecular terms. The remainder of the CSIs are specific for a number of higher clades of nematodes and offer important insights into the evolutionary relationships among these species. The structural locations of some of the nematode-specific CSIs were also mapped in the structural models of the corresponding proteins. All of the studied CSIs are localized within the surface-exposed loops of the proteins suggesting that they may potentially be involved in mediating novel protein–protein or protein–ligand interactions, which are specific for these groups of nematodes. The identified CSIs, due to their exclusivity for the indicated groups, provide reliable means for the identification of species within these nematode groups in molecular terms. Further, due to the predicted roles of these CSIs in cellular functions, they provide important tools for genetic and biochemical studies in Caenorhabditis and other nematodes. Key Words: Caenorhabditis elegans Chromadorea conserved signature indels evolutionary relationships among nematodes genome sequences molecular markers (synapomorphies) phylogenetic trees structural analysis of Caenorhabditis/nematodes-specific indels
32	Sequencing and analysis of the complete mitochondrial genome of Crocidura tanakae from China and its phylogenetic analysis. Mitochondrial DNA B Resour 2019;4:2791-2793. [PMID: 33365730 PMCID: PMC7706458 DOI: 10.1080/23802359.2019.1659118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2019] [Accepted: 07/31/2019] [Indexed: 11/05/2022] Open Abstract The complete mitogenome sequence of Crocidura tanakae was determined using long PCR. The genome was 16,969 bp in length and contained 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes, 1 origin of L strand replication and 1 control region. The overall base composition of the heavy strand is A (32.5%), C (22.3%), T (31.9%), and G (13.3%). The base composition shows a clear A-T skew, which is most obvious in the control region and protein-coding genes. Mitochondrial genome analyses based on MP, ML, NJ, and Bayesian methods yielded identical phylogenetic trees. The five Crocidura species formed a monophyletic group with a high bootstrap value (100%) in all analyses. This study verifies the evolutionary status of C. tanakae in Soricidae at the molecular level. The mitochondrial genome is a significant supplement to the genetic background of C. tanakae. Key Words: Control region Crocidura tanakae mitogenome phylogenetic trees
33	Near-Full-Length Genetic Characterization of a Novel HIV-1 Unique Recombinant with Similarities to A1, CRF01_AE, and CRF02_AG Viruses in Yaoundé, Cameroon. AIDS Res Hum Retroviruses 2019;35:762-768. [PMID: 30860392 DOI: 10.1089/aid.2019.0042] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/08/2023] Abstract Variations in the HIV genome influence HIV/AIDS epidemiology. We report here a novel HIV-1 unique recombinant form (URF) isolated from an HIV-infected female (NACMR092) in Cameroon, based on the analyses of near-full-length viral genome (partial gag, full-length pol, env, tat, rev, vif, vpr, vpu, and nef genes, and partial 3'-long terminal repeat). Phylogeny, recombination breakpoints, and recombination map analyses showed that NACMR092 was infected with a mosaic URF that had eight breakpoints (two in gag, one in pol, one in vpr, two in env, and two in the nef regions), nine subgenomic regions, and included fragments that had important similarities with HIV-1 subtypes A1, CRF02_AG, and CRF01_AE. This novel mosaic URF underscores complex recombination events occurring between HIV-1 subtypes circulating in Cameroon. Continued monitoring and detection of such recombinants and accurate classification of HIV genotype are important for tracking viral molecular epidemiology and antigenic diversity. Key Words Cameroon HIV-1 NACMR092 near-full-length genome sequence phylogenetic trees recombination breakpoints
34	The complete mitochondrial genome of the striped hamster (Cricetulus barabesis) China and its phylogenetic analysis. Mitochondrial DNA B Resour 2019;4:2593-2595. [PMID: 33365640 PMCID: PMC7706561 DOI: 10.1080/23802359.2019.1641440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2019] [Accepted: 06/22/2019] [Indexed: 11/07/2022] Abstract The complete mitogenome sequence of the striped hamster was determined using long PCR. The genome was 16,282 bp in length and contained 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, one origin of L strand replication, and one control region. The overall base composition of the heavy strand is A (33.7%), C (22.8%), T (30.5%), and G (13.0%). The base composition clearly shows an A-T skew, which is most obvious in the control region and protein-coding genes. Mitochondrial genome analyses based on MP, ML, NJ, and Bayesian methods yielded identical phylogenetic trees. Results of phylogenetic analysis showed that Cricetulus had a close relationship with Meriones. This study verifies the evolutionary status of the striped hamster in Cricetulus at the molecular level. The mitochondrial genome would be a significant supplement to the striped hamster genetic background. Results of phylogenetic analysis showed that the striped hamster had a close relationship with C. griseus in Cricetulus. Key Words Control region Cricetulus barabesis mitogenome phylogenetic trees
35	Sequencing and analysis of the complete mitochondrial genome of Chodsigoa hoffmanni from China and its phylogenetic analysis. Mitochondrial DNA B Resour 2019;4:2438-2440. [PMID: 33365576 PMCID: PMC7687604 DOI: 10.1080/23802359.2019.1637294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2019] [Accepted: 06/22/2019] [Indexed: 11/22/2022] Abstract The complete mitogenome sequence of Chodsigoa hoffmanni was determined using long PCR. The genome was 17,138 bp in length and contained 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, one origin of L strand replication, and one control region. The overall base composition of the heavy strand is A (32.8%), C (24.4%), T (29.8%), and G (13.0%). The base composition clearly shows an A-T skew, which is most obvious in the control region and protein-coding genes. Mitochondrial genome analyses based on MP, ML, NJ, and Bayesian methods yielded identical phylogenetic trees. Chodsigoa hoffmanni is the first species in the genus Chodsigoa for which a mitochondrial genome has been reported. This study verifies the evolutionary status of C. hoffmanni in Soricidae at the molecular level. The mitochondrial genome would be a significant supplement to the C. hoffmanni genetic background. Key Words Chodsigoa hoffmanni Control region mitogenome phylogenetic trees
36	[Molecular identification of cucumber mosaic virus in woad (Isatis tinctoria) with mosaic disease in Beijing]. ZHONGGUO ZHONG YAO ZA ZHI = ZHONGGUO ZHONGYAO ZAZHI = CHINA JOURNAL OF CHINESE MATERIA MEDICA 2018;43:2242-2245. [PMID: 29945374 DOI: 10.19540/j.cnki.cjcmm.20180329.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2018] [Indexed: 11/18/2022] Abstract To detect possible pathogenic virus(es) in woad (Isatis tinctoria) cultivated at the Institute of Medicinal Plant Development in Beijing, reverse transcription (RT)-PCR was performed using total RNA of symptomatic woad leaves with primers for poty-, polero-, tobamovirus, broad bean wilt virus 2 (BBWV2) and cucumber mosaic virus (CMV). A 657 bp fragment was amplified from symptomatic woad using CMV primers. Sequencing and BLAST analysis indicated that this fragment shared 99% nucleotide identity and 100% amino acid identity with the CMV-Vi isolate. The isolate was named CMV-Isatis tinctoria (CMV-It). Phylogenetic analysis based on nucleotide sequences of CP genes showed that CMV-It clustered with CMV-K and belonged to subgroup I. To our knowledge, this is the first identification of CMV in woad by RT-PCR, and the CP gene was analyzed. This work provided data for research and control of woad mosaic disease. Key Words Isatis tinctoria RT-PCR detection coat protein gene cucumber mosaic virus phylogenetic trees
37	Complete Genome Characterization of the 2017 Dengue Outbreak in Xishuangbanna, a Border City of China, Burma and Laos. Front Cell Infect Microbiol 2018;8:148. [PMID: 29868504 PMCID: PMC5951998 DOI: 10.3389/fcimb.2018.00148] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 12/23/2017] [Accepted: 04/20/2018] [Indexed: 11/13/2022] Abstract A dengue outbreak abruptly occurred at the border of China, Myanmar, and Laos in June 2017. By November 3rd 2017, 1184 infected individuals were confirmed as NS1-positive in Xishuangbanna, a city located at the border. To verify the causative agent, complete genome information was obtained through PCR and sequencing based on the viral RNAs extracted from patient samples. Phylogenetic trees were constructed by the maximum likelihood method (MEGA 6.0). Nucleotide and amino acid substitutions were analyzed by BioEdit, followed by RNA secondary structure prediction of untranslated regions (UTRs) and protein secondary structure prediction in coding sequences (CDSs). Strains YN2, YN17741, and YN176272 were isolated from local residents. Strains MY21 and MY22 were isolated from Burmese travelers. The complete genome sequences of the five isolates were 10,735 nucleotides in length. Phylogenetic analysis classified all five isolates as genotype I of DENV-1, while isolates of local residents and Burmese travelers belonged to different branches. The three local isolates were most similar to the Dongguan strain in 2011, and the other two isolates from Burmese travelers were most similar to the Laos strain in 2008. Twenty-four amino acid substitutions were important in eight evolutionary tree branches. Comparison with DENV-1SS revealed 658 base substitutions in the local isolates, except for two mutations exclusive to YN17741, resulting in 87 synonymous mutations. 
Compared with the local isolates, 52 amino acid mutations occurred in the CDS of two isolates from Burmese travelers. Comparing MY21 with MY22, 17 amino acid mutations were observed; all of these mutations occurred in the CDS of non-structural proteins (two in NS1, 10 in NS2, two in NS3, three in NS5). Secondary structure prediction revealed 46 changes in the potential nucleotide and protein binding sites of the CDSs in local isolates. RNA secondary structure prediction also showed base changes in the 3′UTR of local isolates, leading to two significant changes in the RNA secondary structure. To our knowledge, this study is the first complete genome analysis of isolates from the 2017 dengue outbreak that occurred at the border areas of China, Burma, and Laos. Key Words RNA secondary structure complete genome dengue outbreak phylogenetic trees protein secondary structure
38	Phylogeny and polymorphism in the long control regions E6, E7, and L1 of HPV Type 56 in women from southwest China. Mol Med Rep 2018;17:7131-7141. [PMID: 29568922 PMCID: PMC5928666 DOI: 10.3892/mmr.2018.8743] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 09/03/2016] [Accepted: 07/27/2017] [Indexed: 12/31/2022] Abstract Globally, human papillomavirus (HPV)‑56 accounts for a small proportion of all high‑risk HPV types; however, HPV‑56 is detected at a higher rate in Asia, particularly in southwest China. The present study analyzed polymorphisms, intratypic variants, and genetic variability in the long control regions (LCR), E6, E7, and L1 of HPV‑56 (n=75). The LCRs, E6, E7 and L1 were sequenced using polymerase chain reaction and the sequences were submitted to GenBank. Maximum‑likelihood trees were constructed using Kimura's two‑parameter model, followed by secondary structure analysis and protein damaging prediction. Additionally, in order to assess the effect of variations in the LCR on putative binding sites for cellular proteins, the MATCH server was used. Finally, the selection pressures of the E6‑E7 and L1 genes were estimated. A total of 18 point substitutions, a 42‑bp deletion and a 19‑bp deletion of LCR were identified. Some of those mutations are embedded in the putative binding sites for transcription factors. Eighteen single nucleotide changes occurred in the E6‑E7 sequence; 11/18 were non‑synonymous substitutions and 7/18 were synonymous mutations. A total of 24 single nucleotide changes were identified in the L1 sequence, 6/24 being non‑synonymous mutations and 18/24 synonymous mutations. Selective pressure analysis predicted that the majority of mutations of HPV‑56 E6, E7 and L1 were under positive selection. 
The phylogenetic tree demonstrated that the isolates were distributed in two lineages. Data on the prevalence and genetic variation of HPV‑56 types in southwest China may aid future studies on viral molecular mechanisms and contribute to future investigations of diagnostic probes and therapeutic vaccines. Key Words human papillomavirus-56 gene polymorphism phylogenetic trees selection pressures MESH Headings Adolescent Adult Aged Base Sequence Capsid Proteins/genetics China/epidemiology DNA, Viral/genetics Female Humans Middle Aged Oncogene Proteins, Viral/genetics Papillomaviridae/genetics Papillomavirus Infections/epidemiology Papillomavirus Infections/virology Phylogeny Polymorphism, Genetic Young Adult
39	Sequencing and analysis of the complete mitochondrial genome of the taiga shrew (Sorex isodon) from China. Mitochondrial DNA B Resour 2018;3:466-468. [PMID: 33474206 PMCID: PMC7800816 DOI: 10.1080/23802359.2018.1462113] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/24/2018] [Accepted: 04/03/2018] [Indexed: 10/26/2022] Abstract The complete mitogenome sequence of the taiga shrew (Sorex isodon) was determined using long PCR. The genome was 17,008 bp in length and contained 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, one origin of L strand replication and one control region. The overall base composition of the heavy strand is A (32.5%), C (24.5%), T (28.5%), and G (13.5%). The base composition clearly shows an A-T skew, which is most obvious in the control region and protein-coding genes. The extended termination-associated sequence domain, the central conserved domain and the conserved sequence block domain are defined in the mitochondrial genome control region of the taiga shrew. Mitochondrial genome analyses based on MP, ML, NJ, and Bayesian methods yielded identical phylogenetic trees. The eight Sorex species formed a monophyletic group with a high bootstrap value (100%) in all analyses. Key Words Control region mitogenome phylogenetic trees the taiga shrew
40	Split Scores: A Tool to Quantify Phylogenetic Signal in Genome-Scale Data. Syst Biol 2018;66:620-636. [PMID: 28123114 DOI: 10.1093/sysbio/syw103] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 08/02/2016] [Accepted: 10/28/2016] [Indexed: 11/14/2022] Abstract Detecting variation in the evolutionary process along chromosomes is increasingly important as whole-genome data become more widely available. For example, factors such as incomplete lineage sorting, horizontal gene transfer, and chromosomal inversion are expected to result in changes in the underlying gene trees along a chromosome, while changes in selective pressure and mutational rates for different genomic regions may lead to shifts in the underlying mutational process. We propose the split score as a general method for quantifying support for a particular phylogenetic relationship within a genomic data set. Because the split score is based on algebraic properties of a matrix of site pattern frequencies, it can be rapidly computed, even for data sets that are large in the number of taxa and/or in the length of the alignment, providing an advantage over other methods (e.g., maximum likelihood) that are often used to assess such support. Using simulation, we explore the properties of the split score, including its dependence on sequence length, branch length, size of a split and its ability to detect true splits in the underlying tree. Using a sliding window analysis, we show that split scores can be used to detect changes in the underlying evolutionary process for genome-scale data from primates, mosquitoes, and viruses in a computationally efficient manner. Computation of the split score has been implemented in the software package SplitSup. 
Key Words General Markov model genome-scale data analysis matrix flattenings phylogenetic trees singular value decomposition split scores
41	A Critical Review on the Use of Support Values in Tree Viewers and Bioinformatics Toolkits. Mol Biol Evol 2017;34:1535-1542. [PMID: 28369572 PMCID: PMC5435079 DOI: 10.1093/molbev/msx055] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022] Abstract Phylogenetic trees are routinely visualized to present and interpret the evolutionary relationships of species. Most empirical evolutionary data studies contain a visualization of the inferred tree with branch support values. Ambiguous semantics in tree file formats can lead to erroneous tree visualizations and therefore to incorrect interpretations of phylogenetic analyses. Here, we discuss problems that arise when displaying branch values on trees after rerooting. Branch values are typically stored as node labels in the widely used Newick tree format. However, such values are attributes of branches. Storing them as node labels can therefore yield errors when rerooting trees. This depends on the mostly implicit semantics that tools deploy to interpret node labels. We reviewed ten tree viewers and ten bioinformatics toolkits that can display and reroot trees. We found that 14 out of 20 of these tools do not permit users to select the semantics of node labels. Thus, unaware users might obtain incorrect results when rooting trees. We illustrate such incorrect mappings for several test cases and real examples taken from the literature. This review has already led to improvements in eight tools. We suggest tools should provide options that explicitly force users to define the semantics of node labels. Key Words Newick format bioinformatics toolkits branch labels branch support values bugs phylogenetic trees software tree viewers tree visualization
42	Sequencing and analysis of the complete mitochondrial genome of the Ussuri shrew (Sorex mirabilis) from China. MITOCHONDRIAL DNA PART B-RESOURCES 2017;2:645-647. [PMID: 33473932 PMCID: PMC7800820 DOI: 10.1080/23802359.2017.1375873] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022] Abstract The complete mitogenome sequence of the Ussuri shrew (Sorex mirabilis) was determined using long PCR. The genome was 17,315 bp in length and contained 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, one origin of L strand replication, and one control region. The overall base composition of the heavy strand is A (32.6%), C (25.2%), T (28.8%), and G (13.4%). The base composition clearly shows an A–T skew, which is most obvious in the control region and protein-coding genes. The extended termination-associated sequence domain, the central conserved domain and the conserved sequence block domain are defined in the mitochondrial genome control region of the Ussuri shrew. Mitochondrial genome analyses based on MP, ML, NJ, and Bayesian methods yielded identical phylogenetic trees. The five Sorex species formed a monophyletic group with a high bootstrap value (100%) in all analyses. Key Words Control region mitogenome phylogenetic trees the Ussuri shrew
43	Sequencing and analysis of the complete mitochondrial genome of the slender shrew (Sorex gracillimus) from China. MITOCHONDRIAL DNA PART B-RESOURCES 2017;2:642-644. [PMID: 33473931 PMCID: PMC7799885 DOI: 10.1080/23802359.2017.1375871] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022] Abstract The complete mitogenome sequence of the slender shrew (Sorex gracillimus) was determined using long PCR. The genome was 17,002 bp in length and contained 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, one origin of L strand replication, and one control region. The overall base composition of the heavy strand is A (32.5%), C (25.5%), T (28.5%), and G (13.5%). The base composition clearly shows an A–T skew, which is most obvious in the control region and protein-coding genes. The extended termination-associated sequence domain, the central conserved domain and the conserved sequence block domain are defined in the mitochondrial genome control region of the slender shrew. Mitochondrial genome analyses based on MP, ML, NJ, and Bayesian methods yielded identical phylogenetic trees. The five Sorex species formed a monophyletic group with a high bootstrap value (100%) in all analyses. Key Words Control region mitogenome phylogenetic trees the slender shrew
44	Molecular evolution of acetohydroxyacid synthase in bacteria. Microbiologyopen 2017;6. [PMID: 28782269 PMCID: PMC5727371 DOI: 10.1002/mbo3.524] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/21/2017] [Revised: 06/21/2017] [Accepted: 06/29/2017] [Indexed: 11/16/2022] Abstract Acetohydroxyacid synthase (AHAS) is the key enzyme in the biosynthetic pathways of branched chain amino acids in bacteria. Since it does not exist in animal and plant cells, AHAS is an attractive target for developing antimicrobials and herbicides. In some bacteria, there is a single copy of AHAS, while in others there are multiple copies. Therefore, it is necessary to investigate the origin and evolutionary pathway of various AHASs in bacteria. In this study, all the available protein sequences of AHAS in bacteria were investigated, and an evolutionary model of AHAS in bacteria is proposed, according to gene structure, organization and phylogeny. Multiple copies of AHAS in some bacteria might have evolved from a single ancestral copy of AHAS. Gene duplication, domain deletion and horizontal gene transfer might have occurred during the evolution of this enzyme. The results show the biological significance of AHAS, help to understand the functions of various AHASs in bacteria, and would be useful for developing industrial production strains of branched chain amino acids or novel antimicrobials. Key Words AHAS BCAA biosynthetic pathway acetohydroxyacid synthase molecular evolution phylogenetic trees
45	Sequencing and analysis of the complete mitochondrial genome of the masked shrew (Sorex caecutiens) from China. MITOCHONDRIAL DNA PART B-RESOURCES 2017;2:486-488. [PMID: 33473872 PMCID: PMC7799685 DOI: 10.1080/23802359.2017.1361354] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/04/2022] Abstract The complete mitogenome sequence of the masked shrew (Sorex caecutiens) was determined using long PCR. The genome was 17,096 bp in length and contained 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, one origin of L strand replication, and one control region. The overall base composition of the heavy strand is A (32.9%), C (24.5%), T (29.3%), and G (13.3%). The base composition clearly shows an A–T skew, which is most obvious in the control region and protein-coding genes. The extended termination-associated sequence domain, the central conserved domain and the conserved sequence block domain are defined in the mitochondrial genome control region of the masked shrew. Mitochondrial genome analyses based on MP, ML, NJ, and Bayesian methods yielded identical phylogenetic trees. The five Sorex species formed a monophyletic group with a high bootstrap value (100%) in all analyses. Key Words Control region masked shrew mitogenome phylogenetic trees
46	Sequencing and analysis of the complete mitochondrial genome of flat-skulled shrew (Sorex roboratus) from China. MITOCHONDRIAL DNA PART B-RESOURCES 2017;2:369-371. [PMID: 33473831 PMCID: PMC7800013 DOI: 10.1080/23802359.2017.1334517] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022] Abstract The complete mitogenome sequence of the flat-skulled shrew (Sorex roboratus) was determined using long PCR. The genome was 17,153 bp in length and contained 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes, 1 origin of L strand replication and 1 control region. The overall base composition of the heavy strand is A (33.1%), C (24.4%), T (29.4%), and G (13.1%). The base composition clearly shows an A–T skew, which is most obvious in the control region and protein-coding genes. The extended termination-associated sequence domain, the central conserved domain and the conserved sequence block domain are defined in the mitochondrial genome control region of the flat-skulled shrew. Mitochondrial genome analyses based on MP, ML, NJ and Bayesian methods yielded identical phylogenetic trees. The five Sorex species formed a monophyletic group with a high bootstrap value (100%) in all analyses. Key Words Control region flat-skulled shrew mitogenome phylogenetic trees
47	Predicting the Evolutionary Variability of the Influenza A Virus. Acta Naturae 2017;9:48-54. [PMID: 29104775 PMCID: PMC5662273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2017] [Indexed: 11/12/2022] Abstract The influenza A virus remains one of the most common and dangerous human health concerns due to its rapid evolutionary dynamics. Since the evolutionary changes of influenza A viruses can be traced in real time, the last decade has seen a surge in research on influenza A viruses due to an increase in experimental data (selection of escape mutants followed by examination of their phenotypic characteristics and generation of viruses with desired mutations using reverse genetics). Moreover, the advances in our understanding are also attributable to the development of new computational methods based on a phylogenetic analysis of influenza virus strains and mathematical (integro-differential equations, statistical methods, probability-theory-based methods) and simulation modeling. Continuously evolving highly pathogenic influenza A viruses are a serious health concern which necessitates a coupling of theoretical and experimental approaches to predict the evolutionary trends of the influenza A virus, with a focus on the H5 subtype. Key Words Influenza A virus computational modeling computational tools escape mutants phenotypic characteristics phylogenetic trees reverse genetics
48	THE EFFECT OF BIASED INCLUSION OF TAXA ON THE CORRELATION BETWEEN DISCRETE CHARACTERS IN PHYLOGENETIC TREES. Evolution 2017;47:1182-1191. [PMID: 28564293 DOI: 10.1111/j.1558-5646.1993.tb02145.x] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 03/13/1992] [Accepted: 11/11/1992] [Indexed: 11/30/2022] Abstract In a published paper, a method for testing the correlation between two discrete characters was presented and applied to test whether in butterfly larvae origins of gregariousness are concentrated in lineages with aposematic coloration. The relationship was found to be nonsignificant. However, the butterfly data to which the test was applied had been compiled in another study to investigate evolutionary sequences and was biased, because there was an overrepresentation of aposematic, as compared to cryptic, branches in the sample. In the paper presented here, aposematic and cryptic clades of the original phylogeny were resolved to the same degree, and the resulting set of branches may be regarded as unbiased with respect to the hypothesis being tested. A method for testing the contingency of states in two characters was then applied to the new data set, resulting in a highly significant relationship between origins of gregariousness and aposematic coloration. I argue that when using statistical methods on phylogenetic data, it is crucial to resolve various parts of the phylogeny to the same comparable systematic unit in order not to get a distorted sample of taxa/branches. Key Words Aposematic coloration biased samples butterfly larvae correlation discrete characters gregariousness inclusion of taxa phylogenetic trees tree resolution
49	PATTERNS IN PHYLOGENETIC TREE BALANCE WITH VARIABLE AND EVOLVING SPECIATION RATES. Evolution 2017;50:2141-2148. [PMID: 28565665 DOI: 10.1111/j.1558-5646.1996.tb03604.x] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 01/31/1996] [Accepted: 04/17/1996] [Indexed: 11/30/2022] Abstract Aspects of phylogenetic tree shape, and in particular tree balance, provide clues to the workings of the macroevolutionary process. I use a simulation approach to explore patterns in tree balance for several models of the evolutionary process under which speciation rates vary through the history of diversifying clades. I demonstrate that when speciation rates depend on an evolving trait of individuals, and are therefore "heritable" along evolutionary lineages, the resulting phylogenies become imbalanced. However, imbalance also results from some (but not all) models of "nonheritable" speciation rate variation. The degree of imbalance increases with the magnitude of speciation rate variation, and then for gradual evolution (but not punctuated equilibria) reaches an asymptote short of the theoretical maximum. Very high levels of rate variation are required to produce imbalance matching that found in real data (estimated phylogenies from the systematic literature). I discuss implications of the simulation results for our understanding of macroevolution. Key Words Macroevolution phylogenetic trees punctuated equilibria speciation rates tree balance tree topology
50	PHYLOGENETIC ESTIMATION OF PLASMID EXCHANGE IN BACTERIA. Evolution 2017;46:641-656. [PMID: 28568654 DOI: 10.1111/j.1558-5646.1992.tb02072.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 08/03/1990] [Accepted: 10/09/1991] [Indexed: 11/29/2022] Abstract The existence of differential horizontal gene transfer may be assessed by comparing the phylogenetic trees derived from two different genes. We use this concept to estimate quantitatively the amount of plasmid exchange that has occurred in a bacterial population. By means of computer simulations we studied the effect of gene transfer on the topological distortion between two phylogenetic trees: one obtained from a euchromosomal gene and another from a plasmid-borne sequence, which may be subjected to horizontal transfer. The basic assumptions of our simulations were (a) that plasmid exchange had occurred recently (after the last population split); and (b) that either the amount of chromosomal horizontal exchange was negligible or that it was only a fraction of the amount of plasmid exchange, in which case we will be estimating relative amounts of plasmid transfer. We found that the topological difference between two such trees is a function of the number of plasmid exchange events that have occurred. It can be explained by a logistic model that relates the average distortion index between two trees (d_T) to the number of transfer events (x). The behavior remains the same under different conditions that were tested (symmetry of the topology, number of taxa in the tree, effect of reconstruction errors, mutation after plasmid transfer). We have also tried our method on empirical data from the literature and estimated the amount of gene transfer that may have occurred among Sym plasmids in agricultural field populations of Rhizobium leguminosarum biovar phaseoli. 
We found that between 15.77 and 29.98% of all genetic types in these populations have been either the source or the target of a plasmid transfer event. When the comparisons were made among trees derived exclusively from plasmid probes this value dropped to 2.00%. Phylogenetic trees derived from symbiotic and nonsymbiotic sequences were also used to infer the number of gene transfer events among 11 isolates from R. galegae. The estimated number of transfer events of symbiotic sequences was 10.515 (although we do not know out of how many genetic types). We concluded that intraspecific transfer of symbiotic sequences is widespread in these two species of the genus Rhizobium. Key Words Bacterial evolution Rhizobium computer simulation phylogenetic trees plasmid exchange