1. Kang Q, Schardl CL, Moore N, Yoshida R. CURatio: Genome-wide phylogenomic analysis method using ratios of total branch lengths. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 17. [PMID: 30387738 PMCID: PMC7372714 DOI: 10.1109/tcbb.2018.2878564]
Abstract
Evolutionary hypotheses provide important underpinnings of biological and medical sciences, and a comprehensive, genome-wide understanding of evolutionary relationships among organisms is needed to test and refine such hypotheses. Theory and empirical evidence clearly indicate that phylogenies (trees) of different genes (loci) should not display precisely matching topologies. The main reason for such phylogenetic incongruence is the reticulated evolutionary history of most species, due to meiotic sexual recombination in eukaryotes or horizontal transfers of genetic material in prokaryotes. Nevertheless, many genes should display topologically related phylogenies, and should group into one or more (for genetic hybrids) clusters in poly-dimensional "tree space". Unusual evolutionary histories or effects of selection may result in "outlier" genes with phylogenies that fall outside the main distribution(s) of trees in tree space. We present a new phylogenomic method, CURatio, which uses ratios of total branch lengths in gene trees to help identify phylogenetic outliers in a given set of ortholog groups from multiple genomes. An advantage of CURatio over other methods is that genes absent from and/or duplicated in some genomes can be included in the analysis. We conducted a simulation study under the coalescent model and showed that, given sufficient species depth and topological difference, these ratios are significantly higher for the "outlier" gene phylogenies. We also applied CURatio to a set of annotated genomes of the fungal family Clavicipitaceae and identified alkaloid biosynthesis genes as outliers, probably due to a history of duplication and loss. The source code is available at https://github.com/QiwenKang/CURatio, and the empirical data set on Clavicipitaceae and the simulated data set are available at Mendeley: https://data.mendeley.com/datasets/mrxts7wjrr/1.
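As a rough illustration of the quantity the abstract refers to, the sketch below sums the branch lengths of gene trees written in Newick format and forms one ratio per tree. The abstract does not say which two totals are compared, so dividing by the median total across the set is purely an assumption made here for illustration; it is not taken from the CURatio source code, and the function names are hypothetical.

```python
import re
import statistics

# Matches the number after each ':' in a Newick string (a branch length).
# Assumption: labels are unquoted and contain no ':' characters.
_BRANCH_LENGTH = re.compile(r":\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)")


def total_branch_length(newick: str) -> float:
    """Sum every branch length found in a Newick tree string."""
    return sum(float(m) for m in _BRANCH_LENGTH.findall(newick))


def length_ratios(gene_trees: dict[str, str]) -> dict[str, float]:
    """Ratio of each tree's total branch length to the median total.

    The median reference is an illustrative assumption, not CURatio's
    definition of the ratio.
    """
    totals = {gene: total_branch_length(t) for gene, t in gene_trees.items()}
    reference = statistics.median(totals.values())
    return {gene: total / reference for gene, total in totals.items()}


# Made-up toy trees, not data from the paper.
trees = {
    "geneA": "((A:0.1,B:0.2):0.05,C:0.3);",
    "geneB": "((A:0.2,B:0.2):0.10,C:0.4);",
    "geneC": "((A:2.0,B:1.5):0.50,C:3.0);",  # much longer tree overall
}
for gene, ratio in length_ratios(trees).items():
    print(f"{gene}: {ratio:.2f}")
```

A tree whose ratio is far above the others would be a candidate for closer inspection; deciding what counts as "far" would need the significance procedure described in the paper itself.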
2. Rácz A, Andrić F, Bajusz D, Héberger K. Binary similarity measures for fingerprint analysis of qualitative metabolomic profiles. Metabolomics 2018; 14:29. [PMID: 29568246 PMCID: PMC5846857 DOI: 10.1007/s11306-018-1327-y] [Received: 12/01/2017] [Accepted: 01/18/2018]
Abstract
INTRODUCTION: Contemporary metabolomic fingerprinting is based on multiple spectrometric and chromatographic signals, used either alone or combined with structural and chemical information of metabolic markers at the qualitative and semiquantitative level. However, signal shifting, convolution, and matrix effects may compromise metabolomic patterns. The recent increase in the use of qualitative metabolomic data, described by the presence (1) or absence (0) of particular metabolites, demonstrates great potential in the field of metabolomic profiling and fingerprint analysis.
OBJECTIVES: The aim of this study is a comprehensive evaluation of binary similarity measures for the elucidation of patterns among samples of different botanical origin and various metabolomic profiles.
METHODS: Nine qualitative metabolomic data sets covering a wide range of natural products and metabolomic profiles were used to assess 44 binary similarity measures for the fingerprinting of plant extracts and natural products. The measures were analyzed by the novel sum of ranking differences (SRD) method, searching for the most promising candidates.
RESULTS: The Baroni-Urbani-Buser (BUB) and Hawkins-Dotson (HD) similarity coefficients were selected as the best measures by SRD and analysis of variance (ANOVA), while Dice (Di1), Yule, Russel-Rao, and Consonni-Todeschini 3 ranked the worst. ANOVA revealed that concordantly and intermediately symmetric similarity coefficients are better candidates for metabolomic fingerprinting than the asymmetric and correlation-based ones. Fingerprint analysis based on the BUB and HD coefficients and qualitative metabolomic data performed as well as quantitative metabolomic profile analysis.
CONCLUSION: Fingerprint analysis based on qualitative metabolomic profiles and binary similarity measures proved to be a reliable way of finding the same or similar patterns in metabolomic data as those extracted from quantitative data.
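To make the presence/absence comparison concrete, the sketch below computes two of the coefficients named in the abstract for a pair of binary metabolite profiles. The formulas are the commonly cited textbook definitions in terms of the 2x2 counts a, b, c, d; the abstract itself gives no formulas, so treat them, the function names, and the toy profiles as assumptions rather than the paper's exact implementation.

```python
import math


def contingency(x, y):
    """Count a (1,1), b (1,0), c (0,1) and d (0,0) positions of two 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    return a, b, c, d


def baroni_urbani_buser(x, y):
    """(sqrt(a*d) + a) / (sqrt(a*d) + a + b + c); counts shared absences via d."""
    a, b, c, d = contingency(x, y)
    root = math.sqrt(a * d)
    denom = root + a + b + c
    # Undefined when no metabolite is present in either profile.
    return (root + a) / denom if denom else math.nan


def russell_rao(x, y):
    """a / (a + b + c + d); shared absences only enlarge the denominator."""
    a, b, c, d = contingency(x, y)
    n = a + b + c + d
    return a / n if n else math.nan


# Toy profiles: 1 = metabolite detected, 0 = not detected (made-up values).
sample_1 = [1, 1, 0, 0, 1, 0]
sample_2 = [1, 0, 0, 0, 1, 1]
print(baroni_urbani_buser(sample_1, sample_2))
print(russell_rao(sample_1, sample_2))
```

The two functions differ mainly in how they treat metabolites absent from both samples, which is one way coefficients of this kind can end up ranking differently on the same data.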
Affiliation(s)
- Anita Rácz: Plasma Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudósok krt. 2, Budapest, 1117, Hungary
- Filip Andrić: Department of Analytical Chemistry, University of Belgrade - Faculty of Chemistry, Studentski trg. 12-16, 11000, Belgrade, Serbia
- Dávid Bajusz: Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudósok krt. 2, Budapest, 1117, Hungary
- Károly Héberger: Plasma Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudósok krt. 2, Budapest, 1117, Hungary
3. Weyenberg G, Huggins PM, Schardl CL, Howe DK, Yoshida R. kdetrees: Non-parametric estimation of phylogenetic tree distributions. Bioinformatics 2014; 30:2280-7. [PMID: 24764459 PMCID: PMC4176058 DOI: 10.1093/bioinformatics/btu258] [Received: 02/27/2014] [Revised: 04/04/2014] [Accepted: 04/22/2014]
Abstract
MOTIVATION: Although the majority of gene histories found in a clade of organisms are expected to be generated by a common process (e.g. the coalescent process), it is well known that numerous other coexisting processes (e.g. horizontal gene transfers, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a history distinct from those of the majority of genes. Such 'outlying' gene trees are considered to be biologically interesting, and identifying these genes has become an important problem in phylogenetics.
RESULTS: We propose and implement kdetrees, a non-parametric method for estimating distributions of phylogenetic trees, with the goal of identifying trees that are significantly different from the rest of the trees in the sample. Our method compares favorably with a similar recently published method, featuring an improvement of one polynomial order of computational complexity (to quadratic in the number of trees analyzed), with simulation studies suggesting only a small penalty to classification accuracy. Application of kdetrees to a set of Apicomplexa genes identified several unreliable sequence alignments that had escaped previous detection, as well as a gene independently reported as a possible case of horizontal gene transfer. We also analyze a set of genes from Epichloë, fungi symbiotic with grasses, successfully identifying a contrived instance of paralogy.
AVAILABILITY AND IMPLEMENTATION: Our method for estimating tree distributions and identifying outlying trees is implemented as the R package kdetrees and is available for download from CRAN.
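The general idea of flagging low-density items from pairwise distances can be sketched as follows. This is a Python illustration only and not the kdetrees R package: the Gaussian kernel, the fixed bandwidth, the Q1 - k * IQR cutoff, and the function names are all assumptions chosen for the example, and points on a line stand in for trees because computing a real tree distance is outside the scope of the sketch.

```python
import math
import statistics


def kde_scores(dist, bandwidth=1.0):
    """Leave-one-out Gaussian kernel score for each item.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    The double loop is quadratic in the number of items.
    """
    n = len(dist)
    scores = []
    for i in range(n):
        total = sum(
            math.exp(-0.5 * (dist[i][j] / bandwidth) ** 2)
            for j in range(n)
            if j != i
        )
        scores.append(total / (n - 1))
    return scores


def flag_low_density(scores, k=1.5):
    """Indices whose score falls below Q1 - k * IQR (an assumed cutoff rule)."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    cutoff = q1 - k * (q3 - q1)
    return [i for i, s in enumerate(scores) if s < cutoff]


# Toy data: ten nearby points plus one distant point at index 10.
positions = [0.1 * i for i in range(10)] + [10.0]
dist = [[abs(p - q) for q in positions] for p in positions]
print(flag_low_density(kde_scores(dist)))
```

Each item's score depends on its distance to every other item, which is where the quadratic cost in the number of items comes from.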
Affiliation(s)
- Grady Weyenberg, Peter M Huggins, Christopher L Schardl, Daniel K Howe, Ruriko Yoshida
- Department of Statistics, University of Kentucky, Lexington, KY 40536; Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213; Plant Pathology Department and Department of Veterinary Science, University of Kentucky, Lexington, KY 40546, USA (listed jointly for all five authors)