1
Somarelli JA, Ware KE, Kostadinov R, Robinson JM, Amri H, Abu-Asab M, Fourie N, Diogo R, Swofford D, Townsend JP. PhyloOncology: Understanding cancer through phylogenetic analysis. Biochim Biophys Acta Rev Cancer 2016; 1867:101-108. [PMID: 27810337 DOI: 10.1016/j.bbcan.2016.10.006] [Received: 07/20/2016] [Revised: 10/14/2016] [Accepted: 10/26/2016] [Indexed: 11/30/2022]
Abstract
Despite decades of research and an enormity of resultant data, cancer remains a significant public health problem. New tools and fresh perspectives are needed to obtain fundamental insights, to develop better prognostic and predictive tools, and to identify improved therapeutic interventions. With increasingly common genome-scale data, one suite of algorithms and concepts with potential to shed light on cancer biology is phylogenetics, a scientific discipline used in diverse fields. From grouping subsets of cancer samples to tracing subclonal evolution during cancer progression and metastasis, the use of phylogenetics is a powerful systems biology approach. Well-developed phylogenetic applications provide fast, robust approaches to analyze high-dimensional, heterogeneous cancer data sets. This article is part of a Special Issue entitled: Evolutionary principles - heterogeneity in cancer?, edited by Dr. Robert A. Gatenby.
Affiliation(s)
- Jason A Somarelli
  - Duke Cancer Institute and the Department of Medicine, Duke University Medical Center, Durham, NC 27710, United States
- Kathryn E Ware
  - Duke Cancer Institute and the Department of Medicine, Duke University Medical Center, Durham, NC 27710, United States
- Rumen Kostadinov
  - Pediatric Oncology, School of Medicine, Johns Hopkins University, United States
- Jeffrey M Robinson
  - Anatomy Department, College of Medicine, Howard University, Washington, DC 20059, United States; Digestive Disorders Unit, National Institute of Nursing Research, NIH, Bethesda, MD 20892, United States
- Hakima Amri
  - Department of Biochemistry and Cellular and Molecular Biology, Georgetown University Medical Center, Washington, DC 20007, United States
- Mones Abu-Asab
  - Section of Ultrastructural Biology, National Eye Institute, NIH, Bethesda, MD 20892, United States
- Nicolaas Fourie
  - Digestive Disorders Unit, National Institute of Nursing Research, NIH, Bethesda, MD 20892, United States
- Rui Diogo
  - Anatomy Department, College of Medicine, Howard University, Washington, DC 20059, United States
- David Swofford
  - Department of Biology, Duke University Trinity College of Arts and Sciences, Durham, NC 27710, United States
- Jeffrey P Townsend
  - Department of Biostatistics, Yale University, United States; Department of Ecology and Evolutionary Biology, Yale University, United States; Department of Program in Computational Biology and Bioinformatics, Yale University, United States
2
Kurt Lienau E, DeSalle R, Allard M, Brown EW, Swofford D, Rosenfeld JA, Sarkar IN, Planet PJ. The mega-matrix tree of life: using genome-scale horizontal gene transfer and sequence evolution data as information about the vertical history of life. Cladistics 2011; 27:417-427. [PMID: 34875790 DOI: 10.1111/j.1096-0031.2010.00337.x] [Indexed: 11/29/2022]
Abstract
Because horizontal gene transfer can confound the recovery of the largely prokaryotic tree of life (ToL), most genome-based techniques seek to eliminate horizontal signal from ToL analyses, commonly by sieving out incongruent genes and data. This approach greatly limits the number of gene families analysed to a subset thought to be representative of vertical evolutionary history. However, formalized tests have not been performed to determine whether combining the massive amounts of information available in fully sequenced genomes can recover a reasonable ToL. Consequently, we used empirically defined gene homology definitions from a previous study that delineate xenologous gene families (gene families derived from a common transfer event) to generate a massively concatenated, combined-data ToL matrix derived from 323 404 translated open reading frames arranged into 12 381 gene homologue groups coded as amino acid data and 63 336, 64 105, 65 153, 66 922 and 67 109 gene homologue groups coded as gene presence/absence data for 166 fully sequenced genomes. This whole-genome gene presence/absence and amino acid sequence ToL data matrix is composed of 4 867 184 characters (a combined data-type mega-matrix). Phylogenetic analysis of this mega-matrix yielded a fully resolved ToL that classifies all three commonly accepted domains of life as monophyletic and groups most taxa in traditionally recognized locations with high support. Most importantly, these results corroborate the existence of a common evolutionary history for these taxa present in both data types that is evident only when these data are analysed in combination. © The Willi Hennig Society 2010.
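As a toy illustration of the combined data-type idea in the abstract above, the following sketch concatenates amino acid characters with gene presence/absence characters into one row per taxon. It is not the authors' pipeline: the function name `build_combined_matrix`, the input layout, and the use of `?` for missing data are assumptions made for this example.

```python
def build_combined_matrix(alignments, taxa):
    """Concatenate sequence and presence/absence characters per taxon.

    alignments: dict mapping gene name -> dict of taxon -> aligned sequence
                (hypothetical layout; all sequences of a gene share one length).
    taxa: ordered list of taxon names.
    """
    rows = {taxon: [] for taxon in taxa}
    genes = sorted(alignments)

    # Partition 1: amino acid characters, padded with "?" where a gene is absent.
    for gene in genes:
        alignment = alignments[gene]
        length = len(next(iter(alignment.values())))
        for taxon in taxa:
            rows[taxon].append(alignment.get(taxon, "?" * length))

    # Partition 2: one binary presence/absence character per gene group.
    for gene in genes:
        for taxon in taxa:
            rows[taxon].append("1" if taxon in alignments[gene] else "0")

    return {taxon: "".join(parts) for taxon, parts in rows.items()}


toy = {
    "geneA": {"t1": "MK", "t2": "ML"},
    "geneB": {"t1": "G"},
}
matrix = build_combined_matrix(toy, ["t1", "t2", "t3"])
```

Each row has the same length, so the result could in principle be written out as a single character matrix for a tree-search program; how partitions are weighted or modelled is left open here.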
Affiliation(s)
- E Kurt Lienau
  - Sackler Institute for Comparative Genomics, American Museum of Natural History, Central Park West at 79th St, New York, NY 10024, USA; Department of Biology, Graduate School of Arts and Science, New York University, 100 Washington Square East, New York, NY 10003, USA; Division of Microbiology, Center for Food Safety and Nutrition, Food and Drug Administration, 5100 Paint Branch Parkway, College Park, MD 20740, USA
- Rob DeSalle
  - Sackler Institute for Comparative Genomics, American Museum of Natural History, Central Park West at 79th St, New York, NY 10024, USA
- Marc Allard
  - Division of Microbiology, Center for Food Safety and Nutrition, Food and Drug Administration, 5100 Paint Branch Parkway, College Park, MD 20740, USA
- Eric W Brown
  - Division of Microbiology, Center for Food Safety and Nutrition, Food and Drug Administration, 5100 Paint Branch Parkway, College Park, MD 20740, USA
- David Swofford
  - Duke Institute for Genomes and Science Policy, 366 BioSci, Duke University, Durham, NC 27708, USA
- Jeffrey A Rosenfeld
  - Department of Biology, Graduate School of Arts and Science, New York University, 100 Washington Square East, New York, NY 10003, USA
- Indra N Sarkar
  - Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA
- Paul J Planet
  - Sackler Institute for Comparative Genomics, American Museum of Natural History, Central Park West at 79th St, New York, NY 10024, USA; Department of Pediatrics, Children's Hospital of New York, Columbia University, College of Physicians and Surgeons, New York, NY 10032, USA
3
Abstract
This unit provides a general description of reconstructing evolutionary trees using PAUP* 4.0. The protocol takes users through an example analysis of mitochondrial DNA sequence data using the parsimony and the likelihood criteria to infer optimal trees. The protocol also discusses searching options available in PAUP* and demonstrates how to import non-NEXUS formats. Finally, a general discussion is given regarding the pros and cons of the "model-free" and "model-based" methods used throughout the protocol.
4
Abstract
The molecular clock hypothesis remains an important conceptual and analytical tool in evolutionary biology despite the repeated observation that the clock hypothesis does not perfectly explain observed DNA sequence variation. We introduce a parametric model that relaxes the molecular clock by allowing rates to vary across lineages according to a compound Poisson process. Events of substitution rate change are placed onto a phylogenetic tree according to a Poisson process. When an event of substitution rate change occurs, the current rate of substitution is modified by a gamma-distributed random variable. Parameters of the model can be estimated using Bayesian inference. We use Markov chain Monte Carlo integration to evaluate the posterior probability distribution because the posterior probability involves high dimensional integrals and summations. Specifically, we use the Metropolis-Hastings-Green algorithm with 11 different move types to evaluate the posterior distribution. We demonstrate the method by analyzing a complete mtDNA sequence data set from 23 mammals. The model presented here has several potential advantages over other models that have been proposed to relax the clock because it is parametric and does not assume that rates change only at speciation events. This model should prove useful for estimating divergence times when substitution rates vary across lineages.
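The rate-change process described in the abstract above can be sketched for a single branch as follows. This is a minimal simulation under stated assumptions, not the authors' implementation or their MCMC inference: it assumes that "modified by" means the current rate is multiplied by the gamma draw, and the names `simulate_rate_changes`, `event_rate`, `shape` and `scale` are hypothetical.

```python
import random


def simulate_rate_changes(branch_length, initial_rate, event_rate, shape, scale, rng):
    """Simulate substitution-rate changes along one branch.

    Rate-change events arrive as a Poisson process with intensity `event_rate`;
    at each event the current rate is multiplied by a Gamma(shape, scale) draw
    (assumed interpretation). Returns a list of (start, end, rate) segments.
    """
    segments = []
    position = 0.0
    rate = initial_rate
    while True:
        # Waiting times between events of a Poisson process are exponential.
        wait = rng.expovariate(event_rate)
        if position + wait >= branch_length:
            segments.append((position, branch_length, rate))
            break
        segments.append((position, position + wait, rate))
        position += wait
        # Assumed form of the rate modification: multiplicative gamma factor.
        rate *= rng.gammavariate(shape, scale)
    return segments


def expected_substitutions(segments):
    """Sum of rate x duration over all segments of the branch."""
    return sum((end - start) * rate for start, end, rate in segments)


rng = random.Random(1)
segments = simulate_rate_changes(2.0, 1.0, 3.0, 2.0, 0.5, rng)
total = expected_substitutions(segments)
```

With `shape * scale = 1` the multiplier has mean one, which is one simple way to keep the rate from drifting systematically; other parameterizations are equally possible.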
Affiliation(s)
- J P Huelsenbeck
  - Department of Biology, University of Rochester, New York 14627, USA