1
Mai U, Charvel E, Mirarab S. Expectation-Maximization enables Phylogenetic Dating under a Categorical Rate Model. Syst Biol 2024; 73:823-838. [PMID: 38970346 PMCID: PMC11524793 DOI: 10.1093/sysbio/syae034] [Received: 11/09/2022] [Revised: 06/13/2024] [Accepted: 07/03/2024] [Indexed: 07/08/2024] Open
Abstract
Dating phylogenetic trees to obtain branch lengths in time units is essential for many downstream applications but has remained challenging. Dating requires inferring substitution rates that can change across the tree. While we can assume to have information about a small subset of nodes from the fossil record or sampling times (for fast-evolving organisms), inferring the ages of the other nodes essentially requires extrapolation and interpolation. Assuming a distribution of branch rates, we can formulate dating as a constrained maximum likelihood (ML) estimation problem. While ML dating methods exist, their accuracy degrades in the face of model misspecification, where the assumed parametric statistical distribution of branch rates vastly differs from the true distribution. Notably, most existing methods assume rigid, often unimodal, branch rate distributions. A second challenge is that the likelihood function involves an integral over the continuous domain of the rates, often leading to difficult non-convex optimization problems. To tackle both challenges, we propose a new method called Molecular Dating using Categorical-models (MD-Cat). MD-Cat uses a categorical model of rates inspired by non-parametric statistics and can approximate a large family of models by discretizing the rate distribution into k categories. Under this model, we can use the Expectation-Maximization algorithm to co-estimate rate categories and branch lengths in time units. Our model has fewer assumptions about the true distribution of branch rates than parametric models such as Gamma or LogNormal distribution. Our results on two simulated and real datasets of Angiosperms and HIV and a wide selection of rate distributions show that MD-Cat is often more accurate than the alternatives, especially on datasets with exponential or multimodal rate distributions.
Affiliation(s)
- Uyen Mai
  - Department of Computer Science and Engineering, UC San Diego, CA 92093, USA
- Eduardo Charvel
  - Bioinformatics and Systems Biology Graduate Program, UC San Diego, CA 92093, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, UC San Diego, CA 92093, USA
2
Mello B, Schrago CG. Modeling Substitution Rate Evolution across Lineages and Relaxing the Molecular Clock. Genome Biol Evol 2024; 16:evae199. [PMID: 39332907 PMCID: PMC11430275 DOI: 10.1093/gbe/evae199] [Accepted: 09/08/2024] [Indexed: 09/29/2024] Open
Abstract
Relaxing the molecular clock using models of how substitution rates change across lineages has become essential for addressing evolutionary problems. The diversity of rate evolution models and their implementations are substantial, and studies have demonstrated their impact on divergence time estimates can be as significant as that of calibration information. In this review, we trace the development of rate evolution models from the proposal of the molecular clock concept to the development of sophisticated Bayesian and non-Bayesian methods that handle rate variation in phylogenies. We discuss the various approaches to modeling rate evolution, provide a comprehensive list of available software, and examine the challenges and advancements of the prevalent Bayesian framework, contrasting them to faster non-Bayesian methods. Lastly, we offer insights into potential advancements in the field in the era of big data.
Affiliation(s)
- Beatriz Mello
  - Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, RJ 21941-617, Brazil
- Carlos G Schrago
  - Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, RJ 21941-617, Brazil
3
Mai U, Hu G, Raphael BJ. Maximum likelihood phylogeographic inference of cell motility and cell division from spatial lineage tracing data. Bioinformatics 2024; 40:i228-i236. [PMID: 38940146 PMCID: PMC11211844 DOI: 10.1093/bioinformatics/btae221] [Indexed: 06/29/2024] Open
Abstract
MOTIVATION: Recently developed spatial lineage tracing technologies induce somatic mutations at specific genomic loci in a population of growing cells and then measure these mutations in the sampled cells along with the physical locations of the cells. These technologies enable high-throughput studies of developmental processes over space and time. However, these applications rely on accurate reconstruction of a spatial cell lineage tree describing both past cell divisions and cell locations. Spatial lineage trees are related to phylogeographic models that have been well-studied in the phylogenetics literature. We demonstrate that standard phylogeographic models based on Brownian motion are inadequate to describe the spatial symmetric displacement (SD) of cells during cell division. RESULTS: We introduce a new model, the SD model for cell motility, that includes symmetric displacements of daughter cells from the parental cell followed by independent diffusion of daughter cells. We show that this model more accurately describes the locations of cells in a real spatial lineage tracing of mouse embryonic stem cells. Combining the spatial SD model with an evolutionary model of DNA mutations, we obtain a phylogeographic model for spatial lineage tracing. Using this model, we devise a maximum likelihood framework, MOLLUSC (Maximum Likelihood Estimation Of Lineage and Location Using Single-Cell Spatial Lineage tracing Data), to co-estimate time-resolved branch lengths, spatial diffusion rate, and mutation rate. On both simulated and real data, we show that MOLLUSC accurately estimates all parameters. In contrast, the Brownian motion model overestimates spatial diffusion rate in all test cases. In addition, the inclusion of spatial information improves accuracy of branch length estimation compared to sequence data alone. On real data, we show that spatial information has more signal than sequence data for branch length estimation, suggesting that augmenting lineage tracing technologies with spatial information is useful to overcome the limitations of genome-editing in developmental systems. AVAILABILITY AND IMPLEMENTATION: The Python implementation of MOLLUSC is available at https://github.com/raphael-group/MOLLUSC.
Affiliation(s)
- Uyen Mai
  - Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540, USA
- Gary Hu
  - Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540, USA
- Benjamin J Raphael
  - Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540, USA
4
Arasti S, Tabaghi P, Tabatabaee Y, Mirarab S. Branch Length Transforms using Optimal Tree Metric Matching. bioRxiv: the preprint server for biology 2024:2023.11.13.566962. [PMID: 38746464 PMCID: PMC11092445 DOI: 10.1101/2023.11.13.566962] [Indexed: 05/16/2024]
Abstract
The abundant discordance between evolutionary relationships across the genome has rekindled interest in ways of comparing and averaging trees on a shared leaf set. However, most attempts at reconciling trees have focused on tree topology, producing metrics for comparing topologies and methods for computing median tree topologies. Using branch lengths, however, has been more elusive, due to several challenges. Species tree branch lengths can be measured in many units, often different from gene trees. Moreover, rates of evolution change across the genome, the species tree, and specific branches of gene trees. These factors compound the stochasticity of coalescence times. Thus, branch lengths are highly heterogeneous across both the genome and the tree. For many downstream applications in phylogenomic analyses, branch lengths are as important as the topology, and yet, existing tools to compare and combine weighted trees are limited. In this paper, we make progress on the question of mapping one tree to another, incorporating both topology and branch length. We define a series of computational problems to formalize finding the best transformation of one tree to another while maintaining its topology and other constraints. We show that all these problems can be solved in quadratic time and memory using a linear algebraic formulation coupled with dynamic programming preprocessing. Our formulations lead to convex optimization problems, with efficient and theoretically optimal solutions. While many applications can be imagined for this framework, we apply it to measure species tree branch lengths in the unit of the expected number of substitutions per site while allowing divergence from ultrametricity across the tree. In these applications, our method matches or surpasses other methods designed directly for solving those problems. Thus, our approach provides a versatile toolkit that finds applications in similar evolutionary questions. 
Code availability: The software is available at https://github.com/shayesteh99/TCMM.git. Data availability: Data are available on GitHub at https://github.com/shayesteh99/TCMM-Data.git.
5
Wang Z, Wang Y, Ji Y, Yang Z, Pei Y, Dai J, Zhang Y, Zhou F. Hypoconnectivity of the Amygdala in Patients with Low-Back-Related Leg Pain Linked to Individual Mechanical Pain Sensitivity: A Resting-State Functional MRI Study. J Pain Res 2023; 16:3775-3784. [PMID: 38026465 PMCID: PMC10640821 DOI: 10.2147/jpr.s425874] [Received: 06/14/2023] [Accepted: 10/16/2023] [Indexed: 12/01/2023] Open
Abstract
Purpose: To explore resting-state functional connectivity (rsFC) of the amygdala in patients with low-back-related leg pain (LBLP). Patients and Methods: For this prospective study, a total of 35 LBLP patients and 30 healthy controls (HCs) were included and underwent functional MRI and clinical assessments. Then, patients with LBLP were divided into acute LBLP (aLBLP) and chronic LBLP (cLBLP) subgroups. We further evaluated the between-group rsFC differences using left and right amygdala seeds in a whole-brain voxel analysis strategy. Finally, we performed correlation analysis between the rsFC values of altered regions and clinical indices. Results: Compared to HCs, hypoconnectivity of the amygdala was observed in LBLP patients (P < 0.01, with correction). The amygdala's rsFC pattern differed between aLBLP and cLBLP patients: amygdala FC was decreased to the right putamen, the right paracentral lobule (PCL), or the right posterior temporal lobe in aLBLP patients, and from the right amygdala to the bilateral anterior cingulate cortex (ACC) and the left postcentral gyrus (PoCG) in cLBLP patients. Correlation analysis showed that lower rsFC of the left amygdala to the right PCL was correlated with the von Frey filament (vF) test values of the left lumbar (p = 0.025) and right lumbar (p = 0.019) regions; rsFC of the right amygdala to the left PoCG was correlated with lower vF test values of the left lumbar (p = 0.017) and right lumbar spine (p = 0.003); rsFC to the right PoCG was correlated with the calf (p = 0.015); and rsFC of the right amygdala to the bilateral ACC was negatively correlated with the pain rating index (p = 0.003). Conclusion: LBLP patients showed amygdala hypoconnectivity, and the altered pattern of amygdala rsFC differed between the acute and chronic phases. Moreover, the amygdala hypoconnectivity was related to individual mechanical sensitivity (vF test) in LBLP patients.
Affiliation(s)
- Ziyun Wang
  - Department of Radiology, The First Affiliated Hospital, Nanchang University, Nanchang, 330006, People’s Republic of China
  - Neuroradiology Laboratory, Jiangxi Province Medical Imaging Research Institute, Nanchang, 330006, People’s Republic of China
- Yao Wang
  - Department of Radiology, The First Affiliated Hospital, Nanchang University, Nanchang, 330006, People’s Republic of China
  - Neuroradiology Laboratory, Jiangxi Province Medical Imaging Research Institute, Nanchang, 330006, People’s Republic of China
- Yuqi Ji
  - Department of Radiology, The First Affiliated Hospital, Nanchang University, Nanchang, 330006, People’s Republic of China
  - Neuroradiology Laboratory, Jiangxi Province Medical Imaging Research Institute, Nanchang, 330006, People’s Republic of China
- Ziwei Yang
  - Department of Radiology, The First Affiliated Hospital, Nanchang University, Nanchang, 330006, People’s Republic of China
  - Neuroradiology Laboratory, Jiangxi Province Medical Imaging Research Institute, Nanchang, 330006, People’s Republic of China
- Yixiu Pei
  - Department of Radiology, The Affiliated Ganzhou Hospital of Nanchang University, Ganzhou, Jiangxi, 341000, People’s Republic of China
- Jiankun Dai
  - MR Advanced Application, GE Healthcare, Beijing, 100176, People’s Republic of China
- Yong Zhang
  - Department of Pain Clinic, The First Affiliated Hospital, Nanchang University, Nanchang, Jiangxi Province, 330006, People’s Republic of China
- Fuqing Zhou
  - Department of Radiology, The First Affiliated Hospital, Nanchang University, Nanchang, 330006, People’s Republic of China
  - Neuroradiology Laboratory, Jiangxi Province Medical Imaging Research Institute, Nanchang, 330006, People’s Republic of China
6
Winther RG, Willerslev E. Wilson and Sarich (1969): The birth of a molecular evolution research paradigm. Proc Natl Acad Sci U S A 2023; 120:e2220473120. [PMID: 36893264 PMCID: PMC10243126 DOI: 10.1073/pnas.2220473120] [Indexed: 03/11/2023] Open
Affiliation(s)
- Rasmus Grønfeldt Winther
  - Humanities Division, University of California, Santa Cruz, CA 95064
  - GeoGenetics Section, Globe Institute, University of Copenhagen, 1350 Copenhagen K, Denmark
- Eske Willerslev
  - GeoGenetics Section, Globe Institute, University of Copenhagen, 1350 Copenhagen K, Denmark
  - Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, United Kingdom