1
McConaughy S, Amundsen K, Hyten D. Effects of demographic history on recombination hotspots in soybean. Plant J 2024; 119:1030-1038. [PMID: 38781098 DOI: 10.1111/tpj.16814] [Received: 01/31/2024] [Revised: 04/27/2024] [Accepted: 05/03/2024] [Indexed: 05/25/2024]
Abstract
Recombination is the primary mechanism underlying genetic improvement in populations and allows plant breeders to create new allelic combinations for agronomic improvement. Soybean [Glycine max (L.) Merr.] has gone through multiple genetic bottlenecks that have significantly affected its genetic diversity and linkage disequilibrium and altered its allele frequencies. To investigate the impact of genetic bottlenecks on recombination hotspots in soybeans, historical recombination was studied in three soybean populations. The populations were wild soybean [Glycine soja (Sieb. and Zucc.)], landraces, and North American elite soybean cultivars that have been genotyped with the SoySNP50K BeadChip. While each population after a genetic bottleneck had an increased average haplotype block size, the number of hotspots did not differ significantly between populations. Instead, the increase in observed haplotype block size is likely due to an elimination of individuals that contained historical recombination at hotspots, which decreased the observed rate of recombination for the hotspot after each genetic bottleneck. Conversely, heterochromatic DNA, which has an increased haplotype block size compared to euchromatic DNA, had a significantly different number of hotspots but not a significant difference in the average hotspot recombination rate. Previously identified genomic motifs associated with hotspots were also associated with hotspots found in the historical populations, suggesting a common mechanism. This characterization of historical recombination hotspots in soybeans provides further insights into the effect that genetic bottlenecks and selection have on recombination hotspots.
Affiliation(s)
- Samantha McConaughy
  - Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, Nebraska, 68503, USA
- Keenan Amundsen
  - Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, Nebraska, 68503, USA
- David Hyten
  - Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, Nebraska, 68503, USA
  - Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, Nebraska, 68503, USA
2
Guzmán-Vargas L, Zabaleta-Ortega A, Guzmán-Sáenz A. Simplicial complex entropy for time series analysis. Sci Rep 2023; 13:22696. [PMID: 38123652 PMCID: PMC10733285 DOI: 10.1038/s41598-023-49958-6] [Received: 10/13/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023]
Abstract
The complex behavior of many systems in nature requires the application of robust methodologies capable of identifying changes in their dynamics. In the case of time series (which are sensed values of a system during a time interval), several methods have been proposed to evaluate their irregularity. However, for some types of dynamics such as stochastic and chaotic, new approaches are required that can provide a better characterization of them. In this paper we present the simplicial complex approximate entropy, which is based on the conditional probability of the occurrence of elements of a simplicial complex. Our results show that this entropy measure provides a wide range of values with details not easily identifiable with standard methods. In particular, we show that our method is able to quantify the irregularity in simulated random sequences and those from low-dimensional chaotic dynamics. Furthermore, it is possible to consistently differentiate cardiac interbeat sequences from healthy subjects and from patients with heart failure, as well as to identify changes between dynamical states of coupled chaotic maps. Our results highlight the importance of the structures revealed by the simplicial complexes, which holds promise for applications of this approach in various contexts.
Affiliation(s)
- Lev Guzmán-Vargas
  - Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas, Instituto Politécnico Nacional, 07340, Mexico City, Mexico
- Alvaro Zabaleta-Ortega
  - Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas, Instituto Politécnico Nacional, 07340, Mexico City, Mexico
- Aldo Guzmán-Sáenz
  - Topological Data Analysis in Genomics, Thomas J. Watson Research Center, Yorktown Heights, NY, USA
3
Migdałek G, Żelawski M. Measuring population-level plant gene flow with topological data analysis. Ecol Inform 2022. [DOI: 10.1016/j.ecoinf.2022.101740] [Indexed: 11/03/2022]
4
Vipond O, Bull JA, Macklin PS, Tillmann U, Pugh CW, Byrne HM, Harrington HA. Multiparameter persistent homology landscapes identify immune cell spatial patterns in tumors. Proc Natl Acad Sci U S A 2021; 118:e2102166118. [PMID: 34625491 PMCID: PMC8522280 DOI: 10.1073/pnas.2102166118] [Accepted: 08/24/2021] [Indexed: 12/29/2022]
Abstract
Highly resolved spatial data of complex systems encode rich and nonlinear information. Quantification of heterogeneous and noisy data (often with outliers, artifacts, and mislabeled points), such as those from tissues, remains a challenge. The mathematical field that extracts information from the shape of data, topological data analysis (TDA), has expanded its capability for analyzing real-world datasets in recent years by extending theory, statistics, and computation. An extension to the standard theory to handle heterogeneous data is multiparameter persistent homology (MPH). Here we provide an application of MPH landscapes, a statistical tool with theoretical underpinnings. MPH landscapes, computed for (noisy) data from agent-based model simulations of immune cells infiltrating into a spheroid, are shown to surpass existing spatial statistics and one-parameter persistent homology. We then apply MPH landscapes to study immune cell location in digital histology images from head and neck cancer. We quantify intratumoral immune cells and find that infiltrating regulatory T cells have more prominent voids in their spatial patterns than macrophages. Finally, we consider how TDA can integrate and interrogate data of different types and scales, e.g., immune cell locations and regions with differing levels of oxygenation. This work highlights the power of MPH landscapes for quantifying, characterizing, and comparing features within the tumor microenvironment in synthetic and real datasets.
Affiliation(s)
- Oliver Vipond
  - Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
- Joshua A Bull
  - Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
- Philip S Macklin
  - Nuffield Department of Medicine Research Building, University of Oxford, Oxford OX3 7FZ, United Kingdom
- Ulrike Tillmann
  - Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
- Christopher W Pugh
  - Nuffield Department of Medicine Research Building, University of Oxford, Oxford OX3 7FZ, United Kingdom
- Helen M Byrne
  - Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
- Heather A Harrington
  - Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, United Kingdom
5
Loughrey C, Fitzpatrick P, Orr N, Jurek-Loughrey A. The topology of data: Opportunities for cancer research. Bioinformatics 2021; 37:3091-3098. [PMID: 34320632 PMCID: PMC8504620 DOI: 10.1093/bioinformatics/btab553] [Received: 01/27/2021] [Revised: 06/14/2021] [Accepted: 07/28/2021] [Indexed: 01/20/2023]
Abstract
Motivation: Topological methods have recently emerged as a reliable and interpretable framework for extracting information from high-dimensional data, leading to the creation of a branch of applied mathematics called Topological Data Analysis (TDA). Since then, TDA has been progressively adopted in biomedical research. Biological data collection can result in enormous datasets, comprising thousands of features and spanning diverse datatypes. This presents a barrier to initial data analysis as the fundamental structure of the dataset becomes hidden, obstructing the discovery of important features and patterns. TDA provides a solution to obtain the underlying shape of datasets over continuous resolutions, corresponding to key topological features independent of noise. TDA has the potential to support future developments in healthcare as biomedical datasets rise in complexity and dimensionality. Previous applications extend across the fields of neuroscience, oncology, immunology and medical image analysis. TDA has been used to reveal hidden subgroups of cancer patients, construct organizational maps of brain activity and classify abnormal patterns in medical images. The utility of TDA is broad and to understand where current achievements lie, we have evaluated the present state of TDA in cancer data analysis.
Results: This article aims to provide an overview of TDA in Cancer Research. A brief introduction to the main concepts of TDA is provided to ensure that the article is accessible to readers who are not familiar with this field. Following this, a focussed literature review on the field is presented, discussing how TDA has been applied across heterogeneous datatypes for cancer research.
Affiliation(s)
- Ciara Loughrey
  - School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, BT9 5BN, United Kingdom
- Padraig Fitzpatrick
  - School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, BT9 5BN, United Kingdom
- Nick Orr
  - Patrick G Johnston Centre for Cancer Research, Queen's University Belfast, BT9 7AE, United Kingdom
- Anna Jurek-Loughrey
  - School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, BT9 5BN, United Kingdom
6
Abstract
Viral recombination is a major evolutionary mechanism driving adaptation processes, such as the ability to switch hosts. Understanding global patterns of recombination could help to identify underlying mechanisms and to evaluate the potential risks of rapid adaptation. Conventional approaches (e.g., those based on linkage disequilibrium) are computationally demanding or even intractable when sequence alignments include hundreds of sequences, common in viral data sets. We present a comprehensive analysis of recombination across 30 genomic alignments from viruses infecting humans. In order to scale the analysis and avoid the computational limitations of conventional approaches, we apply newly developed topological data analysis methods able to infer recombination rates for large data sets. We show that viruses, such as ZEBOV and MARV, consistently displayed low levels of recombination, whereas high levels of recombination were observed in Sarbecoviruses, HBV, HEV, Rhinovirus A, and HIV. We observe that recombination is more common in positive single-stranded RNA viruses than in negative single-stranded RNA ones. Interestingly, the comparison across multiple viruses suggests an inverse correlation between genome length and recombination rate. Positional analyses of recombination breakpoints along viral genomes, combined with our approach, detected at least 39 nonuniform patterns of recombination (i.e., cold or hotspots) in 18 viral groups. Among these, noteworthy hotspots are found in MERS-CoV and Sarbecoviruses (at spike, Nucleocapsid and ORF8). In summary, we have developed a fast pipeline to measure recombination that, combined with other approaches, has allowed us to find both common and lineage-specific patterns of recombination among viruses with potential relevance in viral adaptation.
Affiliation(s)
- Juan Ángel Patiño-Galindo
  - Program for Mathematical Genomics, Departments of Systems Biology and Biomedical Informatics, Columbia University, New York, NY, USA
- Ioan Filip
  - Program for Mathematical Genomics, Departments of Systems Biology and Biomedical Informatics, Columbia University, New York, NY, USA
- Raul Rabadan
  - Program for Mathematical Genomics, Departments of Systems Biology and Biomedical Informatics, Columbia University, New York, NY, USA
7
From molecules to populations: appreciating and estimating recombination rate variation. Nat Rev Genet 2020; 21:476-492. [DOI: 10.1038/s41576-020-0240-1] [Accepted: 04/15/2020] [Indexed: 02/07/2023]
8
Amézquita EJ, Quigley MY, Ophelders T, Munch E, Chitwood DH. The shape of things to come: Topological data analysis and biology, from molecules to organisms. Dev Dyn 2020; 249:816-833. [PMID: 32246730 PMCID: PMC7383827 DOI: 10.1002/dvdy.175] [Received: 02/08/2020] [Revised: 03/29/2020] [Accepted: 03/29/2020] [Indexed: 11/11/2022]
Abstract
Shape is data and data is shape. Biologists are accustomed to thinking about how the shape of biomolecules, cells, tissues, and organisms arises from the effects of genetics, development, and the environment. Less often do we consider that data itself has shape and structure, or that it is possible to measure the shape of data and analyze it. Here, we review applications of topological data analysis (TDA) to biology in a way accessible to biologists and applied mathematicians alike. TDA uses principles from algebraic topology to comprehensively measure shape in data sets. Using a function that relates the similarity of data points to each other, we can monitor the evolution of topological features: connected components, loops, and voids. This evolution, a topological signature, concisely summarizes large, complex data sets. We first provide a TDA primer for biologists before exploring the use of TDA across biological sub-disciplines, spanning structural biology, molecular biology, evolution, and development. We end by comparing and contrasting different TDA approaches and the potential for their use in biology. The vision of TDA, that data are shape and shape is data, will be relevant as biology transitions into a data-driven era where the meaningful interpretation of large data sets is a limiting factor.
Affiliation(s)
- Erik J Amézquita
  - Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, Michigan, USA
- Michelle Y Quigley
  - Department of Horticulture, Michigan State University, East Lansing, Michigan, USA
- Tim Ophelders
  - Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, Michigan, USA
- Elizabeth Munch
  - Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, Michigan, USA
  - Department of Mathematics, Michigan State University, East Lansing, Michigan, USA
- Daniel H Chitwood
  - Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, Michigan, USA
  - Department of Horticulture, Michigan State University, East Lansing, Michigan, USA