1
Li X, Zhu Q, Zhao C, Qian X, Zhang X, Duan X, Lin W. Tipping Point Detection Using Reservoir Computing. RESEARCH (WASHINGTON, D.C.) 2023; 6:0174. [PMID: 37404384 PMCID: PMC10317016 DOI: 10.34133/research.0174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2023] [Accepted: 05/26/2023] [Indexed: 07/06/2023]
Abstract
Detection in high fidelity of tipping points, the emergence of which is often induced by invisible changes in internal structures or/and external interferences, is paramountly beneficial to understanding and predicting complex dynamical systems (CDSs). Detection approaches, which have been fruitfully developed from several perspectives (e.g., statistics, dynamics, and machine learning), have their own advantages but still encounter difficulties in the face of high-dimensional, fluctuating datasets. Here, using the reservoir computing (RC), a recently notable, resource-conserving machine learning method for reconstructing and predicting CDSs, we articulate a model-free framework to accomplish the detection only using the time series observationally recorded from the underlying unknown CDSs. Specifically, we encode the information of the CDS in consecutive time durations of finite length into the weights of the readout layer in an RC, and then we use the learned weights as the dynamical features and establish a mapping from these features to the system's changes. Our designed framework can not only efficiently detect the changing positions of the system but also accurately predict the intensity change as the intensity information is available in the training data. We demonstrate the efficacy of our supervised framework using the dataset produced by representative physical, biological, and real-world systems, showing that our framework outperforms those traditional methods on the short-term data produced by the time-varying or/and noise-perturbed systems. We believe that our framework, on one hand, complements the major functions of the notable RC intelligent machine and, on the other hand, becomes one of the indispensable methods for deciphering complex systems.
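The encoding step described in this abstract, fitting a readout layer on each finite window and treating the learned weights as a feature vector, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-step-ahead prediction target, the reservoir size, the weight scaling, the ridge penalty, and all function names are hypothetical choices made here. The second stage (mapping features to system changes) is not shown.

```python
import math
import random


def make_reservoir(n_nodes, seed=0, scale=0.9):
    """Draw fixed random input and recurrent weights (assumed design).

    Recurrent weights are bounded so that every row's absolute sum stays
    below `scale` < 1, which keeps the tanh state update contractive.
    """
    rng = random.Random(seed)
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(n_nodes)]
    bound = scale / n_nodes
    w_res = [[rng.uniform(-bound, bound) for _ in range(n_nodes)]
             for _ in range(n_nodes)]
    return w_in, w_res


def run_reservoir(series, w_in, w_res):
    """Drive the reservoir with a scalar series and collect its states."""
    n = len(w_in)
    state = [0.0] * n
    states = []
    for u in series:
        state = [math.tanh(w_in[i] * u
                           + sum(w_res[i][j] * state[j] for j in range(n)))
                 for i in range(n)]
        states.append(state)
    return states


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def readout_weights(window, w_in, w_res, ridge=1e-3):
    """Ridge-regress next values on reservoir states; return the weights.

    Assumption: the readout is trained for one-step-ahead prediction.
    """
    states = run_reservoir(window[:-1], w_in, w_res)
    targets = window[1:]
    n = len(w_in)
    # Normal equations: (S^T S + ridge * I) w = S^T y
    gram = [[sum(s[i] * s[j] for s in states) + (ridge if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(s[i] * y for s, y in zip(states, targets)) for i in range(n)]
    return solve(gram, rhs)


# Two windows generated by different dynamics yield different feature vectors.
w_in, w_res = make_reservoir(8)
slow = [math.sin(0.2 * t) for t in range(200)]
fast = [math.sin(0.5 * t) for t in range(200)]
features_slow = readout_weights(slow, w_in, w_res)
features_fast = readout_weights(fast, w_in, w_res)
```

In a detection setting, each consecutive window would be encoded this way with the same fixed reservoir, and a separate supervised model would then be trained on the resulting feature vectors.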
Affiliation(s)
- Xin Li
  - College of Science, National University of Defense Technology, Changsha, Hunan 410073, China
- Qunxi Zhu
  - Research Institute of Intelligent Complex Systems and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
- Chengli Zhao
  - College of Science, National University of Defense Technology, Changsha, Hunan 410073, China
- Xuzhe Qian
  - Research Institute of Intelligent Complex Systems and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
  - School of Mathematical Sciences, SCMS, SCAM, and CCSB, Fudan University, Shanghai 200433, China
- Xue Zhang
  - College of Science, National University of Defense Technology, Changsha, Hunan 410073, China
- Xiaojun Duan
  - College of Science, National University of Defense Technology, Changsha, Hunan 410073, China
- Wei Lin
  - Research Institute of Intelligent Complex Systems and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
  - School of Mathematical Sciences, SCMS, SCAM, and CCSB, Fudan University, Shanghai 200433, China
2
Romano G, Rigaill G, Runge V, Fearnhead P. Detecting Abrupt Changes in the Presence of Local Fluctuations and Autocorrelated Noise. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1909598] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 10/21/2022]
Affiliation(s)
- Gaetano Romano
  - Department of Mathematics and Statistics, Lancaster University, Lancaster, UK
- Guillem Rigaill
  - Université Paris-Saclay, CNRS, INRAE, Univ Evry, Institute of Plant Sciences Paris-Saclay (IPS2), Orsay, France
  - Université Paris-Saclay, CNRS, Univ Evry, Laboratoire de Mathématiques et Modélisation d’Evry, Evry-Courcouronnes, France
- Vincent Runge
  - Université Paris-Saclay, CNRS, Univ Evry, Laboratoire de Mathématiques et Modélisation d’Evry, Evry-Courcouronnes, France
- Paul Fearnhead
  - Department of Mathematics and Statistics, Lancaster University, Lancaster, UK
3
Hermann P, Heissl A, Tiemann-Boege I, Futschik A. LDJump: Estimating variable recombination rates from population genetic data. Mol Ecol Resour 2019; 19:623-638. [PMID: 30666785 PMCID: PMC6519033 DOI: 10.1111/1755-0998.12994] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 06/22/2018] [Revised: 12/13/2018] [Accepted: 01/11/2019] [Indexed: 11/27/2022]
Abstract
As recombination plays an important role in evolution, its estimation and the identification of hotspot positions is of considerable interest. We propose a novel approach for estimating population recombination rates based on genotyping or sequence data that involves a sequential multiscale change point estimator. Our method also permits demography to be taken into account. It uses several summary statistics within a regression model fitted on suitable scenarios. Our proposed method is accurate, computationally fast, and provides a parsimonious solution by ensuring a type I error control against too many changes in the recombination rate. An application to human genome data suggests a good congruence between our estimated and experimentally identified hotspots. Our method is implemented in the R‐package LDJump, which is freely available at https://github.com/PhHermann/LDJump.
Affiliation(s)
- Philipp Hermann
  - Department of Applied Statistics, Johannes Kepler University Linz, Linz, Austria
- Angelika Heissl
  - Institute of Biophysics, Johannes Kepler University Linz, Linz, Austria
- Andreas Futschik
  - Department of Applied Statistics, Johannes Kepler University Linz, Linz, Austria
4
Affiliation(s)
- Paul Fearnhead
  - Department of Mathematics and Statistics, Lancaster University, Lancaster, United Kingdom
- Robert Maidstone
  - Department of Mathematics and Statistics, Lancaster University, Lancaster, United Kingdom
  - STOR-i Doctoral Training Centre, Lancaster University, Lancaster, United Kingdom
- Adam Letchford
  - Department of Management Science, Lancaster University, Lancaster, United Kingdom
5
Affiliation(s)
- Paul Fearnhead
  - Department of Mathematics and Statistics, Lancaster University, Lancaster, UK
- Guillem Rigaill
  - Institute of Plant Sciences Paris-Saclay, UMR 9213/UMR1403, CNRS, INRA, Université Paris-Sud, Université d’Evry, Université Paris-Diderot, Sorbonne Paris-Cité, Paris, France
  - Laboratoire de Mathématiques et Modélisation d’Evry (LaMME), Université d’Evry Val d’Essonne, UMR CNRS 8071, ENSIIE, USC INRA, Paris, France
6
7
Chang PL, Kopania E, Keeble S, Sarver BAJ, Larson E, Orth A, Belkhir K, Boursot P, Bonhomme F, Good JM, Dean MD. Whole exome sequencing of wild-derived inbred strains of mice improves power to link phenotype and genotype. Mamm Genome 2017; 28:416-425. [PMID: 28819774 DOI: 10.1007/s00335-017-9704-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 03/23/2017] [Accepted: 06/23/2017] [Indexed: 12/30/2022]
Abstract
The house mouse is a powerful model to dissect the genetic basis of phenotypic variation, and serves as a model to study human diseases. Despite a wealth of discoveries, most classical laboratory strains have captured only a small fraction of genetic variation known to segregate in their wild progenitors, and existing strains are often related to each other in complex ways. Inbred strains of mice independently derived from natural populations have the potential to increase power in genetic studies with the addition of novel genetic variation. Here, we perform exome-enrichment and high-throughput sequencing (~8× coverage) of 26 wild-derived strains known in the mouse research community as the "Montpellier strains." We identified 1.46 million SNPs in our dataset, approximately 19% of which have not been detected from other inbred strains. This novel genetic variation is expected to contribute to phenotypic variation, as they include 18,496 nonsynonymous variants and 262 early stop codons. Simulations demonstrate that the higher density of genetic variation in the Montpellier strains provides increased power for quantitative genetic studies. Inasmuch as the power to connect genotype to phenotype depends on genetic variation, it is important to incorporate these additional genetic strains into future research programs.
Affiliation(s)
- Peter L Chang
  - Molecular and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA, 90089, USA
- Emily Kopania
  - Molecular and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA, 90089, USA
  - Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Sara Keeble
  - Molecular and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA, 90089, USA
  - Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Brice A J Sarver
  - Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Erica Larson
  - Division of Biological Sciences, University of Montana, Missoula, MT, USA
  - Department of Biological Sciences, University of Denver, Denver, CO, 80210, USA
- Annie Orth
  - Institut des Sciences de l'Evolution, CNRS UMR554, Université de Montpellier, Montpellier, France
- Khalid Belkhir
  - Institut des Sciences de l'Evolution, CNRS UMR554, Université de Montpellier, Montpellier, France
- Pierre Boursot
  - Institut des Sciences de l'Evolution, CNRS UMR554, Université de Montpellier, Montpellier, France
- François Bonhomme
  - Institut des Sciences de l'Evolution, CNRS UMR554, Université de Montpellier, Montpellier, France
- Jeffrey M Good
  - Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Matthew D Dean
  - Molecular and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA, 90089, USA
8
Drosophila simulans: A Species with Improved Resolution in Evolve and Resequence Studies. G3-GENES GENOMES GENETICS 2017; 7:2337-2343. [PMID: 28546383 PMCID: PMC5499140 DOI: 10.1534/g3.117.043349] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 02/07/2023]
Abstract
The combination of experimental evolution with high-throughput sequencing of pooled individuals—i.e., evolve and resequence (E&R)—is a powerful approach to study adaptation from standing genetic variation under controlled, replicated conditions. Nevertheless, E&R studies in Drosophila melanogaster have frequently resulted in inordinate numbers of candidate SNPs, particularly for complex traits. Here, we contrast the genomic signature of adaptation following ∼60 generations in a novel hot environment for D. melanogaster and D. simulans. For D. simulans, the regions carrying putatively selected loci were far more distinct, and thus harbored fewer false positives, than those in D. melanogaster. We propose that species without segregating inversions and higher recombination rates, such as D. simulans, are better suited for E&R studies that aim to characterize the genetic variants underlying the adaptive response.
9
Pein F, Sieling H, Munk A. Heterogeneous change point inference. J R Stat Soc Series B Stat Methodol 2016. [DOI: 10.1111/rssb.12202] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Indexed: 11/29/2022]
Affiliation(s)
- Axel Munk
  - Georg-August-Universität Göttingen and Max Planck Institute for Biophysical Chemistry; Göttingen Germany
10
Estimating the Effective Population Size from Temporal Allele Frequency Changes in Experimental Evolution. Genetics 2016; 204:723-735. [PMID: 27542959 PMCID: PMC5068858 DOI: 10.1534/genetics.116.191197] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 05/04/2016] [Accepted: 07/30/2016] [Indexed: 01/22/2023]
Abstract
The effective population size (Ne) is a major factor determining allele frequency changes in natural and experimental populations. Temporal methods provide a powerful and simple approach to estimate short-term Ne. They use allele frequency shifts between temporal samples to calculate the standardized variance, which is directly related to Ne. Here we focus on experimental evolution studies that often rely on repeated sequencing of samples in pools (Pool-seq). Pool-seq is cost-effective and often outperforms individual-based sequencing in estimating allele frequencies, but it is associated with atypical sampling properties: Additional to sampling individuals, sequencing DNA in pools leads to a second round of sampling, which increases the variance of allele frequency estimates. We propose a new estimator of Ne, which relies on allele frequency changes in temporal data and corrects for the variance in both sampling steps. In simulations, we obtain accurate Ne estimates, as long as the drift variance is not too small compared to the sampling and sequencing variance. In addition to genome-wide Ne estimates, we extend our method using a recursive partitioning approach to estimate Ne locally along the chromosome. Since the type I error is controlled, our method permits the identification of genomic regions that differ significantly in their Ne estimates. We present an application to Pool-seq data from experimental evolution with Drosophila and provide recommendations for whole-genome data. The estimator is computationally efficient and available as an R package at https://github.com/ThomasTaus/Nest.
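The general idea of a temporal estimator, relating the standardized variance of allele frequency change to Ne after subtracting sampling noise, can be sketched as below. This is a hedged illustration of a classical moment-style estimator with a single round of sampling (individuals only). It is not the new estimator proposed in this article, and it does not include the second, Pool-seq read-sampling correction the abstract describes. The function names and the choice to skip fixed loci are assumptions made here.

```python
import math


def standardized_variance(freqs_start, freqs_end):
    """Average a commonly used standardized variance (Fc) over loci.

    Loci fixed for the same allele at both time points carry no
    information and are skipped (an assumption of this sketch).
    """
    values = []
    for x, y in zip(freqs_start, freqs_end):
        denom = (x + y) / 2.0 - x * y
        if denom > 0.0:
            values.append((x - y) ** 2 / denom)
    if not values:
        raise ValueError("no informative loci")
    return sum(values) / len(values)


def temporal_ne(freqs_start, freqs_end, generations, n_start, n_end):
    """Estimate Ne from allele frequency change between two samples.

    n_start and n_end are the numbers of diploid individuals sampled at
    each time point; their sampling variance is subtracted from Fc.
    Returns math.inf when the corrected drift signal is not positive.
    """
    fc = standardized_variance(freqs_start, freqs_end)
    drift = fc - 1.0 / (2.0 * n_start) - 1.0 / (2.0 * n_end)
    if drift <= 0.0:
        return math.inf
    return generations / (2.0 * drift)


# One locus moving from 0.5 to 0.6 over 10 generations, 50 individuals sampled
# at each time point: Fc = 0.04, sampling correction = 0.02.
estimate = temporal_ne([0.5], [0.6], generations=10, n_start=50, n_end=50)
```

The infinite return value reflects the caveat stated in the abstract: when drift variance is small relative to sampling variance, the estimate becomes unreliable.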
11
Maidstone R, Hocking T, Rigaill G, Fearnhead P. On optimal multiple changepoint algorithms for large data. STATISTICS AND COMPUTING 2016; 27:519-533. [PMID: 32355427 PMCID: PMC7175693 DOI: 10.1007/s11222-016-9636-3] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 03/06/2015] [Accepted: 02/01/2016] [Indexed: 06/11/2023]
Abstract
Many common approaches to detecting changepoints, for example based on statistical criteria such as penalised likelihood or minimum description length, can be formulated in terms of minimising a cost over segmentations. We focus on a class of dynamic programming algorithms that can solve the resulting minimisation problem exactly, and thus find the optimal segmentation under the given statistical criteria. The standard implementation of these dynamic programming methods have a computational cost that scales at least quadratically in the length of the time-series. Recently pruning ideas have been suggested that can speed up the dynamic programming algorithms, whilst still being guaranteed to be optimal, in that they find the true minimum of the cost function. Here we extend these pruning methods, and introduce two new algorithms for segmenting data: FPOP and SNIP. Empirical results show that FPOP is substantially faster than existing dynamic programming methods, and unlike the existing methods its computational efficiency is robust to the number of changepoints in the data. We evaluate the method for detecting copy number variations and observe that FPOP has a computational cost that is even competitive with that of binary segmentation, but can give much more accurate segmentations.
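The unpruned dynamic programme that this abstract takes as its starting point, exact minimisation of a penalised cost over all segmentations at quadratic cost in the series length, can be sketched as follows. This is an illustrative baseline only: it is neither FPOP nor SNIP, and it applies no pruning. The squared-error segment cost (a change-in-mean model), the function name, and the penalty value in the usage line are assumptions made here.

```python
def optimal_partitioning(data, penalty):
    """Return changepoint indices minimising segment cost plus penalty.

    Each segment data[a:b] costs its sum of squared deviations from the
    segment mean; every segment adds `penalty`. A returned index k means
    a new segment starts at data[k].
    """
    n = len(data)
    # Prefix sums give any segment's cost in constant time.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, value in enumerate(data):
        s1[i + 1] = s1[i] + value
        s2[i + 1] = s2[i] + value * value

    def segment_cost(a, b):
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    # best[t] is the optimal penalised cost of data[:t]; starting at
    # -penalty means the first segment is not charged as a change.
    best = [0.0] * (n + 1)
    best[0] = -penalty
    last = [0] * (n + 1)
    for t in range(1, n + 1):
        best_value = None
        best_start = 0
        for s in range(t):  # s is the start of the final segment
            value = best[s] + segment_cost(s, t) + penalty
            if best_value is None or value < best_value:
                best_value = value
                best_start = s
        best[t] = best_value
        last[t] = best_start

    # Walk back through the stored segment starts.
    changepoints = []
    t = n
    while t > 0:
        s = last[t]
        if s > 0:
            changepoints.append(s)
        t = s
    return changepoints[::-1]


# A single mean shift after ten observations.
found = optimal_partitioning([0.0] * 10 + [5.0] * 10, penalty=1.0)
```

The inner loop over every candidate start `s` is what produces the quadratic scaling; pruning methods of the kind discussed in the abstract aim to discard candidates that can never be optimal while keeping the same minimum.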
Affiliation(s)
- Robert Maidstone
  - STOR-i Centre for Doctoral Training, Lancaster University, Lancaster, UK
- Toby Hocking
  - McGill University and Genome Quebec Innovation Center, Quebec, Canada
- Guillem Rigaill
  - Institute of Plant Sciences Paris-Saclay, UMR 9213/UMR1403, CNRS, INRA, Université Paris-Sud, Université d’Evry, Université Paris-Diderot, Sorbonne Paris-Cité, Paris, France
- Paul Fearnhead
  - Department of Mathematics and Statistics, Lancaster University, Lancaster, UK
12
Algama M, Keith JM. Investigating genomic structure using changept: A Bayesian segmentation model. Comput Struct Biotechnol J 2014; 10:107-15. [PMID: 25349679 PMCID: PMC4204429 DOI: 10.1016/j.csbj.2014.08.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023]
Abstract
Genomes are composed of a wide variety of elements with distinct roles and characteristics. Some of these elements are well-characterised functional components such as protein-coding exons. Other elements play regulatory or structural roles, encode functional non-protein-coding RNAs, or perform some other function yet to be characterised. Still others may have no functional importance, though they may nevertheless be of interest to biologists. One technique for investigating the composition of genomes is to segment sequences into compositionally homogenous blocks. This technique, known as 'sequence segmentation' or 'change-point analysis', is used to identify patterns of variation across genomes such as GC-rich and GC-poor regions, coding and non-coding regions, slowly evolving and rapidly evolving regions and many other types of variation. In this mini-review we outline many of the genome segmentation methods currently available and then focus on a Bayesian DNA segmentation algorithm, with examples of its various applications.
Affiliation(s)
- Manjula Algama
  - School of Mathematical Sciences, Monash University, Clayton, VIC 3800, Australia
- Jonathan M Keith
  - School of Mathematical Sciences, Monash University, Clayton, VIC 3800, Australia
13
Affiliation(s)
- Klaus Frick
  - Interstate University of Applied Sciences of Technology; Buchs Switzerland
- Axel Munk
  - University of Göttingen; Göttingen Germany
  - Max Planck Institute for Biophysical Chemistry; Göttingen Germany