1
|
Rak R, Rak E. Multifractality of Complex Networks Is Also Due to Geometry: A Geometric Sandbox Algorithm. ENTROPY (BASEL, SWITZERLAND) 2023; 25:1324. [PMID: 37761623 PMCID: PMC10527770 DOI: 10.3390/e25091324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/22/2023] [Revised: 09/06/2023] [Accepted: 09/10/2023] [Indexed: 09/29/2023]
Abstract
Over the past three decades, describing the reality surrounding us using the language of complex networks has become very useful and therefore popular. One of the most important features, especially of real networks, is their complexity, which often manifests itself in a fractal or even multifractal structure. As a generalization of fractal analysis, the multifractal analysis of complex networks is a useful tool for identifying and quantitatively describing the spatial hierarchy of both theoretical and numerical fractal patterns. Nowadays, there are many methods of multifractal analysis. However, all these methods take into account only the fact of connection between nodes (and eventually the weight of edges) and do not take into account the real positions (coordinates) of nodes in space. However, intuition suggests that the geometry of network nodes' position should have a significant impact on the true fractal structure. Many networks identified in nature (e.g., air connection networks, energy networks, social networks, mountain ridge networks, networks of neurones in the brain, and street networks) have their own often unique and characteristic geometry, which is not taken into account in the identification process of multifractality in commonly used methods. In this paper, we propose a multifractal network analysis method that takes into account both connections between nodes and the location coordinates of nodes (network geometry). We show the results for different geometrical variants of the same network and reveal that this method, contrary to the commonly used method, is sensitive to changes in network geometry. We also carry out tests for synthetic as well as real-world networks.
Collapse
Affiliation(s)
- Rafał Rak
- Institute of Physics, College of Natural Sciences, University of Rzeszów, Pigonia 1, 35-310 Rzeszów, Poland
| | - Ewa Rak
- Institute of Mathematics, College of Natural Sciences, University of Rzeszów, Pigonia 1, 35-310 Rzeszów, Poland;
| |
Collapse
|
2
|
Chebbi Babchia Z, Elloumi Oueslati A. Wavelet-based multifractal analysis of C.elegans sequences based on FCGS signal. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
3
|
Ding Y, Liu JL, Li X, Tian YC, Yu ZG. Computationally efficient sandbox algorithm for multifractal analysis of large-scale complex networks with tens of millions of nodes. Phys Rev E 2021; 103:043303. [PMID: 34005996 DOI: 10.1103/physreve.103.043303] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2020] [Accepted: 03/08/2021] [Indexed: 11/07/2022]
Abstract
Among various algorithms of multifractal analysis (MFA) for complex networks, the sandbox MFA algorithm behaves with the best computational efficiency. However, the existing sandbox algorithm is still computationally expensive for MFA of large-scale networks with tens of millions of nodes. It is also not clear whether MFA results can be improved by a largely increased size of a theoretical network. To tackle these challenges, a computationally efficient sandbox algorithm (CESA) is presented in this paper for MFA of large-scale networks. Distinct from the existing sandbox algorithm that uses the shortest-path distance matrix to obtain the required information for MFA of networks, our CESA employs the compressed sparse row format of the adjacency matrix and the breadth-first search technique to directly search the neighbor nodes of each layer of center nodes, and then to retrieve the required information. A theoretical analysis reveals that the CESA reduces the time complexity of the existing sandbox algorithm from cubic to quadratic, and also improves the space complexity from quadratic to linear. Then the CESA is demonstrated to be effective, efficient, and feasible through the MFA results of (u,v)-flower model networks from the fifth to the 12th generations. It enables us to study the multifractality of networks of the size of about 11 million nodes with a normal desktop computer. Furthermore, we have also found that increasing the size of (u,v)-flower model network does improve the accuracy of MFA results. Finally, our CESA is applied to a few typical real-world networks of large scale.
Collapse
Affiliation(s)
- Yuemin Ding
- Department of Energy and Process Engineering, Norwegian University of Science and Technology, 7049 Trondheim, Norway.,School of Computer Science, Queensland University of Technology, Brisbane QLD 4000, Australia
| | - Jin-Long Liu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, Hunan 411105, China
| | - Xiaohui Li
- School of Computer Science, Queensland University of Technology, Brisbane QLD 4000, Australia.,School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan, Hubei 430081, China
| | - Yu-Chu Tian
- School of Computer Science, Queensland University of Technology, Brisbane QLD 4000, Australia
| | - Zu-Guo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, Hunan 411105, China
| |
Collapse
|
4
|
Anitas EM. Small-Angle Scattering and Multifractal Analysis of DNA Sequences. Int J Mol Sci 2020; 21:ijms21134651. [PMID: 32629908 PMCID: PMC7369734 DOI: 10.3390/ijms21134651] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2020] [Revised: 06/28/2020] [Accepted: 06/28/2020] [Indexed: 12/26/2022] Open
Abstract
The arrangement of A, C, G and T nucleotides in large DNA sequences of many prokaryotic and eukaryotic cells exhibit long-range correlations with fractal properties. Chaos game representation (CGR) of such DNA sequences, followed by a multifractal analysis, is a useful way to analyze the corresponding scaling properties. This approach provides a powerful visualization method to characterize their spatial inhomogeneity, and allows discrimination between mono- and multifractal distributions. However, in some cases, two different arbitrary point distributions, may generate indistinguishable multifractal spectra. By using a new model based on multiplicative deterministic cascades, here it is shown that small-angle scattering (SAS) formalism can be used to address such issue, and to extract additional structural information. It is shown that the box-counting dimension given by multifractal spectra can be recovered from the scattering exponent of SAS intensity in the fractal region. This approach is illustrated for point distributions of CGR data corresponding to Escherichia coli, Phospholamban and Mouse mitochondrial DNA, and it is shown that for the latter two cases, SAS allows extraction of the fractal iteration number and the scaling factor corresponding to "ACGT" square, or to recover the number of bases. The results are compared with a model based on multiplicative deterministic cascades, and respectively with one which takes into account the existence of forbidden sequences in DNA. This allows a classification of the DNA sequences in terms of random and deterministic fractals structures emerging in CGR.
Collapse
Affiliation(s)
- Eugen Mircea Anitas
- Joint Institute for Nuclear Research, Dubna 141980, Russia;
- Horia Hulubei, National Institute of Physics and Nuclear Engineering, 077125 Bucharest-Magurele, Romania
| |
Collapse
|
5
|
Behnia S, Fathizadeh S, Javanshour E, Nemati F. Light-Driven Modulation of Electrical Current through DNA Sequences: Engineering of a Molecular Optical Switch. J Phys Chem B 2020; 124:3261-3270. [DOI: 10.1021/acs.jpcb.0c00073] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Affiliation(s)
- S. Behnia
- Department of Physics, Urmia University of Technology, Urmia 5716693187, Iran
| | - S. Fathizadeh
- Department of Physics, Urmia University of Technology, Urmia 5716693187, Iran
| | - E. Javanshour
- Department of Physics, Urmia University of Technology, Urmia 5716693187, Iran
| | - F. Nemati
- Department of Physics, Urmia University of Technology, Urmia 5716693187, Iran
| |
Collapse
|
6
|
Jiang S, Li BG, Yu ZG, Wang F, Anh V, Zhou Y. Multifractal temporally weighted detrended cross-correlation analysis of multivariate time series. CHAOS (WOODBURY, N.Y.) 2020; 30:023134. [PMID: 32113234 DOI: 10.1063/1.5129574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/01/2019] [Accepted: 02/04/2020] [Indexed: 06/10/2023]
Abstract
Fractal and multifractal properties of various systems have been studied extensively. In this paper, first, the multivariate multifractal detrend cross-correlation analysis (MMXDFA) is proposed to investigate the multifractal features in multivariate time series. MMXDFA may produce oscillations in the fluctuation function and spurious cross correlations. In order to overcome these problems, we then propose the multivariate multifractal temporally weighted detrended cross-correlation analysis (MMTWXDFA). In relation to the multivariate detrended cross-correlation analysis and multifractal temporally weighted detrended cross-correlation analysis, an innovation of MMTWXDFA is the application of the signed Manhattan distance to calculate the local detrended covariance function. To evaluate the performance of the MMXDFA and MMTWXDFA methods, we apply them on some artificially generated multivariate series. Several numerical tests demonstrate that both methods can identify their fractality, but MMTWXDFA can detect long-range cross correlations and simultaneously quantify the levels of cross correlation between two multivariate series more accurately.
Collapse
Affiliation(s)
- Shan Jiang
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan 411105, China
| | - Bao-Gen Li
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan 411105, China
| | - Zu-Guo Yu
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan 411105, China
| | - Fang Wang
- College of Information and Science Technology, Hunan Agricultural University, Changsha, Hunan 410128, China
| | - Vo Anh
- Faculty of Science, Engineering and Technology, Swinburne University of Technology, PO Box 218, Hawthorn, Victoria 3122, Australia
| | - Yu Zhou
- Institute of Future Cities and Department of Geography and Resource Management, The Chinese University of Hong Kong, Shatin, Hong Kong, China
| |
Collapse
|
7
|
Yang L, Wei P, Zhong C, Meng Z, Wang P, Tang YY. A Fractal Dimension and Empirical Mode Decomposition-Based Method for Protein Sequence Analysis. INT J PATTERN RECOGN 2019. [DOI: 10.1142/s0218001419400202] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
In bioinformatics, the biological functions of proteins and their interactions can often be analyzed by the similarity of their sequences. In this paper, the authors combine the fractal dimension, empirical mode decomposition (EMD), and sliding window for protein sequence comparison. First, the protein sequence is characterized and digitized into a signal, and then the signal characteristics are obtained by using EMD and fractal dimension. Each protein sequence can be decomposed into Intrinsic Mode Functions (IMFs). The fixed window’s fractal dimension is applied to each IMF and the original signal to extract the protein sequence characteristics. Experiments have shown that the feature extracted by this hybrid method is superior to the EMD method alone.
Collapse
Affiliation(s)
- Lina Yang
- School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi, P. R. China
| | - Pu Wei
- School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi, P. R. China
| | - Cheng Zhong
- School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi, P. R. China
| | - Zuqiang Meng
- School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi, P. R. China
| | - Patrick Wang
- Computer and Information Science, Northeastern University, Boston, USA
| | - Yuan Yan Tang
- Beijing Advanced Innovation Center for Big Data and Brain Computing (BDBC), Beihang University, Beijing, P. R. China
- Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau, P. R. China
| |
Collapse
|
8
|
Duarte-Sanchez JE, Velasco-Medina J, Moreno PA. Hardware Accelerator for the Multifractal Analysis of DNA Sequences. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1611-1624. [PMID: 28749355 DOI: 10.1109/tcbb.2017.2731339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
The multifractal analysis has allowed to quantify the genetic variability and non-linear stability along the human genome sequence. It has some implications in explaining several genetic diseases given by some chromosome abnormalities, among other genetic particularities. The multifractal analysis of a genome is carried out by dividing the complete DNA sequence in smaller fragments and calculating the generalized dimension spectrum of each fragment using the chaos game representation and the box-counting method. This is a time consuming process because it involves the processing of large data sets using floating-point representation. In order to reduce the computation time, we designed an application-specific processor, here called multifractal processor, which is based on our proposed hardware-oriented algorithm for calculating efficiently the generalized dimension spectrum of DNA sequences. The multifractal processor was implemented on a low-cost SoC-FPGA and was verified by processing a complete human genome. The execution time and numeric results of the Multifractal processor were compared with the results obtained from the software implementation executed in a 20-core workstation, achieving a speed up of 2.6x and an average error of 0.0003 percent.
Collapse
|
9
|
ALUminating the Path of Atherosclerosis Progression: Chaos Theory Suggests a Role for Alu Repeats in the Development of Atherosclerotic Vascular Disease. Int J Mol Sci 2018; 19:ijms19061734. [PMID: 29895733 PMCID: PMC6032270 DOI: 10.3390/ijms19061734] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2018] [Revised: 06/04/2018] [Accepted: 06/09/2018] [Indexed: 12/12/2022] Open
Abstract
Atherosclerosis (ATH) and coronary artery disease (CAD) are chronic inflammatory diseases with an important genetic background; they derive from the cumulative effect of multiple common risk alleles, most of which are located in genomic noncoding regions. These complex diseases behave as nonlinear dynamical systems that show a high dependence on their initial conditions; thus, long-term predictions of disease progression are unreliable. One likely possibility is that the nonlinear nature of ATH could be dependent on nonlinear correlations in the structure of the human genome. In this review, we show how chaos theory analysis has highlighted genomic regions that have shared specific structural constraints, which could have a role in ATH progression. These regions were shown to be enriched with repetitive sequences of the Alu family, genomic parasites that have colonized the human genome, which show a particular secondary structure and are involved in the regulation of gene expression. Here, we show the impact of Alu elements on the mechanisms that regulate gene expression, especially highlighting the molecular mechanisms via which the Alu elements alter the inflammatory response. We devote special attention to their relationship with the long noncoding RNA (lncRNA); antisense noncoding RNA in the INK4 locus (ANRIL), a risk factor for ATH; their role as microRNA (miRNA) sponges; and their ability to interfere with the regulatory circuitry of the (nuclear factor kappa B) NF-κB response. We aim to characterize ATH as a nonlinear dynamic system, in which small initial alterations in the expression of a number of repetitive elements are somehow amplified to reach phenotypic significance.
Collapse
|
10
|
|
11
|
Xue Y, Bogdan P. Reliable Multi-Fractal Characterization of Weighted Complex Networks: Algorithms and Implications. Sci Rep 2017; 7:7487. [PMID: 28790321 PMCID: PMC5548933 DOI: 10.1038/s41598-017-07209-5] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2017] [Accepted: 06/23/2017] [Indexed: 11/12/2022] Open
Abstract
Through an elegant geometrical interpretation, the multi-fractal analysis quantifies the spatial and temporal irregularities of the structural and dynamical formation of complex networks. Despite its effectiveness in unweighted networks, the multi-fractal geometry of weighted complex networks, the role of interaction intensity, the influence of the embedding metric spaces and the design of reliable estimation algorithms remain open challenges. To address these challenges, we present a set of reliable multi-fractal estimation algorithms for quantifying the structural complexity and heterogeneity of weighted complex networks. Our methodology uncovers that (i) the weights of complex networks and their underlying metric spaces play a key role in dictating the existence of multi-fractal scaling and (ii) the multi-fractal scaling can be localized in both space and scales. In addition, this multi-fractal characterization framework enables the construction of a scaling-based similarity metric and the identification of community structure of human brain connectome. The detected communities are accurately aligned with the biological brain connectivity patterns. This characterization framework has no constraint on the target network and can thus be leveraged as a basis for both structural and dynamic analysis of networks in a wide spectrum of applications.
Collapse
Affiliation(s)
- Yuankun Xue
- Ming Hsieh Department of Electrical Engineering, University of Southern California, 90007, CA, USA
| | - Paul Bogdan
- Ming Hsieh Department of Electrical Engineering, University of Southern California, 90007, CA, USA.
| |
Collapse
|
12
|
Wei YL, Yu ZG, Zou HL, Anh V. Multifractal temporally weighted detrended cross-correlation analysis to quantify power-law cross-correlation and its application to stock markets. CHAOS (WOODBURY, N.Y.) 2017; 27:063111. [PMID: 28679233 DOI: 10.1063/1.4985637] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
A new method-multifractal temporally weighted detrended cross-correlation analysis (MF-TWXDFA)-is proposed to investigate multifractal cross-correlations in this paper. This new method is based on multifractal temporally weighted detrended fluctuation analysis and multifractal cross-correlation analysis (MFCCA). An innovation of the method is applying geographically weighted regression to estimate local trends in the nonstationary time series. We also take into consideration the sign of the fluctuations in computing the corresponding detrended cross-covariance function. To test the performance of the MF-TWXDFA algorithm, we apply it and the MFCCA method on simulated and actual series. Numerical tests on artificially simulated series demonstrate that our method can accurately detect long-range cross-correlations for two simultaneously recorded series. To further show the utility of MF-TWXDFA, we apply it on time series from stock markets and find that power-law cross-correlation between stock returns is significantly multifractal. A new coefficient, MF-TWXDFA cross-correlation coefficient, is also defined to quantify the levels of cross-correlation between two time series.
Collapse
Affiliation(s)
- Yun-Lan Wei
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, Hunan 411105, China
| | - Zu-Guo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, Hunan 411105, China
| | - Hai-Long Zou
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, Hunan 411105, China
| | - Vo Anh
- School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane Q4001, Australia
| |
Collapse
|
13
|
Yang WF, Yu ZG, Anh V. Whole genome/proteome based phylogeny reconstruction for prokaryotes using higher order Markov model and chaos game representation. Mol Phylogenet Evol 2015; 96:102-111. [PMID: 26724405 DOI: 10.1016/j.ympev.2015.12.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2015] [Revised: 12/17/2015] [Accepted: 12/18/2015] [Indexed: 01/18/2023]
Abstract
UNLABELLED Traditional methods for sequence comparison and phylogeny reconstruction rely on pair wise and multiple sequence alignments. But alignment could not be directly applied to whole genome/proteome comparison and phylogenomic studies due to their high computational complexity. Hence alignment-free methods became popular in recent years. Here we propose a fast alignment-free method for whole genome/proteome comparison and phylogeny reconstruction using higher order Markov model and chaos game representation. In the present method, we use the transition matrices of higher order Markov models to characterize amino acid or DNA sequences for their comparison. The order of the Markov model is uniquely identified by maximizing the average Shannon entropy of conditional probability distributions. Using one-dimensional chaos game representation and linked list, this method can reduce large memory and time consumption which is due to the large-scale conditional probability distributions. To illustrate the effectiveness of our method, we employ it for fast phylogeny reconstruction based on genome/proteome sequences of two species data sets used in previous published papers. Our results demonstrate that the present method is useful and efficient. AVAILABILITY AND IMPLEMENTATION The source codes for our algorithm to get the distance matrix and genome/proteome sequences can be downloaded from ftp://121.199.20.25/. The software Phylip and EvolView we used to construct phylogenetic trees can be referred from their websites.
Collapse
Affiliation(s)
- Wei-Feng Yang
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, PR China; Department of Mathematics and Physics, Hunan Institute of Engineering, Hunan 411104, PR China.
| | - Zu-Guo Yu
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, PR China; School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia.
| | - Vo Anh
- School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia.
| |
Collapse
|
14
|
Song YQ, Liu JL, Yu ZG, Li BG. Multifractal analysis of weighted networks by a modified sandbox algorithm. Sci Rep 2015; 5:17628. [PMID: 26634304 PMCID: PMC4669438 DOI: 10.1038/srep17628] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2015] [Accepted: 11/03/2015] [Indexed: 01/13/2023] Open
Abstract
Complex networks have attracted growing attention in many fields. As a generalization of fractal analysis, multifractal analysis (MFA) is a useful way to systematically describe the spatial heterogeneity of both theoretical and experimental fractal patterns. Some algorithms for MFA of unweighted complex networks have been proposed in the past a few years, including the sandbox (SB) algorithm recently employed by our group. In this paper, a modified SB algorithm (we call it SBw algorithm) is proposed for MFA of weighted networks. First, we use the SBw algorithm to study the multifractal property of two families of weighted fractal networks (WFNs): "Sierpinski" WFNs and "Cantor dust" WFNs. We also discuss how the fractal dimension and generalized fractal dimensions change with the edge-weights of the WFN. From the comparison between the theoretical and numerical fractal dimensions of these networks, we can find that the proposed SBw algorithm is efficient and feasible for MFA of weighted networks. Then, we apply the SBw algorithm to study multifractal properties of some real weighted networks - collaboration networks. It is found that the multifractality exists in these weighted networks, and is affected by their edge-weights.
Collapse
Affiliation(s)
- Yu-Qin Song
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan 411105, China
- College of Science, Hunan University of technology, Zhuzhou, Hunan 412007, China
| | - Jin-Long Liu
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan 411105, China
| | - Zu-Guo Yu
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan 411105, China
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Q4001, Australia
| | - Bao-Gen Li
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan 411105, China
| |
Collapse
|
15
|
Liu JL, Yu ZG, Anh V. Determination of multifractal dimensions of complex networks by means of the sandbox algorithm. CHAOS (WOODBURY, N.Y.) 2015; 25:023103. [PMID: 25725639 DOI: 10.1063/1.4907557] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
Complex networks have attracted much attention in diverse areas of science and technology. Multifractal analysis (MFA) is a useful way to systematically describe the spatial heterogeneity of both theoretical and experimental fractal patterns. In this paper, we employ the sandbox (SB) algorithm proposed by Tél et al. (Physica A 159, 155-166 (1989)), for MFA of complex networks. First, we compare the SB algorithm with two existing algorithms of MFA for complex networks: the compact-box-burning algorithm proposed by Furuya and Yakubo (Phys. Rev. E 84, 036118 (2011)), and the improved box-counting algorithm proposed by Li et al. (J. Stat. Mech.: Theor. Exp. 2014, P02020 (2014)) by calculating the mass exponents τ(q) of some deterministic model networks. We make a detailed comparison between the numerical and theoretical results of these model networks. The comparison results show that the SB algorithm is the most effective and feasible algorithm to calculate the mass exponents τ(q) and to explore the multifractal behavior of complex networks. Then, we apply the SB algorithm to study the multifractal property of some classic model networks, such as scale-free networks, small-world networks, and random networks. Our results show that multifractality exists in scale-free networks, that of small-world networks is not obvious, and it almost does not exist in random networks.
Collapse
Affiliation(s)
- Jin-Long Liu
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan 411105, China
| | - Zu-Guo Yu
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan 411105, China
| | - Vo Anh
- School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane Q4001, Australia
| |
Collapse
|
16
|
Liu JL, Yu ZG, Anh V. Topological properties and fractal analysis of a recurrence network constructed from fractional Brownian motions. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2014; 89:032814. [PMID: 24730906 DOI: 10.1103/physreve.89.032814] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/07/2013] [Indexed: 06/03/2023]
Abstract
Many studies have shown that we can gain additional information on time series by investigating their accompanying complex networks. In this work, we investigate the fundamental topological and fractal properties of recurrence networks constructed from fractional Brownian motions (FBMs). First, our results indicate that the constructed recurrence networks have exponential degree distributions; the average degree exponent 〈λ〉 increases first and then decreases with the increase of Hurst index H of the associated FBMs; the relationship between H and 〈λ〉 can be represented by a cubic polynomial function. We next focus on the motif rank distribution of recurrence networks, so that we can better understand networks at the local structure level. We find the interesting superfamily phenomenon, i.e., the recurrence networks with the same motif rank pattern being grouped into two superfamilies. Last, we numerically analyze the fractal and multifractal properties of recurrence networks. We find that the average fractal dimension 〈dB〉 of recurrence networks decreases with the Hurst index H of the associated FBMs, and their dependence approximately satisfies the linear formula 〈dB〉≈2-H, which means that the fractal dimension of the associated recurrence network is close to that of the graph of the FBM. Moreover, our numerical results of multifractal analysis show that the multifractality exists in these recurrence networks, and the multifractality of these networks becomes stronger at first and then weaker when the Hurst index of the associated time series becomes larger from 0.4 to 0.95. In particular, the recurrence network with the Hurst index H=0.5 possesses the strongest multifractality. In addition, the dependence relationships of the average information dimension 〈D(1)〉 and the average correlation dimension 〈D(2)〉 on the Hurst index H can also be fitted well with linear functions. Our results strongly suggest that the recurrence network inherits the basic characteristic and the fractal nature of the associated FBM series.
Collapse
Affiliation(s)
- Jin-Long Liu
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan 411105, China
| | - Zu-Guo Yu
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan 411105, China and School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, Q4001, Australia
| | - Vo Anh
- School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, Q4001, Australia
| |
Collapse
|
17
|
Fast Comparison of Microbial Genomes Using the Chaos Games Representation for Metagenomic Applications. ACTA ACUST UNITED AC 2013. [DOI: 10.1016/j.procs.2013.05.304] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
|
18
|
Sequence Complexity of Chromosome 3 in Caenorhabditis elegans. Adv Bioinformatics 2012; 2012:287486. [PMID: 22919380 PMCID: PMC3418640 DOI: 10.1155/2012/287486] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2012] [Revised: 05/16/2012] [Accepted: 06/07/2012] [Indexed: 11/18/2022] Open
Abstract
The nucleotide sequences complexity in chromosome 3 of Caenorhabditis elegans (C. elegans) is studied. The complexity of these sequences is compared with some random sequences. Moreover, by using some parameters related to complexity such as fractal dimension and frequency, indicator matrix is given a first classification of sequences of C. elegans. In particular, the sequences with highest and lowest fractal value are singled out. It is shown that the intrinsic nature of the low fractal dimension sequences has many common features with the random sequences.
Collapse
|
19
|
Pirino D, Rigosa J, Ledda A, Ferretti L. Detecting correlations among functional-sequence motifs. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2012; 85:066124. [PMID: 23005179 DOI: 10.1103/physreve.85.066124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/02/2012] [Indexed: 06/01/2023]
Abstract
Sequence motifs are words of nucleotides in DNA with biological functions, e.g., gene regulation. Identification of such words proceeds through rejection of Markov models on the expected motif frequency along the genome. Additional biological information can be extracted from the correlation structure among patterns of motif occurrences. In this paper a log-linear multivariate intensity Poisson model is estimated via expectation maximization on a set of motifs along the genome of E. coli K12. The proposed approach allows for excitatory as well as inhibitory interactions among motifs and between motifs and other genomic features like gene occurrences. Our findings confirm previous stylized facts about such types of interactions and shed new light on genome-maintenance functions of some particular motifs. We expect these methods to be applicable to a wider set of genomic features.
Collapse
|
20
|
Pandit A, Dasanna AK, Sinha S. Multifractal analysis of HIV-1 genomes. Mol Phylogenet Evol 2011; 62:756-63. [PMID: 22155711 DOI: 10.1016/j.ympev.2011.11.017] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2010] [Revised: 10/29/2011] [Accepted: 11/18/2011] [Indexed: 10/14/2022]
Abstract
Pathogens like HIV-1, which evolve into many closely related variants displaying differential infectivity and evolutionary dynamics in a short time scale, require fast and accurate classification. Conventional whole genome sequence alignment-based methods are computationally expensive and involve complex analysis. Alignment-free methodologies are increasingly being used to effectively differentiate genomic variations between viral species. Multifractal analysis, which explores the self-similar nature of genomes, is an alignment-free methodology that has been applied to study such variations. However, whether multifractal analysis can quantify variations between closely related genomes, such as the HIV-1 subtypes, is an open question. Here we address the above by implementing the multifractal analysis on four retroviral genomes (HIV-1, HIV-2, SIVcpz, and HTLV-1), and demonstrate that individual multifractal properties can differentiate between different retrovirus types easily. However, the individual multifractal measures do not resolve within-group variations for different known subtypes of HIV-1 M group. We show here that these known subtypes can instead be classified correctly using a combination of the crucial multifractal measures. This method is simple and computationally fast in comparison to the conventional alignment-based methods for whole genome phylogenetic analysis.
Collapse
Affiliation(s)
- Aridaman Pandit
- Mathematical Modeling and Computational Biology Group, Centre for Cellular and Molecular Biology (CSIR), Hyderabad 500007, India
| | | | | |
Collapse
|
21
|
Moreno PA, Vélez PE, Martínez E, Garreta LE, Díaz N, Amador S, Tischer I, Gutiérrez JM, Naik AK, Tobar F, García F. The human genome: a multifractal analysis. BMC Genomics 2011; 12:506. [PMID: 21999602 PMCID: PMC3277318 DOI: 10.1186/1471-2164-12-506] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2011] [Accepted: 10/14/2011] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND Several studies have shown that genomes can be studied via a multifractal formalism. Recently, we used a multifractal approach to study the genetic information content of the Caenorhabditis elegans genome. Here we investigate the possibility that the human genome shows a similar behavior to that observed in the nematode. RESULTS We report here multifractality in the human genome sequence. This behavior correlates strongly on the presence of Alu elements and to a lesser extent on CpG islands and (G+C) content. In contrast, no or low relationship was found for LINE, MIR, MER, LTRs elements and DNA regions poor in genetic information. Gene function, cluster of orthologous genes, metabolic pathways, and exons tended to increase their frequencies with ranges of multifractality and large gene families were located in genomic regions with varied multifractality. Additionally, a multifractal map and classification for human chromosomes are proposed. CONCLUSIONS Based on these findings, we propose a descriptive non-linear model for the structure of the human genome, with some biological implications. This model reveals 1) a multifractal regionalization where many regions coexist that are far from equilibrium and 2) this non-linear organization has significant molecular and medical genetic implications for understanding the role of Alu elements in genome stability and structure of the human genome. Given the role of Alu sequences in gene regulation, genetic diseases, human genetic diversity, adaptation and phylogenetic analyses, these quantifications are especially useful.
Collapse
Affiliation(s)
- Pedro A Moreno
- Escuela de Ingeniería de Sistemas y Computación, Universidad del Valle, Santiago de Cali, Colombia
| | - Patricia E Vélez
- Profesora del Departamento de Biología, FACNED, Universidad del Cauca, Popayán, Colombia
- Escuela de Ciencias Básicas. Facultad de Salud, Universidad del Valle, Santiago de Cali, Colombia
| | - Ember Martínez
- Departamento de Sistemas, Universidad del Cauca, Popayán, Colombia
| | - Luis E Garreta
- Escuela de Ingeniería de Sistemas y Computación, Universidad del Valle, Santiago de Cali, Colombia
| | - Néstor Díaz
- Departamento de Sistemas, Universidad del Cauca, Popayán, Colombia
| | - Siler Amador
- Departamento de Sistemas, Universidad del Cauca, Popayán, Colombia
| | - Irene Tischer
- Escuela de Ingeniería de Sistemas y Computación, Universidad del Valle, Santiago de Cali, Colombia
| | - José M Gutiérrez
- Instituto de Física de Cantabria, Universidad de Cantabria-CSIC, Santander, España
| | | | - Fabián Tobar
- Escuela de Ciencias Básicas. Facultad de Salud, Universidad del Valle, Santiago de Cali, Colombia
| | - Felipe García
- Escuela de Ciencias Básicas. Facultad de Salud, Universidad del Valle, Santiago de Cali, Colombia
| |
Collapse
|
22
|
Behnia S, Akhshani A, Panahi M, Mobaraki A, Ghaderian M. Multifractal analysis of thermal denaturation based on the Peyrard-Bishop-Dauxois model. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2011; 84:031918. [PMID: 22060414 DOI: 10.1103/physreve.84.031918] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/07/2011] [Revised: 08/15/2011] [Indexed: 05/31/2023]
Abstract
The theory of DNA dynamics is exceedingly complex and not easily explained. In the past two decades, by adapting methods of statistical physics, the dynamics of DNA in contact with a thermal bath is widely studied. In this paper, the thermal denaturation of DNA in the framework of the Peyrard-Bishop-Dauxois (PBD) model through the Rényi dimension is investigated. As a result, the Rényi dimension spectrum of the melting transition process reveals the multifractal nature of the dynamics of the Peyrard-Bishop-Dauxois model. Also, it can be concluded that the Rényi dimension (D(q)) at negative values of q is the characteristic signature of pre-melting and thermal denaturation of DNA. Furthermore, this approach is in excellent agreement with previous experimental studies.
Collapse
Affiliation(s)
- S Behnia
- Department of Physics, Urmia University of Technology, Orumieh, Iran.
| | | | | | | | | |
Collapse
|
23
|
Provata A, Beck C. Multifractal analysis of nonhyperbolic coupled map lattices: application to genomic sequences. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2011; 83:066210. [PMID: 21797464 DOI: 10.1103/physreve.83.066210] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/01/2011] [Indexed: 05/31/2023]
Abstract
Symbolic sequences generated by coupled map lattices (CMLs) can be used to model the chaotic-like structure of genomic sequences. In this study it is shown that diffusively coupled Chebyshev maps of order 4 (corresponding to a shift of four symbols) very closely reproduce the multifractal spectrum D(q) of human genomic sequences for coupling constant α = 0.35 ± 0.01 if q > 0. The presence of rare configurations causes deviations for q < 0, which disappear if the rare event statistics of the CML is modified. Such rare configurations are known to play specific functional roles in genomic sequences serving as promoters or regulatory elements.
Collapse
Affiliation(s)
- A Provata
- Institute of Physical Chemistry, National Center for Scientific Research Demokritos, GR-15310 Athens, Greece
| | | |
Collapse
|
24
|
Yu ZG, Anh V, Wang Y, Mao D, Wanliss J. Modeling and simulation of the horizontal component of the geomagnetic field by fractional stochastic differential equations in conjunction with empirical mode decomposition. ACTA ACUST UNITED AC 2010. [DOI: 10.1029/2009ja015206] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Affiliation(s)
- Zu-Guo Yu
- Discipline of Mathematical Science, Faculty of Science and Technology; Queensland University of Technology; Brisbane Australia
| | - Vo Anh
- Discipline of Mathematical Science, Faculty of Science and Technology; Queensland University of Technology; Brisbane Australia
| | - Yang Wang
- Department of Mathematics; Michigan State University; East Lansing Michigan USA
| | - Dong Mao
- Department of Mathematics; Michigan State University; East Lansing Michigan USA
| | | |
Collapse
|
25
|
Vélez PE, Garreta LE, Martínez E, Díaz N, Amador S, Tischer I, Gutiérrez JM, Moreno PA. The Caenorhabditis elegans genome: a multifractal analysis. GENETICS AND MOLECULAR RESEARCH 2010; 9:949-65. [PMID: 20506082 DOI: 10.4238/vol9-2gmr756] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022]
Abstract
The Caenorhabditis elegans genome has several regular and irregular characteristics in its nucleotide composition; these are observed within and between chromosomes. To study these particularities, we carried out a multifractal analysis, which requires a large number of exponents to characterize scaling properties. We looked for a relationship between the genetic information content of the chromosomes and multifractal parameters and found less multifractality compared to the human genome. Differences in multifractality among chromosomes and in regions of chromosomes, and two group averages of chromosome regions were observed. All these differences were mainly dependent on differences in the contents of repetitive DNA. Based on these properties, we propose a nonlinear model for the structure of the C. elegans genome, with some biological implications. These results suggest that examining differences in multifractality is a viable approach for measuring local variations of genomic information contents along chromosomes. This approach could be extended to other genomes in order to characterize structural and functional regions of chromosomes.
Collapse
Affiliation(s)
- P E Vélez
- Departamento de Biología, Universidad del Cauca, Popayán, Colombia
| | | | | | | | | | | | | | | |
Collapse
|
26
|
Kong SG, Fan WL, Chen HD, Wigger J, Torda AE, Lee HC. Quantitative measure of randomness and order for complete genomes. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2009; 79:061911. [PMID: 19658528 DOI: 10.1103/physreve.79.061911] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/07/2008] [Revised: 04/14/2009] [Indexed: 05/28/2023]
Abstract
We propose an order index, phi, which gives a quantitative measure of randomness and order of complete genomic sequences. It maps genomes to a number from 0 (random and of infinite length) to 1 (fully ordered) and applies regardless of sequence length. The 786 complete genomic sequences in GenBank were found to have phi values in a very narrow range, phig=0.031(-0.015)+0.028. We show this implies that genomes are halfway toward being completely random, or, at the "edge of chaos." We further show that artificial "genomes" converted from literary classics have phi 's that almost exactly coincide with phig, but sequences of low information content do not. We infer that phig represents a high information-capacity "fixed point" in sequence space, and that genomes are driven to it by the dynamics of a robust growth and evolution process. We show that a growth process characterized by random segmental duplication can robustly drive genomes to the fixed point.
Collapse
Affiliation(s)
- Sing-Guan Kong
- Department of Physics, Graduate Institute of Biophysics, National Central University, Chungli, Taiwan 32001, Republic of China
| | | | | | | | | | | |
Collapse
|
27
|
Yu ZG, Anh V, Eastes R. Multifractal analysis of geomagnetic storm and solar flare indices and their class dependence. ACTA ACUST UNITED AC 2009. [DOI: 10.1029/2008ja013854] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Affiliation(s)
- Zu-Guo Yu
- School of Mathematical Sciences; Queensland University of Technology; Brisbane Queensland Australia
- School of Mathematics and Computational Science; Xiangtan University; Hunan China
| | - Vo Anh
- School of Mathematical Sciences; Queensland University of Technology; Brisbane Queensland Australia
- Florida Space Institute; University of Central Florida; Orlando Florida USA
| | - Richard Eastes
- Florida Space Institute; University of Central Florida; Orlando Florida USA
| |
Collapse
|
28
|
Paar V, Pavin N, Basar I, Rosandić M, Gluncić M, Paar N. Hierarchical structure of cascade of primary and secondary periodicities in Fourier power spectrum of alphoid higher order repeats. BMC Bioinformatics 2008; 9:466. [PMID: 18980673 PMCID: PMC2661002 DOI: 10.1186/1471-2105-9-466] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2008] [Accepted: 11/03/2008] [Indexed: 11/28/2022] Open
Abstract
Background Identification of approximate tandem repeats is an important task of broad significance and still remains a challenging problem of computational genomics. Often there is no single best approach to periodicity detection and a combination of different methods may improve the prediction accuracy. Discrete Fourier transform (DFT) has been extensively used to study primary periodicities in DNA sequences. Here we investigate the application of DFT method to identify and study alphoid higher order repeats. Results We used method based on DFT with mapping of symbolic into numerical sequence to identify and study alphoid higher order repeats (HOR). For HORs the power spectrum shows equidistant frequency pattern, with characteristic two-level hierarchical organization as signature of HOR. Our case study was the 16 mer HOR tandem in AC017075.8 from human chromosome 7. Very long array of equidistant peaks at multiple frequencies (more than a thousand higher harmonics) is based on fundamental frequency of 16 mer HOR. Pronounced subset of equidistant peaks is based on multiples of the fundamental HOR frequency (multiplication factor n for nmer) and higher harmonics. In general, nmer HOR-pattern contains equidistant secondary periodicity peaks, having a pronounced subset of equidistant primary periodicity peaks. This hierarchical pattern as signature for HOR detection is robust with respect to monomer insertions and deletions, random sequence insertions etc. For a monomeric alphoid sequence only primary periodicity peaks are present. The 1/fβ – noise and periodicity three pattern are missing from power spectra in alphoid regions, in accordance with expectations. Conclusion DFT provides a robust detection method for higher order periodicity. Easily recognizable HOR power spectrum is characterized by hierarchical two-level equidistant pattern: higher harmonics of the fundamental HOR-frequency (secondary periodicity) and a subset of pronounced peaks corresponding to constituent monomers (primary periodicity). The number of lower frequency peaks (secondary periodicity) below the frequency of the first primary periodicity peak reveals the size of nmer HOR, i.e., the number n of monomers contained in consensus HOR.
Collapse
Affiliation(s)
- Vladimir Paar
- Faculty of Science, University of Zagreb, Bijenicka 32, Zagreb, Croatia.
| | | | | | | | | | | |
Collapse
|
29
|
|
30
|
Yang JY, Zhou Y, Yu ZG, Anh V, Zhou LQ. Human Pol II promoter recognition based on primary sequences and free energy of dinucleotides. BMC Bioinformatics 2008; 9:113. [PMID: 18294399 PMCID: PMC2292139 DOI: 10.1186/1471-2105-9-113] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2007] [Accepted: 02/24/2008] [Indexed: 01/29/2023] Open
Abstract
Background Promoter region plays an important role in determining where the transcription of a particular gene should be initiated. Computational prediction of eukaryotic Pol II promoter sequences is one of the most significant problems in sequence analysis. Existing promoter prediction methods are still far from being satisfactory. Results We attempt to recognize the human Pol II promoter sequences from the non-promoter sequences which are made up of exon and intron sequences. Four methods are used: two kinds of multifractal analysis performed on the numeric sequences obtained from the dinucleotide free energy, Z curve analysis and global descriptor of the promoter/non-promoter primary sequences. A total of 141 parameters are extracted from these methods and categorized into seven groups (methods). They are used to generate certain spaces and then each promoter/non-promoter sequence is represented by a point in the corresponding space. All the 120 possible combinations of the seven methods are tested. Based on Fisher's linear discriminant algorithm, with a relatively smaller number of parameters (96 and 117), we get satisfactory discriminant accuracies. Particularly, in the case of 117 parameters, the accuracies for the training and test sets reach 90.43% and 89.79%, respectively. A comparison with five other existing methods indicates that our methods have a better performance. Using the global descriptor method (36 parameters), 17 of the 18 experimentally verified promoter sequences of human chromosome 22 are correctly identified. Conclusion The high accuracies achieved suggest that the methods of this paper are useful for understanding the difficult problem of promoter prediction.
Collapse
Affiliation(s)
- Jian-Yi Yang
- School of Mathematics and Computational Science, Xiangtan University, Hunan 411105, China.
| | | | | | | | | |
Collapse
|
31
|
Deschavanne P, Tufféry P. Exploring an alignment free approach for protein classification and structural class prediction. Biochimie 2007; 90:615-25. [PMID: 18067866 DOI: 10.1016/j.biochi.2007.11.004] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2007] [Accepted: 11/09/2007] [Indexed: 11/25/2022]
Abstract
Alignment free methods based on Chaos Game Representation (CGR), also known as sequence signature approaches, have proven of great interest for DNA sequence analysis. Indeed, they have been successfully applied for sequence comparison, phylogeny, detection of horizontal transfers or extraction of representative motifs in regulation sequences. Transposing such methods to proteins poses several fundamental questions related to representation space dimensionality. Several studies have tackled these points, but none has, so far, brought the application of CGRs to proteins to their fully expected potential. Yet, several studies have shown that techniques based on n-peptide frequencies can be relevant for proteins. Here, we investigate the effectiveness of a strategy based on the CGR approach using a fixed reverse encoding of amino acids into nucleic sequences. We first explore its relevance to protein classification into functional families. We then attempt to apply it to the prediction of protein structural classes. Our results suggest that the reverse encoding approach could be relevant in both cases. We show that it is able to classify functional families of proteins by extracting signatures close to the ProSite patterns. Applied to structural classification, the approach reaches scores of correct classification close to 84%, i.e. close to the scores of related methods in the field. Various optimizations of the approach are still possible, which open the door for future applications.
Collapse
Affiliation(s)
- P Deschavanne
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM UMR-S 726, Université Paris 7, 75251 Paris Cedex 05, France
| | | |
Collapse
|
32
|
Analysis of long-range correlation in sequences data of proteins. JOURNAL OF THE SERBIAN CHEMICAL SOCIETY 2007. [DOI: 10.2298/jsc0704383i] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
Abstract
The results presented here suggest the existence of correlations in the sequence data of proteins. 32 proteins, both globular and fibrous, both monomeric and polymeric, were analyzed. The primary structures of these proteins were treated as time series. Three spatial series of data for each sequence of a protein were generated from numerical correspondences between each amino acid and a physical property associated with it, i.e., its electric charge, its polar character and its dipole moment. For each series, the spectral coefficient, the scaling exponent and the Hurst coefficient were determined. The values obtained for these coefficients revealed non-randomness in the series of data. .
Collapse
|
33
|
Xu M, Tan W. Intermediate processes and critical phenomena: Theory, method and progress of fractional operators and their applications to modern mechanics. ACTA ACUST UNITED AC 2006. [DOI: 10.1007/s11433-006-0257-2] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
34
|
Kriksin YA, Khalatur PG, Khokhlov AR. Recognition of complex patterned substrates by heteropolymer chains consisting of multiple monomer types. J Chem Phys 2006; 124:174904. [PMID: 16689601 DOI: 10.1063/1.2191849] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
We propose a statistical mechanical model of surface pattern recognition by heteropolymers with quenched monomer sequence distribution. The chemically heterogeneous pattern consists of different adsorption sites specifically distributed on a surface. The heteropolymer sequence is complementary with respect to the pattern. The concepts of recognition probability and recognition temperature are introduced. The algorithm for calculating the recognition probability is based on efficient recurrence procedures for evaluating the single-chain partition function of a chain macromolecule consisting of multiple monomer types, which interact with multiple types of adsorption sites. The temperature dependencies of the recognition probability are discussed. We address the critical role of the commensurability between the heteropolymer sequence and the distribution of the surface adsorbing sites on the polymer adsorption. Also, we address the question of how many types of monomer units in the heteropolymer are required for unambiguous recognition of compact target patterns. It is shown that perfect pattern recognition can be achieved for the strong-adsorption regime in the case of specifically structured compact patterns with multifunctional adsorption sites and heteropolymers with multiple monomer types when the degeneracy of the ground state is suppressed. The pattern recognition ability increases with the number of different types of monomer units and complementary adsorption sites. For random heteropolymers and patterns, the free energy change associated with the recognition process decreases linearly with increasing this number. Correlated random heteropolymers are capable of recognizing related patterns on a random background.
Collapse
Affiliation(s)
- Yuri A Kriksin
- Institute for Mathematical Modeling, Russian Academy of Sciences, Moscow 125047, Russia
| | | | | |
Collapse
|
35
|
Yu ZG, Anh VV, Lau KS, Zhou LQ. Clustering of protein structures using hydrophobic free energy and solvent accessibility of proteins. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2006; 73:031920. [PMID: 16605571 DOI: 10.1103/physreve.73.031920] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/22/2005] [Revised: 01/18/2006] [Indexed: 05/08/2023]
Abstract
The hydrophobic free energy and solvent accessibility of amino acids are used to study the relationship between the primary structure and structural classification of large proteins. A measure representation and a Z curve representation of protein sequences are proposed. Fractal analysis of the measure and Z curve representations of proteins and multifractal analysis of their hydrophobic free energy and solvent accessibility sequences indicate that the protein sequences possess correlations and multifractal scaling. The parameters from the fractal and multifractal analyses on these sequences are used to construct some parameter spaces. Each protein is represented by a point in these spaces. A method is proposed to distinguish and cluster proteins from the alpha, beta, alpha + beta, and alpha/beta structural classes in these parameter spaces. Fisher's linear discriminant algorithm is used to give a quantitative assessment of our clustering on the selected proteins. Numerical results indicate that the discriminant accuracies are satisfactory. In particular, they reach 94.12% and 88.89% in separating proteins from {alpha, alpha + beta, alpha/beta} proteins in a three-dimensional space.
Collapse
Affiliation(s)
- Z G Yu
- Program in Statistics and Operations Research, Queensland University of Technology, GPO Box 2434, Brisbane, Queensland 4001, Australia.
| | | | | | | |
Collapse
|
36
|
Chapus C, Dufraigne C, Edwards S, Giron A, Fertil B, Deschavanne P. Exploration of phylogenetic data using a global sequence analysis method. BMC Evol Biol 2005; 5:63. [PMID: 16280081 PMCID: PMC1310607 DOI: 10.1186/1471-2148-5-63] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2005] [Accepted: 11/09/2005] [Indexed: 11/13/2022] Open
Abstract
Background Molecular phylogenetic methods are based on alignments of nucleic or peptidic sequences. The tremendous increase in molecular data permits phylogenetic analyses of very long sequences and of many species, but also requires methods to help manage large datasets. Results Here we explore the phylogenetic signal present in molecular data by genomic signatures, defined as the set of frequencies of short oligonucleotides present in DNA sequences. Although violating many of the standard assumptions of traditional phylogenetic analyses – in particular explicit statements of homology inherent in character matrices – the use of the signature does permit the analysis of very long sequences, even those that are unalignable, and is therefore most useful in cases where alignment is questionable. We compare the results obtained by traditional phylogenetic methods to those inferred by the signature method for two genes: RAG1, which is easily alignable, and 18S RNA, where alignments are often ambiguous for some regions. We also apply this method to a multigene data set of 33 genes for 9 bacteria and one archea species as well as to the whole genome of a set of 16 γ-proteobacteria. In addition to delivering phylogenetic results comparable to traditional methods, the comparison of signatures for the sequences involved in the bacterial example identified putative candidates for horizontal gene transfers. Conclusion The signature method is therefore a fast tool for exploring phylogenetic data, providing not only a pretreatment for discovering new sequence relationships, but also for identifying cases of sequence evolution that could confound traditional phylogenetic analysis.
Collapse
Affiliation(s)
- Charles Chapus
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM U 726, Case 7113, Tour 53-54, 2 place Jussieu, 75005 Paris, France
- Current address: Dept. of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138 USA
| | | | - Scott Edwards
- Dept. of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138 USA
| | - Alain Giron
- Inserm U494, 91 bd de l'Hopital 75634 Paris CEDEX 13, France
| | - Bernard Fertil
- Inserm U494, 91 bd de l'Hopital 75634 Paris CEDEX 13, France
| | - Patrick Deschavanne
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM U 726, Case 7113, Tour 53-54, 2 place Jussieu, 75005 Paris, France
| |
Collapse
|
37
|
Wanliss JA, Anh VV, Yu ZG, Watson S. Multifractal modeling of magnetic storms via symbolic dynamics analysis. ACTA ACUST UNITED AC 2005. [DOI: 10.1029/2004ja010996] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Affiliation(s)
- J. A. Wanliss
- Department of Physical Sciences; Embry-Riddle Aeronautical University; Daytona Beach Florida USA
| | - V. V. Anh
- School of Mathematical Sciences; Queensland University of Technology; Brisbane, Queensland Australia
| | - Z.-G. Yu
- School of Mathematical Sciences; Queensland University of Technology; Brisbane, Queensland Australia
| | - S. Watson
- Florida Space Institute; Kennedy Space Center Florida USA
| |
Collapse
|
38
|
Kulkarni OC, Vigneshwar R, Jayaraman VK, Kulkarni BD. Identification of coding and non-coding sequences using local Holder exponent formalism. Bioinformatics 2005; 21:3818-23. [PMID: 16118261 DOI: 10.1093/bioinformatics/bti639] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Accurate prediction of genes in genomes has always been a challenging task for bioinformaticians and computational biologists. The discovery of existence of distinct scaling relations in coding and non-coding sequences has led to new perspectives in the understanding of the DNA sequences. This has motivated us to exploit the differences in the local singularity distributions for characterization and classification of coding and non-coding sequences. RESULTS The local singularity density distribution in the coding and non-coding sequences of four genomes was first estimated using the wavelet transform modulus maxima methodology. Support vector machines classifier was then trained with the extracted features. The trained classifier is able to provide an average test accuracy of 97.7%. The local singularity features in a DNA sequence can be exploited for successful identification of coding and non-coding sequences. CONTACT Available on request from bd.kulkarni@ncl.res.in.
Collapse
|
39
|
Zhou LQ, Yu ZG, Deng JQ, Anh V, Long SC. A fractal method to distinguish coding and non-coding sequences in a complete genome based on a number sequence representation. J Theor Biol 2005; 232:559-67. [PMID: 15588636 DOI: 10.1016/j.jtbi.2004.09.002] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2004] [Revised: 09/09/2004] [Accepted: 09/09/2004] [Indexed: 10/26/2022]
Abstract
A fractal method to distinguish coding and non-coding sequences in a complete genome is proposed, based on different statistical behaviors between these two kinds of sequences. We first propose a number sequence representation of DNA sequences. Multifractal analysis is then performed on the measure representation of the obtained number sequence. The three exponents C(-1), C1 and C2 are selected from the result of multifractal analysis. Each DNA may be represented by a point in the three-dimensional space generated by these three-component vectors. It is shown that points corresponding to coding and non-coding sequences in the complete genome of many prokaryotes are roughly distributed in different regions. Fisher's discriminant algorithm can be used to separate these two regions in the spanned space. If the point (C(-1),C1,C2) for a DNA sequence is situated in the region corresponding to coding sequences, the sequence is discriminated as a coding sequence; otherwise, the sequence is classified as a non-coding one. For all 51 prokaryotes we considered , the average discriminant accuracies pc,pnc,qc and qnc reach 72.28%, 84.65%, 72.53% and 84.18%, respectively.
Collapse
Affiliation(s)
- Li-Qian Zhou
- School of Mathematics and Computing Science, Xiangtan University,Hunan 411105, China
| | | | | | | | | |
Collapse
|
40
|
Yu ZG, Anh V, Lau KS. Chaos game representation of protein sequences based on the detailed HP model and their multifractal and correlation analyses. J Theor Biol 2004; 226:341-8. [PMID: 14643648 DOI: 10.1016/j.jtbi.2003.09.009] [Citation(s) in RCA: 106] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
Similar to the chaos game representation (CGR) of DNA sequences proposed by Jeffrey (Nucleic Acid Res. 18 (1990) 2163), a new CGR of protein sequences based on the detailed HP model is proposed. Multifractal and correlation analyses of the measures based on the CGR of protein sequences from complete genomes are performed. The Dq spectra of all organisms studied are multifractal-like and sufficiently smooth for the Cq curves to be meaningful. The Cq curves of bacteria resemble a classical phase transition at a critical point. The correlation distance of the difference between the measure based on the CGR of protein sequences and its fractal background is also proposed to construct a more precise phylogenetic tree of bacteria.
Collapse
Affiliation(s)
- Zu-Guo Yu
- Program in Statistics and Operations Research, Queensland University of Technology, G.P.O. Box 2434, QLD 4001, Brisbane, Australia
| | | | | |
Collapse
|
41
|
Yu ZG, Anh V, Lau KS. Multifractal and correlation analyses of protein sequences from complete genomes. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2003; 68:021913. [PMID: 14525012 DOI: 10.1103/physreve.68.021913] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/21/2003] [Indexed: 05/24/2023]
Abstract
A measure representation of protein sequences similar to the measure representation of DNA sequences proposed in our previous paper [Yu et al., Phys. Rev. E 64, 031903 (2001)] and another induced measure are introduced. Multifractal analysis is then performed on these two kinds of measures of a large number of protein sequences derived from corresponding complete genomes. From the values of the D(q) (generalized dimensions) spectra and related C(q) (analogous specific heat) curves, it is concluded that these protein sequences are not completely random sequences. For substrings with length K=5, the D(q) spectra of all organisms studied are multifractal-like and sufficiently smooth for the C(q) curves to be meaningful. The C(q) curves of all bacteria resemble a classical phase transition at a critical point. But the "analogous" phase transitions of higher organisms studied exhibit the shape of double-peaked specific heat function. But for the classification problem, the multifractal property is not sufficient. When the measure representations of protein sequences from complete genomes are considered as time series, a method based on correlation analysis after removing some memory from the time series is proposed to construct a phylogenetic tree. This construction is shown to be reasonably satisfactory.
Collapse
Affiliation(s)
- Zu-Guo Yu
- Program in Statistics and Operations Research, Queensland University of Technology, GPO Box 2434, Brisbane Q4001, Australia.
| | | | | |
Collapse
|
42
|
Abstract
Here we present a study of statistical correlations among different positions in DNA sequences and their implications by directly using the autocorrelation function. Such an analysis is possible now because of the availability of large sequences or even complete genomes of many organisms. After describing the way in which the autocorrelation function can be applied to DNA-sequence analysis, we show that long-range correlations, implying scale independence, appear in several bacterial genomes as well as in long human chromosome contigs. The source for such correlations in bacteria, which may extend up to 60 kb in Bacillus subtilis, may be related to massive lateral transfer of compositionally biased genes from other genomes. In the human genome, correlations extend for more than five decades and may be related to the evolution of the 'neogenome', a modern evolutionary acquisition composed by GC-rich isochores displaying long-range correlations and scale invariance.
Collapse
Affiliation(s)
- P Bernaola-Galván
- Departamento de Física Aplicada II, E.T.S.I. de Telecomunicación, Universidad de Málaga, Málaga, Spain.
| | | | | | | |
Collapse
|
43
|
Anh VV, Lau KS, Yu ZG. Recognition of an organism from fragments of its complete genome. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2002; 66:031910. [PMID: 12366155 DOI: 10.1103/physreve.66.031910] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/13/2002] [Revised: 06/13/2002] [Indexed: 05/23/2023]
Abstract
This paper considers the problem of matching a fragment to an organism using its complete genome. Our method is based on the probability measure representation of a genome. We first demonstrate that these probability measures can be modeled as recurrent iterated function systems (RIFS) consisting of four contractive similarities. Our hypothesis is that the multifractal characteristics of the probability measure of a complete genome, as captured by the RIFS, is preserved in its reasonably long fragments. We compute the RIFS of fragments of various lengths and random starting points, and compare with that of the original sequence for recognition using the Euclidean distance. A demonstration on five randomly selected organisms supports the above hypothesis.
Collapse
Affiliation(s)
- V V Anh
- Centre in Statistical Science and Industrial Mathematics, Queensland University of Technology, P. O. Box 2434, Brisbane Q4001, Australia.
| | | | | |
Collapse
|