Frid Y, Gusfield D. An improved Four-Russians method and sparsified Four-Russians algorithm for RNA folding. Algorithms Mol Biol 2016;11:22. [PMID: 27499801; PMCID: PMC4974819; DOI: 10.1186/s13015-016-0081-9]
Abstract
Background
The basic RNA secondary structure prediction problem, or single sequence folding problem (SSF), was solved 35 years ago by a now well-known $O(n^3)$-time dynamic programming method. Recently, three methodologies (Valiant, Four-Russians, and Sparsification) have been applied to speed up RNA secondary structure prediction. The Sparsification method exploits two properties of the input: the number of subsequences $Z$ whose endpoints belong to the optimal folding set, and the maximum number of base pairs $L$. These sparsity properties satisfy $0 \le L \le n/2$ and $n \le Z \le n^2/2$, and the method reduces the running time to $O(LZ)$. The Four-Russians method, in contrast, gains its speedup by tabulating partial results.
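As a point of reference, the cubic-time dynamic program referred to above can be sketched as follows for the base-pair maximization variant; this is a minimal illustration, with function names and the toy sequence chosen here rather than taken from the paper, and hairpin-loop length constraints omitted.

```python
# Minimal sketch of the classic O(n^3) base-pair-maximization dynamic program
# (Nussinov-style). Names and the example sequence are illustrative only.

def can_pair(a: str, b: str) -> bool:
    """Watson-Crick / wobble pairing used in the base-pair maximization variant."""
    return {a, b} in ({"A", "U"}, {"C", "G"}, {"G", "U"})

def max_base_pairs(seq: str) -> int:
    """Return the maximum number of non-crossing base pairs in seq."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]          # dp[i][j]: best score for seq[i..j]
    for length in range(2, n + 1):            # increasing subsequence length
        for i in range(0, n - length + 1):
            j = i + length - 1
            best = dp[i + 1][j]               # case 1: position i left unpaired
            for k in range(i + 1, j + 1):     # case 2: i paired with some k
                if can_pair(seq[i], seq[k]):
                    left = dp[i + 1][k - 1] if k - 1 >= i + 1 else 0
                    right = dp[k + 1][j] if k + 1 <= j else 0
                    best = max(best, 1 + left + right)
            dp[i][j] = best
    return dp[0][n - 1] if n else 0

print(max_base_pairs("GGGAAAUCC"))  # 3 pairs for this toy sequence
```

The three nested loops over $i$, $j$, and the split point $k$ are the source of the $O(n^3)$ bound that both the Four-Russians and Sparsification methods attack.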
Results
In this paper, we explore three different algorithmic speedups. We first expand and reformulate the single sequence folding Four-Russians $\Theta\left(\frac{n^3}{\log^2 n}\right)$-time algorithm to utilize an on-demand lookup table. Second, we create a framework that combines the fastest Sparsification method and the new fastest on-demand Four-Russians method. This combined method has a worst-case running time of $O(\tilde{L}\tilde{Z})$, where $\frac{L}{\log n} \le \tilde{L} \le \min\left(L, \frac{n}{\log n}\right)$ and $\frac{Z}{\log n} \le \tilde{Z} \le \min\left(Z, \frac{n^2}{\log n}\right)$. Third, we update the Four-Russians formulation to achieve an on-demand $O(n^2/\log^2 n)$-time parallel algorithm. This leads to an asymptotic speedup of $O(\tilde{L}\tilde{Z_j})$, where $\frac{Z_j}{\log n} \le \tilde{Z_j} \le \min\left(Z_j, \frac{n}{\log n}\right)$ and $Z_j$ is the number of subsequences whose endpoint $j$ belongs to the optimal folding set.
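To convey the on-demand lookup-table idea informally: consecutive entries of the folding DP matrix differ by at most a constant, so a block of entries can be keyed by its difference pattern, and the expensive combination step for that pattern can be computed and cached the first time it is requested rather than precomputed for all patterns. The sketch below is a toy illustration of this caching-on-first-use idea under the 0/1-difference assumption of the base-pair maximization variant; it is not the authors' algorithm, whose table is organized around $O(\log n)$-sized blocks and a (max,+) block product.

```python
# Toy illustration (not the paper's algorithm) of on-demand tabulation:
# blocks with 0/1 differences are keyed by their difference pattern, and the
# combination step is computed lazily, once per distinct pattern and query.
from functools import lru_cache

def encode(block):
    """Encode a block as (first value, tuple of 0/1 differences)."""
    diffs = tuple(block[k + 1] - block[k] for k in range(len(block) - 1))
    assert all(d in (0, 1) for d in diffs), "blocks must have 0/1 differences"
    return block[0], diffs

@lru_cache(maxsize=None)
def combine_pattern(diffs, query):
    """(max,+) combination of a normalized block (starting at 0) with a query
    vector; cached so each (pattern, query) pair is computed only once."""
    block = [0]
    for d in diffs:
        block.append(block[-1] + d)
    return max(b + q for b, q in zip(block, query))

def combine(block, query):
    """On-demand lookup: offset plus the cached result for the block's pattern."""
    offset, diffs = encode(block)
    return offset + combine_pattern(diffs, tuple(query))

print(combine([5, 6, 6, 7], [2, 0, 3, 1]))  # max_k block[k] + query[k] = 9
```

Because only the patterns that actually occur in a given instance are ever tabulated, this style of lazy construction is what allows the table-building cost to shrink on sparse inputs, which is the property the combined Sparse Four-Russians framework exploits.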
Conclusions
The on-demand formulation not only removes all extraneous computation and allows us to incorporate more realistic scoring schemes, but also lets us take advantage of the sparsity properties. Through asymptotic analysis and empirical testing on the base-pair maximization variant and on a more biologically informative scoring scheme, we show that this Sparse Four-Russians framework achieves a speedup on every problem instance that is asymptotically never worse, and empirically better, than the minimum of the two methods alone.