1
Verma S, Paliwal S. Recent Developments and Applications of Biocatalytic and Chemoenzymatic Synthesis for the Generation of Diverse Classes of Drugs. Curr Pharm Biotechnol 2024; 25:448-467. [PMID: 37885105 DOI: 10.2174/0113892010238984231019085154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 08/26/2023] [Accepted: 09/19/2023] [Indexed: 10/28/2023]
Abstract
Biocatalytic and chemoenzymatic biosynthesis are powerful methods of organic chemistry that use enzymes to execute selective reactions and allow the efficient production of organic compounds. The advantages of these approaches include high selectivity, mild reaction conditions, and the ability to work with complex substrates. The utilization of chemoenzymatic techniques for the synthesis of complicated compounds has lately increased dramatically in the area of organic chemistry. Biocatalytic technologies and modern synthetic methods are utilized synergistically in a multi-step approach to a target molecule under this paradigm. Chemoenzymatic techniques are promising for simplifying access to essential bioactive compounds because of the remarkable regio- and stereoselectivity of enzymatic transformations and the reaction diversity of modern organic chemistry. Enzyme kits may include ready-to-use, reproducible biocatalysts. Its use opens up new avenues for the synthesis of active therapeutic compounds and aids in drug development by synthesizing active components to construct scaffolds in a targeted and preparative manner. This study summarizes current breakthroughs as well as notable instances of biocatalytic and chemoenzymatic synthesis. To assist organic chemists in the use of enzymes for synthetic applications, it also provides some basic guidelines for selecting the most appropriate enzyme for a targeted reaction while keeping aspects like cofactor requirement, solvent tolerance, use of whole cell or isolated enzymes, and commercial availability in mind.
Affiliation(s)
- Swati Verma
- Department of Pharmacy, ITS College of Pharmacy, Muradnagar, Ghaziabad, India
- Department of Pharmacy, Banasthali Vidyapith, Banasthali, 304022, Rajasthan, India
- Sarvesh Paliwal
- Department of Pharmacy, Banasthali Vidyapith, Banasthali, 304022, Rajasthan, India
2
Lazebnik T, Simon-Keren L. Cancer-inspired genomics mapper model for the generation of synthetic DNA sequences with desired genomics signatures. Comput Biol Med 2023; 164:107221. [PMID: 37478715 DOI: 10.1016/j.compbiomed.2023.107221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Revised: 06/16/2023] [Accepted: 06/30/2023] [Indexed: 07/23/2023]
Abstract
Genome data are crucial in modern medicine, offering significant potential for diagnosis and treatment. Thanks to technological advancements, many millions of healthy and diseased genomes have already been sequenced; however, obtaining the most suitable data for a specific study, and specifically for validation studies, remains challenging with respect to scale and access. Therefore, in silico genomics sequence generators have been proposed as a possible solution. However, the current generators produce inferior data using mostly shallow (stochastic) connections, detected with limited computational complexity in the training data. This means they do not take the appropriate biological relations and constraints, that originally caused the observed connections, into consideration. To address this issue, we propose cancer-inspired genomics mapper model (CGMM), that combines genetic algorithm (GA) and deep learning (DL) methods to tackle this challenge. CGMM mimics processes that generate genetic variations and mutations to transform readily available control genomes into genomes with the desired phenotypes. We demonstrate that CGMM can generate synthetic genomes of selected phenotypes such as ancestry and cancer that are indistinguishable from real genomes of such phenotypes, based on unsupervised clustering. Our results show that CGMM outperforms four current state-of-the-art genomics generators on two different tasks, suggesting that CGMM will be suitable for a wide range of purposes in genomic medicine, especially for much-needed validation studies.
Affiliation(s)
- Teddy Lazebnik
- Department of Cancer Biology, Cancer Institute, University College London, London, UK.
3
Ma L, Shao Z, Li L, Huang J, Wang S, Lin Q, Li J, Gong M, Nandi AK. Heuristics and metaheuristics for biological network alignment: A review. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.08.156] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
4
Perez-Rodriguez J, de Haro-Garcia A, Garcia-Pedrajas N. Floating Search Methodology for Combining Classification Models for Site Recognition in DNA Sequences. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2471-2482. [PMID: 32078558 DOI: 10.1109/tcbb.2020.2974221] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Recognition of the functional sites of genes, such as translation initiation sites, donor and acceptor splice sites and stop codons, is a relevant part of many current problems in bioinformatics. The best approaches use sophisticated classifiers, such as support vector machines. However, with the rapid accumulation of sequence data, methods for combining many sources of evidence are necessary as it is unlikely that a single classifier can solve this problem with the best possible performance. A major issue is that the number of possible models to combine is large and the use of all of these models is impractical. In this paper we present a methodology for combining many sources of information to recognize any functional site using "floating search", a powerful heuristic applicable when the cost of evaluating each solution is high. We present experiments on four functional sites in the human genome, which is used as the target genome, and use another 20 species as sources of evidence. The proposed methodology shows significant improvement over state-of-the-art methods. The results show an advantage of the proposed method and also challenge the standard assumption of using only genomes not very close and not very far from the human to improve the recognition of functional sites.
5
Pyser J, Chakrabarty S, Romero EO, Narayan ARH. State-of-the-Art Biocatalysis. ACS Central Science 2021; 7:1105-1116. [PMID: 34345663 PMCID: PMC8323117 DOI: 10.1021/acscentsci.1c00273] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 02/28/2021] [Indexed: 05/03/2023]
Abstract
The use of enzyme-mediated reactions has transcended ancient food production to the laboratory synthesis of complex molecules. This evolution has been accelerated by developments in sequencing and DNA synthesis technology, bioinformatic and protein engineering tools, and the increasingly interdisciplinary nature of scientific research. Biocatalysis has become an indispensable tool applied in academic and industrial spheres, enabling synthetic strategies that leverage the exquisite selectivity of enzymes to access target molecules. In this Outlook, we outline the technological advances that have led to the field's current state. Integration of biocatalysis into mainstream synthetic chemistry hinges on increased access to well-characterized enzymes and the permeation of biocatalysis into retrosynthetic logic. Ultimately, we anticipate that biocatalysis is poised to enable the synthesis of increasingly complex molecules at new levels of efficiency and throughput.
Affiliation(s)
- Joshua B. Pyser
- Department of Chemistry, Life Sciences Institute, and Program in Chemical Biology, University of Michigan, 210 Washtenaw Avenue, Ann Arbor, Michigan 48109, United States
- Suman Chakrabarty
- Department of Chemistry, Life Sciences Institute, and Program in Chemical Biology, University of Michigan, 210 Washtenaw Avenue, Ann Arbor, Michigan 48109, United States
- Evan O. Romero
- Department of Chemistry, Life Sciences Institute, and Program in Chemical Biology, University of Michigan, 210 Washtenaw Avenue, Ann Arbor, Michigan 48109, United States
- Alison R. H. Narayan
- Department of Chemistry, Life Sciences Institute, and Program in Chemical Biology, University of Michigan, 210 Washtenaw Avenue, Ann Arbor, Michigan 48109, United States
6
Auslander N, Gussow AB, Koonin EV. Incorporating Machine Learning into Established Bioinformatics Frameworks. Int J Mol Sci 2021; 22:2903. [PMID: 33809353 PMCID: PMC8000113 DOI: 10.3390/ijms22062903] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 02/15/2021] [Revised: 03/08/2021] [Accepted: 03/10/2021] [Indexed: 12/23/2022] Open
Abstract
The exponential growth of biomedical data in recent years has urged the application of numerous machine learning techniques to address emerging problems in biology and clinical research. By enabling the automatic feature extraction, selection, and generation of predictive models, these methods can be used to efficiently study complex biological systems. Machine learning techniques are frequently integrated with bioinformatic methods, as well as curated databases and biological networks, to enhance training and validation, identify the best interpretable features, and enable feature and model investigation. Here, we review recently developed methods that incorporate machine learning within the same framework with techniques from molecular evolution, protein structure analysis, systems biology, and disease genomics. We outline the challenges posed for machine learning, and, in particular, deep learning in biomedicine, and suggest unique opportunities for machine learning techniques integrated with established bioinformatics approaches to overcome some of these challenges.
Affiliation(s)
- Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
7
Khitmoh N, Smanchat S, Tongsima S. Stretch Profile: A pruning technique to accelerate DNA sequence search. Informatics in Medicine Unlocked 2020. [DOI: 10.1016/j.imu.2020.100323] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/24/2022] Open
8
Eren K, Murrell B. RIFRAF: a frame-resolving consensus algorithm. Bioinformatics 2019; 34:3817-3824. [PMID: 29850783 DOI: 10.1093/bioinformatics/bty426] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/30/2017] [Accepted: 05/22/2018] [Indexed: 01/08/2023] Open
Abstract
Motivation Protein coding genes can be studied using long-read next generation sequencing. However, high rates of indel sequencing errors are problematic, corrupting the reading frame. Even the consensus of multiple independent sequence reads retains indel errors. To solve this problem, we introduce Reference-Informed Frame-Resolving multiple-Alignment Free template inference algorithm (RIFRAF), a sequence consensus algorithm that takes a set of error-prone reads and a reference sequence and infers an accurate in-frame consensus. RIFRAF uses a novel structure, analogous to a two-layer hidden Markov model: the consensus is optimized to maximize alignment scores with both the set of noisy reads and with a reference. The template-to-reads component of the model encodes the preponderance of indels, and is sensitive to the per-base quality scores, giving greater weight to more accurate bases. The reference-to-template component of the model penalizes frame-destroying indels. A local search algorithm proceeds in stages to find the best consensus sequence for both objectives. Results Using Pacific Biosciences SMRT sequences from an HIV-1 env clone, NL4-3, we compare our approach to other consensus and frame correction methods. RIFRAF consistently finds a consensus sequence that is more accurate and in-frame, especially with small numbers of reads. It was able to perfectly reconstruct over 80% of consensus sequences from as few as three reads, whereas the best alternative required twice as many. RIFRAF is able to achieve these results and keep the consensus in-frame even with a distantly related reference sequence. Moreover, unlike other frame correction methods, RIFRAF can detect and keep true indels while removing erroneous ones. Availability and implementation RIFRAF is implemented in Julia, and source code is publicly available at https://github.com/MurrellGroup/Rifraf.jl. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kemal Eren
- Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, USA
- Ben Murrell
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
9
Al-Sahaf H, Bi Y, Chen Q, Lensen A, Mei Y, Sun Y, Tran B, Xue B, Zhang M. A survey on evolutionary machine learning. J R Soc N Z 2019. [DOI: 10.1080/03036758.2019.1609052] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 10/26/2022]
Affiliation(s)
- Harith Al-Sahaf
- School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Ying Bi
- School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Qi Chen
- School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Andrew Lensen
- School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Yi Mei
- School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Yanan Sun
- School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Binh Tran
- School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Bing Xue
- School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Mengjie Zhang
- School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
10
Computational Methods for the Discovery of Metabolic Markers of Complex Traits. Metabolites 2019; 9:metabo9040066. [PMID: 30987289 PMCID: PMC6523328 DOI: 10.3390/metabo9040066] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 02/01/2019] [Revised: 03/19/2019] [Accepted: 04/01/2019] [Indexed: 12/21/2022] Open
Abstract
Metabolomics uses quantitative analyses of metabolites from tissues or bodily fluids to acquire a functional readout of the physiological state. Complex diseases arise from the influence of multiple factors, such as genetics, environment and lifestyle. Since genes, RNAs and proteins converge onto the terminal downstream metabolome, metabolomics datasets offer a rich source of information in a complex and convoluted presentation. Thus, powerful computational methods capable of deciphering the effects of many upstream influences have become increasingly necessary. In this review, the workflow of metabolic marker discovery is outlined from metabolite extraction to model interpretation and validation. Additionally, current metabolomics research in various complex disease areas is examined to identify gaps and trends in the use of several statistical and computational algorithms. Then, we highlight and discuss three advanced machine-learning algorithms, specifically ensemble learning, artificial neural networks, and genetic programming, that are currently less visible, but are budding with high potential for utility in metabolomics research. With an upward trend in the use of highly-accurate, multivariate models in the metabolomics literature, diagnostic biomarker panels of complex diseases are more recently achieving accuracies approaching or exceeding traditional diagnostic procedures. This review aims to provide an overview of computational methods in metabolomics and promote the use of up-to-date machine-learning and computational methods by metabolomics researchers.
11
Paul A, Sil J. Optimized time-lag differential method for constructing gene regulatory network. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2018.11.019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 02/02/2023]
12
Evolutionary algorithms for species distribution modelling: A review in the context of machine learning. Ecol Modell 2019. [DOI: 10.1016/j.ecolmodel.2018.11.013] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Indexed: 11/19/2022]
13
Rhee JK, Kim SJ, Zhang BT. Identifying DNA Methylation Modules Associated with a Cancer by Probabilistic Evolutionary Learning. IEEE Comput Intell M 2018. [DOI: 10.1109/mci.2018.2840659] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/06/2022]
14
Gerard MF, Stegmayer G, Milone DH. Evolutionary algorithm for metabolic pathways synthesis. Biosystems 2016; 144:55-67. [PMID: 27080162 DOI: 10.1016/j.biosystems.2016.04.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/16/2015] [Revised: 03/01/2016] [Accepted: 04/01/2016] [Indexed: 11/19/2022]
Abstract
Metabolic pathway building is an active field of research, necessary to understand and manipulate the metabolism of organisms. There are different approaches, mainly based on classical search methods, to find linear sequences of reactions linking two compounds. However, an important limitation of these methods is the exponential increase of search trees when a large number of compounds and reactions is considered. Besides, such models do not take into account all substrates for each reaction during the search, leading to solutions that lack biological feasibility in many cases. This work proposes a new evolutionary algorithm that allows searching not only linear, but also branched metabolic pathways, formed by feasible reactions that relate multiple compounds simultaneously. Tests performed using several sets of reactions show that this algorithm is able to find feasible linear and branched metabolic pathways.
Affiliation(s)
- Matias F Gerard
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL/CONICET, Argentina.
- Georgina Stegmayer
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL/CONICET, Argentina.
- Diego H Milone
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL/CONICET, Argentina.
15
Pérez-Rodríguez J, García-Pedrajas N. Stepwise approach for combining many sources of evidence for site-recognition in genomic sequences. BMC Bioinformatics 2016; 17:117. [PMID: 26945666 PMCID: PMC4779560 DOI: 10.1186/s12859-016-0968-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/19/2015] [Accepted: 02/22/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Recognizing the different functional parts of genes, such as promoters, translation initiation sites, donors, acceptors and stop codons, is a fundamental task of many current studies in Bioinformatics. Currently, the most successful methods use powerful classifiers, such as support vector machines with various string kernels. However, with the rapid evolution of our ability to collect genomic information, it has been shown that combining many sources of evidence is fundamental to the success of any recognition task. With the advent of next-generation sequencing, the number of available genomes is increasing very rapidly. Thus, methods for making use of such large amounts of information are needed. RESULTS In this paper, we present a methodology for combining tens or even hundreds of different classifiers for an improved performance. Our approach can include almost a limitless number of sources of evidence. We can use the evidence for the prediction of sites in a certain species, such as human, or other species as needed. This approach can be used for any of the functional recognition tasks cited above. However, to provide the necessary focus, we have tested our approach in two functional recognition tasks: translation initiation site and stop codon recognition. We have used the entire human genome as a target and another 20 species as sources of evidence and tested our method on five different human chromosomes. The proposed method achieves better accuracy than the best state-of-the-art method both in terms of the geometric mean of the specificity and sensitivity and the area under the receiver operating characteristic and precision recall curves. Furthermore, our approach shows a more principled way for selecting the best genomes to be combined for a given recognition task. 
CONCLUSIONS Our approach has proven to be a powerful tool for improving the performance of functional site recognition, and it is a useful method for combining many sources of evidence for any recognition task in Bioinformatics. The results also show that the common approach of heuristically choosing the species to be used as source of evidence can be improved because the best combinations of genomes for recognition were those not usually selected. Although the experiments were performed for translation initiation site and stop codon recognition, any other recognition task may benefit from our methodology.
Affiliation(s)
- Javier Pérez-Rodríguez
- Department of Computing and Numerical Analysis, University of Córdoba, Córdoba, 14071, Campus de Rabanales, Spain.
- Nicolás García-Pedrajas
- Department of Computing and Numerical Analysis, University of Córdoba, Córdoba, 14071, Campus de Rabanales, Spain.
16
17
Gerard MF, Stegmayer G, Milone DH. EvoMS: An evolutionary tool to find de novo metabolic pathways. Biosystems 2015; 134:43-7. [PMID: 26092635 DOI: 10.1016/j.biosystems.2015.04.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 08/19/2014] [Revised: 04/17/2015] [Accepted: 04/21/2015] [Indexed: 11/19/2022]
Abstract
The evolutionary metabolic synthesizer (EvoMS) is an evolutionary tool capable of finding novel metabolic pathways linking several compounds through feasible reactions. It allows system biologists to explore different alternatives for relating specific metabolites, offering the possibility of indicating the initial compound or allowing the algorithm to automatically select it. Searching process can be followed graphically through several plots of the evolutionary process. Metabolic pathways found are displayed in a web browser as directed graphs. In all cases, solutions are networks of reactions that produce linear or branched metabolic pathways which are feasible from the specified set of available compounds. Source code of EvoMS is available at http://sourceforge.net/projects/sourcesinc/files/evoms/. Subsets of reactions are provided, as well as four examples for searching metabolic pathways among several compounds. Available as a web service at http://fich.unl.edu.ar/sinc/web-demo/evoms/.
Affiliation(s)
- Matias F Gerard
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL/CONICET, Argentina.
- Georgina Stegmayer
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL/CONICET, Argentina.
- Diego H Milone
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL/CONICET, Argentina.
18
Biochemical systems identification by a random drift particle swarm optimization approach. BMC Bioinformatics 2014; 15 Suppl 6:S1. [PMID: 25078435 PMCID: PMC4158603 DOI: 10.1186/1471-2105-15-s6-s1] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 11/12/2022] Open
Abstract
Background Finding an efficient method to solve the parameter estimation problem (inverse problem) for nonlinear biochemical dynamical systems could help promote the functional understanding at the system level for signalling pathways. The problem is stated as a data-driven nonlinear regression problem, which is converted into a nonlinear programming problem with many nonlinear differential and algebraic constraints. Due to the typical ill conditioning and multimodality nature of the problem, it is in general difficult for gradient-based local optimization methods to obtain satisfactory solutions. To surmount this limitation, many stochastic optimization methods have been employed to find the global solution of the problem. Results This paper presents an effective search strategy for a particle swarm optimization (PSO) algorithm that enhances the ability of the algorithm for estimating the parameters of complex dynamic biochemical pathways. The proposed algorithm is a new variant of random drift particle swarm optimization (RDPSO), which is used to solve the above mentioned inverse problem and compared with other well known stochastic optimization methods. Two case studies on estimating the parameters of two nonlinear biochemical dynamic models have been taken as benchmarks, under both the noise-free and noisy simulation data scenarios. Conclusions The experimental results show that the novel variant of RDPSO algorithm is able to successfully solve the problem and obtain solutions of better quality than other global optimization methods used for finding the solution to the inverse problems in this study.
19
Abstract
Evolutionary Computation (EC) is a branch of Artificial Intelligence which encompasses heuristic optimization methods loosely based on biological evolutionary processes. These methods are efficient in finding optimal or near-optimal solutions in large, complex non-linear search spaces. While evolutionary algorithms (EAs) are slow in comparison to deterministic or sampling approaches, they are also inherently parallelizable. As technology shifts towards multicore and cloud computing, this overhead becomes less relevant, provided a parallel framework is used. In this chapter the authors discuss how to implement and run parallel evolutionary algorithms in the popular statistical programming language R. R has become the de facto language for statistical programming and it is widely used in biostatistics and bioinformatics due to the availability of thousands of packages to manipulate and analyze data. It is also extremely easy to parallelize routines within R, which makes it a perfect environment for evolutionary algorithms. EC is a large field of research, and many different algorithms have been proposed. While there is no single silver bullet that can handle all classes of problems, an algorithm that is extremely simple, efficient, and with good generalization properties is Differential Evolution (DE). Herein the authors discuss step-by-step how to implement DE in R and how to parallelize it. They then illustrate with a toy genome-wide association study (GWAS) how to identify candidate regions associated with a quantitative trait of interest.
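For readers unfamiliar with Differential Evolution, the following is a minimal serial sketch of the common DE/rand/1/bin scheme in Python. It is not the chapter's R implementation and does not cover the parallelization the chapter discusses; the function names `differential_evolution` and `sphere`, the parameter defaults (`F=0.8`, `CR=0.9`), and the toy objective are all assumptions chosen for illustration.

```python
import random


def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=200, seed=0):
    """Minimize f over box bounds with a basic DE/rand/1/bin loop (illustrative)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the box.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [f(x) for x in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    # Differential mutation: base vector plus scaled difference.
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip back into the box
                else:
                    v = pop[i][j]  # binomial crossover keeps the parent gene
                trial.append(v)
            # Greedy selection: the trial replaces the parent only if no worse.
            # The population is updated in place (an asynchronous variant).
            f_trial = f(trial)
            if f_trial <= fitness[i]:
                pop[i], fitness[i] = trial, f_trial

    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


def sphere(x):
    """Toy objective (assumed for illustration): sum of squares, minimum at 0."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
    print(best_x, best_f)
```

Because each trial vector is evaluated independently of the others, the objective evaluations are the natural place to parallelize, which is the property the abstract emphasizes.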
Affiliation(s)
- Cedric Gondro
- The Centre for Genetic Analysis and Applications, University of New England, Australia
- Paul Kwan
- University of New England, Australia
20
Ray SS, Pal SK. RNA secondary structure prediction using soft computing. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:2-17. [PMID: 23702539 DOI: 10.1109/tcbb.2012.159] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 06/02/2023]
Abstract
Prediction of RNA structure is invaluable in creating new drugs and understanding genetic diseases. Several deterministic algorithms and soft computing-based techniques have been developed for more than a decade to determine the structure from a known RNA sequence. Soft computing gained importance with the need to get approximate solutions for RNA sequences by considering the issues related with kinetic effects, cotranscriptional folding, and estimation of certain energy parameters. A brief description of some of the soft computing-based techniques, developed for RNA secondary structure prediction, is presented along with their relevance. The basic concepts of RNA and its different structural elements like helix, bulge, hairpin loop, internal loop, and multiloop are described. These are followed by different methodologies, employing genetic algorithms, artificial neural networks, and fuzzy logic. The role of various metaheuristics, like simulated annealing, particle swarm optimization, ant colony optimization, and tabu search is also discussed. A relative comparison among different techniques, in predicting 12 known RNA secondary structures, is presented, as an example. Future challenging issues are then mentioned.
21
Won KJ, Saunders C, Prügel-Bennett A. Evolving fisher kernels for biological sequence classification. Evolutionary Computation 2012; 21:83-105. [PMID: 22181969 DOI: 10.1162/evco_a_00065] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/31/2023]
Abstract
Fisher kernels have been successfully applied to many problems in bioinformatics. However, their success depends on the quality of the generative model upon which they are built. For Fisher kernel techniques to be used on novel problems, a mechanism for creating accurate generative models is required. A novel framework is presented for automatically creating domain-specific generative models that can be used to produce Fisher kernels for support vector machines (SVMs) and other kernel methods. The framework enables the capture of prior knowledge and addresses the issue of domain-specific kernels, both of which are current areas that are lacking in many kernel-based methods. To obtain the generative model, genetic algorithms are used to evolve the structure of hidden Markov models (HMMs). A Fisher kernel is subsequently created from the HMM, and used in conjunction with an SVM, to improve the discriminative power. This paper investigates the effectiveness of the proposed method, named GA-SVM. We show that its performance is comparable if not better than other state of the art methods in classifying secretory protein sequences of malaria. More interestingly, it showed better results than the sequence-similarity-based approach, without the need for additional homologous sequence information in protein enzyme family classification. The experiments clearly demonstrate that the GA-SVM is a novel way to find features with good performance from biological sequences, that does not require extensive tuning of a complex model.
Affiliation(s)
- K-J Won
- Department of Genetics, Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania, Translational Research Center, 12-111, 3400 Civic Center Blvd., Philadelphia, PA 19104, USA.
|
22
|
García-Pedrajas N, de Haro-García A. Scaling up data mining algorithms: review and taxonomy. Progress in Artificial Intelligence 2012. [DOI: 10.1007/s13748-011-0004-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 11/29/2022]
|
23
|
|
24
|
Ole Carstensen N, Dieterich JM, Hartke B. Design of optimally switchable molecules by genetic algorithms. Phys Chem Chem Phys 2011; 13:2903-10. [DOI: 10.1039/c0cp01065k] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 11/21/2022]
|
25
|
Sîrbu A, Ruskin HJ, Crane M. Comparison of evolutionary algorithms in gene regulatory network model inference. BMC Bioinformatics 2010; 11:59. [PMID: 20105328 PMCID: PMC2831005 DOI: 10.1186/1471-2105-11-59] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.1] [Received: 08/28/2009] [Accepted: 01/27/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The evolution of high throughput technologies that measure gene expression levels has created a data base for inferring GRNs (a process also known as reverse engineering of GRNs). However, the nature of these data has made this process very difficult. At the moment, several methods of discovering qualitative causal relationships between genes with high accuracy from microarray data exist, but large scale quantitative analysis on real biological datasets cannot be performed, to date, as existing approaches are not suitable for real microarray data which are noisy and insufficient. RESULTS: This paper performs an analysis of several existing evolutionary algorithms for quantitative gene regulatory network modelling. The aim is to present the techniques used and offer a comprehensive comparison of approaches, under a common framework. Algorithms are applied to both synthetic and real gene expression data from DNA microarrays, and ability to reproduce biological behaviour, scalability and robustness to noise are assessed and compared. CONCLUSIONS: Presented is a comparison framework for assessment of evolutionary algorithms, used to infer gene regulatory networks. Promising methods are identified and a platform for development of appropriate model formalisms is established.
Affiliation(s)
- Alina Sîrbu
- Centre for Scientific Computing and Complex Systems Modelling, Dublin City University, Dublin 9, Ireland
- Heather J Ruskin
- Centre for Scientific Computing and Complex Systems Modelling, Dublin City University, Dublin 9, Ireland
- Martin Crane
- Centre for Scientific Computing and Complex Systems Modelling, Dublin City University, Dublin 9, Ireland
|
26
|
|
27
|
|
28
|
|
29
|
Ray SS, Bandyopadhyay S, Pal SK. Gene ordering in partitive clustering using microarray expressions. J Biosci 2007; 32:1019-25. [PMID: 17914244 DOI: 10.1007/s12038-007-0101-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 10/22/2022]
Abstract
A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering and ordering the genes using gene expression data into homogeneous groups was shown to be useful in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on gene ordering in the hierarchical clustering framework for gene expression analysis, there is no work addressing and evaluating the importance of gene ordering in the partitive clustering framework, to the best knowledge of the authors. Outside the framework of hierarchical clustering, different gene ordering algorithms are applied on the whole data set, and the domain of partitive clustering is still unexplored with gene ordering approaches. A new hybrid method is proposed for ordering genes in each of the clusters obtained from a partitive clustering solution, using microarray gene expressions. Two existing algorithms for optimally ordering cities in the travelling salesman problem (TSP), namely, FRAG_GALK and Concorde, are hybridized individually with a self-organizing map to show the importance of gene ordering in the partitive clustering framework. We validated our hybrid approach using yeast and fibroblast data and showed that our approach improves the result quality of the partitive clustering solution by identifying subclusters within big clusters, grouping functionally correlated genes within clusters, minimizing the summation of gene expression distances, and maximizing biological gene ordering using MIPS categorization. Moreover, the new hybrid approach finds comparable or sometimes superior biological gene order in less computation time than that obtained by optimal leaf ordering in a hierarchical clustering solution.
Affiliation(s)
- Shubhra Sankar Ray
- Center for Soft Computing Research: A National Facility, Indian Statistical Institute, Kolkata 700 108, India.
|